Homologs of genes expressed in Caenorhabditis elegans GABAergic neurons are also found in the developing mouse forebrain

Background In an effort to identify genes that specify the mammalian forebrain, we used a comparative approach to identify mouse homologs of transcription factors expressed in developing Caenorhabditis elegans GABAergic neurons. A cell-specific microarray profiling study revealed a set of transcription factors that are highly expressed in embryonic C. elegans GABAergic neurons. Results Bioinformatic analyses identified mouse protein homologs of these selected transcripts and their expression pattern was mapped in the mouse embryonic forebrain by in situ hybridization. A review of human homologs indicates several of these genes are potential candidates in neurodevelopmental disorders. Conclusions Our comparative approach has revealed several novel candidates that may serve as future targets for studies of mammalian forebrain development.


Background
Proper forebrain patterning and cell-fate specification lay the foundation for complex behaviors. These neurodevelopmental events in large part depend on a series of gene expression refinements (reviewed in [1]) that commit cells to express certain phenotypic features that define circuit formation. Relatively subtle disturbances in development may underlie the etiology of neurodevelopmental disorders, especially when alternative cognitive phenotypes do not have an apparent malformation at the gross anatomical level. In the forebrain, cells producing γ-aminobutyric acid (GABAergic interneurons) have been implicated in neurodevelopmental disorders, including autism and schizophrenia [2][3][4]. These neurons are composed of a diverse class of cells providing a wide range of control of neural activity, and vary in neuroanatomical location, electrophysiological properties, transcriptome/proteome and innervation patterns as either local circuit or long-range projection neurons [5].
As with other cell types, the diversity of GABAergic neurons has its basis in different developmental origins, with timing and location of birth playing key roles in cell fate [1][6][7][8].
Despite the phenotypic variety of GABAergic neurons, all use GABA as a neurotransmitter. In mammals, GABA is produced by one of two GABA-synthesizing enzymes, glutamic acid decarboxylase (GAD)65 or GAD67. These closely related enzymes are orthologs of the Caenorhabditis elegans protein UNC-25, which is found only in cells that produce GABA. Because UNC-25/GAD and other components of the GABA synthetic pathway are highly conserved, it is likely that mammalian orthologs of some of the genes that specify GABAergic cell fate in C. elegans embryogenesis may also control GABAergic fate specification during mammalian embryogenesis.
We have explored this hypothesis in an effort to define new candidates for regulating forebrain GABAergic cell fate that may be highly conserved across evolutionarily distant taxa. This discovery-based approach (Figure 1) complements existing analyses of the transcriptomes of subpopulations of mammalian GABAergic cells [9][10][11][12][13]. Thus, by using data from the transcription profiling of GABAergic cells in embryonic C. elegans, in combination with bioinformatics analyses, we report here transcripts with sequence homologs that may also be involved in GABAergic fate specification in mammals. We focused our attention on transcripts with gene regulation ontologies. To probe the potential role of these conserved players in mammalian development, we mapped these gene products in the developing mouse forebrain, with a selective focus on the telencephalon. As a proof of principle, this strategy identified several gene products already known to play a role in the specification of forebrain GABAergic interneurons in mammals. Additionally, our approach identified several previously unexplored gene products that serve as promising candidates for future investigation of forebrain patterning.

C. elegans transcription profiling
A microarray profiling of C. elegans cells (MAPCeL) strategy was used to obtain a transcriptome profile of C. elegans GABAergic neurons [14,15]. A complete description of the methods used for this study and the GABAergic neuron expression profile will be reported elsewhere (S Barlow, L Earls, J Watson, C Spencer, K Watkins, D Miller, manuscript in preparation). Briefly, the unc-25::GFP marker was used to label C. elegans GABAergic neurons. unc-25::GFP-expressing embryos were dissociated with chitinase and cultured for 24 hours, and viable unc-25::GFP-labeled cells were isolated by fluorescence activated cell sorting (FACS). Total RNA was purified from both the sorted unc-25::GFP-positive cells and from the reference sample of all embryonic cells. The RNAs were amplified and hybridized to the Affymetrix C. elegans array. Average signal intensities were calculated from three independent isolates of the unc-25::GFP cells and from four replicates of the reference samples. A comparison of the unc-25::GFP and reference data sets identified 673 transcripts showing elevated expression (≥ 1.7×) in GABAergic neurons at a false discovery rate (FDR) ≤ 1% [14].
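The enrichment criterion above (fold change of at least 1.7× over the reference at FDR ≤ 1%) can be sketched as a simple filter. This is a minimal illustration, not the MAPCeL pipeline: the `ProbeResult` record, the `enriched` function and all intensity and FDR values are hypothetical, and the FDR is assumed to have been estimated upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    transcript: str
    gaba_mean: float       # mean signal in sorted unc-25::GFP cells
    reference_mean: float  # mean signal in all embryonic cells
    fdr: float             # precomputed false discovery rate (0 to 1)


def enriched(results, min_fold=1.7, max_fdr=0.01):
    """Return (transcript, fold) pairs passing both thresholds, highest fold first."""
    hits = []
    for r in results:
        if r.reference_mean <= 0:
            continue  # no usable reference signal, so no fold change
        fold = r.gaba_mean / r.reference_mean
        if fold >= min_fold and r.fdr <= max_fdr:
            hits.append((r.transcript, fold))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical values chosen only to exercise the filter.
demo = [
    ProbeResult("unc-25", 6100.0, 100.0, 0.001),
    ProbeResult("unc-47", 700.0, 100.0, 0.002),
    ProbeResult("geneA", 150.0, 100.0, 0.001),  # fold change too low
    ProbeResult("geneB", 500.0, 100.0, 0.20),   # FDR too high
]
print(enriched(demo))
```

Applying both thresholds together matters: a transcript with a large fold change but a poor FDR (like the hypothetical `geneB`) is excluded.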

Bioinformatics screen
Genes in the list of GABAergic-enriched transcripts with Gene Ontology (GO) terms related to DNA and transcription regulation were analyzed for potential homology to mouse transcripts. Because functional homology is conserved at the protein level, we generated a list of C. elegans proteins from the list of corresponding transcripts and then used the BLASTP [16] analysis available at WormBase [17] from June 2005 to November 2008 (WormBase releases WS144 to WS196) to identify the closest matching mouse protein sequence homologs. We then used this list of mouse protein homologs to generate the corresponding catalogue of mouse transcripts for in situ hybridization analysis. We did not distinguish among potential splice variants and/or protein isoforms for a given single gene locus. To further rank potential candidates, we performed BLASTP in the reverse direction; after generating the list of mouse protein sequence homologs, those proteins were used to identify the best sequence homologs in the C. elegans proteome.

Mouse care and use
Timed pregnant C57BL/6J mice were bred in-house from founders originating from Jackson Labs under protocols approved by the Institutional Animal Care and Use Committee of Vanderbilt University. Mice were maintained on a 12:12 light-dark cycle and were permitted food and water ad libitum. Noon on the day following a time-delimited overnight pairing was considered embryonic day 0.5 (E0.5). Pregnant females were readily identifiable at E14.5 and were deeply anesthetized with isoflurane vapors followed by rapid decapitation in order to harvest embryos. Expression patterns of genes at this fetal age were analyzed because it is a mid-point in the age range for cortical GABAergic neuron production and migration in the mouse forebrain [8]. Thus, we hypothesized that expression patterns related to GABAergic neuron specification and differentiation would likely be apparent at this age.

Riboprobe labeling
I.M.A.G.E. clones were obtained from ATCC (Manassas, VA, USA) and Open Biosystems (Huntsville, AL, USA) for the mouse transcripts (Additional file 1). The identity of each I.M.A.G.E. clone was confirmed by sequencing at the Vanderbilt DNA Sequencing Facility. When necessary, due to cDNA size or the plasmid vector, we subcloned the I.M.A.G.E. clone into a separate vector (Additional file 2). These subclones were also sequenced to confirm identity and orientation. Plasmids were linearized and transcribed using T7, Sp6 or T3 polymerase (Promega, Madison, WI, USA) depending on the plasmid vector, by standard methods. Digoxigenin-11-uridine-5'-triphosphate (0.35 mM; Roche, Indianapolis, IN, USA) was included in the transcription reaction to allow for non-radioactive colorimetric detection of transcripts.

In situ hybridization
Fetuses at E14.5 were harvested into cold phosphate-buffered saline and crown-rump length (11 to 12 mm) confirmed. Whole heads or microdissected brains were immersion fixed for 24 hours in 4% formaldehyde in 0.156 M NaH2PO4, 0.107 M NaOH, pH 7.12 with HCl. After fixation, brains were cryoprotected in graded 10, 20 and 30% sucrose in phosphate-buffered saline followed by embedding in TFM Tissue Freezing Medium (Triangle Biomedical Sciences, Inc., Durham, NC, USA) over liquid nitrogen. Brains were stored at -80°C until cryostat sectioning into 6 series at 20 microns each. Slides containing the tissue were stored at -80°C until they were fixed, acetylated and dehydrated, and then returned to -80°C until in situ hybridization was performed. In situ hybridization was performed on a Tecan Evo 150 (Tecan Group Ltd, Männendorf, Switzerland) following the Allen Brain Atlas [18] and GenePaint [19] protocols (Additional files 3 and 4). After the machine completed the described protocol, BCIP and NBT (Roche) were applied manually. The time in color development ranged from 30 minutes to 4 hours. After color development, the slides were rinsed four times with double distilled water and then twice with 4% formaldehyde. Slides were removed from the machine, dehydrated through a series of alcohols and coverslipped with VectaMount (Vector Laboratories, Burlingame, CA, USA).

Light microscopy
Microscopy was performed using an Axioplan II microscope (Zeiss, Jena, Germany), and micrographs were acquired with a Zeiss AxioCam HRc camera (Zeiss) in

Results
Genes expressed in C. elegans GABAergic neurons
C. elegans embryonic GABAergic neurons were profiled by the MAPCeL approach in which unc-25::GFP-labeled cells were isolated by FACS for microarray analysis. Comparison to a reference data set obtained from all embryonic cells revealed 673 transcripts with enriched (≥ 1.7×) expression in GABAergic neurons. Strong enrichment of established GABAergic neuron markers, such as unc-25 (glutamic acid decarboxylase; 61×), unc-47 (vesicular GABA transporter; 7×) and acr-9 (nicotinic acetylcholine receptor; 25×) [20,21], indicates that other transcripts in this data set are also likely to be highly expressed in embryonic C. elegans GABAergic neurons in vivo (S Barlow, L Earls, J Watson, C Spencer, K Watkins, D Miller, manuscript in preparation). Seventy-five percent of the highly expressed transcripts had defined gene ontologies and, of those, 17 transcripts (2.5% of the 673) met criteria for DNA regulation-related gene ontologies (Table 1).

Bioinformatics assessment of mouse homologs
The original list of 17 C. elegans candidate transcription factors was used to identify 68 mouse homologs by BLASTP with an expectation cut off of ≤ E-3 (Table 2). The average number of mouse homologs was 3.8 for each C. elegans protein, with a mode of 3, a minimum of 2 and a maximum of 8 sequence homologs. Because of the similarity among certain C. elegans transcripts, three mouse proteins (Hnf4A, Hnf4G and Ezh2) appeared on the list more than once. When considering these duplications, there were 62 unique gene products to pursue for expression analysis. This analytical strategy appears to be suitable for identifying neurodevelopmental candidates, as we found that several mouse orthologs with homology to C. elegans transcripts have a known role in forebrain patterning. In particular, genes with selective roles in determining GABAergic phenotype in mammals were identified, including known players in the forebrain (Nkx2.1 [22], Arx [23], Cux2 [24]), midbrain (Pitx2 [25]) and spinal cord (Cux2 [26]).
Performing the reverse BLASTP from mouse proteins to worm proteins indicated the strength of the sequence homology between each mouse and worm protein relative to the other potential homologs in C. elegans. This reverse BLASTP can help rank-order candidates for further functional assessment in the future. If the reverse BLASTP returned the original C. elegans protein as the top hit (that is, the most significant match, with the lowest E-value), then 'yes' was entered in the R BLASTP column in Table 2. If the reverse BLASTP had a different C. elegans protein as the top hit, then a value of 'no' was entered in Table 2. Of the 68 mouse proteins, 22 had the original worm protein as the top reciprocal hit for sequence homology in the reverse BLASTP.
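The yes/no reciprocal call described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the function names, the dictionary layout and every protein name and E-value are placeholders, and the best hit is taken to be the one with the lowest E-value at or below the ≤ E-3 cutoff.

```python
def best_hit(hits, max_evalue=1e-3):
    """Return the subject with the lowest E-value at or below the cutoff, else None."""
    passing = [(evalue, subject) for subject, evalue in hits.items()
               if evalue <= max_evalue]
    return min(passing)[1] if passing else None


def reciprocal_calls(forward, reverse, max_evalue=1e-3):
    """Label each forward (worm, mouse) hit 'yes' if the mouse protein's
    best reverse hit is the original worm protein, otherwise 'no'."""
    calls = {}
    for worm, hits in forward.items():
        for mouse, evalue in hits.items():
            if evalue > max_evalue:
                continue  # forward hit fails the expectation cutoff
            top = best_hit(reverse.get(mouse, {}), max_evalue)
            calls[(worm, mouse)] = "yes" if top == worm else "no"
    return calls


# Placeholder proteins and E-values, invented for illustration only.
forward = {"worm_p1": {"Mouse_A": 1e-40, "Mouse_B": 1e-20, "Mouse_C": 0.5}}
reverse = {
    "Mouse_A": {"worm_p1": 1e-38, "worm_p2": 1e-25},
    "Mouse_B": {"worm_p2": 1e-30, "worm_p1": 1e-18},
}
print(reciprocal_calls(forward, reverse))
```

Here `Mouse_B` is a valid forward hit but is labeled 'no' because a different worm protein matches it more strongly in the reverse search.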

In situ hybridization mapping of mouse sequence homologs
Our criterion for potential relevance of mouse gene products in the specification of telencephalic interneurons was that transcripts must be present in known GABAergic proliferative zones (such as the medial, lateral and caudal subdivisions of the ganglionic eminence), although they need not be exclusively expressed in those brain areas. Representative expression patterns are depicted in Figure 2 with complete results summarized in Table 3. In addition to the expression data generated here, other sources for assessment and/or confirmation of expression were used, including GenePaint [19], Brain Gene Expression Map (BGEM) [27] and the Allen Brain Atlas [18].
Of the 62 unique transcripts, 57 have sufficient data to ascertain brain expression (Table 3). Of these, 52 (91%) exhibited brain expression. We narrowed our focus to known areas of cortical interneuron generation, migration and maturation, particularly the ganglionic eminences. In particular, we closely examined the proliferative ventricular zone (VZ), subventricular zone (SVZ), mantle of the subpallium and the pallium. A majority (38 of 52, 73%) of transcripts from our list were detected in the VZ, although this expression was not restricted to ventral proliferative zones. Rather, these transcripts were more broadly expressed throughout the dorsal and ventral VZ. Sixty percent (31 of 52) of transcripts were expressed in the cortex, 35% (18 of 52) in the mantle and 33% (17 of 52) in the SVZ. Expression patterns that included multiple embryonic histogenic forebrain areas were evident for the majority of transcripts.
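The proportions quoted above follow directly from the reported counts. The short check below recomputes them; the counts are taken from the text, while the dictionary labels and the `pct` helper are only illustrative.

```python
# Counts reported in the text: 57 transcripts with sufficient data,
# 52 of which showed brain expression.
assessed, expressed = 57, 52
region_counts = {"VZ": 38, "cortex": 31, "mantle": 18, "SVZ": 17}


def pct(n, total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * n / total)


print("brain expression:", pct(expressed, assessed), "%")
for region, n in region_counts.items():
    print(region, pct(n, expressed), "%")
```

The regional percentages sum to more than 100 because most transcripts were detected in more than one forebrain area.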
We observed three general patterns of expression (Table 3 and Figure 2): pattern 1, expression throughout the forebrain (for example, Ctnnb1, Tcfap4); pattern 2, expression in post-mitotic cells based on location in the mantle zone and cortical plate (for example, Cux2, Fox1, Myst3); and pattern 3, expression mainly in proliferative zones (for example, Hist1h1a, Ncl, Ezh2, Suv39h1). For patterns 2 and 3, expression was generally mosaic and limited to subsets of cells. Although more rare, we did observe expression of some transcripts in discrete areas, such as the well known pattern of Nkx2.1 in the medial ganglionic eminence (MGE; Figure 2) and Pitx2 (data not shown) in discrete nuclei outside of established forebrain GABAergic proliferative zones.

OMIM and disease linkage meta-analysis
Human orthologs of the mouse genes were identified through NCBI HomoloGene. Only one mouse gene, Refbp2, does not yet have an identified human ortholog. A manual BLASTP search of non-redundant protein entries also revealed no significant human homology to mouse Refbp2. The genes identified in this work are scattered throughout the human genome (Figure 3; Additional file 5).
In order to assess any potential bias in the distribution of the homologs, we tallied the genes on each chromosome as a percentage of the genes in this study. We then compared those fractions with the distribution of all the genes in the genome (data were obtained from NCBI Homo sapiens build 37.1). A difference score of observed-expected was calculated for each chromosome. We then standardized the difference scores and estimated confidence intervals (degrees of freedom 23). In general, the human homologs of transcripts enriched in worm embryonic GABAergic cells were distributed evenly throughout the genome. The only exception was chromosome 14, in which the standardized difference score fell outside of the 98% confidence interval. Chromosome 6 was just inside the 95% confidence interval, although several of the genes (IP6K3, TAF11, TRERF1, RXRB, HIST1H1A) cluster near 6p21, a known site of suppressed recombination [28]. This region is associated with reading disability [29] and schizophrenia [30]. To identify known diseases or disorders associated with the identified genes from the C. elegans screen, each human gene was used as a search term in Online Mendelian Inheritance in Man (OMIM). Of the 62 transcripts, 17 had OMIM entries. Of these, only three were relevant to neurocognitive phenotypes (Table 4). Mutations in ARX are causal for X-linked mental retardation [31], PHOX2B mutations are associated with congenital central hypoventilation syndrome [32], and mutations in NKX2.1 are associated with congenital chorea [33].
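The chromosome-distribution comparison described above (observed minus expected percentage per chromosome, then standardized) can be sketched as below. The text does not give the exact standardization or interval procedure, so this assumes a z-score across the 24 chromosomes and flags scores beyond a two-sided 95% t critical value for 23 degrees of freedom (about 2.069). All gene counts are hypothetical and the function names are illustrative.

```python
from statistics import mean, stdev


def standardized_differences(study_counts, genome_counts):
    """Observed minus expected percentage per chromosome, z-standardized
    across chromosomes (an assumed reading of the procedure in the text)."""
    study_total = sum(study_counts.values())
    genome_total = sum(genome_counts.values())
    diffs = {
        chrom: 100 * study_counts.get(chrom, 0) / study_total
        - 100 * genome_counts[chrom] / genome_total
        for chrom in genome_counts
    }
    mu, sd = mean(diffs.values()), stdev(diffs.values())
    return {chrom: (d - mu) / sd for chrom, d in diffs.items()}


def outliers(scores, critical=2.069):
    """Chromosomes whose standardized score exceeds the critical value.
    2.069 is the two-sided 95% t critical value for 23 degrees of freedom."""
    return sorted(c for c, z in scores.items() if abs(z) > critical)


# Hypothetical counts: 24 chromosomes with equal genome-wide gene content,
# and one chromosome over-represented in the study list.
chromosomes = [str(i) for i in range(1, 23)] + ["X", "Y"]
genome_counts = {c: 1000 for c in chromosomes}
study_counts = {c: 2 for c in chromosomes}
study_counts["14"] = 12

scores = standardized_differences(study_counts, genome_counts)
print(outliers(scores))
```

Working in percentages rather than raw counts keeps the study list and the genome on the same scale, so the difference scores sum to zero by construction.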
In addition to OMIM analysis, we surveyed the literature for gene association studies that may implicate any of the genes identified in this study with neurocognitive disruption as evident in autism spectrum disorders (ASDs), mental retardation, schizophrenia, seizure disorders or bipolar disorder. These findings are presented in Table 4. ARX (reviewed in [34]) is the best-known contributor to phenotypic disturbances among the transcription factors in our list. A2BP1 (human FOX1) appears to have a similar level of pleiotropy. While A2BP1 is relatively understudied, it has been associated with ASD [35], mental retardation and seizure activity [36].
Finally, the hypothesis that disturbances in GABAergic interneurons may play a role in ASD, combined with the emerging interest in endophenotype analysis in trait genetics in ASD, prompted a comparison of the 62 genes to chromosomal regions associated with ASD endophenotypes, rather than association with full ASD diagnosis. Specifically, we relied on summarized evidence from the literature of chromosomal association with autism endophenotype data reviewed by Losh et al. [37]. The chromosomal positions of selected genes are presented in Table 4 along with the associated autism endophenotypes for those chromosomal positions. There are several potential candidates for further analysis of autism endophenotypes. In particular, EZH2 stands out, as it is located at 7q35-36, within a replicated linkage peak for ASD genetics, including language, communication and developmental regression endophenotypes [38][39][40]. Additionally, A2BP1 (FOX1) is included in a chromosomal position associated with autism [35,36].

Discussion
In this report, we adopted a conservation-based bioinformatic approach to identify potential molecular regulators of GABAergic identity in the mammalian telencephalon. GFP-marked GABAergic neurons from the nematode, C. elegans, were isolated by FACS for microarray profiling. These data revealed enrichment (≥ 1.7×) of 17 transcripts encoding conserved proteins with potential roles in gene regulation in the nematode. BLASTP of these C. elegans proteins identified mouse homologs and 62 independent transcripts corresponding to these mammalian transcription factors were assessed for expression in E14.5 mouse brain. The data generated in our comparative strategy revealed several highly conserved players in GABAergic interneuron differentiation, including Arx, Nkx2.1 and Cux2 [22][23][24]. The positive identification of these transcripts supports the utility of our bioinformatic approach as a productive strategy for identifying conserved determinants of neuronal fate. Of the reciprocal BLASTP top hits, 14 unique transcripts showed relevant in situ hybridization patterns for telencephalic GABAergic neurogenesis, with 3 having known roles (Arx, Cux1, Cux2). Indeed, mutations in ARX have been associated with human brain function and interneuron pathology as identified in OMIM [41]. The 11 remaining top reciprocal hits with relevant expression patterns serve as novel candidate genes (Ip6k1, Ip6k2, Trerf1, C130039O16Rik, Ezh2, Taf11, Med6, Thoc4, Refbp2, Med8, Tcfap4). While not top reciprocal hits, based on striking expression pattern alone, Hist1h1a, Fox1, Myst3 and Suv39h1 warrant further attention. This is especially true as reciprocity is not a perfect predictor of candidacy, as two proteins with known function in GABAergic specification were not top reciprocal hits (NKX2.1 and beta-Catenin).
Mammalian GABAergic cells are generated in the preoptic area and ganglionic eminence of the ventral pallium during embryogenesis [8][42][43][44]. The three main subdivisions of the ganglionic eminence, lateral (LGE), medial (MGE) and caudal (CGE), generate a diverse portfolio of GABAergic cells. The LGE produces GABAergic projection neurons of the striatum and interneurons of the amygdala and the olfactory bulbs, whereas the MGE and CGE produce the majority of cortical and striatal interneurons, although each contributes a different repertoire of cell types. Cells from the MGE (for example, Nkx2.1-expressing cells) settle in cortical layers in an inside-out fashion based on cell birth date, whereas the most ventral MGE cells generate neurons of the globus pallidus and striatal cholinergic neurons [45]. In contrast, cells from the CGE tend to migrate to upper layers, independent of birth date, and comprise 15 to 30% of all cortical interneurons [46]. It is curious that of all of the transcription factors that we mapped, Nkx2.1 was the only one that was limited to one of the three progenitor pools.
It is clear that the gene regulatory transcripts identified in our study, with the exception of Nkx2.1, do not delineate these well-known pools of progenitor populations. The absence of tissue specificity could mean that these transcription factors exercise general roles in neuronal differentiation as opposed to functioning as selective determinants of GABAergic fate. However, the broader expression beyond the boundaries of these defined progenitor zones does not preclude a role for the protein products of these transcripts in contributing to the development of a selective neuronal type. For example, these candidates may be permissive for a particular fate or act in combination with other gene products with more limited expression patterns. The data generated by our comparative approach blend with and add to the existing data on mammalian transcription factors that could play a role in the full development of GABAergic fates. There have been several efforts in mouse embryogenesis to use transcription profiling of microdissected GABAergic proliferative zones or fluorescent sorting of enhanced GFP (EGFP)-positive interneurons in dissected embryonic brain. For example, Batista-Brito et al. [9] used FACS to isolate embryonic interneurons from presumptive neocortex of E13.5 and E15.5 Dlx5/6 Cre-IRES-EGFP mice. They contrasted the transcriptomes of EGFP-positive (interneurons) and EGFP-negative cells (all other cell types) and identified several enriched transcripts, including Arx and Cux2, as in our study. Because of the region dissected, Nkx2.1 was not enriched, as its expression wanes as interneurons leave the medial ganglionic eminence. They also identified several other candidate transcription factors, including some with association with neurological disorders. Faux et al.  [9], migration [10] and maturation [12]. Indeed, contrasting mRNA pools from CGE, LGE and MGE can provide candidates for specifying interneuron subtype [13].
While the comparative approach used here has identified novel potential candidates in the specification of interneurons, there are limitations. The experimental design would not detect elements of chromatin structure or microRNAs, for example, as mechanisms of transcriptional regulation. Our analysis was limited to transcripts that encode proteins involved in gene regulation; other protein classes (for example, receptor tyrosine kinases, ion channels) could also be involved.
Moreover, the results are correlational; the expression patterns of these novel candidates overlap with areas that produce GABAergic cells, but do not show that these transcripts participate in GABA fate. Functional studies will be necessary to determine a role for these potential novel players. Additionally, while the comparative data used in this study are based on protein sequence homologies, the ultimate goal is to identify functional orthologs across species. Because true functional orthology is determined over time with experimental methods outside of the scope of this manuscript, we implore the reader to view these data as a first step on the path to identifying potential functional orthologs in conserved gene regulation networks to specify a GABAergic fate.
While this comparative approach revealed several highly conserved players in GABAergic neurogenesis, including Nkx2.1, Arx and Cux2, we failed to identify some known factors in mammalian forebrain specification, including Olig-2, although we did identify other basic helix-loop-helix (bHLH) transcription factors, such as Tcfap4. Also noticeably absent from the list were Lhx6 (lim-4 in C. elegans), Mash1 and Dlx1/2, all of which have been demonstrated to play a role in GABAergic differentiation in the mammalian forebrain. We note that a related LIM homeodomain protein, LIM-6, is required for differentiation and expression of UNC-25/GAD in a subset of C. elegans GABAergic neurons [47].
While unc-30 is the top candidate with the highest enrichment in GABAergic cells in the worm data set, none of the mammalian homologs (Pitx1, Pitx2, Pitx3) revealed expression in known GABAergic proliferative zones of the forebrain, even though there was expression in other brain areas at E14.5. Pitx2 is highly expressed in GABA neuron progenitors in diencephalon/mesencephalon [48], where it is known to drive Gad67 expression [25]. This role is also conserved in the C. elegans homolog, unc-30 [49]. In fact, both mammalian Pitx2 and C. elegans unc-30 can be used to activate Gad67 transcription in vitro and in vivo [25]. While Pitx2 and unc-30 clearly give rise to a GABA phenotype, the absence of Pitx2 expression in the forebrain indicates that other mechanisms regulate GABA phenotype in the interneurons of the telencephalon. More than one type of transcription factor or combination of transcription factors likely can drive the GABAergic fate. Indeed, GABAergic fate regulation in the worm offers a striking parallel to the mouse: unc-30 drives GABAergic fate in ventral cord motor neurons but not in GABAergic motor neurons in the head, where the LIM homeodomain protein lim-6 is required; similarly, Pitx2 is highly expressed in diencephalon/mesencephalon GABAergic progenitors and drives Gad67 expression but is not required for differentiation of forebrain GABAergic interneurons that depend on ARX. Additionally, alr-1, the worm homolog of ARX, regulates gene expression in worm GABA motor neurons [50].

Conclusions
Comparative transcription profiling across diverse taxa is a fruitful approach for generating candidate genes for brain development. Our comparative analysis has pointed to several interesting candidates for the specification of GABAergic cells in the mammalian telencephalon during embryogenesis based on their expression in regions known to produce or contain interneurons. While not exclusively expressed in these regions, Hist1h1a, Ezh2, A2bp1 (Fox1), Suv39h1 and Myst3 are all novel candidates for interneuron development. Furthermore, these candidates represent two relatively understudied classes of gene regulatory proteins in the context of interneuron development, including histone interacting proteins (Hist1h1a, Ezh2, Suv39h1 and Myst3) and RNA regulators (Fox1/A2bp1). As novel candidates for interneuron development, these transcripts may also be candidate genes for, or participate in, pathways giving rise to neurodevelopmental disorders such as autism, mental retardation and schizophrenia. Variation in function of these proteins and their interacting partners might also play a role in brain evolution. These hypotheses remain to be explored.