guanine and cytosine: the A, T, G and C make up the DNA code’ is the most
basic concept of molecular biology, known to everyone in the field. But it is
not the whole story, as the rise of epigenetics over the past decade has shown.
Modifications of cytosine in the DNA double helix, such as 5-methylcytosine
(5mC), 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and many other
variants, have been identified and studied for their role in the regulation of
gene expression. The discovery of these additional characters in the mammalian
DNA code opened an entirely new area of research into these variants and their
connection to epigenetics, the study of heritable changes in gene function that
do not involve changes in the DNA sequence.
These modifications then became an area of intense interest, prompting
speculation about their roles in the genome and experiments to test these
ideas. The most intriguing proposal was that these modified cytosines, long
believed to be transient products, might be stable DNA modifications in
mammals, effectively adding new nucleotides to the four accepted ones. This
would change the basics of genetics as they have been taught for decades.
Although the idea has yet to be fully proven, there is already impressive
evidence that may change the accepted facts.
This review focuses on some of these putative new nucleotides, produced by
methylation and subsequent oxidation of cytosine in DNA, on their role in
epigenetics, and on speculation regarding their stability in the genome.
Discovery of DNA modifications:
Multiple lines of inquiry led to the discovery of cytosine variants in the DNA
of organisms. Several investigators, observing the absence of regeneration and
mitosis in adult neurons, suggested that the genetic make-up of the brain
differs from that of other tissues in a way that needed to be deciphered [10].
Another contributing factor was the reprogramming of somatic cells or nuclei
through induced pluripotent stem (iPS) cell generation, cell fusion or somatic
cell nuclear transfer (SCNT). On close inspection, failures of these techniques
were attributed to the absence of proper methylation patterns in the DNA of the
new cells [5]. Extensive research was therefore initiated to explain these
differences in genetic make-up within an individual and to identify the
molecular pathways involved in DNA methylation and demethylation.
Thorough studies confirmed that the DNA of all mammalian cells and tissues is
methylated at specific loci, mainly at 5′-cytosine-phosphate-guanine-3′ (CpG)
sites, to control gene expression. In the genomic DNA of mouse embryonic stem
(mES) cells and several adult mouse tissues, DNA methyltransferases convert
cytosine (C) to 5-methylcytosine (5mC), using S-adenosyl methionine (SAM) as
the methyl donor. Hydroxylation of 5mC by ten-eleven translocation (TET)
enzymes then produces 5-hydroxymethylcytosine (5hmC), with global levels
ranging between 0.005% and 0.7% of all Cs. These iron(II)- and
2-oxoglutarate-dependent TET enzymes can further oxidize 5hmC to
5-formylcytosine (5fC) and 5-carboxycytosine (5caC), which are found at levels
below 0.002% of all Cs [11]. The latter two are excised by thymine DNA
glycosylase (TDG) [2], and base excision repair (BER) then restores an
unmodified cytosine, completing active demethylation in mammals. Nevertheless,
these bases persist in the genome, raising the possibility that they have
additional functions [5]. Figure 1 outlines the basic flow of cytosine
methylation and oxidation; some recently discovered DNA base modifications in
adenine and thymine are also illustrated.
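The enzymatic sequence just described can be summarized as a lookup of which enzyme acts on each base and what it produces. The sketch below is a hypothetical illustration only: the `PATHWAY` table and the `demethylation_route` helper are names introduced here, and the model encodes just the steps named in the text (DNMT methylation with SAM, stepwise TET oxidation, TDG excision of 5fC or 5caC, and BER restoring C). It ignores kinetics, strand context and passive dilution.

```python
# Hypothetical sketch of the cytosine modification cycle described above.
# Each entry maps a base to (enzyme that acts on it, product formed).
PATHWAY = {
    "C": ("DNMT, methyl donor SAM", "5mC"),
    "5mC": ("TET", "5hmC"),
    "5hmC": ("TET", "5fC"),
    "5fC": ("TET", "5caC"),
}

# In this sketch, TDG excises 5fC or 5caC, and base excision repair (BER) restores C.
TDG_SUBSTRATES = {"5fC", "5caC"}


def demethylation_route(start: str = "5mC", excise_at: str = "5caC") -> list[str]:
    """List each step from `start` until TDG excision of `excise_at`, ending with C."""
    if excise_at not in TDG_SUBSTRATES:
        raise ValueError(f"{excise_at} is not treated as a TDG substrate here")
    steps = []
    base = start
    while base != excise_at:
        enzyme, product = PATHWAY[base]  # KeyError if `start` lies past `excise_at`
        steps.append(f"{base} -[{enzyme}]-> {product}")
        base = product
    steps.append(f"{base} -[TDG + BER]-> C")
    return steps


for step in demethylation_route():
    print(step)
```

Passing `excise_at="5fC"` models the shorter route in which TDG removes 5fC before it is oxidized further.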
The presence of these supposedly transient oxidation products in the genome,
despite the repair machinery, intrigued biologists and prompted further
exploration. It suggested that these oxidized cytosine bases might do more than
serve as intermediates of enzyme-mediated DNA demethylation initiated by the
oxidation of 5mC. Sequencing and mapping techniques were therefore applied to
analyze them. Their low abundance, transient nature and structural similarity
made identifying, mapping and characterizing these variants a demanding task,
but over time these obstacles were overcome by methods of increasing
efficiency and specificity.
The first basic method for global assessment of modified cytosine content in
DNA was thin-layer chromatography (TLC), which has undergone many modifications
since its first use, such as combination with radiolabeling. Methods that
followed include antibody-based techniques such as immunofluorescence staining,
whose sensitivity and cross-reactivity are uncertain, assays using T4
β-glucosyltransferase (T4-BGT), and liquid chromatography (LC) coupled with
mass spectrometry (MS). LC/MS was considered the gold standard until the
underestimated problem of ion suppression came to light [5]. These techniques
allowed global assessment of DNA modifications, but understanding the dynamic
balance between cytosine methylation and demethylation required new
single-cell, single-base-resolution techniques. Advances in technology made
this possible with methods such as CLEVER-seq (chemical-labelling-enabled
C-to-T conversion sequencing) [2], bisulfite sequencing (BS-seq)-based methods
[8] and nano high-performance liquid chromatography–tandem high-resolution mass
spectrometry (Nano HPLC-MS/HRMS). Analysis of single-base-resolution
methylation maps obtained by BS-seq showed very large variation in CpG
methylation among species. This opened a view beyond mammalian methylomes,
defining three major categories:
1. Mammalian methylomes –
Most CpG sites in the genome are methylated; the unmethylated remainder is
largely confined to regions bound by active regulatory proteins. Humans, for
instance, have more than 80% of CpG dinucleotides methylated. The default state
of the genome appears to be “methylated”.
2. Honeybee methylome –
Here, the default state appears to be “unmethylated”. Only 60,000 CpG sites
are methylated, and these are enriched in exons.
3. Absence of cytosine methylation –
Some organisms, such as S. cerevisiae, C. elegans and D. melanogaster, show no
cytosine methylation pattern, indicating that 5mC is not essential for
developmental processes in these organisms.
Figure 2 illustrates whole-genome bisulfite sequencing analysis of these three
methylomes, showing ubiquitous, sporadic and absent cytosine methylation,
respectively.
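Bisulfite sequencing, used for the single-base maps discussed here, rests on a chemical difference: sodium bisulfite deaminates unmodified C (and also 5fC and 5caC) to uracil, which is read as T after amplification, whereas 5mC and 5hmC resist conversion and are still read as C. Standard BS-seq therefore cannot tell 5mC from 5hmC. The sketch below is a hypothetical helper, not part of any published pipeline: it assumes a single top-strand read already aligned to the reference without indels, and it ignores quality scores, SNPs and strand handling.

```python
# Hypothetical sketch: call CpG methylation from one aligned bisulfite read (top strand only).
# Assumption: read and reference are already aligned, same length, no indels.

def call_cpg_methylation(reference: str, read: str) -> dict[int, str]:
    """Return {position: call} for each CpG cytosine in `reference`.

    'methylated'   -> read shows C (5mC or 5hmC resisted bisulfite conversion)
    'unmethylated' -> read shows T (C, 5fC or 5caC was converted to U, read as T)
    'unknown'      -> any other base (e.g. a sequencing error or SNP)
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            if read[i] == "C":
                calls[i] = "methylated"
            elif read[i] == "T":
                calls[i] = "unmethylated"
            else:
                calls[i] = "unknown"
    return calls


ref = "ACGTTCGACCGA"
read = "ACGTTTGATCGA"
print(call_cpg_methylation(ref, read))
# {1: 'methylated', 5: 'unmethylated', 9: 'methylated'}
```

The non-CpG cytosine at position 8 is converted to T in the read, as expected for an unmethylated base, but it is not reported because the helper only calls CpG sites.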
These sequencing and mapping methods provided better insight into the functions
of the newly discovered cytosine-derived nucleotides, which were then studied
for their roles and stability in the mammalian genome.
The mammalian genome carries epigenetic marks that maintain heritable
information about gene function. These marks are read either by transcription
factors and activators or by proteins that recruit repressor complexes, the
latter producing a closed chromatin structure that prevents active
transcription. Methylation of carbon 5 of cytosine to 5-methylcytosine (5mC,
Figure 1) at a CpG site is one such mark: when present near gene regulatory
regions, it prevents transcription by modulating the binding of specific
5mC-binding proteins that recruit co-repressor complexes to methylated sites.
The key feature of cytosine methylation is the symmetry of methylation marks,
i.e. their presence on both strands of the parental DNA and their intact
passage through DNA replication, which ensures the stability and heritability
of epigenetic information. Methylation is enriched at “symmetric” CpG
dinucleotides, which permits methylation patterns to be inherited through DNA
replication, a process carried out by the Dnmt1 DNA methyltransferase. An
unusual observation made while studying this modification was the rare
occurrence of non-CpG methylation in mouse embryonic stem cells (mESCs).
Further work showed that in ESCs non-CpG methylation correlates positively with
gene expression, whereas in neurons the correlation is inverse, owing to the
recruitment of methyl-CpG binding protein 2 (MeCP2) [12].
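Why symmetric CpG sites let methylation be copied can be shown in a few lines: CpG is its own reverse complement, so a methylated CpG on one strand faces a CpG on the other. After semiconservative replication each daughter duplex is hemimethylated, and a maintenance activity such as Dnmt1 restores methylation on the new strand opposite each methylated parental CpG. The sketch below is a deliberately simplified model under assumptions introduced here (perfect copying, no de novo methylation, methylation recorded as top-strand CpG positions); the function names are hypothetical.

```python
# Hypothetical model of maintenance methylation at symmetric CpG sites.
# Assumptions: perfect Dnmt1-like copying, no de novo or active demethylation.
# Methylation is recorded per strand as a set of top-strand CpG start positions.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cpg_sites(seq: str) -> set[int]:
    """Top-strand start positions of CpG dinucleotides."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def replicate(seq: str, top_meth: set[int], bottom_meth: set[int], maintenance: bool = True):
    """Semiconservative replication of one duplex; returns two (top, bottom) daughters."""
    if not (top_meth | bottom_meth) <= cpg_sites(seq):
        raise ValueError("this sketch only models methylation at CpG sites")
    if maintenance:
        # The parental strand's marks are copied onto the new strand opposite it.
        return (set(top_meth), set(top_meth)), (set(bottom_meth), set(bottom_meth))
    # Without maintenance, each new strand stays unmethylated (hemimethylated daughters).
    return (set(top_meth), set()), (set(), set(bottom_meth))


seq = "ACGTTCGACCGA"
print(reverse_complement("CG"))  # CG: the dinucleotide reads the same on both strands
print(cpg_sites(seq))            # CpGs start at positions 1, 5 and 9
parent = ({1, 9}, {1, 9})        # CpGs 1 and 9 methylated on both strands
print(replicate(seq, *parent))
print(replicate(seq, *parent, maintenance=False))
```

With maintenance, both daughters reproduce the parental pattern; without it, each daughter keeps marks on only its old strand.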
Because methylation at CpG sites is so abundant, 5-methylcytosine has been
termed the “fifth base” of the human genome [12]. It is an epigenetic
modification that plays a fundamental role in embryonic development,
transcriptional regulation [3] and numerous biological phenomena, such as
genomic imprinting, gene silencing and X chromosome inactivation [8]. In some
cases it is also a source of mutations: CpG dinucleotides are mutation hotspots
because 5mC can undergo spontaneous hydrolytic deamination to thymine [3].
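The mutational consequence of 5mC deamination can be traced step by step. Deamination turns 5mC into thymine and leaves a T·G mismatch; if this is not repaired before replication, the strand carrying T templates an A, so one daughter duplex carries T·A where the parent had C·G, a C·G to T·A transition. The sketch below models only this single-site logic; the function names and the "top·bottom" string representation of a base pair (with "M" standing for 5mC) are hypothetical conventions introduced here.

```python
# Hypothetical single-site model: deamination of 5mC at a CpG, then one round of replication.
# A base pair is written as "top·bottom"; "M" stands for 5mC on the top strand.

PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "M": "G"}  # 5mC pairs like C


def deaminate(base: str) -> str:
    """Hydrolytic deamination: 5mC -> T (deamination of unmodified C to U is not modelled)."""
    return "T" if base == "M" else base


def replicate_pair(top: str, bottom: str) -> tuple[str, str]:
    """Each parental strand templates a new complementary strand."""
    daughter_from_top = f"{top}·{PAIRING[top]}"
    daughter_from_bottom = f"{PAIRING[bottom]}·{bottom}"
    return daughter_from_top, daughter_from_bottom


top, bottom = "M", "G"              # methylated C paired with G in a CpG
top = deaminate(top)                # now a T·G mismatch
print(replicate_pair(top, bottom))  # ('T·A', 'C·G'): one daughter carries the transition
```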
While the classical epigenetic mark 5-methylcytosine was being investigated for
its abundance and role in epigenetics, the discovery of additional cytosine
modifications further increased interest in this field. The discovery of
enzymes that catalyze the hydroxylation of 5-methylcytosine to
5-hydroxymethylcytosine, believed by some to be the sixth nucleotide,
established a new epigenetic mark associated with activated transcription [12].
This novel epigenetic DNA modification emerged in mammalian DNA with the
discovery of the catalytic dioxygenase activity of ten-eleven translocation
(Tet) proteins, which hydroxylate 5-methylcytosine (5mC) to
5-hydroxymethylcytosine (5hmC, Figure 1). The presence of 5hmC had been
established much earlier in bacteriophages, but because of less sensitive
equipment it took years to identify this oddity in mammals.
Cytosine hydroxymethylation levels are often around 0.1% in mammalian tissues
but can vary greatly, with the highest values in the brain, where up to 1% of
cytosines can be hydroxymethylated. The three mammalian Tet homologues generate
5hmC from existing 5mC, which they can further process to 5-formylcytosine
(5fC) and 5-carboxycytosine (5caC, Figure 1) [12]. The positional abundance of
5hmC, its contribution to epigenetics and its further oxidation into new
products were explored using experimentation on
Table 1. Composition of rat DNA
Formic acid hydrolysates were subjected to chromatography, and phosphorus
determinations were performed. Determinations were made on two preparations.
That the brain has the highest 5hmC content was confirmed by formic acid
hydrolysis and two-dimensional chromatography of DNA components from the rat
genome [10]. 5-Hydroxymethylcytosine was identified among the chromatographic
products by spectrophotometric comparison with standards. The standard
separated poorly from cytosine, but the extinction peaks at 261 nm and 276 nm,
respectively, sharply differentiated the two compounds. The results of this
analysis are given in Table 1. Only the sum of the molar percentages of
cytosine and 5-hydroxymethylcytosine corresponds reasonably to the value for
guanine, indicating that 5-hydroxymethylcytosine is indeed a DNA component. In
brain DNA, 5-hydroxymethylcytosine constituted 15% of total cytosine bases.
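The base-pairing argument used here can be made explicit. In double-stranded DNA every G pairs with a cytosine, whether unmodified or hydroxymethylated, so the molar percentage of G should match C plus 5hmC rather than C alone, and 5hmC's share of total cytosine is 5hmC / (C + 5hmC). The helpers below are a hypothetical sketch of that check. The 2 mol% tolerance is an arbitrary assumption, and the example numbers are illustrative only, chosen so that 5hmC is 15% of total cytosine; they are not the Table 1 values.

```python
# Hypothetical sketch of the base-pairing (Chargaff) check described above.
# Molar percentages are inputs; the 2 mol% tolerance is an arbitrary assumption.

def pairs_with_guanine(g: float, cytosine_species: dict[str, float], tolerance: float = 2.0) -> bool:
    """True if the summed cytosine species match guanine within `tolerance` mol%."""
    return abs(g - sum(cytosine_species.values())) <= tolerance


def fraction_of_total_cytosine(species: str, cytosine_species: dict[str, float]) -> float:
    """Share of one cytosine species among all cytosine species supplied."""
    return cytosine_species[species] / sum(cytosine_species.values())


# Illustrative numbers only (not the Table 1 data).
guanine = 20.0
cytosines = {"C": 17.0, "5hmC": 3.0}

print(pairs_with_guanine(guanine, {"C": cytosines["C"]}))  # C alone falls 3 mol% short of G
print(pairs_with_guanine(guanine, cytosines))              # C + 5hmC matches G
print(fraction_of_total_cytosine("5hmC", cytosines))       # 5hmC share of total cytosine
```

Further cytosine species, such as 5mC, can be added to the dictionary if they were measured.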
Identical results were obtained for mouse and frog brain. Applying the same
preparative method to rat liver and rat spleen gave a similar DNA fraction,
although in low yield. The presence of 5-hydroxymethylcytosine in DNA prompted
an examination of RNA for this base. A very high percentage of
5-hydroxymethylcytosine appeared to be present in the crude RNA fraction of
The discovery of 5hmC in mammals is recent, driven by close study of how
neurons of the adult brain differ in behavior from other tissue cells and by
efforts to develop proper reprogramming techniques, as mentioned at the start.
Earlier failures to observe 5-hydroxymethylcytosine in DNA preparations from
animal tissue can be attributed to simple chromatographic systems that could
not distinguish between cytosine and 5-hydroxymethylcytosine because of their
minimal differences. The new high-throughput and sensitive methods allow proper
identification of this modification.
5hmC was also found as a relatively stable base at a subset of mammalian
promoters and active enhancers, indicating a role in mediating epigenetic
regulation [12]. The role of 5hmC as an active mark was supported by mass
spectrometric analyses of isotope-labeled DNA from mammalian cell cultures and
mice, showing that 5hmC is mostly a stable modification rather than a transient
intermediate, and hence possibly the sixth nucleotide [12].
Taken together, this evidence and the comparison with bacteriophage DNA raised
a further possibility: that brain DNA preparations are glucosylated, like
bacteriophage DNA. The high concentration of 5-hydroxymethylcytosine in brain
nucleic acids was therefore thought to be related to the dependence of the
central nervous system on glucose for primary metabolic processes.
5-Methylcytosine, 5-hydroxymethylcytosine and other oxidation products are lost
by current techniques for preparing DNA from brain, so the native structure may
be more complex than current concepts suggest [10]. This line of thought led to
the identification of 5-formylcytosine (5fC), 5-carboxycytosine (5caC),
5-hydroxymethyluracil (5hmU) and N6-methyladenine as further candidate
nucleotides.
While the 5hmC regions were being mapped, a subset of marked regions showed the
presence of 5fC, suggesting a role as an independent epigenetic mark and
prompting researchers to explore the stability of 5fC [12].
The single-cell 5fC sequencing technique CLEVER-seq (chemical-labelling-enabled
C-to-T conversion sequencing) was used to map 5fC sites in embryonic genomes at
various pre- and post-implantation stages in order to study 5fC dynamics. These
modified sites were found to be both inherited and newly generated after
fertilization [2]. They are also highly heterogeneous, except at promoters and
exons, where heterogeneity is lowest, consistent with DNA demethylation
activity and thus upregulation of gene expression.
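As its name indicates, CLEVER-seq chemically labels 5fC so that it is read as T, while other cytosines are still read as C; 5fC sites therefore appear as C-to-T changes relative to the reference. The sketch below is a hypothetical illustration of that calling logic on aligned reads from one cell, not the published pipeline. The function name, the minimum-coverage threshold and the simple "any T at a reference C" rule are assumptions introduced here, and the sketch ignores genuine C/T SNPs and sequencing errors that a real analysis must filter.

```python
# Hypothetical sketch of CLEVER-seq-style calling: labeled 5fC is read as T,
# so C-to-T changes at reference C positions are candidate 5fC sites.
# Assumptions: reads from one cell, already aligned to `reference` with no indels
# ('-' = no coverage), and no C/T SNPs or sequencing errors.

def call_5fc_sites(reference: str, reads: list[str], min_coverage: int = 2) -> list[int]:
    candidates = []
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        observed = [r[pos] for r in reads if r[pos] != "-"]
        if len(observed) < min_coverage:
            continue  # too few reads to make a call
        if "T" in observed:
            candidates.append(pos)
    return candidates


reference = "ACGTCCGA"
reads = [
    "ACGTTCGA",
    "ACGTTCG-",
    "--GTCCGA",
]
print(call_5fc_sites(reference, reads))  # [4]
```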
5-Formylcytosine (5fC) is a rare base in mammalian DNA that was originally
thought to be involved only in active DNA demethylation. Beyond this, studies
have examined whether the modification is stable in the genome and acts as a
controlling factor in epigenetics. It has also been proposed that 5fC is
integrated into DNA to a greater extent than 5hmC levels alone would suggest.
This was tested experimentally by monitoring the developmental dynamics of 5fC
and 5hmC levels in mice, with results suggesting that 5fC has functional roles
in DNA beyond being a demethylation intermediate [11]. The study quantified
these rare modifications (5fC and 5caC) with the highest possible sensitivity
and accuracy using nano high-performance liquid chromatography–tandem
high-resolution mass spectrometry (Nano HPLC-MS/HRMS). This method can resolve
genuine rare modified bases from potential impurities of the same nominal mass
and retention time, and can detect down to 0.1 ppm of total Cs in as little as
100 ng of digested genomic DNA. Combined with stable isotope labeling in vivo,
it substantially improved the quality of the measurements, ensuring excellent
reproducibility between technical replicates and excluding spontaneous
oxidation of 5hmC as an additional source of 5fC or 5caC.
The experiments first analyzed global levels of all cytosine modifications in
genomic DNA from tissues of C57BL/6 mice (an inbred laboratory strain) to
establish the relationship between 5fC and its precursors 5mC and 5hmC or its
oxidation product 5caC. A range of postnatal tissues from newborn (1-d-old),
adolescent (21-d-old) and adult (15-week-old) mice was studied, along with
embryos at 11.5 d post-fertilization and mES cells for comparison [11].
5mC and 5hmC were present in all tissues. 5fC was also present in all tissues
studied, at levels ranging between 0.2 ppm and 15 ppm of all Cs, whereas 5caC
was not detected in any postnatal tissue. Overall, no link was found between
the levels of 5fC and those of its precursors 5mC or 5hmC. Instead, the pattern
varied between tissues: a tissue can retain its 5fC level while gaining 5hmC
(e.g. brain), lose 5fC while retaining its 5hmC level (e.g. heart), or even lose
5fC while gaining 5hmC [11].
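To put the ppm figures in perspective, the stated detection limit of 0.1 ppm of total Cs in 100 ng of genomic DNA can be converted into an absolute amount. The sketch below rests on two approximations introduced here, not taken from the study: an average mass of about 330 g/mol per DNA nucleotide, and cytosine making up about 21% of all bases. The function names are hypothetical.

```python
# Back-of-the-envelope conversion of "ppm of total Cs" into absolute amounts.
# Assumptions (ours, not from the study): ~330 g/mol per DNA nucleotide and
# cytosine making up ~21% of all bases.

AVG_NUCLEOTIDE_MASS_G_PER_MOL = 330.0
CYTOSINE_FRACTION = 0.21
AVOGADRO = 6.02214076e23


def ppm_to_percent(ppm: float) -> float:
    """1 ppm = 1e-6 = 0.0001 %."""
    return ppm * 1e-4


def modified_base_amount(dna_ng: float, ppm_of_c: float) -> tuple[float, float]:
    """Return (attomoles, molecules) of a modified C present at `ppm_of_c` in `dna_ng` of DNA."""
    nucleotide_mol = dna_ng * 1e-9 / AVG_NUCLEOTIDE_MASS_G_PER_MOL
    cytosine_mol = nucleotide_mol * CYTOSINE_FRACTION
    modified_mol = cytosine_mol * ppm_of_c * 1e-6
    return modified_mol * 1e18, modified_mol * AVOGADRO


print(ppm_to_percent(0.2), ppm_to_percent(15))     # 5fC range expressed as % of all Cs
amol, molecules = modified_base_amount(100, 0.1)  # detection-limit case
print(f"{amol:.1f} amol, about {molecules:.2e} molecules")
```

Under these assumptions, the detection limit corresponds to roughly 6 attomoles, a few million molecules. By the same conversion, 5hmC at 0.1% of all Cs is 1,000 ppm, far above the 0.2 to 15 ppm range reported for 5fC.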
To assess the stability of 5fC in genomic DNA against turnover in vivo,
isotope-labeled oxidation products were measured as labeling ratios, which
change according to the dynamics and half-life of each modification in genomic
DNA [11]. The absence of labeled 5fC in the brain, where 5fC is most abundant,
indicates minimal or no further generation of 5fC once it has been placed in
postmitotic tissues. Moreover, if 5fC were involved in continuous cycles of
methylation and demethylation, its labeling ratio would be similar to that of
5mC in RNA. In summary, although the production of 5fC depends on Tet3, its
removal appears to be independent of TDG. The loss of 5fC could instead be
attributed either to its further oxidation into 5caC or to deformylation and
decarboxylation [2].
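The reasoning behind labeling ratios can be illustrated with a generic first-order turnover model. This model is a simplifying assumption introduced here, not the analysis used in the cited study. If labeled precursor is available from time zero and a modification pool in non-dividing tissue is removed and regenerated with half-life t½, the labeled fraction after time t is 1 - exp(-k t), with k = ln 2 / t½. A stable mark therefore stays almost unlabeled, while a mark caught in rapid cycles of generation and removal becomes labeled quickly. The half-lives in the example are arbitrary illustrative values.

```python
import math

# Generic first-order turnover model (a simplifying assumption, not the cited study's analysis).
# Labeled precursor is available from t = 0; the modification pool in non-dividing tissue
# is removed and regenerated with rate k = ln(2) / half_life.

def labeled_fraction(t_days: float, half_life_days: float) -> float:
    """Fraction of the modification pool carrying the label after t_days."""
    k = math.log(2) / half_life_days
    return 1.0 - math.exp(-k * t_days)


# Arbitrary illustrative half-lives: a rapidly cycling intermediate vs a stable mark.
for label, half_life in [("fast-cycling", 1.0), ("stable", 1000.0)]:
    print(label, round(labeled_fraction(14, half_life), 3))
```

After one half-life the labeled fraction is exactly one half, which provides a convenient sanity check on the model.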
Further study of 5fC revealed that it can create a mutational hotspot at CpG
dinucleotides: it can induce G·C to A·T transition mutations during in vitro
DNA replication as a result of spontaneous hydrolytic deamination. The X-ray
crystal structure of DNA containing 5fC shows that 5fC alters the helical
coiling and trajectory of canonical B-form DNA [37]. 5fC can also arrest RNA
polymerase II transcription elongation, thereby reducing the number of
transcripts produced [38]. Base pairs of guanine with imino tautomers of 5fC
display a G·T mismatch geometry [3].
Beyond providing the most efficient route for removing 5hmC and salvaging C,
the trace bases 5fC and 5caC take part in the interplay of DNA methylation,
which suggests that they may well be functional rather than mere
intermediates. It is tempting to speculate on their specific function as
determinants of lineage specification in embryonic brain development. This
research topic has emerged only recently, and with more experimental evidence,
5fC may prove to be a new, stable nucleotide in the DNA sequence.
Recently, unusual modifications have also been observed in the thymine and
adenine bases of the genetic code. With further study, a seventh, an eighth or
even more stable DNA bases may yet be identified.
Very recently, it has been shown that Tet proteins can also oxidize thymine to
5-hydroxymethyluracil (5hmU, Figure 1). Tet-dependent 5hmU resembles 5caC in
abundance, in its distribution across tissues and ages, and in its ability to
recruit proteins. On the basis of these similarities, it has been speculated
that 5hmU might also contribute to epigenetic regulation. Apart from this new
role, 5hmU base-paired with adenine is a natural target of the Smug1 DNA
glycosylase and undergoes base excision repair; in the absence of Smug1, it may
instead promote active demethylation by recruiting repair factors to Tet
targets.
Most recently, N6-methyladenine was described as an additional eukaryotic DNA
modification with epigenetic regulatory potential. Interestingly, it is present
in genomes that lack canonical cytosine methylation patterns (probably the
third type of methylome described above), suggesting independent functions.
This newfound diversity of DNA modifications and their potential for
combinatorial interactions is yet to be fully studied and understood [12].
Specific readers for 5hmC, 5fC and 5caC that function in transcriptional
regulation and chromatin remodeling have been identified; they mostly promote
the active state. In addition, 5fC, 5caC and 5hmU might function primarily in
recruiting DNA repair-associated complexes and thus enhance demethylation.
Finally, these marks might also contribute directly to gene regulation by
triggering “scheduled” DNA repair, which has been suggested to be coupled with
activated transcription. The recent discovery of N6-methyladenine (6mA) in
eukaryotes has added a further methylation mark [12].
One of the biggest open questions is why vertebrates lack 5mC-specific
glycosylases similar to Dme and Ros1, which are found in plants and can excise
5mC directly and efficiently. One possible explanation is that the oxidation
products of 5mC create additional epigenetic codes, which in turn allow for a
diverse layer of regulation. However, the generation of 5fC, 5caC or a
single-stranded gap during excision repair could expose cells to deleterious
effects unless these processes are completed perfectly and with high fidelity.
In this respect, acquiring these epigenetic codes appears to carry some risk
for vertebrates [3].
Cytosine constantly undergoes cycles of methylation and demethylation, which
play a role in the regulation of gene expression. Passive demethylation occurs
mainly through a DNA replication-dependent process, whereas active
demethylation is achieved by several players in a replication-independent
manner [8].
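The replication dependence of passive demethylation can be illustrated numerically. If maintenance methylation fails after replication, each round halves the fraction of methylated strands at a site, so after n divisions the methylated fraction is (1/2)^n of the starting level; active demethylation needs no replication at all. The sketch below assumes no de novo methylation and no active demethylation. `maintenance_efficiency` is a hypothetical parameter introduced here to show partial maintenance: the per-division factor becomes (1 + e) / 2.

```python
# Simplified model of passive (replication-dependent) demethylation at one CpG site.
# Assumptions (ours): no de novo methylation, no active demethylation; after each
# replication, the new strand is methylated opposite a methylated parental strand
# with probability `maintenance_efficiency` (hypothetical parameter).

def methylated_fraction(initial: float, divisions: int, maintenance_efficiency: float = 0.0) -> float:
    """Fraction of strands methylated at the site after `divisions` replications."""
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    per_division = (1.0 + maintenance_efficiency) / 2.0
    return initial * per_division ** divisions


print(methylated_fraction(1.0, 3))                              # 0.125: halves each division
print(methylated_fraction(1.0, 3, maintenance_efficiency=1.0))  # 1.0: perfect Dnmt1-like copying
```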
The chemical modification of DNA bases plays a key role in epigenetic gene
regulation [12].
In particular, given the
highly enriched level of 5-hmC in brain relative to many other tissues and cell
types (for example, in Purkinje cells of the cerebellum, 5-hmC is approximately
40% as abundant as 5-mC), here we highlight the potential functional roles of
this cytosine modification, and others, in brain development.
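The Purkinje-cell figure can also be restated as shares of the combined modified-cytosine pool. If 5hmC is about 40% as abundant as 5mC, then 5hmC makes up 0.4 / 1.4, roughly 29%, of all 5mC plus 5hmC. The short helper below, whose name is introduced here for illustration, simply encodes that ratio arithmetic.

```python
def share_of_modified_pool(ratio_hmc_to_mc: float) -> tuple[float, float]:
    """Given the 5hmC/5mC ratio, return (5hmC share, 5mC share) of the 5mC + 5hmC pool."""
    hmc_share = ratio_hmc_to_mc / (1.0 + ratio_hmc_to_mc)
    return hmc_share, 1.0 - hmc_share


hmc, mc = share_of_modified_pool(0.40)  # Purkinje cells: 5hmC is about 40% as abundant as 5mC
print(f"5hmC {hmc:.1%}, 5mC {mc:.1%}")  # 5hmC 28.6%, 5mC 71.4%
```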