Long-Read Epigenetics with Microbiome Standards

Is m6A the new 5-mC?

Of the several chemical and structural modifications to DNA that are known to influence gene expression, 5-methylcytosine (5-mC) has garnered the most attention because of its role in transcriptional silencing and the readily available techniques for investigating it. For instance, bisulfite sequencing via short-read Next Generation Sequencing (NGS) has become the gold standard for 5-mC detection due to its single-nucleotide resolution and the ever-decreasing cost of whole-genome sequencing. Pairing bisulfite conversion with NGS has greatly expanded our understanding of how extensively cytosine methylation influences gene expression.

However, the application of NGS technologies to the genomes of complex microbial communities (e.g., microbiome and metagenomic samples) has drawn more attention to other DNA modifications. For example, N6-methyladenine (m6A) plays important roles in bacterial survival and interactions with hosts [1]. Like other epigenetic base modifications, m6A contributes to the regulation of gene expression as well as other housekeeping functions in bacteria. Unfortunately, methylation detection by short-read bisulfite sequencing is based on cytosine-to-uracil conversion and cannot detect adenine modifications such as m6A.
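To make the limitation concrete, here is a minimal Python sketch of what a bisulfite read looks like. It is a toy model, not a real pipeline: the function name `bisulfite_read`, the example sequence, and the methylated positions are all hypothetical, and the model simply assumes that every unmethylated cytosine is converted and read as thymine while every other base is read unchanged.

```python
def bisulfite_read(seq, methylated):
    """Return the sequence as read after bisulfite treatment and PCR.

    seq        -- DNA string (A, C, G, T)
    methylated -- set of 0-based positions that carry a methyl group
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append("T")   # unmethylated C -> U, which is read as T
        else:
            out.append(base)  # methylated C is protected; A, G, T are untouched
    return "".join(out)


seq = "GATCCGATC"                          # hypothetical example sequence
unmodified = bisulfite_read(seq, set())    # no methylation at all
with_5mc = bisulfite_read(seq, {3})        # methyl group on the C at position 3
with_m6a = bisulfite_read(seq, {1})        # methyl group on the A at position 1

print(unmodified)
print(with_5mc)
print(with_m6a)
```

In this toy model the read with a methylated cytosine differs from the unmodified read, while the read with a methylated adenine is identical to it. In other words, the adenine mark leaves no trace in bisulfite data.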

This is why many researchers are turning to long-read sequencing platforms (3rd generation sequencing technology), which can detect other types of base modifications directly, for example from differences in the electrical signal as bases pass through a nanopore. Like any new technology, methylation detection with 3rd generation sequencing needs benchmarking against well-defined standards and controls.

Not one, not two, but four sequencing platforms!

While one sequencing platform is sufficient for most studies, some researchers incorporate two sequencing methods when verifying a new bioinformatics tool. To validate a new N6-methyladenine (m6A) detection tool named mCaller, however, McIntyre et al. sequenced the ZymoBIOMICS Microbial Community Standard using PacBio, Oxford Nanopore, MeDIP-seq, and whole-genome bisulfite sequencing [1]. Single-molecule sequencing techniques such as PacBio [2] and Oxford Nanopore Technologies (ONT) [3] have been used to detect m6A, but no method had been cross-validated against a well-characterized reference material on several sequencing platforms – until now.

Older m6A detection methods based on immunoprecipitation were limited in nucleotide resolution, while previous single-molecule sequencing detection tools suffered from lower base modification calling accuracy (~70%) [3, 4]. To improve accuracy, mCaller employs a neural network to learn and test different classifiers to detect m6A in Nanopore data generated from the E. coli MG1655 (a K-12 strain) genome.

To validate the tool, the bacterial genomes of the ZymoBIOMICS Microbial Community Standard were sequenced via ONT, PacBio, and MeDIP-seq, and mCaller's detection accuracy was compared across the different data sets. Remarkably, detection accuracy increased to 84.2% for high-quality reads, and to 95.4% for single sites with at least 15x coverage, when compared to immunoprecipitation methods. Additionally, the methylome of the standard was verified using the TruSeq DNA Methylation Kit.
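The jump in accuracy at well-covered sites makes intuitive sense: many noisy per-read calls at the same position can be pooled into one more reliable per-site call. The Python sketch below illustrates that idea only. It is not mCaller's actual algorithm or interface; the function `call_sites`, the majority-vote rule, the 0.5 fraction cutoff, and the toy data are all assumptions made for illustration, with the 15x minimum coverage borrowed from the figure quoted above.

```python
from collections import defaultdict


def call_sites(read_calls, min_coverage=15, min_fraction=0.5):
    """Pool per-read methylation calls into per-site calls.

    read_calls   -- iterable of (position, is_methylated) pairs, one per read
    min_coverage -- sites covered by fewer reads are left uncalled
    min_fraction -- share of methylated reads needed to call a site methylated
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, total]
    for pos, is_methylated in read_calls:
        counts[pos][1] += 1
        if is_methylated:
            counts[pos][0] += 1

    sites = {}
    for pos, (meth, total) in counts.items():
        if total >= min_coverage:  # skip poorly covered sites
            sites[pos] = (meth / total) >= min_fraction
    return sites


# Hypothetical data: position 100 has 18 reads, position 250 has only 3.
read_calls = [(100, True)] * 14 + [(100, False)] * 4 + [(250, True)] * 3
sites = call_sites(read_calls)
print(sites)
```

Here position 100 is called methylated because 14 of its 18 reads agree, while position 250 is left uncalled because three reads fall below the coverage threshold.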

Regulatory-Grade Genomes

As an additional control, PacBio sequencing of the ZymoBIOMICS Microbial Community Standard was performed at two separate sites: the University of Florida and the Database for Reference Grade Microbial Sequences (FDA-ARGOS). The goal of FDA-ARGOS is to create a publicly available database of reference-grade microbial sequences. Because the genomes sequenced from the ZymoBIOMICS Microbial Community Standard met the FDA-ARGOS quality requirements, they have been accepted as regulatory-grade genomes with the designations FDAARGOS_606 and FDAARGOS_612 under the BioProject number PRJNA477598.

“Overall, our results demonstrate a need for tool evaluation at a variety of sequence contexts, for which we propose the continued use of this well-validated microbial reference community.” – McIntyre, et al.

Teamwork makes the dream work

Although the ZymoBIOMICS Microbial Community Standard was developed as a microbiome reference material, its well-defined and well-characterized composition has made it an excellent control for epigenetic sequencing. Such cross-discipline use of reference materials and sequencing techniques demonstrates the power of inter-lab cooperation and the ability of leaders in different fields to push the capabilities of current technology.


[1] McIntyre ABR, Alexander N, Grigorev K, Bezdan D, Sichtig H, Chiu CY, Mason CE. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nature Communications. 2019; 10: 579.
[2] Fang G, Munera D, Friedman DI, Mandlik A, Chao MC, Banerjee O, Feng Z, Losic B, Mahajan MC, Jabado OJ, Deikus G, Clark TA, Luong K, Murray IA, Davis BM, Keren-Paz A, Chess A, Roberts RJ, Korlach J, Turner SW, Kumar V, Waldor MK, Schadt EE. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nature Biotechnology. 2012; 30(12).
[3] Stoiber M, Quick J, Egan R, Lee JE, Celniker S, Neely RK, Loman N, Pennacchio LA, Brown J. De novo Identification of DNA Modifications by Genome-Guided Nanopore Signal Processing. bioRxiv. 2017. doi: https://doi.org/10.1101/094672
[4] Rand AC, Jain M, Eizenga JM, Musselman-Brown A, Olsen HE, Akeson M, Paten B. Mapping DNA methylation with high-throughput nanopore sequencing. Nature Methods. 2017; 14(4).