Next-generation sequencing (NGS) enables the assessment of variants in numerous genes in a single assay. However, in low-frequency variant analysis (allele frequency below 1%), true variants are difficult to distinguish from background noise introduced during sample preparation, target enrichment, and sequencing. By tagging both ends of each double-stranded DNA molecule and combining these tags with an error-correction algorithm, molecular identifiers reduce the error rate and improve accuracy in low-frequency mutation analysis.
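To illustrate the idea, here is a minimal sketch of how reads sharing a molecular identifier might be collapsed into single-strand and then duplex consensus sequences. It is not the vendor's algorithm. The function names, the `(molecule_id, strand, sequence)` input layout, and the 0.6 agreement threshold are all assumptions made for this example, and the sketch assumes reads are already aligned, of equal length, and oriented the same way for both strands.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_agreement=0.6):
    """Per-position majority vote; emit 'N' where agreement is below the threshold.

    min_agreement is an illustrative value, not taken from the source.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_agreement else "N")
    return "".join(out)


def duplex_consensus(reads):
    """Collapse reads into one duplex consensus per original molecule.

    reads: iterable of (molecule_id, strand, sequence) with strand in
    {"top", "bottom"} (hypothetical layout). Molecules seen on only one
    strand are dropped, because they cannot form a duplex.
    """
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for molecule_id, strand, seq in reads:
        families[molecule_id][strand].append(seq)

    result = {}
    for molecule_id, fam in families.items():
        if not fam["top"] or not fam["bottom"]:
            continue
        top = consensus(fam["top"])        # single-strand consensus, top
        bottom = consensus(fam["bottom"])  # single-strand consensus, bottom
        # Keep a base only where both strands agree; mask disagreements.
        result[molecule_id] = "".join(
            t if t == b else "N" for t, b in zip(top, bottom)
        )
    return result
```

An error present on only one strand (for example, damage introduced during sample preparation) is masked as `N` in the duplex consensus, whereas a true variant appears on both strands and survives.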
| Library Preparation | Blocking | Target capture | Sequencing |
| --- | --- | --- | --- |
|  |  | NanOnCT Panel; Custom panels | MGISEQ |
Variants at 1% (0.5%) and 0.3% (0.15%) were mimicked by spiking plasma cfDNA from one healthy donor into plasma cfDNA from a second, unrelated donor at proportions of 1% and 0.3%. Mixtures of 10 ng and 25 ng cfDNA were used for BMI library preparation. Targeted SNPs were enriched with in-house panels and sequenced on DNBSEQ-G400 (PE100) (Fig. 1). Bi-molecular identifiers (BMIs) allow error-correcting algorithms to filter out false-positive calls.
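The arithmetic behind the spike-in model can be sketched as follows. This assumes (the source does not state it) that the parenthesized frequencies correspond to sites where the spiked-in donor is heterozygous and the background donor carries only the reference allele. The conversion of about 3.3 pg of DNA per haploid human genome is a commonly used approximation, and both function names are hypothetical.

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (assumption)


def expected_allele_fraction(spike_fraction, donor_alt_alleles):
    """Expected variant allele fraction at a site where the background
    donor is homozygous reference.

    donor_alt_alleles: 2 if the spiked-in donor is homozygous for the
    variant, 1 if heterozygous.
    """
    return spike_fraction * donor_alt_alleles / 2


def expected_variant_copies(input_ng, allele_fraction):
    """Rough number of variant-carrying haploid genome copies in the input."""
    genome_copies = input_ng * 1000 / PG_PER_HAPLOID_GENOME
    return genome_copies * allele_fraction


# A 1% spike gives 1% at homozygous sites and 0.5% at heterozygous sites;
# a 0.3% spike gives 0.3% and 0.15% respectively.
for spike in (0.01, 0.003):
    for alt in (2, 1):
        af = expected_allele_fraction(spike, alt)
        print(f"spike={spike:.3f} alt_alleles={alt} AF={af:.4f}")
```

Under these assumptions, 10 ng of cfDNA holds roughly 3,000 haploid genome copies, so a 0.15% variant is represented by only a handful of molecules, which is why input amount matters at the lowest frequencies.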
Fig 1. Low-frequency variant model.
Table 1. Consensus read filtering settings and requirements

| Algorithm | Description |
| --- | --- |
| No BMI | Without bi-molecular identifier |
| SSCS | Single strand consensus sequence |
| DCS211 | Duplex consensus sequence with both top and bottom reads ≥ 1 |
| DCS633 | Duplex consensus sequence with both top and bottom reads ≥ 3 |
| DCS211 (≥ 2) | ≥ 2 DCS211 reads supporting the variant |
| DCS633 (≥ 2) | ≥ 2 DCS633 reads supporting the variant |
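The duplex settings in Table 1 can be expressed as a pair of thresholds: a minimum read count required on each strand, and a minimum number of qualifying duplex reads supporting a variant. The sketch below encodes them that way. The class, dictionary, and function names are hypothetical, and the per-strand interpretation follows the table descriptions rather than any published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplexFilter:
    min_strand_reads: int    # minimum reads required on the top AND the bottom strand
    min_supporting: int = 1  # minimum qualifying duplex reads supporting the variant


# Settings transcribed from Table 1 (duplex rows only).
FILTERS = {
    "DCS211": DuplexFilter(min_strand_reads=1),
    "DCS633": DuplexFilter(min_strand_reads=3),
    "DCS211 (>= 2)": DuplexFilter(min_strand_reads=1, min_supporting=2),
    "DCS633 (>= 2)": DuplexFilter(min_strand_reads=3, min_supporting=2),
}


def variant_passes(families, flt):
    """Decide whether a variant passes a duplex filter.

    families: list of (top_reads, bottom_reads) tuples, one per duplex
    family that supports the variant.
    """
    qualifying = sum(
        1
        for top, bottom in families
        if top >= flt.min_strand_reads and bottom >= flt.min_strand_reads
    )
    return qualifying >= flt.min_supporting


# Example: a variant supported by two duplex families of different sizes.
supporting = [(1, 2), (4, 3)]
for name, flt in FILTERS.items():
    print(name, variant_passes(supporting, flt))
```

Stricter settings trade sensitivity for fewer false positives: in the example, the variant passes every setting except DCS633 (≥ 2), because only one of its families has at least three reads on each strand.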
Fig 2. A. Sensitivity and B. positive predictive value for 1% and 0.5%; C. sensitivity and D. positive predictive value for 0.3% and 0.15%.
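For reference, sensitivity and positive predictive value follow the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). A minimal sketch of computing them from a set of called variants and a truth set is shown below. The function names and the variant identifiers in the example are hypothetical, not data from this study.

```python
def confusion_counts(called, truth):
    """Return (TP, FP, FN) for called variants against a truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true variants that were called
    fp = len(called - truth)  # calls that are not true variants
    fn = len(truth - called)  # true variants that were missed
    return tp, fp, fn


def sensitivity(tp, fn):
    """Fraction of true variants detected; None if there are no true variants."""
    return tp / (tp + fn) if (tp + fn) else None


def ppv(tp, fp):
    """Fraction of calls that are true variants; None if nothing was called."""
    return tp / (tp + fp) if (tp + fp) else None


# Illustrative identifiers only.
truth = {"snp_a", "snp_b", "snp_c", "snp_d"}
called = {"snp_a", "snp_b", "snp_c", "snp_x"}
tp, fp, fn = confusion_counts(called, truth)
print(sensitivity(tp, fn), ppv(tp, fp))
```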
Table 2. Analysis of low-frequency mutations by incorporating NanOnCT Panel v1.0 and molecular identifiers

| Variants | SSCS (MGISEQ-G400) | DCS211 (MGISEQ-G400) |
| --- | --- | --- |
| EGFR_L858R | 0.75% | 0.82% |
| EGFR_T790M | 1.29% | 1.38% |
| EGFR_delE746_A750 | 0.93% | 1.04% |
| PIK3CA_E545K | 0.92% | 0.8% |
| KRAS_G12D | 0.57% | 0.71% |
| KRAS_A146T | 1.01% | 0.67% |
| NRAS_Q61K | 1.31% | 0.89% |
| EGFR_insV769_D770 | 1.37% | 1.3% |
The DNA libraries were prepared from a 1% allele frequency (AF) standard (GeneWell, GW-OCTM009) using the NadPrep cfDNA Library Preparation Kit. The enriched libraries were sequenced on DNBSEQ-G400 (PE100) to an average raw depth of ~30,000×.
SSCS: Single strand consensus sequence;
DCS211: Duplex consensus sequence.
Fig 3. Coverage of a cfDNA library with BMI. The DNA library was prepared from 10 ng plasma cfDNA using the NadPrep cfDNA library kit for MGI with BMI adapters. The enriched library was sequenced on DNBSEQ-G400 (PE100) to an average raw depth of ~30,000×.
SSCS (< 3): Single strand consensus sequence, family size < 3;
SSCS (≥ 3): Single strand consensus sequence, family size ≥ 3;
DCS211: Duplex consensus sequences.