Variations in low complexity regions of monkeypox genome impact virus transmissibility

In a recent study published on the bioRxiv* preprint server, researchers investigated whether variations in low complexity regions (LCRs), rather than single nucleotide polymorphisms (SNPs), were primarily responsible for the current clade IIb 2022 monkeypox virus (MPXV) outbreak.

Study: Changes in a new type of genomic accordion may open the palette to increased transmissibility of monkeypox. Image Credit: Studio MIA/Shutterstock


The ongoing MPXV outbreak is due to infection with MPXV subclade IIb. Unlike MPX cases caused by clade I and clade IIa MPXV, cases in the current outbreak have a largely favorable prognosis, despite considerably more efficient human-to-human transmission. MPXV evolved under host selective pressures and through losses of host-interacting genes. To date, SNPs have not provided a satisfactory genomic explanation for the increased transmissibility of MPXV.

About the study

The present study sought to determine whether LCR variations were primarily responsible for the MPXV genome alterations and the unexpected epidemiology of the 2022 MPXV outbreak.

Clade IIb lineage B.1 MPXV sequences were assembled de novo using a mapping method that involved shotgun metagenomics and short-read sequencing of ribonucleic acid (RNA) and deoxyribonucleic acid (DNA) extracted from vesicular lesion swabs of patients diagnosed with MPX between May 18 and July 14, 2022, in Spain.

LCRs were resolved based on reference genome mapping, and in silico analyses were performed. The results were applied to publicly available MPXV datasets (n=35) of single-molecule raw reads from the NCBI (National Center for Biotechnology Information) SRA (Sequence Read Archive). To determine the actual LCR sequences, a combination of different sequencing strategies was used.

The distribution of LCRs among the functional groups of orthologous poxvirus genes (OPGs) and the extent of diversity among the 21 identified LCRs were compared. For all LCRs in all samples, allele frequencies were characterized and compared.


A high-quality sequence (HQS) of the MPXV 353R genome, representing the current 2022 MPXV outbreak, was accurately determined based on LCR variations, with significant short tandem repeat (STR) variations in the LCRs. In the MPXV genome, LCR entropy was significantly higher than SNP entropy. In silico analyses indicated that the expression, translation, stability, or function of MPXV OPG153, OPG204, and OPG208 could be affected by a genomic evolutionary accordion involving rhythmic expansions and contractions of the genome.

A total of 48 MPXV genome sequences were determined with ≥10X read depth and 39,697,742 HQ reads for each swab. One contig, two contigs, three contigs, and one contig belonged to MPXV with 101%, 97%, and 97% coverage, respectively, of the MPXV-M5312_HM12_Rivers sequence. A total of 21 LCRs were identified, with the LCR pairs 10/11 and 1/4 each consisting of similar copies in reverse-complement orientation.

LCR3 contained a tandem repeat (TR) with the sequence ATAT[ACATTATAT]n, and the analysis indicated that n = 52. None of the publicly available MPXV orthopoxvirus genome sequences had such a long TR. Four MPXV genome sequences of clade IIb lineage B.1 from the current 2022 MPXV outbreak showed n = 54 to 62 LCR3 repeats, and the number of STRs differentiated these sequences from clade IIb lineage A genome sequences (12–42 STRs in the LCR3 region), indicating high genetic variability in LCR3.
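To make the repeat count n concrete, here is a minimal Python sketch that finds the longest uninterrupted run of a repeat unit in a sequence string. It is an illustration only, not the pipeline used in the study; the function name `longest_tandem_run` and the toy sequence are hypothetical, and real LCR resolution relies on read-level evidence rather than a simple string search.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest n such that `unit` occurs n times back to back."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # A lookahead tries every start position, so overlapping candidate
    # runs are all examined and the longest one is kept.
    pattern = re.compile(f"(?=((?:{re.escape(unit.upper())})+))")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(1)) // len(unit))
    return best


# Toy sequence (hypothetical): flanking bases around five copies of the unit.
toy = "GGCATAT" + "ACATTATAT" * 5 + "CCG"
print(longest_tandem_run(toy, "ACATTATAT"))  # prints 5
```

Applied to an assembled LCR3 region, a count of this kind corresponds to the n values reported for the different lineages, assuming the repeat is not interrupted by sequence errors.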

Similarly, STR differences specific to the MPXV subclade IIb lineages were detected in the LCR pair 1/4. The pair contained an STR with the sequence [AACTAACTTATGACTT]n, and the analysis indicated that n = 16. LCR3 appeared to have increased in length since viral spillover, whereas the LCR pair 1/4 appeared to have decreased in length, behaving like a genomic accordion over time.

The clade IIb lineage B.1 genome sequences MPXV_USA_2022_MA001 and 353R had 67 SNPs relative to the clade IIb lineage A reference isolate sequences. In addition, the 353R HQS contained two other pairs of SNPs in the left and right inverted terminal repeats (ITRs), resulting in a stop codon in the OPG015 gene. The MPXV_USA_2022_MA001 and 353R sequences also differed by two indels (insertions-deletions) located at positions 133,077 and 173,273, corresponding to differences in the LCR2 and LCR5 regions.

The 353R HQS differed in genome length by 1,338 base pairs (bp) and 1,342 bp from the MPXV-M5312_HM12_Rivers and MPXV_USA_2022_MA001 sequences, respectively. MPXV LCRs were non-randomly distributed, with significant purifying selection against the introduction of LCRs into conserved core sites. In the 353R HQS, LCRs 2, 5, 7, 10, 11, and 21 showed within-host genomic diversity, with entropy values between 0.2 and 1.7 and significantly greater diversity in LCRs than in SNPs. The average Euclidean distance between samples for LCRs ranged between 0.1 (LCR21) and 0.7 (LCR2), and the LCR differences were statistically significant.
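The two diversity measures mentioned above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's code: the article does not say which logarithm base the entropy uses (base 2 is assumed here), the function names are hypothetical, and the allele frequencies are made-up example values.

```python
import math


def shannon_entropy(frequencies):
    """Shannon entropy (in bits, an assumed unit) of allele frequencies."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    entropy = 0.0
    for freq in frequencies:
        if freq > 0:
            p = freq / total  # normalize so the proportions sum to 1
            entropy -= p * math.log2(p)
    return entropy


def euclidean_distance(sample_a, sample_b):
    """Euclidean distance between two allele-frequency vectors.

    Both vectors must list the same alleles in the same order.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same alleles")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sample_a, sample_b)))


# Hypothetical frequencies of three repeat-length alleles at one LCR.
sample_a = [0.7, 0.2, 0.1]
sample_b = [0.4, 0.4, 0.2]
print(shannon_entropy(sample_a))
print(euclidean_distance(sample_a, sample_b))
```

An LCR with a single allele in a sample has an entropy of zero, so higher values indicate a more mixed within-host population, while the distance summarizes how much the allele mix differs between two samples.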

The LCR pair 10/11 and LCR7 showed considerable within-host variation and prominent allelic differences between samples. LCRs 5, 6, and 7 were located in a defined central conserved site of the MPXV genome between genomic positions 130,000 and 138,000, and the site included OPG152, OPG153, and OPG154. LCR7 was located in the center of a functional open reading frame (ORF), while LCR3 and LCR21 were located in the promoter/start site, likely altering the ORF start site. The region between positions 170,000 and 180,000 included LCRs 2, 3, 19, 20, and 21 and was another site of functional impact.

Overall, the study results showed that most MPXV genomic variability occurs in LCRs. Therefore, research on MPXV phenotypic differences should focus on LCR variations rather than SNP variations.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Sharon D. Cole