Genomic distribution of npmA-carrying C. difficile isolates
We identified 103 genomes harboring the npmA gene from a combined total of 1,932,812 bacterial genomes in the AllTheBacteria dataset and our dataset of 33,253 non-overlapping C. difficile genomes (Supplementary Fig. 1). Following QC filtering, 71 of these genomes met the high-quality criteria (see Methods). These were predominantly C. difficile isolates (n = 69), as detailed in Supplementary Data 1, alongside a minority of E. faecium (n = 2). We did not identify any npmA-carrying Gram-negative isolates. To date, the sole previously reported npmA1 carrier is the E. coli plasmid pARS3 (AB261016.2), which was absent from our initial dataset. The npmA2 variant predominated, being present in 66 of the 69 C. difficile genomes, whereas npmA1, which differs from npmA2 by three nucleotides and a single amino acid, was found in only three genomes. The isolates came from diverse sources: humans (n = 42), pigs (n = 19), and the environment (n = 8), which underscores the cross-niche distribution of the gene. Geographically, npmA-carrying isolates were widely distributed, originating from six countries (Fig. 1a): the UK (n = 26), Germany (n = 25), USA (n = 13), Australia (n = 2), China (n = 2), and France (n = 1). The isolates were collected over a period of 20 years (2001-2021), with most (n = 57/69) obtained between 2008 and 2021.

a Global map of the npmA-carrying C. difficile isolates included. b Minimum spanning tree (MST) representing the genetic relationships among npmA-carrying C. difficile isolates based on core-genome multilocus sequence typing (cgMLST). Tips are collapsed when there are no allelic differences between isolates and are colored according to the sequence type (ST), as shown in the panel legend; their sizes correspond to the number of isolates. Numbers on the connecting lines indicate the allelic differences between tips. c Bar graphs showing the distribution of isolates according to their source (human, livestock, or environmental), geographic origin, and the year of isolation, categorized by ST. The first two bar graphs are oriented horizontally, with the number of isolates on the x-axis and the STs/countries on the y-axis. The third bar graph is oriented vertically, with the collection year on the x-axis and the number of isolates on the y-axis. Source data are provided as a Source Data file.
MLST and cgMLST analyses of the npmA-carrying C. difficile genomes uncovered limited genetic diversity, with a few predominant clusters among the 69 genomes analyzed (Fig. 1b). ST11 was the most common ST, accounting for 69.6% (n = 48) of the isolates. We further demonstrated that the npmA2 gene significantly increases resistance to gentamicin (>256 mg/L) in ST11 strains (SAMEA5846557 and SAMEA5846561), relative to the npmA-negative lab strain 630Δerm (Supplementary Table 1A). Its broad geographical and ecological presence makes ST11 a pivotal carrier of npmA across niches, with the earliest npmA2-carrying ST11 isolate dating back to 2001 from a pig sample (Fig. 1c). Conversely, ST161 (n = 8) and ST36 (n = 4) exhibited a more localized distribution, as all these isolates were collected from pigs in Germany within a narrow time frame (Fig. 1c). ST54 (n = 8) showed a wider geographic distribution but was restricted to humans, predominantly in the USA (n = 6), with single isolates from China (n = 1) and Australia (n = 1). The first C. difficile npmA-carrying ST54 isolate was identified in 2016 (Fig. 1c). A single ST109 isolate was also detected, from an Australian pig in 2008.
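The cgMLST comparison underlying the MST in Fig. 1b can be illustrated with a minimal Python sketch. It assumes each isolate is represented by a list of allele numbers (one per core locus), counts the loci at which two profiles differ, and builds a minimum spanning tree with Prim's algorithm. The isolate names and allele profiles are hypothetical, and the sketch is not the software used in this study (see Methods).

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different allele numbers; loci missing (None) in either profile are skipped."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def minimum_spanning_tree(profiles):
    """Prim's algorithm on pairwise allelic distances; returns a list of (u, v, distance) edges."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the shortest edge joining the current tree to an isolate outside it.
        for u in sorted(in_tree):
            for v in names:
                if v in in_tree:
                    continue
                d = allelic_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical five-locus profiles, for illustration only.
toy_profiles = {
    "iso_A": [1, 1, 1, 1, 1],
    "iso_B": [1, 1, 1, 1, 2],
    "iso_C": [1, 1, 1, 3, 2],
    "iso_D": [4, 4, 4, 3, 2],
}

for u, v, d in minimum_spanning_tree(toy_profiles):
    print(f"{u} -- {v}: {d} allelic difference(s)")
```

Isolates at distance zero would be collapsed into a single tip in a figure such as Fig. 1b; the sketch leaves that step out for brevity.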
To contextualize these findings, we queried the C. difficile EnteroBase database (30,415 genomes, accessed January 15, 2025). Ribotype (RT) 078, the dominant RT within ST11, was among the most prevalent lineages (7.85%, n = 2387), whereas ST36, ST54, and ST161 represented < 1% of the dataset (Supplementary Fig. 2). npmA was not detected in other well-represented lineages (e.g., RT027, RT014, RT001, RT002), underscoring the particular significance of ST11. Source analysis indicated that ST11 and ST54 isolates were derived from multiple distinct studies (n = 13 and n = 4, respectively), locations, and timepoints (Fig. 1c). In contrast, ST161 and ST36 originated from a single German study¹³, which likely represents a single farm population and suggests potential under-sampling of these lineages (Supplementary Data 1).
In-depth phylogenetic analysis of major C. difficile clusters
To determine the genomic context of npmA and its potential integration and mobilization events, we included contextual genomes lacking npmA that shared identical core-genome alleles with npmA-carrying isolates and built maximum likelihood phylogenies for each ST (see Methods). In ST11, our analysis of 48 npmA-carrying isolates and 520 related genomes revealed extensive dissemination of this lineage, especially in humans and pigs, across 19 countries, particularly the United Kingdom and Germany. The phylogenetic analysis of the ST11 cluster showed that npmA-carrying isolates were distributed across multiple clades, suggesting multiple independent npmA acquisition events within the ST11 lineage (Fig. 2a). Applying a 10-SNP threshold commonly used to infer recent transmission¹⁴, we identified clonal expansions within certain npmA-positive clades, indicating genetic relatedness and likely recent transmission events (Supplementary Data 1).
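The SNP-threshold approach can be sketched as single-linkage clustering on pairwise core-SNP distances: isolates are placed in the same cluster when they are linked, directly or through intermediates, by pairs within the threshold. The Python sketch below is a minimal illustration under stated assumptions. It treats the threshold as inclusive (at most 10 SNPs), which the text does not specify, and the isolate names and distances are hypothetical rather than taken from Supplementary Data 1.

```python
def snp_clusters(isolates, pairwise_snps, threshold=10):
    """Single-linkage clusters: isolates joined by any chain of pairs with <= threshold SNPs."""
    parent = {name: name for name in isolates}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in pairwise_snps.items():
        if distance <= threshold:  # assumption: threshold is inclusive
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical isolates and pairwise SNP distances, for illustration only.
toy_isolates = ["iso_1", "iso_2", "iso_3", "iso_4"]
toy_distances = {
    ("iso_1", "iso_2"): 3,
    ("iso_2", "iso_3"): 9,
    ("iso_1", "iso_3"): 12,
    ("iso_1", "iso_4"): 45,
    ("iso_2", "iso_4"): 47,
    ("iso_3", "iso_4"): 50,
}

print(snp_clusters(toy_isolates, toy_distances))
```

In this toy example iso_1 and iso_3 differ by 12 SNPs yet fall in the same cluster because both are within 10 SNPs of iso_2. This chaining is a known property of single-linkage clustering and is one reason such clusters are read as putative rather than confirmed transmission links.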

a Maximum likelihood phylogeny of 568 ST11 isolates (48 npmA-positive and 520 contextual genomes) based on non-recombinant core single-nucleotide polymorphisms (SNPs) across 1954 core-genome alleles. b Maximum likelihood phylogeny of 105 ST54 isolates (8 npmA-positive and 97 contextual genomes) based on non-recombinant core SNPs across 2096 core-genome alleles. Both trees are midpoint rooted. Metadata columns show isolation niche, country of origin, npmA variant and presence of mobile genetic elements Tn7734 and ICE Tn7740. Source data are provided in the Source Data file.
ST54 isolates, which were primarily sourced from human clinical specimens, came mainly from China (n = 41), the United Kingdom (n = 34), and the United States (n = 21). A distinct clade of npmA2-carrying isolates in the USA, separate from the sporadic npmA1- and npmA2-carrying isolates in Australia and China, indicates local circulation of npmA2 within a defined timeframe (Fig. 2b). In contrast, the ST161 and ST36 clusters exhibited a restricted ecogeographic distribution linked to pigs from Germany. The ST161 cluster was particularly restricted, with no additional related genomes identified. The ST36 cluster was slightly larger, with just two additional related genomes. Despite their geographical and ecological diversity, these ST36 genomes showed remarkable genetic similarity, differing by only 7-8 SNPs (Supplementary Fig. 3). Information regarding these STs is scarce, and the existing literature lacks evidence of their association with either pigs or humans.
Genomic characterization and integration dynamics of npmA2-carrying MGEs
We conducted an in-depth analysis of the genomic context of npmA2 to identify the MGEs carrying and mobilizing this gene (see Methods). We found an identical ~5 kb genomic region across all isolates, irrespective of ST, source, or geographic origin, flanked by two identical copies of a newly characterized IS30 family member, named ISCld1. This arrangement suggested mobilization through a putative composite transposon, which we termed Tn7734 (Fig. 3a). Most npmA2-carrying C. difficile isolates (64 out of 66) also shared a larger ~33 kb mobile element that contained Tn7734; it was annotated as a complete Integrative and Conjugative Element (ICE) by ICEscreen¹⁵ and designated ICE Tn7740. Intriguingly, ICE Tn7740 often co-occurred with the remnants of an Integrative and Mobilizable Element (IME) harboring aadE, which encodes an aminoglycoside nucleotidyltransferase, indicating a putative point of integration of ICE Tn7740 (Fig. 3b, c).

a Genetic context of npmA2 within a 5 kb upstream and downstream window. Blocks of the same color indicate homology, and the gray blocks represent areas of non-homology within the wider dataset. b Genetic context in the 30 kb flanking regions of npmA2. The top panel displays the inverse cumulative distribution function (cdf), which represents the decay of structural similarity with distance from npmA2, helping to identify common positions where non-homologous structural variation is introduced. The bottom panel illustrates the homologous blocks and their structural arrangement. Each block is colored according to homology, and the gray blocks indicate regions of non-homology. The bar plot on the left-hand side shows the number of contigs sharing each structure, colored by the source of the isolate. c Genetic structure and comparison of the ICE Tn7740 variants. d Schematic representation of the npmA2 integration sites, obtained by mapping the npmA2-carrying contigs to the reference C. difficile 630 (GenBank accession AM180355.1). Key integration sites include a helix-turn-helix transcriptional regulator gene (CD630_15730, coordinates: 1,821,726-1,822,940) and a distinct site in German ST11 isolates (CD630_19010, 2,203,327-2,203,836). Additional sites include a metallophosphoesterase gene (CD630_06890, 833,702-835,627) in an ST11 isolate, a helix-turn-helix transcriptional regulator gene (CD630_02920, 353,054-354,148) in the ST161 and ST54 clusters, and a unique site in the ST36 cluster within a fic family protein gene (CD630_06470, 769,216-769,530).
The structure of ICE Tn7740 varied slightly among isolates, with additional insertions of IS110, disruptions by the ltrA gene, or both, giving rise to diverse ICE Tn7740 variants (Fig. 3c). Components common to all ICE Tn7740 variants included a serine recombinase, a mobP family relaxase, and the conjugative system proteins virB4 and virD4, which allowed its classification within the Tn5252 superfamily and Tn1549 family. Adjacent to the ICE, the IME featured a mobV family relaxase and, notably, a serine recombinase that was often structurally compromised owing to its proximity to and interaction with ICE Tn7740. Using the complete ICE Tn7740 as a reference, we compared and annotated isolates with shorter contigs harboring incomplete ICE sequences, thereby assigning them to specific ICE Tn7740 variants.
The analysis revealed a consistent integration pattern of ICE Tn7740 variants and associated IMEs across various clusters, despite the diverse phylogenetic distribution of npmA2 (Fig. 2). Notably, ST11 isolates frequently showed integration into genes encoding a helix-turn-helix transcriptional regulator. A subset of German ST11 environmental isolates exhibited a distinct integration site, while a complex genetic interplay was observed with ICE Tn7740 integrated into another ICE within a metallophosphoesterase gene in one ST11 isolate. Similarly, ST161 and ST54 clusters shared an integration site within a helix-turn-helix transcriptional regulator gene, whereas the ST36 cluster presented a unique site within a fic family protein gene, highlighting the specificity of ICE integration patterns across different clusters (Fig. 3d). The analysis of the integration sites showed low information content across all positions (mean < 0.5 bits) and only a modest A/T bias at a few sites, with no discernible core motif (Supplementary Fig. 4).
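Per-position information content in bits can be illustrated with a short Python sketch. It assumes aligned, equal-length DNA sequences and the conventional definition of 2 bits minus the Shannon entropy of the observed base frequencies, with no small-sample correction. The exact procedure behind Supplementary Fig. 4 is not described here, so both the definition and the toy sequences are assumptions for illustration only.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return per-position information content (bits) for aligned DNA sequences.

    Uses IC = 2 - H, where H is the Shannon entropy of the A/C/G/T frequencies
    at that position. Characters other than A, C, G, T are ignored.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sequences must have the same length")

    bits = []
    for position in range(length):
        counts = Counter(site[position].upper() for site in aligned_sites)
        total = sum(counts[base] for base in "ACGT")
        if total == 0:
            bits.append(0.0)  # no informative bases at this position
            continue
        entropy = 0.0
        for base in "ACGT":
            if counts[base]:
                p = counts[base] / total
                entropy -= p * math.log2(p)
        bits.append(2.0 - entropy)
    return bits


# Hypothetical aligned integration-site sequences, for illustration only.
toy_sites = ["ATGC", "ATCA", "AAGT", "ATTG"]
per_position = information_content(toy_sites)
print([round(value, 2) for value in per_position])
print("mean bits:", round(sum(per_position) / len(per_position), 2))
```

A fully conserved position scores 2 bits and a position with equal base frequencies scores 0 bits, so a mean below 0.5 bits, as reported for the integration sites, corresponds to near-uniform base usage at most positions.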
Once the mosaic structure of the multiple embedded MGEs carrying npmA2 (i.e., Tn7734, the ICE Tn7740 and IME) was characterized, we inferred the order of the integration events that generated this structure (Fig. 4). In the C. difficile M120 chromosome (FN665653.1), we observed a helix-turn-helix transcriptional regulator protein (CDM120_RS08225, 1,726,966-1,728,177) which may have served as the original integration site of the IME carrying aadE (SAMN25516600). Subsequently, ICE Tn7740 seems to have been integrated into the serine recombinase of this IME, leading to the consistent co-localization of these two elements (Fig. 4a). The phylogenetic analysis of the ST11 cluster revealed npmA2-negative isolates that carried the same ICE variants and lacked Tn7734 and npmA2, yet retained a copy of ISCld1 (Fig. 2a). This pattern is consistent with the behavior of IS30 family members, where npmA2 could potentially be mobilized via a circular intermediate, leaving a remnant ISCld1 at the site where Tn7734 was initially located (Fig. 4b). Moreover, the discovery of isolates with the ICE Tn7740 but lacking Tn7734 suggests an ancestral state of the ICE prior to the integration of Tn7734 and npmA2, highlighting the inherent mobility of Tn7734 and its pivotal role in npmA2 mobilization (Fig. 4a).

a Genetic comparison of the ICE Tn7740 integration site using npmA2-carrying and npmA-negative ST11 contextual isolates. Labels on the left of the alignment indicate C. difficile BioSample IDs. b Schematic diagram of the potential of ISCld1 to retain a copy at the integration site through the generation of a putative circular intermediate.
Genomic characterization of npmA1-carrying MGEs
We also analyzed the genomic context of npmA1, the rarer variant in C. difficile (n = 3; two ST11 and one ST54). The same genetic context was found in all npmA1-carrying isolates (Supplementary Fig. 5a). The npmA1 gene was integrated within a putative complete ICE belonging to the Tn5252 superfamily and Tn1549 family, like ICE Tn7740. This putative ICE exhibited structural variations primarily due to the insertion of the ltrA gene, expanding its size to an estimated 40,000-45,000 bp (Supplementary Fig. 5b). Although it belongs to the same family as the npmA2-carrying ICE Tn7740, this ICE had a divergent structural composition beyond its similar conjugative machinery (Supplementary Fig. 6). A comparative analysis with the initially described genetic context of npmA1 in the pARS3 plasmid of E. coli (AB261016.2) revealed a distinct genomic environment, suggesting disparate origins of npmA1 in C. difficile and E. coli (Supplementary Fig. 6).
Cross-species distribution of npmA2 and associated MGEs
To investigate the capacity of npmA for cross-species transmission, we examined the genetic context of npmA2 in the two vancomycin-resistant, clonal E. faecium genomes identified among the 71 npmA-carrying genomes, both isolated from a single Dutch hospital¹⁶. The isolates were provided by the Microbiology Department at Onze Lieve Vrouwe Gasthuis, Amsterdam. These strains were derived from two patients who had undergone heart valve replacements. After resequencing both isolates using Oxford Nanopore technology, we found that npmA2 lay within the same Tn7734 and ICE Tn7740 structure observed in C. difficile isolates, with 99.9% identity (Fig. 5), demonstrating a common MGE responsible for the genetic mobilization of npmA2 across these two Gram-positive bacterial species. ICE Tn7740 was integrated into the chromosomal dinB gene (E6A31_14200, coordinates: 2,753,333-2,753,857; reference E. faecium strain SRR24, GenBank accession CP038996.1). Despite this shared genomic architecture, conjugation assays failed to detect transfer of ICE Tn7740 from C. difficile to either C. difficile or E. faecalis recipients. In contrast, control transfer of Tn6194 from C. difficile to E. faecalis occurred at a frequency similar to that reported in previous studies¹⁷. Similarly, intra-enterococcal mating experiments resulted in vanA-positive plasmid transfer at (1.82 ± 0.016) × 10⁻² per donor, but no npmA2-positive ICE Tn7740 transconjugants were recovered under the same conditions.

Comparison of the genetic environment surrounding npmA2 in C. difficile isolate SAMEA1710827 and E. faecium isolate SAMEA4885232, showing conserved synteny and shared mobile genetic elements.
To further characterize the clinical isolates recovered in the Netherlands, we determined the antimicrobial susceptibility profile of the two E. faecium isolates in vitro, summarized in Supplementary Table 1B, which revealed high minimum inhibitory concentrations (MICs) (≥ 1024 mg/L) for all the aminoglycosides tested. High levels of resistance to gentamicin, kanamycin and streptomycin can be explained in these E. faecium isolates by the presence of other aminoglycoside-modifying enzymes (Supplementary Tables 2 and 3). However, the high MIC values observed in the npmA2-carrying E. faecium isolates against apramycin, not explained by these enzymes, suggest an association between the presence of the npmA2 gene and aminoglycoside resistance (see Discussion).
In examining the distribution of the MGEs (Tn7734 and ICE Tn7740) in the combined dataset, which included both the Blackwell dataset and the dataset of 33,253 C. difficile genomes (Supplementary Fig. 1b), we observed distinct patterns. ISCld1, a component of Tn7734, was found exclusively in C. difficile ST11, either co-existing with ICE Tn7740 (n = 8) or in ST11 genomes lacking the ICE (n = 8). Conversely, ICE Tn7740 exhibited a broader presence, occurring in various C. difficile STs (detailed in Supplementary Table 4) and in genomes from the Lachnospiraceae family. We also queried a comprehensive dataset of 147,851 high-quality metagenome-assembled and isolate genomes from the Unified Human Gastrointestinal Genome (UHGG) catalog¹⁸ for the presence of ISCld1 and ICE Tn7740. While ISCld1 was found only in a Roseburia hominis genome, ICE Tn7740 was widely disseminated across multiple Lachnospiraceae family genomes (n = 14), including species such as Sellimonas intestinalis, Clostridium bolteae, and Dorea spp., and was even found in Ruminococcus bromii genomes, which belong to the Acutalibacteraceae family (Supplementary Table 5).
Stability of npmA2 in different hosts
To evaluate the intrinsic stability of npmA2 without selective pressure, we conducted serial passaging experiments in various host backgrounds. In its native context within ICE Tn7740, C. difficile ST11 (SAMEA5846557) retained full gentamicin resistance in 100% of the colonies checked across six consecutive daily passages (~200 generations) (Supplementary Fig. 8). Similarly, the chromosomally integrated ICE in E. faecium E7474 remained stable over ~200 generations, with no loss of apramycin resistance. To assess the stability of npmA2 with its native promoter in a different mobile element, the gene was cloned into the shuttle vector pTG262. In E. coli HST08 and MG1655, plasmid maintenance remained complete (100%) over ~60 generations. In E. faecalis JH2-2, plasmid stability was slightly more variable: one biological replicate showed a marked loss (~50%) on day 2, while the other two remained > 90% stable. This single event reduced the day-2 group average to 83 ± 17%, but plasmid carriage recovered by day 6 (94 ± 4%), consistent with transient instability rather than sustained plasmid loss (Supplementary Fig. 9). These results indicate that npmA2 imposes minimal or no detectable fitness cost in Gram-positive hosts and remains fully stable in standard E. coli strains under laboratory conditions.
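The effect of a single divergent replicate on a three-replicate average can be illustrated with a short Python sketch. The replicate values below (50%, 100%, 100%) are hypothetical: the text reports only that one replicate fell to ~50% and the other two stayed above 90%. The sketch prints both the sample standard deviation and the standard error of the mean, because the text does not state which dispersion measure follows the ± sign.

```python
import statistics


def summarize_replicates(percent_retained):
    """Return (mean, sample standard deviation, standard error) for replicate percentages."""
    mean = statistics.mean(percent_retained)
    sd = statistics.stdev(percent_retained)    # sample SD (n - 1 denominator)
    sem = sd / len(percent_retained) ** 0.5    # standard error of the mean
    return mean, sd, sem


# Hypothetical day-2 plasmid retention (%) in three biological replicates.
toy_day2 = [50.0, 100.0, 100.0]

mean, sd, sem = summarize_replicates(toy_day2)
print(f"mean = {mean:.1f}%, SD = {sd:.1f}, SEM = {sem:.1f}")
```

With these assumed values the mean is about 83%, the standard error about 17 and the sample standard deviation about 29. This shows how one replicate at ~50% can by itself account for a group average near the reported day-2 figure.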