Description of study site and outbreak
The outbreak of malaria began in Dire Dawa, Ethiopia, in late November 2021, continuing into the dry season of 2022, ultimately concluding around May [8, 20]. Dire Dawa is located 515 km east of Addis Ababa in central eastern Ethiopia [21]. Dire Dawa University is located in the northwest corner of Dire Dawa, with both institutional buildings, a health clinic, and dormitories on campus [8]. Mosquitoes were collected specifically at Dire Dawa University, where there seemed to be a concentration of cases [8].
Collection of Anopheles stephensi
Anopheles stephensi mosquitoes were collected in Dire Dawa, Ethiopia, between 19 and 31 March 2022, using a mouth aspirator. Adult mosquitoes were caught from inside offices or outdoors in manholes. Mosquitoes were then sorted and categorized as culicines and anophelines. The latter were identified according to a morphological key [22]. An. stephensi mosquitoes were sexed, and females stored individually in 1.5 mL tubes in a bag containing silica gel. Blood fed mosquitoes were recorded.
Anopheles stephensi species identification
All mosquitoes that were identified as recently blood fed were extracted (n = 5), and the rest of the mosquitoes were selected at random (n = 155). For the blood fed mosquitoes, DNA was extracted separately for the head/thorax and abdomens. Head and thoraces (n = 51), abdomens (n = 51), and whole mosquitoes (n = 58) were used to extract DNA using the DNeasy Blood and Tissue Kit or the DNA Micro Kit (Qiagen, Valencia, CA). Polymerase chain reactions (PCR) were conducted for each mosquito, targeting the An. stephensi specific nuclear internal transcribed spacer 2 (ITS2) locus and the universal mitochondrial cytochrome c oxidase subunit 1 (COI) [2, 23]. The final reagent components and concentrations for PCR were 1× Promega G2 HotStart Master Mix (Promega, Madison, WI), 0.5 mM for both primers, and 1 uL of DNA template. The endpoint assay targeted the ITS2 locus of An. stephensi [2]. The PCR temperature protocol was 95 °C for 1 min, 30 cycles of 95 °C for 30 s, 48 °C for 30 s, and 72 °C for 1 min; followed by 72 °C for 10 min. The PCR product was visualized via gel electrophoresis and a 522-base pair (bp) band was identified. Only An. stephensi samples would contain a band, so any samples that did not produce a band were not included in the sample set. The PCR temperature protocol consisted of 95 °C for 1 min, 30 cycles of 95 °C for 30 s, 48 °C for 30 s, and 72 °C for 1 min; followed by 72 °C for 10 min. COI PCR products were sequenced using Sanger technology by a commercial laboratory (Eurofins Genomics LLC, Psomagen).
Analysis of Plasmodium vivax Pv47 database sequences for geographically informative SNPs
An analysis of all available Pv47 sequences was performed using CodonCode aligner v9.01 software to identify any SNPs or amino acid substitutions. Sequences of Pv47 were first downloaded from the National Center for Biotechnology and Information (NCBI) GenBank, and previously generated Pv47 sequences from Ethiopia were included (Additional file 1). The sequences were combined into a single FASTA file (opened in CodonCode) and the sequences were assembled into a contig. Sequences were organized by continent and differences in the sequences were visualized. Sequences were then translated into amino acids and the reading frame was selected on the basis of the reference amino acid sequence in NCBI (XM_001614197.1). Differences in amino acids were then visualized.
Plasmodium falciparum Pfs47 and Plasmodium vivax Pv47 primer design
The Pfs47 gene from the reference P. falciparum genome (NCBI: taxid36329, strain Pf3D7) was used for the primers. The reference gene of Pfs47 from the PF3D7 African strain of P. falciparum was put into Primer3Plus, with a target product indicating the SNPs previously identified as differentiating the continental origin of P. falciparum at bp positions 707 and 725. Primers were tested using a positive control of P. falciparum DNA. Positive control amplicons of Pfs47 were sequenced to confirm the target region at 559 bp (Additional file 3: Table 2).
For the Pfs47 nested primers, the amplified sequence of the un-nested primers was determined using NCBI Primer Blast. This product was uploaded to Primer3Plus and the area previously identified with SNPs was selected as a target. Potential primers were tested against the nonredundant and RefSeq mRNA databases in NCBI Primer Blast, and primers that did not also target An. stephensi DNA were selected. These primers were tested against the positive control DNA and sequenced to confirm the correct target. This produced a product of 531 bp (Additional file 3: Table 2).
A similar protocol was used to design the Pv47 and Pv47 nested primers as described above. The Pv47 gene from the reference P. vivax genome (NCBI: taxid126793, XM_001614197.1) was used. The gene was put into NCBI Primer Blast, excluding An. stephensi DNA, with a target indicating where SNPs were identified to be in the described alignment. Potential primers were tested against the nonredundant and Ref Seq mRNA databases in NCBI Primer Blast and primers that did not also target An. stephensi DNA were selected. Primers were tested using a positive control of P. vivax DNA. Positive control amplicons were sequenced to confirm the target region at 581 bp (Additional file 3: Table 2).
The Pv47 nested primers were created by first determining the amplified sequence of the un-nested primers using NCBI Primer Blast. The product was uploaded to Primer3Plus and the area previously identified to contain the SNPs was selected as a target. Potential primers were tested against the nonredundant and Ref Seq mRNA databases in NCBI Primer Blast, and primers that did not also target An. stephensi DNA were selected. These primers were tested against the positive control and sequenced for target region confirmation (a product of 414 bp) (Additional file 3: Table 2).
Plasmodium falciparum Pfs47 targeted sequencing
Plasmodium falciparum Pfs47 was genotyped to detect parasite presence in An. stephensi samples. A positive control of positive P. falciparum human blood DNA extractions (provided by Dr. Eugenia Lo at Drexel University) served to verify successful amplification, alongside a negative control lacking genomic DNA to ensure no contamination was present. Un-nested PCR reactions were conducted initially to detect P. falciparum Pfs47 presence before running nested PCR reactions. The Pfs47 primers amplified a 559 bp un-nested fragment in P. falciparum (Additional file 3: Table 2). Un-nested protocol reagents and concentrations consisted of 1× Promega G2 HotStart Master Mix (Promega, Madison, Wisconsin, USA), 0.4 mM of primer, plus 4 uL of isolated DNA template. The cycling protocol was as follows: 95 °C for 1 min, 34 cycles of 95 °C for 1 min, 57 °C for 1 min, 72 °C for 1.5 min, and an extension step of 72 °C for 10 min. The nested protocol called for a second set of primers to selectively amplify a 531 bp fragment of P. falciparum Pfs47 (Additional file 3: Table 2). The nested reaction was performed with 1× Promega G2 HotStart Master Mix (Promega, Madison, Wisconsin, USA) 0.4 mM of primer, plus 2 uL of the PCR product from the initial un-nested reaction. The cycling protocol was as follows: 95 °C for 10 min, 34 cycles of 95 °C for 1 min, 58 °C for 1 min, 72 °C for 1 min, followed by 72 °C for 5 min.
Plasmodium vivax Pv47 targeted sequencing
PCR was conducted to amplify the Pv47 gene for the detection of P. vivax in mosquito samples. A positive control of positive P. vivax human blood DNA extractions (provided by Eugenia Lo at the University of North Carolina at Charlotte) was included, alongside a negative control lacking genomic DNA to ensure no contamination was present. An un-nested protocol was used to detect the presence of P. vivax Pv47, followed by a nested protocol.
Pv47 primers amplified a 581 bp un-nested fragment in Plasmodium vivax (Additional file 3: Table 2). The un-nested protocol reagents and concentrations consisted of 1× Promega G2 HotStart Master Mix (Promega, Madison, Wisconsin, USA), 0.4 mM of primer, plus 4 uL of isolated DNA template. The cycling protocol was as follows: 95 °C for 1 min, 34 cycles of 95 °C for 1 min, 58 °C for 1 min, 72 °C for 1.5 min, and an extension step of 72 °C for 10 min. The nested protocol called for a second set of primers to selectively amplify a 414 bp fragment of P. vivax Pv47 (Additional file 3: Table 2). The nested reaction was performed with 1× Promega G2 HotStart Master Mix (Promega, Madison, Wisconsin, USA), 0.4 mM of primer, plus 2 uL of the PCR product from the initial un-nested reaction. The cycling protocol was as follows: 95 °C for 10 min, 34 cycles of 95 °C for 1 min, 60 °C for 1 min, 72 °C for 1 min, followed by 72 °C for 5 min. All PCR products were run on a 2% agarose gel and visualized. Positive samples were sequenced via Sanger Sequencing at a commercial laboratory (Eurofins Genomics LLC, Psomagen).
Anopheles stephensi P47Rec targeted sequencing
PCR was conducted to amplify the P47 receptor in An. stephensi samples. Overall, 20 mosquitoes were randomly selected from Kebridehar, Ethiopia (2018), 20 from Semera, Ethiopia (2018), 20 from Dire Dawa, Ethiopia (2018), 13 from Lawyacado, Somalia (2021), and 20 more from Dire Dawa (2022). P47Rec primers amplified a 688 bp fragment including the second and third exon of the protein (F: 5′-TGGCAAATGACTAACGTGGA-3′, R: 5′-GTGTTGCCAGTTCGCTGTAA-3′). The cycling protocol was as follows: 95 °C for 1 min, 34 cycles of 95 °C for 1 min, 58 °C for 1 min, 72 °C for 1.5 min, and an extension step of 72 °C for 10 min. All PCR products were run on a 2% agarose gel and visualized. All samples were sequenced via Sanger Sequencing at a commercial laboratory (Psomagen).
Pfs47 and Pv47 sequence analysis
The haplotypes for the sequences that indicated multiple nucleotides read at a single position were phased out using DNAsp v 5.10.01. The FASTA file containing cleaned sequences for both genes were uploaded into DNAsp and phased using Markov chain Monte Carlo (MCMC) standard options (100 iterations, 1 thinning interval, 100 burn-in iterations). A new FASTA file was exported and each sample that had multiple haplotypes was denoted as haplotype 1 or 2 after the sample name. Sequences were also translated using CodonCode v9.0.1 and the reading frame was selected to match the protein coding of Pfs47 and Pv47 on NCBI (PV077639-PV077662).
COI phylogenetic analysis
Phylogenetic analysis was first performed on the COI sequences of Pfs47 or Pv47 positive samples with an outgroup of An. maculatus (KT382822). Phylogenetic analyses were estimated using a maximum likelihood approach with RaxML [24]. The GTR GAMMA option that uses the general time-reversible model of nucleotide substitution with the gamma model rate of heterogeneity was used. In total, 1000 runs were completed with rapid bootstrap analysis. The RAxML output was viewed in FigTree with a root at the outgroup [25].
Isolation of Pfs47, Pv47, and P47Rec sequences from whole genome data
The extraction of Pfs47, Pv47, and P47Rec sequences followed a comprehensive bioinformatics pipeline. First, the raw FASTQ files were obtained from the databases MalariaGen and NCBI [26, 27]. These sequences were aligned to their respective reference genomes (PF3D7 for Pfs47, XM_001614197 for Pv47, and UCI_ANSTEP_V1.0 for P47Rec) using Bowtie2, ensuring precise mapping of reads. The aligned sequences were converted into BAM files using Samtools to create binary alignments. Subsequently, Bcftools was used for variant calling, employing “mpileup” to aggregate aligned reads and “call” to identify potential variants. Variant filtration ensured high-quality data, with filters applied for Phred-scaled quality scores greater than 30, read depths above 10, and variant frequencies above 1%.
The variants were processed to generate FASTA sequences after quality filtering. These sequences were further aligned using MAFFT to prepare them for downstream analyses, including evolutionary or functional assessments [28]. This methodical process ensured accurate extraction and alignment of the target gene sequences from complex whole genome datasets.
Minimum spanning network analysis
Sequences from online databases such as NCBI GenBank and MalariaGen were processed as described and aligned with all Pfs47 or Pv47 sequences in this study using CodonCode Aligner v9.0.1 and exported into a FASTA file. This FASTA file was converted to rphylips and the data was copied into Microsoft Excel. Random haplotype names were given to every unique haplotype for both Pfs47 and Pv47. The frequencies and sequences of each haplotype were formatted into a .nex file to be imported in popart [29]. A minimum spanning network was then created using standard settings and an epsilon value of 0. Haplotypes were colored according to the continent and the presence of An. stephensi.