Databases
Assigning bacterial strains to species via the Internet – Electronic taxonomy
www.eMLSA.net provides a portal for the electronic taxonomy of bacteria, with a common format and software for assigning strains to species via the Internet. Electronic taxonomy contrasts with the current approach for distinguishing species within a genus, and for defining new species, which is based on polyphasic taxonomy, an approach that incorporates all available phenotypic and genotypic data into a consensus classification (Vandamme et al., 1996). For further information and instructions please see the Instruction pages.
- Introduction.
- Primers, PCR Conditions and sequence trimming.
- Entering your sequences.
- Submission of Strains.
- System Requirements.
- Acknowledgements.
- References.
The accepted genotypic method for defining species is based on overall genomic relatedness, such that strains which share approximately 70% or more relatedness using DNA-DNA hybridization, under standard conditions, are considered to be members of the same species (Wayne et al., 1987). In recent years there has been an interest in improving the way molecular data are used to help define species. In particular, there has been a move away from the cumbersome DNA-DNA hybridization procedure and fixed or semi-fixed cut-off values for defining species, and an increased interest in other molecular approaches to identifying and circumscribing bacterial species (Stackebrandt et al., 2002).
One newer approach is to observe the distribution of a large number of strains of closely related species in sequence space and to identify clusters of strains that are well resolved from other clusters. This approach has been developed by using the concatenated sequences of multiple core (house-keeping) genes to assess clustering patterns, and has been called multilocus sequence analysis (MLSA; Gevers et al., 2005; Hanage et al., 2006), or multilocus sequence phylogenetic analysis (Nørskov-Lauritsen et al., 2005). MLSA has been used successfully to explore clustering patterns among large numbers of strains assigned to very closely related species by current taxonomic methods (Godoy et al., 2003; Hanage et al., 2005a,b, 2006; Hoshino et al., 2005; Bennett et al., 2007; Kilian et al., 2008; Bishop et al., submitted), to look at the relationships between small numbers of strains within a genus (Martens et al., 2008), or within a broader taxonomic grouping (Sawabe et al., 2007), and to address specific taxonomic questions (Postic et al., 2007; Thompson et al., 2007). More generally, MLSA can be used to ask whether bacterial species exist – that is, to observe whether large populations of similar strains invariably fall into well resolved clusters, or whether in some cases there is a genetic continuum in which clear separation into clusters is not observed.
eMLSA is the implementation of the MLSA approach to the assignment of strains to species clusters via the Internet. eMLSA requires a large database of the sequences of multiple house-keeping loci (typically about seven loci) from multiple strains of a set of related species of interest. It also requires software that can concatenate the sequences and produce a tree showing the patterns of clustering of the sequences. By comparing the species assignments of the strains within each sequence cluster, including the position of the type strain of each species, the sequence clusters can be assigned as species clusters. Once a database has been generated and the species clusters assigned, the sequences of the multiple loci of a query strain are submitted to the relevant eMLSA.net subsite, and the position of the strain on the reference tree is returned along with the species assignment. Additional software at eMLSA.net returns the most similar concatenated sequences in the database (and the species assigned to these closest matches) and assigns the source of each allele as resident to that species or possibly imported from a related species.
Microbiologists are encouraged to submit new strains, and the corresponding sequences, to the curator at eMLSA.net, as this not only expands the database but also has the potential to identify new sequence clusters, which may be assigned as new species. Our view is that electronic taxonomy, as implemented at eMLSA.net, provides a way of assigning strains to species that is at least as valid as other methods. At the very least, it provides a hypothesis about the relationships between sequence clusters generated using core genes and species clusters that can be tested, revised or refuted by the addition of phenotypic, genotypic or ecological data.
At present, eMLSA.net databases are available for viridans group streptococci,
a group where species assignment by more traditional means is problematic,
for Burkholderia pseudomallei and related species, and for Borrelia
species.
Please contact David Aanensen (d.aanesen[at]imperial.ac.uk) if you wish to host an MLSA subsite at eMLSA.net.
| Primer | Gene product | Sequence (5'-3')* | Primer length (bp) | Trimmed fragment size (bp) | Annealing temperature (°C) |
|---|---|---|---|---|---|
| map-up | Methionine aminopeptidase | GCWGACTCWTGTTGGGCWTATGC | 23 | 348 | 55 |
| map-dn | Methionine aminopeptidase | TTARTAAGTTCYTTCTTCDCCTTG | 24 | 348 | 55 |
| pfl-up | Pyruvate formate lyase | AACGTTGCTTACTCTAAACAAACTGG | 26 | 351 | 55 |
| pfl-dn | Pyruvate formate lyase | ACTTCRTGGAAGACACGTTGWGTC | 24 | 351 | 55 |
| ppaC-up | Inorganic pyrophosphatase | GACCAYAATGAATTYCARCAATC | 23 | 552 | 50 |
| ppaC-dn | Inorganic pyrophosphatase | TGAGGNACMACTTGTTTSTTACG | 23 | 552 | 50 |
| pyk-up | Pyruvate kinase | GCGGTWGAAWTCCGTGGTG | 19 | 492 | 50 |
| pyk-dn | Pyruvate kinase | GCAAGWGCTGGGAAAGGAAT | 20 | 492 | 50 |
| rpoB-up | RNA polymerase beta subunit | AARYTIGGMCCTGAAGAAAT | 20 | 516 | 50 |
| rpoB-dn | RNA polymerase beta subunit | TGIARTTTRTCATCAACCATGTG | 22 | 516 | 50 |
| sodA-up | Superoxide dismutase | TRCAYCATGAYAARCACCAT | 20 | 378 | 50 |
| sodA-dn | Superoxide dismutase | ARRTARTAMGCRTGYTCCCARACRTC | 26 | 378 | 50 |
| tuf-up | Elongation factor Tu | GTTGAAATGGAAATCCGTGACC | 22 | 426 | 55 |
| tuf-dn | Elongation factor Tu | GTTGAAGAATGGAGTGTGACG | 21 | 426 | 55 |
*Sequences are shown using the IUPAC codes for sites where degeneracy was introduced.
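As a rough illustration of what the IUPAC codes in the primer table mean, the following sketch counts how many plain A/C/G/T sequences a degenerate primer stands for. The function name `primer_degeneracy` is hypothetical, and counting inosine (`I`, which is not an IUPAC code) as a single base is an assumption made here, not something stated on this page.

```python
# IUPAC nucleotide codes mapped to the bases each one can represent.
# 'I' (inosine) is not an IUPAC code; counting it as a single base
# is an assumption made for this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def primer_degeneracy(primer: str) -> int:
    """Return the number of distinct sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])  # KeyError signals an unrecognised character
    return total


map_up = "GCWGACTCWTGTTGGGCWTATGC"
print(len(map_up), primer_degeneracy(map_up))  # 23 8
```

The map-up primer contains three `W` positions (A or T), so it corresponds to 2 × 2 × 2 = 8 sequences.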
After choosing the eMLSA database from the links on the left hand side of the page, you will be presented with a form consisting of a number of text boxes corresponding to the loci used for MLSA within the species group. The gene name can be seen above each textbox. Paste each trimmed sequence into the corresponding textbox as one continuous string (i.e. with no spaces).
Error messages are returned should your sequence contain any non-DNA characters or be of incorrect length for that locus.
Once all seven sequences are entered correctly, the borders of the textboxes will turn green indicating that the sequences are correctly entered.
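If you want to check sequences before pasting them in, the two conditions described above (no non-DNA characters, correct length for the locus) can be sketched as follows. This is not the code used by the website: the function name `validate_locus` is hypothetical, the expected lengths are the trimmed fragment sizes from the primer table above, and accepting only A, C, G and T is an assumption.

```python
# Trimmed fragment sizes (bp) copied from the primer table above.
TRIMMED_LENGTHS = {
    "map": 348, "pfl": 351, "ppaC": 552, "pyk": 492,
    "rpoB": 516, "sodA": 378, "tuf": 426,
}


def validate_locus(locus: str, sequence: str) -> list:
    """Return a list of problems found; an empty list means the sequence passes."""
    problems = []
    # Assumption: only unambiguous bases are accepted, so spaces, line
    # breaks and ambiguity codes are all reported as non-DNA characters.
    bad = sorted(set(sequence.upper()) - set("ACGT"))
    if bad:
        problems.append("non-DNA characters: " + repr("".join(bad)))
    expected = TRIMMED_LENGTHS[locus]
    if len(sequence) != expected:
        problems.append(f"length {len(sequence)} bp, expected {expected} bp")
    return problems


print(validate_locus("map", "ACGT ACGT"))
```

A sequence is ready to paste when `validate_locus` returns an empty list for its locus.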
Should the genome sequences be available for strains of some species within a species group, a checkbox is present which, when checked, will include the concatenated sequences from these genomes when producing the reference tree.
By default genome sequences are NOT included in the reference tree.
Click ‘submit’ to proceed.
Click ‘Click to view tree’ which will take you to the Tree View page.
Navigation is undertaken using the links in the ‘Navigation’ Menu and four links are provided as follows.
- Reference Tree View.
- Locus View.
- Database Query.
- Downloading Data.
On submitting your seven sequences, they are concatenated, and the concatenate is added to the database that includes the sequences of all strains within the species group. These sequences are then aligned and a distance matrix is produced. A neighbor-joining tree based on this matrix is then viewable on the website. This process can take time (typically about 30 seconds) and loading indicators are shown on the page while processing of the data takes place. The results window consists of four sections.
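The first two steps of the processing described above (concatenation, then a pairwise distance matrix) can be sketched as below. This is an illustration only, not the server-side code: the locus order, the function names and the choice of a simple p-distance (proportion of differing sites) are all assumptions, and the sequences are assumed to be already aligned to equal length. A neighbor-joining tree would then be built from the resulting matrix by a phylogenetics package.

```python
from itertools import combinations

# Assumed concatenation order; the order used by the website is not stated here.
LOCUS_ORDER = ["map", "pfl", "ppaC", "pyk", "rpoB", "sodA", "tuf"]


def concatenate(loci: dict) -> str:
    """Join the trimmed locus sequences of one strain in a fixed order."""
    return "".join(loci[name] for name in LOCUS_ORDER)


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(concatenates: dict) -> dict:
    """Symmetric matrix of pairwise p-distances, keyed by strain name."""
    names = sorted(concatenates)
    matrix = {name: {name: 0.0} for name in names}
    for first, second in combinations(names, 2):
        d = p_distance(concatenates[first], concatenates[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


toy = {"query": "AAAA", "ref1": "AAAT", "ref2": "AATT"}
print(distance_matrix(toy)["query"])
```

In the toy example the query is closer to `ref1` (one difference in four sites) than to `ref2` (two differences in four sites).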
Any node on the tree can be collapsed allowing easier viewing of smaller subsets of sequences. Using the pointer tool, click on the node that you wish to collapse. A context menu appears offering a number of options as in the following screenshot.
Select 'Tree Edit' and then 'Collapse'. The node selected will then be collapsed and in its place a yellow triangle will appear as in the following screenshot.
You may need to zoom out to see the full tree and adjust the terminal node and text sizes using the sliders or alternatively you can select 'View' and 'Zoom to full' (shortcut - Ctrl+F) from the Phylowidget menu.
To uncollapse the node, simply click the yellow triangle.
The Locus View contains sections which, once loaded, display the closest matches within the database to each of your pasted-in allele sequences. Based on a hypothetical set of sequences where the overall species assignment (based on the concatenated sequences) assigns the strain as Streptococcus australis, we can envision the following circumstances for each locus.
Resident: The sequence at the locus is assigned to the same species as that assigned to the concatenated sequence.
The overall species assignment is S. australis and the query sequence for the pfl locus falls into the S. australis cluster, which is well resolved from other species on the single gene tree. You can click the button to view the individual gene tree to see where the query sequence clusters on the pfl gene tree.
Resident compatible: The sequence at the locus is compatible with being from the same species as that assigned to the concatenated sequence.
The overall species assignment is S. australis. However, even though the top match at the ppaC locus returns S. australis, the S. australis sequences on the individual gene tree are not well resolved from S. infantis. Consequently, the 2nd, 3rd and 4th matches are to S. infantis rather than S. australis. The query ppaC sequence is assigned as ‘Resident compatible’ as it is almost certainly a native S. australis allele, although the lack of resolution from S. infantis makes it a theoretical possibility that it has been imported from the latter species. Clicking the button shows the ppaC gene tree to see where the query sequence clusters.
Foreign: The allele appears to have been introduced from a related species.
The overall species assignment is S. australis. However, the query sequence at the map locus is most similar to S. infantis strains. As the S. infantis sequences are well resolved from S. australis sequences on the map gene tree, the map query sequence is assigned as ‘Foreign’, presumably having been imported from an S. infantis strain.
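The three cases above can be summarised as a small decision rule. The sketch below is only a reading of the worked examples, not the logic used by eMLSA.net: the function name `classify_locus` and its inputs are hypothetical, and the fourth outcome ("Unclassified") covers a combination the examples do not describe.

```python
def classify_locus(overall_species: str, top_match_species: str,
                   well_resolved: bool) -> str:
    """Classify one locus from its top match and the resolution of its gene tree.

    overall_species   -- species assigned from the concatenated sequence
    top_match_species -- species of the closest match at this locus
    well_resolved     -- whether the species clusters involved are well
                         resolved from each other on the single gene tree
    """
    if top_match_species == overall_species:
        return "Resident" if well_resolved else "Resident compatible"
    if well_resolved:
        return "Foreign"
    # Not covered by the worked examples above; left open in this sketch.
    return "Unclassified"


print(classify_locus("S. australis", "S. australis", True))
print(classify_locus("S. australis", "S. australis", False))
print(classify_locus("S. australis", "S. infantis", True))
```

The three calls correspond to the pfl, ppaC and map examples respectively.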
The database query allows the strains submitted to the eMLSA scheme to be interrogated and further details returned. Information included will depend on the species group under investigation but a list of searchable columns can be viewed in the dropdown lists of the search form. Enter your search terms in the boxes provided. After submitting, results appear below the search boxes.
You can download all data used to generate trees and related epidemiological data for all strains within the database.
All sequences (concatenated or for individual loci) can be downloaded in MEGA, FASTA, Tab-delimited or comma separated (CSV) formats, for use in other analysis programs.
Simply click the corresponding arrow to download the sequence data.
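For use in your own scripts, a downloaded FASTA file can be read with a few lines of code. The sketch below assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines); the exact header format of the eMLSA.net downloads is not described here, and the function name `parse_fasta` is hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


example = ">strain1\nACGT\nAC\n>strain2\nGG\n"
print(parse_fasta(example))  # {'strain1': 'ACGTAC', 'strain2': 'GG'}
```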
From within Phylowidget you can also download the Newick data that describes the tree shown. Simply click 'File' and then 'Save Tree' on the Phylowidget menu.
The submission of MLSA data on strains within, or closely allied to, groups of species included in the different MLSA schemes is strongly encouraged as it builds the database and potentially can identify new species clusters.
Please email the correctly trimmed sequences for all loci, the forward and reverse sequencer trace files, and your proposed species assignment for the strain to the curator. The curator will check the data and will request further details of the strain before entering it in the relevant MLSA database.
The names and email addresses of the curators are given on the front page of each MLSA scheme.
eMLSA.net is built using AJAX, Perl and PHP. Phylowidget is built using Processing and as such requires Java to be installed on your computer, along with a JavaScript-enabled web browser. Realistically, most modern operating systems come with the correct version of Java and a web browser capable of running the site. However, if you experience any difficulties, please refer to the following specifics -
Java -
PhyloWidget requires Java 1.5 or higher (this is also known as Java 5.0). Most modern systems, including Windows XP and Mac OS X, should have this installed; if for some reason it appears that Java is not set up correctly on your system, simply visit http://www.Java.com to download the latest version.
Automatic Java detection is present on this website and the lower box on the front page will detect whether Java is installed and which version you have.
Web Browser - Below is a table containing recommended browsers for viewing eMLSA.net. We recommend using Firefox if possible - it is free to download and available for most commonly used operating systems.
For specific known browser issues when using Phylowidget please see here
| OS | Browser | Support (and known issues) |
|---|---|---|
| Windows | Firefox | 3.0+ OK |
| Windows | Internet Explorer | v5.0+ OK, some issues with Java refresh on IE8 beta+ |
| Windows | Safari | 3.0+ OK |
| Mac OS X | Firefox | 3.0+ OK |
| Mac OS X | Safari | 3.0+ OK |
| Linux | Firefox | 3.0+ OK |
We encourage you to let us know if you experience difficulties so that we can improve eMLSA.net.
Should you have any issues please email David Aanensen.
Troubleshooting
- Browser crash / freeze-up
With certain versions of Firefox, Java, and Windows XP or Vista, we have observed the browser crashing entirely when PhyloWidget or any other Java applet is run. If this is happening to you, please visit Java.com and update your version of Java.
- Sluggish / strange behavior
The main point is this: Most strange or sluggish behavior in PhyloWidget can be fixed by restarting your browser!!
Usually refreshing the browser will not fix problems with sluggishness. The Java runtime environment stays loaded for the entire time the browser is running, so the only way to give PhyloWidget a "fresh start" is to completely quit the browser, and then start it back up.
To reiterate: If you're having a problem with PhyloWidget suddenly acting strange, 90% of the time restarting the browser will solve it!
- OutOfMemory Error / Java heap space
If you see either of these error messages while using PhyloWidget, or the program appears to putter out when working with large trees or images, then the operation you attempted may be using more memory than is allocated to Java applets by default.
The simplest solution is to download the standalone version of PhyloWidget from the homepage and run the program as an application, where more memory is allocated to the JVM.
Another option is to edit your computer-specific applet settings to allocate more memory: see the relevant Java forum thread, or search Google for java applet heap space. One set of visual instructions for Windows XP can be found here.
- Error pop-ups
Some users, especially on Linux systems, have reported seeing pop-up messages mentioning Null Pointer Exceptions when trying to load PhyloWidget. If you see this, please submit a bug report to David Aanensen.
- The server-side version of MEGA4 has been kindly provided by Dr Sudhir Kumar, Arizona State University.
- eMLSA.net thanks the developers of the following Javascript libraries-
- A special thanks to Greg Jordan for adapting Phylowidget for the purposes of eMLSA.net.
For further information on PhyloWidget please click here
Gevers D, Cohan FM, Lawrence JG, Spratt BG, Coenye T, Feil EJ, Stackebrandt E, Van de Peer Y, Vandamme P, Thompson FL, Swings J (2005) Opinion: Re-evaluating prokaryotic species. Nat Rev Microbiol.
Godoy D, Randle G, Simpson AJ, Aanensen DM, Pitt TL, Kinoshita R, Spratt BG (2003) Multilocus sequence typing and evolutionary relationships among the causative agents of melioidosis and glanders, Burkholderia pseudomallei and Burkholderia mallei. J Clin Microbiol.
Hanage WP, Fraser C, Spratt BG (2005) Fuzzy species among recombinogenic bacteria. BMC Biol.
Hanage WP, Kaijalainen T, Herva E, Saukkoriipi A, Syrjänen R, Spratt BG (2005) Using multilocus sequence data to define the pneumococcus. J Bacteriol.
Hanage WP, Fraser C, Spratt BG (2006) Sequences, sequence clusters and bacterial species. Philos Trans R Soc Lond B Biol Sci.
Hoshino T, Fujiwara T, Kilian M (2005) Use of phylogenetic and phenotypic analyses to identify nonhemolytic streptococci isolated from bacteremic patients. J Clin Microbiol.
Kilian M, Poulsen K, Blomqvist T, Håvarstein LS, Bek-Thomsen M, Tettelin H, Sørensen UB (2008) Evolution of Streptococcus pneumoniae and its close commensal relatives. PLoS ONE.
Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A (2008) Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol.
Nørskov-Lauritsen N, Bruun B, Kilian M (2005) Multilocus sequence phylogenetic study of the genus Haemophilus with description of Haemophilus pittmaniae sp. nov. Int J Syst Evol Microbiol.
Postic D, Garnier M, Baranton G (2007) Multilocus sequence analysis of atypical Borrelia burgdorferi sensu lato isolates: description of Borrelia californiensis sp. nov., and genomospecies 1 and 2. Int J Med Microbiol.
Sawabe T, Kita-Tsukamoto K, Thompson FL (2007) Inferring the evolutionary history of vibrios by means of multilocus sequence analysis. J Bacteriol.
Thompson FL, Gomez-Gil B, Vasconcelos AT, Sawabe T (2007) Multilocus sequence analysis reveals that Vibrio harveyi and V. campbellii are distinct species. Appl Environ Microbiol.
Stackebrandt E, Frederiksen W, Garrity GM, Grimont PA, Kämpfer P, Maiden MC, Nesme X, Rosselló-Mora R, Swings J, Trüper HG, Vauterin L, Ward AC, Whitman WB (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int J Syst Evol Microbiol.
Vandamme P, Pot B, Gillis M, de Vos P, Kersters K, Swings J (1996) Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiol Rev.