Phylogeny tutorials - 16S rDNA multiple sequence alignments, model fitting and maximum likelihood tree searches

This page contains links to tutorials ("HOWTOs") on performing phylogenetic analyses of sequence data using Internet resources. It also introduces the primers4clades web server, which uses phylogenetic trees to help you design oligonucleotide PCR primers for particular clusters or clades of sequences.

 

  • Introduction

The aims of the following three tutorials are (i) to introduce the reader to the key aspects that must be taken into account for a rigorous phylogenetic analysis of small-subunit ribosomal RNA gene (rrs) sequences, and (ii) to illustrate the use of three websites that provide powerful tools for performing these tasks correctly and efficiently.

The rrs gene has been by far the most widely used molecular marker in bacterial molecular systematics and ecology. However, this marker is not easy to analyze properly. A rigorous analysis requires attention to the following points:

  1. Obtain a correct multiple sequence alignment based on secondary structure motifs.
  2. Check for the presence of intragenic mosaicism or gene chimeras.
  3. Select a realistic (or at least not so unrealistic) nucleotide substitution model.
  4. Perform tree searches using optimality criteria instead of algorithmic distance-matrix based tree reconstruction methods (such as neighbor-joining).

The tutorials focus on points 1, 3 and 4. We will learn how to produce correct multiple sequence alignments using the GreenGenes and RDPII sites. With the alignment in hand, we will search for the best-fitting (or nearly best-fitting) nucleotide substitution model(s) using Modeltest, as implemented in the FindModel web server. Finally, the reader will learn how to run a maximum-likelihood tree search using the PhyML online tool.

 

 

The Tutorials:

(note: the tutorial on ML tree-searching will be available within the next few days)

 

  • How to align SSU rRNA gene (16S rDNA) sequences properly, taking secondary structure motifs into account? HTML (Contributed by Pablo Vinuesa)

This tutorial shows how to use the GreenGenes and RDPII web servers to perform different tasks related to the analysis of ribosomal gene sequences.

These sites are very useful when you need to align or retrieve 16S rDNA sequences, or check them for chimeric structures.

 

  • How to select a realistic (or at least not too unrealistic) nucleotide substitution model for my DNA sequences? HTML (Contributed by Pablo Vinuesa)

This tutorial briefly explains what model fitting is and why it matters for model-based phylogeny inference methods (distance-matrix based methods, ME, ML, Bayesian), and shows how this fundamental task can be performed automatically using Modeltest, as implemented in the FindModel web server.
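As a rough illustration of what model fitting involves, the sketch below ranks candidate substitution models by the Akaike information criterion (AIC = 2k - 2 lnL), one criterion commonly used for this kind of selection. It is a minimal sketch, not the FindModel implementation: the log-likelihood values are invented for illustration, the function names are hypothetical, and the parameter counts cover only the substitution model (branch lengths are excluded).

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: lower values indicate a better trade-off
    between fit (log-likelihood) and complexity (number of free parameters)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits: dict) -> list:
    """Return (model, AIC, delta-AIC) tuples sorted from best to worst.

    `fits` maps a model name to (log_likelihood, n_free_parameters).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda row: row[1])


# ASSUMPTION: these log-likelihoods are made up for illustration only.
# Free substitution-model parameters: JC69 = 0, HKY85 = 4 (kappa + 3 base
# frequencies), GTR = 8 (5 rates + 3 frequencies), GTR+G = 9 (+ gamma shape).
example_fits = {
    "JC69": (-3500.0, 0),
    "HKY85": (-3420.0, 4),
    "GTR": (-3410.0, 8),
    "GTR+G": (-3380.0, 9),
}

if __name__ == "__main__":
    for name, score, delta in rank_models(example_fits):
        print(f"{name:6s}  AIC = {score:8.1f}  delta = {delta:6.1f}")
```

In practice the log-likelihoods come from fitting each model to your alignment on a fixed tree; the ranking step itself is as simple as shown.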

 

  • How to perform a maximum likelihood tree search using best-fit substitution models? HTML (Contributed by Pablo Vinuesa)

Here the reader will gain an intuitive understanding of phylogeny inference in an ML framework and learn the most important points from a practical, user-oriented standpoint. After laying the groundwork, which builds on the previous tutorial on model fitting and introduces concepts such as the likelihood function and the likelihood ratio test, this tutorial shows how to run a maximum-likelihood tree search using the PhyML online tool.
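The likelihood ratio test mentioned above compares two nested models through the statistic 2(lnL1 - lnL0), which is usually referred to a chi-square distribution whose degrees of freedom equal the difference in free parameters. The following is a minimal sketch under stated assumptions: the log-likelihoods are invented, the function names are hypothetical, and the chi-square tail probability is computed with a small standard-library helper for integer degrees of freedom rather than a statistics package.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (statistic, p_value) for two nested models.

    lnl_null is the log-likelihood of the simpler model, lnl_alt that of the
    more general one; df is the difference in free parameters.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # ASSUMPTION: invented log-likelihoods; JC69 is nested within HKY85,
    # which adds 4 free parameters (kappa + 3 base frequencies).
    stat, p = likelihood_ratio_test(lnl_null=-3500.0, lnl_alt=-3420.0, df=4)
    print(f"LRT statistic = {stat:.1f}, p = {p:.3g}")
```

A small p-value favors the more parameter-rich model. The test applies only to nested models, so comparisons between non-nested models need a different criterion.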

 

primers4clades is an easy-to-use web server developed for researchers interested in designing PCR primers to amplify novel sequences from metagenomic DNA or from uncharacterized organisms belonging to user-specified phylogenetic clades. It implements complementary primer design strategies based on both DNA and protein multiple sequence alignments of coding sequences. It evaluates a comprehensive set of thermodynamic properties of the oligonucleotide pairs, as well as the phylogenetic information content of the theoretical amplicons, computed from the branch support values of maximum likelihood phylogenetic trees estimated for each theoretical amplicon. Phylogenetic trees also make it easy for the user to target primer design to particular clades. It is developed and maintained by Bruno Contreras-Moreira and Pablo Vinuesa and is mirrored in Spain and Mexico.
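To give a feel for the simplest kind of primer property one might check, the sketch below computes the GC fraction and a rule-of-thumb melting temperature (the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, intended for short oligonucleotides). This is an illustrative assumption only: it is not the thermodynamic evaluation performed by primers4clades, the function names are hypothetical, and the primer sequence is used purely as an example.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in an unambiguous DNA primer."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Wallace rule: 2 degrees per A/T plus 4 degrees per G/C. It ignores salt
    concentration and nearest-neighbor effects, so treat it as a rough guide.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # ASSUMPTION: example 20-mer chosen for illustration only.
    primer = "AGAGTTTGATCCTGGCTCAG"
    print(f"GC fraction: {gc_fraction(primer):.2f}")
    print(f"Wallace Tm:  {wallace_tm(primer)} C")
```

Degenerate primers, which contain ambiguity codes such as M or R, are rejected by this sketch; handling them would require evaluating each concrete sequence the degenerate primer represents.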

 

  • Tutorials and courses on bioinformatics and molecular phylogenetics are available from here (Spanish versions only at present).