BIOC0003 - Introduction to Bioinformatics

In the output from FASTA, you saw an example of a multiple sequence alignment (MSA): an alignment of several sequences that allows you to compare them and extract common features.

Here is a set of five pfkA sequences, taken from the BLAST and FASTA search results, that you will align and compare. You could of course use any sequences you like, or even all of the sequences identified by BLAST or FASTA, although the multiple alignment program would then be very slow.
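A convenient way to paste several sequences into a web alignment form is FASTA format: a header line starting with '>' followed by the sequence itself, repeated for each sequence. As a minimal sketch, the hypothetical helper `to_fasta` below assembles such a multi-FASTA block from (identifier, sequence) pairs. The identifiers and short sequences are made-up placeholders, not real pfkA data.

```python
def to_fasta(records, width=60):
    """Build a multi-FASTA string from (identifier, sequence) pairs."""
    lines = []
    for ident, seq in records:
        lines.append(">" + ident)  # header line for this sequence
        # Wrap the sequence into lines of at most `width` characters
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical placeholder records: substitute the five pfkA sequences
    records = [("seq1", "MKVLAGSTE"), ("seq2", "MKILGSTE")]
    print(to_fasta(records), end="")
```

The line width only affects readability; alignment servers generally ignore line breaks within a sequence.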

Generating a multiple alignment

Open one of the several multiple sequence alignment tools available on the web, e.g.:

http://www.ebi.ac.uk/Tools/msa/clustalo/
--or--
http://www.bioinformatics.nl/tools/clustalw.html

At the time of writing, the Dutch server was not working!

Copy and paste the whole set of pfkA sequences from above into the input form. Whichever server you use, leave all options at their default settings. The Submit button, which starts the multiple alignment, is underneath the data input box.

Press this button and wait a few seconds or minutes while the remote computer in Cambridge or Holland aligns your sequences and returns the alignment with identical residues highlighted. The output also includes an extra line beneath the sequences showing how well each column is conserved: '*' marks a fully conserved (identical) residue, ':' a column of residues with strongly similar properties, and '.' a column of residues with weakly similar properties.
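To see how the symbols on the line beneath the sequences relate to the alignment columns, here is a minimal Python sketch that rebuilds a Clustal-style conservation line. It assumes the "strong" and "weak" residue groups listed in the Clustal W/X documentation and leaves any column containing a gap blank; the servers' own output may differ in edge cases, and the toy alignment is hypothetical.

```python
# Residue groups assumed from the Clustal W/X documentation
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return " "  # columns containing a gap get no mark
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def conservation_line(aligned):
    """Build the conservation line for equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(col) for col in zip(*aligned))


if __name__ == "__main__":
    aligned = ["MKVSW-D", "MRLGP-E"]  # hypothetical toy alignment
    for seq in aligned:
        print(seq)
    print(conservation_line(aligned))
```

For this toy alignment the line is `*::.  :`: M/M is identical, K/R, V/L and D/E share a strong group, S/G shares only a weak group, and W/P and the gap column get no mark.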

Why this is useful

This type of analysis is very useful for the following reasons:

One can find conserved residues. Residues conserved throughout a range of species are probably conserved for a reason: functional or structural.
One can examine the evolutionary relationships between sequences.
Alignments between more distantly related sequences can be more accurate when performed as part of a multiple alignment rather than as pairwise alignments.
Predictions of properties such as secondary structure can be more effective when performed using a multiple alignment rather than a single sequence.
One can predict some limited structural information: for example, insertions/deletions tend to occur in loop regions and are unlikely to occur in the core structure or in the active site.
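Several of these points can be explored directly from the aligned sequences. The sketch below assumes the alignment is available as equal-length strings with '-' for gaps. It lists fully conserved columns, lists columns containing a gap (candidate insertion/deletion sites), and computes pairwise percent identity as a rough guide to relatedness. Percent identity is defined here, as an assumption, as identical positions divided by positions where neither sequence has a gap; programs differ in the exact definition they use. The names and toy alignment are hypothetical.

```python
from itertools import combinations


def conserved_columns(aligned):
    """Indices of gap-free columns where every sequence has the same residue."""
    return [i for i, col in enumerate(zip(*aligned))
            if "-" not in col and len({r.upper() for r in col}) == 1]


def gapped_columns(aligned):
    """Indices of columns containing at least one gap (possible indel sites)."""
    return [i for i, col in enumerate(zip(*aligned)) if "-" in col]


def percent_identity(a, b):
    """Identical positions as a percentage of positions where neither sequence is gapped."""
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_table(names, aligned):
    """Percent identity for every pair of sequences, keyed by (name1, name2)."""
    return {(n1, n2): percent_identity(s1, s2)
            for (n1, s1), (n2, s2) in combinations(zip(names, aligned), 2)}


if __name__ == "__main__":
    names = ["A", "B", "C"]                   # hypothetical sequence names
    aligned = ["MKV-LS", "MKIALS", "MRV-LT"]  # hypothetical toy alignment
    print("conserved:", conserved_columns(aligned))
    print("gapped:", gapped_columns(aligned))
    for (n1, n2), pid in identity_table(names, aligned).items():
        print(f"{n1} vs {n2}: {pid:.1f}%")
```

Column indices are 0-based. Higher pairwise identity suggests, but does not prove, a closer evolutionary relationship.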

Phylogeny

If you used the EBI server, you can click the Phylogenetic Tree tab at the top of the results and a tree will be displayed at the bottom of the page. By default this is drawn as a cladogram; click the 'Real' button to redraw it as a true phylogram, then compare the two trees.

If you used the Dutch server, you can see two phylograms at the bottom of the page.

Now visit http://en.wikipedia.org/wiki/Phylogenetic_tree to understand the meaning of cladograms and phylograms.

Make sure you understand the difference between a cladogram and a phylogram.
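Trees are commonly exchanged as text in Newick format, e.g. `((A:0.1,B:0.2):0.3,C:0.5);`, where the number after each colon is a branch length. The difference between the two views can be sketched in Python: dropping the branch lengths leaves only the branching order (a cladogram), while summing them along each path gives each leaf's distance from the root (the information a phylogram draws to scale). The helper names and example tree are hypothetical, and the parser handles only simple Newick strings without quoted labels or comments.

```python
import re

# Matches a branch length such as ":0.12", ":1" or ":1e-3"
BRANCH_LENGTH = re.compile(r":[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def to_cladogram(newick):
    """Drop branch lengths, keeping only the branching order (topology)."""
    return BRANCH_LENGTH.sub("", newick)


def leaf_depths(newick):
    """Sum branch lengths from the root to each leaf of a simple Newick tree."""
    s = newick.strip().rstrip(";")
    pos = 0

    def read_token():
        # Read characters up to the next Newick delimiter
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        return s[start:pos].strip()

    def read_length():
        # Optional ":<number>"; a missing length counts as 0
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            return float(read_token())
        return 0.0

    def parse_subtree():
        # Returns ({leaf: distance below this node}, length of this node's branch)
        nonlocal pos
        leaves = {}
        if s[pos] == "(":
            pos += 1
            while True:
                child, child_length = parse_subtree()
                for name, dist in child.items():
                    leaves[name] = dist + child_length
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character {s[pos]!r}")
            read_token()  # optional internal-node label, ignored
        else:
            leaves[read_token()] = 0.0
        return leaves, read_length()

    depths, _root_length = parse_subtree()
    if pos != len(s):
        raise ValueError("trailing characters after tree")
    return depths


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):0.3,C:0.5);"  # hypothetical example tree
    print("cladogram:", to_cladogram(tree))
    print("root-to-leaf distances:", leaf_depths(tree))
```

Running `leaf_depths` on the cladogram string gives every leaf a distance of 0, because only the branching pattern survives; the phylogram string keeps the amount of change along each branch.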
