Multiple sequence alignments

In the output from FASTA, you saw an example of a multiple sequence alignment (MSA): an alignment of several sequences that allows you to compare them and extract common features.

Now you will learn how to make a multiple alignment yourself. You will need a set of sequences in the standard FASTA format.

Here is a set of five pfkA sequences, taken from the BLAST and FASTA searches, that you will align and compare. You could of course use any of the sequences you like, or indeed all of the sequences identified by BLAST or FASTA (although the multiple alignment program would then be very slow).

>EcolipfkA
MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
MINRGGTFLGSARFPEFRDENIRAVAIENLKKRGIDALVVIGGDGSYMGAMRLTEMGFPC
IGLPGTIDNDIKGTDYTIGFFTALSTVVEAIDRLRDTSSSHQRISVVEVMGRYCGDLTLA
AAIAGGCEFVVVPEVEFSREDLVNEIKAGIAKGKKHAIVAITEHMCDVDELAHFIEKETG
RETRATVLGHIQRGGSPVPYDRILASRMGAYAIDLLLAGYGGRCVGIQNEQLVHHDIIDA
IENMKRPFKGDWLDCAKKLY
>SaltypfkA
MIKKIGVLTSGGDAPGMNAAIRGVVRAALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
MINRGGTFLGSARFPEFRDENIRAVAIENLKKRGIDALVVIGGDGSYMGAKRLTEMGFPC
IGLPGTIDNDIKGTDYTIGYFTALGTVVEAIDRLRDTSSSHQRISIVEVMGRYCGDLTLA
AAIAGGCEFIVVPEVEFNREDLVAEIKAGIAKGKKHAIVAITEHMCDVDELAHFIEKETG
RETRATVLGHIQRGGSPVPYDRILASRMGAYAIDLLLEGHGGRCVGIQNEQLVHHDIIDA
IENMKRPFKSDWMECAKKLY
>VibvulpfkA
MIKKIGVLTSGGDAPGMNAAIRGVVRTALGAGLEVYGIYDGYLGLYEGRIKQLDRSSVSD
VINRGGTFLGSARFPEFKEVAVREKAIENLKAHGIDALVVIGGDGSYMGAKKLTEMGYPC
IGLPGTIDNDIAGTDYTIGYLTALNTVIESIDRLRDTSSSHQRISIVEIMGRHCGDLTLM
SAIAGGCEYIITPETGLDKEKLIGNIQDGISKGKKHAIIALTELMMDANELAKEIEAGTG
RETRATVLGHIQRGGRPTAFDRVLASRMGNYAVHLLMEGHGGRCVGIVKEQLVHHDIIDA
IENMKRPVRNDLFKVAEELF
>SpcitpfkA
MLKKIGILTSGGDSQGMNAAIAGVIKTAHAKGLETYIIRDGYLGLINNWIEVVDNNFADS
IMLLGGTVIGSARLPEFKDPEVQKKAVDILKKQEIAALVVIGGDGSYQGAQRLTELGINC
IALPGTIDNDITSSDYTIGFDTAINIVVEAIDRLRDTMQSHNRCSIVEVMGHACGIALYA
GIAGGADIISINEAALSETEIADRVAMLHQAQKRSVIVVVSEMIYPDVHKLAKLESKSGY
ITRATVLGTQRGGNPTAMDRYRAFQMAQFAVEQIIAGVGGLAIGNQGQIIARPIMEALSI
PRSSRKEIWAKFDQLNQNIYQKS
>MlepfkA
MQDEGMRIGILTGGGDCPGLNAVIRAIVRTCDARYGSSVVGFQDGWRGLLENRRMQLCND
DRNDRLLAKGGTMLGTAHVHPDKLRAGLHQIKQTLDDNGIDVLIPIGGEGTLTAAHWLSQ
EDVPVVGVPKTIDNDIDCTDVTFGHDTALTVATEAIDRLHSTAESHQRMLVEVMGRHAGW
IALSSGLASGAHMTLIPEQPFDVEEVCCLVKRRFQRGDSHFICVVAEGAKPVPGSITLRQ
GGMDEFGHERFTGVAAQLGAEVEKRINKDVRVTVLGHVQRGGTPTAFDRVLATRFGVNAA
DASHAGEYGQMVSLRGQDIGRVPLEDAVRQLKLVPESRYDDAAAFFG
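
As a rough illustration of how a program reads the FASTA format, the Python sketch below collects each '>' header line and the sequence lines that follow it into a dictionary. The function name parse_fasta is hypothetical, and the example input is deliberately truncated to the first and last line of two of the sequences above to keep it short. The alignment servers do their own parsing, so this is only a sketch and not part of the exercise.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()          # also drops stray trailing spaces
        if not line:
            continue                 # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the sequence name.
            name = header.split()[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {n: "".join(parts) for n, parts in records.items()}


# Truncated input for illustration only (first and last line of two records).
example = """>EcolipfkA
MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
IENMKRPFKGDWLDCAKKLY
>SaltypfkA
MIKKIGVLTSGGDAPGMNAAIRGVVRAALTEGLEVMGIYDGYLGLYEDRMVQLDRYSVSD
IENMKRPFKSDWMECAKKLY
"""

seqs = parse_fasta(example)
for seq_name, seq in seqs.items():
    print(seq_name, len(seq))
```

A quick check like this (names and lengths) is a convenient way to confirm that a sequence set is well formed before pasting it into an alignment form.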

Open one of the several sequence alignment tools available on the web, for example:

http://www.ebi.ac.uk/Tools/msa/clustalo/
--or--
http://www.bioinformatics.nl/tools/clustalw.html

At the time of writing, the Dutch server was not working!

Copy the whole set of pfkA sequences given above and paste it into the form. Whichever server you use, leave all options at their default settings. Underneath the data input box you will find the Submit button, which starts the multiple alignment.

Press this button and wait a few seconds or minutes while the remote computer in Cambridge or Holland aligns your sequences and returns the alignment with identical residues highlighted. It also adds a new line beneath the sequences, the Consensus line, which shows how well each column of the alignment is conserved (here a '*' represents a fully conserved residue, ':' a highly conserved residue and '.' a fairly conserved residue).
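
To make the meaning of the '*', ':' and '.' symbols concrete, here is a minimal Python sketch that derives such a line from a small alignment. It assumes the Clustal convention in which ':' means all residues in a column fall within one "strong" similarity group and '.' within one "weak" group. The group lists are reproduced from memory of the Clustal documentation and should be treated as an assumption, the function name conservation_line is hypothetical, and the three short sequences are invented for illustration rather than taken from the pfkA alignment.

```python
# Residue groups assumed to follow the Clustal convention (check the
# server's help page before relying on them).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned):
    """Return one symbol per alignment column for equal-length sequences."""
    symbols = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")      # any gap: no symbol
        elif len(residues) == 1:
            symbols.append("*")      # identical in every sequence
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")      # all within one strong group
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")      # all within one weak group
        else:
            symbols.append(" ")      # not conserved
    return "".join(symbols)


# Invented toy alignment, for illustration only.
aligned = [
    "MIKKIG-VLT",
    "MLKKIGAALT",
    "MIRKVG-VMS",
]

for row in aligned:
    print(row)
print(conservation_line(aligned))
```

Reading the printed symbol line against the columns above it is good practice for interpreting the real alignment returned by the server.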

This type of analysis is very useful for the following reasons:

  1. One can find conserved residues. Residues conserved throughout a range of species are probably conserved for a reason: functional or structural.
  2. One can examine the evolutionary relationships between sequences.
  3. Alignments between more distantly related sequences can be more accurate when performed as part of a multiple alignment rather than as pairwise alignments.
  4. Predictions of properties such as secondary structure can be more effective when performed using a multiple alignment rather than a single sequence.
  5. One can predict limited structural information. For example, insertions/deletions tend to occur in loop regions and are unlikely to occur in the core structure or as part of the active site.
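
As one hedged example of how an alignment supports the comparison of sequences, the Python sketch below computes pairwise percent identity from aligned sequences. Percent identity is only a crude indicator of relatedness, and several definitions exist; this sketch assumes identity is counted over columns where neither sequence has a gap. The function name percent_identity, the labels seqA to seqC, and the toy sequences are all invented for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Invented toy alignment, for illustration only.
alignment = {
    "seqA": "MIKKIG-VLT",
    "seqB": "MLKKIGAALT",
    "seqC": "MIRKVG-VMS",
}

for (name1, s1), (name2, s2) in combinations(alignment.items(), 2):
    print(f"{name1} vs {name2}: {percent_identity(s1, s2):.1f}% identity")
```

Pairs with higher identity are usually, though not always, more closely related, which is the intuition that tree-building methods make rigorous.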

If you used the EBI server, you can click the Phylogenetic Tree tab at the top of the results and a tree will be displayed at the bottom of the page. By default the tree is drawn as a Cladogram; you can click the 'Real' button to switch to a true Phylogram. Click this button and compare the two trees.

If you used the Dutch server, you can see two phylograms at the bottom of the page.

Now visit http://en.wikipedia.org/wiki/Phylogenetic_tree to understand the meaning of cladograms and phylograms.

Make sure you understand the difference between a Cladogram and a Phylogram.
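
One way to see the difference is in the Newick text format commonly used to store trees: a phylogram carries a branch length after each ':' while a cladogram keeps only the branching order. The Python sketch below strips the branch lengths from a Newick string. The tree, its leaf names A, B and C, and its branch lengths are made up for illustration and are not results from the pfkA alignment; the function name strip_branch_lengths is hypothetical.

```python
import re


def strip_branch_lengths(newick):
    """Remove ':<number>' branch lengths, leaving only the topology."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


# Made-up tree with made-up branch lengths, for illustration only.
phylogram = "((A:0.02,B:0.03):0.10,C:0.25);"
cladogram = strip_branch_lengths(phylogram)

print("Phylogram (with branch lengths):", phylogram)
print("Cladogram (topology only):      ", cladogram)
```

Both strings describe the same branching order, but only the first says how much change is assumed along each branch.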
