Bioplib
Protein Structure C Library
align.c File Reference

Perform Needleman & Wunsch sequence alignment.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "SysDefs.h"
#include "macros.h"
#include "array.h"
#include "general.h"
#include "seq.h"


Data Structures

struct  XY
 

Macros

#define MAX3(c, d, e)   (MAX(MAX((c),(d)),(e)))
 
#define DATAENV   "DATADIR" /* Environment variable or assign */
 
#define MAXBUFF   400
 
#define MAXWORD   16
 

Functions

int blAlign (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, char *align1, char *align2, int *align_len)
 
int blAffinealign (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, char *align1, char *align2, int *align_len)
 
int blAffinealignuc (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, char *align1, char *align2, int *align_len)
 
BOOL blReadMDM (char *mdmfile)
 
int blCalcMDMScore (char resa, char resb)
 
int blCalcMDMScoreUC (char resa, char resb)
 
int blZeroMDM (void)
 
void blSetMDMScoreWeight (char resa, char resb, REAL weight)
 

Detailed Description

Perform Needleman & Wunsch sequence alignment.

Version
V3.6
Date
04.01.16
Author
Dr. Andrew C. R. Martin
Institute of Structural & Molecular Biology, University College London, Gower Street, London. WC1E 6BT.
andrew@bioinf.org.uk andrew.martin@ucl.ac.uk

This code is NOT IN THE PUBLIC DOMAIN, but it may be copied according to the conditions laid out in the accompanying file COPYING.DOC.

The code may be modified as required, but any modifications must be documented so that the person responsible can be identified.

The code may not be sold commercially or included as part of a commercial product except as described in the file COPYING.DOC.

Description:

A simple Needleman & Wunsch Dynamic Programming alignment of 2 sequences.

No window is used: every cell of the length1 x length2 matrix is computed, so the routine may be slow on long sequences.
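To make the cost of the unwindowed approach concrete, here is a minimal sketch of a textbook Needleman & Wunsch score calculation. It is not the Bioplib implementation: the function name nw_score() and its match, mismatch and gap parameters are assumptions made for illustration, it charges the gap cost per gap position, and it returns only the score without a traceback.

```c
#include <stdlib.h>
#include <string.h>

/* Textbook global alignment score (illustrative, not Bioplib's code).
   match/mismatch are added to the score; gap is subtracted once for
   every gap position. Returns 0 if memory cannot be allocated. */
int nw_score(const char *a, const char *b, int match, int mismatch, int gap)
{
   size_t la = strlen(a), lb = strlen(b), i, j;
   int *prev = malloc((lb + 1) * sizeof *prev);
   int *curr = malloc((lb + 1) * sizeof *curr);
   int *tmp, result;

   if(prev == NULL || curr == NULL)
   {
      free(prev);
      free(curr);
      return 0;
   }

   /* Row 0: aligning a prefix of b against nothing costs only gaps */
   for(j = 0; j <= lb; j++)
      prev[j] = -(int)j * gap;

   /* Every cell is visited, hence the la x lb running time */
   for(i = 1; i <= la; i++)
   {
      curr[0] = -(int)i * gap;
      for(j = 1; j <= lb; j++)
      {
         int diag = prev[j-1] + ((a[i-1] == b[j-1]) ? match : mismatch);
         int up   = prev[j]   - gap;
         int left = curr[j-1] - gap;
         int best = diag;
         if(up   > best) best = up;
         if(left > best) best = left;
         curr[j] = best;
      }
      tmp = prev; prev = curr; curr = tmp;   /* reuse the two rows */
   }

   result = prev[lb];
   free(prev);
   free(curr);
   return result;
}
```

Keeping only two rows is enough when just the score is wanted; recovering the aligned strings requires storing the whole matrix or the path.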

Usage:

First call blReadMDM() to read the mutation data matrix, then call blAlign() (or blAffinealign() for affine gap penalties) to align the sequences.

Revision History:

Definition in file align.c.

Macro Definition Documentation

#define DATAENV   "DATADIR" /* Environment variable or assign */

Definition at line 148 of file align.c.

#define MAX3(c, d, e)   (MAX(MAX((c),(d)),(e)))

Definition at line 146 of file align.c.
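As a small illustration of how MAX3 might be used to pick the best of three candidate scores, the sketch below defines its own two-argument MAX. That definition is an assumption (in Bioplib MAX comes from macros.h), and best_of_three() is a hypothetical helper.

```c
/* Assumption: MAX is a conventional two-argument maximum macro */
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

#define MAX3(c, d, e) (MAX(MAX((c),(d)),(e)))

/* Hypothetical helper: best of the three ways to reach a matrix cell */
int best_of_three(int diag, int up, int left)
{
   return MAX3(diag, up, left);
}
```

Because these are macros, an argument may be evaluated more than once, so expressions with side effects (such as i++) should not be passed to them.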

#define MAXBUFF   400

Definition at line 150 of file align.c.

#define MAXWORD   16

Definition at line 151 of file align.c.

Function Documentation

int blAffinealign (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, char *align1, char *align2, int *align_len)
Parameters
[in]   *seq1       First sequence
[in]   length1     First sequence length
[in]   *seq2       Second sequence
[in]   length2     Second sequence length
[in]   verbose     Display N&W matrix
[in]   identity    Use identity matrix
[in]   penalty     Gap insertion penalty value
[in]   penext      Extension penalty
[out]  *align1     Sequence 1 aligned
[out]  *align2     Sequence 2 aligned
[out]  *align_len  Alignment length
Returns
Alignment score (0 on error)

Perform simple N&W alignment of seq1 and seq2. No window is used, so it will be slow for long sequences.

Note that you must allocate sufficient memory for the aligned sequences. The easy way to do this is to ensure that align1 and align2 are of length (length1+length2).
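The bound of (length1+length2) holds because, in the worst case, every residue of one sequence is aligned against a gap in the other. A minimal allocation sketch follows; alloc_alignment_buffer() is a hypothetical helper, not part of Bioplib, and the extra byte for a terminating '\0' is an assumption for callers who want to print the result as a C string.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-filled buffer large enough for
   one aligned sequence. The +1 is an assumption, leaving room for a
   terminating '\0'. Returns NULL on bad lengths or if out of memory. */
char *alloc_alignment_buffer(int length1, int length2)
{
   if(length1 < 0 || length2 < 0)
      return NULL;

   return calloc((size_t)length1 + (size_t)length2 + 1, sizeof(char));
}
```

One such buffer would be needed for align1 and another for align2; both should be released with free() once the alignment is no longer needed.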

  • 07.10.92 Adapted from original written while at NIMR
  • 08.10.92 Split into separate routines
  • 09.10.92 Changed best structure to simple integers, moved SearchForBest() into TraceBack()
  • 21.08.95 Was only filling in the bottom right cell at initialisation rather than all the right hand column and bottom row
  • 11.07.96 Changed calls to calcscore() to CalcMDMScore()
  • 06.03.00 Changed name to affinealign() (the routine align() is provided as a backwards compatible wrapper). Added penext parameter. Now supports affine gap penalties with separate opening and extension penalties. The code now maintains the path as it goes.
  • 07.07.14 Use bl prefix for functions. By: CTP
      NOTE: ANY CHANGES SHOULD BE PROPAGATED TO affinealignuc()

Definition at line 275 of file align.c.

int blAffinealignuc (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, char *align1, char *align2, int *align_len)
Parameters
[in]   *seq1       First sequence
[in]   length1     First sequence length
[in]   *seq2       Second sequence
[in]   length2     Second sequence length
[in]   verbose     Display N&W matrix
[in]   identity    Use identity matrix
[in]   penalty     Gap insertion penalty value
[in]   penext      Extension penalty
[out]  *align1     Sequence 1 aligned
[out]  *align2     Sequence 2 aligned
[out]  *align_len  Alignment length
Returns
Alignment score (0 on error)

Perform simple N&W alignment of seq1 and seq2, upper-casing characters before comparison (otherwise exactly as blAffinealign()). No window is used, so it will be slow for long sequences.

Note that you must allocate sufficient memory for the aligned sequences. The easy way to do this is to ensure that align1 and align2 are of length (length1+length2).

  • 07.10.92 Adapted from original written while at NIMR
  • 08.10.92 Split into separate routines
  • 09.10.92 Changed best structure to simple integers, moved SearchForBest() into TraceBack()
  • 21.08.95 Was only filling in the bottom right cell at initialisation rather than all the right hand column and bottom row
  • 11.07.96 Changed calls to calcscore() to CalcMDMScore()
  • 06.03.00 Changed name to affinealign() (the routine align() is provided as a backwards compatible wrapper). Added penext parameter. Now supports affine gap penalties with separate opening and extension penalties. The code now maintains the path as it goes.
  • 27.02.07 Exactly as affinealign() but upcases characters before comparison
  • 07.07.14 Use bl prefix for functions. By: CTP
      NOTE: ANY CHANGES SHOULD BE PROPAGATED TO affinealign()

Definition at line 583 of file align.c.

int blAlign (char *seq1, int length1, char *seq2, int length2, BOOL verbose, BOOL identity, int penalty, char *align1, char *align2, int *align_len)
Parameters
[in]   *seq1       First sequence
[in]   length1     First sequence length
[in]   *seq2       Second sequence
[in]   length2     Second sequence length
[in]   verbose     Display N&W matrix
[in]   identity    Use identity matrix
[in]   penalty     Gap insertion penalty value
[out]  *align1     Sequence 1 aligned
[out]  *align2     Sequence 2 aligned
[out]  *align_len  Alignment length
Returns
Alignment score (0 on error)

Perform simple N&W alignment of seq1 and seq2. No window is used, so it will be slow for long sequences.

A single gap penalty is used, so gap extension incurs no further penalty.
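The difference between a single gap penalty and an affine one can be sketched as a cost function. The convention below (the first gap position costs penalty, each further position costs penext) is one common choice and is an assumption here; the exact accounting used inside align.c is not stated on this page. gap_cost() is a hypothetical helper.

```c
/* Illustrative gap cost under an assumed affine convention:
   opening the gap costs `penalty`, every additional gap position
   costs `penext`. A non-positive length means no gap and no cost. */
int gap_cost(int gap_length, int penalty, int penext)
{
   if(gap_length <= 0)
      return 0;

   return penalty + (gap_length - 1) * penext;
}
```

With penext set to 0 the cost no longer depends on the gap length, which corresponds to the single-penalty behaviour described above.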

Note that you must allocate sufficient memory for the aligned sequences. The easy way to do this is to ensure that align1 and align2 are of length (length1+length2).

  • 06.03.00 Implemented as a wrapper to affinealign() which is the old align() routine, plus support for affine gap penalties, plus new traceback code based on storing the path as we go
  • 07.07.14 Use bl prefix for functions. By: CTP

Definition at line 214 of file align.c.

int blCalcMDMScore (char resa, char resb)
Parameters
[in]   resa  First residue
[in]   resb  Second residue
Returns
score

Calculate the score from the static, globally stored mutation data matrix.

If both residues are passed as '\0', the call simply silences all warnings.
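A lookup of this kind can be sketched with a toy matrix. Everything below is an assumption for illustration: the residue list, the score values and the name toy_mdm_score() are invented, and returning 0 for an unknown residue is a choice made here rather than documented library behaviour.

```c
#include <string.h>

/* Toy data, invented for illustration only (not a real MDM) */
static const char toy_residues[] = "ACD";
static const int  toy_matrix[3][3] =
{
   { 2, -1,  0},
   {-1,  9, -3},
   { 0, -3,  4}
};

/* Look up the score for a residue pair in the toy matrix */
int toy_mdm_score(char resa, char resb)
{
   const char *pa, *pb;

   /* strchr() also matches the terminating '\0', which would give an
      index one past the end of the matrix, so reject it explicitly */
   if(resa == '\0' || resb == '\0')
      return 0;

   pa = strchr(toy_residues, resa);
   pb = strchr(toy_residues, resb);

   /* Unknown residue: return before indexing the matrix */
   if(pa == NULL || pb == NULL)
      return 0;

   return toy_matrix[pa - toy_residues][pb - toy_residues];
}
```

Checking for a missing residue before indexing avoids the kind of out-of-bounds array reference mentioned in the 24.08.95 revision note.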

  • 07.10.92 Adapted from NIMR-written original
  • 24.11.94 Only gives 10 warnings
  • 28.02.95 Modified to use sMDMSize
  • 24.08.95 If a residue was not found was doing an out-of-bounds array reference causing a potential core dump
  • 11.07.96 Name changed from calcscore() and now non-static
  • 07.07.14 Use bl prefix for functions. By: CTP
  • 04.01.16 Added special call with both residues set to '\0' to silence warnings. Also, warnings now go to stderr.

Definition at line 1220 of file align.c.

int blCalcMDMScoreUC (char resa, char resb)
Parameters
[in]   resa  First residue
[in]   resb  Second residue
Returns
score

Calculate the score from the static, globally stored mutation data matrix, upper-casing both residues before comparison.

  • 07.10.92 Adapted from NIMR-written original
  • 24.11.94 Only gives 10 warnings
  • 28.02.95 Modified to use sMDMSize
  • 24.08.95 If a residue was not found was doing an out-of-bounds array reference causing a potential core dump
  • 11.07.96 Name changed from calcscore() and now non-static
  • 27.02.07 As CalcMDMScore() but upcases characters before comparison
  • 07.07.14 Use bl prefix for functions. By: CTP
  • 04.01.16 Added special call with both residues set to '\0' to silence warnings. Also, warnings now go to stderr.

Definition at line 1293 of file align.c.
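The upper-casing step noted in the 27.02.07 revision can be sketched as a case-insensitive residue comparison. residues_match_uc() is a hypothetical helper written for this illustration, not a Bioplib function.

```c
#include <ctype.h>

/* Hypothetical helper: compare two residue codes ignoring case.
   The casts matter: passing a negative char value to toupper()
   is undefined behaviour. */
int residues_match_uc(char resa, char resb)
{
   return toupper((unsigned char)resa) == toupper((unsigned char)resb);
}
```

Upper-casing both sides means that sequences stored in lower case, or in mixed case, are scored the same way as upper-case ones.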

BOOL blReadMDM (char *mdmfile)
Parameters
[in]   *mdmfile  Mutation data matrix filename
Returns
Success?

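Read the mutation data matrix into static global arrays. If the file is not found in the current directory, the location given by DATAENV (DATADIR) is searched. The matrix may have comments at the start, introduced with a ! in the first column (# is also accepted since the 07.04.09 rewrite). The matrix must be complete (i.e. a triangular matrix will not work). A line describing the residue types must appear, and may be placed before or after the matrix itself.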
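The comment convention can be sketched as a one-line test applied to each line read from the file. is_mdm_comment() is a hypothetical helper, and treating # as a first-column marker in the same way as ! is an assumption based on the revision notes.

```c
#include <stddef.h>

/* Hypothetical helper: true if the line is a comment, i.e. it has
   '!' or '#' in the first column. Leading spaces are not skipped. */
int is_mdm_comment(const char *line)
{
   return (line != NULL) && (line[0] == '!' || line[0] == '#');
}
```

A reader loop would skip lines for which this returns true and pass the rest on to the code that parses the residue list and the matrix rows.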

  • 07.10.92 Original
  • 18.03.94 getc() -> fgetc()
  • 24.11.94 Automatically looks in DATAENV if not found in current directory
  • 28.02.95 Modified to read any size MDM and allow comments. Also allows the list of aa types before or after the actual matrix
  • 26.07.95 Removed unused variables
  • 06.02.03 Fixed for new version of GetWord()
  • 07.04.09 Completely re-written to allow it to read BLAST style matrix files as well as the ones used previously. Allow comments introduced with # as well as !. Uses MAXWORD rather than hardcoded 16
  • 07.07.14 Use bl prefix for functions. By: CTP

Definition at line 871 of file align.c.

void blSetMDMScoreWeight (char resa, char resb, REAL weight)
Parameters
[in]   resa    First residue
[in]   resb    Second residue
[in]   weight  Weight to apply

Apply a weight to a particular amino acid substitution.

  • 26.08.14 Original. By: ACRM

Definition at line 1408 of file align.c.

int blZeroMDM (void)
Returns
Maximum value in modified matrix

Modifies all values in the MDM so that the minimum value is 0.
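The operation amounts to subtracting the minimum from every entry. The sketch below works on a caller-supplied flat n x n array, which is an assumption made to keep it self-contained (the library operates on its own static matrix); zero_matrix() is a hypothetical name.

```c
#include <stddef.h>

/* Shift every entry of an n x n matrix (stored row by row in a flat
   array) so that the smallest entry becomes 0. Returns the largest
   value after the shift, or 0 if the input is unusable. */
int zero_matrix(int *m, int n)
{
   int i, total, minval, maxval;

   if(m == NULL || n <= 0)
      return 0;

   total  = n * n;
   minval = maxval = m[0];

   /* First pass: find the extremes */
   for(i = 1; i < total; i++)
   {
      if(m[i] < minval) minval = m[i];
      if(m[i] > maxval) maxval = m[i];
   }

   /* Second pass: shift so the minimum is 0 */
   for(i = 0; i < total; i++)
      m[i] -= minval;

   return maxval - minval;
}
```

After the shift all scores are non-negative and the differences between any two entries are unchanged.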

  • 17.09.96 Original
  • 07.07.14 Use bl prefix for functions. By: CTP

Definition at line 1358 of file align.c.