Bioplib
Protein Structure C Library
NumericAlign.c File Reference

Perform Needleman & Wunsch sequence alignment on two sequences encoded as numeric symbols.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "SysDefs.h"
#include "macros.h"
#include "array.h"
#include "general.h"
#include "seq.h"


Data Structures

struct  XY
 

Macros

#define DATAENV   "DATADIR" /* Environment variable or assign */
 
#define MAXBUFF   2048
 

Functions

BOOL blNumericReadMDM (char *mdmfile)
 
int blNumericCalcMDMScore (int resa, int resb)
 
int blNumericAffineAlign (int *seq1, int length1, int *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, int *align1, int *align2, int *align_len)
 

Detailed Description

Perform Needleman & Wunsch sequence alignment on two sequences encoded as numeric symbols.

Version
V1.3
Date
07.07.14
Author
Dr. Andrew C. R. Martin
Institute of Structural & Molecular Biology, University College London, Gower Street, London. WC1E 6BT.
andrew@bioinf.org.uk andrew.martin@ucl.ac.uk

This code is NOT IN THE PUBLIC DOMAIN, but it may be copied according to the conditions laid out in the accompanying file COPYING.DOC.

The code may be modified as required, but any modifications must be documented so that the person responsible can be identified.

The code may not be sold commercially or included as part of a commercial product except as described in the file COPYING.DOC.

Note, the code herein is very heavily based on code written by Dr. Andrew C.R. Martin while self-employed. Some modifications were made to that original code while employed at University College London. This version, which handles sequences encoded as arrays of numbers rather than as character arrays, was modified from the original version(s) while employed at Reading University.

Description:

A simple Needleman & Wunsch dynamic programming alignment of two sequences encoded as numeric symbols. A window is not used, so the routine may be slow on long sequences.
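To illustrate why the absence of a window matters, the following is a minimal sketch of a full-table global alignment score over integer tokens. It is not the Bioplib implementation: nw_score_sketch() is a hypothetical name, the match, mismatch and gap values are illustrative parameters, and for brevity it uses a linear gap penalty rather than the affine penalty and extension taken by blNumericAffineAlign().

```c
#include <stdlib.h>

/* Hypothetical helper, not part of Bioplib: global alignment score of two
   token arrays using a full (length1+1) x (length2+1) table.
   match, mismatch and gap are illustrative scoring parameters; gap is
   subtracted once per gapped position (linear, not affine).
   Sets *ok to 0 and returns 0 if memory cannot be allocated. */
long nw_score_sketch(const int *seq1, int length1,
                     const int *seq2, int length2,
                     int match, int mismatch, int gap, int *ok)
{
    size_t cols = (size_t)length2 + 1;
    long *table = malloc(((size_t)length1 + 1) * cols * sizeof *table);
    long result;
    int i, j;

    *ok = (table != NULL);
    if (table == NULL) return 0;

    /* First row and column: a prefix aligned entirely against gaps */
    for (i = 0; i <= length1; i++) table[(size_t)i * cols] = -(long)gap * i;
    for (j = 0; j <= length2; j++) table[j] = -(long)gap * j;

    /* Every cell is visited once: length1 * length2 steps in total */
    for (i = 1; i <= length1; i++) {
        for (j = 1; j <= length2; j++) {
            long diag = table[(size_t)(i - 1) * cols + (j - 1)] +
                        ((seq1[i - 1] == seq2[j - 1]) ? match : mismatch);
            long up   = table[(size_t)(i - 1) * cols + j] - gap;
            long left = table[(size_t)i * cols + (j - 1)] - gap;
            long best = diag;
            if (up > best)   best = up;
            if (left > best) best = left;
            table[(size_t)i * cols + j] = best;
        }
    }
    result = table[(size_t)length1 * cols + length2];
    free(table);
    return result;
}
```

Because no cell is skipped, both the run time and the table size grow with the product of the two sequence lengths.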

Usage:

First call blNumericReadMDM() to read the mutation data matrix, then call blNumericAffineAlign() to align the sequences.

Revision History:

Definition in file NumericAlign.c.

Macro Definition Documentation

#define DATAENV   "DATADIR" /* Environment variable or assign */

Definition at line 110 of file NumericAlign.c.

#define MAXBUFF   2048

Definition at line 112 of file NumericAlign.c.

Function Documentation

int blNumericAffineAlign (int *seq1, int length1, int *seq2, int length2, BOOL verbose, BOOL identity, int penalty, int penext, int *align1, int *align2, int *align_len)
Parameters
[in]   *seq1       First sequence of tokens
[in]   length1     First sequence length
[in]   *seq2       Second sequence of tokens
[in]   length2     Second sequence length
[in]   verbose     Display N&W matrix
[in]   identity    Use identity matrix
[in]   penalty     Gap insertion penalty value
[in]   penext      Extension penalty
[out]  *align1     Sequence 1 aligned
[out]  *align2     Sequence 2 aligned
[out]  *align_len  Alignment length
Returns
Alignment score (0 on error)

Perform a simple N&W alignment of seq1 and seq2. No window is used, so it will be slow for long sequences.

The sequences are supplied as integer arrays containing numeric tokens.
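As an illustration of how such integer arrays might be prepared, the sketch below maps a character sequence onto numeric tokens. It is an assumption about the caller's side, not part of Bioplib: encode_tokens_sketch() and its alphabet argument are hypothetical. Tokens are numbered from 1 so that 0 stays free for the insert character, in line with the note for blNumericReadMDM() in this file.

```c
#include <string.h>

/* Hypothetical helper, not part of Bioplib: map each character of seq to
   its 1-based position in alphabet, writing the tokens to out, which must
   hold at least strlen(seq) ints.
   Returns the number of tokens written, or -1 if a character is not in
   the alphabet. Token 0 is never produced, so it stays free for gaps. */
int encode_tokens_sketch(const char *seq, const char *alphabet, int *out)
{
    int n = 0;

    for (; seq[n] != '\0'; n++) {
        const char *hit = strchr(alphabet, seq[n]);
        if (hit == NULL) return -1;
        out[n] = (int)(hit - alphabet) + 1;
    }
    return n;
}
```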

Note that you must allocate sufficient memory for the aligned sequences. The easy way to do this is to ensure that align1 and align2 are of length (length1+length2).
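The size (length1+length2) covers the worst case, in which every token of one sequence is placed opposite a gap in the other. A minimal sketch of the allocation follows; alloc_align_buffers() is a hypothetical helper, not a Bioplib function.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of Bioplib: allocate two output buffers
   of (length1 + length2) ints each, the size suggested in the text.
   Returns 1 on success; on failure returns 0 and leaves both pointers NULL. */
int alloc_align_buffers(int length1, int length2, int **align1, int **align2)
{
    size_t n;

    *align1 = NULL;
    *align2 = NULL;
    if (length1 < 0 || length2 < 0) return 0;

    n = (size_t)length1 + (size_t)length2;
    if (n == 0) return 0;

    *align1 = malloc(n * sizeof **align1);
    *align2 = malloc(n * sizeof **align2);
    if (*align1 == NULL || *align2 == NULL) {
        free(*align1);
        free(*align2);
        *align1 = NULL;
        *align2 = NULL;
        return 0;
    }
    return 1;
}
```

With the buffers in place, the call follows the signature documented above: blNumericAffineAlign(seq1, length1, seq2, length2, verbose, identity, penalty, penext, align1, align2, &align_len). Both buffers should be released with free() once the alignment has been used.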

Identical to align.c/affinealign(), but uses integer arrays.

  • 08.03.00 Original based on align.c/affinealign() 06.03.00 By: ACRM
  • 07.07.14 Use bl prefix for functions By: CTP

Definition at line 412 of file NumericAlign.c.

int blNumericCalcMDMScore (int resa, int resb)
Parameters
[in]   resa   First token
[in]   resb   Second token
Returns
score

Calculate the score from the static, globally stored mutation data matrix.

Identical to align.c/CalcMDMScore(), but uses a different static score array and takes integer parameters. These are used as direct lookups into the score array rather than being searched.
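The difference between searching and direct lookup can be sketched as follows. This is an illustration only, not the Bioplib code: the matrix values, SKETCH_SIZE and both function names are hypothetical, and the mapping of token t to array index t-1 is an assumption.

```c
#define SKETCH_SIZE 3   /* illustrative matrix size */

/* Illustrative scores only; a real matrix would be read from a file */
static const int sketch_matrix[SKETCH_SIZE][SKETCH_SIZE] = {
    { 2, -1,  0},
    {-1,  3, -2},
    { 0, -2,  4}
};

/* Character version: each symbol must first be searched for in a list.
   Returns 0 if either symbol is not found. */
int score_by_search(const char *symbols, char a, char b)
{
    int ia = -1, ib = -1, i;

    for (i = 0; i < SKETCH_SIZE && symbols[i] != '\0'; i++) {
        if (symbols[i] == a) ia = i;
        if (symbols[i] == b) ib = i;
    }
    if (ia < 0 || ib < 0) return 0;
    return sketch_matrix[ia][ib];
}

/* Numeric version: tokens index the array directly, with no search.
   Assumption: tokens run from 1, so token t maps to index t-1.
   Returns 0 if either token is out of range. */
int score_by_lookup(int a, int b)
{
    if (a < 1 || a > SKETCH_SIZE || b < 1 || b > SKETCH_SIZE) return 0;
    return sketch_matrix[a - 1][b - 1];
}
```

The lookup version does constant work per score, which matters because the score is needed for every cell of the alignment matrix.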

  • 08.03.00 Original based on align.c/CalcMDMScore() 11.07.96 By: ACRM
  • 07.07.14 Use bl prefix for functions By: CTP

Definition at line 342 of file NumericAlign.c.

BOOL blNumericReadMDM (char *mdmfile)
Parameters
[in]   *mdmfile   Mutation data matrix filename
Returns
Success?

Read the mutation data matrix into static global arrays. The matrix may have comments at the start introduced with a ! in the first column. The matrix must be complete (i.e. a triangular matrix will not work). A line describing the residue types must appear, and may be placed before or after the matrix itself.

Identical to align.c/ReadMDM(), but reads into a different static 2D array and doesn't read a symbol identifier line from the file, as the symbols are numeric and always start from 1 (0 is used as the insert character).
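The file format described above can be sketched with a small reader. This is not the Bioplib implementation: read_matrix_sketch(), SKETCH_MAXDIM and SKETCH_MAXBUFF are hypothetical names, and taking the matrix dimension from the number of values on the first data line is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_MAXDIM  32     /* illustrative upper bound on matrix size */
#define SKETCH_MAXBUFF 2048   /* line buffer size, as for MAXBUFF */

/* Hypothetical reader, not the Bioplib implementation. Reads a complete
   square integer matrix from fp into m, skipping lines whose first column
   is '!'. Assumption: the number of values on the first data line gives
   the matrix dimension. Returns the dimension, or 0 on any error. */
int read_matrix_sketch(FILE *fp, int m[SKETCH_MAXDIM][SKETCH_MAXDIM])
{
    char buffer[SKETCH_MAXBUFF];
    int dim = 0, row = 0;

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        char *p = buffer, *end;
        int col = 0;

        if (buffer[0] == '!') continue;           /* comment line */

        for (;;) {
            long value = strtol(p, &end, 10);
            if (end == p) break;                  /* no more numbers */
            if (col >= SKETCH_MAXDIM || row >= SKETCH_MAXDIM) return 0;
            m[row][col++] = (int)value;
            p = end;
        }
        if (col == 0) continue;                   /* line with no numbers */
        if (dim == 0) dim = col;                  /* first data line */
        if (col != dim) return 0;                 /* short or long row */
        row++;
    }
    return (row == dim) ? dim : 0;                /* must be square */
}
```

A triangular matrix fails the row-length check, matching the requirement that the matrix be complete.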

  • 08.03.00 Original based on align.c/ReadMDM() 26.07.95 By: ACRM
  • 06.02.03 Fixed for new version of GetWord()
  • 07.07.14 Use bl prefix for functions By: CTP

Definition at line 258 of file NumericAlign.c.