Bioplib
Protein Structure C Library
ReadPIR.c File Reference

Read a PIR sequence file.

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "SysDefs.h"
#include "macros.h"
#include "seq.h"

Functions

int blReadPIR (FILE *fp, BOOL DoInsert, char **seqs, int maxchain, SEQINFO *seqinfo, BOOL *punct, BOOL *error)
 

Detailed Description

Read a PIR sequence file.

Version
V2.8
Date
07.07.14
Author
Dr. Andrew C. R. Martin
Institute of Structural & Molecular Biology, University College London, Gower Street, London. WC1E 6BT.
andrew@bioinf.org.uk andrew.martin@ucl.ac.uk

This code is NOT IN THE PUBLIC DOMAIN, but it may be copied according to the conditions laid out in the accompanying file COPYING.DOC.

The code may be modified as required, but any modifications must be documented so that the person responsible can be identified.

The code may not be sold commercially or included as part of a commercial product except as described in the file COPYING.DOC.

Description:

This version attempts to read any PIR file following the PIR specifications. It also accepts a few non-standard features: lower-case sequence, no star at the end of the last chain, and dashes in the sequence to indicate insertions.

Usage:

int blReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
              SEQINFO *seqinfo, BOOL *punct, BOOL *error)

See also:

Revision History:

Definition in file ReadPIR.c.

Function Documentation

int blReadPIR ( FILE *  fp,
BOOL  DoInsert,
char **  seqs,
int  maxchain,
SEQINFO *  seqinfo,
BOOL *  punct,
BOOL *  error 
)
Parameters
[in]   *fp        File pointer
[in]   DoInsert   TRUE: read - characters into the sequence. FALSE: skip - characters.
[in]   maxchain   Max number of chains to read. This is the dimension of the seqs array. N.B. THIS SHOULD BE AT LEAST 1 MORE THAN THE EXPECTED MAXIMUM NUMBER OF SEQUENCES
[out]  **seqs     Array of character pointers which will be filled in with sequence information. Memory will be allocated for any sequence length.
[out]  *seqinfo   This structure will be filled in with extra information about the sequence: header and title information and details of any punctuation.
[out]  *punct     TRUE if any punctuation was found.
[out]  *error     TRUE if an error occurred (e.g. memory allocation)
Returns
Number of chains in this sequence. 0 if file ended, or no valid sequence entries found.

This is an all-singing, all-dancing PIR reader which should handle all legal PIR files and some (slightly) incorrect ones. The only requirements of the code are that the PIR file should have 2 title lines per entry, the first line starting with a > sign.

The routine will handle multiple sequence files. Successive calls will return information on the next entry. The routine will return 0 when there are no more entries.
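The call-until-zero convention described above can be illustrated with a small stand-in reader. This sketch is not blReadPIR() and does not use Bioplib: next_pir_entry() is a hypothetical helper that only locates each header line and skips the title line after it, returning 1 per entry and 0 once the file is exhausted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PIR entry reader; NOT blReadPIR().
 * Advances fp to the next line starting with '>', copies the text after
 * the '>' into header, and consumes the title line that follows.
 * Returns 1 if an entry was found, 0 if the file ended first.
 */
int next_pir_entry(FILE *fp, char *header, size_t headerlen)
{
    char buffer[504];   /* buffer size quoted in the revision history */

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (buffer[0] != '>')
            continue;                       /* sequence or text line: skip */

        buffer[strcspn(buffer, "\r\n")] = '\0';
        if (headerlen > 0) {
            strncpy(header, buffer + 1, headerlen - 1);
            header[headerlen - 1] = '\0';
        }

        /* Second title line: read and discard it if present */
        if (fgets(buffer, sizeof buffer, fp) == NULL)
            return 1;
        return 1;
    }
    return 0;                               /* no more entries */
}
```

A caller would loop with while(next_pir_entry(fp, header, sizeof header)) and stop on 0, in the same way that a caller keeps calling blReadPIR() until it returns 0.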

Header line: Must start with >. Will handle files which don't have the proper P1; or F1; parts of the header as well as those which do.
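As a rough illustration of this tolerance, the sketch below splits a header line into an optional type (the part before the semi-colon) and an entry code. parse_pir_header() is a hypothetical name, not a Bioplib function, and the actual logic in ReadPIR.c may differ. Skipping leading spaces and stopping the code at the first space follows the 11.05.94 note in the revision history.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Bioplib.
 * Splits a header such as ">P1;1ABC" into type "P1" and code "1ABC".
 * A header with no semi-colon yields an empty type.
 * Returns 1 on success, 0 if the line does not start with '>'.
 */
int parse_pir_header(const char *line, char *type, size_t typelen,
                     char *code, size_t codelen)
{
    const char *p, *semi;
    size_t n = 0;

    if (line == NULL || line[0] != '>' || typelen == 0 || codelen == 0)
        return 0;

    p = line + 1;
    type[0] = '\0';

    /* Optional "P1;" / "F1;" style prefix */
    semi = strchr(p, ';');
    if (semi != NULL) {
        size_t tl = (size_t)(semi - p);
        if (tl >= typelen)
            tl = typelen - 1;
        memcpy(type, p, tl);
        type[tl] = '\0';
        p = semi + 1;
    }

    /* Skip leading spaces, then copy the code up to the first whitespace */
    while (*p == ' ' || *p == '\t')
        p++;
    while (*p != '\0' && !isspace((unsigned char)*p) && n + 1 < codelen)
        code[n++] = *p++;
    code[n] = '\0';

    return 1;
}
```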

Title line: Will read the name and source fields if correctly separated by a -, otherwise copies all information into the name.
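The name/source split might look something like the following sketch. It assumes the separator is a dash with a space on each side; split_pir_title() and copy_field() are hypothetical names introduced for illustration and are not Bioplib functions.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most n characters of src into dst, always NUL-terminating. */
static void copy_field(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (dstlen == 0)
        return;
    if (n >= dstlen)
        n = dstlen - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical helper, not part of Bioplib.
 * Splits "name - source" at the first " - " (assumed separator form).
 * With no separator the whole line goes into name and source is empty.
 * Returns 1 if a separator was found, 0 otherwise.
 */
int split_pir_title(const char *line, char *name, size_t namelen,
                    char *source, size_t sourcelen)
{
    size_t len = strcspn(line, "\r\n");      /* ignore the line ending */
    const char *sep = strstr(line, " - ");

    if (sep != NULL && (size_t)(sep - line) + 3 <= len) {
        size_t namechars = (size_t)(sep - line);
        copy_field(name, namelen, line, namechars);
        copy_field(source, sourcelen, sep + 3, len - (namechars + 3));
        return 1;
    }

    copy_field(name, namelen, line, len);
    copy_field(source, sourcelen, "", 0);
    return 0;
}
```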

Sequence: May contain allowed punctuation. This will set the punct flag, and information on the types found will be placed in seqinfo. White space and line breaks are ignored. Each chain should end with a *, but the routine will accept the last chain of an entry with no *. While the standard requires upper case text, this routine will handle lower case and convert it to upper case. The routine copes well with last chains not terminated by a *, with one exception: a last chain ending with a / that is not followed by a * but is followed by a text line will be identified as incomplete rather than truncated. If the DoInsert flag is set, - signs in the sequence will be read as part of the sequence; otherwise they will be skipped. This is an addition to the PIR standard.
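The per-character rules for one chain can be sketched as below. clean_pir_chain() is a hypothetical helper, not the Bioplib implementation. As a simplifying assumption it treats every character other than a letter, a dash, white space or * as punctuation, sets a single flag and drops the character, whereas the real routine records the punctuation types in seqinfo.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Bioplib.
 * Copies one chain from raw into out, stopping at '*' or end of string.
 *   - white space and line breaks are ignored
 *   - letters are converted to upper case
 *   - '-' is kept only if do_insert is non-zero
 *   - anything else is treated as punctuation (assumption): *punct is
 *     set to 1 and the character is not copied
 * Returns the number of characters written (excluding the terminator).
 */
size_t clean_pir_chain(const char *raw, int do_insert,
                       char *out, size_t outlen, int *punct)
{
    size_t n = 0;

    if (outlen == 0)
        return 0;
    *punct = 0;

    for (; *raw != '\0' && *raw != '*'; raw++) {
        unsigned char ch = (unsigned char)*raw;

        if (isspace(ch))
            continue;

        if (isalpha(ch)) {
            ch = (unsigned char)toupper(ch);
        } else if (ch == '-') {
            if (!do_insert)
                continue;
        } else {
            *punct = 1;
            continue;
        }

        if (n + 1 < outlen)
            out[n++] = (char)ch;
    }
    out[n] = '\0';

    return n;
}
```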

Text lines: Text lines after an entry (beginning with R;, C;, A;, N; or F;) are ignored.
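A reader that wants to skip such lines needs only a two-character test. The predicate below is a minimal sketch using the prefixes listed above; is_pir_text_line() is a hypothetical name and is not a Bioplib function.

```c
#include <string.h>

/* Hypothetical helper, not part of Bioplib.
 * Returns 1 if line begins with R;, C;, A;, N; or F;, otherwise 0.
 */
int is_pir_text_line(const char *line)
{
    return line != NULL
        && line[0] != '\0'
        && strchr("RCANF", line[0]) != NULL
        && line[1] == ';';
}
```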

  • 02.03.94 Original By: ACRM
  • 03.03.94 Added / and = handling, upcasing, strcpy()->strncpy(), header lines without semi-colon, title lines without -
  • 07.03.94 Added sequence insertion handling and DoInsert parameter.
  • 11.05.94 buffer is now 504 characters (V38.0 spec allows 500 chars). Removes leading spaces from entry code and terminates at first space (V39.0 spec allows comments after the code).
  • 28.02.95 Added check that buffer doesn't overflow. Check on nseq changed to >=
  • 06.02.96 Removes trailing spaces from comment line
  • 07.07.14 Use bl prefix for functions By: CTP

Definition at line 180 of file ReadPIR.c.