Next: Multiple Structure Fitting Up: ProFit Version 3.1 Previous: Specifying Atom Subsets

Specifying Zones

The ZONE command is used to specify zones in the two structures which are considered equivalent. The complete syntax for the command is:

   ZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])

where X... is an amino acid sequence, n is a number of residues, m is the occurrence number, j and k are residue specifications of the form [chain][.]resnum[insert]. Items in square brackets are optional and alternatives are marked by a | and grouped in parentheses.

ZONE commands are cumulative. Thus each zone you specify is added to those currently active. To clear all zones (i.e. fit all residues), the ZONE CLEAR or ZONE * command may be given. To clear a single zone, the DELZONE command can be used (see the end of this section).

When a new zone is added, a warning message is displayed if the new zone overlaps an existing zone. Overlapping zones will be flagged with * when using the STATUS command.

Although it appears complex, the syntax is actually very simple and consists of two identical sections separated by a colon (:). The left half is applied to the reference structure and the right half to the mobile structure. In its simplest form, the right hand half of the expression is absent and the specification is applied to both reference and mobile structures. For example:

   ZONE 24-34

will set the zone to include residues 24-34 in both structures. If you wanted to fit 24-34 in the reference structure with 25-35 in the mobile structure, this simply becomes:

   ZONE 24-34:25-35

Single residues can be specified using the same syntax:

   ZONE 44-44:55-55

You may also specify chain names and insertion codes. The chain name is placed before the residue number and the insertion code afterwards. For example:

   ZONE L25A-L30

fits residues 25A-30 in the L chain of both structures. Optionally, the chain name may be separated from the residue number using a full stop. For example:

   ZONE L.25A-L.30

Using the full stop also makes the statement case-sensitive. In practice, the full stop separator is used with numeric chain names to separate the chain name from the residue number and with lowercase chain names.

   ZONE 1.25-1.30
   ZONE b.1-b.60:A.1-A.60

Simple wildcards may also be used. For example:

   ZONE H*:B*

fits the reference H chain with the mobile B chain,

   ZONE -10:50-59

fits from the first residue to residue 10 in the reference structure with 50-59 in the mobile structure.

   ZONE *:1-100

fits all residues in the reference structure with 1-100 in the mobile structure.

If the structure file contains negatively numbered residues and you are using residue numbering, you can escape the minus sign in the residue number using a backslash:

   ZONE \-4-10:\-1-13

will fit residues -4 to 10 in the reference structure with residues -1 to 13 in the mobile structure.
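
The residue specification and range rules described above can be sketched as a small parser. This is only an illustration in Python, not ProFit's own parsing code: the names ResSpec, parse_spec and parse_range are hypothetical, and the sketch assumes that a chain name written without a full stop is a single letter that is folded to upper case, while a chain name written before a full stop is kept exactly as typed.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ResSpec(NamedTuple):
    chain: Optional[str]   # None means no chain was given
    resnum: int
    insert: Optional[str]  # insertion code, e.g. "A" in 25A


# Either "chain." (any run of characters without a dot), or an optional
# single-letter chain, followed by the residue number (a leading "\-"
# marks a negative number) and an optional insertion code.
_SPEC = re.compile(
    r"^(?:(?P<dotchain>[^.\s]+)\.|(?P<chain>[A-Za-z])?)"
    r"(?P<num>(?:\\-)?\d+)(?P<ins>[A-Za-z])?$"
)


def parse_spec(text: str) -> ResSpec:
    m = _SPEC.match(text)
    if m is None:
        raise ValueError(f"bad residue specification: {text!r}")
    if m.group("dotchain") is not None:
        chain = m.group("dotchain")        # case preserved (assumption)
    elif m.group("chain") is not None:
        chain = m.group("chain").upper()   # case folded (assumption)
    else:
        chain = None
    resnum = int(m.group("num").replace("\\", ""))
    return ResSpec(chain, resnum, m.group("ins"))


def parse_range(text: str) -> Tuple[Optional[ResSpec], Optional[ResSpec]]:
    # Split on the first minus sign that is not escaped with a backslash.
    parts = re.split(r"(?<!\\)-", text, maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"bad residue range: {text!r}")
    start, end = parts
    # An empty side (as in "-10") means "from the first residue"
    # or "to the last residue" and is returned as None.
    return (parse_spec(start) if start else None,
            parse_spec(end) if end else None)
```

For instance, parse_range(r"\-4-10") yields a start residue numbered -4 and an end residue numbered 10, while parse_range("-10") leaves the start open.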

Alternatively, you may specify the zones to be fitted by giving a sequence fragment. Together with that fragment, you may specify the number of residues to consider starting at that point. If the fragment occurs more than once in the sequence you may specify which occurrence you wish to consider. For example:

   ZONE CAR:VNS

fits the first occurrence of CAR in the reference set with the first occurrence of VNS in the mobile set;

   ZONE CAR,10:VNS,10

fits 10 residues starting at the first occurrence of CAR in the reference set with 10 residues from the first occurrence of VNS in the mobile set;

   ZONE CAR,5/2

fits 5 residues from the second occurrence of CAR in both structures;

   ZONE 24-34:EIR,11

fits 24-34 in the reference set with 11 residues starting at the first occurrence of EIR in the mobile set.
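
How a sequence fragment, a residue count and an occurrence number select a zone can be sketched in Python as follows. The function name fragment_zone is hypothetical, positions are simple 1-based offsets into a one-letter sequence string, and the sketch assumes that when no count is given the zone covers just the fragment itself.

```python
from typing import Optional, Tuple


def fragment_zone(sequence: str, fragment: str,
                  length: Optional[int] = None,
                  occurrence: int = 1) -> Tuple[int, int]:
    """Return the 1-based (first, last) positions of the selected zone."""
    if length is None:
        length = len(fragment)  # assumption: default to the fragment length
    start = -1
    for _ in range(occurrence):
        # Search again from just past the previous hit.
        start = sequence.find(fragment, start + 1)
        if start < 0:
            raise ValueError(
                f"occurrence {occurrence} of {fragment!r} not found")
    first = start + 1
    last = first + length - 1
    if last > len(sequence):
        raise ValueError("zone runs past the end of the sequence")
    return first, last
```

With this sketch, the equivalent of CAR,5/2 is fragment_zone(seq, "CAR", 5, 2).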

By default, ProFit works in 'Residue Number' mode, i.e. the numbers used in zone commands are the numbers seen in the PDB file. The alternative mode is 'Sequential' mode where residues are numbered sequentially throughout the structure (including throughout multiple chains). Any chain names appearing in zone specifications will be ignored in Sequential mode. To switch mode, use the NUMBER SEQUENTIAL or NUMBER RESIDUE commands.

The DELZONE command specifies zones to be deleted from the user-defined list of fit zones. DELZONE uses the same syntax as the ZONE command. The command matches the specified zone with a zone in the user-defined list of fitting zones and deletes the matching zone from the list. Entering either DELZONE ALL or DELZONE * will delete all user-defined zones.

Sequence Alignment

Note: For sequence alignment to work, you must have the mdm78.mat file either in the current directory or in a directory pointed to by the environment variable DATADIR. This is the Dayhoff amino acid similarity scoring matrix. See Section 3 or the INSTALL file for details.

Another way of specifying zones is to let the program do it. ProFit allows you to perform a simple Needleman and Wunsch sequence alignment and to apply zones automatically derived from that sequence alignment. This is done by issuing the ALIGN command. The sequence alignment is displayed, any currently active fitting zones are cleared and replaced by zones derived from the alignment. Additional zones may also be specified in the usual way.

As of Version 3.0, ProFit offers a choice of three alignment options:

  1. The default alignment option is a chain-by-chain alignment where the first chain in the mobile structure is aligned with the first chain in the reference structure, the second chain in the mobile structure is aligned with the second chain in the reference structure, and so on. If the number of chains does not match then a warning is issued.

  2. The ALIGN WHOLE command gives a whole sequence alignment. The whole sequence (regardless of chain ID) is aligned. If the fitting zones assigned in this manner extend over more than one chain the zones are split into smaller zones at the breaks between chains. This may be useful if a sequence has been split into fragments.

  3. If a zone definition is supplied to the ALIGN command then ProFit will perform an alignment over the defined region to assign fitting zones. (See Section 9 for the syntax for defining zones.)

    It is also possible to append new zones onto the end of the zone list (rather than overwriting the current zone list) by adding APPEND after the zone definition.

    For example, one could use the following commands:

       ALIGN A*:B*
       ALIGN B*:A* APPEND
    

    to align chain A with chain B and then B with A. This is useful when chains appear in different orders in the PDB files.

    When doing multiple fitting, it is not possible to use the colon notation to define regions on both the reference and mobile structures. This is the same restriction as the ZONE command (see Section 10.1).

Clearly, it will normally be necessary to use the ATOMS command to specify that only backbone or Cα atoms are included in the fitting. The TRIMZONES command can also be used when doing multiple structure fitting to ensure that the fitting zones are identical for all mobile structures. (See Section 10.1.)

The GAPPEN command allows you to specify an integer gap penalty and gap extension penalty for the sequence alignment performed by the ALIGN command. The default values for the gap penalty and gap extension penalty are 10 and 2 respectively.

Reading an Alignment

If you have an alignment performed outside ProFit you may use this to specify the equivalent zones. Any previously defined fitting zones are automatically cleared first. As of ProFit V3.0, the READALIGNMENT command can be used with structures having more than one chain.

The alignment should be a file in PIR format using - characters to align the sequences. The two sequences are represented by separate entries, i.e. each must have a header of the form:

   >P1;xxxxxx
   title text .......

When reading an alignment file for aligning a reference structure with a single mobile structure, the first sequence is assumed to be that of the reference structure and the second that of the mobile structure. Any other sequences in the file are ignored. Chain breaks in a sequence are indicated with a *.

   >P1;REFSEQ
   Reference Sequence - first.pdb
   WILLIAM*H-ARTNELL-*

   >P1;M_0001
   Mobile Sequence - second.pdb
   --PATRI-K*TR--GHTN*
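
As an illustration of the file layout described above, the following Python sketch reads PIR entries and splits each aligned sequence at its * chain breaks. It is not ProFit's reader: the names PirEntry and parse_pir are hypothetical, and the sketch assumes that every entry has exactly one title line and that only >P1; headers are of interest.

```python
from typing import List, NamedTuple


class PirEntry(NamedTuple):
    code: str
    title: str
    chains: List[str]  # aligned sequence split at '*' chain breaks


def parse_pir(text: str) -> List[PirEntry]:
    entries = []
    lines = [ln.strip() for ln in text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i].startswith(">P1;"):
            code = lines[i][4:]
            # The line after the header is the free-text title.
            title = lines[i + 1] if i + 1 < len(lines) else ""
            i += 2
            seq = ""
            # Collect sequence lines up to the next header.
            while i < len(lines) and not lines[i].startswith(">"):
                seq += lines[i]
                i += 1
            # '*' marks a chain break; the final '*' leaves an empty
            # piece, which is dropped.
            chains = [c for c in seq.split("*") if c]
            entries.append(PirEntry(code, title, chains))
        else:
            i += 1
    return entries
```

For a pairwise fit, the first entry returned would correspond to the reference structure and the second to the mobile structure.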

The READALIGNMENT command is also used to read PIR files containing a multiple sequence alignment. When performing a multiple structure fit, the first sequence must appear twice in the sequence alignment file. This is because it is used as both the initial reference and the first mobile set:

   >P1;REFSEQ
   Reference Sequence - first.pdb
   ----WILLIAM*H-ARTNELL-*

   >P1;M_0001
   Mobile Sequence - first.pdb
   ----WILLIAM*H-ARTNELL-*

   >P1;M_0002
   Mobile Sequence - second.pdb
   ------PATRI-K*TR--GHTN*

   >P1;M_0003
   Mobile Sequence - third.pdb
   PERTWEE---------------*

Note that a bug in using the READALIGNMENT command with multiple structure fitting was fixed in V2.3. (The bug caused the program to crash if a deletion appeared in the same place in two or more of the sequences.)

Limiting Zones Read From an Alignment

When obtaining fit zones from a sequence alignment, either from ALIGN or from READALIGNMENT, it can be useful to limit the zones of residues used. Normally all aligned residue pairs will be used.

For example, if the alignment were:

                       1         2         3
              123456789012345678901234567890123
              ASAHSTGEHNM--PLELLGHISLAM---NPRTY
              ---HSTADHNLRTPLEVLG--SLAMEDRQPRTY

the zones would normally be taken from the following positions in the alignment: 4-11, 14-19, 22-25, 29-33

By using the command:

      LIMIT 20 28

only the zone from 22-25 would be included.

This is particularly useful in conjunction with the ITERATE command (Section 9.4) and when fitting multiple structures (Section 10).

The LIMIT OFF command restores the default behaviour of deriving the zones from the whole alignment.
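
The way zones follow from an alignment, and the effect of a limit, can be sketched in Python. The function name zones_from_alignment is hypothetical, positions are alignment columns counted from 1 as in the example above, and the sketch assumes that the limit is applied column by column, so a zone that straddles a limit boundary would be truncated rather than dropped.

```python
from typing import List, Optional, Tuple


def zones_from_alignment(ref: str, mob: str,
                         limit: Optional[Tuple[int, int]] = None
                         ) -> List[Tuple[int, int]]:
    """Return runs of alignment columns (1-based, inclusive) in which
    both sequences have a residue, optionally restricted to `limit`."""
    if len(ref) != len(mob):
        raise ValueError("aligned sequences must have equal length")
    zones = []
    run_start = None
    for col, (r, m) in enumerate(zip(ref, mob), start=1):
        paired = r != "-" and m != "-"
        if limit is not None and not (limit[0] <= col <= limit[1]):
            paired = False  # column lies outside the LIMIT window
        if paired and run_start is None:
            run_start = col
        elif not paired and run_start is not None:
            zones.append((run_start, col - 1))
            run_start = None
    if run_start is not None:
        zones.append((run_start, len(ref)))
    return zones


ref = "ASAHSTGEHNM--PLELLGHISLAM---NPRTY"
mob = "---HSTADHNLRTPLEVLG--SLAMEDRQPRTY"
print(zones_from_alignment(ref, mob))            # [(4, 11), (14, 19), (22, 25), (29, 33)]
print(zones_from_alignment(ref, mob, (20, 28)))  # [(22, 25)]
```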

Iterative Updating of the Fitting Zones

The ITERATE command switches on the iterative updating of fitted zones during subsequent FIT commands. The ITERATE command may be followed by an optional parameter to specify the cutoff used to include or exclude pairs from the zones. (ITERATE OFF is used to switch it off again.)

Note that this immediately does an ATOMS CA since iteration of zones is only performed on Cα atoms. The program gives an informational message to this effect. See notes below if you want to calculate an RMSd over other atoms.

After the initial fit on the specified zones, the zones are updated such that residue pairs with Cα atoms within a specified cutoff (default 3.0Å) are included and those more distant are excluded. The optimum set of equivalences is obtained using a dynamic programming method.

After updating the zones, the structures are refitted and the procedure iterates to convergence (<0.01Å), typically within 3 or 4 cycles. The RMSd on Cα atoms is shown after each cycle unless the QUIET command is given before running ITERATE.
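
The text does not say which dynamic programming scheme ProFit uses, so the following Python sketch shows only one plausible form of the zone update step. It assumes that the Cα-Cα distances after the current fit are available as a matrix (rows for reference residues, columns for mobile residues) and selects the largest set of residue pairs that lie within the cutoff while preserving sequence order. The name update_pairs and the matrix input are hypothetical.

```python
from typing import List, Sequence, Tuple


def update_pairs(dist: Sequence[Sequence[float]],
                 cutoff: float = 3.0) -> List[Tuple[int, int]]:
    """Largest order-preserving set of (ref, mob) index pairs whose
    distance is within the cutoff (longest-common-subsequence style)."""
    n = len(dist)
    m = len(dist[0]) if n else 0
    # best[i][j] = most pairs obtainable from ref[i:] and mob[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            if dist[i][j] <= cutoff:
                best[i][j] = max(best[i][j], 1 + best[i + 1][j + 1])
    # Trace back through the table to recover the chosen pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if dist[i][j] <= cutoff and best[i][j] == 1 + best[i + 1][j + 1]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

In a full iteration this step would alternate with refitting on the selected pairs until the RMSd stops changing; the fitting itself is outside the scope of this sketch.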

You may specify a minimal initial zone of say 3 amino acids on which to fit first. The zone iteration will expand the zones until as many residues as possible can be equivalenced. Alternatively, this option is particularly useful in conjunction with the ALIGN command. Using ALIGN followed by ITERATE gives a particularly convenient method of fitting two arbitrary structures.

As stated above, the ITERATE command implies ATOMS CA. Having fitted on Cα atoms, you can of course display the RMSd over other atom sets in the usual way using the RATOMS command (e.g. RATOMS N,CA,C,O will display the backbone RMSd).

Should you wish to refit on another atom set using the iterated zones, simply use ITERATE OFF to switch off iteration, select the atom set required using the ATOMS command and use FIT to refit the structures in the usual way. For example, to fit on backbone atoms:

   ITERATE OFF
   ATOMS N,CA,C,O
   FIT

Fitting Zones Based on the Temperature Factor Column

Note that this use of the B-value column is not compatible with the commands described in Section 13.

It is possible to define zones by flagging residues in the temperature factor column of the PDB file using the BZONE command. Zones are marked using positive whole numbers while zeros are ignored. Multiple zones can be marked using additional numbers. So, residues with the B-factor set to 1 will be fitted with one another, residues with the B-factor set to 2 will be fitted with one another, etc.

Assignment of zones is carried out in two ways:

If only the reference structure is marked then the same set of residue numbers will be added as a fitting zone in both the reference and mobile structure.

If both the reference and the mobile structure are marked then fitting zones are assigned by scanning through and setting zones for corresponding continuous stretches of flagged residues in either the reference or mobile structures.
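
The first case above, where only the reference structure is flagged, can be sketched in Python as a grouping of consecutive residues that carry the same positive flag. The function name bzones and the (residue number, B-value) input are hypothetical, and the sketch treats residues as contiguous when they are adjacent in file order, without checking for gaps in the numbering.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def bzones(residues: List[Tuple[int, float]]
           ) -> Dict[int, List[Tuple[int, int]]]:
    """Map each positive flag to its list of (first, last) residue ranges.

    `residues` holds (residue number, B-value) pairs in file order.
    """
    zones: Dict[int, List[Tuple[int, int]]] = {}
    for flag, run in groupby(residues, key=lambda r: int(r[1])):
        if flag <= 0:
            continue  # zeros are ignored
        members = list(run)
        zones.setdefault(flag, []).append((members[0][0], members[-1][0]))
    return zones
```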

Centre of Fitting

The default method for fitting is to centre the fit around the centre of geometry of the fit atoms. Alternatively, fitting can be centred around the centre of geometry of a residue specified by the SETCENTRE (or SETCENTER) command.

   SETCENTRE CLEAR|(*|i[:j])

where i and j are residue specifications of the form [chain][.]resnum[insert]. Items in square brackets are optional and alternatives are marked by a | and grouped in parentheses.

The command:

   SETCENTRE 24:35

will centre the fit around residue 24 of the reference structure and residue 35 of the mobile structure. The mobile residue number can be omitted. For example:

   SETCENTRE 33

will centre the fit around residue 33 of the reference structure and residue 33 of the mobile structure.

Entering SETCENTRE CLEAR or SETCENTRE * will clear the centre residue.
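
The centre of geometry referred to here is the unweighted mean of the atom coordinates. A minimal Python sketch, using a hypothetical function name and plain coordinate tuples rather than any ProFit data structure:

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def centre_of_geometry(atoms: Sequence[Coord]) -> Coord:
    """Unweighted mean position of a set of atoms."""
    n = len(atoms)
    if n == 0:
        raise ValueError("no atoms given")
    return (sum(a[0] for a in atoms) / n,
            sum(a[1] for a in atoms) / n,
            sum(a[2] for a in atoms) / n)
```

By default the input would be all the fit atoms; with SETCENTRE it would be the atoms of the chosen residue in each structure.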

Distance Cutoff for RMSd Calculations

The DISTCUTOFF command specifies a distance cutoff for ignoring atom pairs outside a specified distance when calculating RMSd.

   DISTCUTOFF [cutoff|ON|OFF]

Entering DISTCUTOFF ON or DISTCUTOFF OFF will turn the distance cutoff on or off. Entering DISTCUTOFF 2.5 will set the value of the distance cutoff to 2.5 Angstroms and turn the distance cutoff on. A warning is displayed if the distance cutoff is set to zero and turned on. Note that the cutoff is only applied to the final calculation of the RMSd and not to the fitting.
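
The effect of the cutoff on the reported value can be sketched in Python: atom pairs further apart than the cutoff are simply left out of the RMSd sum. The function name rmsd and the coordinate-tuple input are hypothetical, the coordinates are assumed to be already fitted, and raising an error when no pairs remain is a choice made for this sketch rather than documented ProFit behaviour.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(ref: Sequence[Coord], mob: Sequence[Coord],
         cutoff: Optional[float] = None) -> float:
    """RMSd over equivalent atom pairs, ignoring pairs beyond `cutoff`."""
    squared = []
    for a, b in zip(ref, mob):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        if cutoff is None or math.sqrt(d2) <= cutoff:
            squared.append(d2)
    if not squared:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(squared) / len(squared))
```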


Andrew Martin 2010-09-28