The ZONE command is used to specify zones in the two structures
which are considered equivalent. The complete syntax for the command
is:
ZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
where X... is an amino acid sequence, n is a number of
residues, m is the occurrence number, and j and k are
residue specifications of the form [chain][.]resnum[insert]. Items
in square brackets are optional and alternatives are marked by a
| and grouped in parentheses.
ZONE commands are cumulative. Thus each zone you specify is
added to those currently active. To clear all zones (i.e. fit all
residues), the ZONE CLEAR or ZONE * command may be
given. To clear a single zone, the DELZONE command can be used
(see the end of this section).
When a new zone is added, a warning message is displayed if the new zone
overlaps an existing zone. Overlapping zones will be flagged with *
when using the STATUS command.
Although it appears complex, the syntax is actually very simple and consists of two identical sections separated by a colon (:). The left half is applied to the reference structure and the right half to the mobile structure. In its simplest form, the right-hand half of the expression is absent and the specification is applied to both reference and mobile structures. For example:
ZONE 24-34
will set the zone to include residues 24-34 in both structures. If you wanted to fit 24-34 in the reference structure with 25-35 in the mobile structure, this simply becomes:
ZONE 24-34:25-35
Single residues can be specified using the same syntax:
ZONE 44-44:55-55
You may also specify chain names and insertion codes. The chain name is placed before the residue number and the insertion code afterwards. For example:
ZONE L25A-L30
fits residues 25A-30 in the L chain of both structures. Optionally, the chain name may be separated from the residue number using a full stop. For example:
ZONE L.25A-L.30
Using the full stop also makes the statement case-sensitive. In practice, the full stop separator is used with numeric chain names, to separate the chain name from the residue number, and with lowercase chain names:
ZONE 1.25-1.30
ZONE b.1-b.60:A.1-A.60
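The residue specification [chain][.]resnum[insert] can be illustrated with a small parser. This is only a sketch and not ProFit's own code: the names ResidueSpec and parse_residue_spec are hypothetical, the undotted form is assumed to allow a single-letter chain name, and case-insensitivity without the full stop is modelled by upper-casing, which is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ResidueSpec(NamedTuple):
    chain: Optional[str]   # None when no chain name was given
    resnum: int
    insert: Optional[str]  # None when no insertion code was given


# A minus sign is only accepted when escaped with a backslash (e.g. \-4).
_NUM = r"(?P<num>(?:\\-)?\d+)"
# Dotted form: everything before the full stop is the chain name, kept verbatim.
_DOTTED = re.compile(r"^(?P<chain>[^.]+)\." + _NUM + r"(?P<ins>[A-Za-z]?)$")
# Undotted form: an optional single letter is the chain name (assumption).
_PLAIN = re.compile(r"^(?P<chain>[A-Za-z]?)" + _NUM + r"(?P<ins>[A-Za-z]?)$")


def parse_residue_spec(text: str) -> ResidueSpec:
    """Parse one residue specification such as L25A, L.25A, 1.25 or \\-4."""
    dotted = _DOTTED.match(text)
    match = dotted or _PLAIN.match(text)
    if match is None:
        raise ValueError(f"not a residue specification: {text!r}")
    chain = match.group("chain") or None
    insert = match.group("ins") or None
    if dotted is None:
        # Assumption: without the full stop the spec is case-insensitive,
        # modelled here by upper-casing chain name and insertion code.
        chain = chain.upper() if chain else None
        insert = insert.upper() if insert else None
    resnum = int(match.group("num").replace("\\", ""))
    return ResidueSpec(chain, resnum, insert)
```

Note that an unescaped leading minus sign is rejected on purpose, since in a zone such as -10 it denotes an open-ended range rather than a negative residue number.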
Simple wildcards may also be used. For example:
ZONE H*:B*
fits the reference H chain with the mobile B chain,
ZONE -10:50-59
fits from the first residue to residue 10 in the reference structure with 50-59 in the mobile structure, and
ZONE *:1-100
fits all residues in the reference structure with 1-100 in the mobile structure.
If the structure file contains negatively numbered residues and you are using residue numbering, you can escape the minus sign in the residue number using a backslash:
ZONE \-4-10:\-1-13
will fit residues -4 to 10 in the reference structure with residues -1 to 13 in the mobile structure.
Alternatively, you may specify the zones to be fitted by giving a sequence fragment. Together with that fragment, you may specify the number of residues to consider starting at that point. If the fragment occurs more than once in the sequence, you may specify which occurrence you wish to consider. For example:
ZONE CAR:VNS
fits the first occurrence of CAR in the reference set with the first occurrence of VNS in the mobile set;
ZONE CAR,10:VNS,10
fits 10 residues starting at the first occurrence of CAR in the reference set with 10 residues from the first occurrence of VNS in the mobile set;
ZONE CAR,5/2
fits 5 residues from the second occurrence of CAR in both structures;
ZONE 24-34:EIR,11
fits 24-34 in the reference set with 11 residues starting at the first occurrence of EIR in the mobile set.
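The X...[,n][/m] form can be pictured as a substring search on the one-letter sequence. The function below is a minimal sketch rather than ProFit's implementation: find_fragment_zone is a hypothetical name, positions are returned as 1-based sequential indices, the length is assumed to default to the fragment length when n is omitted, and overlapping occurrences are assumed to count separately.

```python
def find_fragment_zone(sequence, fragment, length=None, occurrence=1):
    """Return (first, last) 1-based positions for fragment[,length][/occurrence]."""
    if length is None:
        # Assumption: with no residue count, the zone covers the fragment itself.
        length = len(fragment)
    start = -1
    for _ in range(occurrence):
        # Search again from just past the previous hit.
        start = sequence.find(fragment, start + 1)
        if start < 0:
            raise ValueError(f"occurrence {occurrence} of {fragment!r} not found")
    end = start + length
    if end > len(sequence):
        raise ValueError("zone runs past the end of the sequence")
    return start + 1, end


seq = "MKCARTYVNSCARQWE"
print(find_fragment_zone(seq, "CAR"))        # like ZONE CAR
print(find_fragment_zone(seq, "CAR", 5, 2))  # like ZONE CAR,5/2
```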
By default, ProFit works in 'Residue Number' mode, i.e. the numbers used in zone commands are the numbers seen in the PDB file. The alternative mode is 'Sequential' mode, where residues are numbered sequentially throughout the structure (including throughout multiple chains). Any chain names appearing in zone specifications will be ignored in Sequential mode. To switch mode, use the NUMBER SEQUENTIAL or NUMBER RESIDUE commands.
The DELZONE command specifies zones to be deleted from the user-defined
list of fit zones. DELZONE uses the same syntax as the ZONE
command. The command matches the specified zone with a zone in the user-defined
list of fitting zones and deletes the matching zone from the list. Entering
either DELZONE ALL or DELZONE * will delete all user-defined
zones.
Sequence alignment requires the mdm78.mat file, which must be either in
the current directory or in a directory pointed to by the environment
variable DATADIR. See the INSTALL file for details.
Another way of specifying zones is to let the program do it. ProFit
allows you to perform a simple Needleman and
Wunsch sequence alignment and to apply zones automatically derived
from that sequence alignment. This is done by issuing the ALIGN
command. The sequence alignment is displayed, any currently active
fitting zones are cleared and replaced by zones derived from the
alignment. Additional zones may also be specified in the usual way.
As of Version 3.0, ProFit offers a choice of three alignment options:
The ALIGN WHOLE command gives a whole sequence alignment.
The whole sequence (regardless of chain ID) is aligned. If the fitting zones
assigned in this manner extend over more than one chain the zones are split
into smaller zones at the breaks between chains.
This may be useful if a sequence has been split into fragments.
If a zone is specified with the ALIGN command then ProFit
will perform an alignment over the defined region to assign fitting zones.
(See Section 8 for the syntax for defining zones.)
It is also possible to append new zones onto the end of the zone list (rather
than overwriting the current zone list) by adding APPEND after the zone
definition.
For example, one could use the following commands:
ALIGN A*:B*
ALIGN B*:A* APPEND
to align chain A with chain B and then B with A. This is useful when chains appear in different orders in the PDB files.
When doing multiple fitting, it is not possible to use the colon notation
to define regions on both the reference and mobile structures. This is
the same restriction as for the ZONE command
(see Section 9.1).
Clearly, it will normally be necessary to use the ATOMS command
to specify that only backbone or Cα atoms are included in the
fitting. The TRIMZONES command can also be used when doing multiple
structure fitting to ensure that the fitting zones are identical for all
mobile structures. (See Section 9.1.)
The GAPPEN command allows you to specify an integer gap penalty
and gap extension penalty for the sequence alignment performed by the
ALIGN command. The default values for the gap penalty and gap
extension penalty are 10 and 2 respectively.
If you have an alignment performed outside ProFit you may use this to
specify the equivalent zones. Any previously defined fitting zones are
automatically cleared first. As of ProFit V3.0, the READALIGN command
can be used with structures having more than one chain.
The alignment should be a file in PIR format using - characters to align the sequences. The two sequences are represented by separate entries, i.e. each must have a header of the form:
>P1;xxxxxx
title text .......
When reading an alignment file for aligning a reference structure with a
single mobile structure, the first sequence will be assumed to be that of
the reference structure and the second is that of the mobile structure. Any
other sequences in the file are ignored. Chain Breaks in a sequence are
indicated with a *.
>P1;REFSEQ
Reference Sequence - first.pdb
WILLIAM*H-ARTNELL-*
>P1;M_0001
Mobile Sequence - second.pdb
--PATRI-K*TR--GHTN*
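To make the file layout concrete, the following sketch reads PIR-style entries and splits each aligned sequence into chains at the * characters. It is an illustration only, not ProFit's reader: parse_pir is a hypothetical name, each entry is assumed to consist of a >P1; header line, one title line and one or more sequence lines, and blank lines are skipped.

```python
def parse_pir(text):
    """Return a list of (code, title, chains) tuples from PIR-format text.

    chains is a list of aligned chain strings (gaps kept as '-'),
    obtained by splitting the sequence at each '*' chain break.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">P1;"):
            raise ValueError(f"expected a '>P1;' header, got {lines[i]!r}")
        if i + 1 >= len(lines):
            raise ValueError("header without a title line")
        code = lines[i][4:]
        title = lines[i + 1]
        i += 2
        parts = []
        # Collect sequence lines up to the next header.
        while i < len(lines) and not lines[i].startswith(">"):
            parts.append(lines[i])
            i += 1
        sequence = "".join(parts)
        # The trailing '*' leaves an empty final piece, which is dropped.
        chains = [chain for chain in sequence.split("*") if chain]
        entries.append((code, title, chains))
    return entries
```

With a two-entry file, the first tuple would describe the reference structure and the second the mobile structure, in line with the ordering rule given above.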
The READALIGNMENT command is also used to read in PIR files
containing a multiple sequence alignment.
When performing a multiple structure fit, the first sequence
must appear twice in the sequence alignment file. This is
because it is used as both the initial reference and first mobile set:
>P1;REFSEQ
Reference Sequence - first.pdb
----WILLIAM*H-ARTNELL-*
>P1;M_0001
Mobile Sequence - first.pdb
----WILLIAM*H-ARTNELL-*
>P1;M_0002
Mobile Sequence - second.pdb
------PATRI-K*TR--GHTN*
>P1;M_0003
Mobile Sequence - third.pdb
PERTWEE---------------*
Note that a bug in using READALIGNMENT with multiple
structure fitting was fixed in V2.3. (The bug caused the program
to crash if a deletion appeared in the same place in two or
more of the sequences.)
When obtaining fit zones from a sequence alignment, either from
ALIGN or from READALIGNMENT, it can be useful to limit
the zones of residues used. Normally all aligned residue pairs will be
used.
For example, if the alignment were:
         1         2         3
123456789012345678901234567890123
ASAHSTGEHNM--PLELLGHISLAM---NPRTY
---HSTADHNLRTPLEVLG--SLAMEDRQPRTY
the zones would normally be taken from the following positions
in the alignment: 4-11, 14-19, 22-25, 29-33
By using the command:
LIMIT 20 28
only the zone from 22-25 would be included.
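The way zones fall out of an alignment, and the effect of LIMIT, can be sketched in a few lines. This is an illustration under stated assumptions rather than ProFit's code: zones_from_alignment is a hypothetical name, zones are reported as alignment positions (as in the example above, not as residue numbers), and columns outside the limit are simply treated as unpaired.

```python
def zones_from_alignment(ref, mob, limit=None):
    """Return (first, last) alignment positions of gap-free runs, 1-based.

    limit is an optional (first, last) pair of alignment positions;
    columns outside it are ignored (an assumption about LIMIT).
    """
    if len(ref) != len(mob):
        raise ValueError("aligned sequences must have equal length")
    first, last = limit if limit else (1, len(ref))
    zones = []
    start = None
    for col in range(1, len(ref) + 1):
        paired = (ref[col - 1] != "-" and mob[col - 1] != "-"
                  and first <= col <= last)
        if paired and start is None:
            start = col                      # a new zone opens here
        elif not paired and start is not None:
            zones.append((start, col - 1))   # the zone closed one column back
            start = None
    if start is not None:
        zones.append((start, len(ref)))
    return zones


ref = "ASAHSTGEHNM--PLELLGHISLAM---NPRTY"
mob = "---HSTADHNLRTPLEVLG--SLAMEDRQPRTY"
print(zones_from_alignment(ref, mob))            # whole alignment
print(zones_from_alignment(ref, mob, (20, 28)))  # like LIMIT 20 28
```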
This is particularly useful in conjunction with the ITERATE command (Section 8.4) and when fitting multiple structures (Section 9).
The LIMIT OFF command restores the default behaviour of
deriving the zones from the whole alignment.
The ITERATE command switches on the iterative updating of
fitted zones during subsequent FIT commands. The ITERATE
command may be followed by an optional parameter to specify the cutoff
used to include or exclude pairs from the zones. (ITERATE OFF
is used to switch it off again.)
Note that this immediately does an ATOMS CA since iteration of
zones is only performed on Cα atoms. The program gives an
informational message to this effect. See notes below if you want to
calculate an RMSd over other atoms.
After the initial fit on the specified zones, the zones are updated
such that residue pairs with Cα atoms within a specified cutoff
(default 3.0Å) are included and those more distant are excluded. The
optimum set of equivalences is obtained using a dynamic programming
method.
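One possible reading of this update step is a dynamic programme that keeps the largest order-preserving set of residue pairs whose Cα atoms lie within the cutoff. The sketch below implements that reading only; the scoring ProFit actually uses is not described here, so the objective (maximise the number of pairs) and the name update_equivalences are assumptions. Coordinates are taken to be those of the already fitted structures.

```python
import math


def update_equivalences(ref_ca, mob_ca, cutoff=3.0):
    """Return index pairs (i, j) of equivalenced residues, in sequence order.

    ref_ca, mob_ca: lists of (x, y, z) CA coordinates after fitting.
    """
    n, m = len(ref_ca), len(mob_ca)
    # best[i][j] = maximum number of pairs using ref_ca[i:] and mob_ca[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            if math.dist(ref_ca[i], mob_ca[j]) <= cutoff:
                best[i][j] = max(best[i][j], 1 + best[i + 1][j + 1])
    # Trace back through the table to recover one optimal set of pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        close = math.dist(ref_ca[i], mob_ca[j]) <= cutoff
        if close and best[i][j] == 1 + best[i + 1][j + 1]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

Consecutive pairs in the returned list would then be merged into zones before the next fitting cycle.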
After updating the zones, the structures are refitted and the
procedure iterates to convergence (typically 3 or 4
cycles). The RMSd on Cα atoms is shown after each cycle unless
the QUIET command is given before running ITERATE.
You may specify a minimal initial zone of say 3 amino acids on which
to fit first. The zone iteration will expand the zones until as many
residues as possible can be equivalenced. Alternatively, this option
is particularly useful in conjunction with the ALIGN
command. Using ALIGN followed by ITERATE gives a
particularly convenient method of fitting two arbitrary structures.
As stated above, the ITERATE command implies
ATOMS CA. Having fitted on Cα atoms, you can of course
display the RMSd over other atom sets in the usual way using the
RATOMS command (e.g. RATOMS N,CA,C,O will display the
backbone RMSd).
Should you wish to refit on another atom set using the iterated zones,
simply use ITERATE OFF to switch off iteration, select the atom
set required using the ATOMS command and use FIT to
refit the structures in the usual way. For example, to fit on backbone
atoms:
ITERATE OFF
ATOMS N,CA,C,O
FIT
It is possible to define zones by flagging residues in the temperature
factor column of the PDB file using the BZONE command. Zones are marked
using positive whole numbers while zeros are ignored. Multiple zones can be
marked using additional numbers.
So, residues with the B-factor set to 1 will be fitted with one another,
residues with the B-factor set to 2 will be fitted with one another, etc.
Note that this use of the B-value column is not compatible with the commands described in Section 12.
Assignment of zones is carried out in two ways:
If only the reference structure is marked then the same set of residue numbers will be added as a fitting zone in both the reference and mobile structure.
If both the reference and the mobile structure are marked then fitting zones are assigned by scanning through and setting zones for corresponding continuous stretches of flagged residues in either the reference or mobile structures.
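The first case, where only the reference structure is marked, amounts to collecting continuous stretches of residues that carry the same positive flag. The sketch below shows that grouping; it is not ProFit's implementation, zones_from_bvalues is a hypothetical name, the input is assumed to be one (residue number, B-value) pair per residue in file order, and two separate stretches with the same flag are assumed to give two zones.

```python
def zones_from_bvalues(residues):
    """Group flagged residues into (flag, first_resnum, last_resnum) zones.

    residues: iterable of (resnum, bvalue) pairs in file order.
    Residues with a zero (or negative) B-value are ignored.
    """
    zones = []
    run_flag = 0            # flag of the stretch being read; 0 means unflagged
    run_start = run_end = None
    for resnum, bvalue in residues:
        flag = int(bvalue) if bvalue > 0 else 0
        if flag != run_flag:
            if run_flag > 0:
                # The previous flagged stretch has just ended.
                zones.append((run_flag, run_start, run_end))
            run_flag = flag
            run_start = resnum
        run_end = resnum
    if run_flag > 0:
        zones.append((run_flag, run_start, run_end))
    return zones


flags = [(1, 0.0), (2, 1.0), (3, 1.0), (4, 1.0), (5, 0.0), (6, 2.0), (7, 2.0), (8, 0.0)]
print(zones_from_bvalues(flags))
```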
The default method for fitting is to centre the fit around the centre of
geometry of the fit atoms. Alternatively, fitting can be centred around
the centre of geometry of a residue
specified by the SETCENTRE (or SETCENTER) command.
SETCENTRE CLEAR|(*|i[:j])
where i and j are residue specifications of the form [chain][.]resnum[insert]. Items in square brackets are optional and alternatives are marked by a
| and grouped in parentheses.
The command:
SETCENTRE 24:35
will centre the fit around residue 24 of the reference structure and residue 35 of the mobile structure. The mobile residue number can be omitted. For example:
SETCENTRE 33
will centre the fit around residue 33 of the reference structure and residue 33 of the mobile structure.
Entering SETCENTRE CLEAR or SETCENTRE * will clear the centre
residue.
The DISTCUTOFF command specifies a distance cutoff for ignoring atom
pairs outside a specified distance when calculating RMSd.
DISTCUTOFF [cutoff|ON|OFF]
Entering DISTCUTOFF ON or DISTCUTOFF OFF will turn the distance cutoff on
or off. Entering DISTCUTOFF 2.5 will set the value of the distance
cutoff to 2.5 Angstroms and turn the distance cutoff on. A warning is displayed
if the distance cutoff is set to zero and turned on. Note that the cutoff is
only applied to the final calculation of the RMSd and not to the fitting.
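The effect of the cutoff on the reported value can be sketched as an RMSd in which atom pairs further apart than the cutoff are left out of the sum. This is an illustration rather than ProFit's code: rmsd_with_cutoff is a hypothetical name, the coordinates are assumed to be already fitted, and a pair exactly at the cutoff distance is assumed to be kept.

```python
import math


def rmsd_with_cutoff(ref_atoms, mob_atoms, cutoff=None):
    """RMSd over equivalent atom pairs, ignoring pairs beyond cutoff.

    ref_atoms, mob_atoms: equal-length lists of (x, y, z) coordinates.
    cutoff=None corresponds to the cutoff being switched off.
    """
    if len(ref_atoms) != len(mob_atoms):
        raise ValueError("atom lists must have equal length")
    squared = []
    for a, b in zip(ref_atoms, mob_atoms):
        d = math.dist(a, b)
        if cutoff is None or d <= cutoff:
            squared.append(d * d)
    if not squared:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(squared) / len(squared))


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
mob = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 5.0)]
print(rmsd_with_cutoff(ref, mob))       # all three pairs
print(rmsd_with_cutoff(ref, mob, 2.5))  # the 5 Angstrom pair is ignored
```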