The ZONE command is used to specify zones in the two structures
which are considered equivalent. In the complete syntax for the command,
X... is an amino acid sequence,
n is a number of residues,
m is the occurrence number,
and the remaining items are residue specifications of the form [chain][.]resnum[insert]. Items
in square brackets are optional and alternatives are marked by a
| and grouped in parentheses.
ZONE commands are cumulative. Thus each zone you specify is
added to those currently active. To clear all zones (i.e. fit all
residues), the ZONE CLEAR or ZONE * command may be
given. To clear a single zone, the DELZONE command can be used
(see the end of this section).
When a new zone is added, a warning message is displayed if the new zone
overlaps an existing zone. Overlapping zones will be flagged with
when using the
Although it appears complex, the syntax is actually very simple and consists of two identical sections separated by a colon (:). The left half is applied to the reference structure and the right half to the mobile structure. In its simplest form, the right hand half of the expression is absent and the specification is applied to both reference and mobile structures. For example:
ZONE 24-34
will set the zone to include residues 24-34 in both structures. If you wanted to fit 24-34 in the reference structure with 25-35 in the mobile structure, this simply becomes:
ZONE 24-34:25-35
Single residues can be specified using the same syntax:
You may also specify chain names and insertion codes. The chain name is placed before the residue number and the insertion code afterwards. For example:
ZONE L25A-L30
fits residues 25A-30 in the L chain of both structures. Optionally, the chain name may be separated from the residue number using a full stop. For example:
Using the full stop also makes the statement case-sensitive. In practice, the full stop separator is used with numeric chain names to separate the chain name from the residue number and with lowercase chain names.
ZONE 1.25-1.30
ZONE b.1-b.60:A.1-A.60
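The residue specification rules above can be sketched as a small parser. This is an illustrative Python sketch, not ProFit's own code: the names `parse_resspec` and `parse_zone`, the regular expressions, the single-letter insertion code, and the choice to fold chain names to upper case when no full stop is given are all assumptions made for the example. Wildcards, sequence fragments and escaped negative numbers are deliberately left out.

```python
import re

# Hypothetical patterns for the form [chain][.]resnum[insert]; not taken from ProFit.
DOTTED = re.compile(r"^(?P<chain>[^.]+)\.(?P<resnum>\d+)(?P<insert>[A-Za-z]?)$")
PLAIN = re.compile(r"^(?P<chain>[A-Za-z]?)(?P<resnum>\d+)(?P<insert>[A-Za-z]?)$")


def parse_resspec(spec):
    """Split a residue specification into (chain, resnum, insert)."""
    m = DOTTED.match(spec)
    if m:
        # With a full stop the chain name is kept exactly as typed.
        chain = m.group("chain")
    else:
        m = PLAIN.match(spec)
        if m is None:
            raise ValueError(f"cannot parse residue specification: {spec!r}")
        # Assumption: without a full stop the chain name is folded to upper case.
        chain = m.group("chain").upper()
    return chain, int(m.group("resnum")), m.group("insert")


def parse_zone(zone):
    """Parse 'start-end[:start-end]' into (reference_range, mobile_range)."""
    halves = zone.split(":")
    if len(halves) == 1:
        halves = halves * 2          # one half applies to both structures
    ranges = []
    for half in halves:
        parts = half.split("-")
        if len(parts) == 1:
            parts = parts * 2        # a single residue is a one-residue zone
        start, end = parts
        ranges.append((parse_resspec(start), parse_resspec(end)))
    return tuple(ranges)
```

Trying the dotted form first is what lets a numeric chain name such as `1` in `1.25` be told apart from the residue number.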
Simple wildcards may also be used. For example, they allow you to fit the reference H chain with the mobile B chain, to fit from the first residue to residue 10 in the reference structure with 50-59 in the mobile structure, or to fit all residues in the reference structure with 1-100 in the mobile structure.
If the structure file contains negatively numbered residues and you are using residue numbering, you can escape the minus sign in the residue number using a backslash. A zone can then run, for example, from a negatively numbered residue to residue 10 in one structure and from a negatively numbered residue to residue 13 in the other.
Alternatively, you may specify the zones to be fitted by giving a sequence fragment. Together with that fragment, you may specify the number of residues to consider starting at that point. If the fragment occurs more than once in the sequence, you may specify which occurrence you wish to consider. For example, zones specified in this way can:
fit the first occurrence of CAR in the reference set with the first occurrence of VNS in the mobile set;
fit 10 residues starting at the first occurrence of CAR in the reference set with 10 residues from the first occurrence of VNS in the mobile set;
fit 5 residues from the second occurrence of CAR in both structures;
fit 24-34 in the reference set with 11 residues starting at the first occurrence of EIR in the mobile set.
By default, ProFit works in 'Residue Number' mode, i.e. the numbers used in zone commands are the numbers seen in the PDB file. The alternative mode is 'Sequential' mode where residues are numbered sequentially throughout the structure (including throughout multiple chains). Any chain names appearing in zone specifications will be ignored in Sequential mode. To switch mode, use the NUMBER SEQUENTIAL or NUMBER RESIDUE commands.
The DELZONE command specifies zones to be deleted from the user-defined
list of fit zones.
DELZONE uses the same syntax as the ZONE
command. The command matches the specified zone with a zone in the user-defined
list of fitting zones and deletes the matching zone from the list. Entering
DELZONE ALL or
DELZONE * will delete all user-defined zones.
Note: For sequence alignment to work, you must have the
mdm78.mat file either in the current directory or in a
directory pointed to by the appropriate environment variable. This file
is the Dayhoff amino acid similarity scoring matrix. See
Section 3 or the
INSTALL file for details.
Another way of specifying zones is to let the program do it. ProFit
allows you to perform a simple Needleman and
Wunsch sequence alignment and to apply zones automatically derived
from that sequence alignment. This is done by issuing the ALIGN
command. The sequence alignment is displayed, any currently active
fitting zones are cleared and replaced by zones derived from the
alignment. Additional zones may also be specified in the usual way.
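For readers unfamiliar with the method, the idea of a Needleman and Wunsch global alignment can be sketched in a few lines of Python. This is a simplified illustration, not ProFit's implementation: it scores residues by simple identity rather than with a similarity matrix, it uses one linear gap penalty, and the function name `needleman_wunsch` and the default scores are assumptions chosen for the example.

```python
def needleman_wunsch(ref, mob, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences and return the two gapped strings."""
    rows, cols = len(ref) + 1, len(mob) + 1
    # score[i][j] = best score for aligning ref[:i] with mob[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if ref[i - 1] == mob[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_ref, out_mob = [], []
    i, j = len(ref), len(mob)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if ref[i - 1] == mob[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_ref.append(ref[i - 1])
            out_mob.append(mob[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])   # residue in ref opposite a gap in mob
            out_mob.append("-")
            i -= 1
        else:
            out_ref.append("-")          # gap in ref opposite a residue in mob
            out_mob.append(mob[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_mob))
```

The columns where neither string has a `-` are the residue pairs from which fitting zones would be derived.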
As of Version 3.0, ProFit offers a choice of three alignment options:
The WHOLE option gives a whole sequence alignment. The whole sequence (regardless of chain ID) is aligned. If the fitting zones assigned in this manner extend over more than one chain, the zones are split into smaller zones at the breaks between chains. This may be useful if a sequence has been split into fragments.
If zones are specified with the ALIGN command, then ProFit will perform an alignment over the defined region to assign fitting zones. (See Section 9 for the syntax for defining zones.)
It is also possible to append new zones onto the end of the zone list (rather
than overwriting the current zone list) by adding
APPEND after the zone specification.
For example, one could use the following commands:
ALIGN A*:B*
ALIGN B*:A* APPEND
to align chain A with chain B and then B with A. This is useful when chains appear in different orders in the PDB files.
When doing multiple fitting, it is not possible to use the colon notation
to define regions on both the reference and mobile structures. The
same restriction is described in Section 10.1.
Clearly, it will normally be necessary to use the ATOMS command
to specify that only backbone or Cα atoms are included in the fit.
The TRIMZONES command can also be used when doing multiple
structure fitting to ensure that the fitting zones are identical for all
mobile structures. (See Section 10.1)
The GAPPEN command allows you to specify an integer gap penalty
and gap extension penalty for the sequence alignment performed by the
ALIGN command. The default values for the gap penalty and gap
extension penalty are 10 and 2 respectively.
If you have an alignment performed outside ProFit, you may use this to
specify the equivalent zones. Any previously defined fitting zones are
automatically cleared first. As of ProFit V3.0, the READALIGNMENT command
can be used with structures having more than one chain.
The alignment should be a file in PIR format using - characters to align the sequences. The two sequences are represented by separate entries, i.e. each must have a header of the form:
>P1;xxxxxx
title text .......
When reading an alignment file for aligning a reference structure with a
single mobile structure, the first sequence will be assumed to be that of
the reference structure and the second that of the mobile structure. Any
other sequences in the file are ignored. Chain breaks in a sequence are
indicated with a * character:
>P1;REFSEQ
Reference Sequence - first.pdb
WILLIAM*H-ARTNELL-*
>P1;M_0001
Mobile Sequence - second.pdb
--PATRI-K*TR--GHTN*
The READALIGNMENT command is also used to read in PIR files
containing a multiple sequence alignment.
When performing a multiple structure fit, the first sequence
must appear twice in the sequence alignment file. This is
because it is used as both the initial reference and first mobile set:
>P1;REFSEQ
Reference Sequence - first.pdb
----WILLIAM*H-ARTNELL-*
>P1;M_0001
Mobile Sequence - first.pdb
----WILLIAM*H-ARTNELL-*
>P1;M_0002
Mobile Sequence - second.pdb
------PATRI-K*TR--GHTN*
>P1;M_0003
Mobile Sequence - third.pdb
PERTWEE---------------*
Note that a bug in using
READALIGNMENT with multiple
structure fitting was fixed in V2.3. (The bug caused the program
to crash if a deletion appeared in the same place in two or
more of the sequences.)
When obtaining fit zones from a sequence alignment, either from
ALIGN or from
READALIGNMENT, it can be useful to limit
the zones of residues used. Normally all aligned residue pairs will be used.
For example, if the alignment were:
         1         2         3
123456789012345678901234567890123
ASAHSTGEHNM--PLELLGHISLAM---NPRTY
---HSTADHNLRTPLEVLG--SLAMEDRQPRTY
the zones would normally be taken from the following positions in the alignment: 4-11, 14-19, 22-25, 29-33
By using the command:
LIMIT 20 28
only the zone from 22-25 would be included.
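The worked example above can be reproduced with a short Python sketch that scans the alignment column by column and collects runs in which neither sequence has a gap. The function name `zones_from_alignment` and its `limit` argument are hypothetical, and treating the limit as a column-by-column clip is an assumption (for this example, clipping zones or keeping only whole zones inside the range gives the same answer).

```python
def zones_from_alignment(ref_aln, mob_aln, limit=None):
    """Return (start, end) alignment positions (1-based) of ungapped runs.

    limit is an optional (first, last) pair of alignment positions; columns
    outside it are ignored. This clipping behaviour is an assumption.
    """
    zones = []
    start = None
    pos = 0
    for pos, (r, m) in enumerate(zip(ref_aln, mob_aln), start=1):
        in_limit = limit is None or limit[0] <= pos <= limit[1]
        paired = r != "-" and m != "-" and in_limit
        if paired and start is None:
            start = pos                      # a new zone opens here
        elif not paired and start is not None:
            zones.append((start, pos - 1))   # the zone closed on the previous column
            start = None
    if start is not None:
        zones.append((start, pos))           # zone runs to the end of the alignment
    return zones


ref = "ASAHSTGEHNM--PLELLGHISLAM---NPRTY"
mob = "---HSTADHNLRTPLEVLG--SLAMEDRQPRTY"
print(zones_from_alignment(ref, mob))            # [(4, 11), (14, 19), (22, 25), (29, 33)]
print(zones_from_alignment(ref, mob, (20, 28)))  # [(22, 25)]
```

The positions returned are alignment columns, as in the text; mapping them back to residue numbers in each structure is a separate step not shown here.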
This is particularly useful in conjunction with the ITERATE command (Section 9.4) and when fitting multiple structures (Section 10).
The LIMIT OFF command restores the default behaviour of
deriving the zones from the whole alignment.
The ITERATE command switches on the iterative updating of
fitted zones during subsequent
FIT commands. The
command may be followed by an optional parameter to specify the cutoff
used to include or exclude pairs from the zones. (ITERATE OFF
is used to switch it off again.)
Note that this immediately does an
ATOMS CA since iteration of
zones is only performed on Cα atoms. The program gives an
informational message to this effect. See notes below if you want to
calculate an RMSd over other atoms.
After the initial fit on the specified zones, the zones are updated such that residue pairs with Cα atoms within a specified cutoff (default 3.0Å) are included and those more distant are excluded. The optimum set of equivalences is obtained using a dynamic programming method.
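The zone update step can be illustrated with a small Python sketch. It is not ProFit's algorithm: it assumes the structures have already been fitted, treats a pair as acceptable when its Cα atoms lie within the cutoff, and uses a longest-common-subsequence style dynamic programme to keep the largest set of pairs that preserves sequence order. The function name `update_zones` and the coordinate format are assumptions for the example.

```python
import math


def update_zones(ref_ca, mob_ca, cutoff=3.0):
    """Return in-order (ref_index, mob_index) pairs whose CA atoms lie within cutoff.

    ref_ca and mob_ca are lists of (x, y, z) coordinates after fitting.
    """
    n, m = len(ref_ca), len(mob_ca)
    close = [[math.dist(a, b) <= cutoff for b in mob_ca] for a in ref_ca]
    # best[i][j] = most pairs obtainable from ref_ca[i:] and mob_ca[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            if close[i][j]:
                best[i][j] = max(best[i][j], best[i + 1][j + 1] + 1)
    # Walk forward through the table to list the chosen pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if close[i][j] and best[i][j] == best[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

In an iterative scheme, the returned pairs would become the new zones, the structures would be refitted on them, and the step repeated until the result stops changing.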
After updating the zones, the structures are refitted and the
procedure iterates to convergence (typically 3 or 4
cycles). The RMSd on Cα atoms is shown after each cycle unless the
QUIET command is given before running the fit.
You may specify a minimal initial zone of say 3 amino acids on which
to fit first. The zone iteration will expand the zones until as many
residues as possible can be equivalenced. Alternatively, this option
is particularly useful in conjunction with the ALIGN command:
ALIGN followed by
ITERATE gives a
particularly convenient method of fitting two arbitrary structures.
As stated above, the
ITERATE command implies
ATOMS CA. Having fitted on Cα atoms, you can of course
display the RMSd over other atom sets in the usual way using the
RATOMS command (e.g.
RATOMS N,CA,C,O will display the RMSd over backbone atoms).
Should you wish to refit on another atom set using the iterated zones, use
ITERATE OFF to switch off iteration, select the atom
set required using the
ATOMS command and use FIT to
refit the structures in the usual way. For example, to fit on backbone atoms:
ITERATE OFF
ATOMS N,CA,C,O
FIT
Note that this use of the B-value column is not compatible with the commands described in Section 13.
It is possible to define zones by flagging residues in the temperature
factor column of the PDB file using the
BZONE command. Zones are marked
using positive whole numbers while zeros are ignored. Multiple zones can be
marked using additional numbers.
So, residues with the B-factor set to 1 will be fitted with one another,
residues with the B-factor set to 2 will be fitted with one another, etc.
Assignment of zones is carried out in two ways:
If only the reference structure is marked then the same set of residue numbers will be added as a fitting zone in both the reference and mobile structure.
If both the reference and the mobile structure are marked then fitting zones are assigned by scanning through and setting zones for corresponding continuous stretches of flagged residues in either the reference or mobile structures.
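The idea of turning flagged residues into zones can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `zones_from_bvalues` and the input format are hypothetical, a zone is taken to be a continuous run of residues carrying the same positive whole-number flag, and two separate runs with the same flag are reported as separate entries.

```python
def zones_from_bvalues(residues):
    """Group flagged residues into zones.

    residues is a list of (resnum, bvalue) in file order. Returns a list of
    (label, first_resnum, last_resnum), one per continuous run of residues
    sharing the same positive whole-number label; zeros are skipped.
    """
    zones = []
    current = None  # [label, first, last] for the run being built
    for resnum, bvalue in residues:
        label = int(bvalue)
        if current is not None and label == current[0]:
            current[2] = resnum          # extend the current run
            continue
        if current is not None:
            zones.append(tuple(current)) # the previous run has ended
            current = None
        if label > 0:
            current = [label, resnum, resnum]
    if current is not None:
        zones.append(tuple(current))
    return zones
```

When only the reference structure is marked, the residue ranges found this way would be applied to both structures; when both are marked, runs with the same label in each structure would be matched up.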
The default method for fitting is to centre the fit around the centre of
geometry of the fit atoms. Alternatively, fitting can be centred around
the centre of geometry of a residue
specified by the SETCENTRE command.
The command takes residue specifications i and j of the form [chain][.]resnum[insert],
for the reference and mobile structures respectively.
Items in square brackets are optional and alternatives are marked by a
| and grouped in parentheses.
For example, the fit can be centred around residue 24 of the reference structure and residue 25 of the mobile structure. The mobile residue number can be omitted, in which case the same residue is used in both structures: specifying only residue 33 will centre the fit around residue 33 of the reference structure and residue 33 of the mobile structure.
Entering SETCENTRE CLEAR or
SETCENTRE * will clear the centre residue.
The DISTCUTOFF command specifies a distance cutoff for ignoring atom
pairs outside a specified distance when calculating RMSd. Entering
DISTCUTOFF ON or
DISTCUTOFF OFF will turn the distance cutoff on
or off. Entering
DISTCUTOFF 2.5 will set the value of the distance
cutoff to 2.5 Angstroms and turn the distance cutoff on. A warning is displayed
if the distance cutoff is set to zero and turned on. Note that the cutoff is
only applied to the final calculation of RMSd and not to the fitting.
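The effect of a distance cutoff on the RMSd calculation can be shown with a minimal Python sketch. The function name `rmsd_with_cutoff` and the coordinate format are assumptions for the example, and the structures are taken to be already fitted, since the cutoff affects only the RMSd and not the fit itself.

```python
import math


def rmsd_with_cutoff(ref_atoms, mob_atoms, cutoff=None):
    """RMSd over equivalent atom pairs, skipping pairs further apart than cutoff."""
    squared = []
    for a, b in zip(ref_atoms, mob_atoms):
        d = math.dist(a, b)
        if cutoff is None or d <= cutoff:
            squared.append(d * d)
    if not squared:
        raise ValueError("no atom pairs within the distance cutoff")
    return math.sqrt(sum(squared) / len(squared))
```

With pair distances of 1, 1 and 10 Å, a cutoff of 2.5 Å drops the third pair and gives an RMSd of 1 Å, whereas with no cutoff all three pairs contribute.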