Next: Multiple Structure Fitting Up: ProFit Version 2.6 Previous: Specifying Atom Subsets


Specifying Zones

The ZONE command is used to specify zones in the two structures which are considered equivalent. The complete syntax for the command is:

   ZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
where X... is an amino acid sequence, n is a number of residues, m is the occurrence number, j and k are residue specifications of the form [chain][.]resnum[insert]. Items in square brackets are optional and alternatives are marked by a | and grouped in parentheses.

ZONE commands are cumulative. Thus each zone you specify is added to those currently active. To clear all zones (i.e. fit all residues), the ZONE CLEAR or ZONE * command may be given. To clear a single zone, the DELZONE command can be used.

When a new zone is added, a warning message is displayed if the new zone overlaps an existing zone. Overlapping zones will be flagged with * when using the STATUS command.

Although it appears complex, the syntax is actually very simple and consists of two identical sections separated by a colon (:). The left half is applied to the reference structure and the right half to the mobile structure. In its simplest form, the right-hand half of the expression is absent and the specification is applied to both reference and mobile structures. For example:

   ZONE 24-34
will set the zone to include residues 24-34 in both structures. If you wanted to fit 24-34 in the reference structure with 25-35 in the mobile structure, this simply becomes:
   ZONE 24-34:25-35
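The left/right split can be sketched in a few lines of Python. This is an illustration only, not ProFit's own parser: the function name `split_zone` is hypothetical, and it assumes the leading ZONE keyword has already been removed.

```python
def split_zone(spec):
    """Split a zone specification into (reference, mobile) halves.

    Hypothetical helper: with no colon, the single specification
    applies to both the reference and the mobile structure.
    """
    spec = spec.strip()
    if ":" in spec:
        ref, mob = spec.split(":", 1)
        return ref.strip(), mob.strip()
    return spec, spec


print(split_zone("24-34"))        # ('24-34', '24-34')
print(split_zone("24-34:25-35"))  # ('24-34', '25-35')
```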

You may also specify chain names and insertion codes. The chain name is placed before the residue number and the insertion code afterwards. For example:

   ZONE L25A-L30
fits residues 25A-30 in the L chain of both structures. Optionally, the chain name may be separated from the residue number using a full stop. For example:
   ZONE L.25A-L.30
Using the full stop also makes the statement case-sensitive. In practice, the full stop separator is used with numeric chain names (to separate the chain name from the residue number) and with lowercase chain names. For example:
   ZONE 1.25-1.30
   ZONE b.1-b.60:A.1-A.60
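The `[chain][.]resnum[insert]` rules can be illustrated with a small parser. This is a sketch under stated assumptions, not ProFit's code: `parse_resspec` is a hypothetical name, and treating input without a full stop as case-insensitive by upper-casing it is an assumption drawn from the statement that the full stop makes the specification case-sensitive.

```python
import re

# With a full stop: anything before the '.' is the chain (e.g. '1', 'b').
WITH_DOT = re.compile(r"^([^.]+)\.(-?\d+)([A-Za-z]?)$")
# Without a full stop: the chain, if any, must be a single letter.
NO_DOT = re.compile(r"^([A-Za-z]?)(-?\d+)([A-Za-z]?)$")


def parse_resspec(spec):
    """Return (chain, resnum, insert) for a residue specification."""
    m = WITH_DOT.match(spec)
    if m:
        # Full stop present: keep the case exactly as typed.
        chain, num, ins = m.groups()
        return chain, int(num), ins
    m = NO_DOT.match(spec)
    if m:
        # No full stop: assumed case-insensitive, so upper-case it.
        chain, num, ins = m.groups()
        return chain.upper(), int(num), ins.upper()
    raise ValueError(f"bad residue specification: {spec!r}")


print(parse_resspec("L25A"))  # ('L', 25, 'A')
print(parse_resspec("1.25"))  # ('1', 25, '')
print(parse_resspec("b.1"))   # ('b', 1, '')
```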

Simple wildcards may also be used. For example:

   ZONE H*:B*
fits the reference H chain with the mobile B chain,
   ZONE -10:50-59
fits from the first residue to residue 10 in the reference structure with 50-59 in the mobile structure.
   ZONE *:1-100
fits all residues in the reference structure with 1-100 in the mobile structure.

If the structure file contains negatively numbered residues and you are using residue numbering, you can escape the minus sign in the residue number using a backslash:

    ZONE \-4-10:\-1-13
will fit residues -4 to 10 in one structure with -1 to 13 in the other.
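Splitting a `j-k` range at the first unescaped minus sign handles both the backslash escape and the open-start form `-10` shown earlier. The sketch below is hypothetical (`split_range` is not a ProFit function); returning `None` for an empty side is an assumption made to represent "from the first residue".

```python
def split_range(spec):
    """Split 'j-k' at the first minus sign not escaped by a backslash.

    Returns (start, end) with escapes removed; an empty side becomes
    None, so '-10' means 'from the first residue to residue 10'.
    """
    def unescape(part):
        return part.replace("\\-", "-") or None

    i = 0
    while i < len(spec):
        if spec[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if spec[i] == "-":
            return unescape(spec[:i]), unescape(spec[i + 1:])
        i += 1
    raise ValueError(f"no range separator in {spec!r}")


print(split_range(r"\-4-10"))  # ('-4', '10')
print(split_range("-10"))      # (None, '10')
```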

Alternatively, you may specify the zones to be fitted by giving a sequence fragment. Together with that fragment, you may specify the number of residues to consider starting at that point. If the fragment occurs more than once in the sequence you may specify which occurrence you wish to consider. For example:

   ZONE CAR:VNS
fits the first occurrence of CAR in the reference set with the first occurrence of VNS in the mobile set;
   ZONE CAR,10:VNS,10
fits 10 residues starting at the first occurrence of CAR in the reference set with 10 residues from the first occurrence of VNS in the mobile set;
   ZONE CAR,5/2
fits 5 residues from the second occurrence of CAR in both structures;
   ZONE 24-34:EIR,11
fits 24-34 in the reference set with 11 residues starting at the first occurrence of EIR in the mobile set.
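The `X...[,n][/m]` form amounts to locating the m-th occurrence of a fragment in the one-letter sequence and taking n residues from there. A minimal sketch, assuming 1-based inclusive positions, a default length equal to the fragment length, and that occurrences may overlap; `fragment_zone` is a hypothetical name.

```python
def fragment_zone(sequence, fragment, length=None, occurrence=1):
    """Return a 1-based inclusive (start, end) zone for 'X...[,n][/m]'."""
    if length is None:
        length = len(fragment)  # no ',n' given: just the fragment itself
    pos = -1
    for _ in range(occurrence):
        # Search again one residue after the previous match.
        pos = sequence.find(fragment, pos + 1)
        if pos == -1:
            raise ValueError(f"occurrence {occurrence} of {fragment} not found")
    if pos + length > len(sequence):
        raise ValueError("zone runs past the end of the sequence")
    return pos + 1, pos + length


seq = "MKCARLLVNSCARTTTT"
print(fragment_zone(seq, "CAR"))        # (3, 5)   like ZONE CAR
print(fragment_zone(seq, "CAR", 5, 2))  # (11, 15) like ZONE CAR,5/2
```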

By default, ProFit works in 'Residue Number' mode, i.e. the numbers used in zone commands are the numbers seen in the PDB file. The alternative is 'Sequential' mode, where residues are numbered sequentially throughout the structure (including across multiple chains). Any chain names appearing in zone specifications are ignored in Sequential mode. To switch modes, use the NUMBER SEQUENTIAL or NUMBER RESIDUE command.
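The difference between the two modes can be shown with a simple mapping from PDB residue identifiers to sequential numbers. This is an illustrative sketch; the function names and the `(chain, resnum, insert)` tuple representation are hypothetical.

```python
def sequential_numbers(residues):
    """Map (chain, resnum, insert) identifiers to sequential numbers 1..N.

    Numbering runs straight through all chains and insertion codes.
    """
    return {res: n for n, res in enumerate(residues, start=1)}


def residues_in_sequential_zone(residues, start, end):
    """Residue identifiers covered by sequential zone start-end (inclusive)."""
    return residues[start - 1:end]


residues = [("L", 1, ""), ("L", 2, ""), ("L", 2, "A"), ("H", 1, "")]
print(sequential_numbers(residues)[("H", 1, "")])   # 4
print(residues_in_sequential_zone(residues, 3, 4))  # [('L', 2, 'A'), ('H', 1, '')]
```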

The DELZONE command specifies zones to be deleted from the user-defined list of fit zones. DELZONE uses the same syntax as the ZONE command. The command matches the specified zone with a zone in the user-defined list of fitting zones and deletes the matching zone from the list. As with the ZONE command, entering either DELZONE CLEAR or DELZONE * will delete all user-defined zones.
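The cumulative behaviour of ZONE, the overlap warning and the matching done by DELZONE can be modelled with a small list class. This is a hypothetical sketch: each zone is assumed to be already resolved to integer ranges `((ref_start, ref_end), (mob_start, mob_end))`, and "overlap" is assumed to mean the ranges intersect in either structure.

```python
class ZoneList:
    """Hypothetical cumulative list of fitting zones.

    An empty list means 'fit all residues'.
    """

    def __init__(self):
        self.zones = []

    @staticmethod
    def _overlaps(a, b):
        # Two inclusive ranges intersect if neither ends before the other starts.
        return a[0] <= b[1] and b[0] <= a[1]

    def add(self, zone):
        """ZONE: append a zone, warning if it overlaps an existing one."""
        for old in self.zones:
            if self._overlaps(old[0], zone[0]) or self._overlaps(old[1], zone[1]):
                print(f"Warning: zone {zone} overlaps {old}")
        self.zones.append(zone)

    def delete(self, zone):
        """DELZONE: remove an exactly matching zone; return True if found."""
        if zone in self.zones:
            self.zones.remove(zone)
            return True
        return False

    def clear(self):
        """ZONE CLEAR / DELZONE CLEAR: go back to fitting all residues."""
        self.zones.clear()


zl = ZoneList()
zl.add(((24, 34), (25, 35)))
zl.add(((30, 40), (31, 41)))  # prints an overlap warning
zl.delete(((24, 34), (25, 35)))
```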

Sequence Alignment

Another way of specifying zones is to let the program do it. ProFit does not provide any facilities for calculating structural equivalences, but does allow you to perform a simple Needleman and Wunsch sequence alignment and to apply zones automatically derived from that sequence alignment. This is done by issuing the ALIGN command. The sequence alignment is displayed, and any currently active fitting zones are cleared and replaced by zones derived from the alignment.

Currently the ALIGN command may only be used if the structures contain only one chain.

Additional zones may also be specified in the usual way.

Clearly, it will normally be necessary to use the ATOMS command to specify that only backbone or Cα atoms are included in the fitting.

The GAPPEN command allows you to specify an integer gap penalty for the sequence alignment performed by the ALIGN command. The default value is 5.
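A global Needleman and Wunsch alignment of the kind ALIGN performs can be sketched as follows. This is not ProFit's implementation: scoring identical residues 10 and all others 0 is a hypothetical stand-in for whatever residue scoring ProFit uses, and applying the gap penalty linearly per gap position is an assumption about how the GAPPEN value is used.

```python
def needleman_wunsch(a, b, gap=5, match=10, mismatch=0):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] - gap,   # gap in b
                              score[i][j - 1] - gap)   # gap in a

    # Trace back from the bottom-right corner, preferring a match/mismatch.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACDEFG", "ACEFG"))  # ('ACDEFG', 'AC-EFG')
```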

Reading an Alignment

If you have an alignment performed outside ProFit you may use this to specify the equivalent zones. Any previously defined fitting zones are automatically cleared first. As with the ALIGN command, this can currently only be used with structures having a single chain.

The alignment should be a file in PIR format using - characters to align the sequences. The two sequences are represented by separate entries, i.e. each must have a header of the form:

>P1;xxxxxx
title text .......

If the PIR file contains multiple chains, it will be rejected. The first sequence will be assumed to be that of the reference structure and the second that of the mobile structure. Any other sequences in the file are ignored.

The READALIGNMENT command is used to read in the PIR file.
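A minimal reader for the two-sequence PIR input described above might look like this. It is a hypothetical sketch, not ProFit's reader: it assumes each sequence is terminated by `*`, keeps only the first two entries, and does not implement the multiple-chain check.

```python
def read_pir_pair(text):
    """Return (reference, mobile) aligned sequences from PIR-format text.

    Each entry is a '>P1;' header line, a title line, then sequence
    lines ending in '*'. Entries after the second are ignored.
    """
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith(">"):
            next(lines, "")  # skip the title line that follows the header
            entries.append("")
        elif entries:
            entries[-1] += "".join(line.split())  # drop any whitespace
    seqs = [e.rstrip("*") for e in entries]
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    return seqs[0], seqs[1]


pir = """>P1;ref
reference structure
ASAHSTGEHNM--PLELLGHISLAM
---NPRTY*
>P1;mob
mobile structure
---HSTADHNLRTPLEVLG--SLAMEDRQPRTY*
"""
ref, mob = read_pir_pair(pir)
```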

When performing a multiple structure fit, the first sequence must appear twice in the sequence alignment file. This is because it is used as both the first reference and mobile set.

Note that a bug in using the READALIGNMENT command with multiple structure fitting was fixed in V2.3. (The bug caused the program to crash if a deletion appeared in the same place in two or more of the sequences.)

Limiting Zones Read From an Alignment

When obtaining fit zones from a sequence alignment, either from ALIGN or from READALIGNMENT, it can be useful to limit the zones of residues used. Normally all aligned residue pairs will be used.

For example, if the alignment were:

                       1         2         3
              123456789012345678901234567890123
              ASAHSTGEHNM--PLELLGHISLAM---NPRTY
              ---HSTADHNLRTPLEVLG--SLAMEDRQPRTY
the zones would normally be taken from the following positions in the alignment: 4-11, 14-19, 22-25 and 29-33.

By using the command:

      LIMIT 20 28
only the zone from 22-25 would be included.
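Deriving zones from an alignment, with an optional LIMIT window, can be sketched as follows. The sketch works in 1-based, inclusive alignment-column numbers, as in the example above; `alignment_zones` is a hypothetical name, and converting columns to residue numbers in each structure is left out.

```python
def alignment_zones(ref_aln, mob_aln, limit=None):
    """Zones (first_col, last_col) where both sequences have a residue.

    'limit' is an optional (low, high) column window, like LIMIT low high;
    None uses the whole alignment, like LIMIT OFF.
    """
    if len(ref_aln) != len(mob_aln):
        raise ValueError("aligned sequences must be the same length")
    low, high = limit if limit else (1, len(ref_aln))
    zones, start = [], None
    for col in range(1, len(ref_aln) + 1):
        paired = (ref_aln[col - 1] != "-" and mob_aln[col - 1] != "-"
                  and low <= col <= high)
        if paired and start is None:
            start = col                     # a new zone begins
        elif not paired and start is not None:
            zones.append((start, col - 1))  # the current zone ends
            start = None
    if start is not None:
        zones.append((start, len(ref_aln)))
    return zones


ref = "ASAHSTGEHNM--PLELLGHISLAM---NPRTY"
mob = "---HSTADHNLRTPLEVLG--SLAMEDRQPRTY"
print(alignment_zones(ref, mob))            # [(4, 11), (14, 19), (22, 25), (29, 33)]
print(alignment_zones(ref, mob, (20, 28)))  # [(22, 25)]
```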

This is particularly useful in conjunction with the ITERATE command (Section 8.4) and when fitting multiple structures (Section 9).

The LIMIT OFF command restores the default behaviour of deriving the zones from the whole alignment.

Iterative Updating of the Fitting Zones

The ITERATE command switches on the iterative updating of fitted zones during subsequent FIT commands. The ITERATE command may be followed by an optional parameter to specify the cutoff used to include or exclude pairs from the zones. (ITERATE OFF is used to switch it off again.)

Currently the ITERATE command may only be used if the structures contain only one chain.

Note that this immediately performs an ATOMS CA, since iteration of zones is only performed on Cα atoms. The program gives an informational message to this effect. See the notes below if you want to calculate an RMSd over other atoms.

After the initial fit on the specified zones, the zones are updated such that residue pairs with Cα atoms within a specified cutoff (default 3.0Å) are included and those more distant are excluded. The optimum set of equivalences is obtained using a dynamic programming method.
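One way to realise such a dynamic-programming step is to find the largest set of residue pairs that preserves sequence order and whose Cα atoms lie within the cutoff, much like a longest-common-subsequence search. The sketch below assumes that objective (simply maximising the number of equivalenced pairs); ProFit's actual scoring may differ, `best_equivalences` is a hypothetical name, and the coordinates are assumed to be already superposed.

```python
import math


def best_equivalences(ref_ca, mob_ca, cutoff=3.0):
    """Order-preserving residue pairs (i, j) with Calpha distance <= cutoff.

    ref_ca, mob_ca: lists of (x, y, z) Calpha coordinates after fitting.
    """
    n, m = len(ref_ca), len(mob_ca)

    def close(i, j):
        return math.dist(ref_ca[i], mob_ca[j]) <= cutoff

    # best[i][j] = most pairs obtainable from ref_ca[i:] and mob_ca[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            score = max(best[i + 1][j], best[i][j + 1])
            if close(i, j):
                score = max(score, best[i + 1][j + 1] + 1)
            best[i][j] = score

    # Trace forward from (0, 0) to recover one optimal set of pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if close(i, j) and best[i][j] == best[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i][j] == best[i + 1][j]:
            i += 1
        else:
            j += 1
    return pairs


ref = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
mob = [(0, 0.5, 0), (7.6, 0.5, 0), (11.4, 0.5, 0)]
print(best_equivalences(ref, mob))  # [(0, 0), (2, 1), (3, 2)]
```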

After updating the zones, the structures are refitted and the procedure iterates to convergence (<0.01 Å), typically within 3 or 4 cycles. The RMSd on Cα atoms is shown after each cycle unless the QUIET command is given.
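The fit/update cycle can be written as a short loop. In this sketch `fit` and `update_zones` are hypothetical callables standing in for the superposition and zone-update steps, and convergence is assumed to mean that successive RMSd values differ by less than 0.01 Å; a cycle limit guards against non-convergence.

```python
def iterate_zones(fit, update_zones, zones, tol=0.01, max_cycles=50):
    """Alternate fitting and zone updating until the RMSd settles.

    fit(zones) -> RMSd after superposing on those zones (hypothetical).
    update_zones() -> new zones from the current superposition (hypothetical).
    Returns the final zones and the RMSd after each fit.
    """
    history = [fit(zones)]
    for _ in range(max_cycles):
        zones = update_zones()
        history.append(fit(zones))
        if abs(history[-1] - history[-2]) < tol:
            break
    return zones, history


# Stub fit that returns a fixed sequence of RMSd values.
values = iter([2.0, 1.5, 1.2, 1.195, 1.19])
zones, history = iterate_zones(lambda z: next(values), lambda: "updated", "initial")
print(history)  # [2.0, 1.5, 1.2, 1.195]
```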

You may specify a minimal initial zone of, say, 3 amino acids on which to fit first; the zone iteration will then expand the zones until as many residues as possible can be equivalenced. ITERATE is also particularly useful in conjunction with the ALIGN command: using ALIGN followed by ITERATE gives a particularly convenient method of fitting two arbitrary structures.

As stated above, the ITERATE command implies ATOMS CA. Having fitted on Cα atoms, you can of course display the RMSd over other atom sets in the usual way using the RATOMS command (e.g. RATOMS N,CA,C,O will display the backbone RMSd).
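Computing an RMSd over a chosen atom set, as RATOMS does for display, comes down to filtering the paired atoms by name before averaging the squared distances. A minimal sketch, assuming atoms are already paired by index and represented as hypothetical `(name, (x, y, z))` tuples.

```python
import math


def rmsd_over_names(ref_atoms, mob_atoms, names):
    """RMSd over paired atoms whose name is in 'names'.

    ref_atoms[i] pairs with mob_atoms[i]; each atom is (name, (x, y, z)).
    """
    sq = [math.dist(a[1], b[1]) ** 2
          for a, b in zip(ref_atoms, mob_atoms) if a[0] in names]
    if not sq:
        raise ValueError("no atoms match the selection")
    return math.sqrt(sum(sq) / len(sq))


ref = [("N", (0, 0, 0)), ("CA", (0, 0, 0)), ("CB", (0, 0, 0))]
mob = [("N", (1, 0, 0)), ("CA", (0, 1, 0)), ("CB", (5, 0, 0))]
print(rmsd_over_names(ref, mob, {"N", "CA", "C", "O"}))  # 1.0
```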

Should you wish to refit on another atom set using the iterated zones, simply use ITERATE OFF to switch off iteration, select the atom set required using the ATOMS command and use FIT to refit the structures in the usual way. For example, to fit on backbone atoms:

   ITERATE OFF
   ATOMS N,CA,C,O
   FIT

Fitting Zones Based on the Temperature Factor Column

It is possible to define zones by flagging residues in the temperature factor column of the PDB file and then using the BZONE command. Zones are marked using positive whole numbers, while zeros are ignored. Multiple zones can be marked using additional numbers.

Assignment of zones is carried out in two ways:

If only the reference structure is marked then the marked section will be added as a fitting zone in both the reference and mobile structure.

If both the reference and the mobile structure are marked then fitting zones are assigned by scanning through and setting zones for corresponding continuous stretches of flagged residues in either the reference or mobile structures.
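The first step of BZONE, finding the flagged stretches, can be sketched as follows. It assumes that a zone is a contiguous run of residues carrying the same positive flag (so adjacent residues flagged 1 and 2 start separate zones); `flagged_runs` is a hypothetical name, and how ProFit pairs stretches when both structures are marked is not modelled.

```python
def flagged_runs(flags):
    """Return (flag, first, last) for each contiguous run of one positive flag.

    'flags' holds the temperature-factor value of each residue, in order;
    first/last are 0-based positions in that list. Zero means not flagged.
    """
    runs = []
    for i, value in enumerate(flags):
        flag = int(value)
        if flag <= 0:
            continue
        if runs and runs[-1][0] == flag and runs[-1][2] == i - 1:
            runs[-1] = (flag, runs[-1][1], i)  # extend the current run
        else:
            runs.append((flag, i, i))          # start a new run
    return runs


print(flagged_runs([0, 1, 1, 1, 0, 0, 2, 2, 0, 1]))
# [(1, 1, 3), (2, 6, 7), (1, 9, 9)]
```

When only the reference structure is marked, each run found this way would become a fitting zone in both structures, following the first rule above.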

Centre of Fitting

The default method for fitting is to centre the fit around the centre of geometry of the fit atoms. Alternatively, the fit can be centred on a residue, specified by the SETCENTRE (or SETCENTER) command, rather than on the centre of geometry of the fit atoms.

   SETCENTRE CLEAR|(*|j[:j])
where j is a residue specification of the form [chain][.]resnum[insert]. Items in square brackets are optional and alternatives are marked by a | and grouped in parentheses.

Entering SETCENTRE CLEAR or SETCENTRE * will clear the centre residue.
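The choice of fitting centre can be sketched as picking the point that both structures are translated to before the rotation is calculated. The helpers below are hypothetical, and using the centroid of the chosen residue's atoms (rather than, say, one particular atom) is an assumption.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def centre_for_fit(fit_atoms, centre_residue_atoms=None):
    """Centre of fitting: the chosen residue if set, else the fit atoms."""
    if centre_residue_atoms:
        return centroid(centre_residue_atoms)
    return centroid(fit_atoms)


def translate(points, centre):
    """Move points so that 'centre' lies at the origin."""
    return [tuple(p[k] - centre[k] for k in range(3)) for p in points]


fit = [(0, 0, 0), (2, 0, 0), (4, 0, 0)]
print(centre_for_fit(fit))                          # (2.0, 0.0, 0.0)
print(centre_for_fit(fit, [(4, 0, 0), (4, 2, 0)]))  # (4.0, 1.0, 0.0)
```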

Distance Cutoff for RMSd Calculations

The DISTCUTOFF command specifies a distance cutoff for ignoring atom pairs outside a specified distance when calculating RMSd.

   DISTCUTOFF [cutoff|ON|OFF]

Entering DISTCUTOFF ON or DISTCUTOFF OFF will turn the distance cutoff on or off. Entering DISTCUTOFF 2.5 will set the value of the distance cutoff to 2.5 Angstroms and turn the distance cutoff on. A warning is displayed if the distance cutoff is set to zero and turned on.
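The effect of a distance cutoff on the RMSd can be sketched as follows, assuming that "outside" means strictly further apart than the cutoff; `rmsd` is a hypothetical helper operating on already-fitted, paired coordinates, and passing `cutoff=None` corresponds to DISTCUTOFF OFF.

```python
import math


def rmsd(ref, mob, cutoff=None):
    """RMSd over paired (x, y, z) coordinates.

    Pairs further apart than 'cutoff' are ignored; None disables the cutoff.
    """
    sq = []
    for a, b in zip(ref, mob):
        d = math.dist(a, b)
        if cutoff is None or d <= cutoff:
            sq.append(d * d)
    if not sq:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(sq) / len(sq))


ref = [(0, 0, 0), (0, 0, 0)]
mob = [(1, 0, 0), (10, 0, 0)]
print(rmsd(ref, mob, cutoff=2.5))  # 1.0 (the 10 Angstrom pair is ignored)
```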


Andrew Martin 2008-06-16