Next: Calculating the RMSd Over Up: ProFit Version 3.1 Previous: Specifying Zones


Multiple Structure Fitting

The MULTI command reads in a set of multiple structures for fitting. The filename specified for MULTI is a `file of files', i.e. it contains a list of the filenames which will be read.

MULTI is used in place of REFERENCE and MOBILE to read in a set of structure files. The first structure file is used as a reference set for the first fitting stage, but the coordinates are averaged after each fitting stage to derive an averaged template used for subsequent fitting.

That is, given $N$ files to fit, file 2 is fitted to file 1 and an averaged structure, $A$, is calculated; file 3 is then fitted to $A$ and a new average, $A'$, is calculated. This continues until all $N$ structures have been fitted. The whole procedure iterates until convergence (typically 3 or 4 cycles).
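The fit-then-average cycle can be sketched in Python as follows. This is only an illustration under stated assumptions and is not ProFit's code: the names multi_fit, translate_fit, average and rmsd are hypothetical, the stand-in fit superimposes centroids only (a real fit also rotates), and both the convergence test (template change below tol) and the refitting of the first file in later cycles are guesses.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
Structure = List[Point]


def centroid(s: Structure) -> Point:
    """Mean position of all atoms in a structure."""
    n = len(s)
    return (sum(p[0] for p in s) / n,
            sum(p[1] for p in s) / n,
            sum(p[2] for p in s) / n)


def translate_fit(mobile: Structure, target: Structure) -> Structure:
    """Stand-in fit (assumption): superimpose centroids only, no rotation."""
    cm, ct = centroid(mobile), centroid(target)
    shift = [t - m for m, t in zip(cm, ct)]
    return [(p[0] + shift[0], p[1] + shift[1], p[2] + shift[2]) for p in mobile]


def average(structs: List[Structure]) -> Structure:
    """Plain numerical average of equivalent atoms."""
    n = len(structs)
    return [tuple(sum(s[i][k] for s in structs) / n for k in range(3))
            for i in range(len(structs[0]))]


def rmsd(a: Structure, b: Structure) -> float:
    """Root mean square deviation over equivalent atoms (no fitting)."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return (total / len(a)) ** 0.5


def multi_fit(structs: List[Structure],
              fit: Callable[[Structure, Structure], Structure] = translate_fit,
              max_cycles: int = 10,
              tol: float = 1e-6):
    """Fit every structure to a running averaged template until it settles."""
    template = structs[0]          # file 1 is the first reference
    fitted = list(structs)
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        previous = template
        for k in range(len(structs)):
            if cycle == 1 and k == 0:
                continue           # nothing to fit file 1 to in the first cycle
            fitted[k] = fit(structs[k], template)
            # First cycle: average only the structures fitted so far.
            pool = fitted[:k + 1] if cycle == 1 else fitted
            template = average(pool)
        if rmsd(template, previous) < tol:   # assumed convergence criterion
            break
    return fitted, template, cycle
```

Passing a different fit function (for example a full least-squares superposition) leaves the loop itself unchanged.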

ProFit V3.0 changes the default method of calculating the average template. As each new mobile structure is added, the degree of change in the averaged structure is inversely proportional to the total number of mobile structures. Consequently, outlying structures should have less effect on the averaged reference structure.
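One possible reading of this update rule is sketched below in Python. It is an interpretation, not the documented formula: the function name update_template and the choice of a step size of exactly 1/N (with N the total number of mobile structures) are assumptions made for illustration.

```python
def update_template(template, fitted, n_total):
    """Move the template towards a newly fitted structure by a 1/N step.

    template, fitted: lists of (x, y, z) tuples for equivalent atoms.
    n_total: total number of mobile structures (assumed weighting).
    """
    w = 1.0 / n_total
    return [tuple(t + w * (f - t) for t, f in zip(tp, fp))
            for tp, fp in zip(template, fitted)]
```

With this weighting, a single structure lying far from the others can shift the template by at most 1/N of its displacement in any one update.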

Normally, the coordinates of the first structure in the MULTI list are taken as the starting point for the averaged reference structure. It is possible, however, to select another mobile structure as the initial reference structure using the SETREF command. For example, SETREF 3 will use the third mobile structure as the reference structure. If no structure number is specified, then the SETREF command carries out an all vs. all comparison and the coordinates of the mobile structure with the least overall RMSD to all the other mobile structures are selected as the initial reference structure.
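The selection made by SETREF without a structure number can be illustrated with a short Python sketch. It assumes that the all vs. all RMSDs are already available as a square matrix and that `least overall RMSD' means the smallest row sum; the function name pick_reference is hypothetical.

```python
def pick_reference(rmsd_matrix):
    """Return the 1-based number of the structure with the least total RMSD.

    rmsd_matrix[i][j] is the RMSD between structures i+1 and j+1.
    """
    totals = [sum(row) for row in rmsd_matrix]
    best = min(range(len(totals)), key=totals.__getitem__)
    return best + 1   # structures are numbered from 1, as in SETREF 3
```

For three structures with pairwise RMSDs of 1.0 (1 vs. 2), 2.0 (1 vs. 3) and 0.5 (2 vs. 3), the row sums are 3.0, 1.5 and 2.5, so structure 2 would be chosen.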

Multiple structures can be fitted with either the FIT or ORDERFIT command. The ORDERFIT command (new in V3.0) will perform the multiple structure fit in a similar manner to the FIT command but fitting the most similar structures first. As the averaged template is updated with each new structure fitted, the order of fitting has a (small) influence on the template. The ORDERFIT command (possibly along with the SETREF command) can provide a standardized fitting scheme.
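The idea of fitting the most similar structures first can be sketched in Python as a simple sort. The measure of similarity used by ORDERFIT is not stated here, so this sketch assumes it is the RMSD of each mobile structure to the initial reference; the function name fit_order is hypothetical.

```python
def fit_order(rmsd_to_ref, ref=1):
    """Order structure numbers (1-based) by increasing RMSD to the reference.

    rmsd_to_ref[i] is the RMSD of structure i+1 to the reference structure.
    The reference itself is left out of the returned order.
    """
    numbers = [n for n in range(1, len(rmsd_to_ref) + 1) if n != ref]
    return sorted(numbers, key=lambda n: rmsd_to_ref[n - 1])
```

Because the order no longer depends on the order of the files in the MULTI list, two runs on the same set of files would fit the structures in the same sequence.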

Progress and RMSds are reported at each iteration unless the QUIET command is used.

By default, RMSDs, pairwise distances and transformation matrices are given in relation to the first mobile structure. The MULTREF command will set ProFit to give results in relation to the averaged reference structure rather than the first mobile structure (MULTREF OFF restores the default behaviour).

The resulting fitted files are written with the MWRITE command. Note that there is no ``reference'' set in the sense used for normal 2-structure fitting; fitted versions of all $N$ files will be written since the reference set is actually an averaged template used purely as a guide for fitting.

The averaged template can be written to a file using the WRITE REF command. However, as it is a simple numerical average of the Cartesian coordinates, the reference structure generated by ProFit should be treated with caution as a representation of an actual geometry/conformation accessible to the structure.

When the MWRITE command is used, the output filenames are the same as the input files, but with the extension replaced by that specified in the MWRITE command. If no extension is specified, then `.fit' will be used. If the input structure files contained no extension, then the extension specified will be appended to the filenames.

Note that since only the extension is changed when writing back the fitted files, you must have permission to write to the directory from which the original files were read.
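The naming rule for fitted files can be sketched in Python as below. The helper name fitted_filename is hypothetical, and whether MWRITE expects the extension with or without a leading dot is not stated here, so the sketch accepts both.

```python
import os


def fitted_filename(input_name, ext="fit"):
    """Replace (or append) the extension of an input filename.

    The directory part is kept, so the fitted file lands next to the original.
    """
    root, _old_ext = os.path.splitext(input_name)
    return root + "." + ext.lstrip(".")
```

For example, an input file models/1abc.pdb would give models/1abc.fit, and an input file named model1 written with the extension sup would give model1.sup.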

Multiple-structure fitting is particularly effective in combination with the ITERATE command (see Section 9.4), which refines the fitting zones iteratively. This can lead to extremely good multiple structure fits.

Note that multiple structure fitting and zone iteration can be very slow as these have been added to the earlier pair-wise fitting engine. An increase in speed needs a complete re-design of the code.

Specifying Zones With Multiple Structure Fitting

Currently, the ZONE command may only be used with multiple structure fitting when the same zone specification can be applied to every structure, i.e. you cannot specify a separate zone for each structure by separating the zones with a colon (:).

Thus, the following are legal zones:

  ZONE 20-30
  ZONE C,3

while the following are not:

  ZONE 24-34:25-35
  ZONE 24-34:EIR,11
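The restriction amounts to rejecting any zone specification containing a colon. A minimal Python check, for illustration only (the function name zone_ok_for_multi is hypothetical and the sketch does not validate the rest of the zone syntax):

```python
def zone_ok_for_multi(spec):
    """True if a ZONE specification has no per-structure (colon) part."""
    return ":" not in spec
```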

For normal use, it is recommended that the ALIGN, TRIMZONES and READALIGNMENT commands (possibly in conjunction with the LIMIT command) are used for specifying zones when fitting multiple structures.

As of ProFit V3.0, the TRIMZONES command can be used in conjunction with the ALIGN command. The ALIGN command performs a pairwise alignment of each of the mobile structures with the reference. Although fitting each mobile using an individualized set of zones offers the best fitting for each mobile to the reference, there may be times when a like vs. like comparison is required. If the number of residues used for fitting varies, RMS deviations cannot be directly compared between structures.

To allow for a like vs. like comparison, the TRIMZONES command resets the fitting zones for each mobile structure to include only fitting residues that are common to all the mobile structures. Thus, by ensuring that the fitting zones are the same for each mobile, the TRIMZONES command allows for a like vs. like comparison.
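The effect of TRIMZONES can be pictured as a set intersection. The Python sketch below assumes that each mobile structure's fitting zones have been reduced to the reference residue numbers they cover; the function name trim_zones and this representation are illustrative assumptions, not ProFit's internal data structures.

```python
def trim_zones(zones_per_mobile):
    """Keep only the residues that are fitted in every mobile structure.

    zones_per_mobile: one iterable of reference residue numbers per mobile.
    Returns the common residues as a sorted list.
    """
    common = set(zones_per_mobile[0])
    for zone in zones_per_mobile[1:]:
        common &= set(zone)
    return sorted(common)
```

After trimming, every mobile structure is fitted on the same number of residues, so the RMS deviations become directly comparable.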

When using READALIGNMENT with multiple structures, the first sequence must appear twice in the alignment file. This is because it is used as both the first reference and mobile set.

Note that a bug in using READALIGNMENT with multiple structure fitting was fixed in V2.3. (The bug caused the program to crash if a deletion appeared in the same place in two or more of the sequences.)

All Versus All Comparisons

As of ProFit V3.0, it is possible to perform an all versus all comparison of the mobile structures when fitting multiple structures. The ALLVSALL command requires that the fitting zones are identical for all mobile structures, and it automatically resets the fitting zones using the TRIMZONES command.

Results are presented as tab-delimited text suitable for loading into a spreadsheet. If the optional filename parameter is given, output is directed to the specified file. If a filename is not specified, or the file cannot be opened, output appears on the screen. If the filename begins with a pipe character (|), the results are piped into the specified program. This is particularly useful with the Unix more (or less) command.
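The choice of output destination can be sketched in Python as below. This is an illustration of the rules just described, not ProFit's code: the names format_matrix and write_results are hypothetical, and the exact layout of the tab-delimited table (a header row of labels, three decimal places) is assumed.

```python
import subprocess
import sys


def format_matrix(labels, matrix):
    """Lay out an all vs. all RMSD matrix as tab-delimited text (assumed layout)."""
    lines = ["\t".join([""] + list(labels))]
    for label, row in zip(labels, matrix):
        lines.append("\t".join([label] + ["%.3f" % value for value in row]))
    return "\n".join(lines) + "\n"


def write_results(text, dest=None):
    """Send text to a pipe, a file or the screen; report which one was used."""
    if dest and dest.startswith("|"):
        # A leading pipe character: run the named program and feed it the text.
        proc = subprocess.Popen(dest[1:], shell=True,
                                stdin=subprocess.PIPE, text=True)
        proc.communicate(text)
        return "pipe"
    if dest:
        try:
            with open(dest, "w") as handle:
                handle.write(text)
            return "file"
        except OSError:
            pass   # the file cannot be opened: fall back to the screen
    sys.stdout.write(text)
    return "screen"
```

Calling write_results(table, "|less") would page through the table, while write_results(table) with no destination prints it to the screen.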

Andrew Martin 2010-09-28