Next: Version numbering Up: ProFit Version 3.1 Previous: ProFit Version 3.1

Introduction and Methodology

ProFit (pronounced Pro-Fit, not profit!) is designed to be the ultimate program for performing least-squares fits of two or more protein structures. It performs a very simple and basic function, but allows as much flexibility as possible in performing it. Thus one can specify the subset of atoms to be considered, and specify the zones to be fitted by residue number, by sequence, or by sequence alignment.

Early versions of ProFit did not try to address the question of sorting out equivalent atoms for you beyond doing a sequence alignment; other programs, such as SSAP and GAFIT, address that problem. You must specify which residues and atoms you consider to be equivalent, although the program can perform an internal sequence alignment to set the zones automatically.
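Setting zones from a sequence alignment amounts to walking along the two aligned sequences and collecting each run of positions where neither chain has a gap. A minimal sketch, assuming gaps are written as `-` and that residues in each chain are numbered consecutively; the function `zones_from_alignment` is hypothetical and is not ProFit's own code:

```python
def zones_from_alignment(aln_ref, aln_mob, start_ref=1, start_mob=1):
    """Turn a pairwise alignment (gaps written as '-') into equivalent residue zones.

    Returns one ((ref_first, ref_last), (mob_first, mob_last)) pair per ungapped run,
    assuming residues in each chain are numbered consecutively from start_ref/start_mob.
    """
    if len(aln_ref) != len(aln_mob):
        raise ValueError("aligned sequences must be the same length")
    zones = []
    run = None                   # [ref_first, mob_first, length] of the open zone
    r, m = start_ref, start_mob  # numbers of the next residue in each chain

    def close(run):
        rf, mf, n = run
        zones.append(((rf, rf + n - 1), (mf, mf + n - 1)))

    for a, b in zip(aln_ref, aln_mob):
        if a != "-" and b != "-":    # residues aligned in both chains: extend the zone
            if run is None:
                run = [r, m, 0]
            run[2] += 1
        elif run is not None:        # a gap in either chain closes the open zone
            close(run)
            run = None
        if a != "-":
            r += 1
        if b != "-":
            m += 1
    if run is not None:
        close(run)
    return zones


print(zones_from_alignment("ACDEFG-HIK", "AC-EFGWHIK"))
# [((1, 2), (1, 2)), ((4, 6), (3, 5)), ((7, 9), (7, 9))]
```

Each gap splits the alignment, so the insertion of W in the second chain shifts its later residue numbers without breaking the equivalence of the flanking zones.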

Since ProFit V2.0, iterative updating of fitting zones has been supported. You may therefore give a sequence alignment, or just a small fragment of at least 3 amino acids, to initiate the fitting process. Fitting is performed on this region, then all residue pairs lying within 3Å of each other after the fit are added to the fitting zones and the fit is repeated. This iterates until the C$\alpha$ RMSd converges to within 0.01Å. Iterative updating is particularly useful in conjunction with an initial zone specification based on sequence alignment, and convergence typically takes 3-4 cycles.
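The iterative zone-updating loop can be sketched as follows. This is a minimal illustration under several assumptions: residue pair i is taken to be (reference residue i, mobile residue i), which is simpler than whatever pairing rule ProFit itself applies; each residue is represented by a single C$\alpha$ coordinate; and the least-squares step (`superpose`) swaps in Horn's closed-form unit-quaternion method, solved with a small Jacobi eigen-solver, instead of McLachlan's algorithm. All function names are hypothetical.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _max_eigvec(m, sweeps=30):
    """Eigenvector for the largest eigenvalue of a small symmetric matrix (cyclic Jacobi)."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                t = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])  # zeroes a[p][q]
                g = [[float(i == k) for k in range(n)] for i in range(n)]
                g[p][p] = g[q][q] = math.cos(t)
                g[p][q], g[q][p] = math.sin(t), -math.sin(t)
                a = _matmul([list(r) for r in zip(*g)], _matmul(a, g))
                v = _matmul(v, g)
    k = max(range(n), key=lambda i: a[i][i])
    return [v[i][k] for i in range(n)]


def superpose(mobile, ref):
    """Least-squares rotation taking mobile onto ref (Horn's quaternion method), plus centroids."""
    n = len(mobile)
    cm = [sum(p[i] for p in mobile) / n for i in range(3)]
    cr = [sum(p[i] for p in ref) / n for i in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = [
        [sum((p[a] - cm[a]) * (r[b] - cr[b]) for p, r in zip(mobile, ref)) for b in range(3)]
        for a in range(3)]
    q0, q1, q2, q3 = _max_eigvec([[xx + yy + zz, yz - zy, zx - xz, xy - yx],
                                  [yz - zy, xx - yy - zz, xy + yx, zx + xz],
                                  [zx - xz, xy + yx, yy - xx - zz, yz + zy],
                                  [xy - yx, zx + xz, yz + zy, zz - xx - yy]])
    rot = [[q0*q0 + q1*q1 - q2*q2 - q3*q3, 2 * (q1*q2 - q0*q3), 2 * (q1*q3 + q0*q2)],
           [2 * (q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2 * (q2*q3 - q0*q1)],
           [2 * (q1*q3 - q0*q2), 2 * (q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]]
    return rot, cm, cr


def apply(rot, cm, cr, coords):
    """Rotate coords about the mobile centroid cm, then move them onto the reference centroid cr."""
    return [[sum(rot[i][k] * (p[k] - cm[k]) for k in range(3)) + cr[i] for i in range(3)]
            for p in coords]


def rmsd(a, b):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3)) / len(a))


def iterative_fit(ref, mob, zone, cutoff=3.0, tol=0.01, max_cycles=20):
    """Refit on every pair within `cutoff` until the RMSd over the zone changes by < tol.

    Pair i is (ref[i], mob[i]): a simplifying assumption, not ProFit's own pairing rule.
    """
    if len(zone) < 3:
        raise ValueError("need at least 3 residue pairs to start fitting")
    prev = None
    for _ in range(max_cycles):
        rot, cm, cr = superpose([mob[i] for i in zone], [ref[i] for i in zone])
        fitted = apply(rot, cm, cr, mob)
        rms = rmsd([fitted[i] for i in zone], [ref[i] for i in zone])
        new_zone = [i for i in range(len(ref)) if math.dist(fitted[i], ref[i]) <= cutoff]
        if (prev is not None and abs(rms - prev) < tol) or len(new_zone) < 3:
            break
        prev, zone = rms, new_zone
    return zone, rms, fitted


# Hypothetical test data: a helical C-alpha trace and a rigidly moved copy of it.
ref = [[2.3 * math.cos(1.75 * i), 2.3 * math.sin(1.75 * i), 1.5 * i] for i in range(12)]
c1, s1, c2, s2 = math.cos(0.7), math.sin(0.7), math.cos(0.4), math.sin(0.4)
mob = [[c1 * x - s1 * y + 5.0, c2 * (s1 * x + c1 * y) - s2 * z - 3.0,
        s2 * (s1 * x + c1 * y) + c2 * z + 1.0] for x, y, z in ref]
for i in (9, 10, 11):  # the last three residues adopt a different conformation
    mob[i][0] += 10.0

zone, rms, _ = iterative_fit(ref, mob, zone=[0, 1, 2])
print(zone, round(rms, 3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 0.0
```

Starting from three residues, the first fit already places the nine unchanged residues on top of the reference, so they join the zone; the three displaced residues stay 10Å away and are never added, and the loop stops on the second cycle because the RMSd no longer changes.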

ProFit V2.0 also introduced multiple structure fitting. The first structure file is used as the reference for the first fitting stage, but the coordinates are averaged after each stage to derive a template used for subsequent fitting. That is, given $N$ files to fit, file 2 is fitted to file 1 and an averaged structure, $A$, is calculated; file 3 is then fitted to $A$ and a new average, $A'$, is calculated. This continues until all $N$ structures have been fitted. The whole procedure iterates until convergence (typically 3 or 4 cycles).
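The averaged-template scheme can be sketched in the same spirit. The first cycle follows the procedure just described; what the later cycles do is not spelled out, so the sketch assumes that every structure is refitted to the previous cycle's average until the average stops moving. Each structure is assumed to be an equal-length list of equivalent atom coordinates, the helper names are hypothetical, and the fitting step swaps in Horn's unit-quaternion least-squares method rather than McLachlan's algorithm.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _max_eigvec(m, sweeps=30):
    """Eigenvector for the largest eigenvalue of a small symmetric matrix (cyclic Jacobi)."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                t = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])  # zeroes a[p][q]
                g = [[float(i == k) for k in range(n)] for i in range(n)]
                g[p][p] = g[q][q] = math.cos(t)
                g[p][q], g[q][p] = math.sin(t), -math.sin(t)
                a = _matmul([list(r) for r in zip(*g)], _matmul(a, g))
                v = _matmul(v, g)
    k = max(range(n), key=lambda i: a[i][i])
    return [v[i][k] for i in range(n)]


def superpose(mobile, ref):
    """Least-squares rotation taking mobile onto ref (Horn's quaternion method), plus centroids."""
    n = len(mobile)
    cm = [sum(p[i] for p in mobile) / n for i in range(3)]
    cr = [sum(p[i] for p in ref) / n for i in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = [
        [sum((p[a] - cm[a]) * (r[b] - cr[b]) for p, r in zip(mobile, ref)) for b in range(3)]
        for a in range(3)]
    q0, q1, q2, q3 = _max_eigvec([[xx + yy + zz, yz - zy, zx - xz, xy - yx],
                                  [yz - zy, xx - yy - zz, xy + yx, zx + xz],
                                  [zx - xz, xy + yx, yy - xx - zz, yz + zy],
                                  [xy - yx, zx + xz, yz + zy, zz - xx - yy]])
    rot = [[q0*q0 + q1*q1 - q2*q2 - q3*q3, 2 * (q1*q2 - q0*q3), 2 * (q1*q3 + q0*q2)],
           [2 * (q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2 * (q2*q3 - q0*q1)],
           [2 * (q1*q3 - q0*q2), 2 * (q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]]
    return rot, cm, cr


def apply(rot, cm, cr, coords):
    """Rotate coords about the mobile centroid cm, then move them onto the reference centroid cr."""
    return [[sum(rot[i][k] * (p[k] - cm[k]) for k in range(3)) + cr[i] for i in range(3)]
            for p in coords]


def rmsd(a, b):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3)) / len(a))


def average(structs):
    """Atom-by-atom mean of several equal-length coordinate lists."""
    return [[sum(s[j][k] for s in structs) / len(structs) for k in range(3)]
            for j in range(len(structs[0]))]


def multi_fit(structures, tol=1e-3, max_cycles=10):
    """Fit all structures onto a running-average template.

    Cycle 1: structure 2 -> structure 1, then each later one -> the running average.
    Later cycles (an assumption) refit everything to the previous cycle's average.
    """
    fitted = [[list(p) for p in structures[0]]]
    template = fitted[0]
    for s in structures[1:]:
        fitted.append(apply(*superpose(s, template), s))
        template = average(fitted)
    for _ in range(max_cycles - 1):
        fitted = [apply(*superpose(s, template), s) for s in fitted]
        new_template = average(fitted)
        moved = rmsd(new_template, template)
        template = new_template
        if moved < tol:
            break
    return fitted, template


# Hypothetical test data: four wobbled copies of a helix, each placed differently.
base = [[2.3 * math.cos(1.75 * i), 2.3 * math.sin(1.75 * i), 1.5 * i] for i in range(10)]
models = []
for k in range(4):
    c, s = math.cos(0.5 * k), math.sin(0.5 * k)
    wobbled = [[x + 0.1 * math.sin(3 * i + k), y, z] for i, (x, y, z) in enumerate(base)]
    models.append([[c * x - s * y + k, s * x + c * y - k, z + 2 * k] for x, y, z in wobbled])

fitted, template = multi_fit(models)
print([round(rmsd(f, template), 3) for f in fitted])
```

Fitting to an average rather than always to file 1 keeps any one structure's idiosyncrasies from dominating the template once several structures have been added.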

The program outputs an RMS deviation and, optionally, the fitted coordinates. RMS deviations over alternative zones and atoms may also be calculated without performing a new fit, so the zones used for calculating the RMS deviation can differ from those used for fitting.
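Recomputing an RMS deviation over a different zone needs no new fit: the rotation and translation found by the fit are applied to the whole mobile structure, and $\sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}$, where $d_i$ is the distance between the $i$th pair of equivalent atoms, is evaluated over whichever $n$ pairs are chosen. In this minimal sketch the stored transformation is hard-coded rather than produced by a real fit, and `transform` and `zone_rmsd` are hypothetical names:

```python
import math


def transform(rot, trans, coords):
    """Apply a stored rotation matrix and translation vector to every atom."""
    return [[sum(rot[i][k] * p[k] for k in range(3)) + trans[i] for i in range(3)]
            for p in coords]


def zone_rmsd(ref, fitted, zone):
    """RMS deviation over any chosen subset of atom indices; no new fit is made."""
    sq = sum((ref[i][k] - fitted[i][k]) ** 2 for i in zone for k in range(3))
    return math.sqrt(sq / len(zone))


# Hypothetical data: the mobile copy is the reference rotated +90 degrees about z,
# with the z coordinate of atom 3 moved by 2 A.
ref = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
mobile = [[0, 0, 0], [0, 1, 0], [-1, 1, 0], [-1, 0, 3]]
# Transformation assumed to have been kept from an earlier fit: rotate -90 degrees about z.
rot, trans = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]], [0, 0, 0]

fitted = transform(rot, trans, mobile)
print(zone_rmsd(ref, fitted, [0, 1, 2]))     # 0.0  (e.g. the fitting zone)
print(zone_rmsd(ref, fitted, [3]))           # 2.0  (a different zone)
print(zone_rmsd(ref, fitted, [0, 1, 2, 3]))  # 1.0  (all atoms)
```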

Although ProFit is optimised for proteins, non-protein structures may also be fitted if they are stored in the standard Protein Data Bank (PDB) format.
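PDB coordinate records use fixed columns, and ligands and other hetero groups are normally written as HETATM rather than ATOM records. The sketch below reads the relevant columns of both record types; `read_pdb_atoms` is a hypothetical helper, not ProFit's own reader, and it ignores alternate locations, occupancies and multiple models:

```python
def read_pdb_atoms(lines):
    """Pull atoms out of PDB ATOM/HETATM records using the fixed column layout.

    Columns (1-based): atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, insertion code 27, x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, TER, END and other record types
        atoms.append((
            line[21],              # chain identifier
            int(line[22:26]),      # residue number
            line[26:27].strip(),   # insertion code ('' if blank)
            line[17:20].strip(),   # residue name
            line[12:16].strip(),   # atom name
            (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


sample = [
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "HETATM  100  O   HOH A 201       1.000   2.000   3.000  1.00  0.00",
    "TER",
]
for atom in read_pdb_atoms(sample):
    print(atom)
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together, for example a four-digit residue number followed directly by an insertion code.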

ProFit is written to be as portable between systems as possible and uses a command-driven interface.

ProFit uses the McLachlan fitting algorithm, essentially a steepest descents minimisation, as described in McLachlan, A.D. (1982) Rapid Comparison of Protein Structures, Acta Cryst. A38, 871-873. This part of the code is based on an implementation by Dr. Mike Sutcliffe.
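The least-squares fitting step itself can be sketched as below. This is not McLachlan's algorithm: it swaps in Horn's closed-form unit-quaternion method, which minimises the same sum of squared deviations between equivalent atoms and so should, in principle, reach the same superposition. The optimal rotation is read off the eigenvector of the largest eigenvalue of a 4x4 matrix built from the cross-covariance of the centred coordinates, found here with a small Jacobi eigen-solver to stay within the standard library. The names `superpose`, `apply` and `rmsd` are hypothetical and unrelated to ProFit's source.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _max_eigvec(m, sweeps=30):
    """Eigenvector for the largest eigenvalue of a small symmetric matrix (cyclic Jacobi)."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                t = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])  # zeroes a[p][q]
                g = [[float(i == k) for k in range(n)] for i in range(n)]
                g[p][p] = g[q][q] = math.cos(t)
                g[p][q], g[q][p] = math.sin(t), -math.sin(t)
                a = _matmul([list(r) for r in zip(*g)], _matmul(a, g))
                v = _matmul(v, g)
    k = max(range(n), key=lambda i: a[i][i])
    return [v[i][k] for i in range(n)]


def superpose(mobile, ref):
    """Least-squares rotation taking mobile onto ref (Horn's quaternion method), plus centroids."""
    n = len(mobile)
    cm = [sum(p[i] for p in mobile) / n for i in range(3)]
    cr = [sum(p[i] for p in ref) / n for i in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = [
        [sum((p[a] - cm[a]) * (r[b] - cr[b]) for p, r in zip(mobile, ref)) for b in range(3)]
        for a in range(3)]
    q0, q1, q2, q3 = _max_eigvec([[xx + yy + zz, yz - zy, zx - xz, xy - yx],
                                  [yz - zy, xx - yy - zz, xy + yx, zx + xz],
                                  [zx - xz, xy + yx, yy - xx - zz, yz + zy],
                                  [xy - yx, zx + xz, yz + zy, zz - xx - yy]])
    rot = [[q0*q0 + q1*q1 - q2*q2 - q3*q3, 2 * (q1*q2 - q0*q3), 2 * (q1*q3 + q0*q2)],
           [2 * (q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2 * (q2*q3 - q0*q1)],
           [2 * (q1*q3 - q0*q2), 2 * (q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3]]
    return rot, cm, cr


def apply(rot, cm, cr, coords):
    """Rotate coords about the mobile centroid cm, then move them onto the reference centroid cr."""
    return [[sum(rot[i][k] * (p[k] - cm[k]) for k in range(3)) + cr[i] for i in range(3)]
            for p in coords]


def rmsd(a, b):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3)) / len(a))


# Hypothetical test data: a helical trace and a rigidly rotated and translated copy.
ref = [[2.3 * math.cos(1.75 * i), 2.3 * math.sin(1.75 * i), 1.5 * i] for i in range(8)]
c1, s1, c2, s2 = math.cos(0.7), math.sin(0.7), math.cos(0.4), math.sin(0.4)
mobile = [[c1 * x - s1 * y + 5.0, c2 * (s1 * x + c1 * y) - s2 * z - 3.0,
           s2 * (s1 * x + c1 * y) + c2 * z + 1.0] for x, y, z in ref]

rot, cm, cr = superpose(mobile, ref)
print(rmsd(mobile, ref), rmsd(apply(rot, cm, cr, mobile), ref))  # large before, ~0 after
```

A closed-form solution was chosen for the sketch because it needs no step size or convergence test; an iterative minimiser such as the one cited above would be expected to arrive at the same rotation for well-conditioned coordinates.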

In summary, ProFit has the following features:

  1. Portability between different operating systems
  2. Ability to specify atom subsets
  3. Ability to specify zones by residue number or by sequence
  4. Output RMS deviation over the fitted zones and atoms, or over alternative zones and atoms
  5. Optionally output fitted coordinates in PDB format
  6. Integrated help facility
  7. Fitting zones derived from sequence alignment
  8. Iterative updating of fitting zones
  9. Multiple structure fitting

Andrew Martin 2010-09-28