Secondary databases

Secondary databanks store information derived from the raw data in primary databanks. Typically they contain characteristic information extracted from families of related protein sequences, and they are often very highly annotated.

Each secondary databank generally provides a specialized search tool that exploits the characteristic information for each protein family. You can scan a protein sequence with such a tool to predict whether the protein matches the characteristics of a known family.
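
As a rough illustration of how a search tool can exploit the characteristic information for a family, the sketch below converts a PROSITE-style pattern into a regular expression and scans a protein sequence for matches. It is only a sketch: it handles a simplified subset of the pattern syntax, the function names prosite_to_regex and scan are hypothetical, and the example sequence is made up. The search tools provided by the databanks themselves are considerably more sophisticated.

```python
import re

# One pattern element: an optional '<' anchor, a residue specification,
# an optional repeat count such as (3) or (2,4), and an optional '>' anchor.
# Assumption: only this simplified subset of the pattern syntax is supported.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                        # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {P} means any residue except P
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based position, matched residues) for every hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets overlapping hits be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the sequence below is invented for illustration.
hits = scan("MKNASAANPSGNGTL", "N-{P}-[ST]-{P}")
print(hits)  # [(3, 'NASA'), (12, 'NGTL')]
```

The asparagine at position 8 is not reported because it is followed by a proline, which the pattern excludes. A short pattern like this one matches many unrelated sequences by chance, so a hit is a prediction to be checked rather than proof of family membership.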

Secondary databanks with information on functional sites and domains, such as PROSITE, PRINTS, SMART, Pfam, and ProDom, are vital resources for identifying distant relationships in novel sequences, and hence for predicting protein function and structure.

InterPro is a recent effort to amalgamate the annotations from these different databanks and to provide a unified search interface across their different sets of characteristics.

Practical work

You will now try a search using InterPro.

Follow the link in the menu on the left.

