Andrew C.R. Martin1,
Angelo M. Facchiano2,
Alison L. Cuff1,
Tina Hernandez-Boussard3, Pierre Hainaut3, Janet M. Thornton4,5
1School of Animal & Microbial Sciences
University of Reading
Whiteknights, P.O. Box 228, Reading RG6 6AJ, U.K.
2CRISCEB -- Research Center of Computational
and Biotechnological Sciences
Second University of Naples
via Costantinopoli 16, 80138 Napoli, Italy
3International Agency for Research on Cancer,
150 cours Albert Thomas, Lyon 69372, France
4Biomolecular Structure & Modelling Unit,
Department of Biochemistry & Molecular Biology,
University College London,
Gower Street, London WC1E 6BT, U.K.
5Department of Crystallography,
Malet Street, London WC1E 7HX, U.K.
p53 consists of 3 domains: an N-terminal transcription domain, a C-terminal oligomerisation domain and a DNA binding core domain. Mutations in p53 are associated with more than 50% of human cancers and 90% of these mutations are in the core domain. They affect the structural integrity and/or p53-DNA interactions, leading to the partial or complete loss of the protein's function. In some cases, function can be restored using second-site suppressor mutations. Since p53 mediates cell killing in chemotherapy and radiotherapy, the possibility of designing drugs that restore functional activity of p53 is of obvious significance in cancer therapy.
Here we attempt to classify mutations in the core domain according to their effects on the structure of p53. A structural analysis was performed on the p53 crystal structure and the results stored in a relational database. Raw mutation data were collected and imported into the database, which was then used to correlate mutation with structural effect in an automated manner.
The results of this analysis are published on the web
(http://www.rubic.rdg.ac.uk/p53/). In summary, 304 of the 882
distinct mutations were explained in structural terms, increasing to
515 when mutations to amino acids 100% conserved between diverse
species were included.
In future, classifying p53 mutations into structural groups may provide an explanation for such properties as dominant-negative activity, temperature sensitivity and oncogenic potential. The automated method of structural analysis developed here may also be applied to other mutations such as those of dystrophin, BRCA-I and G6PD.
The mechanism of the p53-mediated suppression of cell cycle progression involves arrest within the G1 phase[6,8] as a consequence of the p53-induced synthesis of p21, an inhibitor of cyclin E/cdk2 and cyclin A/cdk2 kinases. In this way, p53 gives DNA repair mechanisms time to correct damage before the genome is replicated. If damage to the cell is severe, p53 initiates apoptosis by inducing transcription of genes encoding proapoptotic factors[7,9].
Tumour-specific p53 mutations were first identified in 1989; point mutations occur in more than 250 codons and are common in many forms of human cancer. Comparisons of p53 sequences from different species indicate 5 blocks of highly conserved residues which coincide with mutation clusters found in p53 in human cancers. 90% of mutations identified in p53 are in the core domain, for which a crystal structure is available (note, however, that this value may be overestimated since most workers have concentrated their research on the core domain). 20% of the mutations are concentrated at 5 "hotspot" codons: 175, 245, 248, 249 and 273.
Endogenous processes, including methylation and deamination of cytosine at CpG residues, free radical damage, and errors that may occur during the synthesis or repair of DNA, can result in p53 mutations. Mutations can also occur via DNA damage induced by exogenous physical or chemical carcinogens. In some cases "mutagen fingerprints" have been identified where certain carcinogens are responsible for specific mutations[12,13]. For example, cigarette smoke causes G:C to T:A transversions in lung cancers while aflatoxin B1 (AFB1) in the diet, particularly in China and Africa, causes G:C to T:A transversions specifically at the third base pair of codon 249 (AGG → AGT) and is associated with liver cancers. Similarly, UVB exposure is associated with CC:GG to TT:AA dipyrimidine transitions in skin cancers.
Inherited p53 mutations are rare. Li et al. suggest 0.01% in the normal population and 0.1-1% in various cancer patients while Guinn and Padua state that only 5% of p53 mutations are inherited. Germ-line mutations in the p53 gene have been observed in several families with Li-Fraumeni syndrome[18,19]. This results in an inherited predisposition to a broad spectrum of cancers including breast cancer, osteosarcomas, soft tissue sarcoma, melanoma, adrenocortical carcinomas and leukemias, all of which appear at an early age.
More than 50% of all cancers involve the decreased or total loss of function of p53. This is caused, in most cases, by point mutations in one p53 allele. These mutations exert a dominant-negative effect over the remaining wild-type allele, resulting in genetic instability, loss of heterozygosity and a detrimental effect on the function of p53. Some may also exert their own oncogenic activity. Correct functioning of p53 is critical to radiation and chemotherapy since both rely on causing DNA damage which triggers apoptosis via p53.
Raw mutation data have been collected over a number of years by groups in Germany and France. The databank of mutations, maintained by Hainaut, now in Release 4, consists of more than 14000 mutations affecting over 300 residues and linked with more than 60 different tumours. This collection of data is now being expanded with information on the pathology and clinical outcome of different mutations and tumours.
The open reading frame of human p53 codes for 393 amino acids with a central DNA-binding core domain (from approximately residue 100-300). The three-dimensional structure of this domain, complexed with DNA, has been determined and is shown in Figure 1. The N-terminal domain contains a strong transcription activation signal while the C-terminal domain mediates oligomerisation. The core domain consists of a large β-sandwich of two anti-parallel sheets of 4 and 5 strands, respectively. This acts as a scaffold supporting 3 loop-based regions -- a loop/β-sheet/α-helix motif (L1), and two large loops (L2 and L3). L2 and L3 are stabilised by zinc coordination and side-chain interactions[21,23]. DNA is bound by L1 and L3 -- the helix and loop for L1 slot into the major groove and L3 binds in the minor groove. The L2 loop stabilises L3 by packing against it. It has been proposed that p53 binds as a tetramer and Pavletich et al. stated the interactions occur through the C-terminal domain (residues 325-356).
p53 mutations at or near the core domain are split into two distinct categories. The majority of distinct mutations affect residues essential for the DNA-binding domain's structural integrity (structural mutations). p53 has been shown to be only marginally stable at body temperature, so any mutation which further reduces stability is likely to lead to unfolding/misfolding in vivo. A smaller class of mutations (functional mutations) affect residues involved in p53-DNA interactions[20,27], or in interactions with other proteins.
In theory, it should be possible to restore at least some functional activity to tumour-derived p53 mutants by (1) enhancing the stability of the protein in its folded state and/or (2) providing additional DNA contacts[20,27]. It is possible to rescue some p53 mutations using second-site suppressor mutations. For example, the "hotspot" mutation G245S causes structural changes in L2 and L3, suggestive of distortion of the conformation necessary for DNA binding. Nikolova et al. found that the suppressor mutant N239Y restored the stability of G245S and resulted in an improvement in DNA binding. They observed similar results using other second-site suppressors to restore some degree of normal function to other p53 mutations. The marginal stability of p53 suggests that it may be possible to restore wild-type activity through design of drugs which bind the correctly folded form, thus moving the equilibrium through simple mass action[20,23,27].
Michalovitz et al. suggested a genetic classification of mutations based on the dominance of their activity. Here we take a different approach to classifying mutations. We attempt to explain the effects of mutations in structural terms. Each of the observed mutations is classified in terms of the effect it is likely to have on the three-dimensional structure. We can define three categories of structural effect: (a) those which prevent the protein from folding into the correct conformation, (b) those which destabilise the folded protein (and may be temperature sensitive), (c) those which are on the surface of p53 and interfere with the interactions of p53 with DNA or other proteins.
We find we are able to rationalise the effects of 34.5% of distinct mutations on purely structural grounds. If we also consider residues which are 100% conserved across a range of species (and therefore likely to be important for the function of p53), this percentage rises to 58.4%. This actually represents 80.5% of the total observed mutations, and those which we cannot explain are thus relatively rare mutation events. Unexplained mutations will fall into one of three classes: (a) those which are not involved in cancer and are non-pathogenic; (b) those which we have genuinely failed to identify, possibly because they have only a slight destabilising effect; (c) those which are on the surface of the p53 core domain and are involved in interactions with the other p53 domains or with other proteins. Mutants in the first category may prove useful as markers to indicate that DNA damage has occurred and this will add to epidemiological information; those in the second category represent a deficiency in the current methodology; those in the third are clearly the most interesting.
We performed a structural analysis of the p53 crystal structure, calculating secondary structure, backbone torsion angles, solvent accessibility and hydrogen bonding parameters and stored these data in a relational database. By also storing mutant data in the database we can correlate structural effects with mutations in a relatively automated fashion.
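The idea of correlating mutations with structural properties through a relational join can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL, the table and column names (`structure`, `mutation`, `access` and so on) are hypothetical, and the rows are made-up toy values rather than p53 data.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database described in the text.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE structure (      -- one row per residue of the crystal structure
        resnum  INTEGER PRIMARY KEY,
        resnam  TEXT,
        access  REAL              -- relative solvent accessibility (%)
    );
    CREATE TABLE mutation (       -- one row per observed substitution
        codon   INTEGER,
        native  TEXT,
        mutant  TEXT
    );
    """
)

# Toy rows only; residue numbers and accessibilities are invented.
conn.executemany("INSERT INTO structure VALUES (?, ?, ?)",
                 [(1, "G", 2.0), (2, "R", 45.0)])
conn.executemany("INSERT INTO mutation VALUES (?, ?, ?)",
                 [(1, "G", "V"), (2, "R", "H"), (2, "R", "W")])


def buried_mutations(connection, cutoff=10.0):
    """Return mutations at residues whose accessibility is below `cutoff`."""
    query = """
        SELECT m.codon, m.native, m.mutant, s.access
        FROM mutation AS m
        JOIN structure AS s ON s.resnum = m.codon
        WHERE s.access < ?
        ORDER BY m.codon, m.mutant
    """
    return connection.execute(query, (cutoff,)).fetchall()


print(buried_mutations(conn))
```

Keying both tables on the residue number is what lets a single query pair every mutation with the structural parameters of the residue it affects.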
The raw mutation data (ftp://ftp.ebi.ac.uk/pub/databases/p53/) were imported into a PostgreSQL relational database (http://www.PostgreSQL.org/) using a script written in Perl to make small changes to the format. The raw data contain p53 mutations associated with human cancers identified by sequencing and published in the literature. These data include mutations found in normal, pre-neoplastic and neoplastic tissues, including metastases, as well as cell lines derived from such tissues. The data file contains 34 columns and includes data on cell-line, codon, DNA base and amino acid substitution, International Classification of Diseases for Oncology (ICD-O) tumour-site, tumour morphology and histology, tumour grade or stage, and risk factors (sex, country of origin, smoking status and alcohol consumption).
We considered both in-frame and out-of-frame insertions in the same manner; in both cases it is clear that the function of p53 could be disrupted. We also flagged silent point mutations. Earlier versions of the p53 data required considerable clean-up during this procedure; the current dataset required minimal clean-up (some frameshift mutants classified as "point" rather than "del" or "ins", minor changes to the page numbering format of references, etc.). For completeness, the citation data were also imported into a second database table.
We considered sequence variability on the basis that residues which are 100% conserved across such a diverse selection of species must be conserved for functional reasons. Thus we may not have direct structural explanations of why mutations to these residues might affect the function of p53, but we know that these residues are critical to the function of p53 and this is likely to be as a result of interactions with other proteins.
At each residue position in the fingerprint regions, the sequence
variability was assessed using a score based on the PET91 mutation
matrix normalised such that all scores on the
diagonal are maximal and equal. The score is calculated as the average
pairwise sum of the matrix scores normalised by the maximum score in
Each unique sidechain replacement is also assessed on the basis of steric acceptability. The current procedure is again very simple; we adopt a minimum perturbation protocol (MPP) to model the new sidechain into the 3D crystal structure of the p53 core domain and then count any bad clashes with the substituted sidechain. MPP proceeds as follows:
A bad contact is defined as two atoms whose centres are closer than 2.5Å -- this is a simple good/bad assessment; no degree of bad contact is calculated. We take 3 clashes as being indicative of a sidechain replacement which cannot be accommodated. Again this is a conservative decision; it appears that 2 clashes are sufficient to disrupt the structure in many cases.
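The clash criterion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 2.5Å cutoff and the threshold of 3 clashes come from the text, while the function names, the choice to count atom pairs, and the toy coordinates are assumptions made for the example.

```python
import math
from itertools import product

CLASH_DISTANCE = 2.5   # Å; atom centres closer than this form a bad contact
CLASH_THRESHOLD = 3    # number of bad contacts taken as unacceptable


def count_bad_contacts(sidechain_atoms, environment_atoms, cutoff=CLASH_DISTANCE):
    """Count (sidechain atom, environment atom) pairs closer than `cutoff`.

    Each atom is an (x, y, z) tuple in Å. The assessment is a simple
    good/bad one per pair; no degree of overlap is calculated.
    """
    return sum(
        1
        for side, env in product(sidechain_atoms, environment_atoms)
        if math.dist(side, env) < cutoff
    )


def cannot_be_accommodated(sidechain_atoms, environment_atoms):
    """True when the modelled sidechain makes at least CLASH_THRESHOLD clashes."""
    return count_bad_contacts(sidechain_atoms, environment_atoms) >= CLASH_THRESHOLD


# Toy coordinates (invented): two sidechain atoms near two environment atoms.
sidechain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
environment = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (10.0, 10.0, 10.0)]
print(count_bad_contacts(sidechain, environment))
print(cannot_be_accommodated(sidechain, environment))
```

Lowering `CLASH_THRESHOLD` to 2 would give the less conservative assessment mentioned above.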
By using the ability of PostgreSQL to allow user-defined functions, the clash assessment can be performed on-the-fly. In practice, for speed reasons, it is useful to cache the results of all unique sidechain replacements into a column in another database table. This can be achieved by performing a single SQL query on the database. These data were stored in a fourth table keyed by residue (codon) number and replacement residue type.
Table I summarizes the mutation data from Release 4 of the p53 mutation databank. For the purposes of this investigation, we have concentrated on analyzing the distinct mutations which result in a simple amino acid substitution in the core domain for which a crystal structure is available. As the table shows, there are 882 of these. This is approximately 51% of the total number of the distinct mutations; the remaining 49% are either more complex mutations, insertions, deletions, or occur outside the core domain. These simple substitution mutations in the core domain represent 69.8% of the total number of observed mutations.
As described by Baker and Hubbard the following residues are classified as able to donate a hydrogen bond: H,K,N,Q,R,S,T,W,Y while the following residues can accept a hydrogen bond: D,E,H,N,Q,S,T,Y. There is a total of 4703 substitution mutations (309 distinct mutations) involving hydrogen bonding residues. Using our conservative assessment of explaining hydrogen bonding mutations (described in the Methods) where we do not consider the precise geometry and assume that a small local rearrangement can be accommodated, we find that we can explain 43.2% of observed mutations to hydrogen bonding residues (52.5% of distinct mutations). See Table II.
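A conservative donor/acceptor test of this kind might look like the sketch below. The two residue sets are those listed above; the function names and the assumption that the role of the native sidechain in the hydrogen bond (donor or acceptor) is already known from the structural analysis are ours, and no geometry is considered.

```python
# Residue classes as listed in the text (one-letter codes).
DONORS = set("HKNQRSTWY")
ACCEPTORS = set("DEHNQSTY")


def can_play_role(residue, role):
    """Return True if `residue` can act as a 'donor' or 'acceptor'."""
    if role == "donor":
        return residue in DONORS
    if role == "acceptor":
        return residue in ACCEPTORS
    raise ValueError("role must be 'donor' or 'acceptor'")


def hbond_lost(native, replacement, role):
    """True when the replacement cannot take over the native residue's role.

    `role` is the part played by the native sidechain in the observed
    hydrogen bond (an input assumed to come from the structure analysis).
    """
    if not can_play_role(native, role):
        raise ValueError(f"{native} cannot act as a {role}")
    return not can_play_role(replacement, role)


print(hbond_lost("R", "W", "donor"))     # W can still donate
print(hbond_lost("R", "L", "donor"))     # L cannot donate
print(hbond_lost("D", "K", "acceptor"))  # K cannot accept
```

Because only the capability is tested, a replacement such as Arg to Trp counts as maintaining the bond even though a local rearrangement might be needed in practice.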
Of the 332 mutations resulting in a substitution by proline, 320 occur in the core at 50 distinct sites. Table III shows these core domain substitutions together with the backbone torsion angles of the parent structure. Those combinations which fall in disallowed regions for proline are indicated. We define the allowed regions for Proline as and ( or ). 47 of the 50 mutations (94%) are disallowed and will thus result in disruption of the structure. Some of these, however, are borderline and may be accommodated by a very small rearrangement (e.g. Leu137 → Pro). The 47 disallowed proline mutation sites are illustrated in Figure 3.
Because it has no sidechain, glycine is able to adopt conformations which are sterically hindered for other amino acids. Substitution of any native glycine residues which adopt one of these conformations will thus result in disruption of the structure resulting in an incorrectly folded protein.
The allowed regions of the Ramachandran plot for non-glycine/non-proline residues are, for this purpose, defined as: ( ) or ( ) or ( ) or ( ). All non-glycine residues in the p53 crystal structure fall within these limits.
With the exception of the glycine residues at codons 117, 154, 187, 244, 245 and 262, all the others fall in regions allowed for other amino acids. Therefore, only mutations to these 6 glycines will result in disruption of the structure. These sites are illustrated in Figure 4. Table IV shows the substitutions of glycine residues by other amino acids and it can be seen that 32 of 53 core region distinct mutations (60.4%) are disallowed.
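The glycine test amounts to asking whether the native backbone torsion angles lie inside any region allowed for non-glycine residues. The sketch below shows the logic only. The rectangular limits in `EXAMPLE_REGIONS` are placeholder values chosen for illustration and are not the limits used in this work (which are not reproduced in this text); the function names are likewise hypothetical.

```python
# Each region is (phi_min, phi_max, psi_min, psi_max) in degrees.
# PLACEHOLDER limits for illustration only; not the values used in the paper.
EXAMPLE_REGIONS = [
    (-180.0, -30.0, -180.0, 180.0),
    (30.0, 90.0, 0.0, 90.0),
]


def in_allowed_region(phi, psi, regions):
    """True if (phi, psi) falls inside at least one rectangular region."""
    return any(
        phi_min <= phi <= phi_max and psi_min <= psi <= psi_max
        for phi_min, phi_max, psi_min, psi_max in regions
    )


def glycine_substitution_disallowed(phi, psi, regions):
    """A native glycine outside every allowed region cannot be replaced
    by another amino acid without disrupting the backbone conformation."""
    return not in_allowed_region(phi, psi, regions)


print(glycine_substitution_disallowed(-60.0, -45.0, EXAMPLE_REGIONS))
print(glycine_substitution_disallowed(90.5, -120.0, EXAMPLE_REGIONS))
```

The same function could serve for the proline test by passing the proline-specific regions instead.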
At these 14 sites, a total of 2383 mutations is observed, 74 of which are distinct. While mutations at the more peripheral of these sites may, in some circumstances, allow DNA still to bind, the stability of the complex and the specificity of DNA binding is likely to be affected and this will affect the function of the protein.
While we cannot offer a direct structural explanation for many of these, one can assume that they are conserved throughout evolution for a good reason and, in the case of surface residues, this is likely to be that the amino acid is critical for interactions with other proteins. In total, 6169 mutations resulting in amino acid substitutions (395 distinct) occur at these 73 conserved residues.
Some mutations can be explained in multiple ways. This is shown in
detail on the web site
(http://www.rubic.rdg.ac.uk/p53/). In total, we were able to
explain 304 of the 882 distinct mutations resulting in substitutions
in the core domain (34.5%) on purely structural grounds. If mutations
to 100% conserved amino acids are also considered, then this number
rises to 515 of 882 distinct mutations (58.4%).
Of the unexplained mutations, it might be expected that the majority of these will be on the surface. Using a cutoff of 10% accessibility to classify a residue as exposed, we actually find that only 236 of the 367 unexplained distinct mutations (64.31%) are exposed.
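The summary percentages can be checked with a few lines of arithmetic. The sketch takes as its denominator the 882 distinct simple substitutions in the core domain reported with Table I; the variable names are ours.

```python
TOTAL_DISTINCT = 882        # distinct simple substitutions in the core domain (Table I)
EXPLAINED_STRUCTURAL = 304  # explained on purely structural grounds
EXPLAINED_WITH_CONS = 515   # also counting 100% conserved residues
UNEXPLAINED_EXPOSED = 236   # unexplained mutations at exposed residues


def percent(part, whole, digits=1):
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100.0 * part / whole, digits)


unexplained = TOTAL_DISTINCT - EXPLAINED_WITH_CONS

print(percent(EXPLAINED_STRUCTURAL, TOTAL_DISTINCT))   # structural only
print(percent(EXPLAINED_WITH_CONS, TOTAL_DISTINCT))    # with conservation
print(unexplained)                                      # remaining distinct mutations
print(percent(UNEXPLAINED_EXPOSED, unexplained, 2))    # exposed fraction
```

With this denominator the figures reproduce the 34.5%, 58.4%, 367 and 64.31% quoted in the text.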
Note that our criteria for classifying a mutation as explained are fairly strict. For example we assume that any hydrogen-bonding sidechain substitution will be able to maintain the hydrogen bond if it has donor or acceptor capabilities the same as the parent; in practice, a structural change may be necessary.
Clearly the sidechain replacement assessment could be made much more sophisticated and will be addressed in future work. A minimisation procedure could be incorporated into the sidechain replacement together with a measure of the degree of bad contact rather than a simple yes/no assessment of clashes. In addition we could use X-Site scores to assess the acceptability of sidechain replacements and account for large sidechains being replaced by smaller sidechains thus creating a void in the structure. Similarly rather than simply assessing residues on the basis of ability to donate or accept hydrogen bonds, it would be possible to assess the geometry of replacements which, in principle, are able to maintain the required ability.
Excluding those mutations for which we have genuinely not identified a structural explanation, some mutants may actually have a silent non-pathogenic phenotype. More interesting are those which are on the surface of the p53 core domain and are involved in interactions with the other p53 domains or with other proteins. In future, we intend to apply the patch analysis methodology of Jones and Thornton[38,39] to identify regions of the protein surface likely to be involved in protein-protein interactions.
In the long term, it is hoped that properties of p53 mutations, such as dominant negative activity, oncogenic potential and temperature-sensitivity may be explained by classification of p53 mutations into structural groups whose molecular basis may then be analysed.
We see this approach not only as a useful tool in examination of p53 mutations, but also as a paradigm for the study of many other diseases caused by point mutations. In the near future, when structural data become available, it will become possible to apply the same forms of analysis to dystrophin, BRCA-I and G6PD -- in all cases mutation databanks are available.
This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
A few hand fixes were then needed! The colour figures weren't imported properly - these were done by hand. The enumerate list for stages in clash detection got corrupted and the following 2 paragraphs were missing. The icons directory was also not correctly specified and had to be created by hand. The translation was done under the name webpaper.html, but the final file was renamed to index.html. This meant it was necessary to change references to webpaper.html in the HTML to index.html
The translation was initiated by Andrew C.R. Martin on 2000-11-16