Information
strucclus is a program to perform clustering of residues in a PDB file.
To run the program, you need a PDB file and a file containing a list of the residues to cluster. Each residue is given in the form [c]nnn[i], where [c] is an optional chain label, nnn is the residue number and [i] is an optional insertion code.
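As an illustration of the residue specifier format, here is a minimal Python sketch of a parser. It is not taken from strucclus itself. The names parse_residue_spec and ResidueSpec are hypothetical, and the sketch assumes that the chain label and insertion code are each a single letter and that the residue number is an integer (possibly negative); strucclus may accept a wider range of inputs.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar (not confirmed by the strucclus documentation):
# optional single-letter chain, integer residue number,
# optional single-letter insertion code.
_RESIDUE_SPEC = re.compile(r"^([A-Za-z])?(-?\d+)([A-Za-z])?$")


class ResidueSpec(NamedTuple):
    chain: Optional[str]   # None when no chain label is given
    number: int            # residue number
    insert: Optional[str]  # None when no insertion code is given


def parse_residue_spec(text: str) -> ResidueSpec:
    """Split a specifier such as 'L27A' into chain, number and insertion code."""
    match = _RESIDUE_SPEC.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid residue specifier: {text!r}")
    chain, number, insert = match.groups()
    return ResidueSpec(chain, int(number), insert)
```

Under these assumptions, "L27A" is read as chain L, residue 27, insertion code A, while "27A" is read as residue 27 with insertion code A and no chain.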
By default, the program displays every clustering from a single cluster up to N clusters (where N is the number of residues being clustered). This can be overridden with the -n flag.
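To show how one run can yield a clustering for every count from N down to 1, here is a small Python sketch. The clustering method used by strucclus is not described here, so the sketch uses single-linkage agglomerative clustering purely as a stand-in. The function name clusterings_by_count and the use of one 3D coordinate per residue are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def clusterings_by_count(points: Sequence[Point]) -> Dict[int, List[List[int]]]:
    """Return a mapping from cluster count k to a partition of point indices.

    Assumption: single-linkage merging, chosen only as a simple stand-in.
    """
    clusters = [[i] for i in range(len(points))]
    levels: Dict[int, List[List[int]]] = {}
    if clusters:
        levels[len(clusters)] = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None  # (distance, index a, index b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        levels[len(clusters)] = [sorted(c) for c in clusters]
    return levels
```

Each merge reduces the number of clusters by one, so the returned mapping has one entry per cluster count from N down to 1.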
We have used this software to look at clusters that may be related to pathogenic mutations vs. silent mutations. By clustering into different numbers of clusters and then performing a chi-squared test on each to see how well the clustering separates the pathogenic and silent mutations, we can select the best clustering. Distances to those clusters can then be used as part of the input to a machine learning approach to predict pathogenicity.
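The chi-squared step can be sketched as follows. This is an illustrative Python example and not the analysis code used with strucclus. It computes the Pearson chi-squared statistic and degrees of freedom for a table of cluster membership against mutation label. The function name chi_squared and the label strings are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_squared(
    cluster_ids: Sequence[Hashable], labels: Sequence[Hashable]
) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for the
    contingency table of cluster membership against mutation label."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("at least one residue is required")
    cluster_totals = Counter(cluster_ids)
    label_totals = Counter(labels)
    observed = Counter(zip(cluster_ids, labels))
    statistic = 0.0
    for cluster, cluster_total in cluster_totals.items():
        for label, label_total in label_totals.items():
            # Expected count if cluster and label were independent.
            expected = cluster_total * label_total / n
            statistic += (observed[(cluster, label)] - expected) ** 2 / expected
    dof = (len(cluster_totals) - 1) * (len(label_totals) - 1)
    return statistic, dof
```

Because the degrees of freedom grow with the number of clusters, raw statistics from different cluster counts are not directly comparable. One option is to convert each statistic to a p-value (for example with scipy.stats.chi2.sf) before choosing between clusterings. With small clusters the expected counts can be low, in which case the chi-squared approximation is less reliable.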