Major Bioinformatics Sites

(Gateways)

In the first part of the practical you will pay quick visits to three of the major bioinformatics web sites worldwide: the NCBI, the EBI and Expasy.

You will look at another major resource, KEGG, later in the practical.

Each site provides different facilities - some of these overlap, but some are unique. In addition to these major sites, there are hundreds of other sites worldwide (including both Biochemistry and Computer Science at UCL) which provide additional services.

The NCBI

The National Center for Biotechnology Information (NCBI) is America's major bioinformatics site. It is part of the USA's National Institutes of Health (NIH) in Bethesda, Maryland, near Washington, DC.

One particularly useful feature of the NCBI site is access to PubMed, a database of scientific literature.

Visit the NCBI web site: http://www.ncbi.nlm.nih.gov/
Change the All Databases pull-down menu at the top to PubMed and type Martin AC in the search box to look for publications by Dr. Martin.

Note one of the problems with working with databases - the range of topics in papers by Dr. Martin is rather wide... In fact there is more than one 'Martin AC' and currently only 6 of 40 hits on the first two pages are by the UCL Dr. Martin! The first person to guess the correct 6 gets a Mars bar! (Other chocolate bars are available!)
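
If you would rather run the same search from a script, the NCBI offers a web API called E-utilities. The following is a minimal sketch, not part of the practical itself: it only builds the ESearch URL for an author search and does not contact the server. The function name pubmed_author_query_url and the default of 40 hits (two pages of 20) are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_author_query_url(author, max_hits=40):
    """Build an ESearch URL that looks up an author name in PubMed.

    Hypothetical helper for illustration; max_hits=40 mirrors the
    "first two pages" mentioned in the text and is an assumption.
    """
    params = {
        "db": "pubmed",               # search the PubMed database
        "term": f"{author}[Author]",  # restrict the term to the author field
        "retmax": max_hits,           # maximum number of IDs to return
        "retmode": "json",            # ask for a JSON reply
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = pubmed_author_query_url("Martin AC")
print(url)
```

Restricting the term to the author field does not solve the problem described above: every 'Martin AC' still matches, so the hits would need to be narrowed further, for example by affiliation.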

Another major resource on the NCBI site is OMIM - Online Mendelian Inheritance in Man. This is a database of inherited mutations occurring in human disease.

Click the Resources/All Resources link in the navigation bar at the top of any NCBI page and then click Online Mendelian Inheritance in Man (OMIM). Once on the OMIM page, type the name of a disease such as cystic fibrosis in the search box.

You will receive a list of hits to OMIM entries containing the text for which you searched.

Click the link for the first hit which has a * before the numeric identifier.

Entries with a * at the start of the identifier represent genes. You will now see the full OMIM entry which provides a detailed textual description of the gene, its sequencing, function, mapping and known mutations. Information on mutations (allelic variants) related to the disease will also be displayed.
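
The rule "pick the first hit whose identifier starts with a *" can be sketched in a few lines of code. This is only an illustration: the list example_hits is typed in by hand rather than taken from a real search, and the function name first_gene_entry is hypothetical. The # prefix in the example marks a disease (phenotype) entry in OMIM.

```python
def first_gene_entry(hits):
    """Return the first hit whose identifier starts with '*', or None."""
    for hit in hits:
        if hit.startswith("*"):
            return hit
    return None


# Hand-typed example list (an assumption for illustration, not real search output).
example_hits = [
    "#219700 CYSTIC FIBROSIS; CF",
    "*602421 CYSTIC FIBROSIS TRANSMEMBRANE CONDUCTANCE REGULATOR; CFTR",
]

gene_hit = first_gene_entry(example_hits)
print(gene_hit)
```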


The NCBI provides many other services including:

  • genome browsers,
  • the GenBank DNA sequence database,
  • GenPept, a protein sequence database derived from GenBank,
  • the Single Nucleotide Polymorphism (SNP) database, dbSNP,
  • a taxonomy browser.

The NCBI also provides software toolkits including the BLAST and PSI-BLAST software for searching sequence databases.

We will visit the NCBI again later to use BLAST over the web.

The EBI

The EBI is Europe's equivalent to the NCBI, based in Hinxton near Cambridge.

Visit the EBI web site: http://www.ebi.ac.uk/
Click Services at the top right and then Proteins. Now scroll down the list of protein databases until you find UniProt. Click the link. (You should now be at the URL http://www.uniprot.org/)
Click the text search link and again click the UniProt text query [preferred] link in the resulting page. This will open a new window. (You should now be at the URL http://www.uniprot.org/)
Type cystic fibrosis in the search box at the top of the page and click the Search button.

You will now see a page of entries that contain the text you typed. You should see a familiar gene name based on your search of OMIM.
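
The same text search can also be expressed as a URL for the UniProt web API. The sketch below is not part of the practical: it builds the search URL without sending it. The function name uniprot_search_url and the choice of five results in JSON format are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Search endpoint of the UniProt REST service for UniProtKB entries.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(text, size=5):
    """Build a UniProtKB text-search URL (hypothetical helper)."""
    params = {
        "query": text,     # free-text query, as typed in the search box
        "format": "json",  # ask for a JSON reply
        "size": size,      # number of entries per page (assumed value)
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


search_url = uniprot_search_url("cystic fibrosis")
print(search_url)
```

Pasting the printed URL into a browser should return the matching entries as JSON rather than as a web page.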

Expasy

Expasy is the main site of the Swiss Institute of Bioinformatics (SIB).

Visit the site at: http://www.expasy.org/

Two of the most useful services provided by Expasy are:

  • A quick search of the UniProt protein sequence database
  • Access to the SWISS-MODEL server which allows a three-dimensional model of a protein structure to be generated from its sequence

For the moment we will not use the Expasy site, but you will visit again later in the practical to use the text search facility.
