Major Bioinformatics Sites: NCBI

In the first part of the tutorial you will visit a number of the major bioinformatics web sites worldwide.

Each site provides different facilities - some of these overlap, but some are unique. In addition to these major sites, there are hundreds of other sites worldwide (including both Biochemistry and Computer Science at UCL) which provide additional services.

First, open a link to the NCBI web site: http://www.ncbi.nlm.nih.gov/

Click the headings below to try different facilities on the NCBI web site.

One particularly useful feature of the NCBI site is access to PubMed, a database of scientific literature.

Change the Search pulldown at the top of the page from All databases to PubMed and type cystic fibrosis in the search box to look for publications related to cystic fibrosis. You should find that there are tens of thousands of papers! You could refine the search by including the names of authors, or more specific information such as the name of a protein.
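The same kind of search can also be expressed as a URL for the NCBI's E-utilities web service rather than typed into the search box. The sketch below only builds the URL and does not contact the NCBI. The function names and the author "Smith J" are made up for illustration; the esearch address, the db, term, retmax and retmode parameters and the [Author] field tag are assumed to behave as described in the E-utilities and PubMed help pages, so check them there before relying on this.

```python
from urllib.parse import urlencode

# Base address of the E-utilities "esearch" service (assumed from the NCBI docs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(topic, author=None):
    """Combine a topic with an optional author name into one PubMed query string."""
    query = topic
    if author:
        # [Author] is a PubMed field tag that restricts the match to author names.
        query = f"{topic} AND {author}[Author]"
    return query


def build_esearch_url(topic, author=None, max_results=20):
    """Return the URL for a PubMed search; nothing is sent over the network."""
    params = {
        "db": "pubmed",                             # which NCBI database to search
        "term": build_pubmed_query(topic, author),  # the search text
        "retmax": max_results,                      # how many record IDs to return
        "retmode": "json",                          # ask for a JSON reply
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_esearch_url("cystic fibrosis"))
    # "Smith J" is a hypothetical author used only to show the refinement.
    print(build_esearch_url("cystic fibrosis", author="Smith J"))
```

Pasting the first printed URL into a browser should return the total number of matching papers and the identifiers of the first few, which is one way to compare a broad search with a refined one.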

Another major resource on the NCBI site is OMIM - Online Mendelian Inheritance in Man.

This catalogues information about hundreds of inherited diseases and, where known, documents the mutations that cause the disease.

Return to the main NCBI page. On the menu at the left of the page, click Genetics & Medicine and then click the link for Online Mendelian Inheritance in Man (OMIM). Read the description of OMIM.

Now, type cystic fibrosis in the search box.

You will see a list of entries related to cystic fibrosis. Each entry starts with an accession code (e.g. 219721), and some codes are preceded by a symbol (e.g. %, # or *). A '*' indicates an entry for a specific gene (often with mutation information). The other symbols mark descriptive entries, unclear data or cases where the molecular basis of the disease is not known.
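If you copy entry labels out of the results list, the symbol can be separated from the accession code with a few lines of code. This is a minimal sketch: the function name is invented here, and the table of meanings reflects one reading of the symbol key published by OMIM (it includes '+' and '^', which are not discussed above), so treat it as an assumption and check it against the OMIM help pages.

```python
# Meanings of the symbols that may precede an OMIM accession code.
# Assumption: wording paraphrased from OMIM's own symbol key.
PREFIX_MEANINGS = {
    "*": "gene",
    "+": "gene and phenotype, combined",
    "#": "phenotype description, molecular basis known",
    "%": "phenotype description or locus, molecular basis unknown",
    "^": "entry removed or moved",
    "": "other, mainly phenotypes with a suspected Mendelian basis",
}


def split_omim_entry(label):
    """Split a label such as '*602421' into (symbol, accession code, meaning)."""
    label = label.strip()
    # The symbol, if present, is always the first character.
    symbol = label[0] if label and label[0] in "*+#%^" else ""
    number = label[len(symbol):]
    if not number.isdigit():
        raise ValueError(f"not an OMIM accession code: {label!r}")
    return symbol, number, PREFIX_MEANINGS[symbol]


if __name__ == "__main__":
    # Labels chosen only to show the format.
    for label in ["*602421", "#219700", "219721"]:
        print(split_omim_entry(label))
```

Filtering a list of labels for those whose symbol is '*' picks out the gene entries, which are the ones to follow when looking for a gene map locus.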

Each chromosome has two arms, known as 'p' and 'q', and is characterized by a set of bands. The position of a gene on a chromosome (the 'gene map locus') can therefore be given as a chromosome number, the arm name, a band number and a position within the band (e.g. 5q2.3).
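To make the structure of a gene map locus concrete, the sketch below splits a locus string into the four parts just described. It assumes the simple form used in this tutorial (chromosome, arm, band, optional position after a dot) and does not handle ranges or other notations that real OMIM entries may use; the names GeneMapLocus and parse_locus are invented for this example.

```python
import re
from typing import NamedTuple, Optional


class GeneMapLocus(NamedTuple):
    chromosome: str          # "1" to "22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    band: str                # band number on that arm
    position: Optional[str]  # position within the band, if given


# chromosome, then arm, then band digits, then an optional ".digits" position
LOCUS_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_locus(text):
    """Split a locus such as '5q2.3' into its named parts."""
    match = LOCUS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised gene map locus: {text!r}")
    return GeneMapLocus(*match.groups())


if __name__ == "__main__":
    print(parse_locus("5q2.3"))
    print(parse_locus("Xp21"))
```

For the example in the text, parse_locus("5q2.3") gives chromosome "5", arm "q", band "2" and position "3".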

Cystic fibrosis is caused by a defect in transmembrane conductance. Find the entry for the corresponding gene and record its gene map locus.

The NCBI provides many other services including:

  • genome browsers,
  • the GenBank DNA sequence database,
  • GenPept, a protein sequence database derived from GenBank,
  • the Single Nucleotide Polymorphism (SNP) database,
  • a taxonomy browser.

The NCBI also provides software toolkits, including the BLAST and PSI-BLAST programs for searching sequence databases.

We will visit the NCBI again later to use BLAST over the web.
