17 May 2001

(From Nature press release.) The path from identifying a genome sequence to pinpointing the contribution of each gene is beset with pitfalls. In a Brief Communication [in today's Nature], researchers describe discrepancies between what is known about the proteins of the fruitfly Drosophila and predictions based on the genomics company Celera's database of the Drosophila genome. These uncertainties, say Samuel Karlin and colleagues, of Stanford University in California, call for caution and experimental back-up in turning genomics into proteomics.

The Stanford team compared the sequences of 1,049 known Drosophila proteins deposited in the SWISSPROT database with Celera's predictions. When searching Celera's database, they failed to find a 99 percent match for nearly half of the SWISSPROT proteins. For some proteins, the versions in the two databases differ by several hundred amino acids. Some of the differences may be due to different forms of the same protein, but others are likely to have resulted from sequencing errors or annotation mistakes.

Mismatch Between Gene and Protein Sequence Data
