SELECTED DATA/TEXT SOURCES, ONTOLOGIES, AND SYSTEMS
Data / Text / Information Sources
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
PubMed comprises more than 23 million citations for biomedical articles from MEDLINE and life science journals. Citations may include links to full-text articles from PubMed Central or publisher web sites.
OMIM® (Online Mendelian Inheritance in Man®) is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes.
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. (See "what we provide" and "site tour".)
- Registry of standard biological parts
The Registry is a continuously growing collection of genetic parts that can be mixed and matched to build synthetic biology devices and systems.
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource.
- Worm Database
Online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans (C. elegans) and related nematodes.
- Saccharomyces Genome Database
SGD™ is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast.
- Medical/Clinical Datasets:
- Gene Ontology
The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases.
PAMGO extends Gene Ontology to include terms describing various processes related to microbe-host interactions.
- See Trends in Microbiology (July 2009, Vol. 17, Issue 7) for articles about uses and extensions of Gene Ontology in the microbial domain.
Information Source Integration, Platforms, and Existing Software
- GQuery: Global cross-database NCBI search
Simultaneously search multiple life sciences databases at the National Center for Biotechnology Information (NCBI). (Formerly known as "Entrez"?)
- VBI Genome Browser
The VBI Genome Browser is a tool that allows viewing of genomic data that adheres to the Genomics Unified Schema (GUSDB) data storage standard.
GeneCards is a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.
eTBLAST is a unique search engine for searching biomedical literature that lets you input an entire paragraph and returns MEDLINE abstracts that are similar to it.
- iHOP Information Hyperlinked Over Proteins. A gene-centric search engine.
- EBIMed EBIMed is a web application that combines information retrieval and extraction from MEDLINE.
- GoPubMed Clusters documents based on the Gene Ontology and MeSH (Medical Subject Headings).
- Textpresso The Textpresso project serves the biological and biomedical research community by providing: (1) Full text literature searches of model organism research and subject-specific articles at individual sites. (2) Text classification and mining of biomedical literature for database curation. (3) Linking biological entities in PDF and online journal articles to online databases.
U.S. National Library of Medicine's Medical Subject Headings.
- ABNER: A Biomedical Named Entity Recognizer
ABNER is a software tool for molecular biology text analysis.
- The Stanford Natural Language Processing Group
Their research has resulted in state-of-the-art technology for robust, broad-coverage natural-language processing in many languages. These technologies include a part-of-speech tagger; a high-performance probabilistic parser; a competition-winning biological named entity recognition system; and algorithms for processing Arabic, Chinese, and German text.
- ISI Web of Knowledge
ISI Web of Knowledge is an online academic database provided by Thomson Scientific's Institute for Scientific Information. It provides access to many databases and other resources.
The World Wide Web Consortium (W3C) is an international community where member organizations, a full-time staff, and the public work together to develop web standards.
- Prof. Kellis' Algorithms for Computational Biology course (MIT)
- Profs. Alterovitz's, Kellis', and Ramoni's Bioinformatics and Proteomics course (MIT)
- Prof. Yemini's Computational Genomics course (Columbia Univ.)
- Prof. Mneimneh's Computational Biology course (Hunter College)
- Prof. Moran's Algorithms in Computational Biology course (Technion Univ.)
- Prof. Subramanian's From Sequence to Structure: An Introduction to Computational Biology course (Rice Univ.)
- Rosalind is a joint project between the University of California at San Diego and Saint Petersburg Academic University along with the Russian Academy of Sciences.