BCB Data Mining Resources

Biological and Biomedical Database Mining
Resources

PROF. CAROLINA RUIZ

SELECTED DATA/TEXT SOURCES, ONTOLOGIES, AND SYSTEMS

Data / Text / Information Sources

NCBI
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
PubMed
PubMed comprises more than 23 million citations for biomedical articles from MEDLINE and life science journals. Citations may include links to full-text articles from PubMed Central or publisher web sites.
OMIM
OMIM® (Online Mendelian Inheritance in Man®) is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes.
UniProt
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. (See "what we provide" and "site tour".)
Registry of standard biological parts
The Registry is a continuously growing collection of genetic parts that can be mixed and matched to build synthetic biology devices and systems.
EMBL
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource.
Worm Database
Online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans (C. elegans) and related nematodes.

See WormBase's User Guide.

Saccharomyces Genome Database
SGD™ is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast.
Medical/Clinical Datasets:

Physionet
Research resource for complex physiologic signals.
UCI's Cardiotocography Data Set

Ontologies

Gene Ontology
The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases.
PAMGO
PAMGO extends Gene Ontology to include terms describing various processes related to microbe-host interactions.
See Trends in Microbiology (July 2009, Vol. 17, Issue 7) for articles about uses and extensions of Gene Ontology in the microbial domain.

Information Source Integration, Platforms, and Existing Software

GQuery: Global cross-database NCBI search
Simultaneously search multiple life sciences databases at the National Center for Biotechnology Information (NCBI). (Formerly known as "Entrez"?)
VBI Genome Browser
The VBI Genome Browser is a tool that allows viewing of genomic data that adheres to the Genomics Unified Schema (GUSDB) data storage standard.
GeneCards
GeneCards is a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.
eTBLAST
eTBLAST is a unique search engine for searching biomedical literature that lets you input an entire paragraph and returns MEDLINE abstracts that are similar to it.
iHOP
Information Hyperlinked Over Proteins. A gene-centric search engine.
EBIMed
EBIMed is a web application that combines information retrieval and extraction from MEDLINE.
GoPubMed
Clusters documents based on the Gene Ontology and MeSH.
Textpresso
The Textpresso project serves the biological and biomedical research community by providing: (1) Full text literature searches of model organism research and subject-specific articles at individual sites. (2) Text classification and mining of biomedical literature for database curation. (3) Linking biological entities in PDF and online journal articles to online databases.
MeSH
U.S. National Library of Medicine's Medical Subject Headings.
ABNER: A Biomedical Named Entity Recognizer
ABNER is a software tool for molecular biology text analysis.
The Stanford Natural Language Processing Group
Their research has resulted in state-of-the-art technology for robust, broad-coverage natural-language processing in many languages. These technologies include a part-of-speech tagger; a high performance probabilistic parser; a competition-winning biological named entity recognition system; and algorithms for processing Arabic, Chinese, and German text.
ISI Web of Knowledge
ISI Web of Knowledge is an online academic database provided by Thomson Scientific's Institute for Scientific Information. It provides access to many databases and other resources.
W3C
The World Wide Web Consortium (W3C) is an international community where member organizations, a full-time staff, and the public work together to develop web standards.

OTHER USEFUL BIOINFORMATICS COURSE WEBSITES

Prof. Kellis' Algorithms for Computational Biology course (MIT)
Profs. Alterovitz's, Kellis', and Ramoni's Bioinformatics and Proteomics course (MIT)
Prof. Yemini's Computational Genomics course (Columbia Univ.)
Prof. Mneimneh's Computational Biology course (Hunter College)
Prof. Moran's Algorithms in Computational Biology course (Technion Univ.)
Prof. Subramanian's From Sequence to Structure: An Introduction to Computational Biology course (Rice Univ.)
Rosalind is a joint project between the University of California at San Diego and Saint Petersburg Academic University along with the Russian Academy of Sciences.

ruiz@cs.wpi.edu
