External links |
| Nomenclature |
|---|
HUGO :
The Human Gene Nomenclature Database
The Human Gene Nomenclature Database Search tool provides access to the list of currently approved human gene symbols as
maintained by the HUGO gene nomenclature committee. Many previously approved symbols are also listed, with links
directing users to the current symbol. Minor changes to a previously approved symbol, such as adding a number (e.g. NRAMP
becomes NRAMP1), may not be listed in this way, so users should try a "Symbol begins with" search using the first few letters
of a symbol, instead of an exact search, if they fail to find a specific symbol. Other symbols used in the literature (known as
aliases) are collected and stored by the HUGO Nomenclature Committee, and are now searchable with this tool. The "Find a
gene" facility in GDB may be useful to search for other names/symbols which cannot be found in the Human Gene
Nomenclature Database.
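As a rough illustration of why a "Symbol begins with" search can succeed where an exact search fails, the following Python sketch compares the two on a small hand-made list. The symbol list and both function names are hypothetical. They are not the search tool's data or interface.

```python
# Hypothetical, hand-made symbol list for illustration only;
# it is not taken from the Human Gene Nomenclature Database.
SYMBOLS = ["NRAMP1", "NRAMP2", "NRAS", "BRCA1"]


def exact_search(query, symbols):
    """Return symbols that match the query exactly (case-insensitive)."""
    q = query.upper()
    return [s for s in symbols if s.upper() == q]


def begins_with_search(prefix, symbols):
    """Return symbols that start with the given prefix (case-insensitive)."""
    p = prefix.upper()
    return [s for s in symbols if s.upper().startswith(p)]


if __name__ == "__main__":
    # The old symbol has no exact match once a number has been appended.
    print(exact_search("NRAMP", SYMBOLS))
    # A prefix search still finds the renamed entries.
    print(begins_with_search("NRAMP", SYMBOLS))
```

Typing only the first few letters widens the match, which is why it can recover a symbol that was changed slightly after approval.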
GDB :
The Genome Database
Entrez_Gene
GENECARDS :
GeneCards: human genes, proteins and diseases
GENATLAS
GeneLynx
eGenome
euGene
An international collaboration in support of the Human Genome
Entrez_Gene is the part of Entrez devoted to searching for information on genes, with links
to other databases such as RefSeq, maps, OMIM, UniGene and PubMed.
It is developed and maintained by NCBI.
GeneCards is a database of human genes, their products and their
involvement in diseases.
It offers concise information about the functions of all human genes
that have an approved symbol, as well as selected others.
It is especially useful for those who are
searching for information about large sets of genes or proteins,
e.g. for scientists working in functional genomics and proteomics.
The GENATLAS database compiles the information relevant to the mapping efforts of the Human Genome Project. This information is collected from
original articles in the literature or from the proceedings of Human Gene Mapping and Single Chromosome Workshops. It is catalogued in three
interactive directories: GENATLAS/GEN, GENATLAS/LINK and GENATLAS/REF. A series of graphical maps, GENATLAS/MAP, is associated with them, as
is a Comparative Map database edited by John H. Edwards.
GeneLynx is a portal to a collection of hyperlinks for each human gene.
It is implemented as an easily extensible
relational database with a straightforward user interface.
eGenome is a comprehensive position-based catalog of the human genome. eGenome
catalogs a wide range of genomic landmarks, including transcripts, markers ...
Human Genes: Genomic Information for Homo sapiens.
Genes include 20553 experimentally determined loci and 32657 predicted loci.
Protein coding sequences are available for 46465 of these.
| Genomic and cartography |
|---|
GoldenPath : Human Genome Project Working Draft - Human Genome Browser
This page contains links to an assembly of the current draft of the human genome. The human genome is approximately
3.1 billion bases. Roughly 88% of the genome has been sequenced by the International Human Genome Project. The Oct.
7th draft genome is composed of hundreds of thousands of fragments of various sizes. The order and orientation of the
fragments is often not known from the sequencing process itself. In some cases the same part of the genome will be
duplicated in several fragments.
Human Genome Browser
EnSembl : Human Genome Project Working Draft - Ensembl Map view
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and
maintains automatic annotation on eukaryotic genomes.
Vega : Human Genome Project Working Draft - Ensembl Map view
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and
maintains automatic annotation on eukaryotic genomes.
NCBI Map Viewer : Homo_sapiens genome view
The NCBI Map Viewer provides graphical displays of features on the human genome sequence assembly as well as cytogenetic, genetic, physical, and radiation hybrid maps. Extensive documentation is provided to describe the resource features and methods used, tutorials, and statistics.
Homologene
HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.
| Gene and transcription |
|---|
GenBank
GenBank is the NIH's database of all known nucleotide and protein
sequences including supporting bibliographic and biological information.
Since 1992 it has been based at the National Center for Biotechnology
Information (NCBI), a division of the National Library of Medicine, located
on the NIH campus. NCBI was created by Congress in 1988 and specifically
charged with developing automated information systems to support
molecular biology and biotechnology. Its other mission is to conduct basic
research: as part of the NIH Intramural Program, NCBI scientists pursue
research in genome analysis, molecular structure modeling and prediction,
and mathematical methods for sequence analysis.
RefSeq
The NCBI Reference Sequence project (RefSeq) will provide reference
sequence standards for the naturally occurring molecules of the central dogma,
from chromosomes to mRNAs to proteins. RefSeq standards provide a foundation
for the functional annotation of the human genome. They provide a stable reference
point for mutation analysis, gene expression studies, and polymorphism discovery.
CCDS
AceView
TRASER
Unigene
fast-DB
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations.
AceView offers a comprehensive and non-redundant cDNA-supported annotation of human and nematode genes. Our program co-aligns the million mRNAs and ESTs available from GenBank,
dbEST and RefSeq on the genome sequence, quality-filters the cDNAs and clusters them into alternative transcripts and genes. By construction, the cooperative accuracy of these sequences,
ESTs or mRNAs, is brought up to the exceptional quality of the genome sequence.
The Transcript Sequence Retriever (TRASER) provides rapid retrieval of transcript and
upstream (putative promoter-containing) sequences for predicted human genome mRNAs.
The underlying database is built using the human genome annotation files provided by the National Center for Biotechnology Information.
UniGene is an experimental system for automatically partitioning GenBank
sequences into a non-redundant set of gene-oriented clusters. Each UniGene
cluster contains sequences that represent a unique gene, as well as related
information such as the tissue types in which the gene has been expressed and
map location.
The Friendly Alternative Splicing and Transcripts Database
FAST DB: a web resource for the study of the regulation of expression of human gene products.
FAST DB provides three kinds of analysis: human mRNAs, human mRNAs and ESTs, and mouse mRNAs.
| Protein : pattern, domain, 3D structure |
|---|
GenPept
GENPEPT is a protein database translated from the latest release of GenBank.
SwissProt :
SWISS-PROT Protein Sequence Database
The SWISS-PROT Protein Sequence Database is a database of protein
sequences produced collaboratively by Amos Bairoch (University of Geneva)
and the EMBL Data Library. The data in Swiss-Prot are derived from
translations of DNA sequences from the EMBL Nucleotide Sequence
Database, adapted from the Protein Identification Resource (PIR) collection,
extracted from the literature and directly submitted by researchers. It contains
high-quality annotation, is non-redundant, and cross-referenced to several
other databases, notably the EMBL nucleotide sequence database, PROSITE
pattern database and PDB. SWISS-PROT is a curated protein sequence
database which strives to provide a high level of annotation (such as the
description of the function of a protein, its domain structure,
post-translational modifications, variants, etc), a minimal level of
redundancy and a high level of integration with other databases. Recent
developments of the database include: an increase in the number and scope
of model organisms; cross-references to seven additional databases; a variety
of new documentation files; the creation of TREMBL, an unannotated
supplement to SWISS-PROT. This supplement consists of entries in
SWISS-PROT-like format derived from the translation of all coding
sequences (CDS) in the EMBL nucleotide sequence database, except CDS
already included in SWISS-PROT.
Prosite :
Protein signatures
The PROSITE database consists of a large collection of biologically meaningful signatures that are described
as patterns or profiles. Each signature is linked to documentation that provides useful biological information
on the protein family, domain or functional site identified by the signature. The PROSITE web page has been
redesigned and several tools have been implemented to help the user discover new conserved regions in their
own proteins and to visualize domain arrangements. We also introduced the facility to search PDB with a
PROSITE entry or a user's pattern and visualize matched positions on 3D structures. The latest version of
PROSITE (release 18.17 of November 30, 2003) contains 1676 entries. The database is accessible at
http://www.expasy.org/prosite/.
Interpro :
(Integrated Resource of Protein Domains and Functional Sites)
Release 1.0 (March 2000) was built from Pfam 5.0, PRINTS 25.0, PROSITE 16 and the
current SWISS-PROT + TrEMBL data. This release of InterPro contains 2990 entries, representing
2373 families, 556 domains, 47 repeats and 14 post-translational modification sites encoded by 4884
different regular expressions, profiles, fingerprints and HMMs.
Interpro is a useful resource for whole genome analysis and has already been used for the proteome
analysis of a number of completely sequenced organisms. A preliminary proteome analysis was also
produced for the human genome.
ClusTr
PFAM - Sanger Center
CDD A Conserved Domain Database and Search Service - NCBI
BLOCKS
PRODOM - Toulouse
PDB - Protein Data Bank
PDBSUM
HPRD - Human Protein Reference Database
IntAct - EBI
DIP
OMIM
GeneTests
dbSNP - NCBI
SNP - NCI
GeneSNPs
The SNP Consortium Ltd
HGBASE
HAPMAP
HGMD
The Human Gene Mutation Database (HGMD) represents an attempt to collate
known (published) gene lesions responsible for human inherited disease. This
database, whilst originally established for the study of mutational mechanisms
in human genes (Cooper and Krawczak 1993), has now acquired a much broader
utility in that it embodies an up-to-date and comprehensive reference source to
the spectrum of inherited human gene lesions. Thus, HGMD provides information of
practical diagnostic importance to (i) researchers and diagnosticians in human
molecular genetics, (ii) physicians interested in a particular inherited condition
in a given patient or family, and (iii) genetic counsellors.
Mitelman Database of Chromosome Aberrations in Cancer
Genetic Association Database
HuGE Navigator
ORPHANET :
Database of rare diseases and orphan drugs
Gene Sorter
STANFORD - SMD
SAGE
In order to support the public use and dissemination of serial analysis of gene expression (SAGE) data, NCBI has recently
refurbished this website. SAGEmap is a SAGE data resource for the query, retrieval and analysis of SAGE data from any
organism. All of the data present on this website has been accessioned in the Gene Expression Omnibus repository.
ENZYME
Amigo
BIOCARTA
KEGG: Kyoto Encyclopedia of Genes and Genomes
TREEFAM : Tree families database
CTD : Comparative Toxicogenomics Database
PROBES
In collaboration with the YAC screening Centre (Milan), and the
Haematology Dept., University of Bari. Collaborations to validate
the probes are welcomed. Most of the clones have been identified by
screening PAC or BAC libraries with appropriate primers.
PubGene
PubMed
The CluSTr database offers an automatic classification of UniProt Knowledgebase
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains
Pfam is a collection of protein families and domains. Pfam contains multiple protein alignments and profile-HMMs of these families. Pfam
is a semi-automatic protein family database, which aims to be comprehensive as well as accurate. This page provides links to various help
documents that are available.
A Conserved Domain Database and Search Service
Proteins often contain several modules or domains, each with a distinct evolutionary origin and function. The CD-Search service may be used to identify the conserved domains present in a protein sequence:
Computational biologists define conserved domains based on recurring sequence patterns or motifs. CDD
currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from
colleagues at NCBI. The source databases also provide descriptions and links to citations. Since
conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.
ProDom is a comprehensive set of protein domain families automatically generated from the
SWISS-PROT and TrEMBL sequence databases.
Protein Interaction databases
IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.
The DIP™ (Database of Interacting Proteins) database lists protein pairs that are known to interact with each other. By interact we mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level.
Polymorphism : SNP, mutations, diseases
Online Mendelian Inheritance in Man
OMIM is a catalog of
human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his
colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the
National Center for Biotechnology Information. The database contains textual information,
pictures, and reference information. It also contains copious links to NCBI's Entrez database of
MEDLINE articles and sequence information.
Single Nucleotide Polymorphism
A Database of Single Nucleotide Polymorphisms : A key aspect of research in genetics is associating sequence variations with heritable phenotypes. The most common variations are single nucleotide polymorphisms (SNPs), which occur approximately once every 100 to 300 bases. Because SNPs are expected to facilitate large-scale association genetics studies, there has recently been great interest in SNP discovery and detection.
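To give a feel for the scale implied by "once every 100 to 300 bases", the short Python sketch below turns that density into a rough count. It assumes the genome size of approximately 3.1 billion bases quoted in the GoldenPath entry on this page, and it assumes SNPs are spaced uniformly, which is a simplification. The function name is hypothetical.

```python
# Approximate genome size quoted in the GoldenPath entry on this page.
GENOME_SIZE_BASES = 3_100_000_000


def estimated_snp_count(genome_size, bases_per_snp):
    """Rough SNP count, assuming one SNP every `bases_per_snp` bases."""
    return genome_size // bases_per_snp


# One SNP per 300 bases gives the lower bound, one per 100 the upper bound.
low = estimated_snp_count(GENOME_SIZE_BASES, 300)
high = estimated_snp_count(GENOME_SIZE_BASES, 100)

print(f"roughly {low / 1e6:.1f} to {high / 1e6:.1f} million SNPs")
# prints "roughly 10.3 to 31.0 million SNPs"
```

This is an order-of-magnitude estimate only, not a figure reported by dbSNP.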
CGAP-GAI Identified Variation in Genes: Locations of SNPs in genetic and physical maps
University of Utah. This Environmental Genome Project web resource integrates gene, sequence and polymorphism data into individually annotated gene models. The human genes
included are related to DNA repair, cell cycle control, cell signaling, cell division, homeostasis and metabolism, and are thought to play a role in susceptibility to
environmental exposure.
Single Nucleotide Polymorphisms for Biomedical Research. Single nucleotide polymorphisms (SNPs) are common DNA sequence variations among individuals and have great significance for biomedical research.
The SNP Consortium Ltd. is a 501c3 non-profit foundation organized for the purpose of providing public genomic data. Its mission is to develop up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to these SNPs available to the public without intellectual property restrictions. The project started in April 1999 and is anticipated to continue until the end of 2001.
HGBASE is the SRS version of HGVBASE
HGVbase is an attempt to summarize all known sequence variations in the human genome, to facilitate research into how genotypes affect common diseases, drug responses, and other complex phenotypes.
Sequence variations are presented with details of how they are physically and functionally related to the closest neighbouring gene. Records include SNPs, Indels, simple tandem repeats, and other sequence alternatives, regardless of location, allele frequencies, or known effect upon phenotype. All records are highly curated and annotated, ensuring maximal utility and data accuracy.
The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals. See "About the International HapMap Project" for more information.
Human Gene Mutation Database at the Institute of
Medical Genetics in Cardiff
Human gene mutation is a highly specific process, and this specificity has
important implications for the nature, prevalence and therefore diagnosis of
genetic disease. Indeed, the recognition that certain DNA sequences are
hypermutable has yielded clues as to the endogenous mutational mechanisms
involved and provided insights into the intricacies of the processes of DNA
replication and repair (Cooper and Krawczak 1993). In practical terms, a fuller
understanding of the mutational process may prove important in molecular
diagnostic medicine by contributing to improvements in the design and efficacy
of mutation search procedures and strategies in different genetic disorders.
The information in the Mitelman Database of Chromosome Aberrations in Cancer relates
chromosomal aberrations to tumor characteristics, based either on individual cases or associations.
All the data have been manually culled from the literature by Felix Mitelman, Bertil Johansson, and
Fredrik Mertens.
The Genetic Association Database is an archive of human genetic association studies of complex diseases and disorders. The goal of this database is to allow the user to rapidly identify medically relevant polymorphism from the large volume of polymorphism and mutational data, in the context of standardized nomenclature.
HuGE Navigator provides access to a continuously updated knowledge base in human genome epidemiology, including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic
tests.
This project is the result of a commonly observed fact: rare diseases are difficult to
deal with for medical practitioners. This is due to their restricted knowledge of the
diseases' natural history, the patient care required, treatment, and sometimes even
of their existence. Scientific knowledge exists, or at least partial knowledge, but it is
scattered. Because of the physical media on which it is communicated, the
information is difficult to access for the great majority of physicians, not to
mention patients and their families. Only a very small number of doctors
specialize in these diseases, and their practices are scarcely known, sometimes
even totally unknown to other practitioners.
General Knowledge
SOURCE is a unification tool which dynamically collects and compiles data from many scientific databases, and
thereby attempts to encapsulate the genetics and molecular biology of genes from the genomes of Homo sapiens,
Mus musculus, Rattus norvegicus into easy to navigate GeneReports.
SMD stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides
interfaces for data retrieval, analysis and visualization. Data is released to the public at the researcher's discretion or upon publication.
The goal of the Gene Ontology™ Consortium is to produce a dynamic controlled
vocabulary that can be applied to all organisms even as knowledge of gene and protein
roles in cells is accumulating and changing.
Pathways on the CGAP web site have been obtained directly from BioCarta
A grand challenge in the post-genomic era is a complete computer representation of the cell
and the organism, which will enable computational prediction of higher-level complexity of
cellular processes and organism behaviors from genomic information. Towards this end we
have been developing a bioinformatics resource named KEGG, Kyoto Encyclopedia of Genes
and Genomes, as part of the research projects in the Kanehisa Laboratory of Kyoto University
Bioinformatics Center.
Pathways on the CGAP web site have been obtained directly from KEGG.
TreeFam (Tree families database) is a database of phylogenetic trees of animal genes. It aims at developing a curated resource that gives reliable information about ortholog and paralog assignments, and evolutionary history of various gene families.
The Comparative Toxicogenomics Database (CTD) elucidates molecular mechanisms by which environmental chemicals affect human disease.
Chemical-gene/protein interactions and chemical- and gene-disease relationships are curated from the published literature, and integrated with diverse data (chemicals, genes/proteins, human diseases, references, sequences, vertebrate and invertebrate organisms, and the Gene Ontology) to facilitate environmental health research.
Miscellaneous
Collection of PAC and BAC probes useful for specific tumors.
CloneCards is a new database search interface to the Primary Database.
It was designed to retrieve comprehensive information about clones fast.
The current version is a beta-Test version and we appreciate your
comments on improvements or error messages, should they occur.
Bibliography
Several tools for searching literature, expression, network, ontology ...
on the PubGene server.
PubMed, a service of the National Library of Medicine, includes over 15 million citations from
MEDLINE and additional life science journals for biomedical articles back to the 1950s.
PubMed includes links to full text articles and other related resources.