Databases

BioProject (formerly Genome Project)
A collection of genomics, functional genomics, and genetics studies and links to their resulting datasets. This resource describes project scope, material, and objectives, and provides a mechanism to retrieve datasets that are often difficult to find because of inconsistent annotation, multiple independent submissions, and the varied nature of diverse data types, which are often stored in different databases.

Conserved Domain Database (CDD)
A collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database.

HIV-1, Human Protein Interaction Database
A database of known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data.

Identical Protein Groups
A collection of consolidated records describing proteins identified in annotated coding regions in GenBank and RefSeq, as well as Swiss-Prot and PDB protein sequences. This resource allows investigators to obtain more targeted search results and quickly identify a protein of interest.

Protein Clusters
A collection of related protein sequences (clusters), consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.

Protein Database
A database that includes protein sequence records from a variety of sources, including GenPept, RefSeq, Swiss-Prot, PIR, PRF, and PDB.

Protein Family Models
A collection of models representing homologous proteins with a common function. It includes conserved domain architecture, hidden Markov models, and BlastRules. A subset of these models is used by the Prokaryotic Genome Annotation Pipeline (PGAP) to assign names and other attributes to predicted proteins.

Reference Sequence (RefSeq)
A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. The RefSeq collection is accessed through the Nucleotide and Protein databases.

BLAST executables for local use are provided for Solaris, Linux, Windows, and Mac OS X systems. See the README file in the FTP directory for more information. Pre-formatted databases for BLAST nucleotide, protein, and translated searches are also available for downloading under the db subdirectory.

Sequence databases for use with the stand-alone BLAST programs.