What is RefSeq annotation?

The Reference Sequence (RefSeq) database is an open access, annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products. RefSeq was first introduced in 2000.

What is a RefSeq genome?

RefSeq genomes are copies of selected assembled genomes available in GenBank. RefSeq transcript and protein records are generated by several processes, including computation, the Eukaryotic Genome Annotation Pipeline, and the Prokaryotic Genome Annotation Pipeline.

What is RefSeq mRNA?

RefSeq mRNA records are the transcript records within RefSeq, a comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.

Is RefSeq redundant?

RefSeq is nearly non-redundant. In contrast to the archival GenBank records, RefSeq is a synthesis and summary of available information, and represents the ‘current’ view of the sequence information, names and other annotations. RefSeq records can be distinguished from GenBank records by the format of the accession series.
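As an illustration of telling the two accession series apart, here is a minimal Python sketch. It assumes the commonly seen RefSeq layout of a two-letter prefix followed by an underscore (for example NM_, NP_, NC_), which GenBank accessions lack. The function name `is_refseq_accession` and the exact pattern are illustrative assumptions, not an official NCBI specification.

```python
import re

# Assumption: RefSeq accessions look like two uppercase letters, an underscore,
# then letters/digits, with an optional ".version" suffix (e.g. NM_111111.2).
# GenBank accessions (e.g. U49845.1) have no underscore in that position.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession matches the assumed RefSeq format."""
    return bool(REFSEQ_PATTERN.match(accession.strip()))


if __name__ == "__main__":
    for acc in ["NM_111111.2", "NP_000537.3", "U49845.1", "AF123456"]:
        label = "RefSeq" if is_refseq_accession(acc) else "not RefSeq"
        print(f"{acc}: {label}")
```

A check like this is only a quick filter on the accession string; it says nothing about whether the record actually exists.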

How does RefSeq work?

For some species, the entire RefSeq dataset is regenerated when a new annotated genome becomes available in the public archives, or the dataset may be updated if a collaborating group provides an update for the annotated genome.

What is RefSeq status?

Curated records can be identified by the RefSeq status code of REVIEWED or VALIDATED. This status is displayed in the COMMENT area of the record. Records that have been supplied by a collaborating group are marked as curated by that group, with the group or database identified.
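The status check described above could be sketched in Python as follows. This assumes the COMMENT text of a record carries the status as an uppercase word directly before "REFSEQ:" (for example "REVIEWED REFSEQ: ..."). That layout, the sample comment strings, and the helper names `refseq_status` and `is_curated` are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: the status appears in the COMMENT text as "<STATUS> REFSEQ:".
STATUS_PATTERN = re.compile(r"\b([A-Z]+) REFSEQ:")

# Per the text above, these two status codes identify curated records.
CURATED_STATUSES = {"REVIEWED", "VALIDATED"}


def refseq_status(comment: str) -> Optional[str]:
    """Extract the status code from COMMENT text, or None if not found."""
    match = STATUS_PATTERN.search(comment)
    return match.group(1) if match else None


def is_curated(comment: str) -> bool:
    """Return True if the COMMENT text reports REVIEWED or VALIDATED."""
    return refseq_status(comment) in CURATED_STATUSES


if __name__ == "__main__":
    # Hypothetical COMMENT strings, used only to exercise the parser.
    samples = [
        "REVIEWED REFSEQ: This record has been curated.",
        "PROVISIONAL REFSEQ: This record has not yet been reviewed.",
    ]
    for text in samples:
        print(refseq_status(text), is_curated(text))
```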

How many sequences are in RefSeq?

NCBI’s Reference Sequence (RefSeq) database is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. As of RefSeq release 19, the database included 3774 organisms spanning prokaryotes, eukaryotes and viruses, and had records for 2 879 860 proteins.

How often is RefSeq updated?

The RefSeq Select dataset is refreshed daily as the selection of prokaryote representative genomes is refined and individual genomes are re-annotated. It currently includes about one-third of the prokaryote RefSeq protein dataset.

What is the difference between GenBank and RefSeq databases?

GenBank sequence records are owned by the original submitter and cannot be altered by a third party. RefSeq sequences are not part of the INSDC but are derived from INSDC sequences to provide non-redundant curated data representing our current knowledge of known genes.

What does RefSeq stand for?

RefSeq stands for Reference Sequence. The NCBI Reference Sequence Database is a comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.

What is a version number change in a RefSeq record?

A version number change (e.g., NM_111111.1 -> NM_111111.2) occurs to a RefSeq record when there is any update to the sequence of that record. Sequence updates include the alteration, addition, or removal of nucleotides or amino acids from a record.
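The versioning rule above can be sketched in Python: split an identifier of the form accession.version and compare the version numbers of two identifiers for the same record. The helper names `split_accession` and `sequence_changed` are hypothetical, and the sketch assumes the version is always a plain integer after the final dot.

```python
from typing import Tuple


def split_accession(acc_ver: str) -> Tuple[str, int]:
    """Split 'NM_111111.2' into ('NM_111111', 2)."""
    accession, _, version = acc_ver.strip().rpartition(".")
    if not accession or not version.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {acc_ver!r}")
    return accession, int(version)


def sequence_changed(old: str, new: str) -> bool:
    """Return True if 'new' is a later version of the same accession.

    Per the text above, a version increment means the sequence was updated.
    """
    old_acc, old_ver = split_accession(old)
    new_acc, new_ver = split_accession(new)
    if old_acc != new_acc:
        raise ValueError("identifiers refer to different records")
    return new_ver > old_ver


if __name__ == "__main__":
    print(sequence_changed("NM_111111.1", "NM_111111.2"))
    print(sequence_changed("NM_111111.2", "NM_111111.2"))
```

Comparing only the version suffix shows that a sequence update happened; it does not show which nucleotides or amino acids changed.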

What is the difference between a RefSeq and GenBank accession?

One distinction is that transcripts and proteins annotated on RefSeq genomic records are instantiated as separate records; in contrast, GenBank only instantiates the proteins annotated on genomic sequence records. The accessions themselves differ in format, but the sequence of a RefSeq genome is a copy of the GenBank sequence it was selected from.

What is a RefSeq genome assembly?

The RefSeq archaeal and bacterial genome assemblies are annotated and maintained copies of complete and whole-genome shotgun assemblies submitted to INSDC (GenBank, ENA and DDBJ) that meet sequence and annotation quality criteria. A genome assembly may be excluded from RefSeq for reasons related to sequence or annotation quality.