NCBI Taxonomy: Upcoming Changes to Viruses

To reflect changes to the International Code of Virus Classification and Nomenclature (ICVCN) made by the International Committee on Taxonomy of Viruses (ICTV), NCBI will add binomial species names to about 3000 viruses. These updates to NCBI Taxonomy are planned for spring 2025, but you can view the changes now in the ICTV’s Virus Metadata Resource. 

We recognize that former species names such as Human immunodeficiency virus 1 (HIV-1) are widely used in public health, education, and research. To minimize the impact of this change on those who use NCBI resources, we will add the new binomial species names (e.g., Lentivirus humimdef1) while keeping the former names available in the lineage of each species. The former names will move below the new binomial species name in the taxonomy hierarchy, ensuring continuity. Examples are provided in the full post.
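To picture what "moving below the binomial" means for lineage lookups, here is a toy Python model. It is not NCBI's data structure: it stores a hypothetical three-node slice of the tree as a child-to-parent map (the `PARENT` map and `lineage` helper are illustrative names) and shows that a lookup for the former HIV-1 name still succeeds, now passing through Lentivirus humimdef1.

```python
# Toy, hypothetical model of a taxonomy slice: each name maps to its parent.
# Only the names come from the announcement; the structure is illustrative.
from typing import Dict, List, Optional

PARENT: Dict[str, Optional[str]] = {
    "Lentivirus": None,                                        # genus (root of this toy tree)
    "Lentivirus humimdef1": "Lentivirus",                      # new binomial species name
    "Human immunodeficiency virus 1": "Lentivirus humimdef1",  # former name, now below the species
}


def lineage(name: str) -> List[str]:
    """Return the path from the root of the toy tree down to `name`."""
    path: List[str] = []
    node: Optional[str] = name
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


if __name__ == "__main__":
    print(" > ".join(lineage("Human immunodeficiency virus 1")))
```

Because the former name stays in the tree as a descendant of the species, anything that searches by the old name still finds a node, and its lineage now includes the binomial.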

AGP Files Will No Longer Be Accepted for Genome Submissions

Effective March 2025 

Do you submit genomes to NCBI’s GenBank? Beginning March 2025, GenBank will no longer accept AGP files for genome submissions. Historically, AGP files were submitted along with contigs to provide the information needed to construct assemblies. However, thanks to technology improvements, more and more whole genome shotgun (WGS) sequences submitted to NCBI are gapped assemblies (assemblies in which gaps are represented by inserted runs of Ns).
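Why are AGP files no longer needed? In a gapped assembly, the placement of contigs and gaps is carried by the sequence itself. The Python sketch below illustrates the idea under stated assumptions: it joins hypothetical contigs with runs of N, then recovers the gap coordinates from the sequence alone. The `MIN_GAP` threshold and the helper names are hypothetical and do not reflect GenBank's actual validation rules.

```python
# Illustrative sketch: contigs joined by runs of N carry the layout an AGP file
# used to describe. Contigs, gap sizes, and MIN_GAP are hypothetical.
import re
from typing import List, Tuple

MIN_GAP = 10  # hypothetical minimum run of N treated as a gap


def build_gapped(contigs: List[str], gaps: List[int]) -> str:
    """Join contigs, inserting gaps[i] Ns between contigs[i] and contigs[i + 1]."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    for gap_len, contig in zip(gaps, contigs[1:]):
        parts.append("N" * gap_len)
        parts.append(contig)
    return "".join(parts)


def find_gaps(seq: str, min_gap: int = MIN_GAP) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of N runs of at least min_gap."""
    pattern = re.compile("N{%d,}" % min_gap)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    scaffold = build_gapped(["ACGT", "GGCC", "TTAA"], [10, 12])
    print(scaffold)
    print(find_gaps(scaffold))
```

Runs of N shorter than the threshold are left in place as ambiguous bases rather than reported as gaps, which is why the sketch exposes the threshold as a parameter.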

Explore 3D Molecular Structures with iCn3D

Do you want to analyze three-dimensional structures and highlight important features like active site residues, point mutations, and binding partners? Check out NCBI’s “I see in 3D” (iCn3D) – a free, web-based tool that allows you to explore the structure of a biomolecule at an atomistic level. 

Features & Benefits 
  • Interactively view 3D structure and corresponding sequence data 
  • Align a protein sequence with unknown structure to a sequence-similar 3D structure 
  • Interactively view 3D alignments of similar structures 
  • View the interaction interfaces in a structure 
  • Save your custom display as a short URL or PNG image 
  • Share a link to your customized display of a structure 
  • Incorporate iCn3D into your own pages 
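For linking to iCn3D from your own pages or scripts, here is a minimal Python sketch that builds a link opening a structure in the viewer. It assumes iCn3D's `full.html` entry point and its `mmdbid` query parameter; treat both as assumptions and check the iCn3D documentation. Short URLs for customized displays are still generated from within iCn3D itself.

```python
# Sketch: build a link that opens a structure in iCn3D.
# ASSUMPTION: the base URL and the `mmdbid` parameter follow iCn3D's URL pattern;
# verify against the iCn3D documentation before relying on them.
from urllib.parse import urlencode

ICN3D_BASE = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"  # assumed entry point


def icn3d_url(structure_id: str) -> str:
    """Return an iCn3D link for the given structure identifier."""
    return ICN3D_BASE + "?" + urlencode({"mmdbid": structure_id})


if __name__ == "__main__":
    print(icn3d_url("1TUP"))
```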

Try Out a Development Version of NCBI’s Publicly Available Annotation Tool, EGAPx

Latest release now available 

Are you generating genomes for vertebrates, arthropods, or plants and looking for a way to produce high-quality genome annotation? NCBI is working on EGAPx, a publicly available version of the NCBI Eukaryotic Genome Annotation Pipeline, and the latest development release is now available for testing and feedback.

RefSeq Release 227 is Available!

Check out RefSeq release 227, now available online and from the FTP site. You can also access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of November 4, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 497,549,107 records, including:
      • 377,783,847 proteins
      • 66,987,567 RNAs
  • Sequences from 159,324 organisms 
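Because proteins and RNAs are listed as part of the total record count, the number of remaining records follows by subtraction. The Python sketch below does that arithmetic with the release 227 figures; describing the remainder as mostly genomic records is an assumption based on the release covering genomic, transcript, and protein data.

```python
# Break down the RefSeq release 227 record counts.
# Proteins and RNAs are subsets of the total; the remainder covers all other
# records (assumed here to be mostly genomic sequences).
TOTAL_RECORDS = 497_549_107
PROTEINS = 377_783_847
RNAS = 66_987_567


def other_records(total: int, proteins: int, rnas: int) -> int:
    """Return the count of records that are neither proteins nor RNAs."""
    remainder = total - proteins - rnas
    if remainder < 0:
        raise ValueError("subset counts exceed the total")
    return remainder


if __name__ == "__main__":
    rest = other_records(TOTAL_RECORDS, PROTEINS, RNAS)
    print(f"other records: {rest:,}")  # other records: 52,777,693
    print(f"protein share: {PROTEINS / TOTAL_RECORDS:.1%}")
```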

Update! Improving the Representation of Functional Data in ClinVar

We want your feedback! 

As previously announced, NCBI is improving the way that functional data are submitted to ClinVar and how they are represented in the XML format and on the website. We have started enhancing support for functional data and would like your feedback!  

What’s new? 

We have updated the GitHub repository with: 

Expansion of Ortholog Data for RefSeq Arthropods

250K+ new Hymenoptera orthologs added 

NCBI is excited to announce the expansion of ortholog data for RefSeq arthropods. This update broadens the arthropod orthology information available, offering new insights into evolutionary biology, gene function, and shared pathways. Whether you’re studying insect genetics, developmental biology, or comparative genomics, the expanded ortholog data opens up new possibilities for research. Check out our previous blog post to learn how to access the orthologs using NCBI Datasets.

MANE v1.4 with MANE Select for Non-Coding Genes

The latest release (v1.4) of Matched Annotation from NCBI and EMBL-EBI (MANE) is here! MANE is a dataset produced jointly by NCBI and EMBL-EBI that provides a representative transcript (MANE Select) for each human protein-coding gene, to be used as a universal standard for variant reporting and browser display. A second transcript, MANE Plus Clinical, is provided for genes where MANE Select alone is not sufficient to report all known variants. The new MANE release adds another important component to this high-value dataset: non-coding genes, some of which are known to be associated with human disease.
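The relationship between the two MANE transcript types can be expressed as a simple rule: every gene has exactly one MANE Select, and some genes also have a MANE Plus Clinical transcript. The Python sketch below applies that rule to a small table. The gene and transcript identifiers are hypothetical placeholders, and the `ManeTranscript` record is illustrative; it does not reproduce the layout of MANE's distributed files.

```python
# Illustrative sketch: pick reporting transcripts per gene from MANE-style statuses.
# Gene/transcript IDs are hypothetical; the record layout is not MANE's file format.
from dataclasses import dataclass
from typing import Dict, List

STATUS_ORDER = {"MANE Select": 0, "MANE Plus Clinical": 1}


@dataclass(frozen=True)
class ManeTranscript:
    gene: str
    transcript: str
    status: str  # "MANE Select" or "MANE Plus Clinical"


def reporting_transcripts(rows: List[ManeTranscript]) -> Dict[str, List[str]]:
    """Group transcripts by gene, MANE Select first, then any MANE Plus Clinical."""
    by_gene: Dict[str, List[ManeTranscript]] = {}
    for row in rows:
        if row.status not in STATUS_ORDER:
            raise ValueError(f"unexpected status: {row.status}")
        by_gene.setdefault(row.gene, []).append(row)
    result: Dict[str, List[str]] = {}
    for gene, items in by_gene.items():
        selects = [r for r in items if r.status == "MANE Select"]
        if len(selects) != 1:  # assumption: exactly one MANE Select per gene
            raise ValueError(f"{gene}: expected one MANE Select, found {len(selects)}")
        items.sort(key=lambda r: STATUS_ORDER[r.status])
        result[gene] = [r.transcript for r in items]
    return result


if __name__ == "__main__":
    rows = [
        ManeTranscript("GENE_A", "TX_A2", "MANE Plus Clinical"),
        ManeTranscript("GENE_A", "TX_A1", "MANE Select"),
        ManeTranscript("GENE_B", "TX_B1", "MANE Select"),  # e.g., a non-coding gene
    ]
    print(reporting_transcripts(rows))
```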

GenBank Release 263.0 Now Available!

GenBank release 263.0 (10/19/2024) is now available on the NCBI FTP site. This release has 36.50 trillion bases and 5.13 billion records.
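The headline totals are sums over the four record categories listed below (traditional, WGS, bulk-oriented TSA, and bulk-oriented TLS). As a quick consistency check, this minimal Python sketch adds up the per-category counts and rounds them the way the announcement does.

```python
# Re-derive the GenBank release 263.0 headline totals from the per-category counts.
from typing import Dict, Tuple

CATEGORIES: Dict[str, Tuple[int, int]] = {
    # category: (records, base pairs)
    "traditional": (251_998_350, 4_250_942_573_681),
    "WGS": (3_745_772_758, 31_362_454_467_668),
    "bulk-oriented TSA": (948_733_596, 812_661_461_811),
    "bulk-oriented TLS": (187_349_395, 77_037_504_468),
}


def totals(categories: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Return (total records, total base pairs) across all categories."""
    records = sum(r for r, _ in categories.values())
    bases = sum(b for _, b in categories.values())
    return records, bases


if __name__ == "__main__":
    records, bases = totals(CATEGORIES)
    # prints "36.50 trillion bases, 5.13 billion records"
    print(f"{bases / 1e12:.2f} trillion bases, {records / 1e9:.2f} billion records")
```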

The current release has: 

  • 251,998,350 traditional records containing 4,250,942,573,681 base pairs of sequence data
  • 3,745,772,758 WGS records containing 31,362,454,467,668 base pairs of sequence data
  • 948,733,596 bulk-oriented TSA records containing 812,661,461,811 base pairs of sequence data
  • 187,349,395 bulk-oriented TLS records containing 77,037,504,468 base pairs of sequence data

Upcoming Changes to NCBI’s BioSample Database

Improving Metadata Quality 

Do you rely on high-quality metadata from NCBI’s BioSample database? BioSample is home to metadata associated with the biological source materials used to generate raw reads, sequences, and other data submitted to NCBI. We are improving our submission process to ensure we provide you with more complete and robust information. 

Attention submitters! 

To improve metadata quality, we are introducing new submission validation steps and providing improved help documentation.