NCIt Neoplasm Core Terminology Files

The NCIt Neoplasm Core value set provides a core reference set of NCIt neoplasm classification concepts that is designed to facilitate consistent coding, analysis, and data sharing across a broad range of NCI and related resources. These files provide a comprehensive collection of key terms, definitions, simplified hierarchies, mappings to dozens of other terminologies, and molecular characteristics, all linked back to online EVS resources; they are updated with each monthly release of NCIt.

Background: NCIt provides NCI's comprehensive cancer classification terminology and ontology, and NCIm provides cross-mappings to many related terminologies of interest to the NCI community. NCIt includes rigorous human- and machine-readable definitions of many thousands of distinct types of neoplasm, organized in logic-based parent-child hierarchies. NCIt is used in many NCI and other systems around the world for cancer research, care, and reference, and is useful in interpreting and translating information from many other resources. There has long been a need for a smaller core reference set of the cancer classification terms most commonly used for research, care, and public health purposes.

Organization: This draft set includes 1,358 neoplasm concepts (16%) out of the 8,399 currently in NCIt. It is intended to include all neoplasms frequently encountered in research and clinical settings. Beyond that, EVS estimates that this set includes approximately 80% of infrequent and rare neoplasms encountered in such settings, as well as 60% of the specific histopathologic variants of malignant neoplasms. Most of these concepts have tracked uses in coding NCI's main clinical trial and metadata repositories.

Set membership and structure, including hierarchy and mapping relationships, will be adjusted and extended in response to user feedback. This set will be made available in multiple views. It is an integral subset of NCIt, so users can easily cross-walk to more detailed classifications, ontology, terminology mappings, and links to key cancer information resources.

This third monthly release includes four components:

  1. Value Set: 1,358 concepts, available from the NCI Term Browser, LexEVS APIs, and the files on this FTP site. An RDF/SPARQL endpoint is also available internally, and is being prepared for public use. The files here are available in these forms:

    Both files have column headers on the first row:

    Column             Content Description
    Code               The NCIt concept code, a unique identifier assigned to each concept by EVS to permanently track a specific meaning.
    Preferred Term     The term chosen by EVS as most unambiguous and widely used in the biomedical community.
    Synonyms           Additional term(s) chosen by NCI with meaning equivalent to the Preferred Term, separated by vertical bars ("|").
    Definition         A text definition of the term created by EVS subject matter experts.
    Neoplastic Status  The morphologic, clinical, and genetic profile of a neoplastic growth that defines it as non-cancerous, cancerous, or of uncertain cancerous potential.
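
    As an illustration of the column layout above, the following minimal sketch reads a value set table and splits the Synonyms cell on the vertical bar. It assumes a tab-delimited text file; the delimiter, the sample row, and the function name read_value_set are hypothetical and are not taken from the release files.

```python
import csv
import io

# Hypothetical sample in the documented column order. The tab delimiter and
# the sample row are assumptions, not content from the actual release files.
SAMPLE = (
    "Code\tPreferred Term\tSynonyms\tDefinition\tNeoplastic Status\n"
    "CX001\tExample Neoplasm\tExample Tumor|Example Growth\t"
    "A placeholder definition.\tBenign\n"
)


def read_value_set(handle):
    """Yield one dict per concept, with Synonyms split into a list."""
    # QUOTE_NONE keeps any quotation marks inside definitions as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        raw = row.get("Synonyms") or ""
        row["Synonyms"] = [s.strip() for s in raw.split("|") if s.strip()]
        yield row


concepts = list(read_value_set(io.StringIO(SAMPLE)))
for concept in concepts:
    print(concept["Code"], concept["Preferred Term"], concept["Synonyms"])
```

    To read a downloaded file instead of the sample, pass an open text file handle to read_value_set in place of the io.StringIO object.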

  2. Simplified Hierarchy: The main draft hierarchy organizes Neoplasm Core concepts in parent-child "is-a" relationships under the main NCIt subcategories of Neoplasm by Site and Neoplasm by Morphology. Only the relationships between Core concepts appear, in their most specific placements, yielding a hierarchy much simpler than the full NCIt hierarchy. The hierarchy is available in two main forms, with two "Plus" forms that add an NCIm CUI/link (HTML only) and Neoplastic Status flag to each Core term:

    NCIt users have asked that we also provide the earlier draft hierarchy arrangement under Neoplastic Status headings of Malignant, Benign, and Undetermined. These updated files assign each Core hierarchical group based on the status of the top parent concept, so the Undetermined section includes many malignant and benign child concepts. As with the main hierarchy files, there are HTML and text files that follow the original format, and "Plus" forms that add an NCIm CUI/link (HTML only) and Neoplastic Status flag to each Core term:

    The original, manually curated draft hierarchy is available in the Archive directory, in two forms:

    Several simplified hierarchies may well be needed; user input will be important in deciding on the most useful organization and scope for these.
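
    The idea of showing only the relationships between Core concepts, in their most specific placements, can be sketched as follows. This is one possible reading of that description, not the procedure EVS actually uses: each Core concept is linked to its nearest Core ancestors in the full hierarchy, and any such ancestor that is itself an ancestor of another one is dropped. The concept names, the parents mapping, and the function names are all hypothetical.

```python
def ancestors(concept, parents):
    """Return every ancestor of a concept in the full hierarchy."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def simplify_hierarchy(parents, core):
    """Map each core concept to its most specific core parents."""
    simplified = {}
    for concept in core:
        # Climb the full hierarchy, stopping at the first core concept
        # reached on each path.
        candidates, seen = set(), set()
        stack = list(parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in core:
                candidates.add(node)
            else:
                stack.extend(parents.get(node, ()))
        # Drop a candidate that is an ancestor of another candidate, so only
        # the most specific placements remain.
        redundant = set()
        for candidate in candidates:
            redundant |= ancestors(candidate, parents) & candidates
        simplified[concept] = sorted(candidates - redundant)
    return simplified


# Hypothetical full hierarchy: child -> list of parents.
full_parents = {
    "Specific": ["Mid", "Broad"],
    "Mid": ["Broad"],
    "Other": ["Helper"],
    "Helper": ["Broad"],
    "Broad": ["Root"],
}
core_set = {"Specific", "Mid", "Other", "Broad"}
result = simplify_hierarchy(full_parents, core_set)
```

    In this toy input, "Specific" keeps only "Mid" as a parent because "Broad" is already reachable through "Mid", and "Other" is attached directly to "Broad" because the intermediate "Helper" concept is not in the core set.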

    Once there is a stable simplified reference hierarchy, EVS plans to retire the old Common Neoplasm (Code C7077) header concept and hierarchy, including these four child concepts: Common Carcinoma, Common Connective and Soft Tissue Neoplasm, Common Germ Cell Tumor, and Common Hematopoietic Neoplasm. These were created in 2003 to help satisfy requests for a simpler hierarchy when no better solution was readily available, but structured value sets provide a more appropriate representation. Please let us know if this transition may cause problems for your current use of NCIt.

  3. Mappings: Each NCIt Core concept is shown with its NCIm code and term. About 30% of Core concepts (411) have no equivalent terms in other terminologies; for these, the remaining cells are empty. Where other sources in NCIm do have equivalent terms, a separate row is created for each unique term mapping. The mappings tables are available in three forms:

    All three files have column headers on the first row, as listed below:

    Column               Content Description
    NCIt Code            The NCIt concept code, a unique identifier assigned to each concept by EVS to permanently track a specific meaning.
    NCIt Preferred Term  The term chosen by EVS as most unambiguous and widely used in the biomedical community.
    NCIm CUI             The NCI Metathesaurus unique identifier assigned to each concept by EVS; UMLS Metathesaurus CUIs are used where they exist.
    NCIm Preferred Name  The concept name preferred within the Metathesaurus environment.
    NCIm Source          The NCIm terminology source with a term that has the same meaning. NCIm short names for each source can be found on the NCIm Browser at https://ncim.nci.nih.gov/ncimbrowser/pages/source_help_info.jsf
    Term Type            The term type in that source. NCIm source term types can be found on the NCIm Browser at https://ncim.nci.nih.gov/ncimbrowser/pages/term_type_help_info.jsf
    Source Code          The code associated with this term in that source.
    Source Term          The term in that source.
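
    Because a Core concept can occupy several rows (one per term mapping) or a single row with empty mapping cells, it can help to regroup the rows by NCIt Code. The sketch below shows one way to do that, assuming a tab-delimited text file; the delimiter, the sample codes and source names, and the function name group_mappings are hypothetical.

```python
import csv
import io
from collections import defaultdict

HEADER = [
    "NCIt Code", "NCIt Preferred Term", "NCIm CUI", "NCIm Preferred Name",
    "NCIm Source", "Term Type", "Source Code", "Source Term",
]

# Hypothetical rows: CX001 has two mappings, CX002 has none (empty cells).
ROWS = [
    ["CX001", "Example A", "CUI_X1", "Example A", "SRC1", "PT", "A-1", "Example A term"],
    ["CX001", "Example A", "CUI_X1", "Example A", "SRC2", "SY", "A-2", "Example A synonym"],
    ["CX002", "Example B", "CUI_X2", "Example B", "", "", "", ""],
]
SAMPLE = "\n".join("\t".join(cells) for cells in [HEADER] + ROWS) + "\n"


def group_mappings(handle):
    """Return {NCIt Code: [(source, term type, source code, source term)]}."""
    grouped = defaultdict(list)
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Looking up the key registers the concept even if it has no mappings.
        entries = grouped[row["NCIt Code"]]
        source = (row.get("NCIm Source") or "").strip()
        if source:
            entries.append(
                (source, row["Term Type"], row["Source Code"], row["Source Term"])
            )
    return dict(grouped)


grouped = group_mappings(io.StringIO(SAMPLE))
unmapped = [code for code, entries in grouped.items() if not entries]
```

    With this grouping, concepts that have no equivalent terms in other terminologies appear with an empty list rather than being lost.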

    These term mappings are also available as simple lists, with equivalent NCIm source terms indented under each Core term. The lists are available in two formats:

    An earlier draft Mappings file provided a single row for each Core concept, with a summary list of any NCIm sources with equivalent terms. It is available in the Archive directory:

    We will update this only if users express interest in that approach.

  4. Molecular Relationships: NCIt Neoplasm concepts are characterized by well over 100,000 role relationships to other NCIt concepts, which in turn have potentially revealing connections to many others. This initial draft relationship report provides a simple listing for just the 6 roles that are most informative about the molecular characteristics of neoplasms, with more than 4,900 assertions for Core neoplasm concepts. These roles connect:
    1. Specific genes and fusion genes to specific histopathologic types of neoplasms, when these genes and fusion genes are directly involved in the pathogenesis of the specific neoplasms;
    2. Specific histopathologic types of neoplasms to the genes that are involved in their pathogenesis;
    3. Cytogenetic abnormalities to specific histopathologic types of neoplasms, when these abnormalities define the specific neoplasms; and
    4. Molecular abnormalities to specific histopathologic types of neoplasms, when these abnormalities define the specific neoplasms.

    The molecular relationship tables are available in three forms:

    All three files have column headers on the first row, as listed below:

    Column          Content Description
    Code            The NCIt concept code for the first concept in the relationship.
    Preferred Term  The NCIt preferred term for the first concept in the relationship.
    Relationship    The role relationship connecting the first concept to the second concept.
    Code            The NCIt concept code for the second concept in the relationship.
    Preferred Term  The NCIt preferred term for the second concept in the relationship.
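
    Note that the Code and Preferred Term headers each appear twice, so a reader that keys cells by header name would keep only one of each pair. The following minimal sketch reads the five columns by position instead. It assumes a tab-delimited text file; the delimiter, the sample row, the role name, and the Relationship field names are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Relationship(NamedTuple):
    """One row of the table; field names are illustrative, not official."""
    subject_code: str
    subject_term: str
    role: str
    object_code: str
    object_term: str


# Hypothetical sample using the documented header row.
SAMPLE = (
    "Code\tPreferred Term\tRelationship\tCode\tPreferred Term\n"
    "CX010\tExample Gene\tExample_Role\tCX001\tExample Neoplasm\n"
)


def read_relationships(handle):
    """Read rows by column position, skipping the header row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # discard the header row
    # Rows with fewer than five cells are skipped rather than guessed at.
    return [Relationship(*row[:5]) for row in reader if len(row) >= 5]


relationships = read_relationships(io.StringIO(SAMPLE))
for rel in relationships:
    print(rel.subject_term, rel.role, rel.object_term)
```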

    Future releases may extend the coverage of molecular relationships, and may display relationship groupings that show, for example, how a particular chromosomal translocation relates to the corresponding fusion protein, and how these may both relate to prognosis. Feedback is invited on what information would be most useful.

All components of the NCIt Neoplasm Core release can be found in this NCI EVS FTP site directory, which includes an Archive subdirectory for date-labeled copies of all release files:
http://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Neoplasm/
http://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Neoplasm/Archive/

Help requests on these files should go to NCIThesaurus@mail.nih.gov