CPTAC Terminology Files

The National Cancer Institute's Enterprise Vocabulary Services (EVS) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) are collaborating to produce coded terminology to support data collection and integration efforts of CPTAC.

The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

Launched in 2011, CPTAC pioneered the integrated proteogenomic analysis of colorectal, breast and ovarian cancer to reveal new insights into these cancer types, such as identification of proteomic-centric subtypes, prioritization of driver mutations by correlative analysis of copy number alterations and protein abundance, and understanding cancer-relevant pathways through posttranslational modifications.

Data (genomics, proteomics, imaging), assays and reagents are made available to the public as a Community Resource in an effort to accelerate cancer research and advance patient care.

For information on the three ways to access the CPTAC data, please see the presentation: The NCI Thesaurus and CPTAC Terminology.

For more information on CPTAC, please visit: NCI Office of Cancer Clinical Proteomics Research

CPTAC terminology files are available for download from this NCI EVS ftp site (http://evs.nci.nih.gov/ftp1/CPTAC/):
The file contains three tabs: Baseline Forms, Follow-up Forms, and Codelists. Each file has column headers in the first row:
Spreadsheet Column: Content Description
Subset Code: The NCIt concept code attached to the subset concept. NCIt codes are unique strings that begin with a C and are followed by a series of digits.
Subset Name: The name of the subset (a group of terms with a common focus).
Concept Code: The NCIt code attached to the concept.
NCIt Preferred Term: The preferred term chosen by the NCIt staff that most unambiguously describes the concept.
NCIt Header Term: The term chosen as a top node to its children.
NCIt Antiquated Term: The outdated term used to describe the concept.
NCIt Definition: The definition of the concept.
CPTAC Preferred Term: The preferred term chosen by CPTAC for the concept.
CPTAC Synonym(s): Terms chosen by CPTAC that are synonymous with the Preferred Term.
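
As an illustration of the column layout above, the following Python sketch checks a header row against the listed column names and tests whether a value has the shape of an NCIt code (a C followed by digits). It assumes the header text in the downloaded files matches the names listed here exactly, which has not been verified against the files. The function names and sample values are hypothetical.

```python
import re

# Column names as listed in the table above. Assumption: the first row of the
# downloaded files uses exactly these strings.
EXPECTED_COLUMNS = [
    "Subset Code",
    "Subset Name",
    "Concept Code",
    "NCIt Preferred Term",
    "NCIt Header Term",
    "NCIt Antiquated Term",
    "NCIt Definition",
    "CPTAC Preferred Term",
    "CPTAC Synonym(s)",
]

# An NCIt code is described as a "C" followed by a series of digits.
NCIT_CODE_PATTERN = re.compile(r"C\d+")


def is_ncit_code(value: str) -> bool:
    """Return True if value has the shape of an NCIt concept code."""
    return NCIT_CODE_PATTERN.fullmatch(value.strip()) is not None


def missing_columns(header_row):
    """Return the expected column names that are absent from header_row."""
    present = {cell.strip() for cell in header_row}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

A check of this kind only confirms the shape of a code. It does not confirm that the code exists in the NCI Thesaurus.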
CPTAC terminology is bundled into subsets (groups of terms with a common focus). The subset names and descriptions are listed below; each subset is also a tab in the spreadsheet.
Subset Name: Subset Description
CPTAC Baseline Forms Terminology: A category of intake forms used in establishing a baseline of clinical patient data of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) with focus on intake forms.
CPTAC Follow-up Forms Terminology: A category of intake forms used to collect data in the follow-up period of the data collection efforts of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) with the focus on medical history form data.
CPTAC Codelists Terminology: A category of terminology subsets used to support the various lists of data useful for tracking trends by CPTAC.
NCIt Version: The version of the NCI Thesaurus that contains the data in the spreadsheet. The format of the version is YY.MOweek (for example, 19.03d). The week is represented by a letter indicating the week in which the data was produced: a = the first week, b = the second week, and so on.
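
The version string can be split into its parts mechanically. The Python sketch below assumes the layout of the example 19.03d given in this file: a two-digit year, a dot, a two-digit month, and a single lowercase week letter. It also assumes that two-digit years fall in the 2000s and that the week letter runs from a to e. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class NcitVersion(NamedTuple):
    year: int   # four-digit year, assuming two-digit years are 20YY
    month: int  # 1-12
    week: int   # 1 for "a", 2 for "b", and so on


# Assumed layout: YY.MO followed by one week letter, e.g. "19.03d".
_VERSION_PATTERN = re.compile(r"(\d{2})\.(\d{2})([a-e])")


def parse_ncit_version(text: str) -> NcitVersion:
    """Parse a version string such as '19.03d' into year, month and week."""
    match = _VERSION_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised NCIt version string: {text!r}")
    yy, mo, letter = match.groups()
    month = int(mo)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in version string: {text!r}")
    week = ord(letter) - ord("a") + 1
    return NcitVersion(2000 + int(yy), month, week)
```

For example, `parse_ncit_version("19.03d")` yields year 2019, month 3, week 4.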

Also included on the NCI EVS ftp site (http://evs.nci.nih.gov/ftp1/CPTAC/) are the following additional files:

About (This file)
Version information is contained in the last worksheet of each spreadsheet. The database is reconciled on the last Monday of every month, and the files are posted during the following two weeks. The version appears as YR.MOweek. An example is 19.03d, which corresponds to the year 2019, the month of March, and the fourth Monday of the month (the "d").
Archived files are available at:
Archive/: Directory of dated release versions.
Help requests on these files should go to NCIThesaurus@mail.nih.gov