Cellosaurus Disease Terminology Files

The Cellosaurus is a knowledge resource on cell lines. It attempts to describe all cell lines used in biomedical research. It includes immortalized cell lines, naturally immortal cell lines (such as stem cell lines), and finite-life cell lines when those are widely distributed and used. It encompasses vertebrate and invertebrate (insect and tick) cell lines.

The Cellosaurus disease terminology is the set of terms used to annotate cell lines originating from a diseased patient or animal.

For inherited diseases, the relevant terms are applied only to cell lines originating from individuals who have or are at risk for a disease, not to cell lines from individuals who are carriers of that disease. For cancers, the relevant terms are applied only to cell lines established from cancerous tissue.

The Cellosaurus is developed by Dr. Amos Bairoch of the CALIPHO group at the Swiss Institute of Bioinformatics.

Cellosaurus terminology files are available for download from this NCI EVS ftp site in three formats:
Cellosaurus_Disease_Terminology.xls (Microsoft Excel 2003)*
Cellosaurus_Disease_Terminology.xlsx
Cellosaurus_Disease_Terminology.txt (Tab-delimited text)

Each file has column headers on the first row:
Spreadsheet Column          Content Description
Subset Code                 The NCI Thesaurus (NCIt) concept code attached to the Cellosaurus concept. NCIt Codes are unique strings that begin with a C and are followed by a series of digits.
Subset Name                 The name of the terminology set.
Concept Code                The NCI Thesaurus (NCIt) concept code attached to the concept.
Cellosaurus Preferred Term  The preferred term chosen by Cellosaurus for the concept; it is identical to the NCIt preferred term.
NCIt Preferred Term         The preferred term chosen by the NCIt staff that unambiguously describes the concept.
NCIt Synonym                A term that is synonymous with the preferred term.
NCIt Definition             A text definition of the term created by subject matter experts at the NCIt.
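
As an illustration of the column layout above, the tab-delimited file can be read with Python's standard csv module. This is a minimal sketch, not an official NCI EVS tool. It assumes that the header row uses exactly the column names listed above and that fields are not quoted. The function name read_terminology and the sample row are hypothetical placeholders, not real terminology data.

```python
import csv
import io

# Column names as listed in the table above (assumed to match the header row exactly).
EXPECTED_COLUMNS = [
    "Subset Code",
    "Subset Name",
    "Concept Code",
    "Cellosaurus Preferred Term",
    "NCIt Preferred Term",
    "NCIt Synonym",
    "NCIt Definition",
]


def read_terminology(handle):
    """Read tab-delimited terminology rows into a list of dicts keyed by column name."""
    # QUOTE_NONE: treat quote characters as ordinary text (an assumption about the file).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


# Made-up placeholder content standing in for the real file.
SAMPLE = (
    "\t".join(EXPECTED_COLUMNS) + "\n"
    + "\t".join(["C000", "Example Subset", "C001", "Example term",
                 "Example term", "Example synonym", "Example definition"]) + "\n"
)

rows = read_terminology(io.StringIO(SAMPLE))
for row in rows:
    print(row["Concept Code"], row["Cellosaurus Preferred Term"])
```

To read the downloaded file instead of the sample, pass an open file handle for Cellosaurus_Disease_Terminology.txt, opened with newline="" as the csv module recommends. The file's text encoding is not stated here, so it would need to be checked.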

Also included on the NCI EVS ftp site are the following additional files:

About (This file.)
Changes (A text file of changes between the previous and the current version of the Cellosaurus Disease Terminology. For each change record, the Changes file contains a complete row of tab-delimited data with the same data elements as described above. An "A" precedes each newly added concept, a "C" precedes each modified concept, and a "D" precedes each deleted concept.)
Version (A text file that contains the version of NCI Thesaurus that corresponds to the current spreadsheet data. The database is reconciled on the last Monday of every month, and the files are posted during the following two weeks. The version appears as YR.MOweek. For example, 19.06d corresponds to the year 2019, the month of June, and the fourth Monday of the month (the "d").)
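
The YR.MOweek version format described above can be decoded mechanically. The sketch below is illustrative only. It assumes a two-digit year in the 2000s and assumes that the letters "a" through "e" stand for the first through fifth Monday of the month; only the "d" = fourth Monday case is confirmed by the example above. The names ReleaseVersion and parse_version are hypothetical.

```python
import re
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    year: int    # four-digit year (assumes the 2000s)
    month: int   # 1-12
    monday: int  # which Monday of the month (assumes a=1 ... e=5)


# Two digits, a dot, two digits, then one letter a-e.
_VERSION_RE = re.compile(r"(\d{2})\.(\d{2})([a-e])")


def parse_version(text: str) -> ReleaseVersion:
    """Decode a version string such as '19.06d'."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a YR.MOweek version string: {text!r}")
    yy, mm, letter = match.groups()
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in version string: {text!r}")
    return ReleaseVersion(2000 + int(yy), month, ord(letter) - ord("a") + 1)


print(parse_version("19.06d"))  # ReleaseVersion(year=2019, month=6, monday=4)
```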
N.B.: If there are no changes to the data for a particular month, the files will not be reposted.
Archived files are available in the Archive Directory of dated release versions.
Help requests on these files should go to NCIThesaurus@mail.nih.gov
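
The change markers used in the Changes file ("A", "C", and "D") could be interpreted as in the following sketch. The exact layout of a change record is an assumption: the code treats the marker as the first tab-delimited field, followed by the seven data elements in the same order as the spreadsheet columns. The function name parse_change_line and the sample line are hypothetical placeholders, not real release data.

```python
# Data elements in spreadsheet column order (assumed to follow the marker field).
COLUMNS = [
    "Subset Code",
    "Subset Name",
    "Concept Code",
    "Cellosaurus Preferred Term",
    "NCIt Preferred Term",
    "NCIt Synonym",
    "NCIt Definition",
]

# Meaning of each marker, as described for the Changes file.
CHANGE_KINDS = {"A": "added", "C": "modified", "D": "deleted"}


def parse_change_line(line):
    """Split one change record into (change kind, dict of data elements)."""
    fields = line.rstrip("\r\n").split("\t")
    marker = fields[0].strip()
    if marker not in CHANGE_KINDS:
        raise ValueError(f"unknown change marker: {marker!r}")
    values = fields[1:]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} data fields, got {len(values)}")
    return CHANGE_KINDS[marker], dict(zip(COLUMNS, values))


# Made-up placeholder record.
SAMPLE_LINE = "A\tC000\tExample Subset\tC001\tExample term\tExample term\tExample synonym\tExample definition\n"

kind, record = parse_change_line(SAMPLE_LINE)
print(kind, record["Concept Code"])
```

If the real Changes file places the marker differently (for example, on its own line or separated by a space), the split logic would need to be adjusted accordingly.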

* If opening the Excel spreadsheet displays a page of nonsense characters, check the security settings in Excel to permit viewing. To do so, click the highlighted bar that appears above the data and below the menu bar in the spreadsheet.