May 21, 2004

Editing of NCI Thesaurus 04.04j was completed on April 30, 2004.  Version 
04.04j was April's 10th build in our development cycle.

The NCI Metathesaurus P040517 was exported May 17, 2004.  Its content
reflects editing completed through this date.  This version of the NCI
Metathesaurus is build M04AB based on the UMLS build 2004 AB.

This directory contains twelve files:

	ReadMe.txt			This file
	NCI_THESAURUS_license.txt	A description of the licensing terms of the NCI Thesaurus
	ThesaurusTermsofUse.htm		As above, in html format for web browsing
	Metathesaurus-P040517.zip	The NCI Metathesaurus version P040517 in RRF format.
	NCI_RRF_Addendum.pdf		Addendum to the NLM's RRF documentation, NCI-specific changes
	MMSYS.jar			NCI-specific extension to the NLM's MetamorphoSys
	mmsys.a.prop			NCI-specific MetamorphoSys configuration file
	mmsys.prop.sav			NCI-specific MetamorphoSys configuration file
	Thesaurus_04.04j.XML.zip	The NCI Thesaurus version 04.04j in Apelon's XML format
	Thesaurus_04.04j.FLAT.zip	The NCI Thesaurus 04.04j in flat file format
	Thesaurus_04.04j.OWL.zip	The NCI Thesaurus 04.04j in OWL
	ontylog.dtd			The Apelon XML's document type definition, from Apelon, Inc.


The NCI_THESAURUS_license.txt file contains the terms of use for the 
NCI Thesaurus.  Please refer to the NCI_THESAURUS_license.txt
file for the exact licensing terms. 


The Metathesaurus-P040517.zip contains the distribution (44 files) of the NCI 
Metathesaurus in the NLM's Rich Release Format (RRF).  Because the NCI Metathesaurus
doesn't contain all the data in the UMLS Metathesaurus, a number of tables and 
fields within tables in the RRF are empty.  This is documented in the 
NCI_RRF_Addendum.pdf file; the full documentation for the RRF can be obtained 
from the UMLS Knowledge Source Server (UMLSKS) page (http://umlsks.nlm.nih.gov).  In 
addition to the table/column differences, the NCI_RRF_Addendum.pdf file documents
the term types, source precedence order, sources, and the relationship attributes
in the NCI Metathesaurus.

The NLM's MetamorphoSys needs to be configured to work with the NCI Metathesaurus.
The mmsys.a.prop and mmsys.prop.sav files should be used instead of the default 
files.  These files take into account the NCI's different list of sources and
precedence order.  In addition, the MMSYS.jar file must be placed in the 
MMSYS\ext directory; this extension allows processing local CUIs (e.g. NCI 
CUIs that start with "CL").
	
The remaining zip files unpack to the following files:

	Thesaurus_04.04j.XML.zip	Thesaurus_04.04j.xml
	Thesaurus_04.04j.FLAT.zip	Thesaurus_04.04j.txt
	Thesaurus_04.04j.OWL.zip	Thesaurus.owl

In all three formats below, the ontology is in a defined state, i.e., 
relations are as stated by the editors; no inferred relations are
specified.

The Thesaurus_04.04j.xml file contains the entire terminology and associated 
ontologic constructions from the NCI Thesaurus, including properties, roles, 
and kinds.  The DTD for the XML is as defined by Apelon, Inc, whose editing 
tools are being used in the construction of the Thesaurus.  Properties of 
use only to the EVS (e.g. editor notes) are absent in the released terminology. 


The Thesaurus_04.04j.txt flat file is in tab-delimited format.  Included in this 
format are all the terms associated with NCI Thesaurus concepts (names and 
synonyms), a text definition of the concept (if one is present), and stated 
parent-child relations, sufficient to reconstruct the hierarchy.  The fields 
are:

	code <tab> concept name <tab> parents <tab> synonyms <tab> definition

The "parents" field contains the concept name(s) of the superconcept(s);
for root concepts, which have no parents, this field contains the string
"root_node".  If a "parents" or "synonyms" field contains multiple
entries, these are pipe-delimited.  The first entry in the "synonyms"
field is the preferred name of the concept.  If no preferred name has
been stated for the concept, this field contains the concept name.  The
"definition" field contains at most one definition, even if more than
one definition is associated with the concept; not all concepts have
definitions.
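
The flat file layout described above can be read with a few lines of Python.
This is a minimal sketch, not an official EVS tool: the function names
parse_flat_line and build_children are hypothetical, the two sample rows
(codes, names, definition) are invented for illustration, and the sketch
assumes that an empty trailing "definition" field means no definition is
present.

```python
from collections import defaultdict

def parse_flat_line(line):
    """Split one tab-delimited record into the five documented fields."""
    fields = line.rstrip("\r\n").split("\t")
    # Pad short rows so a missing trailing field does not raise an error.
    fields += [""] * (5 - len(fields))
    code, name, parents, synonyms, definition = fields[:5]
    return {
        "code": code,
        "name": name,
        # "root_node" marks a root concept, i.e. a concept with no parents.
        "parents": [] if parents == "root_node"
                   else [p for p in parents.split("|") if p],
        # Multiple synonyms are pipe-delimited.
        "synonyms": [s for s in synonyms.split("|") if s],
        # Assumption: an empty field means the concept has no definition.
        "definition": definition or None,
    }

def build_children(records):
    """Map each parent concept name to the names of its stated children."""
    children = defaultdict(list)
    for rec in records:
        for parent in rec["parents"]:
            children[parent].append(rec["name"])
    return dict(children)

# Invented sample rows; real rows come from Thesaurus_04.04j.txt.
sample = [
    "X0001\tExample_Root\troot_node\tExample Root\t",
    "X0002\tExample_Child\tExample_Root\tExample Child|Sample Child\tA made-up definition.",
]
records = [parse_flat_line(row) for row in sample]
children = build_children(records)
print(children)
```

Because the "parents" field holds concept names rather than codes, the
hierarchy in this sketch is keyed by concept name.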

The Thesaurus.owl file contains the entire terminology expressed in the OWL 
web ontology language (http://www.w3.org/TR/owl-ref/), with the exception of
the Ontylog namespace declaration, which was deemed unnecessary.  The Ontylog
Roles were converted to restrictions on OWL properties, and most of the 
concept annotations in Ontylog properties were converted to OWL 
AnnotationProperty; as in the Ontylog XML file, properties of use only to 
the EVS (e.g. editor notes) are absent in the OWL file.  Because 
Roles in Ontylog are mapped from a domain kind to a range kind, the OWL 
version of the Thesaurus has each kind as a root class to facilitate the 
conversion of Roles to OWL properties.
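
As an illustration of the root-class layout described above, the following
Python sketch lists the named classes that have no named superclass.  It
rests on assumptions that this file does not confirm: that Thesaurus.owl is
serialized as RDF/XML, that classes appear as top-level owl:Class elements
identified by rdf:ID, and that a converted Role appears as an
owl:Restriction nested inside rdfs:subClassOf.  The function name
root_classes and every identifier in the sample are invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

def root_classes(owl_xml):
    """Return the rdf:ID of each named class with no named superclass."""
    root = ET.fromstring(owl_xml)
    roots = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        class_id = cls.get(f"{{{RDF}}}ID")
        if class_id is None:
            continue  # skip anonymous classes
        # A subClassOf with rdf:resource names a parent class; a subClassOf
        # that wraps a restriction (assumed Role encoding) does not.
        has_named_parent = any(
            sub.get(f"{{{RDF}}}resource") is not None
            for sub in cls.findall(f"{{{RDFS}}}subClassOf")
        )
        if not has_named_parent:
            roots.append(class_id)
    return roots

# Invented fragment; the real Thesaurus.owl may be structured differently.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Example_Kind"/>
  <owl:Class rdf:ID="Example_Concept">
    <rdfs:subClassOf rdf:resource="#Example_Kind"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#rExample_Role"/>
        <owl:someValuesFrom rdf:resource="#Example_Kind"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""

print(root_classes(SAMPLE))
```

For the full file, ET.parse("Thesaurus.owl").getroot() could replace the
ET.fromstring call; given the size of the terminology, a streaming parser
may be preferable.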

For additional information, please see the Release Notes of caCORE 2.0.