NCIt Subset Code	NCIt Subset Name	NCIt Concept Code	NCIt PT	pFDA PT	NCIt Definition
C188687	pFDA Bioinformatics/Genomics Terminology	C153367	BED Format	Browser Extensible Data	A tab-delimited text file format that allows the specification of the sequence data that is displayed in an annotation track. The minimum required information is chromosome, start position, and end position.
C188687	pFDA Bioinformatics/Genomics Terminology	C153249	Binary Alignment Map	Binary Alignment Map	A binary representation of a sequence alignment map compressed by the BGZF library.
C188687	pFDA Bioinformatics/Genomics Terminology	C17964	Bioinformatics	Bioinformatics	Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. (M. Nilges and Jens P. Linge, Unite de Bio-informatique Structurale, Institut Pasteur, Paris)
C188687	pFDA Bioinformatics/Genomics Terminology	C116155	Biopolymer Sequencing	Sequencing	A process to identify and determine the primary structure of, and the order of constituents in a biopolymer.
C188687	pFDA Bioinformatics/Genomics Terminology	C188483	Cheminformatics	Cheminformatics	A branch of informatics focused on chemical data.
C188687	pFDA Bioinformatics/Genomics Terminology	C17961	DNA Methylation	DNA methylation	The process by which methyl groups are added to nucleotides in genomic DNA.
C188687	pFDA Bioinformatics/Genomics Terminology	C47845	FASTA Format	FASTA	A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. Sequences are represented in the standard IUB/IUPAC single-letter amino acid and nucleic acid codes, with a single hyphen or dash being used to represent a gap of indeterminate length; in amino acid sequences an asterisk ("*") can represent a translation stop.
C188687	pFDA Bioinformatics/Genomics Terminology	C153250	FASTQ Format	FASTQ	A text-based format for storing a biological sequence that encodes the nucleotide calls as well as their quality scores.
C188687	pFDA Bioinformatics/Genomics Terminology	C84343	Genomics	Genomics	The study of the structure, function, expression, evolution, mapping and editing of genomes.
C188687	pFDA Bioinformatics/Genomics Terminology	C45447	Genotyping	Genotyping	The determination of the DNA sequence of an individual.
C188687	pFDA Bioinformatics/Genomics Terminology	C99752	Indel Mutation	INDEL	A mutation class that includes insertion mutations, deletion mutations, and mutation events where both an insertion and a deletion have occurred.
C188687	pFDA Bioinformatics/Genomics Terminology	C54683	International Chemical Identifier	InChI	A textual identifier for chemical substances designed to provide a standard and human-readable way to encode molecular information that also facilitates searches in printed and electronic data sources.
C188687	pFDA Bioinformatics/Genomics Terminology	C133910	MDL Molfile Format	MOLFILE	A chemical text file format developed by Molecular Design Limited (MDL) that represents information about molecular atoms, bonds, connectivity and coordinates. The file extension is .mol.
C188687	pFDA Bioinformatics/Genomics Terminology	C133996	MDL Structure-data File Format	Structure Data File	A family of chemical text file formats developed by Molecular Design Limited (MDL) that represent multiple chemical structural records and associated data fields. The file extension is .sd or .sdf.
C188687	pFDA Bioinformatics/Genomics Terminology	C153191	Metagenomics	Metagenomics	The direct study of genetic material recovered from environmental samples, largely dominated by microbial organisms.
C188687	pFDA Bioinformatics/Genomics Terminology	C101293	Next Generation Sequencing	Next-Generation Sequencing	Technologies that facilitate the rapid determination of the nucleotide sequence of large numbers of strands or segments of DNA or RNA.
C188687	pFDA Bioinformatics/Genomics Terminology	C153349	Nucleotide Sequence Read	Sequencing read	The manual or automated determination of the nucleotide order in a nucleic acid fragment obtained after the completion of a sequencing process.
C188687	pFDA Bioinformatics/Genomics Terminology	C20085	Proteomics	Proteomics	The global analysis of cellular proteins. Proteomics uses a combination of sophisticated techniques including two-dimensional (2D) gel electrophoresis, image analysis, mass spectrometry, amino acid sequencing, and bio-informatics to resolve comprehensively, to quantify, and to characterize proteins. The application of proteomics provides major opportunities to elucidate disease mechanisms and to identify new diagnostic markers and therapeutic targets.
C188687	pFDA Bioinformatics/Genomics Terminology	C188688	Sequence Alignment	Alignment	The process of arranging protein, DNA or RNA sequences to identify regions with similar sequences that may elucidate functional, structural, or evolutionary relationships between the sequences.
C188687	pFDA Bioinformatics/Genomics Terminology	C153248	Sequence Alignment Map	Sequence Alignment Map	A tab-delimited, text-based format for storing biological sequences aligned to a reference sequence. A SAM file includes an optional header section and an alignment section. Each alignment line has 11 mandatory fields for essential alignment information, such as mapping position, and a variable number of optional fields.
C188687	pFDA Bioinformatics/Genomics Terminology	C18279	Single Nucleotide Polymorphism	Single-Nucleotide Polymorphisms	A variation of a single nucleotide at a specific location of the genome due to base substitution, present at an appreciable frequency between individuals of a single interbreeding population.
C188687	pFDA Bioinformatics/Genomics Terminology	C129888	Single Nucleotide Polymorphism Profile	SNP genotyping	The analysis of all of the single nucleotide polymorphisms in the genome of a biological sample.
C188687	pFDA Bioinformatics/Genomics Terminology	C164674	Single Nucleotide Variant	Single-Nucleotide Variant	A variation of a single nucleotide at a specific location of the genome due to base substitution, which is found at any frequency in the population.
C188687	pFDA Bioinformatics/Genomics Terminology	C188689	Single Nucleotide Variant Genotyping	SNV genotyping	The measurement of genetic variations of single nucleotide variants (SNVs) between members of a species.
C188687	pFDA Bioinformatics/Genomics Terminology	C153189	Transcriptomics	Transcriptomics	A study of the complete set of RNA transcripts that are produced by the genome, under specific circumstances or in a specific cell.
C188687	pFDA Bioinformatics/Genomics Terminology	C172216	Variant Call File Format	Variant Call Format	A text-based electronic file used for storing gene sequence variation data. The first text section is composed of a header containing the metadata and keywords used in the file. This is followed by the body of the file which is tab-separated into eight mandatory data columns for each sample. Additionally, the body of the file can include an unlimited number of optional columns to record other sample-related data.
C188687	pFDA Bioinformatics/Genomics Terminology	C188690	Variant Calling	Variant calling	Technology that detects differences between an individual's DNA sequence and a reference DNA sequence.
C188687	pFDA Bioinformatics/Genomics Terminology	C101295	Whole Exome Sequencing	Whole-exome sequencing	A procedure that can determine the DNA sequence for all of the exons in an individual.
C188687	pFDA Bioinformatics/Genomics Terminology	C101294	Whole Genome Sequencing	Whole-Genome Sequencing	A procedure that can determine the DNA sequence for nearly the entire genome of an individual.