=================================
Human Cortex
=================================

This data set includes single-nucleus transcriptomes from 76,533 total nuclei derived from 2 post-mortem human brain specimens, to survey 
cell type diversity in the primary motor cortex (M1C or M1). The data were generated as part of a BICCN collaboration to characterize cell type 
diversity in M1 across species and data modalities. In total, 127 transcriptomic cell types were identified. 

Samples were processed using the 10x Chromium Single Cell 3’ Reagent Kit v3. 10x chip loading and sample processing was done according to the 
manufacturer’s protocol. Gene expression was quantified using the default 10x Cell Ranger v3 pipeline except substituting the curated genome 
annotation used for SMART-seq v4 quantification. Introns were annotated as “mRNA,” and intronic reads were included in expression quantification.


Gene expression data matrices (matrix.csv)
    This file csv contains one row for every cell in the dataset and one column for every gene sequenced. The values of the matrix represent total reads (introns + exons) 
	for that gene (column) for that cell (row).

		
Trimmed means (trimmed-means.csv) 
	A table of trimmed means for each gene (rows) in each cluster (columns).  Trimmed means are calculated by first normalizing gene expression as follows: norm_data = log2(CPM(exons+introns)), and then calculating the average expression of the middle 50% of the data (e.g., after excluding the 25% highest and 25% lowest expression values) independently for each gene and each cluster.
	The first row lists the cluster name (cluster_label), which matches the cell type alias shown in the Transcriptomic Explorer.
	The first column lists the unique gene identifier (gene), which in most cases is the gene symbol.


Cell metadata (metadata.csv)
* Each item of this table (except "sample_name") has three columns:
	[item]_label
		Name of the item (e.g., "V1C" would be an example under "brain_region_label")
	[item]_order
		Order that the item will be displayed on the Transcriptomics Explorer 
	[item]_color
		Color that the item will be displayed on the Transcriptomics Explorer 

* Items in the sample information table:
	sample_name
		Unique sample identifier
	cluster
		Cell type cluster name
	cell_type_accession
		Cell type accession ID (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	cell_type_alias
		Cell type alias (see https://portal.brain-map.org/explore/classes/nomenclature for details).  This is the same as "cluster".
	cell_type_alt_alias
		Cell type alternative alias, if any (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	cell_type_designation
		Cell type label (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	class
		Broad cell class (for example, "GABAergic", "Non-neuronal", and "Glutamatergic")
	subclass
		Cell type subclass (for example, "SST", "L6 CT", and "Astrocyte")
	external_donor_name
		Unique (de-identified) identifier for each human donor
	donor_sex
		Biological sex of the donor
	cortical_layer
		Cortical layer targeted for sampling
	region
		Brain region targeted for sampling
	full_genotype
		(These slots are left blank for human)
	
	
TSNE coordinates (2d-coordinates.zip)
t-Distributed Stochastic Neighbor Embedding (t-SNE) coordinates for each sample shown on the Transcriptomics Explorer.  t-SNE is a method for dimensionality reduction of gene expression that is  well suited for data visualization (as of 1 October 2019, a comprehensive t-SNE resource is available here: https://lvdmaaten.github.io/tsne/)
	sample_name
		Unique sample identifier
	tsne_1
		First t-SNE coordinate
	tsne_2
		Second t-SNE coordinate

		
Taxonomy of clusters (dend.json)
	Serialized cluster hierarchy with all node information embedded in json format.
	The dendrogram shown at the top of the Transcriptomics Explorer, including the underlying cluster order, is derived from this file.
	

Taxonomy information (taxonomy.txt)
	Tracking taxonomy meta-data is critical for reproducibility.  This file is a draft of taxonomy meta-data to be stored.  See the "Tracking taxonomies" section at https://portal.brain-map.org/explore/classes/nomenclature for details of each descriptor.
	

Gene information (**STORED ELSEWHERE**)
* To access this file, please use the following link: http://celltypes.brain-map.org/api/v2/well_known_file_download/694416044
* Within that zip file, the gene information is located in "human_MTG_2018-06-14_genes-rows.csv".  All other files can be ignored.
	gene
		Gene symbol
	chromosome
		Chromosome location of gene
	entrez_id
		NCBI Entrez ID
	gene_name
		Gene name
	mouse_homologenes
		Mouse ortholog

		
Gene ".gtf" file (**STORED ELSEWHERE**)
* To access this file, please use the following link: http://celltypes.brain-map.org/api/v2/well_known_file_download/502175284
.gtf is a standard format for localizing various aspects of transcripts within a specific genome and information about this format is plentiful.
As of 1 October 2019, one active link describing this format is here: https://www.gencodegenes.org/pages/data_format.html