=================================
Mouse Cortex + Hippocampus
=================================

RNA sequencing data of single cells isolated from >20 areas of mouse cortex and hippocampus, including ACA, AI, AUD, CA, CLA, CLA;EPd, ENTl, ENTm, GU;VISC;AIp, HIP, MOp, MOs, ORB, PAR;POST;PRE, PL;ILA, PTLp, RSP, RSPv, SSp, SSs, SSs;GU, SSs;GU;VISC, SUB;ProS, TEa;PERI;ECT, VISal;VISl;VISli, VISam;VISpm, VISp, and VISpl;VISpor.  Abbreviations match the Allen Mouse Brain Atlas.

The data set includes 1,093,785 single cells.
10xv2 sequencing reads were aligned to the mouse pre-mRNA reference transcriptome (mm10) using the 10x Genomics CellRanger pipeline (version 3.0.0) with default parameters.
For more details, please see the Documentation tab in the Cell Types web application.


Gene expression data matrix (matrix.csv)
	This csv file contains one row for every cell in the dataset and one column for every gene sequenced.  Each value in the matrix is the UMI count
	for that gene (column) in that cell (row).
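
	With more than a million rows, the matrix may be too large to load in a single call.  The following is a minimal sketch of reading it one row at a time, assuming that the first column holds the cell identifier and every other column holds an integer count.  The function name and the demo values are hypothetical and are not part of the dataset.

```python
import csv
import io

def total_umi_per_cell(lines):
    """Stream a cell-by-gene count matrix and yield (cell_id, total_umi) pairs.

    Assumes the first column is the cell identifier and every other
    column is an integer UMI count for one gene.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row of gene names
    for row in reader:
        yield row[0], sum(int(value) for value in row[1:])

# Tiny made-up matrix standing in for matrix.csv
demo_matrix = io.StringIO(
    "sample_name,GeneA,GeneB,GeneC\n"
    "cell_1,0,3,1\n"
    "cell_2,5,0,0\n"
    "cell_3,2,2,2\n"
)
umi_totals = dict(total_umi_per_cell(demo_matrix))
print(umi_totals)
```

	For the real file, an open file handle can be passed in place of the demo text, for example open("matrix.csv", newline="").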

		
Medians (medians.csv)
	A table of median expression values for each gene (rows) in each cluster (columns).  Medians are calculated by first normalizing gene expression as follows: norm_data = log2(CPM(exons+introns)), and then calculating the medians independently for each gene and each cluster.
	The first row lists the cluster name (cluster_label), which matches the cell type alias shown in the Transcriptomics Explorer.
	The first column lists the unique gene identifier (gene), which in most cases is the gene symbol.
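
	The normalization and median steps described above can be sketched as follows.  This is an illustration only, not the pipeline used to produce medians.csv: the function names are hypothetical, the counts are made up, and the pseudocount of 1 is an assumption added so that zero counts stay finite (the formula above does not say whether one was used).

```python
import math
from collections import defaultdict
from statistics import median

def log2_cpm(counts, pseudocount=1.0):
    """Convert one cell's raw counts to log2(CPM + pseudocount).

    The pseudocount is an assumption of this sketch; with pseudocount=0
    any zero count would raise a ValueError in math.log2.
    """
    total = sum(counts)
    if total == 0:
        return [math.log2(pseudocount)] * len(counts)
    return [math.log2(count / total * 1e6 + pseudocount) for count in counts]

def cluster_medians(cell_counts, cell_clusters, genes):
    """Return {gene: {cluster: median normalized expression}}."""
    by_cluster = defaultdict(list)
    for cell_id, counts in cell_counts.items():
        by_cluster[cell_clusters[cell_id]].append(log2_cpm(counts))
    # Medians are taken independently for each gene and each cluster
    return {
        gene: {
            cluster: median(cell[i] for cell in cells)
            for cluster, cells in by_cluster.items()
        }
        for i, gene in enumerate(genes)
    }

# Made-up example: two genes, three cells, two clusters
genes = ["GeneA", "GeneB"]
cell_counts = {"c1": [1, 3], "c2": [2, 2], "c3": [0, 4]}
cell_clusters = {"c1": "X", "c2": "X", "c3": "Y"}
medians = cluster_medians(cell_counts, cell_clusters, genes)
print(medians)
```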


Cell metadata (metadata.csv)
* Each item of this table (except "sample_name") has three columns:
	[item]_label
		Name of the item (e.g., "VISp" would be an example under "region_label")
	[item]_order
		Order in which the item is displayed in the Transcriptomics Explorer
	[item]_color
		Color in which the item is displayed in the Transcriptomics Explorer

* Items in the sample information table:
	sample_name
		Unique sample identifier
	cluster
		Cell type cluster name
	cell_type_accession
		Cell type accession ID (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	cell_type_alias
		Cell type alias (see https://portal.brain-map.org/explore/classes/nomenclature for details).  This is the same as "cluster".
	cell_type_alt_alias
		Cell type alternative alias, if any (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	cell_type_designation
		Cell type label (see https://portal.brain-map.org/explore/classes/nomenclature for details)
	class
		Broad cell class (for example, "GABAergic", "Non-neuronal", and "Glutamatergic")
	subclass
		Cell type subclass (for example, "SST", "L6 CT", and "Astrocyte")
	external_donor_name
		Unique identifier for each mouse donor
	donor_sex
		Biological sex of the donor
	cortical_layer
		Cortical layer targeted for sampling. Cells with cortical_layer=0 are non-cortical cells. 
	region
		Brain region targeted for sampling
	subregion
		Brain sub-region targeted for sampling (e.g., anterior vs. posterior), if any
	full_genotype
		Full genotype of the transgenic mouse donor
	facs_population_plan
		FACS gating criteria used to sort labeled cells
	injection_materials
		Specific virus injected into the mouse.  Blank values for this and subsequent columns indicate that no injection was performed.
	injection_method
		Method used for virus injection (Nanoject, Retro-Orbital)
	injection_roi
		Center of injection site. Abbreviations match the Allen Mouse Brain Atlas.
	propagation_type
		Type of viral propagation (retrograde, anterograde)
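
The [item]_label / [item]_order / [item]_color convention makes it straightforward to summarize any item in the table.  The following is a minimal sketch that counts cells per label and records each label's display color.  The function name is hypothetical, and the rows and color codes in the demo are made up.

```python
import csv
import io
from collections import Counter

def summarize_item(rows, item):
    """Count cells per label and collect each label's display color."""
    counts = Counter()
    colors = {}
    for row in rows:
        label = row[f"{item}_label"]
        counts[label] += 1
        colors[label] = row[f"{item}_color"]
    return counts, colors

# Made-up rows standing in for metadata.csv (color codes are invented)
demo_metadata = io.StringIO(
    "sample_name,subclass_label,subclass_order,subclass_color\n"
    "cell_1,L6 CT,2,#0000FF\n"
    "cell_2,SST,1,#FF0000\n"
    "cell_3,L6 CT,2,#0000FF\n"
)
subclass_counts, subclass_colors = summarize_item(
    csv.DictReader(demo_metadata), "subclass"
)
print(subclass_counts)
print(subclass_colors)
```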
	
	
UMAP coordinates (tsne.csv)
UMAP coordinates (the filename is a misnomer) for each sample shown on the Transcriptomics Explorer.  UMAP is a dimensionality reduction method that is well suited to visualizing gene expression data.
	sample_name
		Unique sample identifier
	tsne_1
		First coordinate (again, these are actually UMAP coordinates for this dataset)
	tsne_2
		Second coordinate (again, these are actually UMAP coordinates for this dataset)
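
Because both tsne.csv and metadata.csv carry sample_name, the coordinates can be attached to the cell metadata by joining on that column.  The following is a minimal sketch with a hypothetical function name and made-up rows.  The output keys umap_1 and umap_2 are names chosen for this example and do not appear in the files.

```python
import csv
import io

def attach_coordinates(metadata_rows, coordinate_rows):
    """Join metadata and coordinates on sample_name.

    Returns {sample_name: row dict with umap_1 and umap_2 added}.
    Cells that have no coordinates are skipped.
    """
    coords = {
        row["sample_name"]: (float(row["tsne_1"]), float(row["tsne_2"]))
        for row in coordinate_rows
    }
    joined = {}
    for row in metadata_rows:
        name = row["sample_name"]
        if name in coords:
            x, y = coords[name]
            joined[name] = {"umap_1": x, "umap_2": y, **row}
    return joined

# Made-up rows standing in for metadata.csv and tsne.csv
demo_meta = io.StringIO("sample_name,subclass_label\ncell_1,SST\ncell_2,Astrocyte\n")
demo_coords = io.StringIO("sample_name,tsne_1,tsne_2\ncell_1,1.5,-2.0\ncell_3,0.0,0.0\n")
joined = attach_coordinates(csv.DictReader(demo_meta), csv.DictReader(demo_coords))
print(joined)
```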

		
Taxonomy of cell types (dend.json)
	Serialized cell type hierarchy with all node information embedded in json format.
	The dendrogram shown at the top of the Transcriptomics Explorer, including the underlying cell type order, is derived from this file.
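
	The key names used inside dend.json are not described here, so the following sketch makes no assumption about them.  It only reports how a parsed json value is nested, which can help when exploring the file for the first time.  The function name is hypothetical, and the demo structure is made up and does not reflect the real file.

```python
import json

def describe_nesting(node, depth=0, max_depth=3):
    """Return lines summarizing the keys and list lengths of a parsed json value."""
    lines = []
    indent = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(describe_nesting(value, depth + 1, max_depth))
    elif isinstance(node, list):
        lines.append(f"{indent}[list of {len(node)}]")
        # Only the first element is inspected, assuming list items share a shape
        if node and depth < max_depth:
            lines.extend(describe_nesting(node[0], depth + 1, max_depth))
    return lines

# Made-up structure; the real file would be read with json.load(open("dend.json"))
demo_tree = json.loads('{"name": "root", "children": [{"name": "A"}, {"name": "B"}]}')
nesting = describe_nesting(demo_tree)
print("\n".join(nesting))
```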
	

Taxonomy metadata (taxonomy.txt)
	Tracking taxonomy metadata is critical for reproducibility.  This file is a draft of the taxonomy metadata to be stored.  See the "Tracking taxonomies" section at https://portal.brain-map.org/explore/classes/nomenclature for details of each descriptor.
	

Gene information (**STORED ELSEWHERE**)
* To access this file, please use the following link: http://celltypes.brain-map.org/api/v2/well_known_file_download/694413985
* Within that zip file, the gene information is located in "mouse_VISp_2018-06-14_genes-rows.csv".  All other files can be ignored.
	gene_symbol
		Gene symbol
	gene_id
		This is an Allen Institute gene ID that can be ignored
	chromosome
		Chromosome location of gene
	gene_entrez_id
		NCBI Entrez ID
	gene_name
		Gene name

		
Gene ".gtf" file (**STORED ELSEWHERE**)
* To access this file, please use the following link: http://celltypes.brain-map.org/api/v2/well_known_file_download/502999254
.gtf is a standard format for localizing various aspects of transcripts within a specific genome, and information about this format is plentiful.
As of 1 October 2019, one active link describing this format is here: https://www.gencodegenes.org/pages/data_format.html
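
As an illustration of the format, the following minimal sketch parses a single .gtf record, relying on the standard layout of nine tab-separated columns with key "value"; pairs in the last column.  The function name and the demo line are hypothetical.  Unquoted attribute values are ignored by this pattern, and a key that appears more than once keeps only its last value.

```python
import re

GTF_COLUMNS = ("seqname", "source", "feature", "start", "end", "score", "strand", "frame")
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')

def parse_gtf_line(line):
    """Parse one GTF record into a dict; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The ninth column looks like: gene_id "X"; gene_name "Y";
    record["attributes"] = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
    return record

# Made-up record, not taken from the real file
demo_line = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GeneA";\n'
gtf_record = parse_gtf_line(demo_line)
print(gtf_record)
```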