Materials: Presentation | Videos: Data, Exploratory, Classification, DEA (Statistics), DEA (ANOVA), Enrichment, Single Cell | Task2
Before we start, let’s download the data and install the software for ThermoFisher / Affymetrix microarray data analysis.
Let us try to work with gene expression data directly in Excel. This is definitely not the best choice, but it will help you get a feel for the data. Please download a subset of the TCGA LUSC data: lusc20.txt
It contains gene expression for 10 normal and 10 cancer lung tissues.
Download the data and save it as a txt file.
Import the data into Excel. Make sure that you use “Open..” from within Excel. Otherwise gene names (e.g. SEPT11) will be damaged!
Alternatively (on some systems the problems with the decimal separator are severe!), please use the prepared lusc20.xlsx file
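For readers who want to check the import outside Excel, here is a minimal Python sketch of reading such a table while keeping gene names as plain text. The column layout, the sample names and the values in `SAMPLE` are made up for illustration; the real lusc20.txt may be organised differently, and "." is assumed as the decimal separator.

```python
import csv
import io

# Hypothetical miniature of a tab-delimited expression table.
# The real lusc20.txt layout (column names, number of samples) is an assumption.
SAMPLE = (
    "Gene\tN1\tN2\tT1\tT2\n"
    "SEPT11\t10.5\t10.1\t9.2\t9.0\n"
    "MARCH1\t3.2\t3.0\t5.1\t5.4\n"
)

def read_expression(handle):
    """Return (sample_names, {gene: [values]}) from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]              # first column holds the gene names
    table = {}
    for row in reader:
        if not row:                   # skip empty lines
            continue
        gene = row[0]                 # stays a string, never parsed as a date
        table[gene] = [float(x) for x in row[1:]]  # assumes "." decimal separator
    return samples, table

samples, table = read_expression(io.StringIO(SAMPLE))
print(samples)
print(table["SEPT11"])
```

Because the first column is never converted, names such as SEPT11 survive unchanged, which is the same effect as marking that column as “Text” in the Excel import dialog.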
average expression for each gene: global, normal, cancer
exclude genes that are not detected (this is already done for lusc20.xlsx)
log fold change: logFC = MeanTumour - MeanNormal (“-”, not “/”, as we work in log scale)
perform a t-test comparing tumour and normal tissues
assign a rank to each p-value, either manually or by
estimate FDR: FDR = m * pv / k, where m is the number of genes, pv is the p-value and k is its rank (1..m)
MA-plot (x: AverageGlobal, y:logFC)
Volcano-plot (x: logFC, y: -log10(FDR))
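The calculations in the steps above can be sketched in plain Python as a cross-check for the Excel formulas. This is only an illustration under assumptions: the function names are hypothetical, the numbers are made up, and Welch's (unequal variance) t statistic is used because the text does not say which t-test variant is meant. Turning the t statistic into a p-value needs the t distribution, which is left to Excel's T.TEST or a statistics package.

```python
import math
from statistics import mean, variance

def log_fc(tumour, normal):
    """logFC on log-scale data: difference of group means, not a ratio."""
    return mean(tumour) - mean(normal)

def welch_t(a, b):
    """Welch's t statistic (assumed variant; the source only says 't-test')."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def fdr_by_rank(pvalues):
    """FDR = m * pv / k, with k the rank of pv among all m p-values (1 = smallest)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p-value
    fdr = [0.0] * m
    for k, i in enumerate(order, start=1):
        fdr[i] = m * pvalues[i] / k
    return fdr

# Made-up log-scale expression values for one gene
normal = [10.5, 10.1]
tumour = [9.2, 9.0]
print(log_fc(tumour, normal))    # negative: lower in tumour
print(welch_t(tumour, normal))

# Made-up p-values for four genes
pvalues = [0.01, 0.04, 0.03, 0.005]
fdr = fdr_by_rank(pvalues)
print(fdr)
# y coordinate of the volcano plot
print([-math.log10(x) for x in fdr])
```

The formula is applied exactly as written in the step list, so values are neither capped at 1 nor forced to be monotone in the rank; the usual Benjamini-Hochberg adjustment adds both of these steps.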
Transcriptome Analysis Console is a user-friendly tool for the analysis of ex-Affymetrix arrays (Affymetrix has since been acquired by ThermoFisher). Please install it and import the CEL files from SCC_CEL.zip.
You might need to register to download the library files. If you do not have an account, use the login email@example.com and ask for the password.
Annotate and import the data. Samples with “N” in the name come from normal tissue. To speed up the analysis, select “Gene” instead of “Gene+Exon” (optional).
See PCA visualization
Perform differential expression analysis (DEA)
Export the results of DEA and the data
Check functional annotation of the significant genes
This task is meant to give you better command of the material. Please perform the analysis of a time-course experiment for the IFNg-stimulated A375 cell line. The dataset is discussed in Nazarov et al., 2013. Do not hesitate to contact me with questions.
The data are available in 2 versions:
Please try to use (1) to investigate the issues of importing into Excel. Hint: the decimal separator is “.” and GeneSymbol should be imported as “Text”. If you find it difficult, use (2).
Remove transcript clusters (rows) which have no GeneSymbol or whose maximal expression over all samples is below 5
Find mRNAs activated by IFNg stimulation at 24 h: T24 vs T00. Consider genes with FDR < 0.01
Paste these genes (GeneSymbol) into Enrichr and investigate their biological functions
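The filtering and selection steps above can be sketched as follows. The helper names, the placeholder gene symbols and all numbers are hypothetical. Treating “activated” as logFC > 0 in T24 vs T00 is an assumption of this sketch, not something the task states explicitly.

```python
def keep_row(gene_symbol, values, min_max_expr=5.0):
    """Keep a row only if it has a gene symbol and its maximum expression is at least 5."""
    has_symbol = bool(gene_symbol and gene_symbol.strip())
    return has_symbol and max(values) >= min_max_expr

def activated_genes(results, fdr_cutoff=0.01):
    """results: (gene, logFC, FDR) tuples; 'activated' is assumed to mean logFC > 0."""
    hits = {gene for gene, lfc, fdr in results if fdr < fdr_cutoff and lfc > 0}
    return sorted(hits)

# Made-up rows: (GeneSymbol, expression values over all samples)
rows = [
    ("GENE_A", [4.2, 7.9]),   # kept: has a symbol, maximum >= 5
    ("", [8.0, 8.1]),         # removed: no GeneSymbol
    ("GENE_B", [3.1, 4.8]),   # removed: maximum below 5
]
kept = [symbol for symbol, values in rows if keep_row(symbol, values)]
print(kept)

# Made-up DEA results for T24 vs T00: (GeneSymbol, logFC, FDR)
results = [
    ("GENE_A", 3.7, 0.0004),  # selected
    ("GENE_C", -2.1, 0.002),  # significant but down-regulated
    ("GENE_D", 1.5, 0.03),    # up-regulated but FDR >= 0.01
]
selected = activated_genes(results)
print("\n".join(selected))    # one symbol per line, ready to paste into Enrichr
```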
Principal Component Analysis Explained Visually by Victor Powell
How to Use t-SNE Effectively by Wattenberg et al., Distill, 2016.
Understanding UMAP by Andy Coenen and Adam Pearce
Dimensionality Reduction for Data Visualization: PCA vs TSNE vs UMAP vs LDA by Sivakar Sivarajah
Tensorflow Playground to understand classification with neural networks.