library(scRNAseq)
7 Workshop
7.1 Overview
The goal of this workshop is to build a workflow with some example single-cell RNA-seq data.
7.2 Data
The scRNAseq package provides convenient access to several publicly available datasets in the form of SingleCellExperiment objects. The focus of this package is to capture datasets that are not easily read into R with a one-liner from, e.g., read_csv(). Instead, the necessary data munging is already done so that users only need to call a single function to obtain a well-formed SingleCellExperiment.
To see the list of available datasets, use the listDatasets() function:
out <- listDatasets()
out
DataFrame with 61 rows and 5 columns
Reference Taxonomy Part Number
<character> <integer> <character> <integer>
1 @aztekin2019identifi.. 8355 tail 13199
2 @bach2017differentia.. 10090 mammary gland 25806
3 @bacher2020low 9606 T cells 104417
4 @baron2016singlecell 9606 pancreas 8569
5 @baron2016singlecell 10090 pancreas 1886
... ... ... ... ...
57 @zeisel2018molecular 10090 nervous system 160796
58 @zhao2020singlecell 9606 liver immune cells 68100
59 @zhong2018singlecell 9606 prefrontal cortex 2394
60 @zilionis2019singlec.. 9606 lung 173954
61 @zilionis2019singlec.. 10090 lung 17549
Call
<character>
1 AztekinTailData()
2 BachMammaryData()
3 BacherTCellData()
4 BaronPancreasData('h..
5 BaronPancreasData('m..
... ...
57 ZeiselNervousData()
58 ZhaoImmuneLiverData()
59 ZhongPrefrontalData()
60 ZilionisLungData()
61 ZilionisLungData('mo..
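Because listDatasets() returns a DataFrame, it can be subset like any other table. The following is a minimal sketch, assuming the `Number` column holds the number of cells in each dataset (as the printout above suggests); the variable name `big` is only illustrative.

```r
library(scRNAseq)

out <- listDatasets()

# Keep only the datasets with more than 5,000 cells
# (assumption: `Number` is the per-dataset cell count)
big <- out[out$Number > 5000, ]

# Order from smallest to largest so the lighter downloads come first
big <- big[order(big$Number), ]

# Show the tissue, the cell count, and the call that loads each dataset
big[, c("Part", "Number", "Call")]
```

Smaller datasets near the top of this table are usually quicker to download and analyze on a laptop.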
You can load a dataset as follows:
sce <- ZeiselBrainData()
sce
class: SingleCellExperiment
dim: 20006 3005
metadata(0):
assays(1): counts
rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
rowData names(1): featureType
colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12
1772058148_F03
colData names(10): tissue group # ... level1class level2class
reducedDimNames(0):
mainExpName: endogenous
altExpNames(2): ERCC repeat
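The printed summary can also be queried piece by piece. The sketch below shows a few standard accessors for a SingleCellExperiment; the names `n_genes` and `n_cells` are illustrative, and `level1class` is one of the colData columns listed in the printout above.

```r
library(scRNAseq)
library(SingleCellExperiment)

sce <- ZeiselBrainData()

# Rows are genes (features) and columns are cells (observations)
n_genes <- nrow(sce)
n_cells <- ncol(sce)
dim(sce)

# Peek at a corner of the count matrix
counts(sce)[1:3, 1:3]

# Per-cell annotation lives in colData; tabulate one of its columns
table(sce$level1class)

# Alternative feature sets (here, spike-ins and repeats) are stored separately
altExpNames(sce)
```

With the version of the data shown above, dim(sce) should agree with the `dim:` line of the printout.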
7.3 Tasks
1. Pick a scRNA-seq dataset that has more than 5,000 cells and load the SingleCellExperiment (or sce) object.
2. Show the number of genes and the number of observations in the sce object.
3. Using the material we learned in the lecture, analyze the scRNA-seq data using the Bioconductor packages we learned about. This should include (but not be limited to):
   - Quality control (you must use at least two different QC metrics)
   - Normalization
   - Feature selection using highly variable genes
   - Dimensionality reduction using PCA
   - Data visualization using tSNE or UMAP
   - Unsupervised clustering (your choice of method!)
4. At the end of your analysis, show both (i) the PCA plot and (ii) either the tSNE or UMAP plot, with points colored by the predicted labels from the clustering algorithm.
5. For each component described in Task #3, write 3-4 sentences naming and describing the idea behind the methodology you used, along with interpreting the output.
# Add your solution here
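As a starting point, here is a minimal skeleton of the workflow steps listed in the tasks, using functions from the scuttle, scater, and scran packages. It is only one possible sketch, not the expected solution: it runs on the Zeisel data loaded earlier (which has fewer than 5,000 cells, so it does not satisfy the first task), the thresholds and the choice of 2,000 highly variable genes are arbitrary assumptions, and the clustering uses the default settings of clusterCells().

```r
library(scRNAseq)
library(scuttle)  # perCellQCMetrics(), isOutlier(), logNormCounts()
library(scater)   # runPCA(), runTSNE(), plotReducedDim()
library(scran)    # modelGeneVar(), getTopHVGs(), clusterCells()

set.seed(100)  # t-SNE is stochastic, so fix the seed for reproducibility

sce <- ZeiselBrainData()

# Quality control with three metrics: library size, genes detected,
# and percentage of counts from mitochondrial genes ("mt-" prefix in this data)
is_mito <- grepl("^mt-", rownames(sce))
qc <- perCellQCMetrics(sce, subsets = list(Mito = is_mito))
discard <- isOutlier(qc$sum, log = TRUE, type = "lower") |
  isOutlier(qc$detected, log = TRUE, type = "lower") |
  isOutlier(qc$subsets_Mito_percent, type = "higher")
sce <- sce[, !discard]

# Normalization: library-size scaling followed by a log-transformation
sce <- logNormCounts(sce)

# Feature selection: model per-gene variance and keep the top 2,000 genes
dec <- modelGeneVar(sce)
hvgs <- getTopHVGs(dec, n = 2000)

# Dimensionality reduction: PCA on the selected genes, then t-SNE on the PCs
sce <- runPCA(sce, subset_row = hvgs)
sce <- runTSNE(sce, dimred = "PCA")

# Unsupervised clustering on the PCs (graph-based by default)
sce$cluster <- clusterCells(sce, use.dimred = "PCA")

# Plots colored by the predicted cluster labels
plotReducedDim(sce, dimred = "PCA", colour_by = "cluster")
plotReducedDim(sce, dimred = "TSNE", colour_by = "cluster")
```

For a different dataset, the mitochondrial gene pattern and the QC thresholds will likely need adjusting, and runUMAP() can be swapped in for runTSNE().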
7.3.1 Useful tips
- If the original dataset was not provided with Ensembl annotation, we can map the identifiers with ensembl=TRUE. Any genes without a corresponding Ensembl identifier are discarded from the dataset.
sce <- ZeiselBrainData(ensembl=TRUE)
Warning: Unable to map 1565 of 20006 requested IDs.
head(rownames(sce))
[1] "ENSMUSG00000029669" "ENSMUSG00000046982" "ENSMUSG00000039735"
[4] "ENSMUSG00000033453" "ENSMUSG00000046798" "ENSMUSG00000034009"
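To see the effect of the mapping, the two versions of the dataset can be compared directly. This is a small sketch; the object names `sce_symbol`, `sce_ensembl`, and `n_dropped` are illustrative only.

```r
library(scRNAseq)

# Same dataset, once with the original gene symbols and once with Ensembl IDs
sce_symbol <- ZeiselBrainData()
sce_ensembl <- ZeiselBrainData(ensembl = TRUE)

# Number of genes lost in the conversion
n_dropped <- nrow(sce_symbol) - nrow(sce_ensembl)
n_dropped

# After mapping, every row name should be a mouse Ensembl gene ID
all(grepl("^ENSMUSG", rownames(sce_ensembl)))
```

Note that symbol-based patterns, such as matching an "mt-" prefix to find mitochondrial genes, no longer match anything once the row names are Ensembl identifiers.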