
Cell type classification for multi-sample multi-condition comparisons in single-cell RNA sequencing data

Frauhammer, Felix

PDF, English (Black links for printing on paper) - main document
Download (15MB) | License: Public Domain

PDF, English (Colored links for computer screen) - main document
Download (15MB) | License: Public Domain


Abstract

Multicellular organisms require specialized cell types in order to function. While a widely accepted definition does not exist, cell types are regarded as groups of cells with similar properties, such as RNA expression, protein abundance and epigenetic modification. Single-cell RNA sequencing (scRNAseq) is a recent breakthrough for exploring cell types, providing expression estimates for all genes in thousands of individual cells. Using data-driven algorithms, such as unsupervised clustering, scRNAseq has discovered new cell types and created large reference data sets, next to other exploratory achievements. More recently, scRNAseq was applied to patient cohorts that include different groups, for example disease and healthy or disease subtypes. These multi-sample multi-condition data sets enable statistical inferences between groups, such as differential expression testing. In contrast to projects exploring unknown tissues or species, patient cohorts often study known cell types defined by specific marker genes. Here, I present Pooled Count Poisson Classification (PCPC), a novel cell type classification approach designed for inference with multi-sample multi-condition scRNAseq data sets. PCPC implements a statistical model that allows researchers to distinguish cells according to marker-based cell type definitions, enabling reproducible and comparable analysis between data sets and technologies (e.g. scRNAseq and flow cytometry). Specifically, PCPC pools marker gene counts across related cells to overcome technical noise, and compares them to a user-defined threshold using the Poisson model. In this work, I apply PCPC to three different data sets to demonstrate its utility. The first application shows it is able to annotate all lineages in data from human cord blood mononuclear cells (CBMCs), with a single marker gene per cell type.
The second application shows PCPC is able to discriminate fine cell type subsets, using data from a human tumor of mucosa-associated lymphoid tissue (MALT). Many cell types in the MALT tumor microenvironment, and T cell subsets in particular, are transcriptionally related, making their classification difficult. In spite of this challenging complexity, PCPC can even use lowly expressed marker genes, such as FOXP3 marking CD3E+ CD4+ FOXP3+ regulatory T (Treg) cells. Furthermore, I find Treg cells isolated from the MALT tumor can further be subdivided into CCR7+ and ICOS+ subsets, indicating a mixture of naive-like and activated Treg cells. In comparison to unsupervised clustering and the marker-based tool Garnett, classification with PCPC has more flexibility and fewer misclassifications, respectively. Thus, PCPC removes obstacles in studying complex tissues with scRNAseq, such as the microenvironment in human tumors. Furthermore, I demonstrate a multi-sample multi-condition comparison using data from a patient cohort of aggressive and indolent lymphoma subtypes. PCPC is applied to classify CD3E+ CD8B+ cytotoxic T cells, followed by differential expression testing between the aggressive and indolent subtypes. This uncovers significantly lower LGALS1 expression in indolent tumors, further implicating this gene in tumor aggressiveness and T cell inhibition. Currently, PCPC requires data generated with unique molecular identifiers (UMI), as well as substantial manual work. Due to its ability to resolve complex tissues with few marker genes, PCPC may bring clarity to transcriptomic cell type definitions and prove useful for multi-sample multi-condition comparisons in scRNAseq data.
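
The pooling-and-threshold idea described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the neighbor lists, the significance level `alpha`, the two-sided decision rule and the "undetermined" label are all assumptions made for illustration. The sketch pools a marker gene's UMI counts and the total UMI counts over a group of related cells, then asks whether the pooled marker count is plausible under a Poisson model whose mean is the user-defined threshold (as a fraction of total UMIs) times the pooled total.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term in log space."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(1.0, total)


def classify_marker(marker_counts, total_counts, neighbors, threshold, alpha=0.05):
    """Label each cell for one marker gene (hypothetical helper, illustrative only).

    marker_counts[i] : UMI count of the marker gene in cell i
    total_counts[i]  : total UMI count of cell i
    neighbors[i]     : indices of cells pooled for cell i (assumed to include i)
    threshold        : user-defined expression threshold, as a fraction of total UMIs
    """
    labels = []
    for nbrs in neighbors:
        pooled_marker = sum(marker_counts[j] for j in nbrs)
        pooled_total = sum(total_counts[j] for j in nbrs)
        lam = threshold * pooled_total  # expected pooled count at the threshold
        p_upper = 1.0 - poisson_cdf(pooled_marker - 1, lam)  # P(X >= observed)
        p_lower = poisson_cdf(pooled_marker, lam)            # P(X <= observed)
        if p_upper < alpha:
            labels.append("positive")      # clearly above the threshold
        elif p_lower < alpha:
            labels.append("negative")      # clearly below the threshold
        else:
            labels.append("undetermined")  # compatible with the threshold
    return labels


# Toy data: three marker-expressing cells and three non-expressing cells,
# each pooled with the other two cells of its own group.
marker_counts = [5, 4, 6, 0, 0, 0]
total_counts = [2000] * 6
neighbors = [[0, 1, 2]] * 3 + [[3, 4, 5]] * 3
labels = classify_marker(marker_counts, total_counts, neighbors, threshold=1e-3)
print(labels)
```

In this toy setup a single non-expressing cell with 2000 UMIs would not be distinguishable from the threshold on its own; pooling three such cells raises the expected count at the threshold enough for the zero counts to become informative, which is the motivation for pooling stated in the abstract.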

Document type: Dissertation
Supervisor: Brors, Prof. Dr. Benedikt
Place of Publication: Heidelberg
Date of thesis defense: 13 July 2021
Date Deposited: 05 Jan 2022 10:59
Date: 2021
Faculties / Institutes: The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
DDC-classification: 500 Natural sciences and mathematics
Controlled Keywords: Biology, RNA, Cell
Uncontrolled Keywords: Cell types, Single-cell RNA sequencing, statistical method