Abstract:
Single-cell transcriptome data offer enormous scope for studying biological systems at the cellular level. We address several problems in the statistical analysis of single-cell RNA-seq data. First, we develop a realistic statistical model for single-cell transcriptome data: a two-part model in which the gene-wise expression distribution is unimodal or bimodal and zero occurrences are modeled by a generalized linear model with a probit link. Continuing this work, we discuss testing methods for comparing transcriptome profiles between two groups and suggest two likelihood ratio-based tests, under the unimodal and bimodal assumptions respectively. We also propose a cell pseudotime reconstruction method that avoids dimensionality reduction, a step that may lose information in the data. We cast pseudotime reconstruction as the problem of finding the permutation of cells that optimizes a cost function, and we use a genetic algorithm to search for that permutation. Finally, we discuss a novel method for removing batch effects, which facilitates merging two or more single-cell RNA-seq datasets. All our approaches are supported by simulation studies and real data analysis.
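
To make the permutation view of pseudotime reconstruction concrete, the following is a minimal sketch and not the method developed in this work. It rests on several illustrative assumptions: the cost of an ordering is taken to be the sum of Euclidean distances between consecutive cells, and the genetic algorithm uses order crossover, segment-reversal mutation, and elitist selection. The names `path_cost`, `order_crossover`, `mutate`, and `ga_order` are hypothetical.

```python
import math
import random


def path_cost(order, cells):
    """Assumed cost: total distance between consecutive cells in the ordering."""
    return sum(math.dist(cells[a], cells[b]) for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1, fill the remaining slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child


def mutate(order, rng):
    """Reverse a random segment; the result is still a valid permutation."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def ga_order(cells, pop_size=40, generations=200, seed=0):
    """Search for a low-cost ordering of cells (rows of expression values)."""
    rng = random.Random(seed)
    n = len(cells)
    # Start from random permutations of the cell indices.
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: path_cost(o, cells))
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        pop = elite + children
    best = min(pop, key=lambda o: path_cost(o, cells))
    return best, path_cost(best, cells)


if __name__ == "__main__":
    # Toy data: 8 cells on a line in a 2-gene space, presented in shuffled order.
    toy = [(float(t), 2.0 * t) for t in [3, 0, 4, 1, 5, 2, 7, 6]]
    ordering, cost = ga_order(toy)
    print(ordering, round(cost, 3))
```

Because the search operates directly on permutations of cells in the full gene space, no dimensionality reduction step is involved. An ordering and its reverse have the same cost under this assumed cost function, so the direction of pseudotime would have to be fixed by other information.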