Unsupervised machine learning for single-cell analysis
The volume of data often forces researchers using standard methods to make concessions and assumptions that introduce bias into their analyses. The St. Jude scientists used an artificial intelligence approach that removes the bias from those choices.
“Our method uses unsupervised machine learning, which automatically determines more robust and less arbitrary parameters for the analysis,” Liu said. “It learns how to group cells based on their different active biological processes or cell type identities.”
Because the algorithm learns and derives its analysis from the data presented, researchers could use it on any sizeable single-cell RNA sequencing dataset. It investigates each new large dataset individually and draws its conclusions only from the gene expression programs it finds there, so the researchers called the approach Consensus and Scalable Inference of Gene Expression Programs (CSI-GEP). When applied to the largest single-cell RNA databases, CSI-GEP produced better results than every other method the researchers tested. Most impressively, the algorithm could identify cell types and biological process activity that other methods missed.
“We’ve created a tool broadly applicable to studying any disease through single-cell RNA analysis,” Geeleher said. “The method performed substantially better than all existing approaches we tested, so I hope other scientists consider using it to get better value out of their single-cell data.”
CSI-GEP is freely available at https://github.com/geeleherlab/CSI-GEP.
Authors and funding
The study’s other authors are Richard Chapple, Declan Bennett, William Wright, Ankita Sanjali, Yinwen Zhang and Min Pan of St. Jude, and Erielle Culp of the University of Tennessee Health Science Center.
The study was supported by grants from the National Cancer Institute (R01CA260060), National Institute of General Medical Sciences (R35GM138293), National Human Genome Research Institute (R00HG009679) and ALSAC, the fundraising and awareness organization of St. Jude.