Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning. / Webel, Henry; Niu, Lili; Nielsen, Annelaura Bach; Locard-Paulet, Marie; Mann, Matthias; Jensen, Lars Juhl; Rasmussen, Simon.
In: Nature Communications, Vol. 15, 5405, 2024.
RIS
TY - JOUR
T1 - Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning
AU - Webel, Henry
AU - Niu, Lili
AU - Nielsen, Annelaura Bach
AU - Locard-Paulet, Marie
AU - Mann, Matthias
AU - Jensen, Lars Juhl
AU - Rasmussen, Simon
N1 - © 2024. The Author(s).
PY - 2024
Y1 - 2024
AB - Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Here we demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can impute missing values in the context of LFQ at different levels. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. Removing 20 percent of the intensities we were able to recover 15 out of 17 significant abundant protein groups using PIMMS-VAE imputations. When analyzing the full dataset we identified 30 additional proteins (+13.2%) that were significantly differentially abundant across disease stages compared to no imputation and found that some of these were predictive of ALD progression in machine learning models. We, therefore, suggest the use of deep learning approaches for imputing missing values in MS-based proteomics on larger datasets and provide workflows for these.
KW - Proteomics/methods
KW - Deep Learning
KW - Humans
KW - Mass Spectrometry/methods
KW - Supervised Machine Learning
KW - Male
U2 - 10.1038/s41467-024-48711-5
DO - 10.1038/s41467-024-48711-5
M3 - Journal article
C2 - 38926340
VL - 15
JO - Nature Communications
JF - Nature Communications
SN - 2041-1723
M1 - 5405
ER -
ID: 396732953