4  Global proteomics data quality control

Data quality control in the spverse package consists of three sequential steps: low-quality sample filtering, data normalization, and missing value imputation.

4.1 Low-quality sample filtering

First, read the data and create the sp object.

global_sp <- creatsp("D:/Rdata/zhc/article/part2/fan/s966_section/")

print(global_sp)
sp data with 599 samples for 7748 features

The raw data contain 599 spots (samples) and 7748 features. Next, exclude low-quality samples with the filter_sample_sp() function, which stores its result in the clean_data slot. Here the times parameter keeps its default value of 1.5 and the group parameter is left unset.

global_sp <- filter_sample_sp(global_sp, times = 1.5)

dim(global_sp@clean_data)
[1] 7746  571

After filtering with filter_sample_sp(), 571 of the 599 samples remain (28 were removed). The clean_data matrix has 7746 rows (features) and 571 columns (samples).
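The exact rule applied by filter_sample_sp() is not described in this section. As a rough illustration only, the sketch below assumes that a sample is treated as low quality when its number of identified features falls more than times interquartile ranges below the first quartile. The toy matrix and the helper filter_low_quality() are hypothetical and are not part of spverse.

```r
# Toy intensity matrix (hypothetical data): 200 features x 30 samples.
set.seed(1)
n_feat <- 200
n_samp <- 30
mat <- matrix(rlnorm(n_feat * n_samp, meanlog = 10), nrow = n_feat,
              dimnames = list(paste0("P", seq_len(n_feat)),
                              paste0("S", seq_len(n_samp))))

# Samples S1-S27 miss 11 to 37 features; S28-S30 miss 150 features each.
for (j in seq_len(n_samp)) {
  n_miss <- if (j > 27) 150 else 10 + j
  mat[seq_len(n_miss), j] <- NA
}

# Assumed rule (not confirmed as the spverse implementation):
# drop samples whose identified-feature count is below Q1 - times * IQR.
filter_low_quality <- function(x, times = 1.5) {
  n_identified <- colSums(!is.na(x))
  q <- quantile(n_identified, c(0.25, 0.75), names = FALSE)
  lower_fence <- q[1] - times * (q[2] - q[1])
  x[, n_identified >= lower_fence, drop = FALSE]
}

clean <- filter_low_quality(mat, times = 1.5)
removed <- setdiff(colnames(mat), colnames(clean))
dim(clean)   # features x retained samples
removed      # names of the excluded samples
```

Under this assumed rule, a larger times value lowers the cutoff and therefore removes fewer samples.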

filter_plot_sp(global_sp, times = 1.5)

filter_plot1.

4.2 Data normalization

If the data were already normalized in the search software, we do not recommend normalizing them again in spverse. We suggest first examining the overall distribution of the data to assess whether normalization is necessary.

density_plot_sp(global_sp)

DS_plot1.

In the probability density plot, no sample's distribution differs noticeably from the others, so normalization is unnecessary.
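The visual check can be complemented with a simple numeric one. The sketch below is a hypothetical base R example, independent of spverse: it compares per-sample medians of log2 intensities on simulated data in which one sample is deliberately shifted. The cutoff of 1 log2 unit is an arbitrary assumption chosen for illustration.

```r
# Toy log2 intensities (hypothetical data): 1000 features x 6 samples.
set.seed(42)
log_mat <- matrix(rnorm(1000 * 6, mean = 20, sd = 2), nrow = 1000,
                  dimnames = list(NULL, paste0("S", 1:6)))
log_mat[, "S6"] <- log_mat[, "S6"] + 3  # simulate one globally shifted sample

# Per-sample median and its deviation from the median of all sample medians.
sample_median <- apply(log_mat, 2, median, na.rm = TRUE)
deviation <- sample_median - median(sample_median)

# Hypothetical cutoff: a shift of more than 1 log2 unit (twofold) is flagged.
shifted <- names(deviation)[abs(deviation) > 1]
round(deviation, 2)
shifted
```

If no sample is flagged by such a check and the density curves overlap, skipping normalization is reasonable.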

4.3 Missing value imputation

Next, impute missing values with the default “impseqrob” method. Proteins detected in fewer than 30% of spots are filtered out (nmr = 0.3).

global_sp <- mvi_sp(global_sp, method = "impseqrob", nmr = 0.3)
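To make the detection-rate filter concrete, here is a minimal base R sketch on a hypothetical toy matrix. It assumes that nmr = 0.3 means a protein must be detected in at least 30% of spots to be kept; the internal behavior of mvi_sp() may differ, and the imputation step itself is not reproduced here.

```r
# Toy matrix (hypothetical data): 5 proteins x 10 spots, all values 1.
toy <- matrix(1, nrow = 5, ncol = 10,
              dimnames = list(paste0("P", 1:5), paste0("S", 1:10)))

# Proteins P1..P5 are detected in 10, 6, 4, 2, and 0 spots, respectively.
n_detected <- c(10, 6, 4, 2, 0)
for (i in seq_len(nrow(toy))) {
  if (n_detected[i] < ncol(toy)) {
    toy[i, (n_detected[i] + 1):ncol(toy)] <- NA
  }
}

# Keep proteins detected in at least 30% of spots (assumed meaning of nmr).
detection_rate <- rowMeans(!is.na(toy))
kept <- toy[detection_rate >= 0.3, , drop = FALSE]
rownames(kept)
```

Only the retained proteins would then be passed on to imputation.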

density_plot_sp(global_sp)

DS_plot2.

After missing value imputation, the density plot again shows approximately normal intensity distributions for the samples. This completes the data quality control process.