Multi-omics datasets for Alzheimer's disease, schizophrenia, invasive breast cancer, gastric adenocarcinoma

Citation Author(s):
Chen Wen
Department of Medical Statistics, School of Public Health, Sun Yat-sen University
Submitted by:
CHEN Wen
Last updated:
Fri, 02/21/2025 - 03:28
DOI:
10.21227/3vbv-n818
License:

Abstract 

The ROSMAP data had already been preprocessed and dimensionally reduced in the original study, so we applied no further preprocessing to it. For the SCZ dataset, we first removed features with more than 50% missing or zero expression values in each omics set. Expression values were then log-transformed for normalization, and all features in every omics set were standardized with the Z-score method. Only samples present in both omics sets and in the label set were retained for analysis. Finally, differential expression analysis was performed on the resulting multi-omics data to reduce each omics set to a consistent number of features.
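The SCZ preprocessing steps described above can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names and toy data are hypothetical, the log transform is assumed to be log2(x + 1), the Z-score is assumed to be computed per feature across samples, and missing values that survive filtering are simply left missing, none of which the abstract specifies. The differential expression step is omitted. Only the Python standard library is used.

```python
import math
import statistics


def filter_features(data, max_bad_fraction=0.5):
    """Keep features whose share of missing (None) or zero values is at most the threshold."""
    kept = {}
    for name, values in data.items():
        bad = sum(1 for v in values if v is None or v == 0)
        if bad / len(values) <= max_bad_fraction:
            kept[name] = values
    return kept


def log_transform(values):
    """Apply log2(x + 1) (an assumed choice); missing values stay missing."""
    return [None if v is None else math.log2(v + 1) for v in values]


def z_score(values):
    """Standardize one feature across samples using its observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.pstdev(observed)
    if sd == 0:
        # A constant feature carries no variation; map it to zeros.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mean) / sd for v in values]


def preprocess(data):
    """Filter, log-transform, and standardize a {feature: values} table."""
    filtered = filter_features(data)
    return {name: z_score(log_transform(vals)) for name, vals in filtered.items()}


def shared_samples(*id_lists):
    """Sample IDs present in every list, in the order of the first list."""
    common = set(id_lists[0]).intersection(*id_lists[1:])
    return [s for s in id_lists[0] if s in common]


# Hypothetical toy data: three features measured on four samples.
toy = {
    "featA": [1, 3, 7, 15],
    "featB": [0, 0, None, 4],   # 75% missing or zero, so it is dropped
    "featC": [0, 2, 0, 6],      # exactly 50%, so it is kept
}
result = preprocess(toy)
common_ids = shared_samples(["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s3", "s2"])
```

On the toy data, `featB` is removed because more than half of its values are missing or zero, while `featC` sits exactly at the 50% limit and is retained. `shared_samples` mirrors the rule that a sample must appear in both omics sets and in the label set.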

For the BRCA and STAD datasets, samples taken from non-frozen tissue or from tissue adjacent to the cancer were removed. Samples of the normal-like subtype of BRCA (sample size: 39) and the EBV subtype of STAD (sample size: 23) were also removed to limit the effects of class imbalance, since each of these categories had fewer than half as many samples as the smallest of the remaining classes. Only samples present in both the omics sets and the label set were retained in the final BRCA and STAD datasets. RNA features were mapped to genes, and duplicate genes resulting from the mapping were merged by averaging their values. Finally, the same preprocessing workflow used for the SCZ data was applied to the resulting expression values.
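Two of the steps above, excluding under-represented subtypes and merging duplicate genes by their mean, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, sample IDs, gene names, and values are hypothetical, and the RNA-to-gene mapping itself is assumed to have been done already.

```python
from collections import defaultdict
from statistics import fmean


def drop_classes(labels, excluded):
    """Remove samples whose class label is in the excluded set.

    labels: {sample_id: class_name}
    """
    return {s: c for s, c in labels.items() if c not in excluded}


def merge_duplicate_genes(rows):
    """Average rows that map to the same gene.

    rows: list of (gene, [value per sample]) pairs, where several
    RNA features may have been mapped to the same gene.
    """
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    # zip(*group) walks the duplicates sample by sample.
    return {
        gene: [fmean(column) for column in zip(*group)]
        for gene, group in grouped.items()
    }


# Hypothetical toy input.
labels = {"s1": "Lum A", "s2": "Normal-like", "s3": "Basal"}
kept_labels = drop_classes(labels, excluded={"Normal-like"})

rows = [
    ("GENE1", [2.0, 4.0]),
    ("GENE1", [4.0, 8.0]),   # duplicate of GENE1 after mapping
    ("GENE2", [1.0, 1.0]),
]
merged = merge_duplicate_genes(rows)
```

In the toy input, the two `GENE1` rows collapse into a single row holding the per-sample mean, and the sample labeled `Normal-like` is dropped before any further processing.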

Instructions: 

Alzheimer's Disease (AD) patients versus Normal Control (NC) classification

Class labels:

0: Normal Control

1: Alzheimer's Disease

Omics types:

1: mRNA

2: miRNA

3: DNA methylation

 

Schizophrenia (SCZ) patients versus Normal Control (NC) classification

Class labels:

0: Normal Control

1: Schizophrenia

Omics types:

1: Protein

2: Metabolite

 

 

Invasive breast cancer (BRCA) patient subtype classification

Class labels:

0: Basal

1: Her2

2: Lum A

3: Lum B

Omics types:

1: RNA

2: miRNA

 

Stomach adenocarcinoma (STAD) patient subtype classification

Class labels:

0: CIN

1: GS

2: MSI

Omics types:

1: RNA

2: miRNA