Standard Dataset
Metascape results for Prostate cancer multiomics data
- Submitted by: Y-h. Taguchi
- Last updated: Fri, 07/17/2020 - 08:00
- DOI: 10.21227/rdmb-jm40
Abstract
The large p, small n problem is a challenging problem in big data analytics, and no de facto standard method is available for it. In this study, we propose a tensor decomposition (TD) based unsupervised feature extraction (FE) formalism and apply it to multiomics datasets in which the number of features exceeds 100,000 while the number of instances is only about 100. TD based unsupervised FE outperformed conventional supervised feature selection methods, such as random forest, categorical regression (also known as analysis of variance, ANOVA), and penalized linear discriminant analysis, on both the multiomics datasets and synthetic datasets. The genes selected by TD based unsupervised FE were also biologically reliable. TD based unsupervised FE thus proved to be not only a superior feature selection method but also one that can select biologically reliable genes.
This is a supplementary file for a paper submitted to bigdata2020.
Comments
I try to use this dataset in my model
I try to use this dataset in my master