Standard Dataset
human
- Submitted by: tian wen
- Last updated: Mon, 12/02/2024 - 22:36
- DOI: 10.21227/w8nh-z182
Abstract
The Human dataset provides a comprehensive collection of drug-target interactions specific to human proteins, aimed at facilitating research in drug discovery and bioinformatics. This dataset includes a diverse range of human proteins as drug targets, along with associated drug molecules and their respective interaction labels. The data consists of molecular descriptors of drugs, protein sequences, and experimentally validated interactions sourced from various biological databases. The dataset is designed to support the development and evaluation of predictive models for drug-target interaction, enabling researchers to leverage machine learning techniques for identifying potential therapeutic targets and repurposing existing drugs. The dataset is publicly available for use in computational biology, systems pharmacology, and AI-driven drug discovery applications.
To use these data, load human.txt and humanSeqPdb.txt with Python's pandas library. human.txt provides the protein sequence, the drug SMILES string, and the interaction label for each pair. humanSeqPdb.txt provides the identifier of the Protein Data Bank (PDB) structure associated with each protein, which can be useful for structural bioinformatics studies.
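The loading step described above might look like the following minimal sketch. The file layout is not documented on this page, so the sketch assumes a whitespace-delimited file with no header row and the column order SMILES, sequence, label. The helper name load_pairs and the column names are hypothetical, and the sample rows are made up for illustration, not taken from the dataset.

```python
import io

import pandas as pd

# Assumed column order (not confirmed by the dataset page);
# inspect the first lines of human.txt and adjust if it differs.
ASSUMED_COLUMNS = ["smiles", "sequence", "label"]


def load_pairs(source, columns=ASSUMED_COLUMNS):
    """Read a whitespace-delimited, header-less table of interaction pairs.

    `source` can be a file path such as "human.txt" or any file-like object.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=columns)


# Made-up two-row sample standing in for human.txt.
sample = io.StringIO("CCO MKTAYIAK 1\nCCN MVLSPADK 0\n")
pairs = load_pairs(sample)

# Quick sanity checks: table size and class balance of the labels.
print(pairs.shape)
print(pairs["label"].value_counts())
```

With the real files, the call would be load_pairs("human.txt"). The same helper could be reused for humanSeqPdb.txt by passing a different columns list once that file's layout has been checked by hand.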