Abstract 

This dataset combines three sources of drug-target interaction data, the Human dataset, the Biosnap dataset, and the DrugBank dataset, into a single resource for drug discovery and bioinformatics research. It includes a diverse set of human proteins identified as potential drug targets, along with the corresponding drug molecules. Each drug-target pair carries an interaction label indicating whether the drug interacts with the protein target. Merging these sources provides a foundation for developing predictive models and machine learning methods for drug discovery and repurposing.

Instructions: 

To use the Human dataset, load both human.txt and humanSeqPdb.txt with Python's pandas library. human.txt provides the protein sequence, the drug SMILES string, and the label of each interaction pair. humanSeqPdb.txt provides the identifier of the Protein Data Bank (PDB) structure associated with each protein, which can be useful for structural bioinformatics studies.

The usage of the Biosnap and DrugBank datasets is similar.
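
As a minimal sketch of the loading step, the snippet below reads an interaction file into a pandas DataFrame. The file layout is an assumption, not something stated on this page: it supposes one pair per line with three whitespace-separated fields in the order SMILES, protein sequence, label, and no header row. The function name load_interactions and the column names are hypothetical, and the two sample rows are made-up stand-ins rather than real records. Check the read me file and adjust the separator and column order if the actual files differ.

```python
import io

import pandas as pd

# Assumed column order; verify against the read-me file before relying on it.
ASSUMED_COLUMNS = ["smiles", "sequence", "label"]


def load_interactions(source, columns=ASSUMED_COLUMNS):
    """Read a whitespace-separated, header-less interaction file.

    `source` may be a file path such as "human.txt" or any file-like object.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=columns)


# Tiny in-memory stand-in for human.txt (illustrative rows, not real data).
sample = io.StringIO(
    "CCO MKTAYIAKQR 1\n"
    "c1ccccc1 MVLSPADKTN 0\n"
)

pairs = load_interactions(sample)

# Inspect the class balance of the interaction labels.
print(pairs["label"].value_counts())
```

For the real data, pass the path instead of the in-memory sample, for example load_interactions("human.txt"). The layout of humanSeqPdb.txt is not described here, so supply column names that match that file when loading it the same way.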

 

Documentation

Attachment: read me.txt (2.5 KB)