The relative binding affinity values of all 8000 tripeptide sequences are shown here. The values are standardized per isoform so that the mean is zero and the variance is one. The sequences are ordered according to the results of hierarchical cluster analysis.
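The standardization described above can be sketched as a z-score computed separately for each isoform. This is only an illustration: the isoform names and values below are hypothetical toy data, and the use of the population standard deviation is an assumption, since the description does not say which estimator was used.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale a list of values to zero mean and unit (population) variance."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical toy data: affinity values of four tripeptides for two isoforms.
toy = {
    "isoform_a": [1.0, 2.0, 3.0, 4.0],
    "isoform_b": [10.0, 20.0, 30.0, 40.0],
}

# Standardize each isoform independently of the others.
z = {name: standardize(vals) for name, vals in toy.items()}
```

After this step every isoform is on the same scale, which is what makes the values comparable across isoforms before clustering.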


The group of tripeptides in the N-terminal sublibrary that was found by cluster analysis in this study is highlighted with black borders. Sequences previously reported to bind to 14-3-3s are marked in red. The marked sequences are: RST (c-Raf-1, A-Raf), RDS (Cdc25a), RPS (Cdc25b), RAA (PKC-ε), RAK (PCTAIRE-2), RSH (mT), RHA (Tyr hydroxylase), RHS (Tryp hydroxylase), RSK (A20), RIH (Cdc25a), RFQ (Cdc25b), CVR (PKCγ), PTR (IRS-1), SYT (K8 keratin), LYR (Clathrin assembly prot).


This is the dataset associated with the IEEE-JBHI submission "Synthesizing Electrocardiograms With Atrial Fibrillation Characteristics Using Generative Adversarial Networks". This dataset contains 4,768 synthesized atrial fibrillation (AF)-like ECG signals stored in PhysioNet MAT/HEA format.
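As a rough illustration of the HEA side of this format, the sketch below parses the first (record) line of a WFDB-style header. It assumes the simple four-field form (record name, number of signals, sampling frequency, number of samples) with no optional suffixes; the example line is hypothetical and is not taken from this dataset.

```python
def parse_record_line(line):
    """Parse the record line of a WFDB .hea header (simple four-field form)."""
    fields = line.split()
    return {
        "record_name": fields[0],
        "n_signals": int(fields[1]),
        "sampling_frequency": float(fields[2]),  # samples per second
        "n_samples": int(fields[3]),             # samples per signal
    }

# Hypothetical header line, for illustration only.
info = parse_record_line("synth_af_0001 1 300 9000")

# Signal duration in seconds follows from the two numeric fields.
duration_s = info["n_samples"] / info["sampling_frequency"]
```

For real use, a maintained WFDB reader is preferable, since headers may carry optional fields that this sketch does not handle.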


The Real-World Multimodal Foodlog (RWMF) database is built for evaluating multimodal retrieval algorithms in real-life dietary environments. It has 7500 multimodal pairs in total, where each image can be related to multiple texts and each text can be related to multiple images. Details of this database can be found in this paper: Pengfei Zhou, Cong Bai, Kaining Ying, Jie Xia, Lixin Huang, RWMF: Real-World Multimodal Foodlog Database, ICPR 2020.


Since this is a multimodal database, the images in RWMF are related to texts by sharing the same tag, which is saved in `Foodhealth/im_label`.

* `Foodlog`: the real-world food images and the associated instant bio-data
** `Image`: the folder that contains all the real-world foodlog images.
** `biodata.csv`: the csv file that contains all the associated instant bio-data; these data are linked to the food images by the image file names.
** `biodata.txt`: the txt file that describes the attributes of each column in `biodata.csv`.
** `data_category.csv`: the health category tags that help the model test the performance of cross-modal retrieval.
** `data_category.txt`: the txt file that describes the attributes of each column in `data_category.csv`.

* `Foodhealth`: the food description texts and the associated food nutrition composition data
** `description.csv`: the csv file that contains all the food description texts referring to each tag.
** `description.txt`: the txt file that describes the attributes of each column in `description.csv`.
** `composition.csv`: the csv file that contains all the food nutrition composition data referring to each tag.
** `composition.txt`: the txt file that describes the attributes of each column in `composition.csv`.
** `im_label.csv`: the csv file that contains all the tags related to each image.
** `im_label.txt`: the txt file that describes the attributes of each column in `im_label.csv`.
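The tag-based link between images and texts can be sketched as a simple join. The rows below are hypothetical stand-ins for the contents of `im_label.csv` and `description.csv`; the real column layouts are documented in `im_label.txt` and `description.txt` and may differ from the (image, tag) and (tag, text) pairs assumed here.

```python
from collections import defaultdict

# Hypothetical rows: (image file name, tag) and (tag, description text).
im_label_rows = [
    ("img_001.jpg", "apple"),
    ("img_002.jpg", "apple"),
    ("img_003.jpg", "rice"),
]
description_rows = [
    ("apple", "A sweet fruit."),
    ("apple", "Often eaten raw."),
    ("rice", "A staple grain."),
]

def pair_by_tag(im_label_rows, description_rows):
    """Map each image to every text that shares one of its tags."""
    texts_by_tag = defaultdict(list)
    for tag, text in description_rows:
        texts_by_tag[tag].append(text)
    pairs = defaultdict(list)
    for image, tag in im_label_rows:
        pairs[image].extend(texts_by_tag.get(tag, []))
    return dict(pairs)

pairs = pair_by_tag(im_label_rows, description_rows)
```

Because several images can carry the same tag and a tag can have several texts, the result is many-to-many, matching the description of the database.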


The large p, small n problem is a challenging problem in big data analytics, and no de facto standard method is available for it. In this study, we propose a tensor decomposition (TD) based unsupervised feature extraction (FE) formalism applied to multiomics datasets, where the number of features exceeds 100000 while the number of instances is as small as about 100.


This dataset contains joint kinematics, kinetics, and EMG activity from an experimental protocol approved by the Institutional Review Board at the University of Texas at Dallas. This data was collected to evaluate the robustness of different parameterization variables during perturbations for application in robotic prosthetic legs. Ten able-bodied subjects self-selected a comfortable speed for walking on level (0 degree), +5 degree, and -5 degree inclines. Subjects walked at the self-selected speed for a minute without perturbations to produce a control dataset of unperturbed kinematics.


Please see the README document for:

  • Details on the available data, how it was collected, and how it has been processed
  • An example of how to efficiently traverse the dataset (ExampleScript.m)
  • Instructions for the script used to execute the experiment (Treadmill_Perturbation_Shell.m)

The dataset contains medical signs of the sign language in several modalities: color frames, depth frames, infrared frames, body index frames, color body frames mapped to the depth scale, and 2D/3D skeleton information in color and depth scales and camera space. The language level of the signs is mostly word-level, and 55 signs are each performed twice by 16 persons (55x16x2=1760 performances in total).



The signs were collected at Shahid Beheshti University, Tehran, and show local gestures. The SignCol software (code: , paper: ) was used for defining the signs and for connecting to Microsoft Kinect v2 to collect the multimodal data, including frames and skeletons. Two demonstration videos of the signs are available on YouTube: vomit: , asthma spray: . Demonstration videos of SignCol are also available at and .

The dataset contains 13 zip files in total: one zip file contains the readme, sample code and data (, the next zip file contains sample videos ( and the other 11 zip files contain 5 signs each (e.g. Signs(11-15).zip). For quick start, consider the

Each performed gesture is located in a directory named in the format Sign_X_Performer_Y_Z, which denotes the Xth sign performed by the Yth person at the Zth iteration (X=[1,...,55], Y=[1,...,16], Z=[1,2]). The actual names of the signs are listed in the file table_signs.csv.
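The naming scheme can be handled with a small parser, sketched below. It assumes the directory names follow the Sign_X_Performer_Y_Z pattern literally; whether the numbers are zero-padded on disk is not stated, so the pattern accepts any run of digits. The function name is illustrative.

```python
import re

DIR_PATTERN = re.compile(r"^Sign_(\d+)_Performer_(\d+)_(\d+)$")

def parse_dir_name(name):
    """Return (sign, performer, iteration) parsed from a directory name."""
    match = DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    sign, performer, iteration = (int(g) for g in match.groups())
    return sign, performer, iteration

# Enumerate every expected directory: 55 signs x 16 performers x 2 iterations.
all_names = [
    f"Sign_{x}_Performer_{y}_{z}"
    for x in range(1, 56)
    for y in range(1, 17)
    for z in (1, 2)
]
```

Enumerating the expected names this way gives a quick completeness check against the extracted zip files.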

Each directory includes 7 subdirectories:

1.      Times: time information of the frames, saved in a CSV file.

2.      Color Frames: RGB frames saved in 8-bit *.jpg format with a size of 1920x1080.

3.      Infrared Frames: infrared frames saved in 8-bit *.jpg format with a size of 512x424.

4.      Depth Frames: depth frames saved in 8-bit *.jpg format with a size of 512x424.

5.      Body Index Frames: body index frames scaled to the depth frame, saved in 8-bit *.jpg format with a size of 512x424.

6.      Body Skels data: for each frame, there is a CSV file containing 25 rows, one for each of the 25 body joints, with columns specifying the joint type, locations and coordinate spaces. Each joint location is saved in three spaces: 3D camera space, 2D depth space (image) and 2D color space (image). Of the 25 joints, 21 are visible in this dataset.

7.      Color Body Frames: frames of the RGB body scaled to the depth frame, saved in 8-bit *.jpg format with a size of 512x424.


Frames are saved as a set of numbered images. The MATLAB script PrReadFrames_AND_CreateVideo.m shows how to read the frames and how to create videos, if required.

The 21 visible joints are Spine Base, Spine Mid, Neck, Head, Shoulder Left, Elbow Left, Wrist Left, Hand Left, Shoulder Right, Elbow Right, Wrist Right, Hand Right, Hip Left, Knee Left, Hip Right, Knee Right, Spine Shoulder, Hand Tip Left, Thumb Left, Hand Tip Right, Thumb Right. The MATLAB script PrReadSkels_AND_CreateVideo.m shows an example of reading the joint information, flipping it and drawing the skeleton on the depth and color scales.

Updated information about the dataset and the corresponding paper is available at the GitHub repository MedSLset.

Terms and conditions for the use of the dataset:

1- This dataset is released for academic research purposes only.

2- Please cite both the paper and the dataset if you find this data useful for your research. You can find the references and BibTeX entries at MedSLset.

3- You must not distribute the dataset or any parts of it to others. 

4- The dataset includes only image, text and video files and has been scanned with malware protection software. You accept full responsibility for your use of the dataset. This data comes with no warranty or guarantee of any kind, and you accept full liability.

5- You will treat people appearing in this data with respect and dignity.

6- You will not try to identify and recognize the persons in the dataset.


Guinea pig erythrocytes without neuraminidase inhibitor. Guinea pig erythrocytes without neuraminidase inhibitor sequence.


Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3 and FOXA1 as ILC enriched features. PTEN loss associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity.


This dataset contains EEG recordings from epileptic rats. The genetic absence epilepsy rats (GAERS) are one of the best-established rodent models for generalized epilepsy. The rats show seizures with characteristic "spike and wave discharge" EEG patterns. Experiments were performed in accordance with the German law on animal protection and were approved by the Animal Care and Ethics Committee of the University of Kiel.

  • Sample Frequency: 1600
  • Day1 (18:23:57-16:35:56): Three animals (R1, R2, R3): Array (data points x channels (3))
  • Day2 (16:42:53-16:52:06): Three animals (R1, R2, R3): Array (data points x channels (3))
  • Day3 (17:32:19-10:25:19): Three animals (R1, R2, R3): Array (data points x channels (3))
  • Day4 (10:26:40-14:46:13): Two animals (R1, R3): Array (data points x channels (3))
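Given the array layout listed above (data points x channels), sample indices can be converted to time and the channels separated as sketched below. The unit of the sample frequency is not stated, so Hz is an assumption here; the toy array and the helper names are hypothetical, and the timestamp helper assumes both times fall on the same calendar day.

```python
from datetime import datetime

SAMPLE_FREQUENCY = 1600  # assumption: the listed value is in Hz

def sample_to_seconds(index, fs=SAMPLE_FREQUENCY):
    """Time in seconds of a sample index, counted from the recording start."""
    return index / fs

def split_channels(array):
    """Turn rows of (data points x channels) into one list per channel."""
    return [list(column) for column in zip(*array)]

def seconds_between(start, end):
    """Seconds between two HH:MM:SS times on the same calendar day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Hypothetical toy recording: 4 data points x 3 channels.
toy = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
channels = split_channels(toy)

# Rough expected sample count for the Day2 interval, under the Hz assumption.
day2_seconds = seconds_between("16:42:53", "16:52:06")
day2_expected_samples = day2_seconds * SAMPLE_FREQUENCY
```

Comparing such an expected count with the actual array length is a simple sanity check on the assumed sampling rate.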

The accurate prediction of new interactions between drugs is important for avoiding unknown (mild or severe) adverse reactions to drug combinations. The development of effective in silico methods for evaluating drug interactions based on gene expression data requires an understanding of how various drugs alter gene expression. Current computational methods for the prediction of drug-drug interactions (DDIs) utilize data for known DDIs to predict unknown interactions. However, these methods are limited in the absence of known predictive DDIs.