PCSPF-Pancreatic Cancer Survival based on Preoperative Features

Citation Author(s):
Department of Hepatobiliary Pancreatic Surgery, Changhai Hospital, Navy Medical University, Shanghai 200433, China
Submitted by:
Pengjie Zhuang
Last updated:
Sun, 04/07/2024 - 10:44
Data Format:


The Pancreatic Cancer Survival based on Preoperative Features (PCSPF) dataset is a prognostic survival dataset built from the follow-up data of patients with pancreatic cancer at Changhai Hospital, Shanghai, China, to explore how key preoperative features affect prognosis. On the advice of clinicians, PCSPF contains 20 preoperative features that they considered important: sex, abdominal pain, age, body mass index (BMI), C-reactive protein (CRP), albumin (ALB), CRP/ALB ratio, leukocytes, neutrophils, platelets, lymphocytes, neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), systemic immune-inflammation index (SII), lactate dehydrogenase, carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), prealbumin, total bilirubin, and direct bilirubin. The most critical preoperative features affecting individual patients were selected from this list. From the initial 2,257 samples, three groups were excluded: patients who survived fewer than 90 days, samples with missing values, and patients labeled as survivors but with fewer than 365 days of follow-up. These exclusions removed potential confounders arising from surgical interventions or from incomplete or censored records. In total, 878 samples were selected to construct the dataset. Pancreatic cancer survival prediction is formulated as a binary classification task, because surviving at least one year is an important prognostic threshold for pancreatic cancer patients. Based on survival duration, a label of 1 denotes a patient who survived one year or more, and 0 denotes a patient who survived less than one year.
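The exclusion and labeling rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FollowUp` record and its field names are hypothetical (the released file contains standardized features, not raw follow-up data), and the description does not spell out how the composite indices were computed, so `derived_indices` uses the standard definitions of NLR, PLR, and SII.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SURVIVAL_DAYS = 90   # patients surviving fewer days were excluded
ONE_YEAR_DAYS = 365      # threshold used for the binary label


@dataclass
class FollowUp:
    # Hypothetical raw record; these fields are not in the released file.
    survival_days: int        # days of survival or follow-up
    died: bool                # True if death was observed
    has_missing_values: bool  # True if any preoperative feature is missing


def one_year_label(rec: FollowUp) -> Optional[int]:
    """Return 1 (survived >= 1 year), 0 (< 1 year), or None if excluded."""
    if rec.has_missing_values:
        return None
    if rec.survival_days < MIN_SURVIVAL_DAYS:
        return None
    if not rec.died and rec.survival_days < ONE_YEAR_DAYS:
        # Still alive but followed for less than a year: outcome unknown.
        return None
    return 1 if rec.survival_days >= ONE_YEAR_DAYS else 0


def derived_indices(neutrophil: float, lymphocyte: float,
                    platelet: float, crp: float, alb: float) -> dict:
    """Composite indices from raw values, using their standard definitions."""
    return {
        "NLR": neutrophil / lymphocyte,
        "PLR": platelet / lymphocyte,
        "SII": platelet * neutrophil / lymphocyte,
        "CRP/ALB": crp / alb,
    }
```

A record with 200 days of follow-up and an observed death is labeled 0, while a patient still alive at 200 days is excluded because the one-year outcome cannot be determined.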


To protect patient privacy, only the standardized PCSPF dataset is provided, in the same format used in our paper, for training machine learning models and neural networks. The data are saved in an Excel file. The first row contains the feature names (gender, abdominal pain, and 18 other features) along with the patient's survival label (whether the patient survived one year). Each subsequent row represents one patient sample with the corresponding feature values, for a total of 878 samples. Because the data have already been standardized, no additional processing is needed: the file can be read directly and used to train machine learning models with libraries such as scikit-learn (sklearn) in Python. To train and test artificial neural networks and deep Q-learning networks, users can follow the procedures outlined in our paper, which use the PyTorch library in Python.
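As a rough illustration of the workflow described above, the following sketch reads the Excel file with pandas and fits a simple scikit-learn baseline. Several details are assumptions rather than facts about the release: the survival label is assumed to be the last column, and the logistic regression model, the 80/20 stratified split, and the helper names `load_pcspf`, `split_features_label`, and `train_baseline` are illustrative choices, not the method from the paper.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def load_pcspf(path: str) -> pd.DataFrame:
    """Read the Excel file; the first row is used as the column header."""
    # Reading .xlsx files with pandas requires the openpyxl package.
    return pd.read_excel(path)


def split_features_label(df: pd.DataFrame):
    """Split into features and label, ASSUMING the label is the last column."""
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    y = df.iloc[:, -1].to_numpy(dtype=int)
    return X, y


def train_baseline(df: pd.DataFrame, seed: int = 0):
    """Fit a logistic regression baseline and return (model, test accuracy)."""
    X, y = split_features_label(df)
    # Hold out 20% of the samples, keeping the class ratio in both splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # The features are already standardized, so no scaler is applied here.
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)
```

Typical use would be `model, acc = train_baseline(load_pcspf(path))`, where `path` points to the downloaded Excel file. If the label is not the last column in the file, adjust `split_features_label` accordingly.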



Submitted by Amanuel Lefebo on Thu, 12/21/2023 - 17:08

I cannot access the dataset.

Submitted by Rizwan Qureshi on Tue, 01/09/2024 - 02:38