Clinical Notes

We present the SynSUM benchmark, a synthetic dataset linking unstructured clinical notes to structured background variables. The dataset consists of 10,000 artificial patient records, each containing tabular variables (such as symptoms, diagnoses and underlying conditions) and an associated clinical note describing the fictional patient encounter in the domain of respiratory diseases. The tabular portion of the data is generated through a Bayesian network, in which both the causal structure between the variables and the conditional probabilities are proposed by an expert based on domain knowledge.
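
To illustrate how tabular records can be drawn from a Bayesian network, the following is a minimal sketch of ancestral sampling (sample each parent variable first, then each child conditioned on its parents). The three variables, the network structure, and every probability below are hypothetical placeholders chosen for illustration; they are not taken from the SynSUM dataset or its expert-defined network.

```python
import random

# Hypothetical toy network: pneumonia -> fever, pneumonia -> cough.
# All names and probabilities are illustrative assumptions,
# not values from the SynSUM benchmark.
P_PNEUMONIA = 0.05
P_FEVER_GIVEN = {True: 0.80, False: 0.10}   # P(fever | pneumonia)
P_COUGH_GIVEN = {True: 0.90, False: 0.20}   # P(cough | pneumonia)


def sample_record(rng):
    """Draw one record by ancestral sampling: parent first, then children."""
    pneumonia = rng.random() < P_PNEUMONIA
    fever = rng.random() < P_FEVER_GIVEN[pneumonia]
    cough = rng.random() < P_COUGH_GIVEN[pneumonia]
    return {"pneumonia": pneumonia, "fever": fever, "cough": cough}


def sample_dataset(n, seed=0):
    """Draw n independent records with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


records = sample_dataset(10_000)
```

Each dictionary in `records` plays the role of one row of tabular variables. A real network of this kind would have more variables and non-binary states, but the sampling order (parents before children) stays the same.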

Broad access to diverse data sources in healthcare has boosted the application of novel computational techniques that extract meaningful information to improve patient prognosis and support other important medical uses. However, current systems require the professional to type the information manually, which increases the risk of transcription errors and cross-contamination. We propose an automated system that lets healthcare professionals dictate clinical information, which is then transcribed and analyzed.
