large language model | IEEE DataPort

CodyDroid Model Performance

The CodyDroid Evaluation Dataset is curated to benchmark the Android SDK code generation capabilities of two lightweight language models: CodyDroid Model-1 (deepseek-coder-5.9M-kexer) and CodyDroid Model-2 (codegen-175K-mono-java). Each entry in the dataset consists of a natural language description of an Android development task, paired with code outputs from both models, a reference solution crafted by a developer, and a detailed qualitative analysis.

Categories:

Machine Learning

MedCD: A Medical Clinical Dataset

We curated and release a real-world medical clinical dataset, namely MedCD, in the context of building generative artificial intelligence (AI) applications in the clinical setting. The MedCD dataset is one of the accomplishments from our longitudinal applied AI research and deployment in a tertiary care hospital in China. First, the dataset is real and comprehensive, in that it was sourced from real-world electronic health records (EHRs), clinical notes, lab examination reports and more.

Categories:

Benchmark Dataset for Generative AI on Edge Devices

The benchmarking dataset, GenAI on the Edge, contains performance metrics from evaluating Large Language Models (LLMs) on edge devices, utilizing a distributed testbed of Raspberry Pi devices orchestrated by Kubernetes (K3s). It includes performance data collected from multiple runs of prompt-based evaluations with various LLMs, leveraging Prometheus and the Llama.cpp framework. The dataset captures key metrics such as resource utilization, token generation rates/throughput, and detailed inference timing for stages such as Sample, Prefill, and Decode.

Categories:

MageCode

We collected programming problems and their solutions from previous studies. After applying some pre-processing steps, we queried advanced LLMs, such as GPT4, with the collected problems to produce machine-generated codes, while the original solutions were labeled as human-written codes. Finally, the entire collected dataset was divided into training, validation, and test sets, ensuring that there is no overlap among these sets, meaning no solutions in two different sets that solve the same programming problem.

Categories: