Machine Learning

We collected programming problems and their solutions from previous studies. After pre-processing, we queried advanced LLMs, such as GPT-4, with the collected problems to produce machine-generated code, while the original solutions were labeled as human-written code. Finally, the entire dataset was divided into training, validation, and test sets with no overlap among them: no two solutions to the same programming problem appear in different sets.
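
The no-overlap split described above can be sketched by assigning whole problems, rather than individual solutions, to a set. This is a minimal illustration and not the authors' pipeline: the `problem_id`, `label`, and `code` field names, the 80/10/10 ratios, and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def split_by_problem(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples so that every problem lands in exactly one set.

    `samples` is a list of dicts with a hypothetical "problem_id" key.
    """
    # Group all solutions (human-written or machine-generated) by problem.
    by_problem = defaultdict(list)
    for sample in samples:
        by_problem[sample["problem_id"]].append(sample)

    # Shuffle the problem ids, not the individual solutions.
    problem_ids = sorted(by_problem)
    random.Random(seed).shuffle(problem_ids)

    n_train = int(len(problem_ids) * ratios[0])
    n_val = int(len(problem_ids) * ratios[1])
    parts = {
        "train": problem_ids[:n_train],
        "val": problem_ids[n_train:n_train + n_val],
        "test": problem_ids[n_train + n_val:],
    }
    return {
        name: [s for pid in ids for s in by_problem[pid]]
        for name, ids in parts.items()
    }


# Toy data: 20 problems, each with one human and one machine solution.
samples = [
    {"problem_id": p, "label": label, "code": f"solution_{p}_{label}"}
    for p in range(20)
    for label in ("human", "machine")
]
splits = split_by_problem(samples)
```

Because the shuffle operates on problem ids, both solutions to a given problem always end up in the same set, which is the property the description requires.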

27 Views

The dataset consists of uplink channel gains, downlink channel gains, and uplink-to-downlink channel gains, along with the corresponding power allocations for uplink and downlink users across all subcarriers. It also includes the NOMA decoding order required for successful SIC at the NOMA receiver. The numbers of uplink (UL) and downlink (DL) users are N=M=6, and the number of subcarriers is S=9. Each column in the dataset is one sample of a fading channel realization and must be reshaped back into a matrix to compute the sum rate.

117 Views

The study focused on two regions in Rupnagar district, India, with an area of 216 km² as shown in Fig. 1a, using satellite data from June to November 2023. The upper region predominantly features paddy and maize, while the lower region includes paddy and sugarcane. Satellite images were obtained from PlanetScope’s 130-satellite constellation at a spatial resolution of 3 meters. A total of 32 images, captured between late May and mid-November 2023, were used, all with less than 15% cloud cover.

129 Views

This dataset addresses the challenge of limited vocal recordings available in secondary datasets, particularly those that predominantly feature foreign accents and contexts. To enhance the accuracy of our solution tailored for Sri Lankans, we employed primary data-gathering methods.

The dataset comprises vocal recordings from a sample population of youth. Participants were instructed to read three specific sentences designed to capture a range of vocal tones:

159 Views

The Facial Expression Dataset (Sri Lankan) is a culturally specific dataset created to enhance the accuracy of emotion recognition models in Sri Lankan contexts. Existing datasets, often based on foreign samples, fail to account for cultural differences in facial expressions, affecting model performance. This dataset bridges that gap, using high-quality data sourced from over 100 video clips of professional Sri Lankan actors to ensure expressive and clear facial imagery.

339 Views

This is part of our external validation set, which covers 40 volunteers and about 80 hematological examination items. Among these items, Cl, BHB, AG, RBP, HCO3, FT3, aTPO, CYSC, FT4, Folate, UA and aTG contribute most to the prediction. Because the data involves personal privacy and research confidentiality, it cannot be made fully public. However, you can still run predictions with our ML model and obtain high accuracy on the external dataset.

122 Views

The SINEW (Sensors in Home for Elderly Wellbeing) dataset consists of 15 high-level biomarker features derived from raw readings collected by in-home sensors and is intended for predictive modeling research. This entry covers the SINEW Weekly Biomarker data.

This dataset was collected for a study focused on the early detection of mild cognitive impairment, providing an opportunity for timely intervention before it progresses to Alzheimer's disease.

117 Views

The SINEW (Sensors in Home for Elderly Wellbeing) dataset consists of 15 high-level biomarker features derived from raw readings collected by in-home sensors and is intended for predictive modeling research. This entry covers the SINEW 15 - Monthly Biomarker data.

This dataset was collected for a study focused on the early detection of mild cognitive impairment, providing an opportunity for timely intervention before it progresses to Alzheimer's disease.

115 Views

The dataset comprises 2,000 MTJ samples in total. For analysis, the dataset is first shuffled randomly to eliminate ordering bias and then split into ten equal folds of 200 samples each. In each cross-validation iteration, nine folds are used for training and one for testing, with the test fold rotated across all ten groups so that every sample is tested exactly once.
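
The shuffle-and-rotate procedure above can be sketched with index lists alone. This is an illustrative outline and not the authors' code: the function name `ten_fold_indices` and the fixed seed are assumptions, and the sketch assumes the sample count divides evenly by the number of folds, as 2,000 does by ten.

```python
import random


def ten_fold_indices(n_samples=2000, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumes n_samples is divisible by n_folds (2000 / 10 = 200 per fold).
    """
    # Shuffle once so that fold membership does not follow the file order.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Cut the shuffled indices into equal, non-overlapping folds.
    fold_size = n_samples // n_folds
    folds = [
        indices[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)
    ]

    # Rotate the held-out fold; the other nine folds form the training set.
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


cv_splits = list(ten_fold_indices())
```

Each pair in `cv_splits` holds 1,800 training indices and 200 test indices, and the ten test folds together cover every sample once.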

32 Views

This dataset contains LoRa physical layer signals collected from 60 LoRa devices and six SDRs (PLUTO-SDR, USRP B200 mini, USRP B210, USRP N210, RTL-SDR). It is intended for use by researchers in the development of a federated RFFI system, whereby the signals collected from different receivers and locations can be employed for evaluation purposes.

More details can be found at https://github.com/gxhen/federatedRFFI

449 Views
