Machine Learning

We collected programming problems and their solutions from previous studies. After pre-processing, we queried advanced LLMs, such as GPT-4, with the collected problems to produce machine-generated code, while the original solutions were labeled as human-written code. Finally, the entire dataset was divided into training, validation, and test sets with no overlap among them: no two solutions in different sets solve the same programming problem.
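The no-overlap split described above can be sketched as a split over problems rather than over individual solutions. This is a minimal illustration, not the authors' procedure: the record fields (`problem_id`, `code`, `label`), the 80/10/10 proportions, and the function name `split_by_problem` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_problem(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Split records so that every problem lands in exactly one set.

    `samples` is a list of dicts with a hypothetical "problem_id" key.
    """
    # Group all solutions (human-written and machine-generated) by problem.
    by_problem = defaultdict(list)
    for sample in samples:
        by_problem[sample["problem_id"]].append(sample)

    # Shuffle the problem ids, not the individual solutions.
    problem_ids = sorted(by_problem)
    random.Random(seed).shuffle(problem_ids)

    n_test = int(len(problem_ids) * test_frac)
    n_val = int(len(problem_ids) * val_frac)
    test_ids = problem_ids[:n_test]
    val_ids = problem_ids[n_test:n_test + n_val]
    train_ids = problem_ids[n_test + n_val:]

    def collect(ids):
        return [s for pid in ids for s in by_problem[pid]]

    return collect(train_ids), collect(val_ids), collect(test_ids)


# Toy data: 20 problems, each with one human and one machine solution.
samples = [
    {"problem_id": p, "code": f"solution {p}-{label}", "label": label}
    for p in range(20)
    for label in ("human", "machine")
]
train, val, test = split_by_problem(samples)
```

Because the shuffle operates on problem ids, both solutions to a given problem always end up in the same set.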

The dataset consists of uplink channel gains, downlink channel gains, and uplink-to-downlink channel gains, along with the corresponding power allocations for uplink and downlink users across all subcarriers. It also includes the NOMA decoding order required for successful SIC at the NOMA receiver. The numbers of UL and DL users are N = M = 6, and the number of subcarriers is S = 9. Each column in the dataset is one sample of a fading channel realization and must be converted back to matrix form to compute the sum rate.

The study focused on two regions in Rupnagar district, India, with an area of 216 km² as shown in Fig. 1a, using satellite data from June to November 2023. The upper region predominantly features paddy and maize, while the lower region includes paddy and sugarcane. Satellite images were obtained from PlanetScope’s 130-satellite constellation, with a spatial resolution of 3 meters. A total of 32 images, captured between late May and mid-November 2023, were used, all with less than 15% cloud cover.

This dataset addresses the challenge of limited vocal recordings available in secondary datasets, particularly those that predominantly feature foreign accents and contexts. To enhance the accuracy of our solution tailored for Sri Lankans, we employed primary data-gathering methods.

The dataset comprises vocal recordings from a sample population of youth. Participants were instructed to read three specific sentences designed to capture a range of vocal tones:

The Facial Expression Dataset (Sri Lankan) is a culturally specific dataset created to enhance the accuracy of emotion recognition models in Sri Lankan contexts. Existing datasets, often based on foreign samples, fail to account for cultural differences in facial expressions, affecting model performance. This dataset bridges that gap, using high-quality data sourced from over 100 video clips of professional Sri Lankan actors to ensure expressive and clear facial imagery.

This is part of our external validation set, which covers 40 volunteers and about 80 hematological examination items. Among them, Cl, BHB, AG, RBP, HCO3, FT3, aTPO, CYSC, FT4, Folate, UA and aTG contribute more to the prediction. Because the data involves personal privacy and research confidentiality, it cannot be made fully public. However, you can still make predictions using our ML model and obtain high accuracy on the external dataset.

Two thousand MTJ samples in total were included in the dataset for analysis. First, shuffle the dataset randomly to eliminate ordering bias, then split it into ten equal folds of 200 samples each. For each iteration of cross-validation, use nine folds for training and one for testing, rotating the test fold across all ten groups so that every sample is tested exactly once.
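The ten-fold procedure above can be sketched in a few lines. This is an illustrative outline only: it works on sample indices rather than the actual MTJ records, the function name `ten_fold_indices` and the fixed seed are assumptions, and it assumes the sample count divides evenly by the number of folds (as 2,000 does by 10).

```python
import random


def ten_fold_indices(n_samples=2000, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    # Shuffle once so the folds do not depend on the original sample order.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Cut the shuffled indices into equal folds (200 each for 2000 / 10).
    fold_size = n_samples // n_folds
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]

    # Rotate the test fold; the remaining nine folds form the training set.
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


splits = list(ten_fold_indices())
```

Each of the ten iterations trains on 1,800 samples and tests on the remaining 200, and the test folds together cover every sample once.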

This dataset contains LoRa physical layer signals collected from 60 LoRa devices and six SDRs (PLUTO-SDR, USRP B200 mini, USRP B210, USRP N210, RTL-SDR). It is intended for use by researchers in the development of a federated RFFI system, whereby the signals collected from different receivers and locations can be employed for evaluation purposes.

More details can be found at https://github.com/gxhen/federatedRFFI

This study presents an English-Luganda parallel corpus comprising over 2,000 sentence pairs, focused on financial decision-making and products. The dataset draws from diverse sources, including social media platforms (TikTok comments and Twitter posts from authoritative accounts like Bank of Uganda and Capital Markets Uganda), as well as fintech blogs (Chipper Cash and Xeno). The corpus covers a range of financial topics, including bonds, loans, and unit trust funds, providing a comprehensive resource for financial language processing in both English and Luganda.

Two-year price movements from 01/01/2014 to 01/01/2016 of 88 stocks are selected as targets: all 8 stocks in the Conglomerates sector and the top 10 stocks by capital size in each of the other 8 sectors. The full list of the 88 stocks and their companies, selected from 9 sectors, is available in StockTable, a facsimile of the paper appendix appendix_table_of_target_stocks.pdf.
