Machine Learning

This dataset consists of 737 documents from the BBC Sport website, corresponding to sports news articles in five topical areas from 2004-2005. The class labels are divided into five categories: athletics, cricket, football, rugby, and tennis. The datasets have been pre-processed using the Porter stemming algorithm, stop-word removal, and filtering out terms with low frequency (count < 3).

Categories:
151 Views

The dataset contains two types of articles fake and real News. This dataset was collected from realworld sources; the truthful articles were obtained by crawling articles from Reuters.com (News website). As for the fake news articles, they were collected from different sources. The fake news articles were collected from unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The dataset contains different types of articles on different topics, however, the majority of articles focus on political and World news topics.

Categories:
639 Views

These are tight pedestrian masks for the thermal images present in the KAIST Multispectral pedestrian dataset, available at https://soonminhwang.github.io/rgbt-ped-detection/

Both the thermal images themselves as well as the original annotations are a part of the parent dataset. Using the annotation files provided by the authors, we develop the binary segmentation masks for the pedestrians, using the Segment Anything Model from Meta.

Categories:
297 Views

The fifteen publicly available datasets from the UCI Machine Learning Library are selected for experiments. The selected datasets has no missing values, therefore, normalization is applied solely to the original datasets. For nominal data, it has been discretized and compressed to the range of 0 to 1. Similarly, for continuous data that does not fall within the range of 0 to 1, normalization has been applied.

Categories:
27 Views

Recently, combinatorial interaction strategies have a large spectrum as black box strategies for testing software and hardware. This paper discusses a novel adoption of a combinatorial interaction strategy to generate a sparse combinatorial data table (SCDT) for machine learning. Unlike test data generation strategies, in which the t-way tuples synthesize into a test case, the proposed SCDT requires analyzing instances against their corresponding tuples to generate a systematic learning dataset.

Categories:
111 Views

Efficient and realistic tools capable of modeling radio signal propagation are an indispensable component for the effective operation of wireless communication networks. The advent of artificial intelligence (AI) has propelled the evolution of a new generation of signal modeling tools, leveraging deep learning (DL) models that learn to infer signal characteristics.

Categories:
930 Views

The dataset includes 22 projects and 1680 user stories, with the aim of classifying these stories into those suitable for AI implementation and those not recommended for AI implementation. The labeling was done in a group, reaching a consensus on each user story in each project, determining whether it is susceptible to being developed with AI. Thus, each user story was evaluated and assigned a value of 1 if it was considered suitable for AI implementation (this label was named AI), and a value of 0 if it was not (this label was named not-AI).

Categories:
480 Views

The terahertz communications band in the 252 to325 GHz range has been recently explored for its potential to meet the stringent requirements for the emerging sixth generation of wireless communications. However, there are several challenges including noise and nonlinearity that hinder efficient implementations. We aim to address this limitation in terahertz communications through convolutional neural networks (CNN) enhanced by the domain knowledge from traditional Volterra filters.

Categories:
256 Views

We evaluate the performance of our proposed protocol using three benchmark datasets. Each dataset is composed of 25 percent of the local data forming the test dataset and 75 percent of the local data forming the training dataset.

MNIST: The dataset is made up of gray pictures that are digits which are written by hand containing ten different classes which provides 60000 training samples in total.

Categories:
94 Views

The dataset contains 560 different observations each having 1049 absorption data points for cancerous and non-cancerous skin cells. The reflection absorption data were obtained from terahertz metamaterials on top of which the cells are placed. The 560 observations made were for varying size tissue thickness and polarization and incident wave angle

Categories:
273 Views

Pages