Natural Language Processing

Dialogue corpus for explanation request in a customer-care chatbot service

The dialogue corpus is described in the paper "Anticipating User Intentions in Customer Care Dialogue Systems" and contains a selection of human-chatbot Italian dialogues concerning customer-care requests.

To preserve privacy and the company's ownership of its data, we removed the actual sentences and present only the annotations described in the paper.

Categories:: Artificial Intelligence

454 Views

Shahmukhi Database SMDB- SMHaroof V1

The greatest challenge in machine learning problems is selecting suitable techniques and resources, such as tools and datasets. Despite millions of speakers around the globe and a rich literary history of more than a thousand years, it is hard to find computational linguistic work on the Punjabi Shahmukhi script, a low-resource member of the Perso-Arabic family of context-specific scripts. The selection of the best algorithm for a machine learning problem depends heavily on the availability of a dataset for that specific task.

Categories:: Machine Learning

231 Views

Portuguese Aspect Sentiment Triplet Extraction Datasets

Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA). It aims to extract aspect-opinion pairs from a sentence and identify the sentiment polarity associated with them. For instance, given the sentence "Large rooms and great breakfast", ASTE outputs the triplet set T = {(rooms, large, positive), (breakfast, great, positive)}. Although several approaches to ABSA have recently been proposed, those for Portuguese have been mostly limited to extracting only aspects, without addressing ASTE tasks.
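The triplet output described above can be sketched as a small data structure. This is only an illustration of the output format for the example sentence, not the dataset's file format or any extraction method; the `Triplet` type and the `spans_in_sentence` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One ASTE output item: (aspect term, opinion term, sentiment polarity)."""
    aspect: str
    opinion: str
    polarity: str


# Example sentence and triplet set taken from the description above.
sentence = "Large rooms and great breakfast"
triplets = {
    Triplet("rooms", "large", "positive"),
    Triplet("breakfast", "great", "positive"),
}


def spans_in_sentence(sentence: str, triplet: Triplet) -> bool:
    """Hypothetical sanity check: aspect and opinion terms both occur in the sentence."""
    words = sentence.lower().split()
    return triplet.aspect in words and triplet.opinion in words


# Every triplet should refer to words that actually appear in the sentence.
assert all(spans_in_sentence(sentence, t) for t in triplets)
```

A set is used because the description presents T as an unordered collection of triplets; a list would serve equally well if annotation order matters.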

Categories:: Artificial Intelligence
Machine Learning

536 Views

Dataset: Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments

Dataset associated with a paper at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems:

"Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments"

If you use this code or data, please cite the above paper.

Categories:: Artificial Intelligence
Computer Vision
Machine Learning

251 Views

Exploring Automated GDPR-Compliance in Requirements Engineering: A Systematic Mapping Study

The General Data Protection Regulation (GDPR), which took effect in 2018, profoundly impacts information-processing organizations, as they must comply with this regulation. In this research, we consider GDPR compliance a high-level goal in software development that should be addressed at the outset of software development, that is, during requirements engineering (RE). In this work, we hypothesize that Natural Language Processing (NLP) can offer a viable means to automate this process.

Categories:: Other

241 Views

Wineinformatics: 21st Century Bordeaux Wines Dataset

Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016.

Categories:: Artificial Intelligence
Machine Learning
Other

2295 Views

Technological Trends of Natural Language Processing Based Semantic Analysis: A Comparative Study of the US, the EU, and Korea Patents Data

The age of Artificial Intelligence (AI) is coming. Since Natural Language Processing (NLP) is a core AI technology for communication between humans and devices, it is vital to understand its technological trends. Early research on NLP focused on syntactic processing, such as information extraction and subject modeling, but later developed into semantics-oriented analysis. To analyze technological trends concerning NLP, especially semantic analysis, patent data that contains objective and extensive information is analyzed.

Categories:: Standards Research Data

336 Views

Twitter Sentiment Analysis Data

This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all computing resources I have are being used for a dedicated collection of the tweets related to the pandemic. You can go through the following datasets to access those tweets:

Categories:: Artificial Intelligence
Machine Learning

9969 Views

A Benchmark Dataset for Manipuri Meetei-Mayek Handwritten Character Recognition

A benchmark dataset is always required for any classification or recognition system. To the best of our knowledge, no benchmark dataset for handwritten character recognition of the Manipuri Meetei-Mayek script exists in the public domain so far. Manipuri, also referred to as Meeteilon or sometimes Meiteilon, is a Sino-Tibetan language and one of the languages listed in the Eighth Schedule of the Indian Constitution. It is the official language and lingua franca of the southeastern Himalayan state of Manipur, in northeastern India.

Categories:: Computer Vision
Image Processing
Other

1245 Views
