Natural Language Processing
The greatest challenge in machine learning problems is selecting suitable techniques and resources, such as tools and datasets. Despite millions of speakers around the globe and a rich literary history of more than a thousand years, computational linguistic work on the Punjabi Shahmukhi script is hard to find. Shahmukhi is a low-resource, context-specific script of the Perso-Arabic family. The choice of the best algorithm for a machine learning problem depends heavily on the availability of a dataset for that specific task.
Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA). It aims to extract aspect-opinion pairs from a sentence and identify the sentiment polarity associated with each pair. For instance, given the sentence "Large rooms and great breakfast", ASTE outputs the triplet set T = {(rooms, large, positive), (breakfast, great, positive)}. Although several approaches to ABSA have recently been proposed, those for Portuguese have mostly been limited to extracting aspects only, without addressing the ASTE task.
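The triplet format in the example above can be sketched in a few lines of Python. This is only an illustration of the output structure, not the dataset's actual schema or any published ASTE model: the `Triplet` type, its field names, and the `terms_in_sentence` helper are hypothetical names introduced here.

```python
from typing import NamedTuple, Set


class Triplet(NamedTuple):
    """One ASTE output item (hypothetical field names)."""
    aspect: str    # the target being evaluated, e.g. "rooms"
    opinion: str   # the opinion term attached to it, e.g. "large"
    polarity: str  # sentiment label, e.g. "positive"


def terms_in_sentence(sentence: str, triplets: Set[Triplet]) -> bool:
    """Check that every aspect and opinion term occurs as a token of the sentence."""
    tokens = sentence.lower().split()
    return all(t.aspect in tokens and t.opinion in tokens for t in triplets)


# The example sentence and triplet set T from the abstract.
sentence = "Large rooms and great breakfast"
T = {
    Triplet("rooms", "large", "positive"),
    Triplet("breakfast", "great", "positive"),
}

print(terms_in_sentence(sentence, T))
```

Storing the triplets in a set mirrors the notation T = {...}: order does not matter and duplicates collapse, which is convenient when comparing predicted triplets against annotated ones.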
Dataset associated with a paper presented at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems:
"Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments"
If you use this code or data, please cite the above paper.
The General Data Protection Regulation (GDPR), in effect since 2018, profoundly affects organizations that process information, as they must comply with it. In this research, we treat GDPR compliance as a high-level goal that should be addressed at the outset of software development, that is, during requirements engineering (RE). We hypothesize that Natural Language Processing (NLP) can offer a viable means to automate this process.
Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016, and the second contains all wines from 2000 to 2016 that are listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification.
The age of Artificial Intelligence (AI) is coming. Since Natural Language Processing (NLP) is a core AI technology for communication between humans and devices, it is vital to understand its technological trends. Early NLP research focused on syntactic processing, such as information extraction and subject modeling, but later shifted toward semantics-oriented analysis. To analyze technological trends in NLP, especially semantic analysis, we examine patent data, which contains objective and extensive information.
This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all computing resources I have are being used for a dedicated collection of the tweets related to the pandemic. You can go through the following datasets to access those tweets:
A benchmark dataset is required for any classification or recognition system. To the best of our knowledge, no benchmark dataset for handwritten character recognition of the Manipuri Meetei-Mayek script exists in the public domain so far. Manipuri, also referred to as Meeteilon or sometimes Meiteilon, is a Sino-Tibetan language and one of the languages listed in the Eighth Schedule of the Indian Constitution. It is the official language and lingua franca of the southeastern Himalayan state of Manipur, in northeastern India.