NLP

Legal analysis using natural language processing (NLP) and machine learning (ML) is a difficult task that has recently attracted interest from many researchers and industry practitioners. Using a human-annotated dataset of Supreme Court decisions summarized in colloquial Thai, this work investigates different combinations of NLP, ML, and rule-based techniques for accurate legal case analysis under Thai law, focusing on property-related offences, with the aim of imitating a lawyer's cognitive process.
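
To make the idea of combining learned and rule-based components concrete, here is a minimal Python sketch of a two-stage pipeline: a fact-extraction stage (a naive keyword matcher standing in for a trained NLP/ML model) followed by hand-written rules that map the extracted facts to an offence label. The fact names, cue phrases, rules, and helpers (`FACT_KEYWORDS`, `RULES`, `extract_facts`, `classify`) are hypothetical; they are not an encoding of Thai law or of the authors' method.

```python
from __future__ import annotations

# Hypothetical fact labels and cue phrases; a trained NLP/ML model would normally predict the facts.
FACT_KEYWORDS = {
    "took_property": ["took", "stole", "carried away"],
    "property_of_another": ["belonging to", "owned by", "neighbour's"],
    "used_force": ["threatened", "assaulted", "forced"],
}

# Hypothetical rules as (required facts, label), ordered from most to least specific.
RULES = [
    ({"took_property", "property_of_another", "used_force"}, "robbery (illustrative)"),
    ({"took_property", "property_of_another"}, "theft (illustrative)"),
]


def extract_facts(summary: str) -> set[str]:
    """Stand-in for the ML stage: flag a fact if any cue phrase occurs (naive substring match)."""
    text = summary.lower()
    return {fact for fact, cues in FACT_KEYWORDS.items() if any(cue in text for cue in cues)}


def classify(summary: str) -> str:
    """Rule-based stage: return the first label whose required facts were all extracted."""
    facts = extract_facts(summary)
    for required, label in RULES:
        if required <= facts:
            return label
    return "no rule matched"


print(classify("He threatened the owner and took a phone owned by her."))
```

In a fuller system, `extract_facts` would be replaced by a classifier trained on annotated case summaries, while the rule table keeps the legal reasoning step explicit and auditable.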

Allocating an intestate estate among the statutory heirs is complex, yet it happens regularly. Many scholars have attempted to build automated allocation systems for this demanding task. However, most existing systems rely on conventional form-based input, which may overwhelm general users. Furthermore, no publicly available system allocates intestate inheritance according to the Civil and Commercial Code of Thailand.
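
The core of such an allocation can be sketched as a priority rule over classes of heirs. The Python sketch below assumes a simplified model in which the whole estate passes, in equal shares, to the highest-ranking class that has at least one living heir. The function name `allocate_estate` and the caller-supplied class ranking are hypothetical, and spouse shares, representation, and other provisions of the Civil and Commercial Code are deliberately left out, so this is not a faithful encoding of Thai law.

```python
from __future__ import annotations

from fractions import Fraction


def allocate_estate(estate: Fraction, heirs_by_class: list[list[str]]) -> dict[str, Fraction]:
    """Give the whole estate, in equal shares, to the first non-empty heir class.

    heirs_by_class is ordered from highest to lowest priority. Simplified model:
    spouse shares, representation and other statutory provisions are omitted.
    """
    for heirs in heirs_by_class:
        if heirs:
            share = Fraction(estate) / len(heirs)  # exact rational share, no rounding
            return {name: share for name in heirs}
    return {}  # no heir found in any class


# Hypothetical example: no heirs in the first class, two in the second.
shares = allocate_estate(Fraction(900_000), [[], ["Mother", "Father"], ["Sister"]])
print(shares)
```

Using `Fraction` keeps shares exact when an estate is split three or more ways; a faithful system would layer the Code's spouse and representation rules on top of this skeleton.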

The Twitter2015-Urdu Dataset is a multimodal resource designed to advance Multimodal Named Entity Recognition (MNER) research in Urdu, a low-resource language. It adapts the widely used English Twitter2015 dataset, adding culturally grounded annotations tailored to the linguistic complexities of Urdu.
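
NER annotations of this kind are commonly stored as token-level tags, so a frequent first step is turning tags back into entity spans. The sketch below assumes a BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside); whether Twitter2015-Urdu uses exactly this scheme and these type names is an assumption, and `bio_to_spans` is a hypothetical helper, not part of the dataset's tooling.

```python
from __future__ import annotations


def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collect (entity_type, text) spans from BIO tags such as B-PER / I-PER / O."""
    spans: list[tuple[str, str]] = []
    current_type: str | None = None
    current_tokens: list[str] = []

    def flush() -> None:
        # Close the open span, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:  # "O", or an I- tag that does not continue the open span
            flush()
            if tag.startswith("I-"):  # lenient: treat a stray I- tag as a new span
                current_type, current_tokens = tag[2:], [token]
    flush()
    return spans


print(bio_to_spans(["Imran", "Khan", "visited", "Lahore"], ["B-PER", "I-PER", "O", "B-LOC"]))
```

Stray `I-` tags are treated leniently as the start of a new span; a stricter reader could reject them instead.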

DragonVerseQA is an open-domain, long-form Over-The-Top (OTT) Question-Answering (QA) dataset focused on the fantasy universe of the TV series "House of the Dragon" and "Game of Thrones". The curated dataset combines full episode summaries from HBO and fandom wiki websites, user reviews from sources such as IMDb and Rotten Tomatoes, other high-quality, open-domain, legally admissible sources, and structured data from repositories such as Wikidata.
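
Combining heterogeneous sources into one dataset usually means normalizing them into a shared record type while keeping track of where each passage came from. The Python sketch below is a minimal illustration under that assumption; the `Passage` record, its fields, the `merge_sources` helper, and the source tags are hypothetical and do not describe DragonVerseQA's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical provenance tag, e.g. "episode_summary", "user_review"
    text: str


def merge_sources(sources: dict[str, list[str]]) -> list[Passage]:
    """Flatten per-source texts into one corpus, dropping exact duplicates but keeping provenance."""
    seen: set[str] = set()
    corpus: list[Passage] = []
    for source, texts in sources.items():
        for text in texts:
            key = " ".join(text.split()).lower()  # normalize whitespace and case for comparison
            if key not in seen:
                seen.add(key)
                corpus.append(Passage(source, text))
    return corpus


corpus = merge_sources({
    "episode_summary": ["Rhaenyra is crowned.", "rhaenyra  is  crowned."],
    "user_review": ["Great episode."],
})
print(len(corpus))
```

Deduplication here only catches exact matches after whitespace and case normalization; near-duplicate detection would need a fuzzier comparison.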

Please cite the following paper when using this dataset:

Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).

This dataset contains a comprehensive collection of PubMed abstracts and associated metadata on multiple sclerosis (MS) in relation to social determinants and environmental factors, covering publications from January 1, 2018, to November 15, 2024.
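
A date-bounded PubMed collection like this one can be retrieved through NCBI's E-utilities `esearch` endpoint, which accepts `mindate`, `maxdate`, and `datetype` parameters. The sketch below only builds the request URL; the search term is a hypothetical example, not the query used to build this dataset, and the helper name `build_esearch_url` is illustrative.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 100) -> str:
    """Build an esearch URL restricted to a publication-date window."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,  # date format YYYY/MM/DD
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical search term, for illustration only.
url = build_esearch_url(
    '"multiple sclerosis"[MeSH Terms] AND ("social determinants" OR environment)',
    mindate="2018/01/01",
    maxdate="2024/11/15",
)
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen`; the returned PMIDs would then be passed to `efetch` to download the abstracts and metadata.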

As shown in Figure 1, the NLP market is projected to grow from USD 31.76 billion in 2024 to USD 92.99 billion by 2029. This growth is driven by advances in deep learning and related algorithms, increasing digitization, and the integration of NLP with machine learning and deep learning. Key contributing factors include the expanding use of NLP in healthcare and call centers, demand for advanced text analytics, and the growth of machine-to-machine technology.
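
The two projections imply a compound annual growth rate (CAGR) via CAGR = (end / start)^(1 / years) - 1. The short Python check below applies that formula to the figures quoted above; the function name `cagr` is illustrative, and the result is derived arithmetic rather than a figure stated in the source.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# USD billions, 2024 -> 2029 (five years)
rate = cagr(31.76, 92.99, 2029 - 2024)
print(f"Implied CAGR 2024-2029: {rate:.1%}")  # prints: Implied CAGR 2024-2029: 24.0%
```

In other words, the projection corresponds to the market growing by roughly 24% per year.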

Microsoft Office is a widely used productivity suite, but its support for VBA macros for automation also gives attackers a way to carry out malicious activities. Research on VBA macros is still seeking efficient detection methods, and such analysis requires up-to-date datasets, which are hard to find publicly. To address this, a dataset was assembled from VirusTotal, VirusShare, Zenodo, MalwareBazaar, GitHub, and InQuest Labs.
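
A common baseline for VBA macro detection counts suspicious keywords in the extracted macro source, such as auto-execution entry points and calls that launch processes or download files; tools such as olevba report similar indicators. The Python sketch below computes such counts as features for a downstream classifier. The keyword list, category names, and the `keyword_features` helper are illustrative assumptions, not the feature set used with this dataset, and extracting the VBA source from Office files is out of scope here.

```python
from __future__ import annotations

import re

# Illustrative subset of keywords often flagged in malicious VBA; not an exhaustive list.
SUSPICIOUS_KEYWORDS = {
    "autoexec": ["AutoOpen", "Auto_Open", "Document_Open", "Workbook_Open"],
    "execution": ["Shell", "CreateObject"],  # note: "Shell" also matches inside "WScript.Shell"
    "download": ["URLDownloadToFile", "XMLHTTP"],
    "obfuscation": ["Chr", "StrReverse"],
}


def keyword_features(vba_source: str) -> dict[str, int]:
    """Count whole-word, case-insensitive keyword hits per category in VBA source text."""
    features: dict[str, int] = {}
    for category, words in SUSPICIOUS_KEYWORDS.items():
        features[category] = sum(
            len(re.findall(rf"\b{re.escape(word)}\b", vba_source, flags=re.IGNORECASE))
            for word in words
        )
    return features


sample = '''Sub AutoOpen()
    Set obj = CreateObject("WScript.Shell")
    obj.Run "calc.exe"
End Sub'''
print(keyword_features(sample))
```

Counts like these are cheap to compute over a large sample collection and can serve as a transparent baseline before moving to learned representations of the macro code.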
