Multilabel Thai property-related offences

Citation Author(s):
Sirawit Chokphantavee, Sirindhorn International Institute of Technology, Thammasat University
Sorawit Chokphantavee, Sirindhorn International Institute of Technology, Thammasat University
Submitted by: Sirawit Chokpha...
Last updated: Mon, 02/10/2025 - 06:44
DOI: 10.21227/chhq-e465

Abstract 

Legal analysis using natural language processing (NLP) and machine learning (ML) is a difficult task that has recently attracted interest from both academia and industry. Using a human-annotated dataset of Thai Supreme Court decisions summarized in colloquial Thai, this work investigates different combinations of NLP, ML, and rule-based techniques for analyzing legal cases under Thai law, focusing on property-related offences, with the aim of imitating a lawyer's cognitive process. We experimented with two main tasks, binary and multi-label classification, each evaluated with five-fold cross-validation. On the binary task we achieved an average accuracy of 94.2% and an average F1-score of 96.7%, and found that vanilla fastText, a static embedding, is sufficient on its own for this task. On the multi-label task we obtained 82% average zero-one accuracy and 92% average Hamming accuracy using a fine-tuned joint-embedding classification pipeline with rule-based post-processing, an improvement over the same pipeline without the rule-based step. These results suggest that symbolic information from rule-based algorithms can be combined with the statistical computation of machine learning to perform complex legal analysis tasks.
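The two multi-label metrics reported above differ in how strictly they score a prediction. The sketch below assumes the usual definitions: zero-one accuracy counts a case as correct only if its whole label vector matches exactly (subset accuracy), while Hamming accuracy scores each individual label decision (1 minus the Hamming loss). The function names and the toy vectors are illustrative, not taken from the paper. In scikit-learn, `accuracy_score` on multilabel indicator arrays gives subset accuracy, and `1 - hamming_loss` gives Hamming accuracy.

```python
from typing import Sequence

# One 0/1 entry per Criminal Code section (e.g. 334, 336, 339, 340).
LabelVector = Sequence[int]


def zero_one_accuracy(y_true: Sequence[LabelVector], y_pred: Sequence[LabelVector]) -> float:
    """Fraction of cases whose predicted label vector matches the gold vector exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally sized, non-empty label lists")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_accuracy(y_true: Sequence[LabelVector], y_pred: Sequence[LabelVector]) -> float:
    """Fraction of individual (case, label) decisions that are correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally sized, non-empty label lists")
    correct = total = 0
    for t, p in zip(y_true, y_pred):
        if len(t) != len(p):
            raise ValueError("label vectors must have the same length")
        correct += sum(1 for a, b in zip(t, p) if a == b)
        total += len(t)
    return correct / total


# Toy example: the second case misses one label, so it fails the exact-match
# test but still gets 3 of its 4 label decisions right.
gold = [[1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 0]]
pred = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
print(zero_one_accuracy(gold, pred))  # 2 of 3 cases match exactly
print(hamming_accuracy(gold, pred))   # 11 of 12 label decisions are correct
```

This is why Hamming accuracy is typically higher than zero-one accuracy on the same predictions, as in the 92% versus 82% figures above.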

Instructions: 

Dataset Name: Thai Property-Offence Dataset

Description:
This dataset contains 120 legal case descriptions involving property-related offences in Thailand. Each entry includes a reference to a Supreme Court decision, a text prompt describing the case, and binary labels indicating which of four sections of the Thai Criminal Code apply. The dataset is suitable for legal NLP tasks such as classification and case analysis.
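With 120 entries, a five-fold cross-validation like the one described in the abstract holds out 24 cases per fold and trains on the remaining 96. The sketch below is a plain shuffled split under that assumption; the helper name and seed are hypothetical, and the authors may have used a different (for example, label-stratified) fold assignment.

```python
import random


def five_fold_indices(n_cases: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle case indices and deal them into k disjoint folds of near-equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    return [indices[i::k] for i in range(k)]


folds = five_fold_indices(120)
for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(120) if j not in held_out]
    print(f"fold {i}: {len(train_idx)} train / {len(test_idx)} test")  # 96 / 24
```

Every case appears in exactly one test fold, so averaging a metric over the five folds evaluates each case once.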

Columns:

  1. Supreme Court Decision No. – The reference number of the Supreme Court ruling.
  2. Prompt – A textual description of the case, written in Thai.
  3. Section 334 – Binary indicator (1 or 0) for whether the case involves theft under Section 334 of the Thai Criminal Code.
  4. Section 336 – Binary indicator for whether the case involves snatching under Section 336.
  5. Section 339 – Binary indicator for whether the case involves robbery under Section 339.
  6. Section 340 – Binary indicator for whether the case involves gang robbery under Section 340.
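The columns above map naturally onto a text input plus a four-element label vector. The sketch below assumes the data is distributed as a CSV file whose headers match the column names listed here; the page does not state the file format, so both the CSV assumption and the exact header strings are hypothetical. The sample row uses placeholder values, not a real decision.

```python
import csv
import io
from typing import Iterable

# Assumed header names, copied from the column list; the real file may differ.
ID_COLUMN = "Supreme Court Decision No."
TEXT_COLUMN = "Prompt"
LABEL_COLUMNS = ["Section 334", "Section 336", "Section 339", "Section 340"]


def load_cases(rows: Iterable[dict]) -> list[tuple[str, str, list[int]]]:
    """Turn CSV rows into (decision number, case text, 4-element 0/1 label vector)."""
    cases = []
    for row in rows:
        labels = [int(row[col]) for col in LABEL_COLUMNS]
        if any(v not in (0, 1) for v in labels):
            raise ValueError(f"non-binary label in row {row[ID_COLUMN]}")
        cases.append((row[ID_COLUMN], row[TEXT_COLUMN], labels))
    return cases


# Tiny in-memory sample with placeholder values, only to show the expected shape.
sample = io.StringIO(
    "Supreme Court Decision No.,Prompt,Section 334,Section 336,Section 339,Section 340\n"
    "0000/2500,placeholder case text,1,0,0,0\n"
)
cases = load_cases(csv.DictReader(sample))
print(cases[0][2])  # [1, 0, 0, 0]
```

For a real file, open it with `encoding="utf-8"` and `newline=""` and pass the handle to `csv.DictReader`, since the prompts are written in Thai.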