Access Control Policy Generation and Verification Datasets

Citation Author(s): Sakuna Jayasundara
Submitted by: Sakuna Harinda ...
Last updated: Sat, 02/15/2025 - 04:30
DOI: 10.21227/hqea-vp47

Abstract 

The manual generation of access control policies from an organization’s high-level requirement specifications is a laborious and error-prone process. Mistakes in this manual policy generation process cause access control failures that may lead to data breaches. As a solution, previous research proposed automated access control policy generation frameworks. However, existing approaches suffer from several limitations, such as the inability to handle complex access requirements due to a lack of domain adaptation, making them highly unreliable. To fill this gap, this paper proposes AGentV, a novel access control policy generation and verification framework that significantly improves the reliability of the access control policy generation process through language models (LMs). It achieves this improvement while using small open-source LMs, enabling AGentV to run efficiently even on the organization’s on-premise, resource-constrained computers. Hence, AGentV does not need to send the organization’s access requirements to third-party black-box LMs hosted and controlled by another entity for policy generation, which preserves the confidentiality of those requirements. Our evaluation shows that AGentV is very effective in identifying natural language access control policies (NLACPs) from high-level requirements, achieving an average state-of-the-art F1 score of 90.3%. Unlike existing frameworks, which are limited to generating simple policies with three components (i.e., subject, action, and resource), AGentV successfully translates more complex NLACPs containing elements like purposes and conditions via a novel access control-specific structured information extraction technique. Its ability to extract word-level and semantic information from NLACPs at the same time yields a state-of-the-art policy generation F1 score of 80.3%. Finally, AGentV introduces a novel technique to verify the generated policies and provide precise feedback to administrators, allowing them to refine incorrectly generated policies before adding them to the authorization system. Additionally, to facilitate future research, we release two annotated datasets, addressing the data scarcity in this domain.
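
To make the difference between a three-component policy and a more complex one concrete, the following is a minimal Python sketch of how a single extracted rule might be represented. The class name `AccessControlRule`, its field names, the `is_simple` helper, and the example sentence are all assumptions made for illustration; they are not taken from the released datasets or from the AGentV code, whose actual formats are documented in the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessControlRule:
    """Hypothetical container for one rule extracted from an NLACP.

    The five fields mirror the components named in the abstract;
    the layout itself is an assumption, not the dataset schema.
    """

    subject: str
    action: str
    resource: str
    purpose: Optional[str] = None
    condition: Optional[str] = None

    def is_simple(self) -> bool:
        """Return True when only subject, action, and resource are set."""
        return self.purpose is None and self.condition is None


# Hypothetical NLACP, written for this illustration only.
nlacp = "A doctor can read medical records for treatment if the patient consents."

# A hand-written target structure for the sentence above.
rule = AccessControlRule(
    subject="doctor",
    action="read",
    resource="medical records",
    purpose="treatment",
    condition="the patient consents",
)

print(rule.is_simple())
```

A rule built with only the first three fields would report `is_simple()` as true, which corresponds to the simpler policy form that the abstract attributes to existing frameworks.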

Instructions: 

Refer to the GitHub repository for instructions on using the datasets: https://anonymous.4open.science/r/agentvllm