BERT | IEEE DataPort

LLM Empowering Urban Science: The Exploration of Constructing a New Instruction Dataset

The application of large language models (LLMs) in urban planning has gained momentum, with prior research demonstrating their value in participatory planning, process streamlining, and event forecasting. This study focuses on further enhancing urban planning through the integration of more comprehensive datasets. We introduce a newly developed instruction dataset that amalgamates crucial information from several prominent urban datasets, including highD, NGSIM, the Road Networks dataset, TLC Trip data, and the Urban Flow Prediction Survey dataset.

Readability Classifier with Linguistic Characteristics

This data repository contains test data and corresponding test code for evaluating the performance of a machine learning model. The dataset includes 950 labeled samples across 7 different classes. The test code provides implementations of several common evaluation metrics, including accuracy, precision, recall, and F1-score. This resource is intended to facilitate the benchmarking and comparison of different machine learning algorithms on a standardized task.
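
As a rough illustration of the metrics named above, the following sketch computes accuracy and macro-averaged precision, recall, and F1-score for a multi-class task. It is not the repository's test code: the function name `classification_metrics`, the choice of macro averaging, and the toy labels are assumptions made for this example only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    since the listing does not say how per-class scores are combined.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        # Treat undefined ratios (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Toy example with three classes (made-up labels, not data from the repository).
scores = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(scores)
```

Macro averaging weights every class equally, which can matter when classes are imbalanced; a weighted or micro average would be an equally reasonable choice depending on how the benchmark is meant to be read.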

BERT fine-tuned CORD-19 NER Dataset

This named-entity dataset was produced by applying the widely used Large Language Model (LLM) BERT to the CORD-19 biomedical literature corpus. Fine-tuning the pre-trained BERT on the CORD-NER dataset enables the model to capture the context and semantics of biomedical named entities. The fine-tuned model is then applied to CORD-19 to extract more contextually relevant and up-to-date named entities. However, fine-tuning LLMs on large datasets poses a challenge. To address this, two distinct sampling methodologies are used.
