Abstract 

The TripAdvisor online airline review dataset spans 2016 to 2023 and collects passenger feedback on airline services from before and during the COVID-19 pandemic. Its user-generated reviews capture sentiments, preferences, and concerns, allowing in-depth analysis of how customer priorities shifted in response to pandemic-related disruptions. The reviews support the study of evolving passenger expectations, changes in service perceptions, and the airline industry's adaptive strategies. The dataset thus offers insight into how the pandemic reshaped customer experiences and behaviors, supporting research on resilience and innovation in the airline sector for Aviation 5.0.

Instructions: 

TripAdvisor Airline Review Dataset (2016-2023) - Pandemic Analysis

Overview

The TripAdvisor Airline Review Dataset covers passenger feedback from 2016 to 2023, providing insights into the evolving priorities and concerns of airline passengers in response to the COVID-19 pandemic. The dataset includes reviews from a variety of airlines, with user-generated comments and ratings reflecting changes in customer satisfaction, service expectations, and experiences during the pandemic period.

Dataset Contents

The dataset file Top10_airlines_reviews(2016-2023Jul).txt contains the following details:

  • The main data on reviews and ratings.
  • Airlines and their respective details.
  • Information on the date and time of each review submission.
  • Reviewer metadata.

Data Structure Documentation

Column Name       Description

review_id         Unique identifier for each review.
airline_id        Unique identifier for the airline.
reviewer_id       Unique identifier for each reviewer.
rating            Rating given by the passenger (1-5 scale).
review_date       The date the review was submitted (format: YYYY-MM-DD).
review_title      Title of the review written by the passenger.
review_text       Full text of the review written by the passenger.
trip_details      Destination | boarder | cabin class
aspect_rating     Rating given by the passenger for 8 aspects (1-5 scale).
reviewer_profile  Hyperlink to reviewer’s profile.
user_reviews      Hyperlink to reviews submitted by the passenger.
airline_reviews   Hyperlink to reviews received by the airline.
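
The trip_details column packs three fields into one string. The following is a minimal sketch of splitting it, assuming the fields are separated by "|" in the order given in the column description. The helper name parse_trip_details, the dictionary keys, and the sample value are hypothetical and not part of the dataset.

```python
def parse_trip_details(value):
    """Split a trip_details string into its three '|'-separated fields.

    Assumes the order destination | boarder | cabin class, as given in
    the column description. Missing or empty fields are returned as None.
    """
    keys = ("destination", "boarder", "cabin_class")
    text = "" if value is None else str(value)
    # Strip whitespace around each field; empty fields become None.
    parts = [p.strip() or None for p in text.split("|")]
    # Pad with None so short entries still yield all three keys.
    parts += [None] * (len(keys) - len(parts))
    return dict(zip(keys, parts[: len(keys)]))


# Hypothetical sample value, for illustration only.
sample = "Singapore - London | International | Economy"
print(parse_trip_details(sample))
```

Returning a dictionary keeps the three fields named, so they can later be expanded into separate columns if needed.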

 

 

Instructions for Using the Dataset

1. Download the Dataset

The dataset can be downloaded from the IEEE DataPort repository. Once downloaded, unzip the files to access the TXT file.

2. Loading Data

You can load the dataset into any data analysis tool such as Python (using Pandas) or R. Here’s an example of how to load the Top10_airlines_reviews(2016-2023Jul).txt file in Python:

import pandas as pd

# read_csv assumes comma-separated values; pass sep=... if the TXT file uses another delimiter
reviews_df = pd.read_csv('Top10_airlines_reviews(2016-2023Jul).txt')
print(reviews_df.head())
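
After loading, it can help to check the columns and convert the date and rating fields to proper types. The sketch below is one possible way to do this, not a prescribed procedure. The function name load_reviews, the default delimiter, and the tiny in-memory sample are assumptions made for illustration; the column names come from the Data Structure Documentation.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "review_id", "airline_id", "reviewer_id", "rating", "review_date",
    "review_title", "review_text", "trip_details", "aspect_rating",
    "reviewer_profile", "user_reviews", "airline_reviews",
]


def load_reviews(source, sep=","):
    """Read the review file and normalise the date and rating columns.

    `sep` is an assumption: the delimiter of the TXT file is not documented,
    so pass a tab or another character if the file is not comma-separated.
    """
    df = pd.read_csv(source, sep=sep)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        print("Columns not found:", missing)
    if "review_date" in df.columns:
        # Unparseable dates become NaT instead of raising an error.
        df["review_date"] = pd.to_datetime(df["review_date"], errors="coerce")
    if "rating" in df.columns:
        # Non-numeric ratings become NaN.
        df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
    return df


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "review_id,rating,review_date\n"
    "1,5,2019-03-01\n"
    "2,two,not-a-date\n"
)
demo_df = load_reviews(sample)
print(demo_df.dtypes)
```

With the real file, the call would be load_reviews('Top10_airlines_reviews(2016-2023Jul).txt'), adjusting sep as needed.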

3. Preprocessing

To analyze the reviews, some preprocessing is required to clean and prepare the data. The preprocessing_scripts/ directory includes scripts to:

  • Remove stop words and irrelevant characters from the review text.
  • Handle missing or incomplete data entries.
  • Perform sentiment analysis and classify reviews into positive, negative, or neutral categories.

Example preprocessing in Python:

# The module from the preprocessing_scripts/ directory must be importable
import preprocessing_script as prep

clean_reviews = prep.clean_reviews(reviews_df)
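
The scripts themselves are not reproduced here. As a rough illustration of the first two steps listed above (removing stop words and irrelevant characters, and handling missing entries), here is a minimal standard-library sketch. The function names and the very short stop word list are hypothetical and do not reflect the contents of preprocessing_scripts/.

```python
import re

# Tiny illustrative stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "in", "on", "it"}


def clean_text(text):
    """Lowercase, drop non-letter characters, and remove stop words."""
    if not isinstance(text, str):
        return ""  # treat missing entries (None, NaN) as empty text
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


def clean_review_texts(texts):
    """Clean every review and drop those left empty after cleaning."""
    cleaned = (clean_text(t) for t in texts)
    return [c for c in cleaned if c]


raw = ["The flight was delayed 3 hours!!", None, "Great crew, and the food is OK."]
print(clean_review_texts(raw))
# → ['flight delayed hours', 'great crew food ok']
```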

4. Analyzing Sentiments

Sentiment analysis can be performed on the review text using machine learning models or NLP techniques. A sample script to perform sentiment analysis can be found in the preprocessing_scripts/sentiment_analysis.py file.
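
Before running a text-based model, a quick baseline is to derive a coarse label from the star rating itself. The sketch below does that; it is a stand-in for illustration, not the method used in sentiment_analysis.py. The function name label_from_rating and the cut-offs (below 3 negative, 3 neutral, 4 and above positive) are assumptions.

```python
import math


def label_from_rating(rating):
    """Map a 1-5 star rating to a coarse sentiment label.

    The cut-offs are an assumption, not a rule taken from the
    dataset documentation. Missing ratings return None.
    """
    if rating is None or (isinstance(rating, float) and math.isnan(rating)):
        return None
    if rating < 3:
        return "negative"
    if rating < 4:
        return "neutral"
    return "positive"


print([label_from_rating(r) for r in [1, 3, 5, None]])
# → ['negative', 'neutral', 'positive', None]
```

With Pandas, the labels could be attached as a column via reviews_df['rating'].map(label_from_rating), which gives a reference point for comparing text-based sentiment predictions.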

5. Analyzing COVID-19 Impact

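The review year can be used to separate pre-pandemic reviews from those written during the pandemic. The documented columns do not include a year column, so derive the year from review_date first, then keep the reviews submitted from 2020 onward:

# Derive the year from the review_date column (format: YYYY-MM-DD)
reviews_df['year'] = pd.to_datetime(reviews_df['review_date'], errors='coerce').dt.year

covid_reviews = reviews_df[reviews_df['year'] >= 2020]
print(covid_reviews.head())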
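
To go one step beyond filtering, the two periods can be compared directly. The following sketch summarises review counts and mean ratings before and from a cut-off year. The function name compare_periods, the period labels, and the made-up demo rows are assumptions for illustration; only the review_date and rating column names come from the dataset documentation.

```python
import pandas as pd


def compare_periods(df, cutoff_year=2020):
    """Return review count and mean rating before and from `cutoff_year`."""
    dates = pd.to_datetime(df["review_date"], errors="coerce")
    # Drop rows whose date could not be parsed, so they are not misassigned.
    valid = df.loc[dates.notna()]
    years = dates[dates.notna()].dt.year
    period = years.ge(cutoff_year).map(
        {True: f"{cutoff_year}+", False: f"pre-{cutoff_year}"}
    ).rename("period")
    return valid.groupby(period)["rating"].agg(["count", "mean"])


# Made-up rows standing in for the real data.
demo = pd.DataFrame({
    "review_date": ["2018-05-01", "2019-07-15", "2020-04-02", "2021-01-10", "bad"],
    "rating": [4, 5, 2, 3, 1],
})
summary = compare_periods(demo)
print(summary)
```

Using a calendar-year cut-off is a simplification; a month-level cut-off may be more appropriate for some analyses.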

6. Visualizing Trends

To analyze the temporal changes in reviews and sentiment, you can aggregate the data by year, month, or day. Use tools like Matplotlib or Seaborn in Python to create visualizations:

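import matplotlib.pyplot as plt

# Plot sentiment over time.
# Assumes a 'sentiment' column has been added by the sentiment analysis step.
# The year is derived from review_date.
year = pd.to_datetime(reviews_df['review_date'], errors='coerce').dt.year

reviews_df.groupby(year)['sentiment'].value_counts().unstack().plot(kind='bar', stacked=True)
plt.title('Sentiment Distribution Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Reviews')
plt.show()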
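
For finer-grained trends, the same idea can be applied per month. The sketch below aggregates review counts and mean ratings by calendar month. The function name monthly_rating_trend and the demo rows are hypothetical; the review_date and rating columns are taken from the dataset documentation.

```python
import pandas as pd


def monthly_rating_trend(df):
    """Return review count and mean rating per calendar month."""
    dates = pd.to_datetime(df["review_date"], errors="coerce")
    # Keep only rows with a parseable date.
    valid = df.loc[dates.notna()]
    month = dates[dates.notna()].dt.to_period("M").rename("month")
    return valid.groupby(month)["rating"].agg(["count", "mean"])


# Made-up rows standing in for the real data.
demo = pd.DataFrame({
    "review_date": ["2020-03-01", "2020-03-20", "2020-04-05"],
    "rating": [2, 4, 5],
})
trend = monthly_rating_trend(demo)
print(trend)
```

The resulting table can be passed to Matplotlib, for example with trend['mean'].plot(), to draw the monthly mean rating as a line chart.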

Citation

If you use this dataset in your research, please cite it as follows:

  • Siu-Hin Ng. (2025). TripAdvisor Airline Review Dataset (2016-2023) - Pandemic Analysis. IEEE DataPort.