Standard Dataset
TripAdvisor Airline Reviews from Year 2016 to 2023
- Submitted by: Siu Hin Ng
- Last updated: Fri, 01/31/2025 - 13:55
- DOI: 10.21227/093a-gz50
Abstract
The TripAdvisor online airline review dataset spans 2016 to 2023 and collects passenger feedback on airline services before and during the COVID-19 pandemic. It includes user-generated reviews that capture sentiments, preferences, and concerns, allowing in-depth analysis of how customer priorities shifted in response to pandemic-related disruptions. The reviews support the study of evolving passenger expectations, changes in service perceptions, and the airline industry's adaptive strategies. The dataset offers insight into how the pandemic reshaped customer experiences and behaviors, supporting research on resilience and innovation in the airline sector for Aviation 5.0.
TripAdvisor Airline Review Dataset (2016-2023) - Pandemic Analysis
Overview
The TripAdvisor Airline Review Dataset covers passenger feedback from 2016 to 2023, providing insights into the evolving priorities and concerns of airline passengers in response to the COVID-19 pandemic. The dataset includes reviews from a variety of airlines, with user-generated comments and ratings reflecting changes in customer satisfaction, service expectations, and experiences during the pandemic period.
Dataset Contents
The dataset file Top10_airlines_reviews(2016-2023Jul).txt contains the following details:
- The main data on reviews and ratings.
- Airlines and their respective details.
- The date of each review submission.
- Reviewer metadata.
Data Structure Documentation
Column Name: Description
- review_id: Unique identifier for each review.
- airline_id: Unique identifier for the airline.
- reviewer_id: Unique identifier for each reviewer.
- rating: Rating given by the passenger (1-5 scale).
- review_date: The date the review was submitted (format: YYYY-MM-DD).
- review_title: Title of the review written by the passenger.
- review_text: Full text of the review written by the passenger.
- trip_details: Destination | boarder | cabin class.
- aspect_rating: Ratings given by the passenger for 8 aspects (1-5 scale).
- reviewer_profile: Hyperlink to the reviewer's profile.
- user_reviews: Hyperlink to reviews submitted by the passenger.
- airline_reviews: Hyperlink to reviews received by the airline.
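The trip_details column packs three fields into a single string. A minimal sketch for splitting it is shown here, assuming the fields are separated by "|" in the documented order. The function name, the output keys, and the sample value are illustrative and not taken from the dataset:

```python
def split_trip_details(value):
    """Split a 'destination | boarder | cabin class' string into a dict.

    Assumption: fields are separated by '|' in the documented order.
    Missing or empty fields are returned as None.
    """
    keys = ("destination", "boarder", "cabin_class")
    if not isinstance(value, str):
        # Missing entries (e.g. None or NaN) yield an all-None record.
        return dict.fromkeys(keys)
    parts = [part.strip() or None for part in value.split("|")]
    # Pad or truncate to exactly three fields.
    parts = (parts + [None] * len(keys))[: len(keys)]
    return dict(zip(keys, parts))


# Hypothetical sample value, for illustration only.
example = split_trip_details("Hong Kong - London | International | Economy")
print(example)
```

With pandas, reviews_df['trip_details'].apply(split_trip_details).apply(pd.Series) would expand the result into three columns.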
Instructions for Using the Dataset
1. Download the Dataset
The dataset can be downloaded from the IEEE DataPort repository. Once downloaded, unzip the files to access the TXT file.
2. Loading Data
You can load the dataset into any data analysis tool, such as Python (using pandas) or R. Here's an example of how to load the TXT file in Python:
import pandas as pd
# read_csv assumes comma-separated values; pass sep='\t' if the file is tab-delimited
reviews_df = pd.read_csv('Top10_airlines_reviews(2016-2023Jul).txt')
print(reviews_df.head())
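After loading, it may help to confirm that the frame matches the documented structure. The sketch below checks for the column names listed in the Data Structure Documentation, parses review_date, and blanks ratings outside the 1-5 scale. The helper name check_reviews and the small in-memory sample are assumptions made for illustration:

```python
import pandas as pd

# Column names as listed in the Data Structure Documentation.
EXPECTED_COLUMNS = [
    "review_id", "airline_id", "reviewer_id", "rating", "review_date",
    "review_title", "review_text", "trip_details", "aspect_rating",
    "reviewer_profile", "user_reviews", "airline_reviews",
]


def check_reviews(df):
    """Return (cleaned copy, list of documented columns missing from df)."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    out = df.copy()
    if "review_date" in out.columns:
        # Unparseable dates become NaT instead of raising.
        out["review_date"] = pd.to_datetime(out["review_date"], errors="coerce")
    if "rating" in out.columns:
        # Non-numeric ratings become NaN; values outside 1-5 are blanked too.
        rating = pd.to_numeric(out["rating"], errors="coerce")
        out["rating"] = rating.where(rating.between(1, 5))
    return out, missing


# Tiny hypothetical sample standing in for the real file.
sample = pd.DataFrame({
    "review_id": [1, 2],
    "rating": ["5", "9"],
    "review_date": ["2021-03-14", "not a date"],
})
checked, missing_cols = check_reviews(sample)
print(checked.dtypes)
print(missing_cols)
```

Calling check_reviews(reviews_df) on the real data returns the cleaned frame together with any documented columns that are absent from the file.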
3. Preprocessing
To analyze the reviews, some preprocessing is required to clean and prepare the data. The preprocessing_scripts/ directory includes scripts to:
- Remove stop words and irrelevant characters from the review text.
- Handle missing or incomplete data entries.
- Perform sentiment analysis and classify reviews into positive, negative, or neutral categories.
Example preprocessing in Python:
import preprocessing_script as prep
clean_reviews = prep.clean_reviews(reviews_df)
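The contents of the preprocessing scripts are not reproduced in this description. As a stand-in, the text cleaning and missing-data steps listed above could be sketched as follows. The stop-word list is deliberately tiny, and clean_review_text and clean_reviews_sketch are hypothetical names, not the functions shipped in preprocessing_scripts/:

```python
import re

import pandas as pd

# Deliberately small stop-word list for illustration; a real analysis
# would use a fuller list (for example, from an NLP library).
STOP_WORDS = {"a", "an", "and", "the", "was", "were", "is", "to", "of", "in"}


def clean_review_text(text):
    """Lowercase, replace non-letter characters, and remove stop words."""
    text = re.sub(r"[^a-z\s]", " ", str(text).lower())
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


def clean_reviews_sketch(df):
    """Drop rows without review text and add a 'clean_text' column."""
    out = df.dropna(subset=["review_text"]).copy()
    out["clean_text"] = out["review_text"].map(clean_review_text)
    return out


# Hypothetical sample rows.
sample = pd.DataFrame({
    "review_text": ["The flight was delayed 3 hours!", None, "Great crew & food."],
})
cleaned = clean_reviews_sketch(sample)
print(cleaned["clean_text"].tolist())
```

Dropping rows with no review text is only one way to handle incomplete entries; keeping them with an empty clean_text value is an equally reasonable choice if the ratings are still needed.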
4. Analyzing Sentiments
Sentiment analysis can be performed on the review text using machine learning models or NLP techniques. A sample script to perform sentiment analysis can be found in the preprocessing_scripts/sentiment_analysis.py file.
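The sample script itself is not shown here. One simple baseline, which is an assumption and not necessarily what sentiment_analysis.py does, is to derive a coarse sentiment label from the 1-5 rating instead of the review text. The thresholds and the function name below are illustrative:

```python
import pandas as pd


def label_from_rating(rating):
    """Map a 1-5 rating to a coarse sentiment label.

    Assumed thresholds: 1-2 negative, 3 neutral, 4-5 positive.
    """
    if pd.isna(rating):
        return None
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# Hypothetical ratings standing in for reviews_df['rating'].
ratings = pd.Series([1, 3, 5, None])
labels = ratings.map(label_from_rating)
print(labels.tolist())
```

Applied to the real data, reviews_df['sentiment'] = reviews_df['rating'].map(label_from_rating) would add a sentiment column that later steps can group by. Text-based models may disagree with these labels, since a rating and the tone of the accompanying text do not always match.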
5. Analyzing COVID-19 Impact
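Reviews can be split by year to study the impact of the pandemic. For example, keep only the reviews submitted in 2020 or later:
# Derive the year from review_date (YYYY-MM-DD); skip this line if the file already has a year column
reviews_df['year'] = pd.to_datetime(reviews_df['review_date']).dt.year
covid_reviews = reviews_df[reviews_df['year'] >= 2020]
print(covid_reviews.head())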
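To compare periods rather than only filter, one option is to tag each review as pre-pandemic or pandemic-era and summarize ratings per group. The sketch assumes the documented review_date and rating columns and reuses 2020 as the cut-off year. The period column, the add_period helper, and the sample rows are illustrative:

```python
import pandas as pd


def add_period(df, cutoff_year=2020):
    """Add a 'period' column based on the year of review_date."""
    out = df.copy()
    year = pd.to_datetime(out["review_date"], errors="coerce").dt.year
    labels = {True: f"{cutoff_year} onward", False: f"pre-{cutoff_year}"}
    out["period"] = (year >= cutoff_year).map(labels)
    # Rows with a missing or unparseable date get no period label.
    out.loc[year.isna(), "period"] = None
    return out


# Hypothetical sample rows.
sample = pd.DataFrame({
    "review_date": ["2018-05-01", "2019-11-20", "2020-07-04", "2022-01-15"],
    "rating": [5, 4, 2, 3],
})
summary = add_period(sample).groupby("period")["rating"].agg(["count", "mean"])
print(summary)
```

A single cut-off is a coarse split; passing a different cutoff_year, or adding a third group for the recovery years, may suit some analyses better.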
6. Visualizing Trends
To analyze the temporal changes in reviews and sentiment, you can aggregate the data by year, month, or day. Use tools like Matplotlib or Seaborn in Python to create visualizations:
import matplotlib.pyplot as plt
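# Plot sentiment counts per year; assumes a 'sentiment' column was added in step 4
year = pd.to_datetime(reviews_df['review_date']).dt.year.rename('year')
reviews_df.groupby(year)['sentiment'].value_counts().unstack().plot(kind='bar', stacked=True)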
plt.title('Sentiment Distribution Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Reviews')
plt.show()
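For finer-grained trends, the same idea can be applied per month. The following sketch groups reviews by calendar month using the documented review_date and rating columns. The helper name monthly_rating_summary and the sample rows are assumptions:

```python
import pandas as pd


def monthly_rating_summary(df):
    """Return review count and mean rating per calendar month."""
    dates = pd.to_datetime(df["review_date"])
    # to_period('M') maps every day of a month to the same monthly period.
    month = dates.dt.to_period("M")
    return df.groupby(month)["rating"].agg(["count", "mean"])


# Hypothetical sample rows.
sample = pd.DataFrame({
    "review_date": ["2020-03-01", "2020-03-28", "2020-04-02"],
    "rating": [2, 4, 5],
})
monthly = monthly_rating_summary(sample)
print(monthly)
```

Calling monthly['mean'].plot() on the result would draw the monthly average rating as a line chart through Matplotlib.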
Citation
If you use this dataset in your research, please cite it as follows:
- Siu-Hin Ng. (2025). TripAdvisor Airline Review Dataset (2016-2023) - Pandemic Analysis. IEEE DataPort. doi: 10.21227/093a-gz50