Online Shoppers Purchasing Intention Dataset

Abstract 

  • The dataset consists of feature vectors belonging to 12,330 sessions. It was constructed so that each session belongs to a different user within a 1-year period, to avoid bias toward any specific campaign, special day, user profile, or period.
  • Of the 12,330 sessions in the dataset, 84.5% (10,422) are negative class samples that did not end with a purchase, and the remaining 15.5% (1,908) are positive class samples that did.
  • The dataset consists of 10 numerical and 8 categorical attributes. The 'Revenue' attribute can be used as the class label.
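
The class counts above imply a strongly imbalanced problem. As a quick arithmetic check, the sketch below recomputes the class shares from the stated counts, along with the accuracy a trivial classifier would reach by always predicting the negative class. The function name `class_shares` is hypothetical and introduced only for illustration.

```python
def class_shares(negative: int, positive: int) -> tuple[float, float]:
    """Return (negative_share, positive_share) as fractions of all sessions."""
    total = negative + positive
    return negative / total, positive / total


# Counts stated in the abstract.
neg_share, pos_share = class_shares(negative=10_422, positive=1_908)

# A model that always predicts "no purchase" is correct on every negative
# session, so its accuracy equals the negative-class share.
majority_baseline_accuracy = neg_share

print(f"negative: {neg_share:.1%}, positive: {pos_share:.1%}")
# prints: negative: 84.5%, positive: 15.5%
```

Because of this imbalance, plain accuracy can look good for a model that never predicts a purchase, so metrics computed on the positive class (such as recall or F1) are usually worth reporting alongside it.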

The dataset contains 18 columns, each representing specific attributes of online shopping behavior:

  • Administrative and Administrative_Duration: Number of pages visited and time spent on administrative pages.
  • Informational and Informational_Duration: Number of pages visited and time spent on informational pages.
  • ProductRelated and ProductRelated_Duration: Number of pages visited and time spent on product-related pages.
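  • BounceRates and ExitRates: Google Analytics bounce-rate and exit-rate metrics for the pages visited in the session.
  • PageValues: Google Analytics page value, i.e. the average value of a page that a user visited before completing an e-commerce transaction.
  • SpecialDay: Closeness of the visit date to a special day (e.g. Mother's Day, Valentine's Day), from 0 up to a maximum of 1.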
  • Month: Month of the session.
  • OperatingSystems, Browser, Region, TrafficType: Technical and geographical attributes.
  • VisitorType: Categorizes visitors as returning, new, or other.
  • Weekend: Indicates if the session occurred on a weekend.
  • Revenue: Target variable indicating whether a transaction was completed (True or False).
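
One way to make the column list above checkable is to write it down as a schema and validate a CSV header against it. The sketch below assumes the numerical/categorical split shown in the code (the description gives only the 10/8 counts, so the assignment of each column to a group is an assumption), and `check_header` is a hypothetical helper, not part of any library.

```python
import csv
import io

# Assumed split: the description states 10 numerical and 8 categorical
# attributes but does not say which column belongs to which group.
NUMERICAL = [
    "Administrative", "Administrative_Duration",
    "Informational", "Informational_Duration",
    "ProductRelated", "ProductRelated_Duration",
    "BounceRates", "ExitRates", "PageValues", "SpecialDay",
]
CATEGORICAL = [
    "Month", "OperatingSystems", "Browser", "Region",
    "TrafficType", "VisitorType", "Weekend", "Revenue",
]
EXPECTED = NUMERICAL + CATEGORICAL


def check_header(csv_text: str) -> list[str]:
    """Return the expected column names that are missing from the header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = {name.strip() for name in header}
    return [name for name in EXPECTED if name not in present]


# Toy input containing only a header row; the real file would be read from disk.
sample = ",".join(EXPECTED) + "\n"
missing = check_header(sample)
print(len(EXPECTED), missing)  # prints: 18 []
```

Running the same check on the first line of the downloaded file is a cheap way to catch renamed or dropped columns before any modeling starts.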

The original dataset comes from the UCI Machine Learning Repository:

https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+inten...

Additional Variable Information

The dataset consists of 10 numerical and 8 categorical attributes. The 'Revenue' attribute can be used as the class label.

"Administrative", "Administrative Duration", "Informational", "Informational Duration", "Product Related" and "Product Related Duration" represent the number of pages of each type visited by the visitor in that session and the total time spent in each of these page categories. The values of these features are derived from the URL information of the pages visited by the user and are updated in real time when the user takes an action, e.g. moving from one page to another.

The "Bounce Rate", "Exit Rate" and "Page Value" features represent the metrics measured by Google Analytics for each page of the e-commerce site. The "Bounce Rate" of a web page is the percentage of visitors who enter the site from that page and then leave ("bounce") without triggering any other requests to the analytics server during that session. The "Exit Rate" of a web page is the percentage of all pageviews of that page that were the last in the session. The "Page Value" feature represents the average value of a web page that a user visited before completing an e-commerce transaction.

The "Special Day" feature indicates the closeness of the visit time to a specific special day (e.g. Mother's Day, Valentine's Day) on which sessions are more likely to end in a transaction. The value of this attribute is determined by considering the dynamics of e-commerce, such as the duration between the order date and the delivery date. For example, for Valentine's Day, this value is nonzero between February 2 and February 12, reaches its maximum of 1 on February 8, and is zero before and after this period unless the date is close to another special day.

The dataset also includes the operating system, browser, region, traffic type, visitor type (returning or new visitor), a Boolean value indicating whether the visit date falls on a weekend, and the month of the year.
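
The bounce-rate and exit-rate definitions above can be made concrete with a toy clickstream. This is only an illustration of the two definitions, not Google Analytics' implementation: the session data and the helper names `bounce_rate` and `exit_rate` are hypothetical, a single-pageview session stands in for "no other requests to the analytics server", and results are returned as fractions in [0, 1] rather than percentages.

```python
# Each toy session is the ordered list of pages one visitor requested.
sessions = [
    ["home", "product", "cart"],
    ["home"],              # single-page session: a bounce on "home"
    ["product", "home"],
    ["product"],           # single-page session: a bounce on "product"
]


def bounce_rate(page: str, sessions: list[list[str]]) -> float:
    """Share of sessions entering at `page` that contain no further pageview."""
    entrances = [s for s in sessions if s and s[0] == page]
    if not entrances:
        return 0.0  # no session entered here, so nothing could bounce
    bounces = sum(1 for s in entrances if len(s) == 1)
    return bounces / len(entrances)


def exit_rate(page: str, sessions: list[list[str]]) -> float:
    """Share of all pageviews of `page` that were the last in their session."""
    pageviews = sum(s.count(page) for s in sessions)
    if pageviews == 0:
        return 0.0
    exits = sum(1 for s in sessions if s and s[-1] == page)
    return exits / pageviews


for page in ("home", "product", "cart"):
    print(page, round(bounce_rate(page, sessions), 3), round(exit_rate(page, sessions), 3))
# prints:
# home 0.5 0.667
# product 0.5 0.333
# cart 0.0 1.0
```

Note that the two metrics use different denominators: bounce rate is relative to sessions that start on the page, while exit rate is relative to every pageview of the page.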

Instructions: 
  • The dataset is provided as a CSV file named online_shoppers_intention.csv.
  • Load the dataset using any CSV-compatible software or programming language. For example, in Python, you can use the pandas library:

               import pandas as pd

               # Read the sessions into a DataFrame (one row per session)
               data = pd.read_csv('online_shoppers_intention.csv')

  • Dataset Applications:

    • Predict customer purchasing intent using machine learning models.
    • Analyze user behavior based on session-level attributes.
    • Identify patterns in e-commerce engagement across regions, visitor types, and traffic types.
  • Ensure any preprocessing, such as handling categorical variables or normalizing numerical data, aligns with your analysis objectives.
  • No missing values are present, so no imputation is needed before analysis.
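
The preprocessing mentioned above (encoding categorical variables and normalizing numerical ones) might look like the following dependency-free sketch. The column values are invented toy data, not records from the dataset, and `one_hot` and `min_max_scale` are hypothetical helpers written for illustration.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale a numerical column linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Toy columns named after dataset columns; the values themselves are invented.
visitor_type = ["Returning_Visitor", "New_Visitor", "Returning_Visitor", "Other"]
product_related_duration = [0.0, 120.0, 60.0, 240.0]

categories, visitor_encoded = one_hot(visitor_type)
duration_scaled = min_max_scale(product_related_duration)

print(categories)       # ['New_Visitor', 'Other', 'Returning_Visitor']
print(visitor_encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(duration_scaled)  # [0.0, 0.5, 0.25, 1.0]
```

With the DataFrame loaded through pandas, `pd.get_dummies` covers the encoding step; whether scaling is needed at all depends on the model, since tree-based methods are generally insensitive to it.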

Comments

This dataset contains detailed information about online shopping behavior and purchasing intention.

Submitted by Dinesh Vishwakarma on Thu, 01/09/2025 - 13:20