Proactive Identification Datasets - Kubernetes Performance of Boutique during Co-Location and Consolidation

Citation Author(s):
Yar Rouf
York University
Submitted by:
Yar Rouf
Last updated:
Tue, 01/21/2025 - 14:07
DOI:
10.21227/bd3s-2z25

Abstract 

Performance models identified at run-time can be used by self-adaptive software systems to execute decisions in a cloud environment. These models are built by measuring the control inputs, disturbances, and outputs of the controlled system, and they have been shown to interpolate accurately over data already seen by the model identification method. However, automation in cloud operations can push the environment into operational regions the system has not seen, where the performance model may not extrapolate accurately. These unexplored operational regions can result from an expansion of the environment, when a co-located application is deployed, or from a reduction in environment resources through cloud consolidation. As more modern applications are deployed on large-scale Kubernetes clusters, scaling applications up and down is common. We propose a proactive dynamic model identification technique to predict the impact of cloud consolidation and co-location for large-scale deployments. The method uses a Look-Ahead Scanner (LAS) mechanism that explores different operational regions through controllable perturbations at run-time on multiple cluster nodes. We evaluated the proposed method on realistic applications deployed in a large-scale cluster on public clouds. The datasets contain the performance metrics of a Kubernetes cluster for the Co-Location and Consolidation experiments used to build our performance models.

Instructions: 

There are two main directories. The first contains the Co-Location experiments, and the second contains the Consolidation experiments. In the Co-Location directory, the "[1] Boutique Data at O(0)" folder contains the O0 datasets, collected when the in-production application, Boutique, is the only application running in our environment. The "[2] Boutique + LAS at O(t)" folder contains the Ot datasets, collected when the LAS is deployed alongside Boutique. The "[3] Boutique + Acme-Air at O(t)" folder contains the dataset in which the co-located Acme-Air application is deployed alongside Boutique; Boutique's measured response time is compared with the response time predicted by our LAS-based model. Two additional folders contain the raw data before pre-processing and the synthetic data generated by Tabgan.
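The folder names quoted above follow a common pattern: a bracketed index, a description, and the operational region, O(0) or O(t). As a minimal sketch, assuming that pattern holds for every experiment folder, the names can be parsed as follows. The `FolderInfo` type and `parse_folder_name` helper are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple, Optional


class FolderInfo(NamedTuple):
    """Parts of an experiment folder name (hypothetical helper type)."""
    index: int    # bracketed number, e.g. 2
    label: str    # description, e.g. "Boutique + LAS"
    region: str   # operational region, "O(0)" or "O(t)"


# Assumed pattern: "[n] <description> at O(0)" or "[n] <description> at O(t)".
_FOLDER_PATTERN = re.compile(r"^\[(\d+)\]\s+(.*?)\s+at\s+(O\((?:0|t)\))$")


def parse_folder_name(name: str) -> Optional[FolderInfo]:
    """Return the parsed parts, or None if the name does not fit the pattern."""
    match = _FOLDER_PATTERN.match(name.strip())
    if match is None:
        return None
    return FolderInfo(int(match.group(1)), match.group(2), match.group(3))
```

Names that do not match, such as those of the raw-data or synthetic-data folders, yield `None`, so they can be skipped when iterating over a directory listing.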

In the Consolidation directory, the "[1] Boutique - 4 Nodes Data at O(0)" folder contains the O0 datasets, collected when the in-production application, Boutique, is the only application running in our environment on a 4-node cluster. The "[4] Boutique - LAS Data (3 Nodes) at O(t)" folder contains the Ot datasets, collected when the LAS is deployed alongside Boutique on a 4-node cluster and induces load representative of a 3-node cluster. The "[5] Boutique - LAS Data (2 Nodes) at O(t)" folder contains the Ot datasets, collected when the LAS is deployed alongside Boutique on a 4-node cluster and induces load representative of a 2-node cluster.

The "[2] Boutique - 3 Nodes Data at O(t)" folder contains the dataset in which Boutique is deployed on a 3-node cluster; its measured response time is compared with the response time predicted by our LAS-based model. The "[3] Boutique - 2 Nodes Data at O(t)" folder contains the dataset in which Boutique is deployed on a 2-node cluster; its measured response time is likewise compared with the response time predicted by our LAS-based model. An additional folder contains the raw data before pre-processing.
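The comparison between measured and predicted response times can be summarized with an error metric. The sketch below computes the mean absolute percentage error (MAPE), chosen here only for illustration; this page does not state which metric the authors used. The file format and column names are also not stated, so the keys `measured_rt` and `predicted_rt` are assumptions to be replaced with the actual column names.

```python
from typing import Iterable, Mapping


def mean_absolute_percentage_error(
    rows: Iterable[Mapping[str, str]],
    measured_key: str = "measured_rt",    # assumed column name
    predicted_key: str = "predicted_rt",  # assumed column name
) -> float:
    """Return the MAPE, in percent, of predicted versus measured values."""
    errors = []
    for row in rows:
        measured = float(row[measured_key])
        predicted = float(row[predicted_key])
        if measured == 0:
            # The percentage error is undefined for a zero measurement; skip it.
            continue
        errors.append(abs(measured - predicted) / measured)
    if not errors:
        raise ValueError("no rows with a non-zero measured value")
    return 100.0 * sum(errors) / len(errors)
```

If the files are CSV, the rows can be supplied by `csv.DictReader` from the Python standard library, for example `mean_absolute_percentage_error(csv.DictReader(open(path, newline="")))`, where `path` points to one of the comparison files.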