CGCSDD: Cloud Gaming Client-Server Delay Dataset
This dataset contains client-server round-trip time (RTT) delays measured during an actual cloud gaming tournament run on the infrastructure of the cloud gaming company Swarmio Inc. It can be used for designing algorithms and tuning models for user-server allocation and server selection. To collect the data, tournament players were connected to Swarmio servers, and delay measurements were taken in real time under actual networking conditions. The dataset consists of two subsets. The main dataset contains the network delays between each of 189 players around the world and each of 9 Swarmio servers. The secondary dataset contains the delays between each of 67 players and each of 11 servers around the world. As an example demonstration, we use the dataset to test and report the results of our player-server fair allocation algorithm.
For the main dataset, the 189 players and the 9 servers were distributed among 4 regions: North America, South America, Europe, and East Asia. The 9 servers were at the following locations, listed with the acronyms used in the dataset:
- Santa Clara (nasc),
- Chicago (nach),
- Dallas (nada),
- Toronto (nato),
- Brazil (sabr),
- London (uk),
- Amsterdam (nl),
- Hong Kong (hk),
- Singapore (sg).
Each of the 189 players was able to connect to each of the 9 servers. The following data is recorded for each player:
- User Identifier (in the field: user_id)
- Time of access (in the field: timestamp)
- Longitude (in the field: longitude)
- Latitude (in the field: latitude)
- IP Address (in the field: address)
- Autonomous System organization, i.e., the network operator or Internet Service Provider (in the field: asn_org)
In the dataset file main-dataset.json, every record contains the network delay measurements from a particular player to each of the 9 servers. The URLs and the IP addresses of the servers are provided in a separate file, main-dataset-servers.json.
The user ID is a unique identifier generated for each player, written as 32 hexadecimal characters in five hyphen-separated groups; for example, 5193b0e1-2412-4338-ac8d-6f519049aa77. The time of access is a Unix timestamp counted from January 1, 1970; the example value 1528484445170 has 13 digits, which corresponds to milliseconds rather than seconds. Longitude and latitude are based on the geo-location of the player; for example, "longitude": "121.0409", "latitude": "14.5832". The asn_org field holds the organization of the ISP network in which the player is registered; for example, Rogers Communications Canada Inc, Philippine Long Distance Telephone Company, AT&T Services Inc., etc.
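The field formats described above can be checked with the Python standard library. The sketch below parses the example user ID as a UUID and converts the example timestamp, assuming (from its 13-digit length) that it is expressed in milliseconds. The helper name `parse_timestamp_ms` is illustrative and not part of the dataset tooling.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_timestamp_ms(value):
    """Convert a Unix timestamp in milliseconds (assumed unit) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(value))

# Example values quoted in the dataset description.
user_id = uuid.UUID("5193b0e1-2412-4338-ac8d-6f519049aa77")
accessed = parse_timestamp_ms(1528484445170)

# Coordinates are stored as strings in the example, so convert them to floats.
longitude, latitude = float("121.0409"), float("14.5832")

print(len(user_id.hex), len(str(user_id)))  # hex digits without and with hyphens
print(accessed.isoformat())
```

Interpreting the same value as seconds would give a date far in the future, which is why the millisecond reading is assumed here.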
Each measurement consisted of sending 11 packets from the player to the server, from which the following values were obtained (all in ms):
- Median latency/delay (in the field: latency)
- Delay jitter (in the field: jitter)
- Minimum obtained delay (in the field: min)
- Maximum obtained delay (in the field: max)
- Average obtained delay (in the field: avr)
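As a rough illustration of how these five fields relate to the 11 round-trip samples of one measurement, the sketch below computes them with Python's `statistics` module. The sample values are invented, and the jitter definition used here (mean absolute difference between consecutive samples) is an assumption, since the description does not state which definition was used to produce the dataset.

```python
from statistics import mean, median

def summarize(rtts_ms):
    """Summarize one measurement (a list of RTT samples in ms)."""
    # Assumed jitter definition: mean absolute difference of consecutive samples.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "latency": median(rtts_ms),            # median delay
        "jitter": mean(diffs) if diffs else 0.0,
        "min": min(rtts_ms),
        "max": max(rtts_ms),
        "avr": mean(rtts_ms),                  # average delay
    }

# Eleven invented RTT samples, one per packet.
samples = [42.0, 40.0, 41.0, 45.0, 40.0, 43.0, 41.0, 44.0, 40.0, 42.0, 44.0]
stats = summarize(samples)
print(stats)
```

The median is less sensitive to a single delayed packet than the average, which may be why it is the value stored in the latency field.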
Note that, of the 9 servers, only the first server (“nl”) was used for testing the connection, which is indicated by the field “testing” having the value “1”. Consequently, the value of “stats” for this server contains no measurements.
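When processing a record, the connection-test server therefore has to be skipped. The sketch below shows one way to do this. The record layout is hypothetical: the keys "servers" and "name", the nesting, the "0" value for non-test servers, and all numeric values are assumptions made for illustration. Only the field names testing, stats, latency, jitter, min, max, and avr come from the description above, so the key names should be adjusted to match main-dataset.json.

```python
def usable_measurements(record):
    """Map server name -> stats, skipping the connection-test server."""
    usable = {}
    for entry in record["servers"]:            # "servers" key is assumed
        if str(entry.get("testing")) == "1":   # flag described in the text
            continue
        stats = entry.get("stats")
        if stats:                              # ignore empty or missing stats
            usable[entry["name"]] = stats      # "name" key is assumed
    return usable

# Hypothetical record with invented values, for illustration only.
record = {
    "user_id": "5193b0e1-2412-4338-ac8d-6f519049aa77",
    "servers": [
        {"name": "nl", "testing": "1", "stats": {}},
        {"name": "uk", "testing": "0",
         "stats": {"latency": 31.0, "jitter": 1.2, "min": 29.0, "max": 36.0, "avr": 31.4}},
        {"name": "sg", "testing": "0",
         "stats": {"latency": 182.0, "jitter": 4.5, "min": 176.0, "max": 195.0, "avr": 183.1}},
    ],
}

measured = usable_measurements(record)
closest = min(measured, key=lambda name: measured[name]["latency"])
print(closest)
```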
For the secondary dataset, we set up 11 different servers: 1 server owned by Swarmio Media in Toronto and 10 servers using the AWS cloud in the following locations:
- North Virginia,
- Northern California,
- Sydney, AU
The same script used for the main dataset was run in the Swarmio client software of the 67 players. This time, each server sent 8 packets to each player, and only the average delay was recorded and stored.
The secondary dataset consists of the JSON file secondary-dataset.json, where the keys are the names of the servers and the values contain a list of the delays to the 67 players. The players' IP addresses are provided, in the same order, in a separate file, secondary-dataset-users.json. The code that was used to retrieve the measurements can be reused from the file HostsUsersRTT.py. The IP addresses of the 11 servers are available in the file secondary-dataset-servers.json, where the key of each record is the name of the server (for example, “N Virginia”) and the value is the IP address of the server.
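Given the layout described above (server names as keys, one delay per player in a fixed order), the lowest-delay server for each player can be found with a simple scan. The following is a minimal sketch under stated assumptions: the delay values are invented, the IP addresses are placeholders from the documentation address range, the server names other than “N Virginia” are guesses, and the users file is assumed to be a plain list. In practice both structures would be read with `json.load` from secondary-dataset.json and secondary-dataset-users.json. This is plain nearest-server selection, not the fair allocation algorithm mentioned in the introduction.

```python
def lowest_delay_server(delays_by_server, player_ips):
    """Return {player_ip: server_name} choosing the smallest delay per player."""
    best = {}
    for index, ip in enumerate(player_ips):
        # Compare the delay at this player's position across all servers.
        best[ip] = min(delays_by_server,
                       key=lambda name: delays_by_server[name][index])
    return best

# Invented sample shaped like the description: server name -> delays per player.
delays_by_server = {
    "Toronto": [12.0, 95.0, 180.0],
    "N Virginia": [25.0, 80.0, 200.0],
    "Sydney": [210.0, 240.0, 35.0],
}
# Placeholder player IPs, in the same order as the delay lists.
player_ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]

assignment = lowest_delay_server(delays_by_server, player_ips)
for ip, server in assignment.items():
    print(ip, "->", server)
```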
In contrast to the main dataset, which also includes information such as the geo-location and the ISP, the secondary dataset contains only the delays between the servers and the players. This makes the secondary dataset more suitable for testing and verification, since it has a single label with only 2 features (IP addresses and city names), while the main dataset contains more features and measurements suitable for training and inference.