A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Citation Author(s):
Zhen Zhang
Submitted by:
Zhen Zhang
Last updated:
Thu, 11/08/2018 - 10:34
DOI:
10.21227/fpkq-za03
Data Format:
License:

Abstract 

The dataset contains the simulation results on two stochastic games -- box pushing and the distributed sensor network (DSN). The parameter settings are given in the manuscript titled "A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents".

Instructions: 

There are two folders -- box pushing and DSN -- in the data folder. The simulation results for each task are in the corresponding folder.

 

There are four folders -- EMA, PMR-EGA, single-agent RL and WoLF-PHC -- under each of the above two folders, corresponding to the results of the four algorithms.

 

An example:

Box pushing:  PMR-EGA:

 

Within the folder data/box pushing/PMR-EGA, the folders named 10w, 20w, 40w, 80w and 120w store the results learned by PMR-EGA after 10w, 20w, 40w, 80w and 120w episodes respectively (the "w" suffix appears to denote 10,000, so 10w = 100,000 episodes). EMA, single-agent RL and WoLF-PHC have similar folders of their own.
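
For reference, here is a minimal Python sketch of enumerating these checkpoint folders (the task, algorithm and folder names are taken from the description above; DATA_ROOT is a placeholder for your local copy of the data folder):

    import os

    DATA_ROOT = "data"                   # placeholder: path of your local data folder
    TASK = "box pushing"                 # or "DSN"
    ALGORITHM = "PMR-EGA"                # or "EMA", "single-agent RL", "WoLF-PHC"
    CHECKPOINTS = ["10w", "20w", "40w", "80w", "120w"]

    for checkpoint in CHECKPOINTS:
        folder = os.path.join(DATA_ROOT, TASK, ALGORITHM, checkpoint)
        print(folder, "exists:", os.path.isdir(folder))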

 

Under the path data/box pushing/PMR-EGA/10w, there are the following files (a loading sketch follows the list):

 

all_avr_step.txt  -- the average number of steps during evaluation, which is shown in Table I of the paper.

averageReward.dat -- the sliding average reward during learning

averageStep.dat -- the sliding average steps during learning

avr_successRate.dat -- the average success rate during evaluation, which is shown in Table II of the paper.

avr_successTimes.dat -- the average success times during evaluation. 

completeRecord.dat -- for debugging

Q_single_Sarsa.dat -- the Q-table used to evaluate the gradient. Each row represents a state, and each column represents the Q-value of a joint action under that state.

QTable_agent1.dat -- the Q-table of agent 1, which is used as the strategy of agent 1. Each row represents a state, and each column represents the Q-value of its own action under that state.

QTable_agent2.dat -- the Q-table of agent 2, which is used as the strategy of agent 2.

QTable_agent3.dat -- the Q-table of agent 3, which is used as the strategy of agent 3.

QTable_agent4.dat -- the Q-table of agent 4, which is used as the strategy of agent 4.

successRate.dat -- Each row represents the average success rate during evaluation for each run.

successTimes.dat -- Each row represents the average success times during evaluation for each run.

updateQTime.dat --for debugging
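
Here is a minimal loading sketch for these files, assuming the .dat files are plain whitespace-delimited numeric text (if they are stored in a binary format instead, numpy.loadtxt will fail and another reader is needed):

    import numpy as np

    base = "data/box pushing/PMR-EGA/10w"

    # Sliding averages recorded during learning (one value per logging point).
    avg_reward = np.loadtxt(base + "/averageReward.dat")

    # Q-table of agent 1: rows are states, columns are agent 1's own actions.
    q_agent1 = np.loadtxt(base + "/QTable_agent1.dat")

    # The greedy strategy of agent 1: in each state, take the action
    # with the largest Q-value.
    greedy_actions = q_agent1.argmax(axis=1)
    print("states:", q_agent1.shape[0], "actions per state:", q_agent1.shape[1])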

 

The above files in the other folders within box pushing have the same meaning, except that for single-agent RL the following file has a different meaning:

Q_single_Sarsa.dat -- the Q-table of the joint actions, which is used as the joint strategy of all agents. Each row represents a state, and each column represents the Q-value of a joint action under that state.
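
As a sketch of how such a joint-action Q-table could be decoded into per-agent actions, assuming the columns enumerate joint actions as a flattened tuple of per-agent actions (N_ACTIONS below is hypothetical -- the actual action-space size is given in the paper, not in this description):

    import numpy as np

    N_AGENTS = 4     # box pushing has four agents (QTable_agent1..4 above)
    N_ACTIONS = 5    # hypothetical per-agent action count; see the paper

    q_joint = np.loadtxt("data/box pushing/single-agent RL/10w/Q_single_Sarsa.dat")
    best_joint = q_joint.argmax(axis=1)   # best joint-action index per state

    # Unravel each flat joint-action index into one action index per agent.
    per_agent = np.unravel_index(best_joint, (N_ACTIONS,) * N_AGENTS)
    print("agent-wise actions in state 0:", [int(a[0]) for a in per_agent])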

 

 

An example:

DSN:  PMR-EGA:

Within the folder data/DSN/PMR-EGA, the folders named 20w, 30w and 40w store the results learned by PMR-EGA after 20w, 30w and 40w episodes respectively. EMA, single-agent RL and WoLF-PHC have similar folders of their own.

 

Under the path data/DSN/PMR-EGA/20w, there are the following files:

 

all_avr_reward.dat --  the average cumulative reward during evaluation, which is shown in Table IV of the paper.

all_avr_step.dat  -- the average number of steps during evaluation, which is shown in Table V of the paper.

all_avr_successRate.dat -- the average success rate during evaluation, which is shown in Table III of the paper.

averageReward.dat -- the sliding average reward during learning

averageStep.dat -- the sliding average steps during learning

avr_reward.dat -- Each row represents the average cumulative reward during evaluation for each run. 

avr_step.dat -- Each row represents the average steps during evaluation for each run. 

completeRecord.dat -- for debugging

QTable_agent0.dat -- the Q-table of agent 0, which is used as the strategy of agent 0. Each row represents a state, and each column represents the Q-value of its own action under that state.

QTable_agent1.dat -- the Q-table of agent 1, which is used as the strategy of agent 1.

QTable_agent2.dat -- the Q-table of agent 2, which is used as the strategy of agent 2.

QTable_agent3.dat -- the Q-table of agent 3, which is used as the strategy of agent 3.

QTable_agent4.dat -- the Q-table of agent 4, which is used as the strategy of agent 4.

QTable_agent5.dat -- the Q-table of agent 5, which is used as the strategy of agent 5.

QTable_agent6.dat -- the Q-table of agent 6, which is used as the strategy of agent 6.

QTable_agent7.dat -- the Q-table of agent 7, which is used as the strategy of agent 7.

successRate.dat -- Each row represents the average success rate during evaluation for each run (see the aggregation sketch after this list).

successTimes.dat -- Each row represents the average success times during evaluation for each run.

 

 

The above files in the other folders within DSN have the same meaning.
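
A minimal aggregation sketch for the per-run files listed above, under the same assumption as before (plain whitespace-delimited text, one row per run):

    import numpy as np

    # ndmin=2 keeps a single-run file as a 2-D array instead of a 1-D one.
    rates = np.loadtxt("data/DSN/PMR-EGA/20w/successRate.dat", ndmin=2)

    per_run = rates.mean(axis=1)     # average success rate of each run
    print("runs:", per_run.size)
    print("mean over runs:", per_run.mean(), "std over runs:", per_run.std())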

 

 

Documentation

Attachment: The detail of dataset (4.39 KB)