A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Citation Author(s):
Zhen Zhang
Submitted by:
Zhen Zhang
Last updated:
Thu, 11/08/2018 - 10:34
DOI:
10.21227/fpkq-za03

Abstract 


The dataset contains the simulation results for two stochastic games -- box pushing and the distributed sensor network (DSN). The parameter settings are given in the manuscript "A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents".

Instructions: 

There are two folders -- box pushing and DSN -- in the data folder. The simulation results for each task are in the corresponding folder.

 

Each of these two folders contains four subfolders -- EMA, PMR-EGA, single-agent RL and WoLF-PHC -- holding the results of the four algorithms.

 

An example:

Box pushing:  PMR-EGA:

 

Within the folder data/box pushing/PMR-EGA, the folders named 10w, 20w, 40w, 80w and 120w store the results obtained by PMR-EGA after 10w, 20w, 40w, 80w and 120w learning episodes respectively. The EMA, single-agent RL and WoLF-PHC folders contain similarly named folders.
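The folder layout described above can be sketched in Python as follows. This is only an illustration: the root location `data`, the helper names `result_dir` and `box_pushing_dirs`, and the assumption that every algorithm folder uses exactly the same checkpoint names as PMR-EGA are not confirmed by the dataset description.

```python
from pathlib import PurePosixPath

# Assumed root location; adjust to wherever the dataset was unpacked.
DATA_ROOT = PurePosixPath("data")

# Folder names as listed in the instructions.
ALGORITHMS = ["EMA", "PMR-EGA", "single-agent RL", "WoLF-PHC"]

# Checkpoint folders listed for box pushing / PMR-EGA. Reusing them for the
# other algorithms is an assumption ("similar folders" in the text).
BOX_PUSHING_CHECKPOINTS = ["10w", "20w", "40w", "80w", "120w"]


def result_dir(task, algorithm, checkpoint):
    """Build the folder path for one task / algorithm / checkpoint."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm folder: {algorithm}")
    return DATA_ROOT / task / algorithm / checkpoint


def box_pushing_dirs():
    """List every box pushing result folder, one per algorithm and checkpoint."""
    return [
        result_dir("box pushing", algo, ckpt)
        for algo in ALGORITHMS
        for ckpt in BOX_PUSHING_CHECKPOINTS
    ]


print(result_dir("box pushing", "PMR-EGA", "10w"))
```

Calling `box_pushing_dirs()` gives one path per algorithm and checkpoint, which can be handy for looping over all results of a task.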

 

The folder data/box pushing/PMR-EGA/10w contains the following files:

 

all_avr_step.txt -- the average number of steps during evaluation, which is shown in Table I of the paper.

averageReward.dat -- the sliding average reward during learning

averageStep.dat -- the sliding average steps during learning

avr_successRate.dat -- the average success rate during evaluation. It is shown in Table II of the paper.

avr_successTimes.dat -- the average success times during evaluation. 

completeRecord.dat -- for debugging

Q_single_Sarsa.dat -- the Q-table used to evaluate the gradient. Each row represents a state, and each column represents the Q-value of a joint action in that state.

QTable_agent1.dat -- the Q-table of agent 1, which is used as the strategy of agent 1. Each row represents a state, and each column represents the Q-value of one of the agent's own actions in that state.

QTable_agent2.dat -- the Q-table of agent 2, which is used as the strategy of agent 2.

QTable_agent3.dat -- the Q-table of agent 3, which is used as the strategy of agent 3.

QTable_agent4.dat -- the Q-table of agent 4, which is used as the strategy of agent 4.

successRate.dat -- Each row represents the average success rate during evaluation for each run.

successTimes.dat -- Each row represents the average success times during evaluation for each run.

updateQTime.dat -- for debugging

 

The files in the other folders within box pushing have the same meaning, except that for single-agent RL the following file has a different meaning:

Q_single_Sarsa.dat -- the Q-table of the joint actions, which is used as the joint strategy of all agents. Each row represents a state, and each column represents the Q-value of a joint action in that state.
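As an illustration of the row-per-state, column-per-action layout described for the Q-table files, the sketch below parses such a table and picks the highest-valued column for each state. Several things here are assumptions, not facts from the dataset: that the .dat files are plain text with whitespace- or comma-separated numbers, that the strategy is read off greedily, and the helper names `parse_q_table` and `greedy_actions`.

```python
def parse_q_table(text):
    """Parse a Q-table: one row per state, one column per action.

    Assumes plain text with whitespace- or comma-separated numbers;
    the actual .dat encoding is not stated in the instructions.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([float(v) for v in line.replace(",", " ").split()])
    return rows


def greedy_actions(q_table):
    """Return, for each state (row), the column index with the largest Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


# Made-up values for illustration only, not taken from the dataset.
sample = "0.1 0.5 0.2\n0.9 0.3 0.4\n"
q = parse_q_table(sample)
print(greedy_actions(q))  # [1, 0]
```

For a per-agent file such as QTable_agent1.dat, the column index would refer to that agent's own action; for Q_single_Sarsa.dat it would refer to a joint action, whose mapping to individual actions is not described here.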

 

 

An example:

DSN:  PMR-EGA:

Within the folder data/DSN/PMR-EGA, the folders named 20w, 30w and 40w store the results obtained by PMR-EGA after 20w, 30w and 40w learning episodes respectively. The EMA, single-agent RL and WoLF-PHC folders contain similarly named folders.

 

The folder data/DSN/PMR-EGA/20w contains the following files:

 

all_avr_reward.dat --  the average cumulative reward during evaluation, which is shown in Table IV of the paper.

all_avr_step.dat -- the average number of steps during evaluation, which is shown in Table V of the paper.

all_avr_successRate.dat -- the average success rate during evaluation, which is shown in Table III of the paper.

averageReward.dat -- the sliding average reward during learning

averageStep.dat -- the sliding average steps during learning

avr_reward.dat -- Each row represents the average cumulative reward during evaluation for each run. 

avr_step.dat -- Each row represents the average steps during evaluation for each run. 

completeRecord.dat -- for debugging

QTable_agent0.dat -- the Q-table of agent 0, which is used as the strategy of agent 0. Each row represents a state, and each column represents the Q-value of one of the agent's own actions in that state.

QTable_agent1.dat -- the Q-table of agent 1, which is used as the strategy of agent 1.

QTable_agent2.dat -- the Q-table of agent 2, which is used as the strategy of agent 2.

QTable_agent3.dat -- the Q-table of agent 3, which is used as the strategy of agent 3.

QTable_agent4.dat -- the Q-table of agent 4, which is used as the strategy of agent 4.

QTable_agent5.dat -- the Q-table of agent 5, which is used as the strategy of agent 5.

QTable_agent6.dat -- the Q-table of agent 6, which is used as the strategy of agent 6.

QTable_agent7.dat -- the Q-table of agent 7, which is used as the strategy of agent 7.

successRate.dat -- Each row represents the average success rate during evaluation for each run.

successTimes.dat -- Each row represents the average success times during evaluation for each run.

 

 

The files in the other folders within DSN have the same meaning.
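The per-run files listed above (for example avr_reward.dat or successRate.dat, one row per run) can be summarized with a short script. The sketch below is hedged: it assumes plain text with one numeric value at the start of each row, and the helper names `parse_per_run` and `summarize` are hypothetical. Whether the mean computed this way matches the corresponding all_avr_* file exactly is also an assumption.

```python
from statistics import mean


def parse_per_run(text):
    """Parse a per-run file: the first numeric field of each non-empty row."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            values.append(float(fields[0]))
    return values


def summarize(values):
    """Return (mean, min, max) over all runs."""
    return mean(values), min(values), max(values)


# Made-up values for illustration only, not taken from the dataset.
sample = "10.0\n12.0\n14.0\n"
runs = parse_per_run(sample)
print(summarize(runs))  # (12.0, 10.0, 14.0)
```

With a real file, the text would come from reading, for instance, data/DSN/PMR-EGA/20w/avr_reward.dat, provided the file is stored as plain text.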

 

 
