cart-pole; expert demonstrations; human behavioral data; cognitive modeling; imitation learning

The instantaneous state (situation) of the game consists of four values: the cart position, the cart speed, the pole angle relative to the vertical axis, and the pole angular velocity.

For each action taken by the human player, the four values describing the current game situation are recorded together with the action and the reward obtained (utility) as a situation-decision-utility (SDU) tuple.

Three types of actions were recorded: move left (-1), move right (1), and no action (0).
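
As an illustration of how one SDU record might be represented when loading the data, the sketch below models the four situation values, the action code, and the utility in Python. The names `Action`, `SDU`, and `parse_sdu`, the field names, and the assumed field order (four state values, then action, then utility) are hypothetical and are not taken from the dataset files; adjust them to the actual file layout.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    """Action codes as described above: left (-1), none (0), right (1)."""
    LEFT = -1
    NONE = 0
    RIGHT = 1


@dataclass(frozen=True)
class SDU:
    """One situation-decision-utility record (field names are assumptions)."""
    cart_position: float
    cart_speed: float
    pole_angle: float
    pole_angular_velocity: float
    action: Action
    utility: float

    @property
    def situation(self):
        """The four state values as a tuple."""
        return (
            self.cart_position,
            self.cart_speed,
            self.pole_angle,
            self.pole_angular_velocity,
        )


def parse_sdu(values):
    """Build an SDU from six raw values.

    Assumed order (hypothetical): four state values, action, utility.
    Raises ValueError for a wrong field count or an unknown action code.
    """
    if len(values) != 6:
        raise ValueError(f"expected 6 fields, got {len(values)}")
    *state, action, utility = values
    # int(float(...)) accepts both "-1" and "-1.0" style action codes.
    return SDU(*(float(v) for v in state), Action(int(float(action))), float(utility))
```

For example, `parse_sdu(["0.1", "-0.5", "0.02", "0.3", "-1", "1.0"])` would yield a record whose `action` is `Action.LEFT`. Using an enum for the action means an unexpected code (anything other than -1, 0, or 1) is rejected at load time rather than passed silently into a model.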
