Our video action dataset is generated using a 3D simulation program developed in Unity. Each data sample consists of a video capturing a human performing various actions. Our initial set of actions comprises a total of 10 different yoga poses: camel, chair, child's pose, lord of the dance, lotus, thunderbolt, triangle, upward dog, warrior II, and warrior III. Within each of these 10 yoga poses, there are four variations, some exhibiting more pronounced differences than others. This results in a total of 40 action types within our dataset.