A composite dataset of eight grayscale, sagittal-plane videos that together cover the pronunciation of seventeen words, spoken with intervals. It is intended for experiments in computer vision, video processing, and the study of vocal tract articulation.