The performance of most classification models depends on the data used for training. The data must be reliable, robust, and meticulously labelled. To build such a data set, a systematic approach was designed. The data set was collected from a well-known source, the Center for Language Engineering, available at http://www.cle.org.pk.