A dataset from semiconductor assembly and testing processes is used to evaluate the model selection prediction method. The response variable is the throughput rate, based on historical data, of a specific machine–product combination in one of the assembly and testing process steps. The dataset contains 13,186 observations, each comprising 1 response variable, 5 categorical machine and product attributes, and 11 numerical attributes.