Artificial Intelligence (AI) has an increasing influence on modern society, most recently through significant advances in Large Language Models (LLMs). However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a smaller language model (the student) to reproduce the behavior of a larger one (the teacher). Previous research has introduced several distillation methods, both for generating training data and for training the student model.
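As a rough illustration of the teacher-student idea, the sketch below shows a standard temperature-scaled KL-divergence distillation loss in PyTorch; the temperature value, tensor shapes, and function name are illustrative assumptions, not the specific method discussed in this work.

```python
# Illustrative sketch only: a generic soft-label distillation loss
# (temperature-scaled KL divergence between teacher and student outputs).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Example: a batch of 4 positions over an (assumed) vocabulary of 32 tokens.
student_logits = torch.randn(4, 32, requires_grad=True)
teacher_logits = torch.randn(4, 32)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```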