LLM optimization for the "Santa 2024" Kaggle competition

The Challenge

The Santa 2024 Kaggle competition ("The Perplexity Permutation Puzzle") asks participants to reorder the words of scrambled Christmas-themed passages so that the result has the lowest possible perplexity under a fixed scoring model. My goal was not to reach the highest possible leaderboard score, but to build and compare different LLM-based systems, see what score an LLM can reach on this task, and study their differences.

My Solution

I produced seven notebooks in which I compare different LLM systems and models:

  1. Exploring Claude Haiku and Claude Sonnet in single-call and agentic architectures (sketch below)

  2. Comparing perplexity scores of the baseline order vs. a random word order (sketch below)

  3. Optimizing prompts for Claude Haiku with the DSPy framework (sketch below)

  4. Comparing the DeepSeek-R1-Distill-Llama-8B and Llama-8B models

  5. Generating synthetic finetuning data with LLMs (sketch below)

  6. Finetuning DeepSeek-R1-Distill-Llama-8B on Google Colab (sketch below)

  7. Evaluating my finetuned version of DeepSeek-R1-Distill-Llama-8B
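
For the single-call architecture in notebook 1, the core is one chat completion that asks the model to reorder the words; the agentic variants wrap a call like this in a propose-and-retry loop. Below is a minimal sketch using the anthropic SDK; the prompt and the reorder_words helper are illustrative, not the notebook's actual code.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def reorder_words(scrambled: str, model: str = "claude-3-haiku-20240307") -> str:
    """Ask the model for a natural-sounding ordering of the given words."""
    response = client.messages.create(
        model=model,
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": (
                "Rearrange these words into the most natural English text, "
                f"using every word exactly once: {scrambled}\n"
                "Reply with the reordered words only."
            ),
        }],
    )
    return response.content[0].text.strip()

print(reorder_words("sky the flew reindeer the across"))
```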
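
Notebook 2's baseline-vs-random comparison only needs a perplexity function, since Santa 2024 scores submissions by the perplexity of the reordered text under Gemma 2 9B. A sketch with transformers follows; the example word list is illustrative, and a smaller causal LM can stand in for Gemma if the 9B model does not fit in memory.

```python
import math
import random

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # the competition's scoring model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """exp(mean token cross-entropy) of the text under the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    loss = model(ids, labels=ids).loss  # HF shifts labels internally
    return math.exp(loss.item())

baseline = "the reindeer flew across the sky above the snowy village"
words = baseline.split()
random.shuffle(words)

print("baseline:    ", perplexity(baseline))
print("random order:", perplexity(" ".join(words)))
```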
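
For notebook 3, DSPy optimizes the prompt by bootstrapping few-shot demonstrations that pass a metric. The sketch below is hedged: the signature fields, the permutation-check metric, and the tiny trainset are my own illustrations, not the notebook's actual setup.

```python
import dspy

dspy.configure(lm=dspy.LM("anthropic/claude-3-haiku-20240307"))

class ReorderWords(dspy.Signature):
    """Rearrange the scrambled words into the most natural text, using every word exactly once."""
    scrambled: str = dspy.InputField()
    ordered: str = dspy.OutputField()

program = dspy.ChainOfThought(ReorderWords)

def is_permutation(example, prediction, trace=None):
    # Cheap proxy metric: the output must use exactly the input words.
    return sorted(example.scrambled.split()) == sorted(prediction.ordered.split())

trainset = [
    dspy.Example(
        scrambled="sky the flew reindeer the across",
        ordered="the reindeer flew across the sky",
    ).with_inputs("scrambled"),
    # ... more examples
]

optimizer = dspy.BootstrapFewShot(metric=is_permutation)
optimized = optimizer.compile(program, trainset=trainset)
print(optimized(scrambled="mistletoe the under stood elf the").ordered)
```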
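
For notebook 5, one plausible recipe, and the assumption behind this sketch, is to have an LLM write themed sentences and then shuffle their words, yielding (scrambled -> original) pairs for finetuning. The prompt and the output file name are illustrative.

```python
import json
import random

import anthropic

client = anthropic.Anthropic()

def generate_sentences(n: int = 20) -> list[str]:
    """Ask an LLM for short themed sentences, one per line."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Write {n} short Christmas-themed sentences, one per line.",
        }],
    )
    return [line.strip() for line in response.content[0].text.splitlines() if line.strip()]

with open("finetune_data.jsonl", "w") as f:
    for sentence in generate_sentences():
        words = sentence.split()
        random.shuffle(words)  # the scrambled input; the sentence is the target
        f.write(json.dumps({"prompt": " ".join(words), "completion": sentence}) + "\n")
```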
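
For notebook 6, a LoRA adapter with peft and trl keeps the 8B model trainable on a single Colab GPU. All hyperparameters, the prompt template, and the data file below are illustrative assumptions; the notebook's actual settings may differ.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="finetune_data.jsonl", split="train")

def to_text(example):
    # Fold each (scrambled -> ordered) pair into one training string.
    return {"text": f"Reorder: {example['prompt']}\nAnswer: {example['completion']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="r1-distill-reorder",
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("r1-distill-reorder")
```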

The Outcome

The best-performing system was the finetuned DeepSeek-R1-Distill-Llama-8B, which scored better than a large commercial LLM like Claude Sonnet.

My Kaggle notebook on prompt optimization with DSPy earned a bronze medal.