Aladin Djuhera
Biography
I am a passionate PhD student at TUM, specializing in advanced machine and deep learning. My academic research interests lie in large language model post-training, hardware-efficient AI, multi-task model fine-tuning, and reasoning. During my time at Audi, BMW, and IBM, I gained significant software engineering experience, as well as research experience in distributed AI, in particular federated learning and model fusion. With IBM, I have filed over 11 patents in the areas of distributed AI systems, hardware-efficient and privacy-preserving distributed inference, and federated foundation model training.
Research Interest
Advances in LLM Post-Training: The Role of High-Quality Data Curation Recipes
Abstract
Post-training is central to aligning large language models (LLMs) with human preferences, spanning supervised fine-tuning (SFT) and preference optimization. Recent studies show that, beyond algorithms, the composition and quality of datasets determine how well models generalize across reasoning, math, coding, and instruction-following tasks. In this talk, I will highlight systematic approaches to dataset analysis and curation, including quality annotation frameworks, reward-based filtering, and task-aware balancing. I will discuss how principled curation recipes enabled us to produce leaner mixtures that outperform state-of-the-art open mixtures, while reducing compute cost. The talk concludes with open challenges and opportunities for transparent, reproducible, and data-centric LLM alignment.
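As a rough illustration of the two curation steps named above, the sketch below chains reward-based filtering with task-aware balancing over a toy SFT dataset. The reward_score stub, the threshold, and the per-task cap are illustrative assumptions, not the actual recipe from the talk; in practice the scorer would be a trained reward model or LLM judge.

```python
import random
from collections import defaultdict

# Toy stand-in for a learned reward model; in practice this would be
# a trained scorer (e.g., a preference reward model or an LLM judge).
def reward_score(example: dict) -> float:
    return random.random()  # placeholder: uniform scores for illustration

def curate(dataset, reward_threshold=0.7, per_task_cap=2):
    """Reward-based filtering followed by task-aware balancing."""
    # 1) Reward-based filtering: keep only examples the scorer rates highly.
    kept = [ex for ex in dataset if reward_score(ex) >= reward_threshold]

    # 2) Task-aware balancing: cap each task bucket so no single task
    #    (e.g., code vs. math vs. chat) dominates the final mixture.
    buckets = defaultdict(list)
    for ex in kept:
        buckets[ex["task"]].append(ex)
    balanced = []
    for task, examples in buckets.items():
        balanced.extend(random.sample(examples, min(len(examples), per_task_cap)))
    return balanced

if __name__ == "__main__":
    random.seed(0)
    toy_data = [
        {"task": "math", "text": "Solve 2x + 3 = 7."},
        {"task": "code", "text": "Write a function to reverse a string."},
        {"task": "chat", "text": "Summarize this paragraph."},
    ] * 5
    mixture = curate(toy_data)
    print(f"Curated {len(mixture)} of {len(toy_data)} examples")
```

The ordering matters: filtering first discards low-quality examples regardless of task, and balancing then reshapes the surviving pool, which is how a leaner mixture can still cover all target tasks.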