
Direct Preference Optimization Beyond Chatbots

Researchers extend Direct Preference Optimization (DPO) to multimodal settings, using it to align large multimodal models (LMMs) with human preferences on image generation and captioning tasks. They report results competitive with established fine-tuning methods. The work opens new avenues for applying preference-based alignment to a broader range of AI applications.

Hugging Face·Jun 3, 2026
