DPO is extended to novel multimodal scenarios. Researchers demonstrate DPO's effectiveness in aligning large multimodal models (LMMs) with human preferences on image generation and captioning tasks, showing competitive results against established fine-tuning methods. This work opens new avenues for applying preference-based alignment to a broader range of AI applications.
Opening Kapyn…