#01
multiple sources
research / multimodal / reasoning
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model Enhancement
arXiv:2412.01282v2 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities to multimodal tasks. Meanwhile, the great need for capable aritificial intel
10 sources
merged into one brief so readers can compare the reporting quickly.