🔥 Effective Training Data Synthesis for Improving MLLM Chart Understanding

ICCV 2025 (poster)

¹Australian National University   ²Ohio State University   ³Cisco   ⁴Johns Hopkins University
[Teaser figure]

✨ Abstract

The ability to read scientific plots effectively, i.e., chart understanding, is central to building capable agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, still fall short, with typical success rates of 30%–50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often limited by insufficient similarity to real charts, which can compromise model training and performance on complex real-world charts. In this study, we show that modularizing chart generation and diversifying visual details improve chart understanding capabilities. In particular, we design a five-step data synthesis pipeline: we separate data and function creation for single-plot generation, condition the generation of later subplots on earlier ones for multi-subplot figures, visually diversify the generated figures, filter out low-quality data, and finally generate question-answer (QA) pairs with GPT-4o. This approach streamlines the generation of fine-tuning datasets and yields the Effective Chart Dataset (ECD), which contains 10k+ chart images and 300k+ QA pairs, covering 25 topics and featuring 250+ chart-type combinations with high visual complexity. We show that ECD consistently improves the performance of various MLLMs on a range of real-world and synthetic test sets.
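The five steps above can be sketched in code. The function names and heuristics below are purely illustrative stand-ins, not the authors' actual implementation; the sketch only shows how the stages decompose (data creation separated from chart-type selection, later subplots conditioned on earlier ones, a diversification pass, and a quality filter before QA generation):

```python
import random

def create_data(topic, rng):
    """Step 1a (illustrative): create raw data independently of the plot function."""
    return {"topic": topic, "x": list(range(5)), "y": [rng.random() for _ in range(5)]}

def create_plot_spec(data, rng, prev_specs=()):
    """Steps 1b-2 (illustrative): pick a chart type, conditioning later
    subplots on the types already used by earlier ones."""
    used = {s["chart_type"] for s in prev_specs}
    choices = [c for c in ("line", "bar", "scatter") if c not in used] or ["line"]
    return {"chart_type": rng.choice(choices), "data": data}

def diversify(spec, rng):
    """Step 3 (illustrative): randomize visual details such as color and grid."""
    spec["style"] = {"color": rng.choice(["C0", "C1", "C2"]),
                     "grid": rng.random() < 0.5}
    return spec

def quality_filter(specs):
    """Step 4 (illustrative): drop low-quality figures; a real filter would
    score rendered images, here we just require enough data points."""
    return [s for s in specs if len(s["data"]["y"]) >= 3]

def synthesize_figure(topic, n_subplots=2, seed=0):
    """Compose steps 1-4 for one multi-subplot figure. Step 5 (QA generation
    with GPT-4o) would consume the surviving specs downstream."""
    rng = random.Random(seed)
    specs = []
    for _ in range(n_subplots):
        data = create_data(topic, rng)
        specs.append(diversify(create_plot_spec(data, rng, specs), rng))
    return quality_filter(specs)
```

Because subplot generation sees the specs produced so far, a two-subplot figure never repeats a chart type until all types are exhausted, mirroring the conditioning idea in step 2.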

🏆 Our ECDBench

Evaluating multimodal LLM performance on scientific charts with descriptive and reasoning QA pairs.

Overview

Model Performance Comparison (%)

| Model | Reasoning | Descriptive | Average |
| --- | --- | --- | --- |
| o4‑mini | 57.03 | 77.45 | 67.24 |
| o3 | 56.13 | 74.51 | 65.32 |
| Gemini‑2.5‑Pro | 44.36 | 76.88 | 60.62 |
| o1 | 40.52 | 74.18 | 57.35 |
| Claude-4-Sonnet | 44.20 | 69.36 | 56.78 |
| Claude-3.7-Sonnet | 43.38 | 69.61 | 56.50 |
| Claude-3.5-Sonnet | 41.99 | 68.14 | 55.07 |
| Qwen2.5‑VL‑72B | 38.81 | 68.46 | 53.64 |
| GPT‑4o | 35.62 | 70.18 | 52.90 |
| GPT‑4o‑mini | 24.26 | 57.27 | 40.77 |
| Qwen2.5‑VL‑32B | 24.92 | 53.92 | 39.42 |
| Qwen2.5‑VL‑7B | 19.04 | 57.35 | 38.19 |
| Random (GPT‑4o) | 4.58 | 1.63 | 3.10 |

Effects of ECD Supervised Fine‑Tuning

| Model | Reasoning (Before / After) | Descriptive (Before / After) | Average (Before / After) |
| --- | --- | --- | --- |
| LLaVA‑Next‑Llama3‑8B | 4.74 / 16.50 | 17.16 / 46.65 | 10.95 / 31.58 |
| MiniCPM‑V2.6 | 15.15 / 18.14 | 39.95 / 52.21 | 27.53 / 35.17 |
| Phi‑3‑Vision | 21.65 / 29.49 | 41.18 / 59.31 | 31.41 / 44.40 |
| Qwen2.5‑VL‑7B | 19.04 / 35.38 | 57.35 / 66.34 | 38.19 / 50.86 |

Citation

@inproceedings{yang2025effective,
  title={Effective Training Data Synthesis for Improving MLLM Chart Understanding},
  author={Yang, Yuwei and Zhang, Zeyu and Hou, Yunzhong and Li, Zhuowan and Liu, Gaowen and Payani, Ali and Ting, Yuan-Sen and Zheng, Liang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}