Effectively reading scientific plots, i.e., chart understanding, is central to building effective agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, still fall behind, with typical success rates of 30%-50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often limited by the charts' inadequate similarity to real ones, which can compromise model training and performance on complex real-world charts. In this study, we show that modularizing chart generation and diversifying visual details improve chart understanding capabilities. Specifically, we design a five-step data synthesis pipeline: we separate data and function creation for single-plot generation, condition the generation of later subplots on earlier ones for multi-subplot figures, visually diversify the generated figures, filter out low-quality data, and finally generate the question-answer (QA) pairs with GPT-4o. This approach allows us to streamline the generation of fine-tuning datasets and introduce the Effective Chart Dataset (ECD), which contains 10k+ chart images and 300k+ QA pairs, covering 25 topics and featuring 250+ chart type combinations with high visual complexity. We show that ECD consistently improves the performance of various MLLMs on a range of real-world and synthetic test sets.
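To make the modular design concrete, here is a minimal sketch of how the first two steps might look with a matplotlib-based renderer; the function names (`make_data`, `make_plot`) and the style dictionary are illustrative assumptions, not the actual ECD pipeline code.

```python
# Minimal, illustrative sketch of the modular single-plot step described above:
# data creation is kept separate from the plotting function, so either can be
# varied independently; a later subplot is conditioned on an earlier one by
# reusing (and perturbing) its data. Names such as make_data/make_plot are
# hypothetical and not the paper's actual pipeline code.
import numpy as np
import matplotlib.pyplot as plt


def make_data(topic, n=50, seed=0):
    """Synthesize the underlying data for one plot (here: a toy random walk)."""
    rng = np.random.default_rng(seed)
    x = np.arange(n)
    y = np.cumsum(rng.normal(size=n))
    return {"x": x, "y": y, "label": topic}


def make_plot(data, ax, style):
    """Render one subplot from pre-generated data; visual style is injected separately."""
    ax.plot(data["x"], data["y"],
            color=style.get("color", "tab:blue"),
            linestyle=style.get("linestyle", "-"))
    ax.set_title(data["label"])
    ax.set_xlabel("step")
    ax.set_ylabel("value")


fig, axes = plt.subplots(1, 2, figsize=(8, 3))
first = make_data("astronomy", seed=1)
make_plot(first, axes[0], {"color": "tab:orange"})
# Condition the second subplot on the first by transforming its data.
second = {**first, "y": first["y"] * 1.5, "label": "astronomy (scaled)"}
make_plot(second, axes[1], {"linestyle": "--"})
fig.savefig("ecd_example.png", dpi=150)
```

Keeping data and rendering decoupled in this way makes it straightforward to diversify visual details (step 3) by sweeping the style dictionary, and to filter low-quality figures (step 4) before QA generation (step 5).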
Evaluation of multimodal LLM performance on scientific charts, using both descriptive and reasoning QA (success rate, %).
| Model | Reasoning | Descriptive | Average |
|---|---|---|---|
| o4-mini | 57.03 | 77.45 | 67.24 |
| o3 | 56.13 | 74.51 | 65.32 |
| Gemini-2.5-Pro | 44.36 | 76.88 | 60.62 |
| o1 | 40.52 | 74.18 | 57.35 |
| Claude-4-Sonnet | 44.20 | 69.36 | 56.78 |
| Claude-3.7-Sonnet | 43.38 | 69.61 | 56.50 |
| Claude-3.5-Sonnet | 41.99 | 68.14 | 55.07 |
| Qwen2.5-VL-72B | 38.81 | 68.46 | 53.64 |
| GPT-4o | 35.62 | 70.18 | 52.90 |
| GPT-4o-mini | 24.26 | 57.27 | 40.77 |
| Qwen2.5-VL-32B | 24.92 | 53.92 | 39.42 |
| Qwen2.5-VL-7B | 19.04 | 57.35 | 38.19 |
| Random (GPT-4o) | 4.58 | 1.63 | 3.10 |
Open-source MLLM performance before and after fine-tuning on ECD.

| Model | Reasoning (Before / After) | Descriptive (Before / After) | Average (Before / After) |
|---|---|---|---|
| LLaVA-Next-Llama3-8B | 4.74 / 16.50 | 17.16 / 46.65 | 10.95 / 31.58 |
| MiniCPM-V2.6 | 15.15 / 18.14 | 39.95 / 52.21 | 27.53 / 35.17 |
| Phi-3-Vision | 21.65 / 29.49 | 41.18 / 59.31 | 31.41 / 44.40 |
| Qwen2.5-VL-7B | 19.04 / 35.38 | 57.35 / 66.34 | 38.19 / 50.86 |
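In both tables, the Average column is the arithmetic mean of the Reasoning and Descriptive scores; a small sanity check (the `average` helper is purely illustrative):

```python
# Average = (Reasoning + Descriptive) / 2, values taken from the tables above.
def average(reasoning, descriptive):
    return round((reasoning + descriptive) / 2, 2)

print(average(57.03, 77.45))  # 67.24 -> o4-mini
print(average(35.38, 66.34))  # 50.86 -> Qwen2.5-VL-7B after ECD fine-tuning
```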
@inproceedings{yang2025effective,
title={Effective Training Data Synthesis for Improving MLLM Chart Understanding},
author={Yang, Yuwei and Zhang, Zeyu and Hou, Yunzhong and Li, Zhuowan and Liu, Gaowen and Payani, Ali and Ting, Yuan-Sen and Zheng, Liang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2025}
}