Insights · Nov 16, 2025 · 3 min read

I Made a Mistake About TOON (And Here's the Data That Proved Me Wrong)

This article revisits my earlier take on TOON, presenting the data that corrected my misconceptions and offering a new perspective.

Updated Apr 1, 2026

I rushed into discussing a topic without conducting proper research.

A few months ago, I read that TOON was the new way to talk to AI: efficient, fast, and easy to read, all in one. Oh boy, was I naive to think that a single metric was enough reason to switch completely from my beloved JSON.

Now I feel I cheated, and I hope you'll forgive me. More importantly, I want to show you the data that changed my mind.

Why I Got Excited About TOON

If you haven't already, check my previous blog post, where I praised TOON as an amazing format for saving tokens (and money). The promise was that TOON uses 30-60% fewer tokens with the same or better accuracy.

That sounded like a no-brainer. Who wouldn't want that?
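To make the token-savings pitch concrete, here is a minimal sketch of my own (the records, field names, and TOON string are invented for illustration, not taken from either benchmark): the same three rows serialized as pretty-printed JSON and as TOON-style text, with character count as a rough proxy for token count.

```python
import json

# Three uniform records: the kind of flat tabular data TOON targets.
records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "editor"},
    {"id": 3, "name": "Carol", "role": "viewer"},
]

as_json = json.dumps(records, indent=2)

# Hand-written TOON-style encoding: the header declares the row count and
# field names once, then each row lists only the values.
as_toon = "\n".join(
    ["users[3]{id,name,role}:"]
    + ["  {id},{name},{role}".format(**r) for r in records]
)

# Character count as a crude stand-in for tokens.
print(len(as_json), len(as_toon))
```

The repeated keys and punctuation in JSON are exactly what TOON factors out, which is where the headline savings come from.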

The Question That Started My Doubt

I proudly shared my blog post on LinkedIn. Then Till Simon asked a deceptively simple question: "Does the LLM respond with the same quality?"

That question haunted me. So I dug deeper.

What the Data Actually Shows

Here's where things get interesting—and complicated.

The Official TOON Benchmarks:

  • TOON: 68.7% accuracy with 4,389 tokens
  • JSON: 65.7% accuracy with 7,260 tokens
  • Token savings: 39.5%

Looks great, right? TOON wins on both metrics.

But Independent Testing Tells a Different Story:

When improvingagents.com tested TOON on tabular data with GPT-4.1-nano:

  • TOON: 47.5% accuracy (ranked 9th out of 12 formats)
  • JSON: 52.3% accuracy
  • Markdown-KV: 60.7% accuracy (best performer)

Even worse with nested data:

  • TOON: 43.1% accuracy (dead last)
  • JSON: 50.3% accuracy
  • YAML: 62.1% accuracy

Why Such Different Results?

After analysing both benchmarks, here's what I learned:

1. Data Structure Matters More Than Anything

  • TOON excels with simple, flat tabular data
  • TOON struggles badly with nested structures (orders with customer objects, complex hierarchies)
  • The official benchmarks were weighted toward TOON's strengths

2. Model Performance Varies Wildly

  • GPT-5-nano with TOON: 88.6% on official tests, but only 43.1% on nested data
  • Claude Haiku with TOON: 50.7%
  • Your model choice changes everything

3. Token Savings Are Real, But Compactness Has a Cost

The best-performing format (Markdown-KV at 60.7% accuracy) used 52,104 tokens, more than double TOON's 21,518. Being too compact might actually hurt comprehension.
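A hypothetical nested example (the order data below is invented) makes the first point visible: once each row embeds its own sub-objects, there is no single shared header to factor out, so the tabular trick that makes TOON compact stops applying. A small helper of my own can test for the shape TOON actually handles well.

```python
# Nested records, the shape where benchmarks show TOON falling behind:
# each order embeds a customer object and a list of line items.
orders = [
    {"order_id": 100, "customer": {"name": "Alice", "tier": "gold"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"order_id": 101, "customer": {"name": "Bob", "tier": "silver"},
     "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}]},
]

def is_flat_uniform(rows):
    """True only if every row is a dict of scalars with identical keys,
    i.e. the shape a single TOON-style header can describe."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = set(rows[0])
    return all(
        set(r) == keys
        and all(not isinstance(v, (dict, list)) for v in r.values())
        for r in rows
    )

print(is_flat_uniform(orders))                 # nested orders: False
print(is_flat_uniform([{"a": 1}, {"a": 2}]))   # flat uniform rows: True
```

When this check fails, the data has to be either deeply indented or awkwardly flattened before TOON can express it, which is one plausible reason accuracy drops on nested benchmarks.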

What I Should Have Told You

TOON works well for:

  • Uniform tabular data (employee records, simple databases)
  • Simple field retrieval queries
  • GPT-5 and similar high-performance models
  • High-volume, low-stakes operations where token costs matter most

Stick with JSON for:

  • Complex nested structures (e-commerce orders, API responses)
  • Production systems where accuracy cannot be compromised
  • When using Claude or smaller models
  • Any workflow requiring validation or schema enforcement
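The two lists above can be collapsed into a small decision heuristic. This is my own sketch, not anything from the TOON spec: prefer TOON only when the payload is a non-empty list of flat, uniform records, and fall back to JSON for everything else.

```python
def choose_format(payload):
    """Rough heuristic from the trade-offs above: TOON for uniform flat
    tabular data, JSON for anything nested, irregular, or non-tabular."""
    if (
        isinstance(payload, list)
        and payload
        and all(isinstance(row, dict) for row in payload)
        and all(set(row) == set(payload[0]) for row in payload)
        and all(
            not isinstance(v, (dict, list))
            for row in payload for v in row.values()
        )
    ):
        return "toon"
    return "json"

print(choose_format([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
print(choose_format([{"order": 1, "customer": {"name": "Alice"}}]))
```

The first call returns "toon" (flat, uniform rows); the second returns "json" (a nested customer object). In a real pipeline you would also weigh model choice and how accuracy-critical the task is, which a shape check alone cannot capture.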

What I Learned

LLMs have billions of parameters working together. A single format change affects the entire system in ways we don't fully understand yet. What works in one scenario fails in another.

The real lesson: Test everything yourself. Don't trust the hype—including mine from three months ago.

TOON is an interesting specialised tool, not a universal JSON replacement. The 30-60% token savings are real, but they come with accuracy trade-offs you need to measure for your specific use case.

