Insights · Nov 16, 2025 · 3 min read

I Made a Mistake About TOON (And Here's the Data That Proved Me Wrong)

This article revisits my earlier take on TOON, presenting the data that corrected my misconceptions and offering a new perspective.

Updated Apr 1, 2026

I rushed into discussing a topic without conducting proper research.

A few months ago, I read that TOON was the new way to talk to AI: efficient, fast, and easy to read, all in one. Oh boy, was I naive to think that a single metric was enough reason to switch completely from my beloved JSON.

Now I feel I cheated, and I hope you'll forgive me. More importantly, I want to show you the data that changed my mind.

Why I Got Excited About TOON

If you haven't already, check my previous blog post, where I praised TOON as an amazing format for saving tokens (and money). The promise was that TOON uses 30-60% fewer tokens with the same or better accuracy.

That sounded like a no-brainer. Who wouldn't want that?
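To make the token-savings pitch concrete, here is a minimal sketch of my own (the records, field names, and TOON string are invented for illustration, not taken from either benchmark): the same three rows serialized as pretty-printed JSON and as TOON-style text, with character count as a rough proxy for token count.

```python
import json

# Three uniform records: the kind of flat tabular data TOON targets.
records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "editor"},
    {"id": 3, "name": "Carol", "role": "viewer"},
]

as_json = json.dumps(records, indent=2)

# Hand-written TOON-style encoding: the header declares the row count and
# field names once, then each row lists only the values.
as_toon = "\n".join(
    ["users[3]{id,name,role}:"]
    + ["  {id},{name},{role}".format(**r) for r in records]
)

# Character count as a crude stand-in for tokens.
print(len(as_json), len(as_toon))
```

The repeated keys and punctuation in JSON are exactly what TOON factors out, which is where the headline savings come from.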

The Question That Started My Doubt

I proudly shared my blog post on LinkedIn. Then Till Simon asked a deceptively simple question: "Does the LLM respond with the same quality?"

That question haunted me. So I dug deeper.

What the Data Actually Shows

Here's where things get interesting—and complicated.

The Official TOON Benchmarks:

  • TOON: 68.7% accuracy with 4,389 tokens
  • JSON: 65.7% accuracy with 7,260 tokens
  • Token savings: 39.5%

Looks great, right? TOON wins on both metrics.

But Independent Testing Tells a Different Story:

When improvingagents.com tested TOON on tabular data with GPT-4.1-nano:

  • TOON: 47.5% accuracy (ranked 9th out of 12 formats)
  • JSON: 52.3% accuracy
  • Markdown-KV: 60.7% accuracy (best performer)

Even worse with nested data:

  • TOON: 43.1% accuracy (dead last)
  • JSON: 50.3% accuracy
  • YAML: 62.1% accuracy

Why Such Different Results?

After analysing both benchmarks, here's what I learned:

1. Data Structure Matters More Than Anything

  • TOON excels with simple, flat tabular data
  • TOON struggles badly with nested structures (orders with customer objects, complex hierarchies)
  • The official benchmarks were weighted toward TOON's strengths

2. Model Performance Varies Wildly

  • GPT-5-nano with TOON: 88.6% on official tests, but only 43.1% on nested data
  • Claude Haiku with TOON: 50.7%
  • Your model choice changes everything

3. Token Savings Are Real, But Compactness Has a Cost

The best-performing format (Markdown-KV at 60.7% accuracy) used 52,104 tokens, more than double TOON's 21,518. Being too compact might actually hurt comprehension.
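A hypothetical nested example (the order data below is invented) makes the first point visible: once each row embeds its own sub-objects, there is no single shared header to factor out, so the tabular trick that makes TOON compact stops applying. A small helper of my own can test for the shape TOON actually handles well.

```python
# Nested records, the shape where benchmarks show TOON falling behind:
# each order embeds a customer object and a list of line items.
orders = [
    {"order_id": 100, "customer": {"name": "Alice", "tier": "gold"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"order_id": 101, "customer": {"name": "Bob", "tier": "silver"},
     "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}]},
]

def is_flat_uniform(rows):
    """True only if every row is a dict of scalars with identical keys,
    i.e. the shape a single TOON-style header can describe."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = set(rows[0])
    return all(
        set(r) == keys
        and all(not isinstance(v, (dict, list)) for v in r.values())
        for r in rows
    )

print(is_flat_uniform(orders))                 # nested orders: False
print(is_flat_uniform([{"a": 1}, {"a": 2}]))   # flat uniform rows: True
```

When this check fails, the data has to be either deeply indented or awkwardly flattened before TOON can express it, which is one plausible reason accuracy drops on nested benchmarks.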

What I Should Have Told You

TOON works well for:

  • Uniform tabular data (employee records, simple databases)
  • Simple field retrieval queries
  • GPT-5 and similar high-performance models
  • High-volume, low-stakes operations where token costs matter most

Stick with JSON for:

  • Complex nested structures (e-commerce orders, API responses)
  • Production systems where accuracy cannot be compromised
  • When using Claude or smaller models
  • Any workflow requiring validation or schema enforcement
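The two lists above can be collapsed into a small decision heuristic. This is my own sketch, not anything from the TOON spec: prefer TOON only when the payload is a non-empty list of flat, uniform records, and fall back to JSON for everything else.

```python
def choose_format(payload):
    """Rough heuristic from the trade-offs above: TOON for uniform flat
    tabular data, JSON for anything nested, irregular, or non-tabular."""
    if (
        isinstance(payload, list)
        and payload
        and all(isinstance(row, dict) for row in payload)
        and all(set(row) == set(payload[0]) for row in payload)
        and all(
            not isinstance(v, (dict, list))
            for row in payload for v in row.values()
        )
    ):
        return "toon"
    return "json"

print(choose_format([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
print(choose_format([{"order": 1, "customer": {"name": "Alice"}}]))
```

The first call returns "toon" (flat, uniform rows); the second returns "json" (a nested customer object). In a real pipeline you would also weigh model choice and how accuracy-critical the task is, which a shape check alone cannot capture.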

What I Learned

LLMs have billions of parameters working together. A single format change affects the entire system in ways we don't fully understand yet. What works in one scenario fails in another.

The real lesson: Test everything yourself. Don't trust the hype—including mine from three months ago.

TOON is an interesting specialised tool, not a universal JSON replacement. The 30-60% token savings are real, but they come with accuracy trade-offs you need to measure for your specific use case.

