I Made a Mistake About TOON (And Here's the Data That Proved Me Wrong)
Idir Ouhab Meskine
November 16, 2025

I rushed into discussing a topic without conducting proper research.
A few months ago, I read that TOON is the new way to talk to AI: efficient, fast, and easy to read all in one. Oh boy, was I naive to think that changing a single variable, the data format, was reason enough to abandon my beloved JSON completely.
Now I feel like I cheated you, and I hope you'll forgive me. More importantly, I want to show you the data that changed my mind.
Why I Got Excited About TOON
If you haven't already, check my previous blog post, where I praised TOON as an amazing format for saving tokens (and money). The promise was that TOON would use 30-60% fewer tokens than JSON with the same or better accuracy.
That sounded like a no-brainer. Who wouldn't want that?
The Question That Started My Doubt
I proudly shared my blog post on LinkedIn. Then Till Simon asked a deceptively simple question: "Does the LLM respond with the same quality?"
That question haunted me. So I dug deeper.
What the Data Actually Shows
Here's where things get interesting—and complicated.
The Official TOON Benchmarks:
- TOON: 68.7% accuracy with 4,389 tokens
- JSON: 65.7% accuracy with 7,260 tokens
- Token savings: 39.5%
Looks great, right? TOON wins on both metrics.
But Independent Testing Tells a Different Story:
When improvingagents.com tested TOON on tabular data with GPT-4.1-nano:
- TOON: 47.5% accuracy (ranked 9th out of 12 formats)
- JSON: 52.3% accuracy
- Markdown-KV: 60.7% accuracy (best performer)
Even worse with nested data:
- TOON: 43.1% accuracy (dead last)
- JSON: 50.3% accuracy
- YAML: 62.1% accuracy
Why Such Different Results?
After analysing both benchmarks, here's what I learned:
1. Data Structure Matters More Than Anything
- TOON excels with simple, flat tabular data (see the side-by-side sketch after this list)
- TOON struggles badly with nested structures (orders with customer objects, complex hierarchies)
- The official benchmarks were weighted toward TOON's strengths
2. Model Performance Varies Wildly
- GPT-5-nano with TOON: 88.6% on official tests, but only 43.1% on nested data
- Claude Haiku with TOON: 50.7%
- Your model choice changes everything
3. Token Savings Are Real, But...
The most accurate format in the independent test (Markdown-KV at 60.7% accuracy) used 52,104 tokens, more than double TOON's 21,518. Being too compact might actually hurt comprehension.
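To make the flat-versus-nested point concrete, here's a minimal Python sketch that serialises the same small employee table as JSON and as a hand-written TOON-style block, then counts tokens with the tiktoken tokenizer. The TOON string is my own approximation of the format's tabular syntax rather than the output of an official encoder, and the exact savings will depend on your data and tokenizer.

```python
import json
import tiktoken  # pip install tiktoken

# A small, uniform, flat table: TOON's best case.
employees = [
    {"id": 1, "name": "Alice", "role": "engineer"},
    {"id": 2, "name": "Bob", "role": "designer"},
    {"id": 3, "name": "Carol", "role": "manager"},
]

# Standard JSON serialisation (field names repeated on every row).
as_json = json.dumps(employees, indent=2)

# Hand-written TOON-style tabular block (an approximation of the format):
# the header states the row count and field names once, rows carry only values.
as_toon = (
    "employees[3]{id,name,role}:\n"
    "  1,Alice,engineer\n"
    "  2,Bob,designer\n"
    "  3,Carol,manager"
)

# Count tokens roughly the way an OpenAI-style model would see them.
enc = tiktoken.get_encoding("cl100k_base")
for label, text in [("JSON", as_json), ("TOON", as_toon)]:
    print(f"{label}: {len(enc.encode(text))} tokens")
```

On flat rows like these, the field names appear only once, which is where the headline savings come from. Once you start nesting customer objects inside orders, that advantage largely disappears.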
What I Should Have Told You
TOON works well for:
- Uniform tabular data (employee records, simple databases)
- Simple field retrieval queries
- GPT-5 and similar high-performance models
- High-volume, low-stakes operations where token costs matter most
Stick with JSON for:
- Complex nested structures (e-commerce orders, API responses)
- Production systems where accuracy cannot be compromised
- When using Claude or smaller models
- Any workflow requiring validation or schema enforcement
What I Learned
LLMs have billions of parameters working together. A single format change affects the entire system in ways we don't fully understand yet. What works in one scenario fails in another.
The real lesson: Test everything yourself. Don't trust the hype—including mine from three months ago.
TOON is an interesting specialised tool, not a universal JSON replacement. The 30-60% token savings are real, but they come with accuracy trade-offs you need to measure for your specific use case.
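If you want to check this for your own workload, a quick-and-dirty harness is enough: send the same payload in both formats, ask the same question, and score the answers and the prompt tokens. Here is a minimal sketch using the OpenAI Python SDK; the gpt-4.1-nano model name, the tiny payloads, and the TOON string are illustrative assumptions, so swap in your own model, data, and scoring.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "What is Bob's role? Answer with the role only."
EXPECTED = "designer"

# The same two records, once as JSON and once in a TOON-style layout.
payloads = {
    "JSON": '[{"id":1,"name":"Alice","role":"engineer"},{"id":2,"name":"Bob","role":"designer"}]',
    "TOON": "employees[2]{id,name,role}:\n  1,Alice,engineer\n  2,Bob,designer",
}

for fmt, data in payloads.items():
    response = client.chat.completions.create(
        model="gpt-4.1-nano",  # swap in whatever model you actually run
        messages=[
            {"role": "system", "content": f"Answer using only this data:\n{data}"},
            {"role": "user", "content": QUESTION},
        ],
    )
    answer = response.choices[0].message.content.strip()
    verdict = "correct" if EXPECTED in answer.lower() else "wrong"
    print(f"{fmt}: {answer!r} -> {verdict} ({response.usage.prompt_tokens} prompt tokens)")
```

Run something like this over a realistic sample of your own data, track both accuracy and prompt tokens, and you'll have a far better answer than any benchmark, official or independent.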