Are LLMs Becoming Idiots?
Idir Ouhab Meskine
November 13, 2025

One of the great advances of the 21st century has been bringing Artificial Intelligence into our homes. No longer just for asking ChatGPT how tall a poodle is, but for slightly more useful tasks (not much more), like helping me do my job better.
In this post, though, I'm not here to talk about automation, but about the quality of the data used to train the now-famous LLMs. If you don't know what an LLM is, I recommend watching this video from Dot CSV before continuing.
😡🤬 The Damn Emojis 🤯🚀
How many emojis have suddenly appeared in posts from pseudo-content creators on LinkedIn, Facebook, and other social networks?
How many of them do you think are AI-generated?
| Year | AI Impact | Notable Statistics |
|---|---|---|
| 2024 | Predictive AI | 😭 used 761M times |
| 2025 | Automated insertion (AI) | Emoji commit adoption jumped 3x to 75% |
Sources:
- Global daily usage: Sixth City Marketing
- 2024 data: Meltwater - Top Emojis
- Commit data: Allstacks - Emoji Commit Index
And why does this matter? Because those automatic emojis are just the visible symptom of a much deeper problem: AI-generated content that is quietly becoming the source of truth.
The New Pages of Truth
As I said, we're flooding the internet with AI-generated text that will be stored for years and will become the new truth.
Keep in mind that information tends to go stale quickly, so recent content is treated as the most trustworthy simply because it hasn't expired yet.
So the AI-generated post that Jaime Jimenez Jerga created now becomes... 🥁 THE INFORMATION THAT AI WILL USE TO TRAIN ITSELF
AI Training on AI
You know the hall of mirrors game? Well, this is something similar:
- An AI generates the post for Mr. Wonderful
- Mr. Wonderful publishes it on LinkedIn, on Twitter, and even in his Good Morning message to his parents
- Google crawls and indexes it
- Other LLMs ingest it and treat it as valid content
Can we call this digital autophagy? The toy projection below shows how fast the loop compounds.
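To make the loop concrete, here's a back-of-the-envelope sketch in Python. Every number in it is a made-up assumption for illustration only: human output stays flat while AI output doubles each year, and we track what fraction of the accumulated "index" is synthetic.

```python
# Toy projection of how much of the indexed web could be synthetic.
# All quantities are invented assumptions, not real measurements.
human_per_year = 1.0   # arbitrary units of new human-written text per year
ai_per_year = 0.1      # assumed starting volume of AI-generated text

total_human = 0.0
total_ai = 0.0
for year in range(2023, 2031):
    total_human += human_per_year
    total_ai += ai_per_year
    synthetic_share = total_ai / (total_human + total_ai)
    print(f"{year}: synthetic share of the index = {synthetic_share:.0%}")
    ai_per_year *= 2   # assumed yearly doubling of AI output
```

Under these toy assumptions the index crosses the 50% synthetic mark within a handful of years. The exact year is meaningless; the compounding is the point.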
Model Collapse
Everything I've described above has a real name: Model Collapse. Models generate so much content that it eventually outweighs human-generated text, so LLMs end up training on their own output, and the result is, catastrophically, 💩. A toy simulation of the mechanism follows the sources below.
A 2024 Nature study demonstrated that after just 5 generations of training with synthetic data, models lost up to 50% of their ability to generate diverse content.
Sources:
- 2024 Nature study: Science Media Centre
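The mechanism is easy to reproduce in miniature. Below is a toy sketch of my own (not the Nature paper's method): the "model" just estimates the mean and standard deviation of its training data, then generates the next generation's training set by sampling from that estimate. Each run differs, but the spread of the data tends to shrink generation after generation.

```python
import random
import statistics

def train_and_generate(data, n_samples):
    """Toy 'model': fit a Gaussian to the data, then emit a fully
    synthetic dataset sampled from the fitted distribution."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [random.gauss(mu, sigma) for _ in range(n_samples)]

# Generation 0: 'human' data with a healthy spread.
data = [random.gauss(0.0, 1.0) for _ in range(30)]

# Each generation trains only on the previous generation's output.
for generation in range(1, 31):
    data = train_and_generate(data, n_samples=30)
    if generation % 5 == 0:
        print(f"generation {generation:2d}: std = {statistics.stdev(data):.3f}")
```

It won't collapse in a straight line, and some runs shrink faster than others, but the drift is one-way: variance lost to sampling error never comes back. That's a crude analogue of what the study describes at LLM scale.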
The Salvation of Humankind
After discussing this topic with my colleague Mandip Gosal, who works with me at n8n, we concluded that a possible way out of this problem is spoken language.
In other words, we haven't yet routinely let AI into our spoken interactions with other people, so transcripts of real conversations could be the key to keeping models solid and free of digital noise. A minimal sketch of the idea follows.
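The crude version of this idea is provenance filtering: keep only training records whose origin you trust. The schema below is entirely hypothetical (the `source` and `text` fields are my invention, not a real dataset format); it only shows the shape of the idea.

```python
# Hypothetical corpus records: the 'source' and 'text' fields are
# invented for this sketch, not a real dataset schema.
corpus = [
    {"source": "speech_transcript", "text": "So what I actually meant was..."},
    {"source": "linkedin_post",     "text": "🚀 5 ways AI will change everything 🔥"},
    {"source": "speech_transcript", "text": "No, wait, let me rephrase that."},
    {"source": "blog_scrape",       "text": "In today's fast-paced digital world..."},
]

# Keep only records whose provenance suggests a human actually said it.
TRUSTED_SOURCES = {"speech_transcript"}
clean = [rec for rec in corpus if rec["source"] in TRUSTED_SOURCES]

for rec in clean:
    print(rec["text"])
```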
HELP!
Two things you can do TODAY:
- Verify everything: spend an extra 5 minutes checking that the data is correct.
- Contribute your unique experience: AI hasn't lived it. Use that.
The problem isn't using AI; it's using it without critical thinking.