
NextGenAI

Jan 29

5 min read



Introduction: The Next Evolution of GenAI


Artificial Intelligence is not new, but 2025 is shaping up to be the most important year yet, and may even prove a watershed moment. AGI, and even ASI, are now being discussed in realistic terms, and with new reasoning models, optimised transformers, agentic AI and diffusion models converging this year, we are in the midst of a paradigm shift. I remember GPT-2 in 2019, which could barely string a sentence together without collapsing into gibberish; today’s GPT versions can pass university-level exams, write code, and even get into philosophical debates about sentience. But as powerful as they are, today’s AI models still stumble over key challenges: rigid tokenisation, a lack of genuinely contextual output, and inefficient use of compute.

Enter the next wave of AI architectures: DeepSeek-R1’s reinforcement learning efficiency, Hierarchical Transformers’ tokeniser-free approach and Google’s post-transformer memory architectures. Also, let’s not forget emerging wildcards like Liquid AI, which hint at a future where AI continuously learns and adapts in real time. All of that is before we even get to QuantumML!

So, where are we headed? Are we on the brink of AI models that think, plan, and adapt not just like humans but even better? Let’s dive in.


Key AI Architectures: What’s New and Why It Matters


1. DeepSeek-R1: Reasoning, No Humans Required

If traditional AI models are like students cramming for exams with massive textbooks, DeepSeek-R1 is the genius who figures things out through trial and error — without ever looking at the answer key.

Instead of learning from human-labelled data, DeepSeek-R1 uses pure reinforcement learning (RL). This means it gets better at math, coding, and logical reasoning by constantly testing itself and adjusting accordingly. It’s the equivalent of a chess player improving by playing millions of games against itself rather than studying grandmaster strategies. Unironically, the comparison to DeepMind’s AlphaZero is highly relevant.
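To make the idea concrete, here is a toy sketch of learning from a purely verifiable reward, with no human labels in the loop. Everything in it (the guessing “policy”, the reward check, the learning rate) is an invented stand-in for illustration, not DeepSeek-R1’s actual GRPO training pipeline:

```python
import random

def check_answer(answer: int, target: int) -> float:
    """Verifiable reward: 1.0 if the answer is exactly right, 0.0 otherwise."""
    return 1.0 if answer == target else 0.0

# A toy "policy" that guesses the answer to 2 + 2, with one weight per guess.
weights = {guess: 1.0 for guess in range(10)}

for step in range(2000):
    guesses = list(weights)
    guess = random.choices(guesses, weights=[weights[g] for g in guesses], k=1)[0]
    reward = check_answer(guess, target=4)
    # Reinforce whatever earned a reward; no human-labelled data is consulted.
    weights[guess] += 0.1 * reward

print("most reinforced answer:", max(weights, key=weights.get))  # tends towards 4
```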


Strengths:

  • Matches GPT-4-level reasoning on complex problems.

  • Distills its reasoning into much smaller models, while the full model uses the more efficient Mixture-of-Experts (MoE) architecture, meaning it can outperform far larger models while using fewer resources.

  • Open source and cheap, like very, very cheap!

Weaknesses:

  • Struggles with making its answers sound natural and readable.

  • Can mix languages in ways that don’t always make sense.

  • Not multi-modal. Yet!


2. Hierarchical Transformers: Breaking Free from Tokens


Most AI models today break text into subwords using tokens, which works well — until it doesn’t. Ever noticed how AI sometimes butchers names, misspells words, or struggles with languages that don’t put clear spaces between words (like Chinese or Japanese)? How about losing detail and context the longer a conversation gets? That’s the tokeniser’s fault.


Hierarchical Transformers ditch tokenisation altogether, processing text at both the byte and word levels. Imagine reading a sentence where you can see the full words and analyse each individual letter at the same time; your brain would be far more robust to spelling mistakes and multilingual shifts. That’s the advantage these models bring, but it comes at a cost.
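As a rough illustration of the concept (not any specific model’s architecture or code), here is what keeping both levels of granularity looks like: the raw byte stream, which can never hit an out-of-vocabulary symbol, plus crude word-level “patches” that a higher-level model would attend over. The whitespace-splitting rule below is a hard-coded stand-in for boundaries a real hierarchical model would learn:

```python
text = "Zürich résumé 東京"

# Level 1: raw bytes. Any script, emoji or misspelling maps to known symbols.
byte_ids = list(text.encode("utf-8"))

# Level 2: word-level "patches" over those bytes. Splitting on whitespace is a
# crude stand-in for the learned boundaries a real hierarchical model uses.
patches = [list(word.encode("utf-8")) for word in text.split()]

print(len(byte_ids), "bytes in total")
print([len(p) for p in patches], "bytes per word-level patch")
```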


Strengths:

  • Trains up to twice as fast on new languages compared to traditional tokenised models.

  • Significantly better at multi-language text generation than current transformers.

Weaknesses:

  • Requires more parameters, making it bulkier and costlier.

  • Still struggles with some languages whose structure differs significantly from English.


3. AlphaZero & MuZero: The AI Grandmasters of Planning


AlphaZero and MuZero are the brains behind AI’s domination of chess, Go, and even some video games. These models use Monte Carlo Tree Search (MCTS), which means they simulate multiple possible futures before making a move, like a chess grandmaster thinking five steps ahead.
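As a stripped-down illustration of that look-ahead idea, here is a Monte Carlo planner for the simple “race to 21” game (players take turns adding 1 to 3; whoever reaches 21 wins). The real AlphaZero and MuZero pair this kind of simulation with learned policy and value networks and a UCT-style tree search; this sketch keeps only the “simulate many futures, pick the move that wins most often” core:

```python
import random

TARGET = 21

def rollout(total: int, my_turn: bool) -> bool:
    """Play the rest of the game with random moves; return True if 'I' win."""
    while True:
        total += random.randint(1, 3)
        if total >= TARGET:
            return my_turn          # whoever just moved takes the win
        my_turn = not my_turn

def choose_move(total: int, simulations: int = 2000) -> int:
    best_move, best_rate = 1, -1.0
    for move in (1, 2, 3):
        if total + move >= TARGET:
            return move             # immediate win, no simulation needed
        wins = sum(rollout(total + move, my_turn=False) for _ in range(simulations))
        if wins / simulations > best_rate:
            best_move, best_rate = move, wins / simulations
    return best_move

print(choose_move(18))  # prints 3: adding 3 reaches 21 and wins on the spot
```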

But here’s the catch: while this family of models excels at structured tasks with clear rules (games, and, in AlphaFold’s case, protein folding), it struggles with the messy, unpredictable nature of language and open-ended real-world reasoning.

Strengths:

  • Superhuman strategic planning.

  • Learns purely through self-play, no humans required.

Weaknesses:

  • Not designed for open-ended tasks that are typical of LLMs.


4. Liquid Networks: AI That Thinks Like a Brain?

Most AI models today are like frozen brains: you train them once, and they don’t evolve much afterward. Liquid Networks, on the other hand, behave more like biological neurons, continuously adapting to new data without retraining the underlying foundation model. They’re still at an early stage and, on the surface, are less capable than today’s reasoning models, but they hint at a future where AI adjusts in real time. That makes them a natural fit for IoT devices and real-time assistants, and it opens up a new paradigm of emergent behaviour from our LLMs.
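As a very loose, single-neuron sketch of that idea (inspired by the liquid time-constant formulation, with all constants and the input signal invented here), the state below is a continuous-time quantity whose effective time constant depends on the input, so it keeps drifting to track the stream rather than sitting frozen after training:

```python
import math

def liquid_step(x: float, u: float, dt: float = 0.05,
                tau: float = 1.0, A: float = 1.0, w: float = 2.0) -> float:
    """One Euler step of dx/dt = -x/tau + f(u) * (A - x), with a sigmoid gate f."""
    f = 1.0 / (1.0 + math.exp(-w * u))
    return x + dt * (-x / tau + f * (A - x))

x = 0.0
for t in range(400):
    u = 1.0 if t < 200 else -1.0       # the input stream changes regime halfway
    x = liquid_step(x, u)
    if t in (199, 399):
        print(f"t={t}: state = {x:.3f}")
# The state settles near one value for the first regime, then adapts towards a
# new equilibrium when the input shifts, with no retraining step in between.
```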

Strengths:

  • Adapts continuously to streaming data.

  • Efficient for time-series tasks like financial modeling or sensor monitoring.

Weaknesses:

  • Not yet practical for mainstream AI applications like language processing.


Where AI is Headed: Three Big Predictions

1. MCTS-Like Planning for AI Reasoning

Today’s AI models are great at following a step-by-step logic process, but they don’t plan ahead like a human would. What if we combined the strategic depth of AlphaZero with the reasoning skills of DeepSeek-R1? AI could start considering multiple paths before deciding on the best answer, like a math tutor figuring out where a student might get stuck before they even reach the problem. You can approximate this through clever prompt engineering, but it's not leveraging the benefits of a true MoE architecture at the conceptual level.
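Here is a toy sketch of what search over reasoning paths could look like, purely for illustration: branch into several candidate next steps, score each partial chain, and keep only the most promising ones. In a real system, propose_steps would be an LLM proposing reasoning steps and score would be a learned verifier or value model; both are hypothetical stand-ins here:

```python
import random

def propose_steps(chain):
    """Stand-in for an LLM proposing three possible next steps (here, numbers)."""
    return [chain + [random.randint(0, 9)] for _ in range(3)]

def score(chain, target=24):
    """Stand-in verifier: how close does the chain's running total get to target?"""
    return -abs(target - sum(chain))

def plan(depth=4, beam=2):
    """Expand the search level by level, keeping only the best partial chains."""
    frontier = [[]]
    for _ in range(depth):
        candidates = [c for chain in frontier for c in propose_steps(chain)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = plan()
print(best, "-> total:", sum(best))   # a chain of steps whose total lands near 24
```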

2. Hybrid Rewards: Teaching AI to Be Both Accurate and Creative

Right now, AI models are trained using rigid reward structures (e.g., “this is a correct answer; this is incorrect”). But what if we blended this with more flexible rewards based on human preference, or even more abstract notions of value? An AI writer could be trained to respect strict grammar rules yet break them creatively, with the stylistic flair of Hemingway or Shakespeare, depending on the context.
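A minimal sketch of what such a blended objective might look like, with a hard rule-based check mixed with a softer, preference-style score. The weighting, the punctuation rule and the “flair” scorer are all invented placeholders rather than a real training objective:

```python
def rule_reward(text: str) -> float:
    """Hard constraint: 1.0 if the text ends with proper punctuation, else 0.0."""
    return 1.0 if text.strip().endswith((".", "!", "?")) else 0.0

def preference_reward(text: str) -> float:
    """Stand-in for a learned preference model scoring stylistic flair (0 to 1)."""
    vivid_words = {"silver", "thunder", "quietly"}
    return min(1.0, sum(word in text.lower() for word in vivid_words) / 2)

def hybrid_reward(text: str, alpha: float = 0.6) -> float:
    """Blend correctness and style; alpha sets how strict the mix is."""
    return alpha * rule_reward(text) + (1 - alpha) * preference_reward(text)

print(hybrid_reward("The rain fell."))                       # correct but plain: 0.6
print(hybrid_reward("The rain fell quietly, like silver."))  # correct and vivid: 1.0
```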

3. True Zero-Shot Learning: No More Training Wheels

Today’s AI models still rely on fine-tuning for specific tasks. The dream? AI that generalises across domains without any additional tweaking or even prompting: an AI that reads about quantum mechanics for the first time and, in the spirit of Richard Feynman’s view of inference, applies that knowledge to reason about something completely unrelated.


Conclusion: AI’s Next Chapter is Unpredictably Important


AI is evolving at a pace that feels like science fiction unfolding in real time. The innovations in reasoning, adaptability and agency suggest that the next breakthroughs won’t just be about bigger models — they’ll be about smarter models. Whether the future belongs to reinforcement learning, tokenisation-free architectures or real-time adaptive AI, one thing is clear: we’re still only scratching the surface of what’s possible.

So, what do you think? Will AI’s future be shaped by optimising transformers, real-time neural network adaptability or something completely unexpected? The only certainty is that the journey will be anything but boring.

