Notes on AGI
March 14, 2025
It takes centuries, but the time comes when humanity is driven to sacrifice natural evolution for the accelerated revolution of the entire planet. AI has led us to believe that we are getting closer to that moment of singularity, when we finally overcome the gravity of slow natural evolution and leapfrog into the next generation.
We stand at the inflection point between epochs, watching gradient descent discover solutions no human ever imagined. The algorithms capture patterns we never noticed and implement strategies we couldn’t conceive. What emerges isn’t just computational power but a different kind of intelligence altogether—one that might finally break the chains of our biological inheritance and usher in that next generation, for better or worse.
We speak less of artificial general intelligence now and more of robust systems with practical capabilities beyond human benchmarks. Brighter than Nobel laureates across fields, working at speeds ten to a hundred times faster, with millions of instances collaborating invisibly across data centers. Not magic, but still revolutionary—bound only by physical limits, experimental latencies, and human institutions that move glacially by comparison.
The return on intelligence becomes the key question: what factors limit even superintelligent systems? The physical world moves at fixed speeds. Cells divide on their schedule. Data remains scarce in crucial domains. Humans resist change through bureaucracies and fears. But gradually these constraints themselves become targets, problems to optimize away.
Scaling laws reveal themselves like ancient truths: more data, bigger networks, more compute, the trinity that builds intelligence. We don't know why it works so well, why directions in vector space encode meaning, or why features form universally across different architectures. Yet every day the models climb upward, solving tasks that seemed impossible just months before.
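To make the trinity concrete, the scaling-law literature often summarizes it as a power law: predicted loss falls smoothly as parameters and training tokens grow. The sketch below is illustrative only; the constants are placeholders in the spirit of the Chinchilla-style form, not fitted values.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.7, A: float = 400.0, B: float = 1800.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Power-law sketch: irreducible loss + a capacity term + a data term.

    All constants here are illustrative placeholders, not fitted values.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# More parameters and more data both push the predicted loss down.
print(predicted_loss(1e9, 2e10))     # smaller model, less data
print(predicted_loss(7e10, 1.4e12))  # bigger model, more data
```

The exact numbers matter less than the shape: each term shrinks as its resource grows, which is part of why simply scaling up keeps paying off.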
‘AGI’ is a term that defies scaling laws, but the systems behind it are neural networks, and a valuable way to think about neural networks is that we don’t program them; we grow them. We design the architectures and define the loss objectives. The architecture is a scaffold on which the circuits grow: training starts from something random, and the objective we train toward acts almost like a light. We create the scaffold it grows on and the light it grows toward, but the thing we end up with is almost a biological entity, an organism that we study.
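A minimal sketch of that framing, assuming PyTorch, a toy feedforward network, and random data standing in for a real task: we write down the scaffold (the architecture) and the light (the loss), and gradient descent does the growing.

```python
import torch
import torch.nn as nn

# The "scaffold": a small, arbitrary architecture chosen only for illustration.
scaffold = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
light = nn.CrossEntropyLoss()  # the "light": the objective trained toward
optimizer = torch.optim.SGD(scaffold.parameters(), lr=1e-2)

# Random toy data stands in for the training distribution.
x = torch.randn(256, 16)
y = torch.randint(0, 2, (256,))

for step in range(200):        # gradient descent "grows" the circuits
    optimizer.zero_grad()
    loss = light(scaffold(x), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```

Nothing in this file specifies what the learned circuits will be; that is discovered by training, not written by hand.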
Let's talk about ‘AGI’. Clearly, such an entity would be capable of solving very difficult problems very fast, but it is not trivial to figure out how fast. Two “extreme” positions both seem false. At one extreme, you might think that the world would be instantly transformed on the scale of seconds or days (“the singularity”), as superior intelligence builds on itself and solves every possible scientific, engineering, and operational task almost immediately.
When will AGI happen? I think there are still worlds where it doesn’t happen in 100 years, but the number of those worlds is rapidly shrinking; we are running out of truly convincing blockers, truly compelling reasons why this will not happen in the next few years. And scale-up is very quick: we already do this today, making a model and then deploying thousands, maybe tens of thousands, of instances of it. Certainly within two to three years, whether we have these super-powerful AIs or not, clusters are going to reach a size where you will be able to deploy millions of these.