
Three Observations on AI Developments at the Start of 2026

January 7, 2026

The three most brain-tickling bits of news or commentary on AI I’ve come across recently all challenge the “race to AGI between the hyperscalers” consensus on AI investment, but in potentially quite different ways. Two are based on relatively “hard” research publications, and one is based more on “vibes.”

  1. Models may have a great deal of underlying similarity to one another, regardless of what they’re trained for.
  2. Open-weight models are proliferating at both the high end (largely out of China) and the low end (two million specialized models on Hugging Face).
  3. The core logical premise of “singularity” seems to be losing consensus even among the technorati.

I. Model Universality

Researchers out of Johns Hopkins University published a paper this month that examines hundreds of models trained for different tasks and finds deep mathematical similarities among them. The paper proposes the somewhat grandiose term “Universal Weight Subspace Hypothesis.”

The math is admittedly beyond this author. But a more common mathematical approach, Principal Component Analysis (PCA), may be more familiar to the reader. PCA, in a nutshell, takes in a huge number of somewhat predictive variables and, using math magic, derives the few underlying “principal components” that contain most of the useful information.

(For example, input variables about a person might include height, weight, hat size, and shoe size. These might all be much more efficiently described as a “size” component in a model.)
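To make that intuition concrete, here is a minimal, purely illustrative sketch in Python, using scikit-learn and made-up data for the height/weight/hat-size/shoe-size example:

```python
# Minimal PCA sketch: four correlated "size" measurements collapse to
# (mostly) one principal component. Data is synthetic, for illustration only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
size = rng.normal(0, 1, 500)          # one latent "size" factor per person

# height, weight, hat size, shoe size are all mostly driven by "size",
# plus a little independent measurement noise.
X = np.column_stack([
    170 + 10 * size + rng.normal(0, 2, 500),
    70 + 12 * size + rng.normal(0, 2, 500),
    57 + 2 * size + rng.normal(0, 0.5, 500),
    42 + 3 * size + rng.normal(0, 0.5, 500),
])

pca = PCA(n_components=4).fit(X)
print(pca.explained_variance_ratio_)
# Nearly all of the variance lands on the first component: the four
# measurements are, in effect, one underlying "size" variable.
```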

What is striking about the JHU paper is that it doesn’t appear to matter what a model is trained on (as long as it’s a “good” model): its structure will contain many of the same underlying principal components.

That result suggests at least two huge potential efficiency gains: the first in shrinking the resources needed to run models (which today are bloated with all of these common underlying components on top of their unique capabilities), and the second in dramatically reducing training overhead. It would be like the difference between giving a high school graduate 10 weeks of vocational training versus starting by teaching an infant to speak.
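To see why shared structure translates into savings, consider a toy sketch (emphatically not the JHU paper’s actual method): if many models’ flattened weights sit close to one shared low-dimensional subspace, each model can be stored as a short list of coefficients over a single shared basis. All numbers below are invented for illustration.

```python
# Toy illustration: many "models" whose flattened weights share a
# low-dimensional subspace can be stored as one basis plus a few
# coefficients each. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_models, dim, k = 50, 10_000, 16

# Pretend 50 models' flattened weights are built from 16 shared directions
# plus a small model-specific residual.
shared_basis = rng.normal(size=(k, dim))
coeffs = rng.normal(size=(n_models, k))
weights = coeffs @ shared_basis + 0.01 * rng.normal(size=(n_models, dim))

# Recover a shared basis from the weights themselves via SVD, then store
# each model as its k coefficients over that basis.
U, S, Vt = np.linalg.svd(weights, full_matrices=False)
basis = Vt[:k]                      # shared basis: k x dim, rows orthonormal
codes = weights @ basis.T           # per-model coefficients: n_models x k

recon = codes @ basis
rel_error = np.linalg.norm(weights - recon) / np.linalg.norm(weights)
print(f"relative reconstruction error: {rel_error:.4f}")
print(f"floats stored: {basis.size + codes.size:,} vs {weights.size:,}")
```

In this toy setting, storing one basis plus a handful of coefficients per model is far cheaper than storing every model in full; the training-overhead argument is the analogous idea of starting a new model inside the shared subspace rather than from scratch.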

If either of these huge efficiencies is realized, it likely doesn’t so much accelerate the very top end (“frontier models”) as level the playing field, making it much easier for smaller players to catch up and rival the state of the art.

II. Two Million Open Models

The tech press breathlessly reports on the “horse race” between point-version benchmarks of ChatGPT, Claude, sometimes Gemini, and occasionally Grok or other also-rans among the majors. Old-timers who remember the pre-SaaS era might wonder if this is like the MHz-by-MHz showdowns between AMD chips and Intel’s Pentiums, or PC Magazine comparisons of WordPerfect and Microsoft Word point versions.

It’s worth considering that the press approach to reporting on these might owe more to the structure and “deep learning” that resides within the practice of trade journalism than it does to the structure of the actual AI models. After all, if you’re printing Wine Spectator or Car & Driver, you need a horse race to make some readable content about your comparison of California cabernet producers or Ford vs. GM three-row family haulers. Over and over. And of course, manufacturers and their PR firms are all too happy to oblige, in order to get their “earned media” coverage that month.

What may be more interesting, if harder to write about in the trade rag’s accustomed paradigm, is the fact that there are two million open-weight models available for download today on Hugging Face (the “GitHub” of AI models). Many of these come from the hyperscalers. But many more are derivative or independent works that focus on particular use cases: specializing in a little-spoken language, for example, or optimizing for performance in a constrained environment.
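For readers who haven’t poked around the repository, pulling one of these models down takes only a few lines. The sketch below uses the transformers library’s pipeline interface; the model identifier is a hypothetical placeholder standing in for any of the roughly two million hosted checkpoints.

```python
# Minimal sketch of loading an open-weight model from Hugging Face.
# The model id below is a placeholder, not a real checkpoint name.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="some-org/some-small-specialized-model",  # hypothetical model id
)
print(generator("Hello in a little-spoken language:", max_new_tokens=20))
```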

A recent paper from Stanford researchers calls out an aspect of this broader population of model builders: the high level of participation from mainland China.

What is really interesting here, though, is what happens if the major gains in efficiency from Model Universality (above) supercharge the already impressive creative ferment that has produced these millions of models. At that point, it’s no longer an AMD-vs-Intel slugfest among behemoths; it’s much more like the early days of the Apple App Store, where the next breakout winner could appear overnight from a virtual unknown. And if patterns hold, that previously unknown company will very likely be Chinese.

III. Questioning the “Singularity”

Readers who are acquainted with West Coast, and specifically Bay Area, technologist and futurist culture will recognize the fervor around this odd noun, “Singularity.” For those for whom the idea is less familiar, the reasoning goes like this: One of the activities of human intelligence is producing tools that have human intelligence-like capabilities. And one of the outputs of using such tools is to improve upon humans’ tool-building. At the moment those tools surpass humans’ own capabilities at producing ever more capable tools (an AI codes a new AI better than itself), an exponential “explosion” will take off, such that the intelligent tools self-improve at an ever-increasing rate. In the quasi-religious extreme version of this theory, all of history will be different after that singular moment of intelligence increase; hence, it is the “Singular” moment of the rest of history (capital-S).

(If that sounds to you like an echo of a religious just-so story – either of the beginning or end of the world – you’re not alone.)

At a strictly material level, there is a question of whether the scaling of AI models’ abilities can continue along the path it has followed so far, in terms of both training data and infrastructure.

The availability of training data has widely been seen as the bottleneck. Companies have been hiring people to train AIs for years, and in the last couple of years, “synthetic” training datasets have become more common. (I suspect that this will work a little longer, until all of the novel information content has been squeezed out of the original training data. But there’s no fooling Claude Shannon on this one.)
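The Shannon point lends itself to a toy calculation: re-processing a dataset, which is ultimately what model-generated synthetic data amounts to, can preserve or destroy information, but it cannot create information the original source didn’t contain. The sketch below, on invented data, simply measures empirical entropy before and after two transformations.

```python
# Toy nod to the Shannon point: transforming data can keep or lose
# information, but never add any. All data here is invented.
import numpy as np
from collections import Counter

def entropy_bits(xs):
    """Empirical Shannon entropy of a discrete sample, in bits."""
    counts = Counter(xs)
    n = len(xs)
    return -sum((c / n) * np.log2(c / n) for c in counts.values())

rng = np.random.default_rng(2)
original = rng.integers(0, 8, 100_000)   # stand-in for "real" training data

relabeled = (original * 3) % 8           # invertible transform: nothing gained
coarsened = original // 2                # lossy transform: information destroyed

print(f"original:  {entropy_bits(original):.3f} bits")
print(f"relabeled: {entropy_bits(relabeled):.3f} bits")   # ~ the same
print(f"coarsened: {entropy_bits(coarsened):.3f} bits")   # strictly less
```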

Infrastructure, meaning energy and chips, will continue to be required in ever-greater quantities. Energy is not trivial, but it is a well-studied problem. The ability of chips to continue scaling in the required way is in some question, however. Even the fabled Moore’s Law describing the exponential growth of CPU capacity since the 1960s seemed to peter out by the 2010s and go sigmoidal. GPU-derived AI chips have different dimensions along which to improve, but they will still ultimately be subject to the laws of physics as to what exactly can be squeezed onto silicon. Eventually, doublings of capacity will slow.

But the larger question is not about the inputs to the self-improvement loop; it is about the assumption underlying the loop itself: does being “more intelligent” necessarily accelerate the creation of subsequently more intelligent machines? Put another way, even if a “scaling law” can be fed by adding resources like data, chips, and energy, will intelligence alone be able to continue advancing itself recursively? It’s not obvious that the answer here is “yes.”
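One way to see what is at stake in that assumption is a back-of-the-envelope simulation. The sketch below, with entirely invented numbers, runs the same self-improvement loop under two assumptions: returns that compound with capability, and returns that diminish as capability approaches a ceiling. Only the first produces a runaway curve; the second goes sigmoidal, much like Moore’s Law.

```python
# Back-of-the-envelope sketch: the same self-improvement loop explodes or
# levels off depending entirely on the assumed return each generation gets
# from the previous generation's capability. All numbers are invented.
def run_loop(gain, generations=20):
    capability = 1.0
    history = [capability]
    for _ in range(generations):
        capability += gain(capability)   # this generation builds the next
        history.append(capability)
    return history

# Compounding returns: each unit of capability buys a proportional improvement.
explosive = run_loop(lambda c: 0.5 * c)
# Diminishing returns: improvement fades as capability approaches a ceiling.
sigmoidal = run_loop(lambda c: 0.5 * c * (1 - c / 10.0))

print("compounding:", [round(x, 1) for x in explosive[::5]])
print("diminishing:", [round(x, 1) for x in sigmoidal[::5]])
# The first assumption yields a runaway exponential; the second plateaus.
# The strong "Singularity" argument quietly assumes the first.
```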

A related but not identical point is the fervor with which proponents speak about “AGI,” or Artificial General Intelligence. This means, more or less, the development of an AI at or above human-level intelligence in practically “all” areas of endeavor. In the straw-man version of the Singularity given above, developing an AGI is, definitionally, the moment of the Singularity.

However, it may well be that even with AGI, recursive self-improvement is not a given. That is, you can have AGI without Singularity. It may simply be that self-improvement is not part of what an AGI is particularly good at. An AGI might be at human level in many ways, and far beyond human level in many other ways (e.g., the ability to rapidly “learn” a corpus of information), without necessarily being able to apply that to the specific problem of tuning the next generation of neural network in a useful way. For example, I am confident that if I sit down with textbooks and flash cards to study for two hours tonight, I’ll be a little smarter tomorrow. But, I am not confident that tomorrow’s smarter me will be willing to sit down for four hours tomorrow and eight hours the next day.

That brings us to another problem: what “self-improvement” really means beyond AGI. So far, this AI boom has given somewhat short shrift to the question of what the training goals should be, relying instead on the implicit notion that generative AIs should produce outputs that are, like their inputs, sensical and comprehensible to humans, and that resemble the best of their training data. In other words, a generated photo of a butterfly should resemble as closely as possible the very best and highest-resolution photos of butterflies previously taken. But then what?

What does it mean for an image-based generative AI to create a butterfly image that is far, far better than any actual such image taken in reality? What does it mean for an LLM to produce a Ph.D. thesis in philosophy that is 10x more advanced than any human philosopher could even comprehend? How would the self-improving machine even be able to know that its next generation was an improvement on the prior one, and not a left turn out into artificial madness?

I submit that a major portion (at least, say, 10%, and maybe 50% or more) of the speculation on AI is in effect a bet on the Singularity and on a post-AGI world in which the owners of AGI capabilities effectively own the entire economy. It’s a version of Pascal’s wager, in which even a tiny probability of the existence of eternal Heaven or Hell is enough to convince a non-believer to profess faith. If the belief in a possible, even if improbable, calamitous and rapturous Singularity begins to falter, a major part of the urgency underpinning AI investment will disappear. For those of us who have pinned our AI hopes on rather more concrete, if more pedestrian, improvements in actual workflows and outcomes, that might just be for the best.

Randall Lucas

Managing Director, SaaS Capital
