
Next generation AI: inefficient Stargate?

... or why science leads to improved AI **without** wasting power

Stargate was announced at the White House as a plan to propel AI in the USA to world domination. By spending $500 billion to build datacenters, electrical power plants and associated infrastructure, OpenAI, together with Oracle and SoftBank, proposes to gain a stranglehold on intelligent machines ahead of the rest of the world.

But the technology to get there isn’t in today’s generative AI models: LLMs are fundamentally lacking and wasteful, creating the appearance of intelligence while using brute-force computing to find statistical answers from previous work scraped from others’ articles.

It is hoped that coders working around LLMs will solve the problem, but the scale of what is needed to deal with the long tail of problems remains effectively infinite, and the task is just as impossible because the fundamentals of LLMs rely on a lossy representation (word vectors).

Today’s video is a little long, as it is a talk covering a number of considerations, but it was important to talk through what the problems are for this initiative. A $500 billion investment in LLM technology is unlikely to lead to anything more than a continuation of today’s exponential increase in computational burn, without solving the fundamental problems of the technology that produce ‘hallucinations’, or confabulations.

Industrial strength ‘making stuff up’ is not the panacea for today’s human goals with AI.

Many proposals use the term ‘artificial intelligence’, or AI, which by its very name promises ‘intelligence.’ The name pretends that a breakthrough in intelligence is at hand, yet what is on offer is statistics, not intelligence. Once that breakthrough is assumed (or once people believe an LLM is more than a human-engineered statistical engine), the rhetoric jumps to human-level intelligence and then to super-human intelligence.

With word vectors at their core, there is no path to human-level intelligence from a wasteful LLM’s statistical knowledge base!

Science is needed, not engineering

In the video I compare LLMs, which are based on Deep Learning statistics, with language systems based on Deep Symbolics, a symbolic approach modeled on how brains work according to Patom Theory.

In LLMs, the number of computations used to perform simple language tasks is staggering. Exponential effort goes into converting words to vectors that only roughly represent their meaning, and then into doing something with that approximation. Only the creation of GPUs dedicated to performing these computations makes the system work in roughly real time, and even then the system is inaccurate and cannot be trusted. It factors out key concepts from the science of language: context as a human knows it, other pragmatic considerations, and meaning as a human knows it.
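To give a sense of the scale involved, here is a back-of-envelope sketch using the common rule of thumb that a dense transformer spends roughly 2 FLOPs per parameter per generated token; the model size and answer length below are hypothetical, not figures from the talk.

```python
# Back-of-envelope estimate of the arithmetic behind a single LLM answer.
# Assumption (not from the talk): a dense transformer spends roughly
# 2 FLOPs per parameter per generated token on its forward pass.

def inference_flops(parameters: float, tokens_generated: int) -> float:
    """Rough forward-pass cost of generating `tokens_generated` tokens."""
    return 2 * parameters * tokens_generated

# Hypothetical 100-billion-parameter model producing a 500-token answer.
flops = inference_flops(parameters=100e9, tokens_generated=500)
print(f"{flops:.2e} FLOPs")  # ~1.00e+14 floating-point operations for one short answer
```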

A word vector may hold hints of meaning, but it is meaningless outside the LLM that assigned it, and even there it serves only comparative purposes.

This comparison model requires all knowledge to be encoded as vectors. A banking application using an LLM gets bundled with the statistical vectors for astronomy, medicine, chemistry, geography and so on.
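To make the ‘comparative purposes’ point concrete, here is a minimal sketch with made-up words and vectors: the only operation a word vector really supports is a similarity score against other vectors from the same model, and the score says nothing about what a word actually means.

```python
import math

# Toy 4-dimensional "word vectors" invented for this example. Real LLM
# embeddings have hundreds or thousands of dimensions, but the principle is
# the same: the values only matter relative to other vectors from the same model.
embeddings = {
    "loan":     [0.9, 0.1, 0.3, 0.0],
    "mortgage": [0.8, 0.2, 0.4, 0.1],
    "nebula":   [0.1, 0.9, 0.0, 0.7],
}

def cosine_similarity(a, b):
    """Similarity score in [-1, 1]; the only thing a word vector is good for."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["loan"], embeddings["mortgage"]))  # high (~0.98)
print(cosine_similarity(embeddings["loan"], embeddings["nebula"]))    # low  (~0.16)
# Nothing in these numbers says what a loan *is*; they only rank closeness,
# and the banking application still carries vectors for every other domain.
```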

It’s a statistical model of a knowledge base. It tightly couples knowledge with language. It starts with tokens rather than the building blocks of language, bypassing a hard task. Its representation is lossy. None of that is good if the goal is human-like emulation.

If Deep Symbolics can perform the most expensive functions without training and operate in real time without special hardware, wouldn’t that be a better investment? It is an engineering competition against science.

Electrical Power Generation

To support these inefficient systems, unprecedented levels of power are required.

Although the systems don’t perform well against human-level capabilities, they do perform well against benchmarks. There may well be an unacceptable correlation between the ability of systems to pass tests and the funding of those tests (meaning tests may be contrived or shared so that specific systems do well, which is akin to cheating, according to one of the recent Turing Award winners).

Is it a government’s role to back the work of private industry? In the case of science, should the government throw its weight and money behind systems that are only one of many candidates to break through into the next generation of technology? Trying to pick winners is harder for government than for industry!

For $500 billion, wouldn’t it be worth a review of alternative technologies before going all in on developing new energy sources to support one specific technology?

A 5-year View

Let’s say investors put up $500 billion to create massive datacenters and continue the path of centralized statistical AI. Would you bet on a series of claimed breakthroughs using similar technology that has been available for decades? In the past, capital markets wouldn’t have been keen to fund the creation of new technology that may never come to market. Instead they would insist on seeing market traction through sales and revenue. Only in a few cases, like the internet of the 2000s, the Bitcoin revolution of recent years and generative AI, have the rules been relaxed.

I mean, when before have companies spent $100M to train a model, many times over, in the hope it can be applied to a problem?

Just split language from knowledge

Let’s conclude on this note. The deep learning model that is popular today in LLMs tightly couples knowledge, the specifics of what is written, with language, how the words relate to each other. But the representation uses an encoded form called a word vector that can only be used to measure similarity with other vectors.

By splitting the different and smaller task of recognizing language from the larger task of representing knowledge in context, a much cheaper model becomes possible. Rather than taking all the knowledge from all written and other forms of communication as the knowledge base, store only the relevant knowledge. Then provide the details of how language works. Language is far more accurate than the word vector model and, in conjunction with the right knowledge, can do anything in that scope.
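As a toy illustration of the split (this is not Deep Symbolics or Patom Theory, just a minimal sketch under assumptions of my own), the language layer below maps a question onto a structured key, and a separate, domain-limited knowledge store supplies the answer; when either layer has no match, the system says so instead of confabulating.

```python
# A toy illustration of separating language from knowledge. This is NOT
# Deep Symbolics or Patom Theory; it only shows the shape of the split:
# a small language layer maps a question to a structured key, and a separate,
# domain-limited knowledge store supplies the answer.

# Knowledge: only the facts this hypothetical banking application needs.
knowledge = {
    ("savings", "interest_rate"): "4.5% p.a.",
    ("checking", "monthly_fee"): "$5",
}

# Language: explicit mappings from phrasings to (account, attribute) keys.
# A real system would use a grammar and meaning-based matching, not strings.
patterns = {
    "what interest does the savings account pay": ("savings", "interest_rate"),
    "what is the fee on the checking account": ("checking", "monthly_fee"),
}

def answer(question: str) -> str:
    key = patterns.get(question.lower().rstrip("?"))
    if key is None:
        return "I don't understand the question."  # no guessing
    if key not in knowledge:
        return "I don't know."                     # no confabulation
    return knowledge[key]

print(answer("What interest does the savings account pay?"))  # 4.5% p.a.
print(answer("Is Jupiter bigger than Mars?"))                 # I don't understand the question.
```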

And behind all of this is the fundamental scientific choice: (a) apply the best technologies to the relevant problems, or (b) provide unlimited resources to a model that doesn’t work properly and hope the necessary changes can be made. Put another way: build endless data centers and stock them with expensive computational devices, or set up developers with cheap and effective tools that do a better job at many use cases for less.

It’s a choice.
