OpenAI as we knew it is dead
Mass automation has happened before, at the start of the Industrial Revolution, and some people sincerely expect that in the long run it’ll be a good thing for society. (My take: That really, really depends on whether we have a plan to maintain democratic accountability and adequate oversight, and to share the benefits of the alarming new sci-fi world. Right now, we absolutely don’t have that, so I’m not cheering the prospect of being automated.)
But even if you’re more excited about automation than I am, “we will replace all office work with AIs” — which is fairly widely understood to be OpenAI’s business model — is an absurd plan to spin as a jobs program. But then, a $500 billion investment to eliminate countless jobs probably wouldn’t get President Donald Trump’s imprimatur, as Stargate has.
DeepSeek may have figured out reinforcement learning on AI feedback
The other huge story of this week was DeepSeek r1, a new release from the Chinese AI startup DeepSeek that the company advertises as a rival to OpenAI’s o1. What makes r1 a big deal is less the economic implications and more the technical ones.
To teach AI systems to give good answers, we rate the answers they give us, and train them to home in on the ones we rate highly. This is “reinforcement learning from human feedback” (RLHF), and it has been the main approach to training modern LLMs since an OpenAI team got it working. (The process is described in this 2019 paper.)
But RLHF is not how we got the ultra superhuman AI games program AlphaZero. That was trained using a different strategy, based on self-play: the AI was able to invent new puzzles for itself, solve them, learn from the solution, and improve from there.
This strategy is particularly useful for teaching a model how to do quickly anything it can do expensively and slowly. AlphaZero could slowly and time-intensively consider lots of different policies, figure out which one is best, and then learn from the best solution. It is this kind of self-play that made it possible for AlphaZero to vastly improve on previous game engines.
So, of course, labs have been trying to figure out something similar for large language models. The basic idea is simple: you let a model consider a question for a long time, potentially using lots of expensive computation. Then you train it on the answer it eventually found, trying to produce a model that can get the same result more cheaply.
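To make the basic idea concrete, here is a toy sketch in Python. It is not DeepSeek’s method or anything resembling a real language model; the grader (`score`), the brute-force search (`slow_solve`), and the simple line fit standing in for training (`fit_fast_model`) are all hypothetical names invented for illustration. It only shows the shape of the loop: find good answers the expensive way, then train a cheap model to reproduce them.

```python
def score(question: float, answer: float) -> float:
    """Hypothetical grader: higher is better. Best answer is 3*question + 1."""
    return -(answer - (3 * question + 1)) ** 2


def slow_solve(question: float, candidates: list[float]) -> float:
    """Expensive 'thinking': score every candidate answer and keep the best."""
    return max(candidates, key=lambda answer: score(question, answer))


def fit_fast_model(pairs: list[tuple[float, float]]):
    """Cheap 'student': least-squares line fit to (question, best answer) pairs."""
    n = len(pairs)
    mean_q = sum(q for q, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    cov = sum((q - mean_q) * (a - mean_a) for q, a in pairs)
    var = sum((q - mean_q) ** 2 for q, _ in pairs)
    slope = cov / var
    intercept = mean_a - slope * mean_q
    return lambda q: slope * q + intercept


# Step 1: answer a handful of questions the slow way (401 candidates each).
candidates = [i / 10 for i in range(0, 401)]  # 0.0, 0.1, ..., 40.0
train_questions = [0, 1, 2, 3, 4, 5]
pairs = [(q, slow_solve(q, candidates)) for q in train_questions]

# Step 2: train the fast model on the answers the slow search found.
fast_model = fit_fast_model(pairs)

# Step 3: the fast model now answers a new question with no search at all.
print(fast_model(10))
```

The slow solver checks hundreds of candidates per question, while the fitted model answers with one multiplication and one addition. The hard part for real language models is that both the grader and the “student” are vastly more complicated than this.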
But until now, “major labs were not seeming to be having much success with this sort of self-improving RL,” machine learning engineer Peter Schmidt-Nielsen wrote in an explanation of DeepSeek r1’s technical significance. What has engineers so impressed with (and so alarmed by) r1 is that the team seems to have made significant progress using that technique.
This would mean that AI systems can be taught to rapidly and cheaply do anything they know how to slowly and expensively do — which would make for some of the fast and shocking improvements in capabilities that the world witnessed with AlphaZero, only in areas of the economy far more important than playing games.
One other notable fact here: these advances are coming from a Chinese AI company. Given that US AI companies are not shy about using the threat of Chinese AI dominance to push their interests — and given that there really is a geopolitical race around this technology — that says a lot about how fast China may be catching up.
It’s still January
A lot of people I know are sick of hearing about AI. They’re sick of AI slop in their newsfeeds and AI products that are worse than humans but dirt cheap, and they aren’t exactly rooting for OpenAI (or anyone else) to become the world’s first trillionaires by automating entire industries.
But I think that in 2025, AI is really going to matter. The question is no longer whether these powerful systems get developed, which at this point looks well underway, but whether society is ready to stand up and insist that it’s done responsibly.
When AI systems start acting independently and committing serious crimes (all of the major labs are working on “agents” that can act independently right now), will we hold their creators accountable? If OpenAI makes a laughably low offer to its nonprofit entity in its transition to fully for-profit status, will the government step in to enforce nonprofit law?
A lot of these decisions will be made in 2025, and the stakes are very high. If AI makes you uneasy, that’s a lot more reason to demand action than it is a reason to tune out.