At the end of my last post I mentioned the release of a reasoning model, DeepSeek R1, and suggested it might be a big deal. As it turns out, many people agree. On Monday, the New York Times DealBook newsletter said "the Chinese A.I. sensation...is shaking the technology industry to its core." The framing of this newsletter struck me as strange because, if you didn't know much about tech, you would be forgiven for being uncertain whether the release of this model was actually a good thing in purely technological terms (i.e., setting aside the geopolitics).
I do not aspire to be the person you read for reactions to breaking tech industry news. Personally, I read Ben Thompson at Stratechery for that, and his DeepSeek FAQ is very good. However, I don't have another post planned for this week, so I thought I'd record a couple of thoughts explaining why I think this is a good sign for development of better AI tools.
To be clear, DeepSeek is not new to the nerds; it released a series of conventional LLMs (the latest is called V3) last year that, by all accounts, are very good. The current coverage concerns DeepSeek R1, a "reasoning model" that performs just as well as OpenAI's o1 (discussed last week). DeepSeek R1 differs from o1 in three key ways:
1. The model is "open source" (or, more precisely, "open weight");
2. It's around 30x less expensive to use;
3. It was made by a Chinese company.
Factor No. 3 is important for many reasons, but I have nothing original to add. Big picture, the Biden administration invested plenty of effort into preventing Chinese companies from obtaining AI-related hardware and intellectual property. This effort included placing export restrictions on advanced chips from U.S. companies like Nvidia. It also included spending diplomatic capital to restrict exports of chipmaking products from U.S. allies, like the specialized lithography machines made by exactly one company, ASML, headquartered in the Netherlands. The DeepSeek release suggests these efforts didn't work, although some people say it's too early to evaluate their success.
Others have noted that the model appears to self-censor when asked about topics like Taiwan or Tiananmen Square, and that the terms of service for using the app itself (not the underlying model) are concerning from a privacy standpoint. I have not actually used DeepSeek yet, but those reports are important and should be acknowledged.
Some brief thoughts related to 1 and 2:
Open source/open weight platforms
I realize many people reading this might not know what "open source" means in practical terms. There are vigorous line-drawing debates, but in principle i) software creators hold a copyright in their software, and ii) by distributing software under an "open source" license, creators give others the right to use and modify the source code as they please. There are many such licenses; DeepSeek releases its models under the MIT license, one of the most permissive.
In practice, DeepSeek (like Meta and Mistral, the other big players in the open source game right now) is not releasing its source code; it is releasing the parameters (i.e., weights) produced by training the model, which anyone can download and modify. There is a big fight, which I won't get into, about whether this can truly be called open source, but to be precise it's probably better to call these "open weight" models.
To understand the impact of the closed v. open distinction, let's say you work in IT at a law firm and the partners want to create an in-house chatbot for lawyers to use. Assuming your firm is not in a position to train its own LLM from scratch (a safe assumption), you need to acquire your model somewhere. Every major company with such a "foundation model," whether closed or open, lets you access it through an "application programming interface" (API), which provides back-end access to the model so that a developer can write a new program around it in a popular programming language like Python or JavaScript. If you decide to build your firm chatbot on a closed source model from OpenAI (or Anthropic, or Google), you are tied to its API. This means that you have to pay OpenAI on a "per token" basis (the more it's used, the more you pay), and you are limited to whatever functionality OpenAI provides through its API. If you pick an open-weight model, you can use the provider's API, but you can also just download the model for free.
Open weight has two main advantages:
You can customize the model, rather than being limited to whatever the API allows you to do.
You also aren't paying another company on a rate basis every time the software is used.
You might still use a closed source API because it will be easier to use out of the box and you won't need to figure out how to host and run the model. But in the long run you will save more money and have more latitude if you make the up-front investment in the open weight version.
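To make that long-run tradeoff concrete, here is a back-of-the-envelope sketch in Python. Every dollar figure and token volume below is a hypothetical placeholder for illustration, not an actual published price from any provider.

```python
# Hypothetical break-even calculation: pay-per-token API vs. self-hosting
# an open-weight model. All figures below are illustrative assumptions.

def breakeven_months(monthly_tokens, api_price_per_million,
                     setup_cost, hosting_monthly_cost):
    """Months of usage until self-hosting becomes cheaper than the API."""
    api_monthly = monthly_tokens / 1_000_000 * api_price_per_million
    monthly_savings = api_monthly - hosting_monthly_cost
    if monthly_savings <= 0:
        return float("inf")  # the API stays cheaper at this volume
    return setup_cost / monthly_savings

# Suppose the firm's chatbot processes 200M tokens a month at $10 per
# million tokens via a closed API, versus $5,000 up front and $500 a
# month to run a downloaded open-weight model in-house.
months = breakeven_months(200_000_000, 10.0, 5_000, 500)
```

At these made-up numbers, the up-front investment pays for itself within a few months. At a low enough usage volume, though, the per-token API never stops being the cheaper option, which is why the right answer depends on scale.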
The average person experiences "AI" right now as a website or app like ChatGPT: a product. But there is a growing software development ecosystem that quickly adjusts to take advantage of breakneck developments in AI research; it's not an exaggeration to say that 15-20 research papers reporting meaningful developments are released every week. And the technology is increasingly available at smaller scale: you can currently download very good models from Meta to run locally on your computer or smartphone for free. As LLMs get better and easier to use across devices, you're not going to see more ChatGPT competitors; you're going to see more products that use the LLM as a platform on which software gets built.
For these reasons, it's a safe bet that open weight (or open source, if we get there) LLMs will power most of the AI software we use in 10-20 years.1 It makes sense economically, and it's been the pattern with other technology platforms. To pick two major examples:
Most web servers—i.e., the computers that host the websites you visit—are configured using open-source software from either Nginx or Apache.
Although the iPhone, which uses Apple's proprietary operating system, is big in the United States, the open-source Android operating system from Google powers the vast majority of smartphones worldwide. Phone makers and app developers can just download the software and customize it for their hardware or their app. (Google makes its money largely by taking a cut of paid app revenue.)
The cost
When I say that DeepSeek is 30x cheaper, I mean specifically that using its API costs about 30x less; the downloads are free. Why it's so cheap is covered well in the Stratechery article, but it comes down to i) applying some recent advances in AI research to develop a much more efficient training architecture, and ii) ironically, performing some clever (very complicated) hacks to get much better performance out of the lower-grade chips the company was stuck using because of the U.S. export controls. This allowed DeepSeek to train a model that competes with o1 at a substantially lower cost.
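As a rough illustration of what a 30x gap means on a per-token bill: the prices below are placeholders I chose to make the ratio come out to 30, not real quotes from any provider (actual published rates vary by model and change often).

```python
# Hypothetical per-token billing comparison. Prices are illustrative
# placeholders, not actual rates from any provider.

def api_cost(total_tokens, price_per_million_tokens):
    """Dollar cost of processing a given number of tokens via an API."""
    return total_tokens / 1_000_000 * price_per_million_tokens

tokens = 50_000_000                # e.g. a month of chatbot traffic
closed = api_cost(tokens, 15.00)   # a closed API at $15 per million tokens
cheap = api_cost(tokens, 0.50)     # a ~30x cheaper API at $0.50 per million
ratio = closed / cheap             # 30.0
```

Because billing is linear in tokens, the ratio holds at any volume; what changes with volume is how many absolute dollars the gap represents.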
The ability to do what OpenAI (and others) have done but much more cheaply is what drives the apocalyptic tone over at DealBook:
DeepSeek is forcing a reckoning in Silicon Valley. The company’s models appear to rival those from OpenAI, Google and Meta, despite the U.S. government’s efforts to limit China’s access to leading-edge A.I. technology. And DeepSeek says it did all this with a fraction of the resources that American competitors use.
Over the weekend, DeepSeek shot to the top of Apple’s App Store charts, rivaling ChatGPT. And DeepSeek is drastically undercutting OpenAI on price.
That raises a number of questions:
Do leading A.I. companies like Google, Meta and the privately held OpenAI and Anthropic deserve their astronomical valuations?
Do companies need to spend hundreds of billions on vast data centers powered by hugely expensive chips from Nvidia and others? Consider that OpenAI and its partners have promised to spend at least $100 billion on their Stargate project, or that Microsoft said it will spend $80 billion, or Meta $65 billion.
Does America need the huge uptick in electricity generation that has fueled a run-up in utility stocks?
And yes, if we focus narrowly on the performance of major tech companies in the American stock market, there are "winners and losers." But if we bracket the geopolitical questions and look at this strictly from a competition perspective, we're all winners. By releasing a state-of-the-art, low-cost platform on which to build AI products, DeepSeek has lowered the barrier to entry for any startup, and it is forcing all the major AI labs to compete at that reduced price point while facing a new competitor for what counts as "state-of-the-art." And from the standpoint of the environmental and energy costs associated with model training, this is good news: cheaper models are cheaper because they use less energy.
To be clear, "most" of it does not mean all of it. Apple obviously does well for itself with a closed source smartphone operating system. To write this blog, I decided to forgo the widely used open source blogging software (WordPress) in favor of a closed source platform, Substack.
Both Apple and Substack have popular closed source products because they provide a value-add for people who want something that "just works." I suspect there will be a robust closed source LLM market as well, for the same reason.