DeepSeek V4 is here, and it's actually interesting

DeepSeek just dropped V4, their first major model since R1 shook the industry back in January 2025. And yeah, the hype is real — though maybe not in the way you’d expect.

R1 was a bombshell. Trained on limited compute, it matched or beat far more expensive models, and suddenly everyone knew DeepSeek’s name. Since then, the company has been quiet. Personnel departures, launch delays, government scrutiny — you name it. So V4 arriving now feels like a statement: “We’re still here, and we’re still competitive.”

Let’s get the big stuff out of the way. V4 comes in two flavors: Pro and Flash. Pro is the heavy lifter, built for coding and complex agent tasks. Flash is the lean, fast, cheap version. Both are open-source, both handle up to 1 million tokens of context — that’s roughly three full-length novels in one go — and both are available on DeepSeek’s site, app, and API right now.

Pricing is where things get spicy. V4-Pro runs $1.74 per million input tokens and $3.48 per million output. Compare that to OpenAI’s GPT-5.4 or Anthropic’s Claude-Opus-4.6, which charge 10x or more. V4-Flash is almost comically cheap at $0.14 input and $0.28 output per million tokens. That’s not just competitive — that’s disruptive for anyone building applications on top of these models.
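To make those numbers concrete, here is a minimal back-of-the-envelope sketch. The prices are the ones quoted above; the dictionary keys, the function name, and the example workload (2 million input tokens, 500,000 output tokens) are my own placeholders, not official API identifiers.

```python
# Per-million-token prices in USD, as quoted in this post.
# The keys are illustrative labels, not official API model names.
PRICES = {
    "v4-pro": {"input": 1.74, "output": 3.48},
    "v4-flash": {"input": 0.14, "output": 0.28},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one workload on the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000  # prices are quoted per million tokens


if __name__ == "__main__":
    # Hypothetical workload: 2M tokens in, 500K tokens out.
    for model in PRICES:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

For that made-up workload, Pro comes out to about $5.22 and Flash to about $0.42. Swap in your own token counts to see where the gap starts to matter for your app.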

On benchmarks, V4-Pro matches Claude-Opus-4.6, GPT-5.4, and Gemini-3.1 on major tests. It beats open-source rivals like Alibaba’s Qwen-3.5 and Z.ai’s GLM-5.1 on coding, math, and STEM. DeepSeek also ran an internal survey of 85 experienced developers — over 90% put V4-Pro in their top three for coding tasks. I’d normally take internal surveys with a grain of salt, but the public benchmark data backs it up.

What’s genuinely interesting is the architecture. V4 uses what DeepSeek calls “multi-head latent attention with dynamic sparsity” — which is a mouthful, but basically means it’s smarter about memory. Instead of loading every token into full attention, it compresses and prunes irrelevant parts on the fly. This is why it can handle 1 million tokens without the quadratic cost blowup that plagues most transformers. It’s not entirely new — sparse attention has been tried before — but DeepSeek seems to have made it work at scale without sacrificing quality.
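If the general idea is hard to picture, here is a toy sketch of one well-known flavor of sparse attention, top-k selection, in plain Python. To be clear, this is not DeepSeek's actual mechanism, which I haven't seen spelled out in code; the function names and the k value of 2048 are my own assumptions for illustration. The toy also still scores every key before pruning, which a production design would presumably avoid with a cheaper selector or compressed keys.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Attend from one query to only its k highest-scoring keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the indices of the k largest scores; everything else is pruned.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, kept):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out, sorted(kept)


def attended_pairs(n_tokens, k=None):
    """Count query-key pairs mixed into outputs: n*n dense, n*k with top-k."""
    per_query = n_tokens if k is None else min(k, n_tokens)
    return n_tokens * per_query


if __name__ == "__main__":
    dense = attended_pairs(1_000_000)
    sparse = attended_pairs(1_000_000, k=2048)  # k chosen arbitrarily
    print(f"dense: {dense:,} pairs, top-k: {sparse:,} pairs")
```

At 1 million tokens, dense attention mixes 10^12 query-key pairs, while keeping 2,048 keys per query cuts that to about 2 billion, roughly 490 times fewer. That gap is the whole reason anyone bothers with sparsity at long context.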

Then there’s Flash, which is basically a distilled version of the Pro model. It’s faster, cheaper, and still surprisingly capable. For developers who don’t need maximum accuracy on every query, this is a godsend. It’s optimized for agent frameworks like Claude Code, OpenClaw, and CodeBuddy, which suggests DeepSeek is thinking hard about real-world deployment, not just benchmark chasing.

So will V4 shake things up the way R1 did? Probably not. R1 was a paradigm shift — a proof that open-source could compete with the best closed models on a shoestring budget. V4 is more of an evolution: better, faster, cheaper, but not a new paradigm. That said, it matters for three reasons.

First, it keeps the pressure on OpenAI, Anthropic, and Google. Every time DeepSeek releases a model that’s 90% as good at 10% the cost, the incumbents have to justify their pricing. That’s good for everyone who isn’t a shareholder in those companies.

Second, it shows that China’s AI ecosystem isn’t just hype. Despite all the noise — export controls, talent flight, political pressure — DeepSeek is still shipping cutting-edge research. That’s a signal that the US’s attempts to slow China’s AI progress aren’t working as well as planned.

Third, the open-source angle. V4 is fully downloadable, modifiable, and usable. That means startups, researchers, and even hobbyists can run frontier-level models on their own hardware. No API keys, no rate limits, no surprise bills. That’s how innovation happens, not by locking everything behind paywalls.

I’ve been using V4-Flash for a few days now for code generation and light agent tasks. It’s fast — like, noticeably faster than GPT-5.4 on similar prompts. The reasoning mode is solid, though it occasionally gets lost on multi-step problems with ambiguous instructions. For straightforward coding tasks, it’s excellent. For complex planning, I’d still reach for Claude.

Is V4 perfect? No. The documentation is sparse in places, the open-source community is still catching up on tooling, and DeepSeek’s track record on uptime and API reliability isn’t as polished as the big players’. But for the price and openness, it’s hard to complain.

If you’re building AI applications and haven’t tried DeepSeek V4 yet, you’re leaving money on the table. Go spin up a Flash instance — it costs pennies — and see for yourself. I think you’ll be surprised.
