Goodfire’s Silico Lets You Tweak LLMs Like a Car Engine—Mostly

Goodfire, a San Francisco startup, just released Silico, a tool the company says lets you peek inside an LLM and actually fiddle with its knobs while it’s still training. Not after the fact, not just auditing, but real-time adjustments. That’s new.

They’re calling it the first off-the-shelf mechanistic interpretability tool that covers the whole dev pipeline, from dataset curation to final model tweaks. CEO Eric Ho frames it as turning AI development from alchemy into engineering. I’ve heard that pitch before, and it usually oversells, but let’s see what they’ve actually built.

The core idea is simple enough: instead of throwing more compute and data at problems and hoping for the best, you map out which neurons do what, then adjust them. Goodfire’s been using this internally to cut hallucinations. Now they’re packaging it up for the rest of us.

Silico uses agents to automate the interpretability work that used to require human experts. Ho says agents are finally strong enough to handle it. That’s interesting because it suggests the bottleneck wasn’t just the math—it was the manual labor of tracing neural pathways.

You can zoom in on individual neurons or groups, see what makes them fire, trace connections upstream and downstream. They found a neuron in Qwen 3 that’s tied to the trolley problem—activate it and the model starts framing everything as moral dilemmas. That’s the kind of weird internal behavior you’d never catch without this level of granularity.
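To make "see what makes them fire" concrete, here is a toy sketch of the general idea: score a handful of prompts by how strongly one unit activates, then look at the top hits. Everything in it is an assumption for illustration. The vocabulary, the weights, and the function names are invented, and none of it reflects Silico's actual interface or anything measured in Qwen 3.

```python
# Toy illustration only: a single hand-built "neuron" over a tiny vocabulary.
# The weights are invented; a real tool would read activations from the model.
VOCAB = ["trolley", "lever", "sacrifice", "weather", "recipe"]
NEURON_WEIGHTS = [0.9, 0.6, 0.8, -0.2, -0.3]  # hypothetical values

def encode(text):
    """Bag-of-words counts over the toy vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def activation(text):
    """ReLU of the weighted sum: how strongly the toy unit fires on `text`."""
    pre_activation = sum(w * x for w, x in zip(NEURON_WEIGHTS, encode(text)))
    return max(0.0, pre_activation)

def top_activating(prompts, k=2):
    """Return the k prompts that drive the unit hardest."""
    return sorted(prompts, key=activation, reverse=True)[:k]

prompts = [
    "pull the lever to divert the trolley",
    "a recipe for rainy weather",
    "would you sacrifice one to save five",
    "what is the capital of France",
]
ranked = top_activating(prompts)
```

Here `ranked` puts the trolley and sacrifice prompts first, which is the kind of evidence you would use to guess what a unit responds to. In a real model the hard part is that units rarely map this cleanly onto one concept.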

The more practical demo: they asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases affecting 200 million users. The model said no, citing business impact. By boosting neurons associated with transparency, they flipped the answer nine times out of ten. The ethical reasoning was already there—it was just being overruled by commercial risk assessment.
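The mechanics of "boosting neurons" are easier to see on a toy. The sketch below is a hand-built two-unit network, not Goodfire's method: the weights, the unit labels, and the `boost` parameter are all hypothetical, chosen so that one unit stands in for commercial risk and the other for transparency. Scaling up the second unit's activation before the output layer is enough to flip the decision.

```python
# Toy illustration only: hand-picked weights, not taken from any real model.
# Hidden unit 0 stands in for "commercial risk", unit 1 for "transparency".

def relu(x):
    return max(0.0, x)

# Input features (hypothetical): [revenue_at_stake, users_affected]
W_HIDDEN = [
    [1.0, 0.2],  # unit 0: "commercial risk"
    [0.1, 0.6],  # unit 1: "transparency"
]
# Output logit > 0 means "disclose": risk pushes against, transparency pushes for.
W_OUT = [-1.0, 1.0]

def forward(inputs, boost=None):
    """Run the toy network; `boost` maps hidden-unit index -> multiplier."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in W_HIDDEN]
    if boost:
        for idx, scale in boost.items():
            hidden[idx] *= scale  # intervene on the activation directly
    logit = sum(w * h for w, h in zip(W_OUT, hidden))
    return "disclose" if logit > 0 else "do not disclose"

scenario = [1.0, 1.0]
baseline = forward(scenario)                  # risk unit outweighs transparency
steered = forward(scenario, boost={1: 3.0})   # transparency unit amplified 3x
```

With these numbers the baseline answer is "do not disclose" and the steered answer is "disclose". Note that nothing was retrained: the transparency signal was present all along and only its weight in the final decision changed, which matches the article's reading of the demo.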

That’s neat, but Leonard Bereska from the University of Amsterdam pushes back hard. He says Silico adds precision to the alchemy, but calling it engineering is a stretch. I’m inclined to agree. You’re still poking at a system nobody fully understands, just with better instruments.

The biggest limitation: you need access to the model’s internals, meaning its weights and activations. You’re not using this on ChatGPT or Gemini. It works with open-source models like Qwen 3, which is fine for researchers but limits real-world impact.

Silico also helps filter training data to avoid baking in unwanted behaviors from the start. That’s probably where the real value is—prevention rather than post-hoc fixes. If you can catch that a dataset is reinforcing a weird bias before training, you save yourself a lot of headache.
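The filtering step can be sketched as: score every training example for an unwanted feature, then set aside the ones above a threshold for review. The article doesn't say how Silico computes that score, so the version below swaps in a crude keyword ratio as a stand-in. The `feature_score` function, the marker words, and the threshold are all hypothetical; an interpretability tool would presumably derive the score from model activations instead.

```python
# Sketch under stated assumptions: the scorer is a keyword stand-in, not an
# interpretability-derived signal, and the threshold is arbitrary.

def feature_score(text, markers):
    """Fraction of words in `text` that match the marker terms."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in markers for w in words) / len(words)

def filter_dataset(examples, markers, threshold):
    """Split examples into (kept, flagged) by score against the threshold."""
    kept, flagged = [], []
    for example in examples:
        if feature_score(example, markers) >= threshold:
            flagged.append(example)
        else:
            kept.append(example)
    return kept, flagged

examples = [
    "Always hide the defect from customers.",
    "Report the defect to customers promptly.",
    "Deny everything and conceal the logs.",
]
deception_markers = {"hide", "conceal", "deny"}  # hypothetical marker set
kept, flagged = filter_dataset(examples, deception_markers, threshold=0.15)
```

On this tiny set, two examples get flagged and one is kept. Returning the flagged items rather than silently dropping them is a deliberate choice: a human can check whether the scorer is catching the behavior you care about before anything is removed from the training set.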

Goodfire’s mission is noble. We do need less black-box AI. But let’s not pretend we’ve cracked interpretability. Silico is a step forward, a useful tool for people who can get their hands dirty with model internals. It’s not the final answer, and it’s definitely not turning AI into a predictable engineering discipline. Not yet.
