Engineering notes

Engineering notes.

Short, honest write-ups on the problems we work on — what's solved, what isn't, and how we tell the difference. We publish when we have something true to say, not on a schedule.

On grounding what a model says.

A language model will produce an answer to almost anything you ask it. The harder question is whether that answer is supported by what you actually have in front of you. We build our systems so that every claim points back to specific evidence — a record, a document, a measurement — and so that a claim with nothing behind it is marked as exactly that, rather than quietly smoothed over.

The uncomfortable part is cause and effect. It is easy to generate text that asserts one thing led to another. It is much harder to show the data supports the link — and most of the time it doesn't, at least not on its own. Our default is to decline the causal claim and report what we can actually see. We would rather say less and be right.

None of this is finished work. Grounding catches a great deal and still misses some. We measure where it fails, and we treat those failures as the next problem rather than an embarrassment.

On finding what matters.

Across enough markets and sources, something is always moving. Most of it doesn't matter. The job is to find the few changes that genuinely do — and to do it without sounding an alarm every time the data wobbles.

That is a balance between two kinds of mistake: missing a real signal, and crying wolf. Tune only for sensitivity and you bury people in noise; tune only for precision and you miss the thing they needed to see. We build for that trade-off on purpose, and we accept that the right setting depends on what a given decision can tolerate.

We don't believe there is a universal answer here. There is careful work, measured against reality, and repeated. That is the part we think is worth doing well.

On getting sources to agree.

One source is a claim. Two that were gathered independently and still agree is closer to a fact. A good deal of our work is deciding, automatically, when several noisy, mismatched sources are really saying the same thing — and when they only appear to.

The failure mode we worry about most is a confident answer quietly assembled from sources that contradict each other. So when they disagree, our default is to surface the disagreement rather than average it into a tidy number. A clean answer that hides a split is worse than an honest “these don't line up yet.”

This is unfinished. Reconciling entities and claims across sources that were never meant to match is genuinely hard, and we treat each case it gets wrong as the next thing to improve.

On telling change from noise.

A snapshot is easy. The value is in what moved — and most of what looks like movement is the data breathing, not the world changing. Re-reading the same landscape over time, and separating real change from that background wobble, is its own problem.

Get it wrong in one direction and you miss the entrant, the shift, the thing someone needed to know. Wrong in the other and you cry “it changed” so often that no one listens. We calibrate against what actually happened, market by market, and re-calibrate when reality tells us we were off.

We don't think there's a setting that's right everywhere. There's measurement, honesty about the misses, and another pass. That's the work.

Back to home