About Source Originality Score

A sibling experiment to Source Bias Atlas. The Atlas asks what kinds of stories each source likes to publish. This one asks a simpler, sharper question: when N sources cover the same story, who got there first?

What the score measures

For each source we look at every multi-source topic it appeared in and ask whether it was the scooper (position 0 in the topic) or an echo. The headline number is the weighted_originality: a Bayesian-shrunk scoop rate that pulls sources with few samples toward a neutral baseline so that one lucky scoop doesn't crown an unknown blog the king of breaking news.
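
The shrinkage described above can be sketched as a pseudo-count estimate. This is a minimal illustration, not the pipeline's actual formula: the page does not state the baseline or how strongly it pulls, so `prior_rate` and `prior_strength` are assumed values chosen for the example.

```python
def weighted_originality(scoops: int, echoes: int,
                         prior_rate: float = 0.2,
                         prior_strength: float = 10.0) -> float:
    """Shrink a raw scoop rate toward a neutral baseline.

    Equivalent to the posterior mean of a Beta-Binomial model with
    `prior_strength` pseudo-observations at `prior_rate`.
    Both defaults are assumptions made for this sketch.
    """
    if scoops < 0 or echoes < 0:
        raise ValueError("counts must be non-negative")
    n = scoops + echoes
    return (scoops + prior_rate * prior_strength) / (n + prior_strength)


# One lucky scoop: raw rate is 1.0, shrunk rate stays near the baseline.
lucky_blog = weighted_originality(scoops=1, echoes=0)      # 3 / 11
# Many samples: the data dominates the prior.
established = weighted_originality(scoops=40, echoes=60)   # 42 / 110
```

With these assumed settings, the single-scoop source scores below the source with 40 scoops in 100 topics, even though its raw scoop rate is higher.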

scoops — topics where this source posted first
echoes — topics where it was not first
median_lag_hours — when echoing, how many hours behind the first post is it, at the median?
coverage_rate — what fraction of multi-source topics did this source touch?
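
As a rough sketch of how the four fields above could be derived from clustered topics: the input shape here (each topic as a list of `(source, posted_at)` pairs) and the function name `source_metrics` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def source_metrics(topics):
    """topics: list of topics, each a list of (source, posted_at) pairs."""
    scoops = defaultdict(int)
    echoes = defaultdict(int)
    lags = defaultdict(list)
    multi_source_topics = 0

    for posts in topics:
        # Keep each source's earliest post within the topic.
        first_seen = {}
        for source, posted_at in posts:
            if source not in first_seen or posted_at < first_seen[source]:
                first_seen[source] = posted_at
        if len(first_seen) < 2:
            continue  # single-source topics say nothing about who was first
        multi_source_topics += 1

        ordered = sorted(first_seen.items(), key=lambda item: item[1])
        scooper, scoop_time = ordered[0]  # position 0 is the scooper
        scoops[scooper] += 1
        for source, posted_at in ordered[1:]:
            echoes[source] += 1
            lags[source].append((posted_at - scoop_time).total_seconds() / 3600)

    metrics = {}
    for source in set(scoops) | set(echoes):
        appearances = scoops[source] + echoes[source]
        metrics[source] = {
            "scoops": scoops[source],
            "echoes": echoes[source],
            "median_lag_hours": median(lags[source]) if lags[source] else None,
            "coverage_rate": appearances / multi_source_topics,
        }
    return metrics


start = datetime(2026, 1, 1)
at = lambda hours: start + timedelta(hours=hours)
metrics = source_metrics([
    [("a", at(0)), ("b", at(2)), ("c", at(6))],
    [("b", at(0)), ("a", at(4))],
    [("c", at(0))],  # single-source topic, ignored
])
```

In this toy input, source `c` never posts first and appears in one of the two multi-source topics, so it ends up with zero scoops and a coverage rate of 0.5.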

Two ways we cluster posts into topics

URL match. When two daily.dev posts point at the same canonical URL (or a minor variant of one), we know with high confidence they're covering the same story. This is the strongest signal we have, and the one we trust most.
Title similarity. When URLs differ — every source rewrites the same press release — we fall back to a title-similarity threshold tuned via manual review of scoop_examples.md. Less precise than URL match, but catches the "every blog wrote up the same OpenAI announcement" case.
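
The two signals above can be sketched with the standard library alone. To keep the example dependency-free, `difflib.SequenceMatcher` stands in for rapidfuzz and a hand-rolled `canonical_url` stands in for url-normalize; the 0.8 threshold, the `utm_` filter, and the greedy clustering loop are all assumptions for illustration, not the pipeline's tuned values.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumed: query params that don't change the story


def canonical_url(url: str) -> str:
    """Collapse trivial URL variants (scheme, www, trailing slash, tracking params)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    query = urlencode(sorted(
        (key, value) for key, value in parse_qsl(parts.query)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, query, ""))


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and collapsing whitespace."""
    normalise = lambda text: " ".join(text.lower().split())
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def same_story(post_a: dict, post_b: dict, threshold: float = 0.8) -> bool:
    # URL match is the stronger signal, so it is checked first.
    if canonical_url(post_a["url"]) == canonical_url(post_b["url"]):
        return True
    return title_similarity(post_a["title"], post_b["title"]) >= threshold


def cluster_posts(posts: list, threshold: float = 0.8) -> list:
    """Greedy clustering: join the first cluster containing a matching post."""
    clusters = []
    for post in posts:
        for cluster in clusters:
            if any(same_story(post, other, threshold) for other in cluster):
                cluster.append(post)
                break
        else:
            clusters.append([post])
    return clusters


posts = [
    {"url": "https://example.com/launch", "title": "OpenAI announces new model"},
    {"url": "http://www.example.com/launch/?utm_source=feed", "title": "Big news from OpenAI"},
    {"url": "https://other.dev/openai", "title": "openai announces  new model"},
    {"url": "https://blog.rs/rust", "title": "Rust 2.0 released today"},
]
clusters = cluster_posts(posts)
```

The greedy loop never merges two existing clusters, so results can depend on input order; a union-find pass would remove that dependence at the cost of a little more code.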

Built with

Next.js 14 (static export, no runtime API routes)
Recharts for the per-source scatter
Tailwind for the dark UI
Python pipeline: pandas, rapidfuzz for title clustering, url-normalize for canonicalisation
daily.dev's public source-feed API for the underlying data

Caveats

Same-story detection is imperfect. Some clusters will be false positives; some real-world echoes will not get clustered.
Lag is measured against the first post on daily.dev, not the original publishing time. A source that publishes elsewhere first but submits to daily.dev late will look like an echo here.
Recent stories are favored — older posts may have echoes outside our window.
"Originality" here is operational: first to a story. It says nothing about analysis quality, depth, or whether a take is good. A late, thoughtful piece may matter more than a fast, shallow one.
Snapshot in time, not live — regenerated on demand from a single JSON artifact.

Built for

The daily.dev 72-hour Public API hackathon, 2026. No auth, no backend — the entire site is a static export driven by a single public/originality.json snapshot produced by the OS-1 pipeline. To swap in fresh data, drop a new file in.
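
Before dropping in a new snapshot, a quick sanity check can catch a malformed file. The sketch below assumes the snapshot is a JSON list of per-source objects carrying the metric fields named on this page plus a `source` name; that layout is a guess for illustration, not the documented schema.

```python
import json
from pathlib import Path

# Metric names come from this page; "source" and the top-level list
# layout are assumptions made for this sketch.
REQUIRED_FIELDS = {
    "source", "weighted_originality", "scoops", "echoes",
    "median_lag_hours", "coverage_rate",
}


def validate_snapshot(entries) -> list:
    """Return a list of problems; an empty list means the data looks sane."""
    if not isinstance(entries, list):
        return ["top level should be a list of per-source objects"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not an object")
            continue
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
            continue
        for rate in ("weighted_originality", "coverage_rate"):
            if not 0.0 <= entry[rate] <= 1.0:
                problems.append(f"entry {index}: {rate} outside [0, 1]")
        if entry["scoops"] < 0 or entry["echoes"] < 0:
            problems.append(f"entry {index}: negative count")
    return problems


def check_file(path: str = "public/originality.json") -> list:
    """Load a snapshot from disk and validate it."""
    return validate_snapshot(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `check_file()` from the project root and refusing to deploy on a non-empty result would be one way to use it.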
