About Source Originality Score
A sibling experiment to Source Bias Atlas. The Atlas asks: what kind of stories does each source like to publish? This one asks a simpler, sharper question: when N sources cover the same story, who got there first?
What the score measures
For each source we look at every multi-source topic it appeared in and ask whether it was the scooper (position 0 in the topic) or an echo. The headline number is the weighted_originality: a Bayesian-shrunk scoop rate that pulls sources with few samples toward a neutral baseline so that one lucky scoop doesn't crown an unknown blog the king of breaking news.
- scoops: topics where this source posted first
- echoes: topics where it was not first
- median_lag_hours: when echoing, the median number of hours it trailed the first post
- coverage_rate: the fraction of multi-source topics this source touched
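The metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the input shape (each topic as a list of `(source, posted_at_hours)` pairs), the function name `source_stats`, the `prior_strength` of 10 pseudo-observations, and the choice of the pooled scoop rate as the "neutral baseline" are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def source_stats(topics, prior_strength=10.0):
    """Per-source stats from topics given as lists of (source, posted_at_hours).

    Hypothetical helper: input shape and prior_strength are illustrative only.
    """
    # Only topics covered by at least two distinct sources count.
    multi = [t for t in topics if len({src for src, _ in t}) >= 2]

    scoops, echoes, lags = defaultdict(int), defaultdict(int), defaultdict(list)
    for topic in multi:
        ordered = sorted(topic, key=lambda post: post[1])
        first_source, first_time = ordered[0]  # position 0 is the scooper
        scoops[first_source] += 1
        seen = {first_source}
        for source, posted_at in ordered[1:]:
            if source in seen:  # count each source once per topic
                continue
            seen.add(source)
            echoes[source] += 1
            lags[source].append(posted_at - first_time)

    sources = set(scoops) | set(echoes)
    total_scoops = sum(scoops.values())
    total_appearances = total_scoops + sum(echoes.values())
    # Assumed neutral baseline: the pooled scoop rate across all sources.
    baseline = total_scoops / total_appearances if total_appearances else 0.0

    stats = {}
    for source in sources:
        n = scoops[source] + echoes[source]
        stats[source] = {
            "scoops": scoops[source],
            "echoes": echoes[source],
            "median_lag_hours": median(lags[source]) if lags[source] else None,
            "coverage_rate": n / len(multi),
            # Shrinkage: blend the raw rate with the baseline, weighted by sample size.
            "weighted_originality": (scoops[source] + prior_strength * baseline)
            / (n + prior_strength),
        }
    return stats
```

With few appearances the prior term dominates and the score stays near the baseline; as a source accumulates topics, its own scoop rate takes over. A larger `prior_strength` makes the leaderboard more conservative about small sources.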
Two ways we cluster posts into topics
- URL match. When two daily.dev posts point at the same canonical URL (or a small permutation of one), we know with high confidence they're covering the same story. The strongest signal we have, and the one we trust most.
- Title similarity. When URLs differ (every source rewrites the same press release), we fall back to a title-similarity threshold tuned via manual review of scoop_examples.md. Less precise than URL match, but catches the "every blog wrote up the same OpenAI announcement" case.
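The two-stage clustering can be illustrated roughly as follows. This sketch deliberately swaps in the standard library (`urllib.parse` and `difflib.SequenceMatcher`) as stand-ins for url-normalize and rapidfuzz so that it runs without extra dependencies. The canonicalisation rules, the `threshold` of 0.8, the greedy assignment, and the function names are all assumptions for illustration; the real threshold is whatever the manual review settled on.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Rough canonical form: https scheme, no www, no utm_* params, no trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")
    )
    return urlunsplit(("https", host, parts.path.rstrip("/"), urlencode(kept), ""))


def title_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for a rapidfuzz scorer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_posts(posts, threshold=0.8):
    """Greedily group posts (dicts with 'url' and 'title') into topics.

    Returns a list of clusters, each a list of indices into `posts`.
    """
    clusters = []
    for i, post in enumerate(posts):
        url = canonical_url(post["url"])
        target = None
        # Pass 1: exact canonical-URL match, the strongest signal.
        for cluster in clusters:
            if any(canonical_url(posts[j]["url"]) == url for j in cluster):
                target = cluster
                break
        # Pass 2: fall back to title similarity against any member.
        if target is None:
            for cluster in clusters:
                if any(
                    title_similarity(post["title"], posts[j]["title"]) >= threshold
                    for j in cluster
                ):
                    target = cluster
                    break
        if target is None:
            clusters.append([i])
        else:
            target.append(i)
    return clusters
```

Checking URLs before titles mirrors the trust ordering described above: a title match is only consulted when no URL match exists. The greedy pass is order-dependent and quadratic, which is acceptable for a sketch but would likely need blocking or indexing on a larger feed.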
Built with
- Next.js 14 (static export, no runtime API routes)
- Recharts for the per-source scatter
- Tailwind for the dark UI
- Python pipeline: pandas, rapidfuzz for title clustering, url-normalize for canonicalisation
- daily.dev's public source-feed API for the underlying data
Caveats
- Same-story detection is imperfect. Some clusters will be false positives; some real-world echoes will not get clustered.
- Lag is measured against the first post on daily.dev, not the original publishing time. A source that publishes elsewhere first but submits to daily.dev late will look like an echo here.
- Recent stories are favored — older posts may have echoes outside our window.
- "Originality" here is operational: first to a story. It says nothing about analysis quality, depth, or whether a take is good. A late, thoughtful piece may matter more than a fast, shallow one.
- Snapshot in time, not live — regenerated on demand from a single JSON artifact.
Built for
The daily.dev 72-hour Public API hackathon, 2026. No auth, no backend — the entire site is a static export driven by a single public/originality.json snapshot produced by the OS-1 pipeline. To swap in fresh data, drop a new file in.