About the Source Bias Atlas

A 72-hour public-API hackathon project, built on top of daily.dev's public source feeds.

What is this?

The Source Bias Atlas is an interactive map of daily.dev's content sources, clustered by stylistic personality. Each dot is a source; sources that publish in similar styles — comparable hype, cadence, depth, topical focus — sit near each other. The clusters are discovered automatically from the data, not curated.

The goal is to make the underlying "texture" of your feed visible: which sources lean clickbait-y, which are firehose news, which are deep technical longreads, which spark the most discussion.

The data

A source is one daily.dev feed surfaced via /feeds/source/{handle}. For each source we collected recent posts (titles, summaries, tags, read-time, upvote and comment counts, post type, dates) and aggregated them into 20 numerical features.

  • Sources with fewer than ~10 recent posts in our sample are excluded for stability.
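
As a rough illustration of the aggregation step, here is a minimal sketch that groups post records by source, drops thin samples, and reduces each group to a handful of features. The record fields (source, date, read_time, upvotes, comments), the function name and the four features shown are assumptions made for this example. They are not the atlas pipeline's actual schema, and the real pipeline computes 20 features.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

MIN_POSTS = 10  # assumed cutoff, mirroring the "~10 recent posts" rule above


def aggregate_sources(posts, min_posts=MIN_POSTS):
    """Group post records by source and reduce each group to a few features."""
    by_source = defaultdict(list)
    for post in posts:
        by_source[post["source"]].append(post)

    features = {}
    for handle, items in by_source.items():
        if len(items) < min_posts:
            continue  # too thin a sample to characterize reliably
        days = [p["date"] for p in items]
        # Clamp the window to at least one week so bursts do not blow up the rate.
        span_weeks = max((max(days) - min(days)).days / 7, 1.0)
        features[handle] = {
            "posts_per_week": len(items) / span_weeks,
            "median_read_time": median(p["read_time"] for p in items),
            "mean_upvotes": mean(p["upvotes"] for p in items),
            "mean_comments": mean(p["comments"] for p in items),
        }
    return features


# Hypothetical sample: "alpha" has 14 posts, "beta" only one.
posts = [
    {"source": "alpha", "date": date(2026, 1, 1 + i), "read_time": 5,
     "upvotes": 10 * i, "comments": i}
    for i in range(14)
] + [
    {"source": "beta", "date": date(2026, 1, 1), "read_time": 3,
     "upvotes": 1, "comments": 0}
]
features = aggregate_sources(posts)
```

In this sample only "alpha" survives the cutoff; "beta" is dropped because a single post says very little about a source's style.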

How sources are characterized

Each source becomes a 20-dimensional feature vector. Long-tailed counts (upvotes, comments, posts/week) are log1p-transformed first; all features are then z-scored before clustering. Below is the full feature list with definitions, lifted from the atlas pipeline so it always stays in sync with what the atlas actually shows.

Feature | Description | Range (min – max) | Polarity

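
The preprocessing described above (log1p on long-tailed counts, then z-scoring every feature) can be sketched in a few lines of standard-library Python. The column names and the choice of population standard deviation are assumptions for illustration; the pipeline itself may use a library scaler instead.

```python
import math
from statistics import mean, pstdev

# Hypothetical names for the long-tailed count features.
LONG_TAILED = {"mean_upvotes", "mean_comments", "posts_per_week"}


def standardize(rows, long_tailed=LONG_TAILED):
    """Apply log1p to long-tailed columns, then z-score every column."""
    names = list(rows[0])
    columns = {}
    for name in names:
        values = [row[name] for row in rows]
        if name in long_tailed:
            values = [math.log1p(v) for v in values]
        mu, sigma = mean(values), pstdev(values)
        # A constant column carries no signal; map it to zeros instead of dividing by 0.
        columns[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return [{name: columns[name][i] for name in names} for i in range(len(rows))]


rows = [
    {"mean_upvotes": 3.0, "median_read_time": 4.0},
    {"mean_upvotes": 40.0, "median_read_time": 6.0},
    {"mean_upvotes": 900.0, "median_read_time": 8.0},
]
z = standardize(rows)
```

After this step every column has mean 0 and unit variance, so a source with 900 upvotes per post does not swamp the distance calculation just because its raw numbers are large.
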
A note on hype_score, since people ask

hype_score is the fraction of titles that either contain a curated hype-lexicon phrase (e.g. revolutionary, game-changing, you won't believe, insane, shocking) or end with one or more exclamation marks. Higher = more clickbait energy.

Source of truth: features/features/title_style.py. It's a heuristic, not an LLM judgment — easy to audit, easy to argue with.
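
For readers who want to poke at the idea, here is a minimal sketch of the heuristic as described above. It is not the code in title_style.py: the lexicon below contains only the five example phrases quoted in this section, and the function names and regexes are assumptions.

```python
import re

# Illustrative lexicon: just the examples quoted above, not the pipeline's full list.
HYPE_PHRASES = ["revolutionary", "game-changing", "you won't believe", "insane", "shocking"]
HYPE_RE = re.compile(
    "|".join(r"\b" + re.escape(p) + r"\b" for p in HYPE_PHRASES),
    re.IGNORECASE,
)
ENDS_WITH_BANG = re.compile(r"!+\s*$")  # one or more "!" at the very end


def is_hype(title):
    """True if the title contains a lexicon phrase or ends with exclamation marks."""
    return bool(HYPE_RE.search(title) or ENDS_WITH_BANG.search(title))


def hype_score(titles):
    """Fraction of titles flagged as hype; 0.0 for an empty list."""
    if not titles:
        return 0.0
    return sum(is_hype(t) for t in titles) / len(titles)


titles = [
    "This revolutionary framework changes everything",  # lexicon hit
    "We shipped v2 today!",                              # trailing "!"
    "Understanding B-tree page splits",                  # neither
    "Profiling Rust async runtimes",                     # neither
]
score = hype_score(titles)
```

Two of the four sample titles are flagged, so the score for this made-up source is 0.5. An exclamation mark in the middle of a title does not count under this sketch, only one at the end.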

How sources are clustered

Clustering: K-means on the z-scored feature matrix. The number of clusters is chosen to balance silhouette score against interpretability (typically 6–8). Each cluster's label is then generated from the dominant features of its centroid.
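
A minimal sketch of that clustering step with scikit-learn, run on synthetic data. The feature names, the candidate range for k, the fixed seed and the "top two features" labeling rule are all assumptions for illustration. The interpretability half of the trade-off is a human judgment and is not modeled here; this sketch picks k by silhouette alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical subset of feature names, used only to build readable labels.
FEATURES = ["hype_score", "posts_per_week", "median_read_time", "mean_comments"]

# Synthetic stand-in for the real feature matrix: 3 well-separated groups of sources.
centers = np.array([[0, 0, 0, 0], [10, 10, 10, 10], [-10, 10, -10, 10]], dtype=float)
X_raw, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)
X = StandardScaler().fit_transform(X_raw)  # z-score each feature


def pick_k(X, candidates):
    """Return (best_k, scores), scoring each k by mean silhouette coefficient."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


best_k, scores = pick_k(X, range(2, 7))
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)


def label_cluster(centroid, names, top=2):
    """Name a cluster after the features where its centroid sits farthest from 0."""
    order = np.argsort(-np.abs(centroid))[:top]
    return ", ".join(("high " if centroid[i] > 0 else "low ") + names[i] for i in order)


cluster_labels = [label_cluster(c, FEATURES) for c in model.cluster_centers_]
```

Because the features are z-scored, a centroid coordinate far from zero means the cluster is unusual on that feature, which is why the largest absolute coordinates make reasonable label material.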

2D layout: UMAP (n_neighbors and min_dist tuned for visual separation) over the same feature matrix. Random state is fixed so the layout is reproducible across runs.

Limitations

  • Snapshot in time — not live; regenerated on demand.
  • Skews toward sources with at least ~10 recent posts in our sample window.
  • User-created Squads are excluded by default; toggle them on in the atlas to show them. They often have very thin samples and would mostly add noise.
  • Heuristic features only. No LLM is used in v1; everything is regex, counts, ratios and a small curated lexicon. That keeps it auditable.
  • Engagement is post-level; we don't know who upvoted or why.

Built with

  • Next.js 14 (static export, no runtime API routes)
  • deck.gl for the WebGL scatter map
  • Recharts for the per-source feature radar
  • Tailwind for the dark UI
  • Python: scikit-learn (K-means), umap-learn (2D layout), pandas
  • daily.dev's public source feed API for the underlying data

Sibling project

Source Originality Score → asks the sharper question: when N daily.dev sources cover the same story, who got there first? Same hackathon, same data ethos, different lens.

What's next / known limits

  • Feature definitions are heuristic, not LLM-derived. Easy to argue with — that's the point.
  • Cluster labels are auto-generated from dominant features; some are sharper than others.
  • UMAP is non-linear: re-running with a different seed or on a new snapshot may rotate or flip the layout. Relative neighborhoods and clusters stay broadly stable; absolute axis positions do not.
  • Engagement is post-level. We can't see who upvoted or why.
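
To see why a rotated layout is still the same map, here is a small standard-library sketch: rotating every point changes all the coordinates but leaves every pairwise distance untouched. The four positions are made up, and this covers only the pure-rotation case; a real UMAP re-run can also shift distances somewhat.

```python
import math
from itertools import combinations


def rotate(points, degrees):
    """Rotate 2D points counter-clockwise around the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


layout = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0), (4.5, 3.5)]  # made-up 2D positions
rotated = rotate(layout, 90)

same_distances = all(
    math.isclose(d1, d2, abs_tol=1e-9)
    for d1, d2 in zip(pairwise(layout), pairwise(rotated))
)
```

Here same_distances is True even though no point except the origin kept its coordinates, which is the sense in which "who sits near whom" matters more than where the axes point.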

Built for

The daily.dev 72-hour Public API hackathon, 2026. No auth, no backend — the entire site is a static export driven by a small set of pre-built JSON artifacts.
