Topic Bank Input — Mockup
Stage 0 of the redesigned HDLY pipeline. Jake & Scott add candidate events; the system handles dedupe, rerun-surfacing, and selection downstream.
What this mockup shows
Glossary
- Event — the historical happening itself; the row in historical_events.
- Description — full prose with framing and context. Primary input, what Jake pastes from Wikipedia / On This Day / etc.
- Synopsis — short card-facing one-liner (~80 chars). Auto-derived from description; user-editable.
- Source URLs — array of research links (Wikipedia, Britannica, etc.). One per line in the form.
- Suggester — who originated the suggestion. Defaults to the signed-in user; overridable when entering on behalf of someone else (forwarded listener email, Lindsay-suggested topic, etc.).
- Bank — the suggestion pool: events with status='suggested', waiting for Will to pick.
- Aired catalog — releases that have already aired. Title in this context is the published episode title (release metadata), distinct from the event's synopsis.
Input-first design
- The calendar is gone. Jake's actual workflow is event-driven: "I read a thing, it happened on date Y, file it." Date is a field, not a navigator. Scott's flow is identical.
- Auth identity — the form's auth bar shows the signed-in user (mocked as Jake; production uses Google Sign-In). The suggester field defaults to identity but is overridable per entry.
- Quick-paste accepts one-liners like "May 21, 1881 — Clara Barton founds the American Red Cross" and parses out the date. Everything after the date populates Description; Synopsis auto-derives from Description's first sentence (capped ~80 chars). Discourse prefixes like "On", "In", "At" are stripped.
- Historical date is a single field accepting many formats: April 4, 1865; 1881-05-21; 5/19/1536; 404 BC 1/2; 330 CE 5/11. Validates month/day/year (leap years for CE Feb 29); rejects ambiguous 2-digit years without an era marker. Feedback appears on blur (red error / yellow ambiguity / green echo) so the user isn't nagged mid-typing.
- Live adjacency check runs as you type the Synopsis or Description, hitting the bank + aired catalog with TF-IDF token weighting (rare tokens score higher; common ones don't drown signal). Requires ≥2 distinct shared tokens to surface at all, so single-word coincidences don't fire. Three sections light up when matches are found: ⚠ Already aired, ⚠ Already in the bank, ℹ Related events. Hard duplicate flags also require year match within ±1.
- Save with similarity review. If the new entry tightly matches existing bank entries (TF-IDF ≥ 6 plus year match within ±1), Save opens a modal listing each near-duplicate stacked with the new entry. Per-row decision: 🔁 Use this — discard mine (closes modal and loads the existing entry into the form for further editing), ✓ Keep both — distinct, or 🚫 Mark existing superseded. All rows must be resolved; forced-choice pattern matches the picker's post-pick modal.
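The quick-paste behaviour described in the list above can be sketched in Python. This is a minimal illustration, not the mockup's actual parser: the function names (parse_quick_paste, derive_synopsis) are hypothetical, only the "Month D, YYYY" format is handled, and applying the "On" / "In" / "At" stripping to a prefix before the date is an assumption about where that rule applies.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Optional discourse prefix, then "Month D, YYYY", then an optional
# separator (em dash, en dash, colon, comma or hyphen).
LEADING_DATE = re.compile(
    r"^\s*(?:(?:on|in|at)\s+)?([A-Za-z]+)\s+(\d{1,2}),\s*(\d{1,4})"
    r"\s*[\u2014\u2013:,-]?\s*",
    re.IGNORECASE,
)


def derive_synopsis(description, limit=80):
    """First sentence of the description, capped at `limit` characters."""
    first = re.split(r"(?<=[.!?])\s+", description.strip(), maxsplit=1)[0]
    if len(first) <= limit:
        return first
    # Cut at a word boundary and leave room for the ellipsis character.
    cut = first[:limit - 1].rsplit(" ", 1)[0]
    return cut.rstrip(",;:") + "\u2026"


def parse_quick_paste(text):
    """Split a pasted one-liner into date, description and synopsis.

    Returns None when no valid leading date is found.
    """
    m = LEADING_DATE.match(text)
    if not m:
        return None
    month = MONTHS.get(m.group(1).lower())
    if month is None:
        return None
    try:
        when = date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:  # e.g. February 30, or year 0
        return None
    description = text[m.end():].strip()
    return {
        "date": when.isoformat(),
        "description": description,
        "synopsis": derive_synopsis(description),
    }
```

Calling parse_quick_paste on the Clara Barton example from the list yields the date 1881-05-21, with the text after the dash as both description and synopsis (it is already under 80 characters).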
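The adjacency check and the save-time duplicate rule can be illustrated with a small scoring sketch. Several details here are assumptions, since the mockup only states the thresholds: the score is taken to be the sum of smoothed IDF weights over shared tokens, the stopword list is made up, and find_adjacent and its dictionary keys are hypothetical names. The ≥2 shared tokens rule, the score threshold of 6, and the ±1 year window come from the list above.

```python
import math
import re
from collections import Counter

# Illustrative stopword list; the real one is not specified.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "and", "to", "is", "was"}


def tokenize(text):
    """Lowercased set of distinct alphanumeric tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS}


def build_idf(entries):
    """Smoothed inverse document frequency over the bank + aired entries."""
    n = len(entries)
    df = Counter()
    for entry in entries:
        df.update(tokenize(entry["text"]))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0
            for tok, count in df.items()}


def find_adjacent(new_text, new_year, entries, min_shared=2, dup_score=6.0):
    """Return matches sorted by score, each tagged 'duplicate' or 'related'."""
    idf = build_idf(entries)
    new_tokens = tokenize(new_text)
    hits = []
    for entry in entries:
        shared = new_tokens & tokenize(entry["text"])
        if len(shared) < min_shared:
            continue  # single-word coincidences never surface
        score = sum(idf[t] for t in shared)
        year_ok = abs(entry["year"] - new_year) <= 1
        kind = "duplicate" if (score >= dup_score and year_ok) else "related"
        hits.append({"text": entry["text"], "score": score, "kind": kind})
    return sorted(hits, key=lambda h: h["score"], reverse=True)
```

Because rare tokens carry more weight, two entries sharing "clara" and "barton" score higher than two sharing only common words. The absolute scores depend on the IDF formula chosen, so the threshold of 6 would need retuning against whatever weighting the real engine uses.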
Search across bank + aired catalog
- Top search bar hits both surfaces in one query — the "have we ever covered Julius Caesar?" lookup Jake does today by uncollapsing all 12 months in his Doc and cross-referencing the Release Schedule. Now it's one box. Try "Plessy", "Lincoln", "Lindbergh", "Anne Boleyn".
- Results normalized across sections: each row leads with the full event date YYYY-MM-DD (with BCE marker where appropriate), then the synopsis. Aired entries append (aired YYYY-MM-DD). Click a bank row to expand inline with description + sources + suggester meta + an Edit button. Click an aired row to open the script in a new tab.
Recent additions stream
- Right column shows the bank's recent activity — most-recent additions across all dates, ordered newest-first. Jake sees what Scott just added (and vice versa), without needing a separate handoff.
- Suggester chips filter the stream — All / Jake / Scott / Lindsay / Listener. Listener-flagged entries show a ♥ badge.
- Each card shows MMDD + year + synopsis + meta (suggester, when added) plus an Edit button that loads the entry into the form.
Editing existing entries (protection against catalog loss)
- Edit button on search results and stream cards loads the bank entry into the form. A yellow "✎ Editing existing entry" banner appears at the top of the form; save becomes UPDATE instead of INSERT.
- Self-match suppressed — the entry being edited is excluded from adjacency comparisons, so it can't flag itself as a duplicate.
- Drift detection guards against an edit silently overwriting a different event. Three signals; ANY trips the warning modal: (1) combined synopsis+description token Jaccard ≥ 0.6, (2) synopsis-only Jaccard ≥ 0.7, (3) year shift > 5 years with multi-digit changes (single-character year typos like 1845 → 1945 are allowed). The modal lists which signals tripped and offers three choices: Discard edit — keep original, Save as new entry — preserve original (fresh INSERT, original stays as-is), or Overwrite anyway.
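The three drift signals can be sketched as follows. One interpretation is assumed and should be checked against the mockup's code: the Jaccard thresholds are read as Jaccard distance (1 minus the overlap ratio), since a drift warning should fire when the edited text diverges from the original. The function names and signal labels are hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; 0.0 means identical token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def digits_changed(old_year, new_year):
    """Count digit positions that differ between two years."""
    o, n = str(abs(old_year)), str(abs(new_year))
    width = max(len(o), len(n))
    return sum(1 for x, y in zip(o.zfill(width), n.zfill(width)) if x != y)


def drift_signals(original, edited):
    """Return the names of the tripped signals; any entry means 'warn'."""
    signals = []
    old_all = original["synopsis"] + " " + original["description"]
    new_all = edited["synopsis"] + " " + edited["description"]
    # ASSUMPTION: thresholds apply to Jaccard distance, not similarity.
    if jaccard_distance(old_all, new_all) >= 0.6:
        signals.append("combined-text")
    if jaccard_distance(original["synopsis"], edited["synopsis"]) >= 0.7:
        signals.append("synopsis")
    shift = abs(edited["year"] - original["year"])
    # A one-digit change (1845 -> 1945) is treated as a typo fix.
    if shift > 5 and digits_changed(original["year"], edited["year"]) > 1:
        signals.append("year")
    return signals
```

An empty list means the edit saves as a normal UPDATE; a non-empty list would drive the modal, which shows the tripped signals alongside the three choices.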
What's no longer here (and why)
- Calendar grid. Was solving "find empty MMDDs to fill" — but that's Will's problem, surfaced in the picker. Jake doesn't think date-first when entering.
- "Suggest as rerun" button. The picker auto-surfaces past airings on the target MMDD as rerun candidates. No manual flagging needed.
- Day editor with three columns (Aired / Bank / Add). Aired context is now contextual — appears in the adjacency panel when relevant. Bank context appears in the stream and the search.
- Picked / scheduled state on the calendar. That belonged to the picker, which already shows it.
- Auth user switcher. The mock-only identity dropdown is gone — production uses Google Sign-In; we don't want impersonation affordances.
Schema implications
- historical_events has synopsis NOT NULL (short) and description NOT NULL (long), plus source_urls as a JSON array.
- superseded_by_event_id nullable FK records the "this is a duplicate" decision from the post-save similarity-review modal. Picker filters out events where superseded_by_event_id IS NOT NULL.
- TF-IDF matching is a tactical fix; the picker spec'd event_groups with kind-based filtering as the canonical similarity engine. Convergence target: both surfaces (Topic Input + Picker) consume the same engine with entity / event / embedding / llm-cluster detection layers.
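The schema points above can be tried out with an in-memory SQLite database. This is a sketch under assumptions: the id column, the default values, the helper names, and the choice of SQLite are all illustrative, and only synopsis, description, source_urls, status='suggested', and superseded_by_event_id come from this document.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE historical_events (
        id INTEGER PRIMARY KEY,                 -- assumed surrogate key
        synopsis TEXT NOT NULL,                 -- short, card-facing
        description TEXT NOT NULL,              -- long prose
        source_urls TEXT NOT NULL DEFAULT '[]', -- JSON array stored as text
        status TEXT NOT NULL DEFAULT 'suggested',
        superseded_by_event_id INTEGER REFERENCES historical_events(id)
    )
""")


def add_event(synopsis, description, urls):
    """Insert a suggested event and return its id."""
    cur = conn.execute(
        "INSERT INTO historical_events (synopsis, description, source_urls) "
        "VALUES (?, ?, ?)",
        (synopsis, description, json.dumps(urls)),
    )
    return cur.lastrowid


def mark_superseded(old_id, new_id):
    """Record the 'Mark existing superseded' decision from the modal."""
    conn.execute(
        "UPDATE historical_events SET superseded_by_event_id = ? WHERE id = ?",
        (new_id, old_id),
    )


def pickable_ids():
    """Events the picker would still see: suggested and not superseded."""
    rows = conn.execute(
        "SELECT id FROM historical_events "
        "WHERE status = 'suggested' AND superseded_by_event_id IS NULL "
        "ORDER BY id"
    )
    return [row[0] for row in rows]
```

Keeping the superseded row in place, rather than deleting it, preserves the original suggester and sources while still hiding the entry from the picker.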