Meridian — Hard Questions

Not yet — and that's intentional at this stage. What you're looking at simulates exactly what a production integration would surface: the job reqs table mirrors what you'd pull from Greenhouse or Lever, the talent pool mirrors what you'd pull from Workday or Rippling, and the new hire roster mirrors what an HRIS onboarding trigger would provide.

The architecture is designed so that replacing the mock data files with real API calls is a configuration change, not a rebuild. In production, each table would be populated by a scheduled sync or a webhook event — a new req opens in Greenhouse and it appears in The Loop automatically; an offer is accepted in Workday and that employee appears in the Onboarding roster.

In this prototype, the candidate records, job reqs, and employee talent pool live in the application itself as static data — a stand-in for what would be real HRIS and ATS integration in production. Synthesis results and recommendations are held in browser memory for the duration of the session.

That's the honest answer about what this is: a prototype designed to show what the product should do, not a production system. Operationally, this data absolutely needs to persist. A hiring manager's synthesis needs to still be there on Thursday. Two HRBPs need to see the same candidate record. You need an audit trail of who ran what analysis and when. That requires a database layer, role-based access controls, and data residency decisions — which is exactly the kind of infrastructure this prototype is designed to inform.

The one thing that genuinely doesn't persist even in production: the text pasted into debrief fields is sent to the Anthropic API to generate the synthesis and is not stored or used for training. The API key never touches the browser — all Claude calls are proxied through a serverless function.

Right now, Meridian generates the plan — it doesn't track execution. That's a deliberate scope decision for this prototype, and it's also the honest answer about where most onboarding tools fall short: they produce a document and then disappear.

The natural next layer is a milestone check-in system: at the end of week 1, week 4, and day 60, the new hire and manager each answer 3–5 questions, and Meridian surfaces whether the plan is on track or needs refinement. The multi-turn conversation architecture is already in place — that loop just needs a trigger and a UI surface.

That's the right challenge. The current model takes HR's read of the employee as the input, which is one-sided. In practice, the manager's perspective — and ideally the employee's own stated interests — are both missing.

The full version of this would have three inputs: HR's assessment, a structured manager input (a short form asking about performance, readiness signals, and any concerns), and optionally a self-assessment from the employee. Meridian would then synthesize all three and surface where they agree, where they diverge, and what the divergence means for the mobility recommendation.

The divergence case is actually the most valuable: if HR thinks someone is ready and the manager doesn't, or if an employee is interested in a move the manager hasn't flagged, that's exactly the kind of blind spot that causes preventable attrition.

Fair point — bias detection is built into The Loop for interview feedback, but the mobility engine doesn't have the same safeguard. In production, the same bias-flagging logic should apply: if the model's rationale for a low match score relies on tenure, communication style, or other proxies that correlate with protected characteristics, that should be surfaced.

This is on the roadmap and architecturally straightforward — the same system prompt pattern that flags bias in interview feedback can be applied to mobility reasoning.

The bias detection catches the language, not the intent. If a manager writes "she struggled with technical questions" (behavioral, specific), that passes — even if the underlying judgment was biased. The tool surfaces non-competency language and vague subjective claims, but it can't audit whether a specific behavioral claim is true.

This is why the calibration questions matter as much as the bias flags — they're designed to force the debrief conversation to interrogate the specific evidence behind each claim, not just the language it's wrapped in.

Three layers of work:

Data integration — connect the tables to your actual HRIS, ATS, and internal systems. The mock data is a clean spec for what those integrations need to return.
Auth and access control — wrap the app in SSO, define role-based permissions (recruiter vs. HRBP vs. hiring manager vs. executive), and ensure candidates can only be seen by people with a legitimate need.
Persistence and audit trail — store syntheses, track who ran what analysis on whom, and maintain an audit log for compliance.

The core AI logic — the prompts, the schema enforcement, the bias detection vocabulary — is production-grade today.

Two reasons. First, structured output reliability — Claude is exceptionally consistent at returning valid JSON that matches a schema, which is what makes every output in Meridian renderable as a real UI rather than raw text. Second, instruction-following in sensitive domains — the bias detection prompt requires the model to apply a specific vocabulary and flag specific language patterns without over-flagging. Claude handles that nuance better than the alternatives for this use case.

At demo/prototype scale, essentially nothing — a few cents per synthesis. At enterprise scale with hundreds of debriefs per week, you're looking at $50–200/month in API costs depending on volume, which is trivial relative to the cost of a bad hire or an avoidable attrition event.

The Hard Questions