Proposed
Links to 0007
Date: 2026-05-11
Deciders: Engineering team
Having chosen Azure AI Agent Service as the orchestration platform (ADR-0007), the next decision is how many agents to define and where their responsibilities begin and end.
This maps directly to the service boundary question in distributed systems. The distributed transactions video makes the trade-off explicit for both extremes:
Too few (one agent / monolith): > “When you’re first building an application, transactions are easy… wrap the whole thing in a transaction.” A single agent with a compound system prompt handles all roles. Simple to start, but coupling means the interviewer tone, scoring rubric, and sales pitch compete in one context window — and any change to one affects all.
Too many (five or more agents): > “Once you get to five or six services all publishing and reacting to each other’s events, figuring out the current state of any given transaction becomes really difficult. Where exactly did it fail? Which compensating actions have already run?” Fine-grained agents (separate knowledge retrieval, scoring sub-agent, resource catalogue agent, personalisation agent…) maximise isolation but create an observability problem. Each handoff is a potential failure point requiring its own compensating action.
Three agents: > “Choreography is usually where teams start and for simple flows it works great. If you have three or four steps, the services are truly independent… orchestration keeps things simple.” Three clearly-bounded agents with sequential handoffs and one orchestrator matches the pattern the video endorses as the industry default for flows of this complexity.
Additionally, EU AI Act Article 14 requires human oversight to be feasible. More agents = more handoff boundaries to gate; fewer agents = oversight points are clear and auditable.
| Dimension | Assessment |
|---|---|
| Context window | All conversation history + interview + exam + sales in one 4,096-token context (ADR-0004 constraint) — will overflow for longer sessions |
| Role isolation | None — interviewer tone bleeds into examiner scoring |
| Observability | Trivial — one thread, one log |
| EU AI Act oversight | Human can review single output but cannot intervene at phase boundary |
| Compensating action | Restart whole session on failure |
| NIS2 audit | Single log per session — simple but coarse |
| Dimension | Assessment |
|---|---|
| Interviewer | Conducts the practice session; adversarial tone; no scoring vocabulary |
| Examiner | Receives obfuscated transcript; scores against rubric; produces structured result |
| Aftersales | Triggered only on score below threshold; recommends learning resources; never sees raw conversation |
| Context window | Each agent receives only its own relevant context — avoids the 4,096-token ceiling per turn |
| Role isolation | Clean — each agent has one system prompt and one responsibility |
| Observability | Orchestrator knows exactly which agent is in flight (ADR-0007) |
| EU AI Act oversight | Human gate at Examiner → Aftersales boundary; Aftersales trigger threshold is auditable |
| Compensating action | Per-agent: retry interview turn; re-run examiner with stored transcript; skip aftersales on failure |
| NIS2 audit | Per-agent turn log; PII obfuscated before Examiner receives transcript |
| Aftersales isolation | Agent never receives the candidate’s name or raw session ID — only topic gaps |
Hypothetical split: KnowledgeRetriever, QuestionSelector, Interviewer, Scorer, ResourceRecommender, PersonalisationEngine.
| Dimension | Assessment |
|---|---|
| Isolation | Maximum |
| Orchestration complexity | High — six failure points, six compensating actions |
| Observability | Requires distributed tracing across all agent threads |
| EU AI Act | Multiple models = multiple transparency obligations (Article 52 per model) |
| Build cost | Disproportionate for current team size |
| Video verdict | “Figuring out the current state becomes really difficult… digging through logs across a dozen different services” |
The video’s conclusion applies directly: teams start with choreography (or fine-grained decomposition), then converge on orchestration with the right granularity. Over-decomposing from day one is the anti-pattern.
Three agents: Interviewer, Examiner, Aftersales, orchestrated sequentially by Azure AI Agent Service.
Interviewer - Conducts the mock interview for the selected topic (topicId from ADR-0004) - Adversarial but fair tone; does not score or evaluate - Terminates when the candidate submits or the session expires - Output: raw conversation transcript (stored in Azure thread)
Examiner - Receives the transcript with all
candidate names replaced by [CANDIDATE] (EU AI Act data
minimisation, NIS2 breach scope reduction) - Scores against a structured
rubric per topic; produces a JSON result with per-concept scores -
Output: structured score object (not the raw transcript — scope
isolation) - Human-in-the-loop gate: examiner output is reviewable
before Aftersales fires (configurable, default on in production)
Aftersales - Triggered only when overall score is below configurable threshold (default: < 7/10) - Receives only topic identifier and concept gap list — no conversation content, no candidate name - Recommends learning resources from a curated catalogue - Regulatory note: recommendations must not be based on inferred personal characteristics (EU AI Act Article 5 prohibited practices); input is topic gaps only
| Step | Failure | Compensating action |
|---|---|---|
| Interviewer | Timeout / model error | Retry current turn; full session restart preserves topic context |
| Examiner | Failure after transcript stored | Re-run examiner with stored transcript (idempotent — same input, same score) |
| Aftersales | Failure | Skip silently; user is not aware aftersales fired — no visible inconsistency |
The Aftersales agent is the natural point for a “soft” compensating action: unlike a card charge, a missed resource recommendation leaves no inconsistent state visible to the user.
TopicData.cs server-side prompt builder
(ADR-0002) is repurposed as the Interviewer system prompt source;
Examiner and Aftersales get separate prompt templates