0008-agent-count-and-boundaries.md

8. Agent Count and Boundaries — Three Specialised Agents

Status

Proposed

Links to 0007

Date: 2026-05-11
Deciders: Engineering team

Context

Having chosen Azure AI Agent Service as the orchestration platform (ADR-0007), the next decision is how many agents to define and where their responsibilities begin and end.

This maps directly to the service boundary question in distributed systems. The distributed transactions video makes the trade-off explicit for both extremes:

Too few (one agent / monolith): > “When you’re first building an application, transactions are easy… wrap the whole thing in a transaction.” A single agent with a compound system prompt handles all roles. Simple to start, but coupling means the interviewer tone, scoring rubric, and sales pitch compete in one context window — and any change to one affects all.

Too many (five or more agents): > “Once you get to five or six services all publishing and reacting to each other’s events, figuring out the current state of any given transaction becomes really difficult. Where exactly did it fail? Which compensating actions have already run?” Fine-grained agents (separate knowledge retrieval, scoring sub-agent, resource catalogue agent, personalisation agent…) maximise isolation but create an observability problem. Each handoff is a potential failure point requiring its own compensating action.

Three agents: > “Choreography is usually where teams start and for simple flows it works great. If you have three or four steps, the services are truly independent… orchestration keeps things simple.” Three clearly-bounded agents with sequential handoffs and one orchestrator matches the pattern the video endorses as the industry default for flows of this complexity.

Additionally, EU AI Act Article 14 requires human oversight to be feasible. More agents = more handoff boundaries to gate; fewer agents = oversight points are clear and auditable.

Options Considered

Option A: Single agent (one compound system prompt)

Dimension Assessment
Context window All conversation history + interview + exam + sales in one 4,096-token context (ADR-0004 constraint) — will overflow for longer sessions
Role isolation None — interviewer tone bleeds into examiner scoring
Observability Trivial — one thread, one log
EU AI Act oversight Human can review single output but cannot intervene at phase boundary
Compensating action Restart whole session on failure
NIS2 audit Single log per session — simple but coarse

Option B: Three agents — Interviewer, Examiner, Aftersales ✅

Dimension Assessment
Interviewer Conducts the practice session; adversarial tone; no scoring vocabulary
Examiner Receives obfuscated transcript; scores against rubric; produces structured result
Aftersales Triggered only on score below threshold; recommends learning resources; never sees raw conversation
Context window Each agent receives only its own relevant context — avoids the 4,096-token ceiling per turn
Role isolation Clean — each agent has one system prompt and one responsibility
Observability Orchestrator knows exactly which agent is in flight (ADR-0007)
EU AI Act oversight Human gate at Examiner → Aftersales boundary; Aftersales trigger threshold is auditable
Compensating action Per-agent: retry interview turn; re-run examiner with stored transcript; skip aftersales on failure
NIS2 audit Per-agent turn log; PII obfuscated before Examiner receives transcript
Aftersales isolation Agent never receives the candidate’s name or raw session ID — only topic gaps

Option C: Five or more agents (granular decomposition)

Hypothetical split: KnowledgeRetriever, QuestionSelector, Interviewer, Scorer, ResourceRecommender, PersonalisationEngine.

Dimension Assessment
Isolation Maximum
Orchestration complexity High — six failure points, six compensating actions
Observability Requires distributed tracing across all agent threads
EU AI Act Multiple models = multiple transparency obligations (Article 52 per model)
Build cost Disproportionate for current team size
Video verdict “Figuring out the current state becomes really difficult… digging through logs across a dozen different services”

The video’s conclusion applies directly: teams start with choreography (or fine-grained decomposition), then converge on orchestration with the right granularity. Over-decomposing from day one is the anti-pattern.

Decision

Three agents: Interviewer, Examiner, Aftersales, orchestrated sequentially by Azure AI Agent Service.

Agent responsibilities

Interviewer - Conducts the mock interview for the selected topic (topicId from ADR-0004) - Adversarial but fair tone; does not score or evaluate - Terminates when the candidate submits or the session expires - Output: raw conversation transcript (stored in Azure thread)

Examiner - Receives the transcript with all candidate names replaced by [CANDIDATE] (EU AI Act data minimisation, NIS2 breach scope reduction) - Scores against a structured rubric per topic; produces a JSON result with per-concept scores - Output: structured score object (not the raw transcript — scope isolation) - Human-in-the-loop gate: examiner output is reviewable before Aftersales fires (configurable, default on in production)

Aftersales - Triggered only when overall score is below configurable threshold (default: < 7/10) - Receives only topic identifier and concept gap list — no conversation content, no candidate name - Recommends learning resources from a curated catalogue - Regulatory note: recommendations must not be based on inferred personal characteristics (EU AI Act Article 5 prohibited practices); input is topic gaps only

Compensating actions (saga pattern)

Step Failure Compensating action
Interviewer Timeout / model error Retry current turn; full session restart preserves topic context
Examiner Failure after transcript stored Re-run examiner with stored transcript (idempotent — same input, same score)
Aftersales Failure Skip silently; user is not aware aftersales fired — no visible inconsistency

The Aftersales agent is the natural point for a “soft” compensating action: unlike a card charge, a missed resource recommendation leaves no inconsistent state visible to the user.

PII handling (EU AI Act + NIS2)

Consequences