Accepted
Links to 0005
Date: 2026-05-11
Deciders: Engineering team
The chat practice feature requires a language model capable of conducting technical interviews, scoring answers, and streaming responses. Multiple providers were evaluated across cost, capability, latency, and alignment with the project’s educational mission (teaching AI engineering concepts).
| Dimension | Assessment |
|---|---|
| Capability | Very high |
| Cost | Pay-per-token — no free tier for API access |
| Vendor lock-in | High — proprietary API format |
| Relevance to AI engineering curriculum | Moderate — not a topic in the interview guide |
| Dimension | Assessment |
|---|---|
| Capability | High — same models as OpenAI |
| Cost | Pay-per-token — provisioned throughput adds cost |
| Setup | Requires Azure subscription + model deployment |
| Relevance | Low |
| Dimension | Assessment |
|---|---|
| Capability | High |
| Cost | Free tier available but limited |
| API format | Proprietary (though OpenAI-compatible mode exists) |
| Relevance | Low |
| Dimension | Assessment |
|---|---|
| Cost | Zero inference cost |
| Deployment | Requires user to run local inference — not viable for a hosted web app |
| Capability | Variable — depends on user hardware |
| Dimension | Assessment |
|---|---|
| Capability | Good for conversational Q&A and RAG — well-suited to interview simulation |
| Cost | Free tier: 1,000 inference credits, 40 req/min |
| API format | OpenAI-compatible (/v1/chat/completions) |
| Relevance | High — NVIDIA NIM is a topic in the AI engineering interview guide itself |
| SSE streaming | Full support |
NVIDIA NIM
(nvidia/nemotron-mini-4b-instruct) as the default inference
provider, called via an OpenAI-compatible API.
Two principles drove this decision:
Provider agnosticism: The backend calls a
standard /v1/chat/completions endpoint with OpenAI message
format. Switching to a different provider (OpenAI, Mistral, Groq, etc.)
requires changing only the base URL and model name — no code
changes.
Curriculum alignment: NVIDIA NIM is one of the eight core topics in the interview guide. Using it as the inference backend is intentional — candidates practise with the technology they are being tested on.
BaseAddress and Model in the backend
config.