Status: Accepted
Links to 0006
Date: 2026-05-11
Deciders: Engineering team
The application needs to call NVIDIA NIM on behalf of the user. Two approaches are possible: the platform holds one API key shared across all users, or each user supplies their own key. The choice has significant implications for cost, security, privacy, and operational complexity.

Option 1: Platform-held shared key

| Dimension | Assessment |
|---|---|
| User experience | Seamless — no setup required |
| Cost | Platform operator pays all inference costs |
| Abuse risk | High — a single key can be exhausted by a small number of abusive users |
| Key rotation | Requires redeployment or secret update affecting all users |
| Rate limiting | Shared 40 req/min cap across all concurrent users |
| Liability | Operator responsible for all API usage, including misuse |
| Key exposure | Secret must be stored securely in backend config — one breach affects everyone |

Option 2: User-supplied key

| Dimension | Assessment |
|---|---|
| User experience | One-time key entry per session |
| Cost | Zero to operator — each user pays for their own usage |
| Abuse risk | None to operator — each user is rate-limited by their own key |
| Rate limiting | Per-user quota — no contention between users |
| Liability | User is responsible for their own key usage |
| Key exposure | Key held in the backend memory cache only while the session is active (30-minute sliding expiration); never logged, never persisted to disk or database |
| Privacy | User controls their own inference usage and data sent to NVIDIA |
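
One way to back the "never logged" guarantee is to wrap the key in a type whose string form is redacted, so an accidental `log(key)` or f-string prints a placeholder rather than the secret. The sketch below is a hypothetical Python illustration (`ApiKey` and `reveal` are invented names, not part of the actual backend); in the .NET service the analogous move would be overriding `ToString()`.

```python
class ApiKey:
    """Hypothetical wrapper for a user-supplied key; its str/repr never expose the value."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        if not value:
            raise ValueError("empty API key")
        self._value = value

    def reveal(self) -> str:
        # Only call this at the point where the outbound NIM request is built.
        return self._value

    def __repr__(self) -> str:
        # Logging, %-formatting, and f-strings all end up here, so the secret stays out.
        return "ApiKey(****)"

    __str__ = __repr__
```

With this wrapper, `logging.info("created session with %s", key)` emits `ApiKey(****)`. Only `reveal()` returns the raw value, which keeps the call sites that touch the secret easy to audit.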
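Decision: use user-supplied API keys. Each user provides their own NVIDIA NIM API key, and the platform holds no shared key.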
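Beyond cost, the security rationale is decisive: there is no single shared secret whose compromise affects every user, and each user's key lives only in server memory for the life of their session, never logged and never persisted.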
The UX trade-off (one-time key entry per session) is acceptable given the tool’s target audience — engineers who already hold NVIDIA NIM accounts as part of their AI development workflow.
The user's key is submitted once via `POST /api/session`, stored in `IMemoryCache` with a 30-minute sliding expiration, and resolved server-side for all subsequent requests by `sessionId` only.
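
The session flow above can be sketched as follows. This is a language-neutral Python illustration, not the actual ASP.NET Core code: `SessionKeyStore`, `create_session`, `resolve`, and `outbound_headers` are hypothetical names, and the dictionary plus last-access timestamp stands in for `IMemoryCache` with a sliding expiration. The `Authorization: Bearer` header assumes the OpenAI-compatible scheme used by NVIDIA's hosted NIM endpoints.

```python
import secrets
import time
from typing import Callable, Optional

SLIDING_EXPIRATION_SECONDS = 30 * 60  # mirrors the 30-minute sliding window


class SessionKeyStore:
    """Hypothetical in-memory sessionId -> API key map (stand-in for IMemoryCache).

    Nothing is written to disk or a database; entries vanish on expiry or restart.
    """

    def __init__(self, ttl_seconds: float = SLIDING_EXPIRATION_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # session_id -> (api_key, time of last successful access)
        self._entries: dict[str, tuple[str, float]] = {}

    def create_session(self, api_key: str) -> str:
        # Counterpart of POST /api/session: accept the key once, return an opaque id.
        session_id = secrets.token_urlsafe(32)
        self._entries[session_id] = (api_key, self._clock())
        return session_id

    def resolve(self, session_id: str) -> Optional[str]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        api_key, last_access = entry
        now = self._clock()
        if now - last_access >= self._ttl:
            # Idle for the whole window: evict, so the user must re-enter the key.
            del self._entries[session_id]
            return None
        # Sliding expiration: each successful lookup resets the idle timer.
        self._entries[session_id] = (api_key, now)
        return api_key


def outbound_headers(store: SessionKeyStore, session_id: str) -> dict[str, str]:
    # The client only ever sends session_id; the key is attached server-side.
    api_key = store.resolve(session_id)
    if api_key is None:
        raise PermissionError("session expired or unknown; re-enter API key")
    # Assumed OpenAI-compatible bearer auth for the hosted NIM API.
    return {"Authorization": f"Bearer {api_key}"}
```

With the window sliding, a user who makes a request every few minutes keeps the key alive indefinitely, while 30 idle minutes evict it. Because the store lives only in process memory, a restart also drops every key, which is consistent with the "never persisted" guarantee.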