Production challenges, on your machine, with your agent. Scored on how you actually work, not what you can recall.
Coding interviews still measure the wrong skill. They ask if you can solve a puzzle on a whiteboard, in a sandbox, alone, while a stranger watches. None of that resembles how software actually ships now.
The skill that matters now is invisible to the puzzle test. How clearly you direct an agent through ambiguous work. How fast you notice when it's wrong. How thoroughly you verify what it produced. Whether your commits read like an engineer wrote them or a model did.
Kodwai measures that. Real problems, your own tools, a transcript another engineer can read. The score isn't a number you memorize. It's a number you can argue with.
Sarah Chen · 43 minutes · final score 94 / 100
How cleanly the work was broken down before any code was written. The grader reads your first prompts, the file structure that resulted, the order in which decisions appeared.
How clearly you steered the agent. Vague prompts that wander cost points. Tight prompts that constrain scope earn them. A re-prompt that recovers from a bad output earns more.
Whether you trusted or tested. Tests written before code. Edge cases the agent missed. The moments you overruled a confident-but-wrong suggestion.
Independent of time. Are modules well bounded? Is error handling deliberate? Is anything left as a sharp edge for the next reader?
Commits with meaning. A submission note another engineer could open six months later and follow. Variable names that explain themselves.
Weighted 70% AI · 30% objective. Audited weekly against human review.
Fifty-plus production problems. Backend, infra, payments, realtime. Pick one, work it on your own machine, submit. New problems each week.
Backend · Infra
Redis sorted sets, sliding window counters, Express middleware. 10M req/s.
60 min

Backend · Auth
Authorization code flow, PKCE, refresh-token rotation, replay detection.
75 min

Backend · Payments
Signature verify, dedupe via idempotency keys, retries with backoff.
60 min

Backend · Caching
Consistent hashing, hot-key mitigation, write-through and read-aside paths.
90 min

Frontend · Realtime
CRDT-backed presence, conflict-free merges, sub-50ms perceived latency.
75 min

DevOps · Pipeline
Multipart upload, virus scan, resize ladder, CDN warm, signed-URL delivery.
90 min

| 01 | jamie.b | Distributed rate limiter | claude-code | 96 / 100 |
| 02 | sarah.c | OAuth refresh rotation | claude-code | 94 / 100 |
| 03 | k.tanaka | Payment webhook handler | cursor | 93 / 100 |
| 04 | alex.m | Distributed cache TTL | claude-code | 91 / 100 |
| 05 | priya.r | Image upload pipeline | claude-code | 90 / 100 |
Most interview tools grade the output and hide the process. Kodwai shows the process. Every prompt, every commit, every moment your candidate overruled the agent. Or didn't.
Custom rubrics, time limits, your own challenges. Team review on a shared dashboard. Free tier for first hires.
Set up an interview →
Free to start. Sixty-second signup. You'll pick your path on the way in.