Live · Free to start · No card required

Measure real
AI collaboration.

Production challenges, on your machine, with your agent. Scored on how you actually work, not what you can recall.

Start a challenge · Hiring? Set up interviews
Ch. I · The Premise

Coding interviews still measure the wrong skill. They ask if you can solve a puzzle on a whiteboard, in a sandbox, alone, while a stranger watches. None of that resembles how software actually ships now.

The skill that matters now is invisible to the puzzle test. How clearly you direct an agent through ambiguous work. How fast you notice when it's wrong. How thoroughly you verify what it produced. Whether your commits read like an engineer wrote them or a model did.

Kodwai measures that. Real problems, your own tools, a transcript another engineer can read. The score isn't a number you memorize. It's a number you can argue with.

Ch. II · The Demo
kodwai · sarah.chen · rate-limiter
═══════════════════════════════════════════════
AI Collaboration Score: 94 / 100
═══════════════════════════════════════════════
Problem decomposition   ████████░░ 87%
AI agent direction      █████████░ 92%
Verification & testing  ██████████ 98%
Code quality            █████████░ 93%
Communication clarity   ████████░░ 89%
═══════════════════════════════════════════════

Sarah Chen · 43 minutes · final score 94 / 100

Engineers running submissions from

Google
Meta
Apple
Microsoft
Amazon
Stripe
Netflix
Vercel
Ch. III · The Score

What the score actually measures.

  1. Problem decomposition.

    How cleanly the work was broken down before any code was written. The grader reads your first prompts, the file structure that resulted, the order in which decisions appeared.

  2. Agent direction.

    How clearly you steered the agent. Vague prompts that wander cost points. Tight prompts that constrain scope earn them. A re-prompt that recovers from a bad output earns more.

  3. Verification.

    Whether you trusted or tested. Tests written before code. Edge cases the agent missed. The moments you overruled a confident-but-wrong suggestion.

  4. Code quality.

    Independent of time. Are modules well bounded? Is error handling deliberate? Is anything left as a sharp edge for the next reader?

  5. Communication.

    Commits with meaning. A submission note another engineer could open six months later and follow. Variable names that explain themselves.

Weighted 70% AI · 30% objective. Audited weekly against human review.

Ch. IV · For Engineers

Build something you'd actually ship.

Fifty-plus production problems. Backend, infra, payments, realtime. Pick one, work it on your own machine, submit. New problems each week.

Selected · Week 19
50+ live · added weekly
  1. Distributed rate limiter

    Backend · Infra

    Redis sorted sets, sliding window counters, Express middleware. 10M req/s.

    60 min

  2. OAuth with refresh rotation

    Backend · Auth

    Authorization code flow, PKCE, refresh-token rotation, replay detection.

    75 min

  3. Idempotent webhook handler

    Backend · Payments

    Signature verify, dedupe via idempotency keys, retries with backoff.

    60 min

  4. Distributed cache with TTL

    Backend · Caching

    Consistent hashing, hot-key mitigation, write-through and read-aside paths.

    90 min

  5. Collaborative cursor sync

    Frontend · Realtime

    CRDT-backed presence, conflict-free merges, sub-50ms perceived latency.

    75 min

  6. Image upload pipeline

    DevOps · Pipeline

    Multipart upload, virus scan, resize ladder, CDN warm, signed-URL delivery.

    90 min

Top runs · This week
01 · jamie.b · Distributed rate limiter · claude-code · 96 / 100
02 · sarah.c · OAuth refresh rotation · claude-code · 94 / 100
03 · k.tanaka · Payment webhook handler · cursor · 93 / 100
04 · alex.m · Distributed cache TTL · claude-code · 91 / 100
05 · priya.r · Image upload pipeline · claude-code · 90 / 100
Full leaderboard →

Ch. V · For Hiring Teams

Watch the work,
not the answer.

Most interview tools grade the output and hide the process. Kodwai shows the process. Every prompt, every commit, every moment your candidate overruled the agent. Or didn't.

Backend Engineer · Round 2 · Acme
3 submissions · 1 reviewed
Candidate · Score by dimension · Final

Sarah Chen, Distributed rate limiter · 92 88 96 91 89 · 94 / 100 · Review →
K. Tanaka, Distributed rate limiter · 78 82 90 85 73 · 87 / 100 · Review →
Alex Mendez, Distributed rate limiter · 85 79 72 88 80 · 81 / 100 · Review →

Each bar shows the score on one of the five dimensions · Open transcript on review

Custom rubrics, time limits, your own challenges. Team review on a shared dashboard. Free tier for first hires.

Set up an interview →
Ch. VI · Begin

Open the app.
Ship something real.

Free to start. Sixty-second signup. You'll pick your path on the way in.