Live · Free to start · No card required

Measure real
AI collaboration.

Production challenges, on your machine, with your agent. Scored on how you actually work, not what you can recall.

Start a challenge → · Hiring? Set up interviews

Ch. I · The Premise

Coding interviews still measure the wrong skill. They ask if you can solve a puzzle on a whiteboard, in a sandbox, alone, while a stranger watches. None of that resembles how software actually ships now.

The skill that matters now is invisible to the puzzle test. How clearly you direct an agent through ambiguous work. How fast you notice when it's wrong. How thoroughly you verify what it produced. Whether your commits read like an engineer wrote them or a model did.

Kodwai measures that. Real problems, your own tools, a transcript another engineer can read. The score isn't a number you memorize. It's a number you can argue with.

Ch. II · The Demo

kodwai · sarah.chen · rate-limiter
═══════════════════════════════════════════════
  AI Collaboration Score: 94 / 100
═══════════════════════════════════════════════
  Problem decomposition   ████████░░  87%
  AI agent direction      █████████░  92%
  Verification & testing  ██████████  98%
  Code quality            █████████░  93%
  Communication clarity   ████████░░  89%
═══════════════════════════════════════════════

Sarah Chen · 43 minutes · final score 94 / 100

Ch. III · The Score

What the score actually measures.

01
Problem decomposition.
How cleanly the work was broken down before any code was written. The grader reads your first prompts, the file structure that resulted, the order in which decisions appeared.
02
Agent direction.
How clearly you steered the agent. Vague prompts that wander cost points. Tight prompts that constrain scope earn them. A re-prompt that recovers from a bad output earns more.
03
Verification.
Whether you trusted or tested. Tests written before code. Edge cases the agent missed. The moments you overruled a confident-but-wrong suggestion.
04
Code quality.
Judged independently of how long you took. Are modules well bounded? Is error handling deliberate? Is anything left as a sharp edge for the next reader?
05
Communication.
Commits with meaning. A submission note another engineer could open six months later and follow. Variable names that explain themselves.

Weighted 70 AI · 30 objective. Audited weekly against human review.
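
As a rough illustration of the weighting, here is a minimal sketch. It assumes "70 AI · 30 objective" means a 70/30 blend of an AI-graded score and an objective-check score, both on a 0 to 100 scale. The names `Submission`, `ai_score`, `objective_score` and `final_score` are hypothetical, and the page does not say how Kodwai actually combines the two parts.

```python
from dataclasses import dataclass

# Assumed weights, read off the "70 AI · 30 objective" line.
# Integer percentages keep the arithmetic exact for whole-number scores.
AI_WEIGHT = 70
OBJECTIVE_WEIGHT = 30


@dataclass(frozen=True)
class Submission:
    ai_score: float         # hypothetical: AI-graded rubric score, 0-100
    objective_score: float  # hypothetical: objective checks such as tests, 0-100


def final_score(sub: Submission) -> float:
    """Blend the two parts using the assumed 70/30 weights."""
    for value in (sub.ai_score, sub.objective_score):
        if not 0 <= value <= 100:
            raise ValueError("scores must be in the range 0-100")
    return (AI_WEIGHT * sub.ai_score + OBJECTIVE_WEIGHT * sub.objective_score) / 100


# Made-up inputs, not taken from any real submission.
example = Submission(ai_score=90.0, objective_score=100.0)
print(final_score(example))  # prints 93.0
```

Under this reading, a perfect objective result lifts a 90 on the AI-graded part to 93 overall, so the AI-graded part still dominates the final number.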

Ch. IV · For Engineers

Build something you'd actually ship.

Fifty-plus production problems. Backend, infra, payments, realtime. Pick one, work it on your own machine, submit. New problems each week.

Selected · Week 19 · 50+ live · added weekly

01
Distributed rate limiter
Backend · Infra
Redis sorted sets, sliding window counters, Express middleware. 10M req/s.
60 min
02
OAuth with refresh rotation
Backend · Auth
Authorization code flow, PKCE, refresh-token rotation, replay detection.
75 min
03
Idempotent webhook handler
Backend · Payments
Signature verify, dedupe via idempotency keys, retries with backoff.
60 min
04
Distributed cache with TTL
Backend · Caching
Consistent hashing, hot-key mitigation, write-through and read-aside paths.
90 min
05
Collaborative cursor sync
Frontend · Realtime
CRDT-backed presence, conflict-free merges, sub-50ms perceived latency.
75 min
06
Image upload pipeline
DevOps · Pipeline
Multipart upload, virus scan, resize ladder, CDN warm, signed-URL delivery.
90 min

Top runs · This week · Full leaderboard →

01	jamie.b	Distributed rate limiter	claude-code	96 / 100
02	sarah.c	OAuth refresh rotation	claude-code	94 / 100
03	k.tanaka	Payment webhook handler	cursor	93 / 100
04	alex.m	Distributed cache TTL	claude-code	91 / 100
05	priya.r	Image upload pipeline	claude-code	90 / 100

Ch. V · For Hiring Teams

Watch the work,
not the answer.

Most interview tools grade the output and hide the process. Kodwai shows the process. Every prompt, every commit, every moment your candidate overruled the agent. Or didn't.

Backend Engineer · Round 2 · Acme

3 submissions · 1 reviewed

Candidate	Challenge	Score by dimension	Final	Submitted
Sarah Chen	Distributed rate limiter	[bars]	94 / 100	2h ago · Review →
K. Tanaka	Distributed rate limiter	[bars]	87 / 100	Yesterday · Review →
Alex Mendez	Distributed rate limiter	[bars]	81 / 100	2 days ago · Review →

Each bar shows the score on one of the five dimensions · Open the transcript on review

Custom rubrics, time limits, your own challenges. Team review on a shared dashboard. Free tier for first hires.

Set up an interview →

Ch. VI · Begin

Open the app.
Ship something real.

Free to start. Sixty-second signup. You'll pick your path on the way in.

For developers: Start a challenge → · For hiring teams: Set up an interview →
