Matt Pocock's TDD Skill for Claude Code: How to Make AI Testing Actually Work
Ask Claude to write tests for your code and it will probably do this: write every test first, implement everything at once, then make all the tests pass together. This looks like test-driven development. It is not. It is a pattern that produces tests which feel comprehensive but are fundamentally unreliable.
Matt Pocock, the TypeScript educator behind Total TypeScript, built a Claude Code skill specifically to fix this.
Why AI testing usually fails
The core problem is what you might call horizontal slicing. When an LLM writes all the tests at once and then writes implementation to pass them, the tests end up validating the implementation the model imagined, not the code that actually runs. They pass immediately. They feel thorough. And when you change something later, half of them break in ways that have nothing to do with whether the feature still works.
This is because the tests were written to match a specific imagined implementation rather than to verify observable behavior. They are testing internal details, private methods, or mocked collaborators instead of what the system actually does from the outside. Good tests survive refactors. These do not.
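The contrast is easiest to see side by side. Below is a minimal sketch using a hypothetical `Cart` class (the names are illustrative, not from Pocock's skill): one assertion pokes at imagined internals and breaks on any refactor, while the other exercises only the public interface.

```typescript
// Hypothetical shopping-cart module, used to contrast the two test styles.
class Cart {
  private items: { name: string; price: number }[] = [];

  add(name: string, price: number): void {
    this.items.push({ name, price });
  }

  total(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

// Brittle: asserts on imagined internals. This breaks if `items` is renamed
// or replaced with a Map, even though the cart still behaves identically.
//   expect((cart as any).items.length).toBe(2);

// Durable: asserts only on observable behavior through the public API.
const cart = new Cart();
cart.add("book", 12);
cart.add("pen", 3);
// cart.total() is the observable behavior the durable test checks: 15
```

The durable assertion survives any refactor that preserves what `total()` returns, which is exactly the property the article says horizontally sliced tests lack.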
What the skill enforces instead
The TDD skill constrains Claude to work in strict cycles, one behavior at a time:
- Red: Write exactly one failing test for the next behavior
- Green: Write the minimum code needed to make only that test pass
- Refactor: Clean up the code while keeping every test green
Then repeat. Claude never writes implementation code before the test exists. It never writes multiple tests before implementing the first one. The discipline of the cycle is what makes the difference.
This is vertical slicing: one complete behavior implemented end to end, then the next. Because each test is written first and implemented immediately, it ends up exercising real code through real public interfaces. These tests survive refactors because they describe what the system does, not how it does it internally.
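One turn of that cycle might look like the sketch below, using a hypothetical `isLeapYear` function (an illustration of the red-green rhythm, not code from the skill itself):

```typescript
// Red: exactly one failing test for the next behavior.
//   test("years divisible by 4 are leap years", () => {
//     expect(isLeapYear(2024)).toBe(true); // fails: isLeapYear doesn't exist yet
//   });

// Green: the minimum code needed to pass only that test.
function isLeapYear(year: number): boolean {
  return year % 4 === 0;
}

// The next cycle starts with its own failing test (e.g. "1900 is not a
// leap year"), and only then is the implementation extended:
function isLeapYearWithCenturyRule(year: number): boolean {
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
}
```

Each behavior arrives as a test-plus-implementation pair, so the suite grows one verified slice at a time instead of as a block of speculative assertions.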
The planning phase before any code
Before any code is written at all, the skill requires answering three questions:
- What interface changes are needed for the new behavior?
- Which behaviors are most critical to test first?
- Can the design be structured for deep modules and testability?
This upfront step sounds small but it catches a surprising number of design problems before they become bugs. When you have to articulate what the interface should look like before you write a single test, you often discover that the approach you had in mind was more complicated than it needed to be.
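Answering the first question often amounts to writing the interface down before any test exists. A minimal sketch, assuming a hypothetical rate-limiter feature (the names and shape are mine, not from the skill's documentation):

```typescript
// Articulating the interface first: a small public surface with the
// implementation details (clock, storage) hidden behind it — the "deep
// module" shape the planning questions push toward.
interface RateLimiter {
  /** Returns true if the caller may proceed, false if throttled. */
  tryAcquire(key: string): boolean;
}

// A trivial in-memory version, just enough to make the interface testable.
function createRateLimiter(maxPerWindow: number): RateLimiter {
  const counts = new Map<string, number>();
  return {
    tryAcquire(key: string): boolean {
      const n = counts.get(key) ?? 0;
      if (n >= maxPerWindow) return false;
      counts.set(key, n + 1);
      return true;
    },
  };
}
```

Sketching `RateLimiter` as two lines of interface makes it obvious what the first behavioral test should assert, before a single line of real implementation is committed to.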
Who this is most useful for
If you are working on critical features where correctness is not optional, this skill is worth the two-minute install. The extra structure it adds is deliberate friction: it slows down the parts of the process where rushing produces bad outcomes and lets you move fast where the path is clear.
It also produces a much cleaner test history. Each red-green pair is one behavior. When a test breaks six months later, you know exactly what behavior broke and exactly what the test was trying to verify, because they were written together to describe the same thing.
Install it with:
npx skills add mattpocock/skills/tdd
The full documentation and background on the approach is at aihero.dev/skill-test-driven-development-claude-code.
Writing reliable tests is about knowing what actually matters to verify. The same discipline applies to decisions: clarity about what you are really evaluating changes everything. When you are facing a hard call, Resolve helps you get to that clarity.
Ready to make a better decision?
Resolve coaches you through it — step by step, bias by bias.
Start for free →