Teamplay 01
January 14, 2026
On the purpose of testing
Looking back at my work on wp-browser and my blogging activity, there is an objective I've been chasing for some time: seamless testing of WordPress projects, moving across levels of testing without limits.
(In case you're wondering: yes, I've used the word "seamless" in the sentence above to evoke AI, but AI is not writing this post).
What does "moving across levels of testing without limits" mean, for me?
It means I do not care about what level of testing something is supposed to be (unit, integration, functional, end-to-end, you-name-it), I care about testing as many things as I can with as little code as possible and the fastest possible tests.
As a freelancer helping companies set up and use automated tests, I've come to believe the major obstacle to adoption is dogmatism.
I've seen developers stress more about whether the test they would like to write qualifies as a proper "unit test" than they did about introducing a potential security risk or planning a new feature.
"Which test to write first?", "Is this an integration test or a unit test?", "Can we write integration tests without writing unit tests?", "What should this suite be called?".
In my view, the purpose of testing is shipping with confidence and in time.
Anything that undermines that confidence (or prevents it from forming in the first place, more on that later) is not helpful and should be avoided.
As the author and maintainer of a testing solution for WordPress projects, I would like to think I'm part of the solution.
The current documentation mentions testing levels here and there, but only for the purpose of explaining which modules typically go together in which type of test.
The previous documentation contained a detailed breakdown of testing levels. I think I was part of the problem then: my first step for companies willing to adopt automated testing was a theoretical introduction.
Today my first question is: what is the thing that breaks the most? We'll find the best testing tool for the job.
The best tool for the job
I have to admit that, for some time, it was really difficult to accept that Codeception plus wp-browser could not always be the best tool for the testing job.
As testing and testing technologies became more and more approachable, it became harder and harder to ignore alternative solutions like Playwright and defend the argument that "everything you can do with Playwright, you can do with the WPWebDriver module".
Technically, yes. Efficiently and practically? Not so much.
The API offered by Playwright beats the one provided by the WPWebDriver module. In no small measure because Playwright speaks the same language (JavaScript or TypeScript) as the thing under test: a website.
And I'm not even talking about how good the code-generation feature is.
But (preparing the ground for the big idea here), there is still something missing.
Playwright is a browser automation tool. While advanced and powerful, it lacks fine-grained control of state in the context of testing WordPress projects (or PHP projects, for that matter).
How do you load a database dump between tests? So far I've done it with page objects or hooks that call WP-CLI or other home-brew scripts. It works, but it's clunky and requires knowledge of both the project and the testing harness.
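To make the clunkiness concrete, this is roughly what such a hook looks like today; a minimal sketch, assuming WP-CLI is available on the test machine and that the dump lives at a project-specific path (both assumptions about the project, not part of any standard setup):

// A Playwright hook that shells out to WP-CLI to restore a known dump before each test.
// The dump path and the `wp` binary being on the PATH are assumptions about the project.
import { test } from '@playwright/test';
import { execFileSync } from 'node:child_process';

test.beforeEach(() => {
  // `wp db import` replaces the current database content with the dump.
  execFileSync('wp', ['db', 'import', 'tests/_data/dump.sql'], { stdio: 'inherit' });
});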
And things get even more complicated when trying to manipulate files like uploads.
I think the convenience of using a testing solution like wp-browser or Playwright is that I have to learn a moderately complex setup once, and then can just port that knowledge over to other projects without having to re-learn the project specific solution again.
Furthermore, documenting a standard solution is easier, and the dissemination and sharing of knowledge is helped by sheer numbers.
What advantage does the WPWebDriver module have over Playwright, then?
Its friends.
I've rarely seen a testing project setup that would not use, together with the WPWebDriver module, the WPDb and WPFilesystem ones.
Those modules, together with the WPWebDriver one, collaborate in building a suite that allows writing test code like this:
// Method provided by the WPDb module.
$postId = $I->havePostInDatabase([
    'post_title' => 'Hello world!',
    'post_status' => 'publish'
]);
$I->amOnPage('/index.php?p=' . $postId);
$I->waitForText('Hello world!');
Adding Playwright to the team
I guess I gave it away with the post title, but the big idea here is giving Playwright friends.
Specifically, the wp-browser modules configured in the suite.
I want to be able to write this Playwright spec file:
// Import a `test` function that will use the `teamplay` fixture.
// The `expect` function is re-exported from @playwright/test.
import { test, expect } from '../teamplay';
// Write a test that will use the `teamplay` fixture.
// Aliased to `I` for show-off value.
test('view published post', async ({ page, teamplay: I }) => {
  // Create post in database via the WPDb module.
  const postId = await I.havePostInDatabase({
    post_title: 'Hello world!',
    post_status: 'publish'
  });
  // The rest of the test uses standard Playwright functions.
  await page.goto(`/index.php?p=${postId}`);
  await expect(page.getByText('Hello world!')).toBeVisible();
});
I think the example contains all the interesting components of the solution I'm envisioning: Playwright can invoke PHP methods provided by the Codeception and wp-browser modules.
How? Requests to a server that will execute them and return their value.
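To give a feel for what that could look like (none of this is the final Teamplay protocol; the server URL, the endpoint and the payload shape below are placeholders I made up for illustration), the dispatch boils down to something like this:

// A minimal sketch of the dispatch: not the real Teamplay protocol.
// The server URL, the `/call` endpoint and the payload shape are all placeholders.
async function callModuleMethod(method: string, args: unknown[]): Promise<unknown> {
  const response = await fetch('http://localhost:8787/call', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ method, args }),
  });

  if (!response.ok) {
    throw new Error(`Module method ${method} failed with status ${response.status}`);
  }

  // The server runs the PHP method (e.g. WPDb::havePostInDatabase) and returns its value.
  const { value } = (await response.json()) as { value: unknown };
  return value;
}

With something like that in place, `I.havePostInDatabase(...)` would just be `callModuleMethod('havePostInDatabase', [...])` under the hood.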
The use of the teamplay fixture makes it possible to add hooks to the test execution life-cycle and invoke the suite before/after methods correctly.
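In Playwright terms, a fixture along these lines could wire that up; again a sketch under my own assumptions: the `_before`/`_after` method names mirror Codeception's life-cycle hooks, and `callModuleMethod` is the placeholder helper from the previous snippet (here imagined to live in a local `./rpc` module), not the actual implementation.

// teamplay.ts — a sketch of the fixture, not the actual implementation.
import { test as base, expect } from '@playwright/test';
import { callModuleMethod } from './rpc';

type Teamplay = {
  havePostInDatabase(overrides: Record<string, unknown>): Promise<number>;
};

export const test = base.extend<{ teamplay: Teamplay }>({
  teamplay: async ({}, use) => {
    // Run the suite/module set-up before the test body.
    await callModuleMethod('_before', []);

    await use({
      havePostInDatabase: (overrides) =>
        callModuleMethod('havePostInDatabase', [overrides]) as Promise<number>,
    });

    // Run the tear-down after the test body, mirroring the Codeception after hooks.
    await callModuleMethod('_after', []);
  },
});

export { expect };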
The concept is pretty simple, but I'm sure it will be a valley of tears down the road.
Building with AI
The first piece of code I've written for the Teamplay feature is not code: it's a specification file.
I use AI (Claude Code specifically) in my day-to-day work but have never used it to build a full feature.
My hesitation is not about AI capability, it's about my personal knowledge and understanding of what is being done and why.
I think, as many do, that writing is a thinking tool, and code-writing is no different.
If AI writes all the code, then it will be no different from what I have to do when dropping into a new client codebase: look around and understand approaches and, more generally, the theory of it.
I can still get that theory building phase by being the architect and "orchestrator" of the solution, I think.
So, what is a specification file?
It's a file that contains the "what" and "why" of the feature, not the "how".
Tools like Claude Code have a "plan mode" that signals to the tool and model powering it that it's not the time to write or update code, it's the time to think about what to write.
Still, plan mode is about how the code to accomplish a task should be written.
Claude Code will ask questions about glaringly missing information ("You talk about a server, what is the server URL?"), but will take most implementation decisions based on common practice (emphasis on "common" not "best", it's a statistical model after all).
In plan mode, the "what" and "why" are relegated to the prompt, where, in the case of larger features, they are too easily confused with the "how".
It's common to find sentences like these in the same prompt:
- "Add support for XYZ feature" - what
- "End all comments in a full-stop and run code sniffer on the generated files" - how
A specification file can be reviewed by another AI instance to find further issues.
A specification file is a better way to recover information when working with AI across sessions and possible failures.
Claude Code, like many other similar tools, provides prompt history, but to me it's akin to searching my bash history for that command I ran at some point. I prefer having a script.
A specification file can be used later, much later, to better infer differences between iterations.
Finally, based on the specification file multiple implementation plans can be written.
How do I write a specification file?
Manually, at first.
Then I will use AI to review it and ask questions about it until we agree all the required information is there.
This phase takes a lot of research and experimentation (not exclusively done by AI) to converge on a document detailing purpose and "ergonomics" of the thing to build.
By "ergonomics" I mean the kind of code or tool I want to end up with.
I use Claude Code without any MCPs or plugins to keep star-gazing to a minimum.
Writing a specification is deep work, and the more time passes between my thinking phases (i.e. when I need to answer the AI's questions, review and approve changes, or research something), the higher the chance of getting "out of the zone".
I use a prompt like this one to write a specification file and be able to resume between sessions:
{description of idea or reference to a brainstorm file}
Start by creating a spec file called `spec-<feature_name>.md` in the current working directory if it doesn’t exist already.
If a spec file already exists, then read it before asking any question.
Your objective is not to write an implementation plan (the “how”), but to write a specification file (the “what” and “why”).
Your task is to spot potential issues and missing information in my idea and ask me questions until you think not a single piece of information is missing and the spec file is complete.
When you spot an issue or missing information, ask me a single question, evaluate the answer, and keep asking me questions until you think no information is missing.
Always follow this procedure:
- ask me a single question
- evaluate the answer
- ask the next question if required
Every time you have new information, update the spec file with the new information.
The prompt is saved as a keyboard shortcut so I can use it anywhere I can type, not just in Claude Code.
I know about Claude Code slash commands and plugins, but in the end they still just contribute to building a prompt; my preference is to be extremely deterministic about what that prompt contains instead of relying on the model to use the right tool at the right time.
Also, I like building my own tools and prompts are tools in this new, wild age of AI.
Building in public and next steps
For my learning and accountability, I will build this new feature in public, documenting my work and findings through posts.
Like anything I chronicle here, it might crash and burn.
You can find the first commit of this new feature here and it is, of course, just a specification file.