In earlier posts, I described how I used ChatGPT for architectural reasoning and Claude Code for implementation. That workflow continues to evolve.

Recently I ran into two friction points that exposed a larger issue.

The problem was not capability. It was structure.

Fast Is Not Deterministic

Claude Code is very good at reading files, writing code, and running shell commands. The issue appears when the task becomes procedural:

  • Install dependencies
  • Run a build
  • Execute a gate script
  • Validate output
  • Capture evidence

Claude runs commands sequentially, interprets results, and decides what to do next. That works. But it is interpretive.

If there are twenty commands with specific stop conditions, there are twenty opportunities for divergence.

Run the same step twice and you may not get the same execution path.

The second issue was visibility. Claude can start a development server, but it cannot see what renders in a browser. When UI validation is required, a human becomes the bridge.

Both issues share the same root cause. The workflow lacked structure that encoded intent.

Manifest-Driven Execution

The first solution was a structured step runner.

Instead of asking Claude to read a document and improvise commands, I define the execution plan explicitly in a JSONL manifest. Each line declares a command and its failure conditions.

Example:

{"type": "cmd", "id": "check-node", "label": "Verify Node.js version", "cmd": "node --version", "stop": [{"on": "version_lt", "version": "18.0.0"}]}
{"type": "cmd", "id": "install-deps", "label": "Install dependencies", "cmd": "npm ci", "stop": [{"on": "exit_nonzero"}]}
{"type": "cmd", "id": "run-build", "label": "Build project", "cmd": "npm run build", "stop": [{"on": "exit_nonzero"}]}

Claude executes the entire step with a single command:

node dev/scripts/run-step.mjs tmp/STEP_020_execution_manifest.jsonl --step 020

Failure conditions are declared, not inferred.

The runner evaluates exit codes, output patterns, and version checks mechanically. Claude does not need to interpret error text.
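The evaluation logic is small. As a rough sketch of the idea (the real runner is the Node script above; this simplified Python version treats output_matches as an illustrative extra condition):

import re
import subprocess

def violated(stop, result):
    # Each stop condition is a mechanical check, not an interpretation.
    if stop["on"] == "exit_nonzero":
        return result.returncode != 0
    if stop["on"] == "version_lt":
        match = re.search(r"(\d+)\.(\d+)\.(\d+)", result.stdout)
        minimum = tuple(int(p) for p in stop["version"].split("."))
        return match is None or tuple(map(int, match.groups())) < minimum
    if stop["on"] == "output_matches":  # illustrative, not in the manifest above
        return re.search(stop["pattern"], result.stdout) is not None
    return False

def run_step(step):
    result = subprocess.run(step["cmd"], shell=True, capture_output=True, text=True)
    for stop in step.get("stop", []):
        if violated(stop, result):
            raise SystemExit(f'{step["id"]}: stop condition "{stop["on"]}" hit')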

There is also a dry run mode to validate manifests before execution.
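A manifest with a typo or an impossible condition fails before anything runs. Assuming a --dry-run flag (the exact flag name is illustrative):

node dev/scripts/run-step.mjs tmp/STEP_020_execution_manifest.jsonl --step 020 --dry-run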

This shifts execution from interpretive to deterministic.

Runtime Verification Requires a Browser

A terminal tool such as curl can fetch raw HTML. It cannot execute client-side JavaScript.

Modern applications construct the user-visible DOM at runtime. Hydration, routing, and asynchronous data loading occur after JavaScript executes.

If you only inspect the initial response, you are not verifying the actual system state.
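With a typical Vite and React setup, for example, the initial response is little more than an empty mount point, regardless of what the page eventually renders:

curl -s http://localhost:5173/projects
# Typically returns index.html with an empty <div id="root"></div>.
# The project list, error banners, everything meaningful arrives later.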

To close that gap, I built a headless browser verification tool using Python and Selenium. It launches Chrome, executes JavaScript, waits for declared conditions, and captures evidence.
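The core of the approach fits in a few lines of Selenium. A minimal sketch (simplified; the selector and paths are illustrative):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless=new")  # run Chrome without a display
driver = webdriver.Chrome(options=options)
try:
    driver.get("http://localhost:5173/projects")
    # Wait on a declared condition instead of sleeping for a fixed time.
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "#project-list"))
    )
    driver.save_screenshot("tmp/browser_check/projects.png")
    console = driver.get_log("browser")  # captured console entries
finally:
    driver.quit()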

Running a check:

./dev/scripts/browser-check.sh --url http://localhost:5173/projects --out tmp/browser_check

Each check produces:

  • Screenshot
  • Page source
  • Console log capture
  • JSON report
  • Markdown report

Wait conditions are composable:

{"any": [{"visible": "#project-list"}, {"visible": ".error-banner"}]}

The result is structured runtime evidence that Claude can evaluate directly.

Execution becomes:

  • Start the server
  • Run browser checks
  • Read structured results
  • Continue based on evidence

No manual bridge required.

The Pattern

Both tools implement the same principle.

Move decisions out of real-time AI interpretation and into declared structure.

Instead of asking the model to infer success, define what success means.

Instead of describing what a browser rendered, declare the assertions and capture the output.

The boundary often matters more than the prompt.

What Comes Next

Deterministic execution is necessary. It is not sufficient.

The next layer is deterministic review.

Execution can be declared. Verification can be structured. Review must be governed.

That is the subject of the next post.