How I built a spec-driven browser automation agent by extending OpenCode with dynamic task graphs and stateful JavaScript execution
Introduction
When I set out to build an agent that can run browser-heavy expert workflows with branching SOPs, preserve state across long decision chains, and still remain programmable by power users, I faced a fundamental challenge: existing coding agents, like Claude Code, excel at writing code, but they struggle with complex, multi-step workflows that require:
- Decision-making at runtime - choosing different paths based on what’s discovered
- State persistence - storing and retrieving data across workflow steps
- Rich ecosystem access - leveraging mature tools like Playwright and Stagehand
My solution was to build oh-my-kiro, a plugin for OpenCode that combines:
- Graph-based spec execution - tasks.md files with branching, loops, and parallel execution
- Stateful JavaScript runtime - Jupyter kernels for computation, storage, and browser control
This post explains the implementation behind my approach, diving into the architectural decisions and implementation details that make it work.
0. The Foundation - OpenCode + oh-my-kiro
The base runtime is OpenCode. On top of it, I built oh-my-kiro, an OpenCode plugin centered on spec-driven execution plus meta-tooling.
I chose it simply because it is open source and powerful.
In practice, oh-my-kiro provides the scaffold: orchestrator/subagents, spec workflows, background tasks, and a stateful execute_code tool that runs JavaScript in Jupyter-backed sessions.
1. From Kiro to a spec-driven coding agent
I first implemented a classic spec-driven development agent, copied from Kiro; it was released here.
Every spec consists of three files:
- requirements.md: what to build
- design.md: how to build it
- tasks.md: executable task plan, a linear checklist
This already improved implementation quality versus direct vibe-coding, because the agent had to reason through explicit artifacts before writing code.
But I quickly hit a ceiling: many real-world expert procedures are not linear checklists. They are branching workflows with judgment points and retries.
2. From Linear Tasks to Graph Tasks
2.1 The Problem with Linear Workflows
Trending spec-driven development (Kiro) follows a simple pattern:
```
Requirements → Design → Tasks → Implementation
```
Tasks are executed linearly: task 1, then task 2, then task 3, etc.
This works for straightforward coding tasks. But browser automation scenarios often involve:
- Expert decision trees: Security audits, compliance checks, multi-variant testing
- Conditional branching: “If XSS found, do A; if SQL injection, do B”
- Retry loops: Poll until condition met, then continue
- Parallel execution: Test multiple vulnerability types simultaneously
A linear task model cannot express them cleanly. A graph can.
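To make the contrast concrete, here is a toy sketch of what one of these patterns, a retry loop, amounts to in code terms (the condition is simulated; names are hypothetical):

```javascript
// Toy sketch: a retry loop that a single graph node can represent.
// check(i) receives the attempt number and returns true when the condition is met.
function pollUntil(check, retries = 5) {
  for (let i = 1; i <= retries; i++) {
    if (check(i)) return i; // condition met on attempt i
  }
  throw new Error("condition never met");
}

// Simulated condition: becomes true on the 3rd attempt.
const attempt = pollUntil((i) => i >= 3);
console.log(attempt); // 3
```

A linear checklist has no way to say "repeat this step until the condition holds"; a graph node with a self-loop does.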
2.2 The Graph-Task Solution
I extended the original tasks.md format with special markers that transform a linear task list into a dynamic task graph:
| Marker | Type | Description |
|---|---|---|
| `[ ]^` | Start | Entry point - can have multiple for parallel starts |
| `[ ]` | Regular | Standard implementation task |
| `[ ]?` | Judgment | Decision point - returns conclusions for routing |
| `[ ]$` | Terminal | Execution ends here |
Example: Security Audit Workflow

```markdown
- [ ]^ 1. Setup test environment
```
This keeps authoring friction low (still plain markdown), while giving the orchestrator enough structure to execute dynamic flows.
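Putting the markers together, a hypothetical security-audit tasks.md might look like this (the routing notation and task names are illustrative, not the plugin's exact syntax):

```markdown
- [ ]^ 1. Setup test environment
- [ ] 2. Crawl target pages
- [ ]? 3. Classify findings
  - If XSS found → 4
  - If SQL injection found → 5
  - If nothing found → 6
- [ ] 4. Attempt XSS payload variants → 6
- [ ] 5. Attempt SQL injection bypass → 6
- [ ]$ 6. Generate audit report
```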
2.3 Iteration Tracking
For loops, iteration count is tracked in the status:
```markdown
- [ ]^ 1. Initialize → [x] (1st completion)
```
This lets the agent know how many times a task has been executed and make decisions accordingly. By default, the agent can loop forever.
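A toy sketch of the bookkeeping an orchestrator might do for this (a hypothetical helper, not the plugin's actual implementation; the ordinal formatting is simplified):

```javascript
// Track how many times each task has completed and render the status marker.
const iterations = new Map();

function ordinal(n) {
  // Simplified: handles 1st/2nd/3rd, everything else gets "th".
  const suffix = n === 1 ? "st" : n === 2 ? "nd" : n === 3 ? "rd" : "th";
  return `${n}${suffix}`;
}

function markCompleted(taskId) {
  const count = (iterations.get(taskId) ?? 0) + 1;
  iterations.set(taskId, count);
  return `[x] (${ordinal(count)} completion)`;
}

console.log(markCompleted("1. Initialize")); // [x] (1st completion)
console.log(markCompleted("1. Initialize")); // [x] (2nd completion)
```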
3. Stateful execute_code in JS runtime (Jupyter) as meta-tooling
The most important design choice is running execute_code in a stateful JavaScript runtime (Jupyter-backed notebook sessions).
3.1 Why Stateful Execution Matters
Complex graph-based workflows have a critical requirement: state management.
Consider a security audit workflow:
- Task 1: Detect authentication type → store result
- Task 2: If “Custom auth”, try bypass → use auth type from Task 1
- Task 3: Scan endpoints → use bypass results from Task 2
- Task 4: Generate report → aggregate all findings
Many agents struggle here because:
- Variables don’t persist across tool calls
- File I/O is cumbersome for temporary state
- No native way to share state between tasks
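As a plain-JavaScript sketch of the pattern (the task bodies are hypothetical stand-ins for real audit steps), the key property is that variables set in one step remain visible to later steps in the same stateful session:

```javascript
// In a stateful kernel, this object survives across "tool calls".
const state = {};

function task1_detectAuth() {
  state.authType = "custom"; // discovered at runtime
}

function task2_tryBypass() {
  // Branch on what Task 1 stored.
  state.bypassed = state.authType === "custom";
}

function task3_scanEndpoints() {
  state.findings = state.bypassed ? ["/admin", "/api/internal"] : [];
}

function task4_report() {
  return `auth=${state.authType}, findings=${state.findings.length}`;
}

task1_detectAuth();
task2_tryBypass();
task3_scanEndpoints();
console.log(task4_report()); // auth=custom, findings=2
```

In a stateless agent, each of these steps would need to serialize its result to disk and reload it; here they simply share scope.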
3.2 The Solution: JS running in Jupyter Kernels
I implemented execute_code, a tool that runs JavaScript in Jupyter kernels with full Node.js runtime access:
For example, to control the browser with Stagehand:

```javascript
// Step 1: initialize once (stateful kernel session)
```
3.3 Architecture
Key Components:
- Jupyter Server: REST API for kernel lifecycle, WebSocket for code execution
- jslab Kernel: JavaScript kernel based on tslab, provides Node.js runtime
- CodeSessionDB: SQLite-backed session persistence; manages kernel connections
- Notebook Storage: `.ipynb` files preserve execution history and state
3.4 The Benefits of the JS Runtime
3.4.1 Benefit A: compute + data store + OS control in one place
A single runtime gives the agent three powers:
- Compute: transform data, run analysis, derive decisions.
- Data store: persist variables across tasks/branches/iterations.
- OS control: use Node APIs, shell commands, network, filesystem.
For graph execution, data store is the unlock. Complex branching requires shared variables (auth_type, bypass_result, risk_findings, counters, branch evidence). Traditional coding agents struggle to manage this state elegantly; notebook-scoped JS variables solve it naturally.
3.4.2 Benefit B: mature JS ecosystem for browser-use
Because the runtime is JavaScript, the agent can directly load rich browser ecosystems:
- Playwright
- Stagehand
- project-specific internal SDKs
The kiro.require() syntax makes package loading practical for automation workflows, so users can bring their own SDKs and treat the agent as an orchestration layer rather than a closed system.
Why JavaScript?
For browser-use scenarios, JS is the native habitat:
- Modern web frontends are overwhelmingly built on the JS/TS ecosystem.
- Browser automation APIs (DOM events, page context, network hooks, extension-like patterns) are first-class in JS tooling.
- Many production browser SDKs and wrappers are shipped JS-first, with faster updates and better examples in JS.
4. Skills - Guiding the Agent to Write Better Code
When the agent runs JavaScript code with execute_code, it needs to know the best practices and common patterns for the target use case.
In oh-my-kiro, I wrote a skill named browser-use which provides context-specific guidance on how to use Stagehand in this particular JS runtime. For example, you should use kiro.require("@browserbasehq/stagehand") instead of require("@browserbasehq/stagehand").
This is especially valuable for custom/DIY packages, your own tools that LLMs have never seen, where a skill can show the agent how to write the code correctly.
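As a rough sketch (the actual skill format in oh-my-kiro may differ), a skill is essentially a markdown note the agent loads before writing code:

```markdown
# browser-use

When controlling the browser from execute_code:
- Load packages with kiro.require("@browserbasehq/stagehand"),
  not require(...): the kernel resolves plugin-managed packages.
- Initialize Stagehand once per session and reuse it across tasks;
  the kernel is stateful, so re-initializing loses the page context.
```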
5. Conclusion
Implementation pattern in one sentence: treat markdown specs as the control plane (graph + routing), stateful JS-in-Jupyter as the data plane (memory + tooling + execution), and skills as the knowledge plane.
6. Getting Started
If you want to try this approach:
- Install OpenCode
```shell
npm install -g @opencode-ai/opencode
```
- Install the oh-my-kiro plugin:
Edit ~/.config/opencode/opencode.jsonc:
```jsonc
{
  // ...
}
```
- Set up Jupyter + tslab:
```shell
pip install jupyter && npm install -g tslab && tslab install
```
Then write your first spec with a graph-based tasks.md and let the agent handle the complexity!
This project is open source. Check out Oh My Kiro for more details.