Browser-use Agent based on Graph-Spec and JS-in-Jupyter Meta Tooling

How I built a spec-driven browser automation agent by extending OpenCode with dynamic task graphs and stateful JavaScript execution


Introduction

I needed an agent that can run browser-heavy expert workflows with branching SOPs, preserve state across long decision chains, and still remain programmable by power users. When I set out to build it on top of existing coding agents, I faced a fundamental challenge: agents like Claude Code excel at writing code, but they struggle with complex, multi-step workflows that require:

  1. Decision-making at runtime - choosing different paths based on what’s discovered
  2. State persistence - storing and retrieving data across workflow steps
  3. Rich ecosystem access - leveraging mature tools like Playwright and Stagehand

My solution was to build oh-my-kiro, a plugin for OpenCode that combines:

  • Graph-based spec execution - tasks.md files with branching, loops, and parallel execution
  • Stateful JavaScript runtime - Jupyter kernels for computation, storage, and browser control

This post explains my approach and dives into the architectural decisions and implementation details that make it work.


0. The Foundation - OpenCode + oh-my-kiro

The base runtime is OpenCode. On top of it, I built oh-my-kiro, an OpenCode plugin centered on spec-driven execution plus meta-tooling.

I chose OpenCode simply because it is open source and powerful.

In practice, oh-my-kiro provides the scaffold: orchestrator/subagents, spec workflows, background tasks, and a stateful execute_code tool that runs JavaScript in Jupyter-backed sessions.


1. From Kiro to a spec-driven coding agent

I first implemented a classic spec-driven development agent modeled on Kiro; it was released here

Every spec consists of three files:

  • requirements.md: what to build
  • design.md: how to build it
  • tasks.md: an executable task plan, written as a linear checklist

This already improved implementation quality versus direct vibe-coding, because the agent had to reason through explicit artifacts before writing code.

But I quickly hit a ceiling: many real-world expert procedures are not linear checklists. They are branching workflows with judgment points and retries.


2. From Linear Tasks to Graph Tasks

2.1 The Problem with Linear Workflows

Popular spec-driven development (as in Kiro) follows a simple pattern:

Requirements → Design → Tasks → Implementation

Tasks are executed linearly: task 1, then task 2, then task 3, etc.

This works for straightforward coding tasks. But browser automation scenarios often involve:

  • Expert decision trees: Security audits, compliance checks, multi-variant testing
  • Conditional branching: “If XSS found, do A; if SQL injection, do B”
  • Retry loops: Poll until condition met, then continue
  • Parallel execution: Test multiple vulnerability types simultaneously

A linear task model cannot express them cleanly. A graph can.

2.2 The Graph-Task Solution

I extended the original tasks.md format with special markers that transform a linear task list into a dynamic task graph:

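| Marker | Type     | Description                                           |
|--------|----------|-------------------------------------------------------|
| [ ]^   | Start    | Entry point - can have multiple for parallel starts   |
| [ ]    | Regular  | Standard implementation task                          |
| [ ]?   | Judgment | Decision point - returns conclusions for routing      |
| [ ]$   | Terminal | Execution ends here                                   |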
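To make the marker table concrete, here is a minimal sketch of how such lines could be turned into graph nodes. The regular expression, the `parseTaskLine` and `parseTasks` names, and the node fields are all illustrative assumptions; this is not oh-my-kiro's actual parser.

```javascript
// Hypothetical parser for graph-task lines. The regex and field names are
// illustrative assumptions, not oh-my-kiro's actual implementation.
const MARKER_TYPES = { "^": "start", "?": "judgment", "$": "terminal", "": "regular" };

// Matches lines such as "- [ ]? 3. Classify vulnerability type".
const TASK_LINE = /^\s*-\s*\[([^\]]*)\]([\^?$]?)\s+(\d+(?:\.\d+)*)\.?\s+(.*)$/;

function parseTaskLine(line) {
  const match = TASK_LINE.exec(line);
  if (!match) return null; // not a task line
  const [, status, marker, id, title] = match;
  return { id, title: title.trim(), type: MARKER_TYPES[marker], status: status.trim() };
}

// Parses a whole tasks.md body and keeps only the lines that are tasks.
function parseTasks(markdown) {
  return markdown.split("\n").map(parseTaskLine).filter(Boolean);
}
```

Because the marker sits directly after the checkbox, a plain markdown renderer still shows an ordinary checklist, while a parser like this one recovers the node type.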

Example: Security Audit Workflow

- [ ]^ 1. Setup test environment
- [ ] 2. Run initial scan
- [ ]? 3. Classify vulnerability type
- [ ]? 3.1 parallel jump: If XSS found, jump to 4, If SQL injection, jump to 5, If none found, jump to 6
- [ ] 4. XSS exploitation
- [ ] 5. SQL injection testing
- [ ]$ 6. Generate report

This keeps authoring friction low (still plain markdown), while giving the orchestrator enough structure to execute dynamic flows.
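As a rough illustration of the routing step, the sketch below maps the conclusions returned by a judgment task to the task ids that should run next. The route table and the `nextTasks` helper are hypothetical, and it assumes that a "parallel jump" means several conclusions can hold at once.

```javascript
// Hypothetical route table for judgment task 3.1 in the example above.
// Keys are conclusions the judgment step may return; values are target task ids.
const routes = {
  "XSS found": ["4"],
  "SQL injection": ["5"],
  "none found": ["6"],
};

// Returns the de-duplicated list of task ids to queue next.
// Several conclusions may hold at once, which models a "parallel jump".
function nextTasks(routeTable, conclusions) {
  const targets = new Set();
  for (const conclusion of conclusions) {
    const ids = routeTable[conclusion];
    if (!ids) throw new Error(`No route for conclusion: ${conclusion}`);
    ids.forEach((id) => targets.add(id));
  }
  return [...targets];
}

console.log(nextTasks(routes, ["XSS found", "SQL injection"])); // [ '4', '5' ]
```

Failing loudly on an unknown conclusion is a deliberate choice in this sketch: a silent fallthrough would let the graph stall without any visible error.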

2.3 Iteration Tracking

For loops, iteration count is tracked in the status:

- [ ]^ 1. Initialize        → [x]   (1st completion)
- [ ] 2. Poll for result    → [/1]  (queued, 1 prior completion)
                            → [-1]  (in_progress, 1 prior)
                            → [x2]  (2nd completion)

This allows the agent to know how many times a task has been executed and make decisions accordingly. By default, the agent can loop forever.
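The status notation can be read mechanically. The following sketch parses a status box into a state and a completion count, and adds an optional loop cap. The `parseStatus` and `mayRunAgain` names, the state labels, and the cap itself are assumptions for illustration; the source only says that looping is unbounded by default.

```javascript
// Hypothetical status parser; the state names follow the example above.
const STATES = { " ": "pending", "/": "queued", "-": "in_progress", x: "completed" };

function parseStatus(box) {
  const match = /^\[([ \/\-x])(\d*)\]$/.exec(box);
  if (!match) throw new Error(`Unrecognized status: ${box}`);
  const [, symbol, digits] = match;
  const state = STATES[symbol];
  // "[x]" carries an implicit count of one completion; "[ ]" means none yet.
  const completions = digits ? Number(digits) : state === "completed" ? 1 : 0;
  // "[xN]" includes the latest completion; "[/N]" and "[-N]" count prior ones.
  return { state, completions };
}

// Hypothetical guard: allow another run only while under maxIterations.
// The default of Infinity mirrors the "loop forever" default described above.
function mayRunAgain(box, maxIterations = Infinity) {
  return parseStatus(box).completions < maxIterations;
}
```

For instance, `mayRunAgain("[x2]", 2)` is false while `mayRunAgain("[/1]", 2)` is true, so a polling task would stop after its second completion.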


3. Stateful execute_code in JS runtime (Jupyter) as meta-tooling

The most important design choice is running execute_code in a stateful JavaScript runtime (Jupyter-backed notebook sessions).

3.1 Why Stateful Execution Matters

Complex graph-based workflows have a critical requirement: state management.

Consider a security audit workflow:

  • Task 1: Detect authentication type → store result
  • Task 2: If “Custom auth”, try bypass → use auth type from Task 1
  • Task 3: Scan endpoints → use bypass results from Task 2
  • Task 4: Generate report → aggregate all findings

Many agents struggle here because:

  • Variables don’t persist across tool calls
  • File I/O is cumbersome for temporary state
  • No native way to share state between tasks

3.2 The Solution: JS running in Jupyter Kernels

I implemented execute_code, a tool that runs JavaScript in Jupyter kernels with full Node.js runtime access.

For example, to control a browser with Stagehand:

// Step 1: initialize once (stateful kernel session)
execute_code({
  notebook_file: "/workspace/.kiro/scripts/browser.ipynb",
  description: "Initialize Stagehand and open target page",
  code: `
    const { Stagehand, CustomOpenAIClient } = kiro.require("@browserbasehq/stagehand")
    const OpenAI = kiro.require("openai").default

    const stagehand = new Stagehand({
      env: "LOCAL",
      llmClient: new CustomOpenAIClient({
        modelName: process.env.BROWSER_LLM_MODEL,
        client: new OpenAI({
          apiKey: process.env.BROWSER_LLM_API_KEY,
          baseURL: process.env.BROWSER_LLM_BASE_URL,
        }),
      }),
    })

    await stagehand.init()
    const page = stagehand.context.pages()[0]
    await page.goto("https://demo.site/login")
    console.log("Stagehand ready")
  `,
})

// Step 2: variables persist, so reuse the existing ones in the same notebook_file
execute_code({
  notebook_file: "/workspace/.kiro/scripts/browser.ipynb",
  description: "Perform actions and extract structured result",
  code: `
    await stagehand.act("type user@example.com into email input")
    await stagehand.act("type correct-password into password input")
    await stagehand.act("click sign in button")

    const result = await stagehand.extract(
      "extract current page title and whether login succeeded",
      {
        type: "object",
        properties: {
          title: { type: "string" },
          login_success: { type: "boolean" },
        },
        required: ["title", "login_success"],
      }
    )

    console.log(result)
  `,
})

3.3 Architecture

┌─────────────────────────────────────────────────────────────┐
│                       OpenCode / Kiro                       │
│                                                             │
│  execute_code({ notebook_file, code })                      │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────┐      REST API       ┌──────────────┐   │
│  │  KernelClient   │ ◄─────────────────► │   Jupyter    │   │
│  │  (WebSocket)    │      WebSocket      │   Server     │   │
│  └────────┬────────┘ ◄─────────────────► └──────┬───────┘   │
│           │                                     │           │
│           │                              ┌──────▼───────┐   │
│           └──────────────────────────────│ jslab kernel │   │
│                                          │ (JavaScript) │   │
│                                          └──────────────┘   │
└─────────────────────────────────────────────────────────────┘

Key Components:

  1. Jupyter Server: REST API for kernel lifecycle, WebSocket for code execution
  2. jslab Kernel: JavaScript kernel based on tslab, provides Node.js runtime
  3. CodeSessionDB: SQLite-backed session persistence that manages kernel connections separately
  4. Notebook Storage: .ipynb files preserve execution history and state
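To show what travels over the WebSocket leg of this architecture, here is a minimal sketch that builds an `execute_request` message in the shape defined by the Jupyter messaging protocol, plus the channels URL for a kernel. How oh-my-kiro's KernelClient actually assembles its messages is not described in this post, so the helper names, the `username` value, and the base URL in the usage note are assumptions.

```javascript
const { randomUUID } = require("node:crypto");

// Builds a Jupyter "execute_request" message for the kernel's channels WebSocket.
// The field layout follows the Jupyter messaging protocol; how oh-my-kiro's
// KernelClient builds its messages is an assumption here.
function buildExecuteRequest(code, sessionId) {
  return {
    channel: "shell",
    header: {
      msg_id: randomUUID(),
      session: sessionId,
      username: "agent", // hypothetical user name
      msg_type: "execute_request",
      version: "5.3",
      date: new Date().toISOString(),
    },
    parent_header: {},
    metadata: {},
    content: {
      code,
      silent: false,
      store_history: true,
      user_expressions: {},
      allow_stdin: false,
      stop_on_error: true,
    },
  };
}

// Derives the WebSocket URL for an existing kernel from the server's HTTP base URL.
// The kernel itself would first be created with POST {baseUrl}/api/kernels.
function channelsUrl(baseUrl, kernelId) {
  const wsBase = baseUrl.replace(/^http/, "ws").replace(/\/$/, "");
  return `${wsBase}/api/kernels/${kernelId}/channels`;
}
```

With a hypothetical local server, `channelsUrl("http://localhost:8888", "abc")` yields `ws://localhost:8888/api/kernels/abc/channels`, and the message from `buildExecuteRequest` would be sent there as JSON.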

3.4 The Benefits of the JS Runtime

3.4.1 Benefit A: compute + data store + OS control in one place

A single runtime gives the agent three powers:

  1. Compute: transform data, run analysis, derive decisions.
  2. Data store: persist variables across tasks/branches/iterations.
  3. OS control: use Node APIs, shell commands, network, filesystem.

For graph execution, data store is the unlock. Complex branching requires shared variables (auth_type, bypass_result, risk_findings, counters, branch evidence). Traditional coding agents struggle to manage this state elegantly; notebook-scoped JS variables solve it naturally.
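The effect of notebook-scoped variables can be simulated in plain Node.js. In the sketch below, a single long-lived `vm` context stands in for one notebook's kernel session, and `runCell` is a hypothetical stand-in for execute_code. It only illustrates the idea of state surviving across calls; it is not how the plugin talks to Jupyter.

```javascript
const vm = require("node:vm");

// One long-lived context stands in for one notebook's kernel session.
const session = vm.createContext({});

// Hypothetical stand-in for execute_code: run a snippet in the shared context.
function runCell(code) {
  return vm.runInContext(code, session);
}

// Task 1 stores its finding.
runCell(`var auth_type = "custom"; var risk_findings = [];`);
// Task 2 branches on Task 1's result and records evidence.
runCell(`if (auth_type === "custom") risk_findings.push("bypass attempted");`);
// Task 4 aggregates everything gathered so far.
const report = runCell(`JSON.stringify({ auth_type, risk_findings })`);

console.log(report); // {"auth_type":"custom","risk_findings":["bypass attempted"]}
```

Each `runCell` call corresponds to a separate tool call, yet later calls read what earlier ones wrote without any file I/O in between.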

3.4.2 Benefit B: mature JS ecosystem for browser-use

Because the runtime is JavaScript, the agent can directly load rich browser ecosystems:

  • Playwright
  • Stagehand
  • project-specific internal SDKs

The kiro.require() syntax makes package loading practical for automation workflows, so users can bring their own SDKs and treat the agent as an orchestration layer rather than a closed system.
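This post does not describe how kiro.require() works internally. One plausible reading is that it resolves packages relative to the user's workspace rather than the kernel's own install location. The sketch below shows that idea with Node's `createRequire`; the `makeWorkspaceRequire` helper is hypothetical and should not be taken as the plugin's implementation.

```javascript
const { createRequire } = require("node:module");
const path = require("node:path");

// Hypothetical helper: returns a require function that resolves packages
// from workspaceDir/node_modules instead of the kernel's own install path.
function makeWorkspaceRequire(workspaceDir) {
  // createRequire only needs a path inside the directory; the file need not exist.
  return createRequire(path.join(path.resolve(workspaceDir), "noop.js"));
}

const workspaceRequire = makeWorkspaceRequire(process.cwd());
const os = workspaceRequire("node:os"); // built-in modules resolve as usual
console.log(typeof os.platform()); // string
```

A helper of this kind would let users install their own SDKs in the project directory and load them from agent-written code without touching the kernel's environment.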

Why JavaScript?
For browser-use scenarios, JS is the native habitat:

  • Modern web frontends are overwhelmingly built on the JS/TS ecosystem.
  • Browser automation APIs (DOM events, page context, network hooks, extension-like patterns) are first-class in JS tooling.
  • Many production browser SDKs and wrappers are shipped JS-first, with faster updates and better examples in JS.

4. Skills - Guiding the Agent to Write Better Code

When the agent runs JavaScript code with execute_code, it needs to know the best practices and common patterns for the target use case.
In oh-my-kiro, I wrote a skill named browser-use that provides context-specific guidance on how to use Stagehand in this particular JS runtime. For example, it tells the agent to use kiro.require("@browserbasehq/stagehand") instead of require("@browserbasehq/stagehand").

This matters most for custom/DIY packages: when your own tools are ones that LLMs have never seen, it is better to provide a skill that shows the agent how to write the code correctly.

5. Conclusion

Implementation pattern in one sentence: treat markdown specs as the control plane (graph + routing), stateful JS-in-Jupyter as the data plane (memory + tooling + execution), and skills as the knowledge plane.


6. Getting Started

If you want to try this approach:

  1. Install OpenCode:

npm install -g @opencode-ai/opencode

  2. Install the oh-my-kiro plugin:

Edit ~/.config/opencode/opencode.jsonc:

{
  "plugin": ["@oh-my-kiro/oh-my-kiro"],
}

  3. Set up Jupyter + tslab:

pip install jupyter && npm install -g tslab && tslab install

Then write your first spec with a graph-based tasks.md and let the agent handle the complexity!


This project is open source. Check out Oh My Kiro for more details.