Skip to content

Add cdp.browser_control: high-level browser automation API#157

Open
Copilot wants to merge 3 commits intomasterfrom
copilot/add-browser-control-api
Open

Add cdp.browser_control: high-level browser automation API#157
Copilot wants to merge 3 commits intomasterfrom
copilot/add-browser-control-api

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 8, 2026

CDP has all the primitives for browser automation but no ergonomic high-level API. This adds cdp.browser_control — a Playwright-style coroutine layer over the existing CDPConnection and generated domain modules, with full Sphinx docs.

New: cdp/browser_control.py

Category Functions
Navigation navigate, reload, go_back, go_forward, wait_for_load
Selection query_selector, query_selector_all
Interaction click, double_click, hover, type_text, clear_and_type, press_key, focus, select_option
Inspection get_text, get_attribute, get_bounding_box, is_visible
Screenshots screenshot, screenshot_element
JavaScript evaluate, evaluate_on_node
Waiting wait_for_selector, wait_for_event
from cdp.connection import CDPConnection
from cdp import browser_control as bc
from cdp import page

async def main():
    async with CDPConnection("ws://localhost:9222/devtools/page/ID") as conn:
        await conn.execute(page.enable())
        await bc.navigate(conn, "https://example.com")
        await bc.wait_for_load(conn)

        heading = await bc.get_text(conn, "h1")
        await bc.click(conn, "a.next")
        await bc.type_text(conn, "input[name='q']", "hello", delay=0.05)
        await bc.press_key(conn, "Enter")

        png = await bc.screenshot(conn)

Docs

  • docs/browser_control.rst — full API reference + common recipes, wired into the Sphinx toctree
  • docs/overview.rst — updated to reflect I/O and browser automation as first-class; removed stale "no I/O" statement

Tests

28 new unit tests in test/test_browser_control.py using a mocked CDPConnection — no real browser needed.


Note

Medium Risk
Mostly additive, but introduces a large new public API surface that drives DOM/input/screenshot CDP commands and may expose edge cases in real-browser behavior despite unit tests using mocks.

Overview
Adds new cdp.browser_control module providing a Playwright-style coroutine layer over CDPConnection for navigation, element querying/interaction, screenshots, JS evaluation, and waiting primitives.

Documentation is expanded with a new docs/browser_control.rst reference/recipes page (wired into docs/index.rst) and docs/overview.rst updated to describe the new I/O + automation mode and the chrome-devtools-protocol[io] extra. Includes a runnable examples/browser_control_example.py and a mocked-connection unit test suite in test/test_browser_control.py covering the new helpers.

Reviewed by Cursor Bugbot for commit 2168c87. Configure here.

Copilot AI linked an issue Apr 8, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Add browser control API with updated documentation Add cdp.browser_control: high-level browser automation API Apr 8, 2026
Copilot AI requested a review from P4X-ng April 8, 2026 11:12
@P4X-ng P4X-ng marked this pull request as ready for review April 8, 2026 14:35
@P4X-ng
Copy link
Copy Markdown
Contributor

P4X-ng commented Apr 8, 2026

/gemini review

@codex review please

@gemini-code-assist
Copy link
Copy Markdown

Thank you for the request. I have reviewed the cdp.browser_control implementation. The addition of a high-level, Playwright-style API is a significant improvement for usability. The code is well-structured, follows the existing project patterns, and includes comprehensive documentation and mocked unit tests. I have no further changes to suggest at this time.

Copy link
Copy Markdown

@gcassist gcassist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a high-level browser automation API in the cdp.browser_control module, providing Playwright-style helpers for navigation, element interaction, and screenshots. It includes comprehensive documentation, usage examples, and a suite of unit tests. Feedback identifies a critical issue where wait_for_event destructively consumes events from the shared connection queue, potentially breaking concurrent listeners. Additionally, improvements are suggested for error handling in navigate, event dispatching in select_option to support modern frameworks, and ensuring elements are scrolled into view before capturing screenshots.

Comment on lines +731 to +733
async for event in conn.listen():
if isinstance(event, event_type):
return event # type: ignore[return-value]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This implementation of wait_for_event is destructive because it consumes events from the shared CDPConnection event queue. Since conn.listen() uses an asyncio.Queue.get() call (as seen in cdp/connection.py), any event received that does not match the requested event_type is effectively discarded and will never be seen by other concurrent listeners or subsequent calls. This makes the helper unreliable for multi-purpose automation scripts.

Comment on lines +118 to +120
result = await conn.execute(page.navigate(url=url), timeout=timeout)
frame_id: page.FrameId = result[0]
return frame_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The page.navigate command returns a tuple where the third element is errorText. If navigation fails (e.g., due to a DNS error or connection timeout), this field will contain a description of the error, but the current implementation ignores it and returns the frame_id as if it succeeded. It's better to check for error_text and raise an exception if it's present.

Suggested change
result = await conn.execute(page.navigate(url=url), timeout=timeout)
frame_id: page.FrameId = result[0]
return frame_id
frame_id, _, error_text, _ = await conn.execute(page.navigate(url=url), timeout=timeout)
if error_text:
raise RuntimeError(f"Navigation failed: {error_text}")
return frame_id

raise ValueError("Could not resolve node to a remote object")
# Use json.dumps to safely embed the value as a JS string literal.
js_value = _json.dumps(value)
expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('change', {{bubbles: true}})); }}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When programmatically changing the value of a <select> element, it is recommended to dispatch both input and change events. Many modern web frameworks (like React or Vue) rely on the input event to detect changes in form fields.

Suggested change
expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('change', {{bubbles: true}})); }}"
expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('input', {{bubbles: true}})); this.dispatchEvent(new Event('change', {{bubbles: true}})); }}"

:param quality: JPEG quality ``0``–``100`` (ignored for PNG).
:returns: Raw image data as :class:`bytes`.
"""
bbox = await get_bounding_box(conn, selector_or_node)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The element should be scrolled into view before taking a screenshot. If the element is currently off-screen, dom.get_box_model will return coordinates that are outside the current viewport, which can lead to empty or incorrect screenshots when using the clip parameter in page.capture_screenshot.

    node = (
        await query_selector(conn, selector_or_node)
        if isinstance(selector_or_node, str)
        else selector_or_node
    )
    await conn.execute(dom.scroll_into_view_if_needed(node_id=node))
    bbox = await get_bounding_box(conn, node)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2168c877b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

:raises cdp.connection.CDPCommandError: If navigation fails.
"""
result = await conn.execute(page.navigate(url=url), timeout=timeout)
frame_id: page.FrameId = result[0]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Raise when Page.navigate returns errorText

navigate() discards all return values from Page.navigate except frame_id, but the protocol reports navigation failures via errorText in the normal response payload (not as a command error). In cases like DNS failures or blocked navigations, this helper will appear successful and downstream code will wait on load events until timeout, obscuring the real failure cause. The helper should check the returned errorText and raise immediately.

Useful? React with 👍 / 👎.

object_id=obj.object_id,
)
)
await type_text(conn, node, text, delay=delay)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear value when clear_and_type receives empty text

clear_and_type() only selects existing content and then delegates to type_text(). When text == "", type_text() emits no key events, so the selected content is never deleted and the field remains unchanged. This breaks legitimate “clear only” workflows (e.g., resetting an input before submit) that call clear_and_type with an empty replacement.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new high-level, Playwright-style coroutine layer (cdp.browser_control) on top of existing generated CDP domains and CDPConnection, plus Sphinx documentation, an example script, and unit tests using a mocked connection.

Changes:

  • Introduces cdp/browser_control.py with navigation, DOM selection, interaction, screenshots, JS evaluation, and waiting helpers.
  • Adds Sphinx docs for the new API and updates the project overview to reflect the new I/O + automation mode.
  • Adds a new test suite (test/test_browser_control.py) and an end-to-end example script (examples/browser_control_example.py).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
cdp/browser_control.py New high-level automation API built on CDPConnection and generated CDP domains.
test/test_browser_control.py New unit tests for a subset of browser_control helpers using mocked CDPConnection.
examples/browser_control_example.py Example script demonstrating navigation, selection, evaluation, and screenshots.
docs/browser_control.rst New Sphinx page documenting the browser_control API and usage recipes.
docs/index.rst Adds browser_control to the Sphinx toctree.
docs/overview.rst Updates overview to describe sans-I/O vs I/O mode and provides a browser_control quick example.

Comment on lines +118 to +120
result = await conn.execute(page.navigate(url=url), timeout=timeout)
frame_id: page.FrameId = result[0]
return frame_id
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

navigate() ignores the errorText value returned by Page.navigate. Per cdp.page.navigate’s return type, navigation can fail without raising a command error; in that case errorText will be non-null and this helper currently returns a frame_id as if it succeeded. Consider checking error_text and raising an exception (or update the docstring/return type to make failures explicit).

Copilot uses AI. Check for mistakes.
)
obj = await conn.execute(dom.resolve_node(node_id=node))
if obj is None:
raise ValueError("Could not resolve node to a remote object")
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select_option() checks obj is None but does not verify obj.object_id before passing it to Runtime.callFunctionOn. DOM.resolveNode returns a RemoteObject with an optional object_id, so this can end up sending object_id=None and fail later with a less clear error. Please validate obj.object_id and raise a clear exception if it’s missing.

Suggested change
raise ValueError("Could not resolve node to a remote object")
raise ValueError("Could not resolve node to a remote object")
if obj.object_id is None:
raise ValueError(
"Resolved node does not have a remote object id; cannot call Runtime.callFunctionOn"
)
# Summary: fail early with a clear error when DOM.resolveNode does not
# return an object usable by Runtime.callFunctionOn.
# Checklist left to do:
# - None in this function for the reported issue.

Copilot uses AI. Check for mistakes.
Comment on lines +699 to +706
deadline = asyncio.get_event_loop().time() + timeout
while True:
try:
node_id = await query_selector(conn, selector, root=root)
return node_id
except ValueError:
pass
if asyncio.get_event_loop().time() >= deadline:
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_selector() calls query_selector() with root=None on every poll, which triggers an extra DOM.getDocument round-trip each time. You can avoid repeated commands by resolving the document root once up-front (when root is not provided) and then only polling DOM.querySelector.

Suggested change
deadline = asyncio.get_event_loop().time() + timeout
while True:
try:
node_id = await query_selector(conn, selector, root=root)
return node_id
except ValueError:
pass
if asyncio.get_event_loop().time() >= deadline:
loop = asyncio.get_event_loop()
deadline = loop.time() + timeout
# Summary: resolve the default document root once, then poll using the
# same root to avoid an extra DOM.getDocument round-trip on each attempt.
# Left to do:
# - Verify whether document replacement during navigation should refresh
# the cached root in any future enhancement.
resolved_root = root
if resolved_root is None:
document = await conn.execute(dom.get_document())
resolved_root = document.node_id
while True:
try:
node_id = await query_selector(conn, selector, root=resolved_root)
return node_id
except ValueError:
pass
if loop.time() >= deadline:

Copilot uses AI. Check for mistakes.
Comment on lines +699 to +706
deadline = asyncio.get_event_loop().time() + timeout
while True:
try:
node_id = await query_selector(conn, selector, root=root)
return node_id
except ValueError:
pass
if asyncio.get_event_loop().time() >= deadline:
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside an async function, asyncio.get_event_loop() is deprecated in modern Python and can behave differently depending on policy. Prefer asyncio.get_running_loop() (store it once) for the time() calls used to compute/compare the deadline.

Suggested change
deadline = asyncio.get_event_loop().time() + timeout
while True:
try:
node_id = await query_selector(conn, selector, root=root)
return node_id
except ValueError:
pass
if asyncio.get_event_loop().time() >= deadline:
loop = asyncio.get_running_loop()
deadline = loop.time() + timeout
while True:
try:
node_id = await query_selector(conn, selector, root=root)
return node_id
except ValueError:
pass
if loop.time() >= deadline:

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +8
"""
Browser Control Module

High-level browser automation API built on top of the CDP domain modules and
``CDPConnection``. This module provides Playwright-style helpers for common
browser automation tasks: element selection, clicking, typing, waiting,
navigation, and screenshots.

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is ~740 lines and contains many distinct concerns (navigation, querying, input, screenshots, JS eval, waiting). Per the repo guideline to keep code under a few hundred lines, consider splitting this into smaller, focused modules (e.g., browser_control/navigation.py, .../selectors.py, .../input.py, .../wait.py) and re-exporting the public API from browser_control/__init__.py.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +315 to +320
async def clear_and_type(
conn: CDPConnection,
selector_or_node: typing.Union[str, dom.NodeId],
text: str,
delay: float = 0.0,
) -> None:
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several exported helpers in cdp.browser_control don’t appear to be covered by the new unit tests (e.g., clear_and_type, focus, select_option, is_visible, wait_for_load). Adding tests for these (including error paths like resolve-node failures) would help prevent regressions since this is a new high-level API.

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +9
import json
import pytest
from unittest.mock import AsyncMock, MagicMock, patch, call
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are unused imports in this new test module (json, and patch/call from unittest.mock). Removing them will avoid lint noise and keep the tests easier to read.

Suggested change
import json
import pytest
from unittest.mock import AsyncMock, MagicMock, patch, call
import pytest
from unittest.mock import AsyncMock, MagicMock

Copilot uses AI. Check for mistakes.
Comment on lines +94 to +97
png = await bc.screenshot(conn)
with open("/tmp/page.png", "wb") as fh:
fh.write(png)
print(f"Full-page screenshot saved ({len(png)} bytes) → /tmp/page.png")
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example writes screenshots to /tmp/..., which won’t work on Windows and some restricted environments. Consider using tempfile.gettempdir()/pathlib.Path (or writing relative to the current working directory) to make the example portable.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2168c87. Configure here.

object_id=obj.object_id,
)
)
await type_text(conn, node, text, delay=delay)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear_and_type re-focuses element, potentially deselecting text

Medium Severity

clear_and_type selects all text in the element via JavaScript (lines 338–355), then delegates to type_text which unconditionally calls focus(conn, selector_or_node) again (line 303). This second DOM.focus call can reset the text selection/cursor position in the element, causing the subsequent keystrokes to append rather than replace the selected content — defeating the purpose of "clear and type."

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2168c87. Configure here.

function_declaration=expr,
object_id=obj.object_id,
)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select_option missing object_id None check

Low Severity

select_option checks if obj is None but does not check obj.object_id before passing it to runtime.call_function_on. If object_id is None, the CDP command is sent without a target object identifier and will fail. Other similar functions like evaluate_on_node, get_text, and is_visible all guard against obj.object_id is None.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2168c87. Configure here.

async def _wait() -> _T_Event:
async for event in conn.listen():
if isinstance(event, event_type):
return event # type: ignore[return-value]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-matching events permanently lost during wait_for_event

Medium Severity

wait_for_event iterates conn.listen(), which consumes events from the shared _event_queue. Events that don't match the target event_type are silently discarded. Any non-matching events received during the wait (e.g., DOM mutations while waiting for LoadEventFired) are permanently lost and unavailable to subsequent listeners.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2168c87. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Browser control

3 participants