Add cdp.browser_control: high-level browser automation API#157
Add cdp.browser_control: high-level browser automation API#157
cdp.browser_control: high-level browser automation API#157Conversation
Agent-Logs-Url: https://github.com/HyperionGray/python-chrome-devtools-protocol/sessions/f5aedfd0-e970-4dc1-bc96-e614a946247f Co-authored-by: P4X-ng <223870169+P4X-ng@users.noreply.github.com>
…Command replacement, test clarity Agent-Logs-Url: https://github.com/HyperionGray/python-chrome-devtools-protocol/sessions/f5aedfd0-e970-4dc1-bc96-e614a946247f Co-authored-by: P4X-ng <223870169+P4X-ng@users.noreply.github.com>
cdp.browser_control: high-level browser automation API
|
/gemini review @codex review please |
|
Thank you for the request. I have reviewed the |
There was a problem hiding this comment.
Code Review
This pull request introduces a high-level browser automation API in the cdp.browser_control module, providing Playwright-style helpers for navigation, element interaction, and screenshots. It includes comprehensive documentation, usage examples, and a suite of unit tests. Feedback identifies a critical issue where wait_for_event destructively consumes events from the shared connection queue, potentially breaking concurrent listeners. Additionally, improvements are suggested for error handling in navigate, event dispatching in select_option to support modern frameworks, and ensuring elements are scrolled into view before capturing screenshots.
| async for event in conn.listen(): | ||
| if isinstance(event, event_type): | ||
| return event # type: ignore[return-value] |
There was a problem hiding this comment.
This implementation of wait_for_event is destructive because it consumes events from the shared CDPConnection event queue. Since conn.listen() uses an asyncio.Queue.get() call (as seen in cdp/connection.py), any event received that does not match the requested event_type is effectively discarded and will never be seen by other concurrent listeners or subsequent calls. This makes the helper unreliable for multi-purpose automation scripts.
| result = await conn.execute(page.navigate(url=url), timeout=timeout) | ||
| frame_id: page.FrameId = result[0] | ||
| return frame_id |
There was a problem hiding this comment.
The page.navigate command returns a tuple where the third element is errorText. If navigation fails (e.g., due to a DNS error or connection timeout), this field will contain a description of the error, but the current implementation ignores it and returns the frame_id as if it succeeded. It's better to check for error_text and raise an exception if it's present.
| result = await conn.execute(page.navigate(url=url), timeout=timeout) | |
| frame_id: page.FrameId = result[0] | |
| return frame_id | |
| frame_id, _, error_text, _ = await conn.execute(page.navigate(url=url), timeout=timeout) | |
| if error_text: | |
| raise RuntimeError(f"Navigation failed: {error_text}") | |
| return frame_id |
| raise ValueError("Could not resolve node to a remote object") | ||
| # Use json.dumps to safely embed the value as a JS string literal. | ||
| js_value = _json.dumps(value) | ||
| expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('change', {{bubbles: true}})); }}" |
There was a problem hiding this comment.
When programmatically changing the value of a <select> element, it is recommended to dispatch both input and change events. Many modern web frameworks (like React or Vue) rely on the input event to detect changes in form fields.
| expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('change', {{bubbles: true}})); }}" | |
| expr = f"function() {{ this.value = {js_value}; this.dispatchEvent(new Event('input', {{bubbles: true}})); this.dispatchEvent(new Event('change', {{bubbles: true}})); }}" |
| :param quality: JPEG quality ``0``–``100`` (ignored for PNG). | ||
| :returns: Raw image data as :class:`bytes`. | ||
| """ | ||
| bbox = await get_bounding_box(conn, selector_or_node) |
There was a problem hiding this comment.
The element should be scrolled into view before taking a screenshot. If the element is currently off-screen, dom.get_box_model will return coordinates that are outside the current viewport, which can lead to empty or incorrect screenshots when using the clip parameter in page.capture_screenshot.
node = (
await query_selector(conn, selector_or_node)
if isinstance(selector_or_node, str)
else selector_or_node
)
await conn.execute(dom.scroll_into_view_if_needed(node_id=node))
bbox = await get_bounding_box(conn, node)There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2168c877b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| :raises cdp.connection.CDPCommandError: If navigation fails. | ||
| """ | ||
| result = await conn.execute(page.navigate(url=url), timeout=timeout) | ||
| frame_id: page.FrameId = result[0] |
There was a problem hiding this comment.
Raise when Page.navigate returns errorText
navigate() discards all return values from Page.navigate except frame_id, but the protocol reports navigation failures via errorText in the normal response payload (not as a command error). In cases like DNS failures or blocked navigations, this helper will appear successful and downstream code will wait on load events until timeout, obscuring the real failure cause. The helper should check the returned errorText and raise immediately.
Useful? React with 👍 / 👎.
| object_id=obj.object_id, | ||
| ) | ||
| ) | ||
| await type_text(conn, node, text, delay=delay) |
There was a problem hiding this comment.
Clear value when clear_and_type receives empty text
clear_and_type() only selects existing content and then delegates to type_text(). When text == "", type_text() emits no key events, so the selected content is never deleted and the field remains unchanged. This breaks legitimate “clear only” workflows (e.g., resetting an input before submit) that call clear_and_type with an empty replacement.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Adds a new high-level, Playwright-style coroutine layer (cdp.browser_control) on top of existing generated CDP domains and CDPConnection, plus Sphinx documentation, an example script, and unit tests using a mocked connection.
Changes:
- Introduces
cdp/browser_control.pywith navigation, DOM selection, interaction, screenshots, JS evaluation, and waiting helpers. - Adds Sphinx docs for the new API and updates the project overview to reflect the new I/O + automation mode.
- Adds a new test suite (
test/test_browser_control.py) and an end-to-end example script (examples/browser_control_example.py).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
cdp/browser_control.py |
New high-level automation API built on CDPConnection and generated CDP domains. |
test/test_browser_control.py |
New unit tests for a subset of browser_control helpers using mocked CDPConnection. |
examples/browser_control_example.py |
Example script demonstrating navigation, selection, evaluation, and screenshots. |
docs/browser_control.rst |
New Sphinx page documenting the browser_control API and usage recipes. |
docs/index.rst |
Adds browser_control to the Sphinx toctree. |
docs/overview.rst |
Updates overview to describe sans-I/O vs I/O mode and provides a browser_control quick example. |
| result = await conn.execute(page.navigate(url=url), timeout=timeout) | ||
| frame_id: page.FrameId = result[0] | ||
| return frame_id |
There was a problem hiding this comment.
navigate() ignores the errorText value returned by Page.navigate. Per cdp.page.navigate’s return type, navigation can fail without raising a command error; in that case errorText will be non-null and this helper currently returns a frame_id as if it succeeded. Consider checking error_text and raising an exception (or update the docstring/return type to make failures explicit).
| ) | ||
| obj = await conn.execute(dom.resolve_node(node_id=node)) | ||
| if obj is None: | ||
| raise ValueError("Could not resolve node to a remote object") |
There was a problem hiding this comment.
select_option() checks obj is None but does not verify obj.object_id before passing it to Runtime.callFunctionOn. DOM.resolveNode returns a RemoteObject with an optional object_id, so this can end up sending object_id=None and fail later with a less clear error. Please validate obj.object_id and raise a clear exception if it’s missing.
| raise ValueError("Could not resolve node to a remote object") | |
| raise ValueError("Could not resolve node to a remote object") | |
| if obj.object_id is None: | |
| raise ValueError( | |
| "Resolved node does not have a remote object id; cannot call Runtime.callFunctionOn" | |
| ) | |
| # Summary: fail early with a clear error when DOM.resolveNode does not | |
| # return an object usable by Runtime.callFunctionOn. | |
| # Checklist left to do: | |
| # - None in this function for the reported issue. |
| deadline = asyncio.get_event_loop().time() + timeout | ||
| while True: | ||
| try: | ||
| node_id = await query_selector(conn, selector, root=root) | ||
| return node_id | ||
| except ValueError: | ||
| pass | ||
| if asyncio.get_event_loop().time() >= deadline: |
There was a problem hiding this comment.
wait_for_selector() calls query_selector() with root=None on every poll, which triggers an extra DOM.getDocument round-trip each time. You can avoid repeated commands by resolving the document root once up-front (when root is not provided) and then only polling DOM.querySelector.
| deadline = asyncio.get_event_loop().time() + timeout | |
| while True: | |
| try: | |
| node_id = await query_selector(conn, selector, root=root) | |
| return node_id | |
| except ValueError: | |
| pass | |
| if asyncio.get_event_loop().time() >= deadline: | |
| loop = asyncio.get_event_loop() | |
| deadline = loop.time() + timeout | |
| # Summary: resolve the default document root once, then poll using the | |
| # same root to avoid an extra DOM.getDocument round-trip on each attempt. | |
| # Left to do: | |
| # - Verify whether document replacement during navigation should refresh | |
| # the cached root in any future enhancement. | |
| resolved_root = root | |
| if resolved_root is None: | |
| document = await conn.execute(dom.get_document()) | |
| resolved_root = document.node_id | |
| while True: | |
| try: | |
| node_id = await query_selector(conn, selector, root=resolved_root) | |
| return node_id | |
| except ValueError: | |
| pass | |
| if loop.time() >= deadline: |
| deadline = asyncio.get_event_loop().time() + timeout | ||
| while True: | ||
| try: | ||
| node_id = await query_selector(conn, selector, root=root) | ||
| return node_id | ||
| except ValueError: | ||
| pass | ||
| if asyncio.get_event_loop().time() >= deadline: |
There was a problem hiding this comment.
Inside an async function, asyncio.get_event_loop() is deprecated in modern Python and can behave differently depending on policy. Prefer asyncio.get_running_loop() (store it once) for the time() calls used to compute/compare the deadline.
| deadline = asyncio.get_event_loop().time() + timeout | |
| while True: | |
| try: | |
| node_id = await query_selector(conn, selector, root=root) | |
| return node_id | |
| except ValueError: | |
| pass | |
| if asyncio.get_event_loop().time() >= deadline: | |
| loop = asyncio.get_running_loop() | |
| deadline = loop.time() + timeout | |
| while True: | |
| try: | |
| node_id = await query_selector(conn, selector, root=root) | |
| return node_id | |
| except ValueError: | |
| pass | |
| if loop.time() >= deadline: |
| """ | ||
| Browser Control Module | ||
|
|
||
| High-level browser automation API built on top of the CDP domain modules and | ||
| ``CDPConnection``. This module provides Playwright-style helpers for common | ||
| browser automation tasks: element selection, clicking, typing, waiting, | ||
| navigation, and screenshots. | ||
|
|
There was a problem hiding this comment.
This module is ~740 lines and contains many distinct concerns (navigation, querying, input, screenshots, JS eval, waiting). Per the repo guideline to keep code under a few hundred lines, consider splitting this into smaller, focused modules (e.g., browser_control/navigation.py, .../selectors.py, .../input.py, .../wait.py) and re-exporting the public API from browser_control/__init__.py.
| async def clear_and_type( | ||
| conn: CDPConnection, | ||
| selector_or_node: typing.Union[str, dom.NodeId], | ||
| text: str, | ||
| delay: float = 0.0, | ||
| ) -> None: |
There was a problem hiding this comment.
Several exported helpers in cdp.browser_control don’t appear to be covered by the new unit tests (e.g., clear_and_type, focus, select_option, is_visible, wait_for_load). Adding tests for these (including error paths like resolve-node failures) would help prevent regressions since this is a new high-level API.
| import json | ||
| import pytest | ||
| from unittest.mock import AsyncMock, MagicMock, patch, call |
There was a problem hiding this comment.
There are unused imports in this new test module (json, and patch/call from unittest.mock). Removing them will avoid lint noise and keep the tests easier to read.
| import json | |
| import pytest | |
| from unittest.mock import AsyncMock, MagicMock, patch, call | |
| import pytest | |
| from unittest.mock import AsyncMock, MagicMock |
| png = await bc.screenshot(conn) | ||
| with open("/tmp/page.png", "wb") as fh: | ||
| fh.write(png) | ||
| print(f"Full-page screenshot saved ({len(png)} bytes) → /tmp/page.png") |
There was a problem hiding this comment.
This example writes screenshots to /tmp/..., which won’t work on Windows and some restricted environments. Consider using tempfile.gettempdir()/pathlib.Path (or writing relative to the current working directory) to make the example portable.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 2168c87. Configure here.
| object_id=obj.object_id, | ||
| ) | ||
| ) | ||
| await type_text(conn, node, text, delay=delay) |
There was a problem hiding this comment.
clear_and_type re-focuses element, potentially deselecting text
Medium Severity
clear_and_type selects all text in the element via JavaScript (lines 338–355), then delegates to type_text which unconditionally calls focus(conn, selector_or_node) again (line 303). This second DOM.focus call can reset the text selection/cursor position in the element, causing the subsequent keystrokes to append rather than replace the selected content — defeating the purpose of "clear and type."
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 2168c87. Configure here.
| function_declaration=expr, | ||
| object_id=obj.object_id, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
select_option missing object_id None check
Low Severity
select_option checks if obj is None but does not check obj.object_id before passing it to runtime.call_function_on. If object_id is None, the CDP command is sent without a target object identifier and will fail. Other similar functions like evaluate_on_node, get_text, and is_visible all guard against obj.object_id is None.
Reviewed by Cursor Bugbot for commit 2168c87. Configure here.
| async def _wait() -> _T_Event: | ||
| async for event in conn.listen(): | ||
| if isinstance(event, event_type): | ||
| return event # type: ignore[return-value] |
There was a problem hiding this comment.
Non-matching events permanently lost during wait_for_event
Medium Severity
wait_for_event iterates conn.listen(), which consumes events from the shared _event_queue. Events that don't match the target event_type are silently discarded. Any non-matching events received during the wait (e.g., DOM mutations while waiting for LoadEventFired) are permanently lost and unavailable to subsequent listeners.
Reviewed by Cursor Bugbot for commit 2168c87. Configure here.


CDP has all the primitives for browser automation but no ergonomic high-level API. This adds
cdp.browser_control— a Playwright-style coroutine layer over the existingCDPConnectionand generated domain modules, with full Sphinx docs.New:
cdp/browser_control.pynavigate,reload,go_back,go_forward,wait_for_loadquery_selector,query_selector_allclick,double_click,hover,type_text,clear_and_type,press_key,focus,select_optionget_text,get_attribute,get_bounding_box,is_visiblescreenshot,screenshot_elementevaluate,evaluate_on_nodewait_for_selector,wait_for_eventDocs
docs/browser_control.rst— full API reference + common recipes, wired into the Sphinx toctreedocs/overview.rst— updated to reflect I/O and browser automation as first-class; removed stale "no I/O" statementTests
28 new unit tests in
test/test_browser_control.pyusing a mockedCDPConnection— no real browser needed.Note
Medium Risk
Mostly additive, but introduces a large new public API surface that drives DOM/input/screenshot CDP commands and may expose edge cases in real-browser behavior despite unit tests using mocks.
Overview
Adds new
cdp.browser_controlmodule providing a Playwright-style coroutine layer overCDPConnectionfor navigation, element querying/interaction, screenshots, JS evaluation, and waiting primitives.Documentation is expanded with a new
docs/browser_control.rstreference/recipes page (wired intodocs/index.rst) anddocs/overview.rstupdated to describe the new I/O + automation mode and thechrome-devtools-protocol[io]extra. Includes a runnableexamples/browser_control_example.pyand a mocked-connection unit test suite intest/test_browser_control.pycovering the new helpers.Reviewed by Cursor Bugbot for commit 2168c87. Configure here.