40 changes: 39 additions & 1 deletion README.md
@@ -13,6 +13,19 @@ I/O using [Trio](https://trio.readthedocs.io/). This library handles the
WebSocket negotiation and session management, allowing you to transparently
multiplex commands, responses, and events over a single connection.

## Features

- **Pure CDP**: Direct access to Chrome DevTools Protocol
- **Async/Await**: Built on Trio for structured concurrency
- **Type Safety**: Full type hints for better IDE support
- **High-Level Utilities**: Puppeteer-inspired abstractions for common tasks
  - Keyboard and mouse simulation
  - Element interaction and querying
  - Wait for elements to appear
  - Pure CDP implementation (no JavaScript injection)

## Basic Example

The example below demonstrates the salient features of the library by navigating to a
web page and extracting the document title.

@@ -40,5 +53,30 @@ async with open_cdp(cdp_url) as conn:
print(html)
```

## High-Level Utilities Example

The library also provides high-level utilities for common automation tasks:

```python
from trio_cdp import open_cdp, page, target
from trio_cdp.util import query_selector, Keyboard

async with open_cdp(cdp_url) as conn:
    async with conn.open_session(target_id) as session:
        # Navigate to a page
        await page.enable()
        await page.navigate(url)

        # Find an input field and type into it
        input_field = await query_selector(session, 'input[name="search"]')
        if input_field:
            await input_field.type('Hello, World!')

        # Press Enter to submit
        keyboard = Keyboard(session)
        await keyboard.press('Enter')
```

This example code is explained [in the documentation](https://trio-cdp.readthedocs.io)
and more example code can be found in the `examples/` directory of this repository,
including examples for taking screenshots and monitoring network events.
177 changes: 177 additions & 0 deletions UTILITIES_IMPLEMENTATION.md
@@ -0,0 +1,177 @@
# Utilities Module Implementation Summary

## Overview

This implementation addresses the GitHub issue about extending `trio-chrome-devtools-protocol` with higher-level utility functions and classes for common browser automation tasks, inspired by Puppeteer/Pyppeteer.

## Decision: Integrated Approach

Rather than creating a separate `trio-puppeteer` package, the utilities are integrated directly into the main `trio_cdp` package as a `util` module. This approach was chosen because:

1. **Lightweight**: The utilities are thin wrappers around CDP commands
2. **No External Dependencies**: Everything uses native CDP, no JavaScript injection
3. **Tight Integration**: Direct access to session and connection objects
4. **Simplicity**: Users don't need to install/manage a separate package

## Implementation

### New Module: `trio_cdp/util.py`

Contains three main classes and utility functions:

#### 1. Keyboard Class
Provides keyboard input simulation:
- `down(key, text=None)` - Press key down
- `up(key)` - Release key
- `press(key, delay=0)` - Complete key press (down + up)
- `type(text, delay=0)` - Type a string character by character

**Example:**
```python
keyboard = Keyboard(session)
await keyboard.type("Hello, World!")
await keyboard.press("Enter")
```
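
The relationship between `press()`, `type()`, and the lower-level `down()`/`up()` calls can be sketched as a sequence of key events. This is a minimal illustration, not the module's actual code: the helper names `press_events` and `type_events` are hypothetical, and the sketch only builds the event list rather than sending anything to a browser. The event type strings `keyDown` and `keyUp` are the values CDP's `Input.dispatchKeyEvent` accepts for its `type` parameter.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; they are not part of trio_cdp.util.
KeyEvent = Tuple[str, str]  # (CDP key event type, key)


def press_events(key: str) -> List[KeyEvent]:
    """Return the events for one complete key press: down, then up."""
    return [("keyDown", key), ("keyUp", key)]


def type_events(text: str) -> List[KeyEvent]:
    """Return the events for typing a string one character at a time."""
    events: List[KeyEvent] = []
    for char in text:
        events.extend(press_events(char))
    return events


print(type_events("Hi"))
```

A real implementation would dispatch each event over the session and, when `delay` is non-zero, sleep between the down and up events.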

#### 2. Mouse Class
Provides mouse action simulation:
- `move(x, y, steps=1)` - Move mouse with optional smooth interpolation
- `click(x, y, button='left', click_count=1, delay=0)` - Click at position
- `down(button='left', click_count=1)` - Mouse button down
- `up(button='left', click_count=1)` - Mouse button up

**Example:**
```python
mouse = Mouse(session)
await mouse.move(100, 200, steps=10) # Smooth movement
await mouse.click(100, 200)
```
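
The summary does not spell out how `steps` produces smooth movement. One plausible reading, assumed here, is linear interpolation between the current and target positions, with one mouse-move event per intermediate point. The function name `interpolate_path` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_path(start: Point, end: Point, steps: int = 1) -> List[Point]:
    """Return `steps` evenly spaced points from start (exclusive) to end (inclusive)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


# Moving from (0, 0) to (100, 200) in 4 steps.
print(interpolate_path((0, 0), (100, 200), steps=4))
```

With `steps=1` the path collapses to the target point alone, which matches the default of a single direct move.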

#### 3. ElementHandle Class
Represents a handle to a DOM element with convenient interaction methods:
- `click(button='left', click_count=1, delay=0)` - Click the element
- `type(text, delay=0)` - Focus and type into element
- `get_attribute(name)` - Get HTML attribute value
- `get_property(name)` - Get JavaScript property value
- `get_text_content()` - Extract text content

**Example:**
```python
input_field = await query_selector(session, 'input[name="email"]')
if input_field:
    await input_field.type('user@example.com')
```
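
This summary does not say how `click()` chooses its coordinates. A common approach, assumed here rather than taken from the module, is to ask CDP for the element's box model and click the center of its content quad. CDP represents a quad as eight numbers: the x and y coordinates of four corners. The helper name `quad_center` is hypothetical.

```python
from typing import Sequence, Tuple


def quad_center(quad: Sequence[float]) -> Tuple[float, float]:
    """Return the center of a CDP quad given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(quad) != 8:
        raise ValueError("a quad must contain exactly 8 numbers")
    xs = quad[0::2]  # x coordinates of the four corners
    ys = quad[1::2]  # y coordinates of the four corners
    return sum(xs) / 4, sum(ys) / 4


# A 100x40 rectangle whose top-left corner is at (10, 20).
print(quad_center([10, 20, 110, 20, 110, 60, 10, 60]))
```

The resulting point could then be passed to something like `Mouse.click(x, y)`.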

#### Element Selection Functions
- `query_selector(session, selector, node_id=None)` - Find first matching element
- `query_selector_all(session, selector, node_id=None)` - Find all matching elements
- `wait_for_selector(session, selector, timeout=30, visible=False)` - Wait for element

**Example:**
```python
# Find and interact with elements
button = await query_selector(session, 'button.submit')
if button:
    await button.click()

# Wait for dynamic content
result = await wait_for_selector(session, '.result', timeout=10, visible=True)
```
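
Waiting for an element usually comes down to polling until a lookup succeeds or a deadline passes. The sketch below shows that pattern under several assumptions: the name `poll_until_found` and the `interval` parameter are hypothetical, returning `None` on timeout is inferred from how the examples test the result, and `asyncio` is used only so the sketch runs with the standard library. The real module is built on Trio, where `trio.sleep` would take the place of `asyncio.sleep`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def poll_until_found(
    find: Callable[[], Awaitable[Optional[T]]],
    timeout: float = 30,
    interval: float = 0.1,
) -> Optional[T]:
    """Call `find` repeatedly until it returns a value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = await find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # Assumed behavior: give up quietly on timeout.
        await asyncio.sleep(interval)


async def never_found() -> None:
    """Stand-in lookup that never finds anything."""
    return None


async def demo() -> Optional[str]:
    attempts = 0

    async def find() -> Optional[str]:
        # Stand-in for a selector query: succeeds on the third attempt.
        nonlocal attempts
        attempts += 1
        return "element" if attempts >= 3 else None

    return await poll_until_found(find, timeout=1, interval=0.01)


print(asyncio.run(demo()))
```

A `visible=True` option would add a second condition to the lookup, for example checking that the element has a non-empty box model.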

## Documentation

### Added Files
1. **docs/utilities.rst** - Comprehensive documentation for all utilities
2. **examples/form_interaction.py** - Example showing form interaction
3. **examples/keyboard_mouse.py** - Example demonstrating keyboard/mouse usage
4. **tests/test_util.py** - Unit tests for utility functions
5. **validate_utilities.py** - Validation script to verify module structure

### Updated Files
1. **README.md** - Added utilities section with examples
2. **docs/index.rst** - Added utilities to documentation table of contents
3. **trio_cdp/__init__.py** - Export util module

## Key Design Principles

1. **Pure CDP**: No JavaScript injection, all interactions use native CDP commands
2. **Async-First**: Fully compatible with Trio's async/await patterns
3. **Lightweight**: Minimal abstractions, close to underlying CDP
4. **Type-Safe**: Complete type hints for IDE support
5. **Composable**: Small, focused utilities that work well together
6. **Optional**: Core CDP functionality remains available; utilities are opt-in

## Benefits

### For Users
- **Intuitive API**: Familiar patterns for anyone coming from Puppeteer
- **Less Boilerplate**: Common tasks simplified with high-level methods
- **Type Safety**: Full IDE support with autocomplete and type checking
- **Pure Python**: No JavaScript knowledge required

### For the Project
- **Maintains Philosophy**: Stays true to lightweight, CDP-focused approach
- **No Breaking Changes**: Completely additive, existing code unaffected
- **Extensible**: Users can easily add custom utilities following same patterns
- **Well-Documented**: Comprehensive docs and examples

## Technical Details

### Generator Fix
Fixed `generator/generate.py` to handle `typing.Optional` type hints, which was preventing regeneration of CDP bindings with newer Python versions.
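
As a rough illustration of what handling `typing.Optional` involves, the sketch below unwraps `Optional[X]` into its inner type using the standard `typing.get_origin` and `typing.get_args` helpers. It is not the code from `generator/generate.py`; the function name `unwrap_optional` is hypothetical.

```python
import typing
from typing import Any, Optional, Tuple, Union


def unwrap_optional(hint: Any) -> Tuple[Any, bool]:
    """Return (inner_type, True) for Optional[inner_type], else (hint, False)."""
    args = typing.get_args(hint)
    # Optional[X] is Union[X, None]: a two-member union that includes NoneType.
    if typing.get_origin(hint) is Union and len(args) == 2 and type(None) in args:
        inner = args[0] if args[1] is type(None) else args[1]
        return inner, True
    return hint, False


print(unwrap_optional(Optional[int]))
print(unwrap_optional(str))
```

Unions with more than one non-`None` member are returned unchanged, so a generator using this approach would still need its own handling for them.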

### CDP Bindings Regenerated
Regenerated all CDP binding code to be compatible with `chrome-devtools-protocol==0.4.0`, resolving import errors with the generated code.

## Testing & Validation

1. **Unit Tests**: Comprehensive test suite in `tests/test_util.py`
2. **Validation Script**: `validate_utilities.py` verifies all classes and methods exist
3. **Code Quality**: Passed CodeQL security scan with 0 alerts
4. **Examples**: Two working examples demonstrate real-world usage
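
A structural check like the one the validation script performs can be sketched with `getattr`. This is an assumption about its general shape, not the contents of `validate_utilities.py`: the function `missing_members` is hypothetical, and a stand-in namespace replaces the real `trio_cdp.util` module so the sketch runs on its own.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List


def missing_members(namespace: object, expected: Dict[str, Iterable[str]]) -> List[str]:
    """Return the names of expected classes or methods that are absent."""
    missing: List[str] = []
    for class_name, methods in expected.items():
        cls = getattr(namespace, class_name, None)
        if cls is None:
            missing.append(class_name)
            continue
        for method in methods:
            if not callable(getattr(cls, method, None)):
                missing.append(f"{class_name}.{method}")
    return missing


class Keyboard:
    """Stand-in class with only some of the documented methods."""

    def down(self): ...
    def up(self): ...


fake_util = SimpleNamespace(Keyboard=Keyboard)
expected = {"Keyboard": ["down", "up", "press"], "Mouse": ["move"]}
print(missing_members(fake_util, expected))
```

An empty result would mean every expected class and method is present.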

## Usage Example

Here's a complete example showing the utilities in action:

```python
import trio
from trio_cdp import open_cdp, page, target
from trio_cdp.util import query_selector, wait_for_selector, Keyboard

async def automate_form(cdp_url):
    async with open_cdp(cdp_url) as conn:
        # Get a target
        targets = await target.get_targets()
        target_id = targets[0].target_id

        async with conn.open_session(target_id) as session:
            # Navigate
            await page.enable()
            await page.navigate('https://example.com/form')

            # Wait for and fill form
            name_field = await wait_for_selector(session, 'input[name="name"]', timeout=10)
            if name_field:
                await name_field.type('John Doe')

            # Use keyboard for submission
            keyboard = Keyboard(session)
            await keyboard.press('Enter')
```

## Future Enhancements

Potential additions that maintain the same design philosophy:

1. **Page utilities**: Screenshot helpers, PDF generation utilities
2. **Network utilities**: Request interception helpers, mock response utilities
3. **Cookie utilities**: Easy cookie management
4. **Dialog utilities**: Alert/prompt/confirm handlers
5. **File upload**: File chooser utilities

Each would follow the same pattern: lightweight wrappers around CDP commands with convenient async interfaces.

## Conclusion

This implementation successfully extends `trio-chrome-devtools-protocol` with higher-level utilities while maintaining the library's core principles of being lightweight, pure-CDP, and Trio-native. The utilities provide a more intuitive interface for common automation tasks without sacrificing the power and flexibility of the underlying CDP protocol.
6 changes: 6 additions & 0 deletions docs/changelog.rst
@@ -1,6 +1,12 @@
Changelog
=========

Unreleased
----------

* Add ``find_chrome_debugger_url()`` function for programmatic discovery of Chrome's WebSocket URL.
* ``open_cdp()`` now accepts HTTP URLs (e.g., ``http://localhost:9222``) which are automatically resolved to WebSocket URLs.

0.6.0
-----

110 changes: 107 additions & 3 deletions docs/getting_started.rst
@@ -3,6 +3,53 @@ Getting Started

.. highlight:: python

Connecting to Chrome
--------------------

Trio CDP provides flexible ways to connect to a Chrome browser (or any browser that
supports the Chrome DevTools Protocol).

Starting Chrome with Remote Debugging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, start Chrome with remote debugging enabled::

    $ chrome --remote-debugging-port=9222

You can use any port number you prefer. Chrome will display the debugging URL in the
console when it starts.

Connecting Programmatically
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to connect is by using an HTTP URL::

    from trio_cdp import open_cdp

    async with open_cdp('http://localhost:9222') as conn:
        # Your code here
        ...

The library will automatically discover the WebSocket URL from Chrome's HTTP endpoint.

Alternatively, you can use the discovery function explicitly::

    from trio_cdp import find_chrome_debugger_url, open_cdp

    # Discover the WebSocket URL
    browser_url = find_chrome_debugger_url(port=9222)

    async with open_cdp(browser_url) as conn:
        ...

You can also provide a WebSocket URL directly if you already have it::

    async with open_cdp('ws://localhost:9222/devtools/browser/...') as conn:
        ...

Basic Example
-------------

The following example shows how to connect to a browser, navigate to a specified web page,
and then extract the page title.

@@ -120,6 +167,63 @@ we get the outer HTML of the node. This snippet shows some new APIs, but the
mechanics of sending commands and getting responses are the same as the previous
snippets.

Listening to Events
-------------------

Trio CDP provides two patterns for handling browser events:

Using ``wait_for()``
~~~~~~~~~~~~~~~~~~~~

The ``wait_for()`` method is useful when you need to wait for a single event before
continuing execution. We've already seen this in the navigation example above, where
we wait for ``page.LoadEventFired``. Here's the pattern:

.. code::

    async with session.wait_for(page.LoadEventFired) as event_proxy:
        # Trigger an action that will cause the event
        await page.navigate(url='https://example.com')
    # After the context exits, event_proxy.value contains the event
    print(f"Page loaded at timestamp: {event_proxy.value.timestamp}")

Using ``listen()``
~~~~~~~~~~~~~~~~~~

The ``listen()`` method returns an async iterator that continuously yields events as
they occur. This is useful for monitoring ongoing activity, such as network requests:

.. code::

    # Enable network events
    await network.enable()

    # Listen for network events
    async for event in session.listen(
        network.RequestWillBeSent,
        network.ResponseReceived
    ):
        if isinstance(event, network.RequestWillBeSent):
            print(f"Request: {event.request.url}")
        elif isinstance(event, network.ResponseReceived):
            print(f"Response: {event.response.url} (status: {event.response.status})")

You can listen to multiple event types at once by passing them all to ``listen()``.
The iterator will yield events of any of the specified types as they occur.

**Important:** Don't forget to enable events for the domain you're interested in!
For example, call ``await network.enable()`` before listening to network events,
or ``await page.enable()`` before listening to page events. You can also use the
context managers ``session.page_enable()`` or ``session.dom_enable()`` for automatic
cleanup.

Examples
--------

A more complete version of the basic example can be found in ``examples/get_title.py`` in
the repository. There are additional examples showing:

- ``examples/screenshot.py`` - Taking screenshots of web pages
- ``examples/network_events.py`` - Monitoring network events using both ``wait_for()`` and ``listen()``

The unit tests in ``tests/`` also provide helpful sample code.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -15,4 +15,5 @@ responses, and events over a single connection.
installation
getting_started
api
utilities
changelog