 ![Python Versions](https://img.shields.io/pypi/pyversions/trio-chrome-devtools-protocol)
 ![MIT License](https://img.shields.io/github/license/HyperionGray/trio-chrome-devtools-protocol.svg)
 [![Build Status](https://img.shields.io/travis/com/HyperionGray/trio-chrome-devtools-protocol.svg?branch=master)](https://travis-ci.com/HyperionGray/trio-chrome-devtools-protocol)
+[![Read the Docs](https://img.shields.io/readthedocs/trio-cdp.svg)](https://trio-cdp.readthedocs.io)

 This Python library performs remote control of any web browser that implements
 the Chrome DevTools Protocol. It is built using the type wrappers in
@@ -12,180 +13,32 @@ I/O using [Trio](https://trio.readthedocs.io/). This library handles the
 WebSocket negotiation and session management, allowing you to transparently
 multiplex commands, responses, and events over a single connection.

-The example demonstrates the salient features of the library.
+The example below demonstrates the salient features of the library by navigating to a
+web page and extracting the document title.

 ```python
+from trio_cdp import open_cdp, page, dom
+
 async with open_cdp(cdp_url) as conn:
     # Find the first available target (usually a browser tab).
-    targets = await conn.execute(target.get_targets())
+    targets = await target.get_targets()
     target_id = targets[0].id

     # Create a new session with the chosen target.
-    session = await conn.open_session(target_id)
-
-    # Navigate to a website.
-    await session.execute(page.enable())
-    async with session.wait_for(page.LoadEventFired):
-        await session.execute(page.navigate(target_url))
-
-    # Extract the page title.
-    root_node = await session.execute(dom.get_document())
-    title_node_id = await session.execute(dom.query_selector(root_node.node_id,
-        'title'))
-    html = await session.execute(dom.get_outer_html(title_node_id))
-    print(html)
-```
-
-We'll go through this example bit by bit. First, it starts with a context
-manager:
-
-```python
-async with open_cdp(cdp_url) as conn:
-```
-
-This context manager opens a connection to the browser when the block is entered
-and closes the connection automatically when the block exits. Now we have a
-connection to the browser, but the browser has multiple targets that can be
-operated independently. For example, each browser tab is a separate target. In
-order to interact with one of them, we have to create a session for it.
-
-```python
-targets = await conn.execute(target.get_targets())
-target_id = targets[0].id
-```
-
-The first line here executes the `get_targets()` command in the browser. Note
-the form of the command: `await conn.execute(...)` will send a command to the
-browser, parse its response, and return a value (if any). The command is one of
-the methods in the PyCDP package. Trio CDP multiplexes commands and responses on
-a single connection, so we can send commands concurrently if we want, and the
-responses will be routed back to the correct task.
-
-In this case, the command is `target.get_targets()`, which returns a list of
-`TargetInfo` objects. We grab the first object and extract its `target_id`.
-
-```python
-session = await conn.open_session(target_id)
-```
-
-In order to connect to a target, we open a session based on the target ID.
-
-```python
-await session.execute(page.enable())
-async with session.wait_for(page.LoadEventFired):
-    await session.execute(page.navigate(target_url))
-```
-
-Here we use the session (remember, it corresponds to a tab in the browser) to
-navigate to the target URL. Just like the connection object, the session object
-has an `execute(...)` method that sends a command to the target, parses the
-response, and returns a value (if any).
-
-This snippet also introduces another concept: events. When we ask the browser to
-navigate to a URL, it acknowledges our request with a response, then starts the
-navigation process. How do we know when the page is actually loaded, though?
-Easy: the browser can send us an event!
-
-We first have to enable page-level events by calling `page.enable()`. Then we
-use `session.wait_for(...)` to wait for an event of the desired type. In this
-example, the script will suspend until it receives a `page.LoadEventFired`
-event. (After this block finishes executing, you can run `page.disable()` to
-turn off page-level events if you want to save some bandwidth and processing
-power, or you can use the context manager `async with session.page_enable(): ...`
-to automatically enable page-level events just for a specific block.)
-
-Note that we wait for the event inside an `async with` block, and we do this
-_before_ executing the command that will trigger this event. This order of
-operations may be surprising, but it avoids race conditions. If we executed a
-command and then tried to listen for an event, the browser might fire the event
-very quickly before we have had a chance to set up our event listener, and then
-we would miss it! The `async with` block sets up the listener before we run the
-command, so that no matter how fast the event fires, we are guaranteed to catch
-it.
-
-```python
-root_node = await session.execute(dom.get_document())
-title_node_id = await session.execute(
-    dom.query_selector(root_node.node_id, 'title'))
-html = await session.execute(dom.get_outer_html(title_node_id))
-print(html)
-```
-
-The last part of the script navigates the DOM to find the `<title>` element.
-First we get the document's root node, then we query for a CSS selector, then
-we get the outer HTML of the node. This snippet shows some new APIs, but the
-mechanics of sending commands and getting responses are the same as the previous
-snippets.
+    async with conn.open_session(target_id) as session:

-A more complete version of this example can be found in `examples/get_title.py`.
-There is also a screenshot example in `examples/screenshot.py`. The unit tests
-in `tests/` also provide more examples.
+        # Navigate to a website.
+        async with session.page_enable():
+            async with session.wait_for(page.LoadEventFired):
+                await session.execute(page.navigate(target_url))

-To run the examples, you need a Chrome binary in your system. You can get one like this:
-
-## Running Examples on MacOS
-
-**Terminal 1**
-
-This sets up the chrome browser in a specific version, and runs it in debug mode with Tor proxy for network traffic.
-
-```
-wget https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Mac%2F678035%2Fchrome-mac.zip?generation=1563322360871926&alt=media
-unzip chrome-mac.zip && rm chrome-mac.zip
-./chrome-mac/Chromium.app/Contents/MacOS/Chromium --remote-debugging-port=9000
-> DevTools listening on ws://127.0.0.1:9000/devtools/browser/<DEV_SESSION_GUID>
-```
-
-**Terminal 2**
-
-This runs the example browser automation script on the instantiated browser window.
-
-```bash
-python examples/get_title.py ws://127.0.0.1:9000/devtools/browser/<DEV_SESSION_GUID> https://hyperiongray.com
-```
-
-## Running Examples on Linux
-
-**Terminal 1**
-
-This sets up the chrome browser in a specific version, and runs it in debug mode with Tor proxy for network traffic.
-
-```
-wget https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/678025/chrome-linux.zip
-unzip chrome-linux.zip && rm chrome-linux.zip
-./chrome-linux/chrome --remote-debugging-port=9000
-> DevTools listening on ws://127.0.0.1:9000/devtools/browser/<DEV_SESSION_GUID>
+        # Extract the page title.
+        root_node = await session.execute(dom.get_document())
+        title_node_id = await session.execute(dom.query_selector(root_node.node_id,
+            'title'))
+        html = await session.execute(dom.get_outer_html(title_node_id))
+        print(html)
 ```

-**Terminal 2**
-
-This runs the example browser automation script on the instantiated browser window.
-
-```bash
-python examples/get_title.py ws://127.0.0.1:9000/devtools/browser/<DEV_SESSION_GUID> https://hyperiongray.com
-```
-
-## Changelog
-
-### 0.5.0
-
-* **Backwards Compatibility Break:** Rename `open_cdp_connection()` to `open_cdp()`.
-* Fix `ConnectionClosed` bug.
-
-### 0.4.0
-
-* Add support for passing in a nursery. (Supports usage in Jupyter notebook.)
-
-### 0.3.0
-
-* New APIs for enabling DOM events and Page events.
-
-### 0.2.0
-
-* Restructure event listeners.
-
-### 0.1.0
-
-* Initial version
-
-<a href="https://www.hyperiongray.com/?pk_campaign=github&pk_kwd=trio-cdp"><img alt="define hyperion gray" width="500px" src="https://hyperiongray.s3.amazonaws.com/define-hg.svg"></a>
+This example code is explained [in the documentation](https://trio-cdp.readthedocs.io)
+and more example code can be found in the `examples/` directory of this repository.
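
The README says the library multiplexes commands, responses, and events over a single connection. As a rough illustration of what that involves at the protocol level, here is a minimal sketch using only the standard library. It relies on the Chrome DevTools Protocol convention that a response echoes the `id` of the command it answers, while an event carries a `method` and no `id`. The `Router` class and its `send` and `receive` methods are hypothetical names invented for this sketch; they are not Trio CDP's API, and the real library does this work over a WebSocket with Trio tasks.

```python
import json


class Router:
    """Toy dispatcher that matches responses to commands by their id."""

    def __init__(self):
        self._next_id = 0
        self.pending = {}   # command id -> method name, awaiting a response
        self.results = {}   # command id -> result payload
        self.events = []    # (method, params) pairs, in arrival order

    def send(self, method, params=None):
        """Serialize a command and remember its id until the reply arrives."""
        cmd_id = self._next_id
        self._next_id += 1
        self.pending[cmd_id] = method
        return json.dumps({"id": cmd_id, "method": method, "params": params or {}})

    def receive(self, raw):
        """Route one incoming message: replies carry an id, events do not."""
        msg = json.loads(raw)
        if "id" in msg:
            self.pending.pop(msg["id"])
            self.results[msg["id"]] = msg.get("result", {})
        else:
            self.events.append((msg["method"], msg.get("params", {})))


router = Router()
first = json.loads(router.send("Target.getTargets"))
second = json.loads(router.send("Page.navigate", {"url": "https://example.com"}))

# Replies may arrive in any order, with events interleaved between them.
router.receive(json.dumps({"id": second["id"], "result": {"frameId": "F1"}}))
router.receive(json.dumps({"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}))
router.receive(json.dumps({"id": first["id"], "result": {"targetInfos": []}}))

print(router.results[first["id"]])
print(router.events[0][0])
```

Because each reply is matched by `id` rather than by arrival order, several commands can be in flight at once without their results getting mixed up.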
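
In the example, `page.navigate` is sent inside the `session.wait_for(page.LoadEventFired)` block, so the listener exists before the command can trigger the event. The following sketch shows why that order matters, using a toy synchronous event source rather than a browser. `EventBus`, `listen`, `fire`, and `navigate` are hypothetical names made up for this illustration and are not part of Trio CDP.

```python
class EventBus:
    """Toy event source: delivers each event only to listeners that already exist."""

    def __init__(self):
        self._inboxes = []

    def listen(self):
        """Register a listener and return the list that will collect its events."""
        inbox = []
        self._inboxes.append(inbox)
        return inbox

    def fire(self, event):
        """Deliver an event to every currently registered listener."""
        for inbox in self._inboxes:
            inbox.append(event)


def navigate(bus):
    """Stand-in for a command whose event fires before the caller regains control."""
    bus.fire("LoadEventFired")


# Wrong order: run the command first, then start listening.
bus = EventBus()
navigate(bus)
late_inbox = bus.listen()

# Right order: start listening, then run the command.
bus = EventBus()
early_inbox = bus.listen()
navigate(bus)

print(late_inbox)
print(early_inbox)
```

The late listener never sees the event because it was registered after the event had already been delivered; the early listener catches it no matter how quickly it fires.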