
Commit 3e772e9

docs: gdb support (#1222)
1 parent c6c4684 commit 3e772e9

12 files changed

Lines changed: 443 additions & 2 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ The following table lists the connection properties used with the AWS Advanced P
 | `topology_refresh_ms` | [Driver Parameters](docs/using-the-python-wrapper/UsingThePythonWrapper.md#aws-advanced-python-wrapper-parameters) |
 | `cluster_id` | [Driver Parameters](docs/using-the-python-wrapper/UsingThePythonWrapper.md#aws-advanced-python-wrapper-parameters) |
 | `cluster_instance_host_pattern` | [Driver Parameters](docs/using-the-python-wrapper/UsingThePythonWrapper.md#aws-advanced-python-wrapper-parameters) |
+| `global_cluster_instance_host_patterns` | [Failover v2 Plugin](docs/using-the-python-wrapper/UsingThePythonWrapper.md#aws-advanced-python-wrapper-parameters) |
 | `wrapper_dialect` | [Dialects](docs/using-the-python-wrapper/DatabaseDialects.md), and whether you should include it. |
 | `wrapper_driver_dialect` | [Driver Dialect](./docs/using-the-python-wrapper/DriverDialects.md), and whether you should include it. |
 | `plugins` | [Connection Plugin Manager](docs/using-the-python-wrapper/UsingThePythonWrapper.md#connection-plugin-manager-parameters) |

docs/README.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
 - [Session State](using-the-python-wrapper/SessionState.md)
 - [Database Dialects](using-the-python-wrapper/DatabaseDialects.md)
 - [Driver Dialects](using-the-python-wrapper/DriverDialects.md)
+- [Cluster ID](using-the-python-wrapper/ClusterId.md)
+- [Aurora Global Databases](using-the-python-wrapper/GlobalDatabases.md)
 - [Telemetry](using-the-python-wrapper/Telemetry.md)
 - [Plugins](using-the-python-wrapper/UsingThePythonWrapper.md#plugins)
 - [Failover Plugin](using-the-python-wrapper/using-plugins/UsingTheFailoverPlugin.md)
Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
# Understanding the cluster_id Parameter

## Overview

The `cluster_id` parameter is a critical configuration setting when using the AWS Advanced Python Wrapper to **connect to multiple database clusters within a single application**. This parameter serves as a unique identifier that enables the wrapper to maintain separate caches and state for each distinct database cluster your application connects to.

## What is a Cluster?

Understanding what constitutes a cluster is crucial for correctly setting the `cluster_id` parameter. In the context of the AWS Advanced Python Wrapper, a **cluster** is a logical grouping of database instances that should share the same topology cache and monitoring services.

A cluster consists of one writer instance (primary) and zero or more reader instances (replicas). Together these instances make up the shared topology that the wrapper tracks, and they are the instances the wrapper can reconnect to when it detects a failover.

### Examples of Clusters

- Aurora DB Cluster (one writer + multiple readers)
- RDS Multi-AZ DB Cluster (one writer + two readers)
- Aurora Global Database (when you connect through a global database endpoint, the wrapper treats the whole global database as a single cluster)

> **Rule of thumb:** If the wrapper should track separate topology information and perform independent failover operations, use different `cluster_id` values. If instances share the same topology and failover domain, use the same `cluster_id`.

## Why cluster_id is Important

The AWS Advanced Python Wrapper uses the `cluster_id` as a **key for internal caching mechanisms** to optimize performance and maintain cluster-specific state. Without proper `cluster_id` configuration, your application may experience:

- Cache collisions between different clusters
- Incorrect topology information
- Degraded performance due to cache invalidation

## Why Not Use AWS DB Cluster Identifiers?

The wrapper cannot always derive an AWS DB cluster identifier from the host it is given, because host information can take many forms:

- **IP Address Connections:** `10.0.1.50` ← No cluster info!
- **Custom Domain Names:** `db.mycompany.com` ← Custom domain
- **Custom Endpoints:** `my-custom-endpoint.cluster-custom-abc.us-east-1.rds.amazonaws.com` ← Custom endpoint
- **Proxy Connections:** `my-proxy.proxy-abc.us-east-1.rds.amazonaws.com` ← Proxy, not the actual cluster

All of these could point to the same cluster. Because the wrapper cannot reliably parse cluster information from every connection type, **it is up to the user to explicitly provide the `cluster_id`**.
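
To make the parsing problem concrete, here is a minimal sketch of a hypothetical `guess_cluster_name` helper that tries to read a cluster name out of a host string. The regular expression assumes the usual `<name>.cluster-<suffix>.<region>.rds.amazonaws.com` shape of an Aurora cluster endpoint; it is an illustration only, not the wrapper's own host parsing. None of the four host forms listed above yields a name; only a plain cluster endpoint does.

```python
import ipaddress
import re
from typing import Optional

# Hypothetical pattern for an Aurora cluster (or reader) endpoint, e.g.
# "my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com".
# This is an illustrative assumption, not the wrapper's real parser.
CLUSTER_ENDPOINT = re.compile(
    r"^(?P<name>[a-z0-9-]+)\.cluster-(?:ro-)?[a-z0-9]+\.[a-z0-9-]+\.rds\.amazonaws\.com$"
)


def guess_cluster_name(host: str) -> Optional[str]:
    """Return a cluster name if the host looks like a cluster endpoint, else None."""
    try:
        ipaddress.ip_address(host)
        return None  # an IP address carries no cluster information at all
    except ValueError:
        pass  # not an IP address; try the endpoint pattern instead
    match = CLUSTER_ENDPOINT.match(host.lower())
    return match.group("name") if match else None


hosts = [
    "my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",  # cluster endpoint
    "10.0.1.50",  # IP address
    "db.mycompany.com",  # custom domain name
    "my-custom-endpoint.cluster-custom-abc.us-east-1.rds.amazonaws.com",  # custom endpoint
    "my-proxy.proxy-abc.us-east-1.rds.amazonaws.com",  # proxy
]
for host in hosts:
    print(host, "->", guess_cluster_name(host))
```

The custom endpoint even contains `.cluster-custom-`, yet its first label is an endpoint name rather than the cluster name, which is exactly the kind of ambiguity an explicit `cluster_id` avoids.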

## How cluster_id is Used Internally

The wrapper uses `cluster_id` as a cache key for topology information and monitoring services. This lets multiple connections to the same cluster share cached data and avoid redundant database metadata queries.
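
The sharing behavior can be modeled with a plain dictionary. The sketch below uses a deliberately simplified, hypothetical `TopologyCache` class, not the wrapper's internal implementation: three connections with different host strings but the same `cluster_id` trigger only one topology query, because the host is not part of the cache key.

```python
from typing import Callable, Dict, List


class TopologyCache:
    """Hypothetical, simplified topology cache keyed only by cluster_id."""

    def __init__(self) -> None:
        self._topologies: Dict[str, List[str]] = {}
        self.queries = 0  # number of topology queries sent to the database

    def get_topology(self, cluster_id: str, query: Callable[[], List[str]]) -> List[str]:
        # Query the database only on a cache miss for this cluster_id.
        if cluster_id not in self._topologies:
            self.queries += 1
            self._topologies[cluster_id] = query()
        return self._topologies[cluster_id]


def query_foo_topology() -> List[str]:
    # Stand-in for a real topology query against the "foo" cluster.
    return ["instance-1", "instance-2", "instance-3"]


cache = TopologyCache()
hosts = [
    "my-custom-endpoint.cluster-custom-abc.us-east-1.rds.amazonaws.com",
    "10.0.1.50",
    "my-cluster.cluster-abc.us-east-1.rds.amazonaws.com",
]
topologies = []
for _host in hosts:
    # The host string is deliberately not part of the key; only cluster_id is.
    topologies.append(cache.get_topology("foo", query_foo_topology))

print(cache.queries)  # 1
```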

### Example: Single Cluster with Multiple Connections

The following diagram shows how connections with the same `cluster_id` share cached resources:

![Single Cluster Example](../images/cluster_id_one_cluster_example.png)

**Key Points:**
- Three connections use different connection strings (custom endpoint, IP address, cluster endpoint) but all specify **`cluster_id="foo"`**
- All three connections share the same Topology Cache and Monitor Threads in the wrapper
- The Topology Cache stores a key-value mapping where `"foo"` maps to `["instance-1", "instance-2", "instance-3"]`
- Despite different connection strings, all connections monitor and query the same physical database cluster

**The Impact:**
Shared resources eliminate redundant topology queries and reduce monitoring overhead.

### Example: Multiple Clusters with Separate Cache Isolation

The following diagram shows how different `cluster_id` values maintain separate caches for different clusters.

![Two Cluster Example](../images/cluster_id_two_cluster_example.png)

**Key Points:**
- Connections 1 and 3 use **`cluster_id="foo"`** and share the same cache entries
- Connection 2 uses **`cluster_id="bar"`** and has completely separate cache entries
- Each `cluster_id` acts as a key in the cache dictionary
- Topology Cache maintains separate entries: `"foo"` → `[instance-1, instance-2, instance-3]` and `"bar"` → `[instance-4, instance-5]`
- Monitor Cache maintains separate monitor threads for each cluster
- Monitors poll their respective database clusters and update the corresponding topology cache entries

**The Impact:**
This isolation prevents cache collisions and ensures correct failover behavior for each cluster.
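
The "one monitor per `cluster_id`" behavior in this example can be sketched as a small registry. `MonitorRegistry` and `Monitor` are hypothetical names, and `Monitor` is only a placeholder that does not start a real polling thread. Connections 1 and 3 receive the same monitor for `"foo"`, while connection 2 receives a separate one for `"bar"`.

```python
import threading
from typing import Dict


class Monitor:
    """Hypothetical placeholder for a per-cluster topology monitor (no real thread)."""

    def __init__(self, cluster_id: str) -> None:
        self.cluster_id = cluster_id


class MonitorRegistry:
    """Simplified, hypothetical registry holding at most one monitor per cluster_id."""

    def __init__(self) -> None:
        self._monitors: Dict[str, Monitor] = {}
        self._lock = threading.Lock()  # connections may be opened from several threads

    def get_or_create(self, cluster_id: str) -> Monitor:
        with self._lock:
            monitor = self._monitors.get(cluster_id)
            if monitor is None:
                monitor = Monitor(cluster_id)
                self._monitors[cluster_id] = monitor
            return monitor

    def __len__(self) -> int:
        return len(self._monitors)


registry = MonitorRegistry()
m1 = registry.get_or_create("foo")  # connection 1
m2 = registry.get_or_create("bar")  # connection 2
m3 = registry.get_or_create("foo")  # connection 3
print(len(registry), m1 is m3, m1 is m2)  # 2 True False
```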
75+
76+
## When to Specify cluster_id
77+
78+
### Required: Multiple Clusters in One Application
79+
80+
You **must** specify a unique `cluster_id` for every DB cluster when your application connects to multiple database clusters:
81+
82+
```python
83+
from aws_advanced_python_wrapper import AwsWrapperConnection
84+
from psycopg import Connection
85+
86+
# Source cluster connection
87+
with AwsWrapperConnection.connect(
88+
Connection.connect,
89+
"host=source-db.us-east-1.rds.amazonaws.com dbname=mydb user=admin password=pwd",
90+
cluster_id="source-cluster",
91+
autocommit=True
92+
) as source_conn:
93+
source_cursor = source_conn.cursor()
94+
source_cursor.execute("SELECT * FROM users")
95+
rows = source_cursor.fetchall()
96+
97+
# Destination cluster connection - different cluster_id!
98+
with AwsWrapperConnection.connect(
99+
Connection.connect,
100+
"host=dest-db.us-west-2.rds.amazonaws.com dbname=mydb user=admin password=pwd",
101+
cluster_id="destination-cluster",
102+
autocommit=True
103+
) as dest_conn:
104+
dest_cursor = dest_conn.cursor()
105+
# ... migration logic
106+
107+
# Connecting to source-db via IP - same cluster_id as source_conn
108+
with AwsWrapperConnection.connect(
109+
Connection.connect,
110+
"host=10.0.0.1 dbname=mydb user=admin password=pwd",
111+
cluster_id="source-cluster",
112+
autocommit=True
113+
) as source_ip_conn:
114+
pass
115+
```

### Optional: Single Cluster Applications

If your application only connects to one cluster, you can omit `cluster_id`; it defaults to `"1"`:

```python
from aws_advanced_python_wrapper import AwsWrapperConnection
from psycopg import Connection

# Single cluster - cluster_id defaults to "1"
with AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=mydb user=admin password=pwd",
    autocommit=True
) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
```

The same applies when several connections reach that one cluster through different host information:

```python
from aws_advanced_python_wrapper import AwsWrapperConnection
from psycopg import Connection

# cluster_id defaults to "1"
with AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=mydb user=admin password=pwd",
    autocommit=True
) as url_conn:
    pass

# "10.0.0.1" -> IP address of my-cluster. Same cluster, so default cluster_id "1" is fine.
with AwsWrapperConnection.connect(
    Connection.connect,
    "host=10.0.0.1 dbname=mydb user=admin password=pwd",
    autocommit=True
) as ip_conn:
    pass
```

## Critical Warnings

### NEVER Share cluster_id Between Different Clusters

Using the same `cluster_id` for different database clusters will cause serious issues:

```python
# ❌ WRONG - Same cluster_id for different clusters
source_conn = AwsWrapperConnection.connect(
    Connection.connect,
    "host=source-db.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="shared-id"  # ← BAD!
)

dest_conn = AwsWrapperConnection.connect(
    Connection.connect,
    "host=dest-db.us-west-2.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="shared-id"  # ← BAD! Same ID for different cluster
)
```

**Problems this causes:**
- Topology cache collision (dest-db's topology could overwrite source-db's)
- Incorrect failover behavior (the wrapper may try to fail over to the wrong cluster)
- Monitor conflicts (a single monitor instance serves both clusters, which leads to undefined results)
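
The topology collision can be reproduced with a simplified, hypothetical cache in which each refresh replaces the entry for its key. When both clusters refresh under `"shared-id"`, the last refresh wins, so a failover on the source connection would pick candidates from the destination cluster. The instance names are made up for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified cache: each refresh replaces the entry for its key.
topology_cache: Dict[str, List[str]] = {}


def refresh_topology(cluster_id: str, instances: List[str]) -> None:
    topology_cache[cluster_id] = instances


# Both clusters report their topology under the same (wrong) cluster_id.
refresh_topology("shared-id", ["source-instance-1", "source-instance-2"])
refresh_topology("shared-id", ["dest-instance-1", "dest-instance-2"])

# Failover candidates for the *source* connection now come from dest-db.
source_candidates = topology_cache["shared-id"]
print(source_candidates)  # ['dest-instance-1', 'dest-instance-2']

# With unique cluster_id values, each cluster keeps its own entry.
refresh_topology("source-cluster", ["source-instance-1", "source-instance-2"])
refresh_topology("destination-cluster", ["dest-instance-1", "dest-instance-2"])
print(topology_cache["source-cluster"])  # ['source-instance-1', 'source-instance-2']
```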

**Correct approach:**
```python
# ✅ CORRECT - Unique cluster_id for each cluster
source_conn = AwsWrapperConnection.connect(
    Connection.connect,
    "host=source-db.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="source-cluster"
)

dest_conn = AwsWrapperConnection.connect(
    Connection.connect,
    "host=dest-db.us-west-2.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="destination-cluster"
)
```

### Always Use Same cluster_id for Same Cluster

Using different `cluster_id` values for the same cluster reduces efficiency:

```python
# ⚠️ SUBOPTIMAL - Different cluster_ids for same cluster
conn1 = AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="my-cluster-1"
)

conn2 = AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id="my-cluster-2"  # Different ID for same cluster
)
```

**Problems this causes:**
- Duplicate caches for the same cluster
- Multiple monitoring threads for the same cluster

**Best practice:**
```python
# ✅ BEST - Same cluster_id for same cluster
CLUSTER_ID = "my-cluster"

conn1 = AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id=CLUSTER_ID
)

conn2 = AwsWrapperConnection.connect(
    Connection.connect,
    "host=my-cluster.us-east-1.rds.amazonaws.com dbname=db user=admin password=pwd",
    cluster_id=CLUSTER_ID  # Shared cache and resources
)
```
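
Both warnings can be checked before any connection is opened. The sketch below assumes the application keeps a list of `(physical_cluster, cluster_id)` pairs for its connections; `check_cluster_ids` is a hypothetical helper, and the physical cluster labels are whatever names your application already uses, not values reported by the wrapper.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def check_cluster_ids(assignments: List[Tuple[str, str]]) -> List[str]:
    """Flag cluster_id misuse in (physical_cluster, cluster_id) pairs."""
    ids_by_cluster: DefaultDict[str, Set[str]] = defaultdict(set)
    clusters_by_id: DefaultDict[str, Set[str]] = defaultdict(set)
    for cluster, cluster_id in assignments:
        ids_by_cluster[cluster].add(cluster_id)
        clusters_by_id[cluster_id].add(cluster)

    problems: List[str] = []
    # One cluster_id covering several clusters: cache and monitor collisions.
    for cluster_id, clusters in sorted(clusters_by_id.items()):
        if len(clusters) > 1:
            problems.append(f"ERROR: cluster_id {cluster_id!r} is shared by {sorted(clusters)}")
    # One cluster under several cluster_ids: duplicated caches and monitors.
    for cluster, ids in sorted(ids_by_cluster.items()):
        if len(ids) > 1:
            problems.append(f"WARNING: cluster {cluster!r} uses several cluster_ids {sorted(ids)}")
    return problems


config = [
    ("source-db", "shared-id"),
    ("dest-db", "shared-id"),
    ("my-cluster", "my-cluster-1"),
    ("my-cluster", "my-cluster-2"),
]
for problem in check_cluster_ids(config):
    print(problem)
```

A shared ID is reported as an error because it breaks failover, while split IDs are only a warning because they cost efficiency rather than correctness.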

## Summary

The `cluster_id` parameter is essential for applications connecting to multiple database clusters. It serves as a cache key for topology information and monitoring services. Always use unique `cluster_id` values for different clusters, and consistent values for the same cluster, to maximize performance and avoid conflicts.

docs/using-the-python-wrapper/DatabaseDialects.md

Lines changed: 2 additions & 0 deletions
@@ -23,10 +23,12 @@ Dialect codes specify what kind of database any connections will be made to.
 | Dialect Code Reference | Value | Database |
 |------------------------------|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
 | `AURORA_MYSQL` | `aurora-mysql` | Aurora MySQL |
+| `GLOBAL_AURORA_MYSQL` | `global-aurora-mysql` | [Aurora Global Database MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-getting-started.html) |
 | `RDS_MULTI_AZ_MYSQL_CLUSTER` | `rds-multi-az-mysql-cluster` | [Amazon RDS MySQL Multi-AZ DB Cluster Deployments](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts.html) |
 | `RDS_MYSQL` | `rds-mysql` | Amazon RDS MySQL |
 | `MYSQL` | `mysql` | MySQL |
 | `AURORA_PG` | `aurora-pg` | Aurora PostgreSQL |
+| `GLOBAL_AURORA_PG` | `global-aurora-pg` | [Aurora Global Database PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-getting-started.html) |
 | `RDS_MULTI_AZ_PG_CLUSTER` | `rds-multi-az-pg-cluster` | [Amazon RDS PostgreSQL Multi-AZ DB Cluster Deployments](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts.html) |
 | `RDS_PG` | `rds-pg` | Amazon RDS PostgreSQL |
 | `PG` | `pg` | PostgreSQL |

0 commit comments

Comments
 (0)