feat: Add feature to enable dynamic ec2 config via workflow labels #5003
edersonbrilhante wants to merge 36 commits into github-aws-runners:main
|
@edersonbrilhante great to see this PR. |
|
This is a really interesting feature! Just one suggestion from my side: would it be possible to support a whitelist of allowed instance types? It could also be really powerful to have some kind of feature-flag / policy control over which parts of the configuration are allowed to be dynamic. For example, in my org we don't want developers to be able to select arbitrary AMIs (only a pre-approved set), but it would be awesome to still let them choose the instance type for workflow jobs, as long as it's constrained to an allowed list. Maybe the "feature flag" is not even necessary if we can define the "allowed values" for each configuration; with that we could list only the pre-approved AMIs.
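A minimal sketch of the "allowed values per configuration" idea, not code from this PR. The policy shape, the `applyPolicy` name, and the AMI ID are hypothetical, and the sketch assumes dynamic labels have the form `ghr-<key>:<value>`. Note that it drops any `ghr-` label whose key has no policy entry, which would also drop free-form `ghr-` labels.

```typescript
// Hypothetical policy: one allowlist per dynamic config key.
// A ghr- key without an entry may not be set dynamically at all.
type DynamicLabelPolicy = ReadonlyMap<string, ReadonlySet<string>>;

const examplePolicy: DynamicLabelPolicy = new Map<string, ReadonlySet<string>>([
  ['ghr-ec2-instance-type', new Set(['m5.large', 'm5.xlarge'])],
  ['ghr-ec2-image-id', new Set(['ami-0123456789abcdef0'])], // placeholder AMI ID
]);

// Regular labels pass through untouched. A ghr- label is kept only when
// its key is in the policy and its value is on that key's allowlist.
function applyPolicy(labels: string[], policy: DynamicLabelPolicy): string[] {
  return labels.filter((label) => {
    if (!label.startsWith('ghr-')) return true;
    const separator = label.indexOf(':');
    if (separator === -1) return false; // no value to check against the policy
    const key = label.slice(0, separator);
    const value = label.slice(separator + 1);
    return policy.get(key)?.has(value) ?? false;
  });
}
```

With this policy, `ghr-ec2-instance-type:m5.xlarge` is kept, while an unlisted instance type or AMI is dropped before it reaches the scale-up logic.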
|
@andrecastro I like it, and it makes a lot of sense to me. I just need more time to think about the implementation, and this PR is already big enough XD. I could create a follow-up PR to add this restricted-values feature. |
stuartp44
left a comment
I am happy to approve, but I do have a statement about incorrect labels and the effect on the process.
}),
});
});
|
|
I think these tests work with all good values, but because we are in the user space, what about bad values and their effects? Is it maybe worth extending the tests to not trust the user data? I am not sure how the behaviour will be if someone makes a mistake, does it take the whole batch/process out?
For example, m5.large could be mistyped as m5,large. How would that change the behaviour?
The payload is sent to AWS, and AWS validates it. Example:
- self-hosted
- x64
- type:small
- ghr-ec2-instance-type:m5,xlarge
{
  "level": "WARN",
  "message": "Create fleet failed, error not recognized as scaling error.",
  "timestamp": "2026-03-09T17:24:59.008Z",
  "service": "test-small-scale-up",
  "sampling_rate": 0,
  "xray_trace_id": "REDACTED_XRAY_TRACE_ID",
  "region": "eu-west-1",
  "environment": "test-small",
  "module": "runners",
  "aws-request-id": "REDACTED_AWS_REQUEST_ID",
  "function-name": "test-small-scale-up",
  "runner": {
    "ephemeral": true,
    "type": "Org",
    "namePrefix": "",
    "n_events": 1
  },
  "data": [
    {
      "LaunchTemplateAndOverrides": {
        "LaunchTemplateSpecification": {
          "LaunchTemplateId": "REDACTED_LAUNCH_TEMPLATE_ID",
          "Version": "22"
        },
        "Overrides": {
          "InstanceType": "m5,xlarge",
          "SubnetId": "REDACTED_SUBNET_ID_1"
        }
      },
      "Lifecycle": "on-demand",
      "ErrorCode": "InvalidFleetConfiguration",
      "ErrorMessage": "Your requested instance type (m5,xlarge) is not supported in your requested Availability Zone (eu-west-1b)."
    },
    {
      "LaunchTemplateAndOverrides": {
        "LaunchTemplateSpecification": {
          "LaunchTemplateId": "REDACTED_LAUNCH_TEMPLATE_ID",
          "Version": "22"
        },
        "Overrides": {
          "InstanceType": "m5,xlarge",
          "SubnetId": "REDACTED_SUBNET_ID_2"
        }
      },
      "Lifecycle": "on-demand",
      "ErrorCode": "InvalidFleetConfiguration",
      "ErrorMessage": "Your requested instance type (m5,xlarge) is not supported in your requested Availability Zone (eu-west-1a)."
    }
  ]
}
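The log above shows the malformed value travelling all the way to the create-fleet call before AWS rejects it. As a rough sketch only, a cheap shape check could catch obvious typos earlier. The pattern and the `looksLikeInstanceType` name are hypothetical and not part of this PR; the pattern assumes an instance type is a family and a size joined by a single dot, and it does not replace validation by AWS.

```typescript
// Assumed shape: "<family>.<size>", for example m5.xlarge or u-6tb1.metal.
const INSTANCE_TYPE_SHAPE = /^[a-z][a-z0-9-]*\.[a-z0-9-]+$/;

// Returns true when the value has the assumed family.size shape.
// It does not check that the type exists or is offered in a given zone.
function looksLikeInstanceType(value: string): boolean {
  return INSTANCE_TYPE_SHAPE.test(value);
}

console.log(looksLikeInstanceType('m5.xlarge')); // true
console.log(looksLikeInstanceType('m5,xlarge')); // false
```

A value that fails the check could be logged and ignored, so the job falls back to the runner's default instance types instead of failing every fleet request.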
|
I also agree with what was previously mentioned; we probably need a safelist, as we don't want lateral movement when a compromised pipeline is used, especially with the VPC setting. Maybe worth some "allowed_instance_type" setting or something to that effect that can be checked against, and if not in the list, ignored. |
|
@stuartp44 We can add the safelist, but I think we need a deeper discussion about the implementation |
|
@edersonbrilhante I tried to test the feature, but so far I have not got it working. Used labels: |
|
Can you print the logs from dispatch-to-runner? Check if the labels were accepted or not. |
npalm
left a comment
Great feature. Maybe we can add a small safety net in the dispatcher and a clear warning in the docs. The description of the variable that enables the feature should also refer to the risk explained in the docs.
): boolean {
  // Filter out ghr- and ghr-run- labels only if dynamic labels config is enabled
  const filteredLabels = enableDynamicLabels
    ? workflowJobLabels.filter((label) => !label.startsWith('ghr-'))
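The diff excerpt is cut off, so here is a self-contained sketch of the idea it shows: `ghr-` labels are left out of runner matching when dynamic labels are enabled. The `labelsMatch` name, the fallback branch, and the case-insensitive match-all rule are assumptions for illustration, not the project's actual matcher.

```typescript
// Hypothetical stand-in for the matcher in the diff: a job is accepted when
// every remaining label it requests is offered by the runner configuration.
function labelsMatch(
  workflowJobLabels: string[],
  runnerLabels: string[],
  enableDynamicLabels: boolean,
): boolean {
  // With dynamic labels enabled, ghr- labels carry configuration and are
  // ignored for matching. Otherwise every label must match (assumed fallback).
  const filteredLabels = enableDynamicLabels
    ? workflowJobLabels.filter((label) => !label.startsWith('ghr-'))
    : workflowJobLabels;
  const offered = new Set(runnerLabels.map((label) => label.toLowerCase()));
  return filteredLabels.every((label) => offered.has(label.toLowerCase()));
}
```

Under these assumptions, a job that requests `self-hosted`, `x64` and `ghr-ec2-instance-type:m5.xlarge` matches a runner offering `self-hosted` and `x64` only when the feature is enabled.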
This is a new convention, and I would prefer we add a comment to the variables stating that labels starting with ghr- will be ignored (main and multi-runner).
I added a comment in the tf variables.
I think we can add at least a small safety net here, like:
function sanitizeGhrLabels(labels: string[]): string[] {
  const GHR_LABEL_MAX_LENGTH = 128;
  const GHR_LABEL_VALUE_PATTERN = /^[a-zA-Z0-9._\-:\/]+$/;
  return labels.map((label) => {
    if (!label.startsWith('ghr-')) return label;
    if (label.length > GHR_LABEL_MAX_LENGTH) {
      logger.warn('Dynamic label exceeds max length, stripping', { label: label.substring(0, 40) });
      return null;
    }
    if (!GHR_LABEL_VALUE_PATTERN.test(label)) {
      logger.warn('Dynamic label contains invalid characters, stripping', { label });
      return null;
    }
    return label;
  }).filter((l): l is string => l !== null);
}
Then something like:
const sanitizedLabels = enableDynamicLabels
  ? sanitizeGhrLabels(body.workflow_job.labels)
  : body.workflow_job.labels;
Good point. I added it.
Co-authored-by: Niek Palm <npalm@users.noreply.github.com>
Summary
This PR resumes and completes the work started in #4529.
It also allows any other dynamic labels with the prefix ghr-, giving support for unique labels per job or per group of jobs.
It ensures that EC2-specific config can be defined via runs-on.
How to test:
Use your regular labels, and add ghr-ec2-instance-type and ghr-ec2-image-id
In this case:
<regular-labels>
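A rough sketch of how the two labels named above could be turned into EC2 overrides. The `Ec2Overrides` shape and the `parseEc2Labels` name are hypothetical, since the PR's real types are not shown in this thread, and the `key:value` form is assumed from the `ghr-ec2-instance-type:m5,xlarge` example earlier in the conversation.

```typescript
// Hypothetical result shape for illustration only.
interface Ec2Overrides {
  instanceType?: string;
  imageId?: string;
}

// Reads ghr-ec2-instance-type:<value> and ghr-ec2-image-id:<value> labels.
// All other labels, including regular ones such as type:small, are ignored.
function parseEc2Labels(labels: string[]): Ec2Overrides {
  const overrides: Ec2Overrides = {};
  for (const label of labels) {
    const separator = label.indexOf(':');
    if (separator === -1) continue; // no value part, so nothing to read
    const key = label.slice(0, separator);
    const value = label.slice(separator + 1);
    if (value === '') continue; // skip labels with an empty value
    if (key === 'ghr-ec2-instance-type') overrides.instanceType = value;
    else if (key === 'ghr-ec2-image-id') overrides.imageId = value;
  }
  return overrides;
}
```

If a label is repeated, the last occurrence wins in this sketch; whether the PR behaves the same way is not stated here.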