New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test/utils/image: Support a single repository #93510
test/utils/image: Support a single repository #93510
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
18194c4
to
1fb7931
Compare
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor for adding this feature! small question on hashing part 😄
// getRepositoryMappedConfig maps an existing image to the provided repo, generating a | ||
// tag that is unique with the input config. The tag will contain the index, a hash of | ||
// the image spec (to be unique) and shorten and make the pull spec "safe" so it will | ||
// fit in the tag to allow a human to recognize the value. If index is -1, then no | ||
// index will be added to the tag. | ||
func getRepositoryMappedConfig(index int, config Config, repo string) Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand this use case -- is it because image tags are mutable and we want to make sure it's consistent through hash?
alternatively, I'm not sure if I follow the what scenario is implied with "safe" in the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since tags have character limits that are shorter than the limit of a full image reference, the hash is to dedup similar images with very long names from one registry. I.e. something like:
docker.io/my-very-long-repo-name/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long:lots_of_characters_in_tags
docker.io/my-very-long-repo-name/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long:lots_of_characters_in_tags_2
needs to be able to fit into the character limit of an image tag, but we want to be able to guarantee that if you had two of these you could collapse them into a tag in the same repo (shorten the pull_spec, add the hash that uniquifies the rest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I missed this part of PR description:
k8s.gcr.io/prometheus-to-sd:v0.5.0
toquay.io/openshift/community-e2e-images:e2e-30-k8s-gcr-io-prometheus-to-sd-v0-5-0-6JI59Yih4oaj3oQOjRfhyQ
I just realized that all images are defined with the same name (i.e. community-e2e-images
) but with different tags -- that part doesn't seem very intuitive to me. Am I missing something from this proposed approach vs having each image be its own entry in the repository? i.e. quay.io/openshift-k8s-e2e/prometheus-to-sd:v0.5.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly simplification - you may not own 50 different repositories (like on docker.io you couldn't mirror the same way). So then if you have a choice between one repo vs multiple - having one repo is more flexibility if you have quota or cost limits (i.e. on docker.io can't have 30 private repos without paying). So really trying to simplify it all down as much as possible to be most flexible for both people who have cloud provider registries, or docker.io accounts, or quay accounts, or maybe even on prem where you are only allowed to touch one repo in your artifactory server.
Also, think about if you're testing this. If you want to sync for 1.20, and for 1.21, you might actually want to test it first. To test it, you want a way to catch "oops". If you're mirroring into individual repos, then your oops factor could be much higher. If each mirror is one repo (and the code is designed to ensure the same image ends up in the same spot) then you can share AND separate. Once you've tested you can delete that repo - that's easier to do than 50 different repos. A repo only having one type of image is just a convention, it's not a hard rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for elaborating. The scenario of mirroring to docker.io isn't something that I considered nor has a use case for. My perspective on supporting this repository override is to make it easy to run conformance on airgapped clusters, which typically is locked to self-hosted container registries, making the cost to host individual image repository somewhat negligible (vs SaaS registry).
I do see value in minimizing "oops" factor though -- I agree it'd be a good idea to have such system!
@smarterclayton @wilsonehusin do we want to pick this back up and get to a consensus? |
@dims yes I'd love to! I'm still trying to understand the use case for the hashing part, awaiting @smarterclayton Generally I'm okay with it, just that I'm wondering if it's a solution that is still relevant today or not |
To really fill this out, I think a second part would be to add a simple stub command that wrote out FROM TO statements that could be used with image tooling (like docker, skopeo, etc etc etc). I would also want to write up docs for "how you run e2e tests from a private mirror without access to public internet". |
In downstream contexts, it's extremely useful to be able to combine all the "testable" images in Kubernetes into a single repo so that a user could mirror these offline in one chunk, and audit the set of images for changes. For instance, within OpenShift we would like to have a single place we can place all the images used by all the tests with a single authentication scheme. While some images are not "real" and can't be mirrored (for instance, the images that point to an auth protected registry), that is not the majority. This code makes it possible to specify an environment variable KUBE_TEST_REPO that maps the static strings of the registry to a single repository by placing the uniqueness in a tag. For instance: KUBE_TEST_REPO=quay.io/openshift/community-e2e-images would translate `k8s.gcr.io/prometheus-to-sd:v0.5.0` to `quay.io/openshift/community-e2e-images:e2e-30-k8s-gcr-io-prometheus-to-sd-v0-5-0-6JI59Yih4oaj3oQOjRfhyQ`. The tag is a safe form of the name, plus the index (the constant within manifest.go), plus a hash of the full input. The length of the tag is constrained to the minimum of hash + index + the safe name. The public method is changed to return two maps - index to original name and index to test repo name. These maps would be the same if the env var is not set.
1fb7931
to
386f94f
Compare
I can do the stub command in this PR or separate - up to you two (if the argument I make about single repo being compelling, obviously) |
The FROM TO statements seem nice to have, but this seem to be useful by itself. I'm leaning towards having this merged in and iterate on the FROM TO statements afterwards, but I don't feel strongly on either approach. /lgtm |
/kind bug |
@smarterclayton: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Review the full test history for this PR. Silence the bot with an |
In downstream contexts, it's extremely useful to be able to combine all the "testable" images in Kubernetes into a single repo so that a user could mirror these offline in one chunk, and audit the set of images for changes. For instance, within OpenShift we would like to have a single place we can place all the images used by all the tests with a single authentication scheme. While some images are not "real" and can't be mirrored (for instance, the images that point to an auth protected registry), that is not the majority.
This code makes it possible to specify an environment variable KUBE_TEST_REPO that maps the static strings of the registry to a single repository by placing the uniqueness in a tag. For instance:
would translate
k8s.gcr.io/prometheus-to-sd:v0.5.0
toquay.io/openshift/community-e2e-images:e2e-30-k8s-gcr-io-prometheus-to-sd-v0-5-0-6JI59Yih4oaj3oQOjRfhyQ
. When running the e2e tests with this variable set all code that uses the images will pull from the later location.The tag is a safe form of the name, plus the index (the constant within manifest.go), plus a hash of the full input. The length of the tag is constrained to the minimum of hash + index + the safe name. Since we will continue to add new images, there should be no assumption that the tag creation cannot evolve over time - someone downstream will need to use a process to map the source images to destination images no matter what, and that is a mechanical transformation that should be independent of the tag value. Old tags should definitely be preserved for older versions of code, but there is no hard requirement that this have stability across code versions.
The public method is changed to return two maps - index to original name and index to test repo name. These maps would be the same if the env var is not set. A consumer that wished to build that mapping (not implemented here, but fairly easy) is to read the map and output the transformation in a form that could be scripted by users using image mirroring command line tools.
Follow up