fix azure file migration issue #97877
Conversation
/retest
2 similar comments
/retest
/retest
This seems like pretty powerful permissions for a per-node entity to have. Generally, these kinds of operations are delegated to admins or controllers that can be restricted to more secure nodes. Are those options not possible here? @kubernetes/sig-storage-pr-reviews
Hey Andy, I want to understand how Azure File uses secrets. From the https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/deploy/v0.10.0/rbac-csi-azurefile-node.yaml driver deployment, it seems the node only has secret get/list permission. Is secret create necessary only in the CSI migration scenario?
For the Azure File CSI driver, only the controller has secret create permission. @msau42 let's use this way to fix the secret authorizer issue in Azure File CSI migration, is that ok?
Thanks for the clarification! I am wondering whether the secret get/list permission is also optional if CSI migration is not enabled? I thought the secret is used only to get cloud-config, but apparently you can also use a JSON file instead of a secret?
Does that mean the node driver needs permissions to be able to create secrets for migration? Considering that CSI migration is needed for all existing clusters/workloads, I want to make sure we're not requiring existing users to relax their security posture.
@Jiawei0227 there are two requirements for secret permission:
@msau42 No, the node daemonset only has secret get/list permission.
Ah ok, I'm misunderstanding this. Can you explain in more detail when and where the secrets are created?
@msau42 In the CreateVolume func, the CSI driver controller creates a file share first and then stores the account key in a secret; later on, the CSI driver node gets that secret to retrieve the account key and then mounts the file share. Actually this is CSI driver behavior, not related to CSI migration; this PR did not use
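The flow described above (the controller creates the file share and stores the account key in a secret; the node later reads that secret back to mount the share) can be sketched with a toy in-memory store. The `secretNameTemplate` value matches the naming scheme visible elsewhere in this thread, but the function names and the map standing in for the Kubernetes Secrets API are illustrative only, not the driver's actual code:

```go
package main

import "fmt"

// Naming scheme seen in this thread's csi-translation-lib excerpt; the map
// below is a stand-in for the Kubernetes Secrets API, purely for illustration.
const secretNameTemplate = "azure-storage-account-%s-secret"

type secretStore map[string]string

// controllerCreateVolume simulates the controller side: create the file share
// (elided) and persist the account key under a well-known secret name.
func controllerCreateVolume(store secretStore, storageAccount, accountKey string) string {
	name := fmt.Sprintf(secretNameTemplate, storageAccount)
	store[name] = accountKey
	return name
}

// nodeStageVolume simulates the node side: look the account key back up by
// the same derived name before mounting the share.
func nodeStageVolume(store secretStore, storageAccount string) (string, bool) {
	key, ok := store[fmt.Sprintf(secretNameTemplate, storageAccount)]
	return key, ok
}

func main() {
	store := secretStore{}
	name := controllerCreateVolume(store, "acct123", "base64key==")
	key, ok := nodeStageVolume(store, "acct123")
	fmt.Println(name, key, ok)
}
```

Because both sides derive the same name from the storage account, no secret reference has to be passed from controller to node out of band, which is why the node ends up needing read access to these secrets.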
	ReadOnly: azureSource.ReadOnly,
	VolumeAttributes: map[string]string{
		azureFileShareName: azureSource.ShareName,
		secretNameField:    azureSource.SecretName,
This means that the CSI driver on the node needs permissions to read secrets (and it can read any secrets)?
For better security posture, we have the CSI NodeStageSecretRefs so that kubelet and node authorizer can control access to secrets for node entities. So I think it's still better that we figure out how to get node authorizer to support migration.
cc @kubernetes/sig-auth-pr-reviews
The Azure File CSI driver already supports this (it has read-secrets permission already), so shall we merge this PR first?
This PR just does the in-tree PV translation. I think we could go with this PR first, and then use #96508 to track the NodeStageSecretRef permission issue.
I don't think it's good security practice to give the CSI node driver secret read permission, and from a CSI migration perspective it is a regression of the security requirements on the node (the in-tree plugin didn't need broad secret read permissions).
So I think we should work on fixing the CSI driver first and doing the right thing for CSI migration.
Secret read permission is not required by the Azure File CSI driver:
if it cannot read the secret, the node should have an Azure identity to access the storage account key through the Azure API, and then it would leverage that Azure identity to read the account key directly.
This PR just does the in-tree PV translation; it's up to the CSI driver config whether it has secret read permission on the node or not.
I'm actually wondering about Azure API security vs. the K8s node authorizer. The K8s node authorizer is able to restrict secret access to only pods scheduled to that node. Would the Azure API be able to similarly restrict access if the node got compromised?
I suspect the issue with the node authorizer is that it's looking for a real PV object that has the secret reference specified in it. Can we set AzureFilePersistentVolumeSource.Secret when we create a volume for CSI migration?
Yes, we could revoke the Azure identity from the VM if that node got compromised.
There is already CSI --> in-tree PV translation in CSI migration, but it does not work:
kubernetes/staging/src/k8s.io/csi-translation-lib/plugins/azure_file.go
Lines 180 to 181 in a098b5b

	azureSource.ShareName = fileShareName
	azureSource.SecretName = fmt.Sprintf(secretNameTemplate, storageAccount)
I would suggest merging this fix first, and then checking whether there is a better way to leverage NodeStageSecretRef; that issue was opened more than 2 months ago, and we need to make it work first, thanks.
The azure volume plugins and CSI translation should use/populate the secret name/namespace fields in the API objects for anything they want the node to be authorized to fetch.
For pods:
kubernetes/pkg/api/v1/pod/util.go
Line 133 in d52fe32

	if len(source.AzureFile.SecretName) > 0 && !visitor(source.AzureFile.SecretName) {

kubernetes/pkg/api/v1/pod/util.go
Line 177 in d52fe32

	if source.CSI.NodePublishSecretRef != nil && !visitor(source.CSI.NodePublishSecretRef.Name) {

For persistent volumes:

	if len(source.AzureFile.SecretName) > 0 && !visitor(*source.AzureFile.SecretNamespace, source.AzureFile.SecretName, true /* kubeletVisible */) {
	if len(source.AzureFile.SecretName) > 0 && !visitor(getClaimRefNamespace(pv), source.AzureFile.SecretName, true /* kubeletVisible */) {
	if !visitor(source.CSI.NodePublishSecretRef.Namespace, source.CSI.NodePublishSecretRef.Name, true /* kubeletVisible */) {
	if !visitor(source.CSI.NodeStageSecretRef.Namespace, source.CSI.NodeStageSecretRef.Name, true /* kubeletVisible */) {

Using unstructured volumeAttributes to plumb secret name/namespace references is not supported by the node authorizer.
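The typed-field visitor pattern referenced above can be illustrated with a minimal sketch. The struct and function names here are simplified stand-ins for the real pod/PV types, showing why only secret names surfaced through typed API fields (and never names buried in a volumeAttributes map) are visible to the authorizer:

```go
package main

import "fmt"

// Hypothetical, pared-down stand-ins for the PV sources discussed above;
// only the fields the visitor pattern touches are modeled.
type azureFileSource struct {
	SecretName      string
	SecretNamespace string
}

type pv struct {
	AzureFile        *azureFileSource
	VolumeAttributes map[string]string // opaque to the visitor, by design
}

// visitPVSecretNames mimics the shape of the pod/PV secret visitors linked
// above: it walks only typed fields, so anything tunneled through
// VolumeAttributes never reaches the visitor (and thus the authorizer).
func visitPVSecretNames(p *pv, visitor func(ns, name string) bool) bool {
	if p.AzureFile != nil && len(p.AzureFile.SecretName) > 0 {
		if !visitor(p.AzureFile.SecretNamespace, p.AzureFile.SecretName) {
			return false
		}
	}
	return true
}

func main() {
	var seen []string
	p := &pv{
		AzureFile:        &azureFileSource{SecretName: "s", SecretNamespace: "ns"},
		VolumeAttributes: map[string]string{"secretName": "hidden"},
	}
	visitPVSecretNames(p, func(ns, name string) bool {
		seen = append(seen, ns+"/"+name)
		return true
	})
	fmt.Println(seen)
}
```

Here the "hidden" entry in VolumeAttributes is never visited, which mirrors why the node authorizer cannot grant access to secrets referenced that way.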
hi @liggitt, that's what we did in the beginning (without this PR): secretName and secretNamespace are already set, but the node authorizer does not work only in the CSI migration scenario (detailed issue: #96508). This PR does a workaround, and it works in CSI migration.
$ k get pv pvc-01bd1d0b-b965-4d37-8331-817a296c8998 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/azure-file-resource-group: andy-1191
    pv.kubernetes.io/provisioned-by: file.csi.azure.com
spec:
  accessModes:
  - ReadWriteMany
  azureFile:
    secretName: azure-storage-account-f97cecd00f4d4433ebd6809-secret
    secretNamespace: null
    shareName: pvc-01bd1d0b-b965-4d37-8331-817a296c8998
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-azurefile
    namespace: test
  persistentVolumeReclaimPolicy: Delete
  storageClassName: azurefile-456-kubernetes.io-azure-file-dynamic-sc-4hv88
  volumeMode: Filesystem
status:
  phase: Bound
The node authorizer only permits access to secrets in the pod's namespace based on volume definitions in the pod. That is the case for both azureFile and csi volume definitions in a pod.
It looks like the azurefile volume plugin looks in the default namespace for secrets referenced in a pod's azureFile volume. That means secret access was not permitted by the node authorizer even in non-CSI-migration cases.
If non-CSI migration cases were working for pods outside the default namespace, something else must have been granting access to the node. The fix for the CSI-migration case would be to grant access the same way, not to stop using the API fields intended to communicate the name/namespace coordinates of secrets to use.
/retest
	SecretName: "azure-storage-account-st-secret",
	ShareName:  "pvc-file-dynamic",
	ReadOnly:   true,
	SecretName: "azure-storage-account-st-secret",
Thanks, could you also add unit tests where the share name gets parsed from the VolumeHandle?
@@ -163,31 +163,48 @@ func (t *azureFileCSITranslator) TranslateCSIPVToInTree(pv *v1.PersistentVolume)
		ReadOnly: csiSource.ReadOnly,
	}

	for k, v := range csiSource.VolumeAttributes {
		switch strings.ToLower(k) {
		case shareNameField:
Similar question for shareName: will it always be identifiable from the volumeHandle?
It would be simpler to eliminate duplicate fields; then we don't need to think about cases where the two fields have different values.
There are also cases where volumeHandle is an arbitrary string; users can input shareName via storage class or PV parameters, and we support that too. If there is no such input, it will be parsed from volumeHandle.
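As a rough illustration of the fallback described above (parse the share name out of the volumeHandle when no explicit shareName attribute is given), here is a sketch. The `#`-separated resourceGroup#storageAccount#shareName#... layout is an assumption for this example, not something confirmed in this thread, and the function name is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// shareNameFromVolumeHandle is illustrative only: it assumes a
// '#'-separated handle layout of resourceGroup#storageAccount#shareName#...,
// which is an assumption here rather than the driver's documented contract.
func shareNameFromVolumeHandle(handle string) (string, error) {
	parts := strings.Split(handle, "#")
	if len(parts) < 3 || parts[2] == "" {
		return "", fmt.Errorf("could not parse share name from volume handle %q", handle)
	}
	return parts[2], nil
}

func main() {
	share, err := shareNameFromVolumeHandle("rg#account#pvc-file-dynamic#uuid")
	fmt.Println(share, err)
}
```

The error branch is what makes the "arbitrary string" case from the comment above matter: with a handle that does not follow the layout, the share name must come from storage class or PV parameters instead.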
Can we consolidate on the volumeHandle format? For example, if the shareName is a storageclass parameter, then the provisioned PV object will have the shareName in the volumeHandle and not the attributes. And for a preprovisioned PV case, the share name has to be provided in the volumeHandle?
/retest
1 similar comment
/retest
/lgtm
A follow-up item would be to deprecate usage of the duplicate fields in the CSI driver volumeAttributes, but for the purposes of migration, it needs to support both.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andyzhangx, msau42
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest Review the full test history for this PR. Silence the bot with an
/test pull-kubernetes-e2e-gce-ubuntu-containerd
…7877-upstream-release-1.20 Automated cherry pick of #97877: fix azure file migration issue
…7877-upstream-release-1.18 Automated cherry pick of #97877: fix azure file migration issue
…7877-upstream-release-1.19 Automated cherry pick of #97877: fix azure file migration issue
		resourceGroup = rg
	}

	if azureSource.SecretNamespace == nil {
		ns := defaultSecretNamespace
this may need to be adjusted as part of undoing the regression in #97417
see #99061 (comment) for details
In CSI migration translation, there is no way to get the pod namespace; unless we address this, it would break in CSI migration as well. We are planning to turn on CSI migration in 1.21, and it will break in that scenario too.
The main issue is that the original inline azure file volume does not have a secretNamespace field, while the azure file persistent volume does have that field.
the backwards compatibility issue has to be solved before that can be enabled
Is it possible to add a new field (secretNamespace) to the azure file inline volume? That's the best way I can see to solve this issue. In the CSI migration translation lib, there is no way to get the pod namespace.
> Is it possible to add a new field (secretNamespace) to the azure file inline volume? That's the best way I can see to solve this issue.

No, secret references in pod specs cannot reference cross-namespace secrets.

> In the CSI migration translation lib, there is no way to get the pod namespace.

That sounds like an issue that could be solved by plumbing the pod namespace through the go API.
@andyzhangx do you need the secret only for node operations, or also for attach? If only node, then plumbing in the pod namespace should not be difficult:
kubernetes/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go
Line 556 in 54449be

	volumeSpec, err = csimigration.TranslateInTreeSpecToCSI(volumeSpec, dswp.intreeToCSITranslator)

Plumbing it through attach will be more effort, as we would need to add a field to the VolumeAttachment:
kubernetes/pkg/apis/storage/types.go
Line 164 in 54449be

	InlineVolumeSpec *api.PersistentVolumeSpec
cc @Jiawei0227
Expand and delete operations are not supported for inline volumes.
What type of PR is this?
/kind bug
What this PR does / why we need it:
fix azure file migration issue
Azure file CSI migration failed due to the following error; that's because in the Azure file CSI migration scenario the secret namespace is set as nil. This PR sets the PV secret namespace to default (by default). I have verified it works well in the CSI migration test.

Which issue(s) this PR fixes:
Fixes #96508
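The defaulting this PR describes can be sketched as a minimal standalone function. `ensureSecretNamespace` and `defaultSecretNamespace` are illustrative names modeled on the translation-lib diff earlier in this thread, not the exact merged code:

```go
package main

import "fmt"

// Assumed constant, mirroring the "default" namespace fallback described in
// the PR text; the real code lives in csi-translation-lib.
const defaultSecretNamespace = "default"

// ensureSecretNamespace sketches the fix: during CSI -> in-tree translation
// the pod namespace is unavailable, so a nil SecretNamespace pointer is
// defaulted to "default" while an explicitly set value is left untouched.
func ensureSecretNamespace(secretNamespace *string) *string {
	if secretNamespace == nil {
		ns := defaultSecretNamespace
		return &ns
	}
	return secretNamespace
}

func main() {
	fmt.Println(*ensureSecretNamespace(nil))
	existing := "test"
	fmt.Println(*ensureSecretNamespace(&existing))
}
```

This is also why the PV dump earlier in the thread showed secretNamespace: null before the fix: nothing filled the field in, so the node authorizer had no namespace/name pair to authorize.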
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/priority important-soon
/sig cloud-provider
/area provider/azure
/triage accepted