Skip to content

feat: support volume cloning #1278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 25, 2023

Conversation

umagnus
Copy link
Contributor

@umagnus umagnus commented May 22, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

feat: support volume cloning in Azure File CSI driver

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:
The Alpine distro is based on a libc variant called musl-libc. azcopy is compiled against glibc, so it will get no such file or directory: unknown error. To resolve this issue, I add glibc to alpine distro by using apk add gcompat.
Reference:

csi copy percent logs:

job state Running:
I0925 06:43:24.962466       1 utils.go:76] GRPC call: /csi.v1.Controller/CreateVolume
I0925 06:43:24.962927       1 utils.go:77] GRPC request: {"capacity_range":{"required_bytes":1084479242240},"name":"pvc-23530abb-da33-4601-9e8c-40c2376b07f8","parameters":{"csi.storage.k8s.io/pv/name":"pvc-23530abb-da33-4601-9e8c-40c2376b07f8","csi.storage.k8s.io/pvc/name":"pvc-azurefile-clone","csi.storage.k8s.io/pvc/namespace":"default","skuName":"Premium_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{"mount_flags":["dir_mode=0777","file_mode=0777","mfsymlinks","cache=strict","nosharesock","actimeo=30"]}},"access_mode":{"mode":5}}],"volume_content_source":{"Type":{"Volume":{"volume_id":"mc_aks-volumeclone_group_aks-volumeclone_westus#fdfadc5603ef5486d889e6e#pvc-678afe18-c453-45ed-969d-df2c74d9cbbb###default"}}}}
I0925 06:43:25.046091       1 controllerserver.go:112] azcopy job status: Running, copy percent: 10.0%, error: <nil>
E0925 06:43:25.049060       1 utils.go:81] GRPC error: rpc error: code = Aborted desc = An operation with the given Volume ID pvc-23530abb-da33-4601-9e8c-40c2376b07f8 already exists

job state Completed:
I0925 07:21:59.268103       1 controllerserver.go:541] begin to create file share(pvc-23530abb-da33-4601-9e8c-40c2376b07f8) on account(fdfadc5603ef5486d889e6e) type(Premium_LRS) subID() rg(mc_aks-volumeclone_group_aks-volumeclone_westus) location() size(1010) protocol(SMB)
I0925 07:21:59.537054       1 azurefile.go:921] generate sas token for account(fdfadc5603ef5486d889e6e)
I0925 07:21:59.556365       1 azurefile.go:933] azcopy job status: Completed, copy percent: 100.0%, error: <nil>
I0925 07:21:59.556566       1 controllerserver.go:573] create file share pvc-23530abb-da33-4601-9e8c-40c2376b07f8 on storage account fdfadc5603ef5486d889e6e successfully

azcopy copy speed

file size copy time
1G 5s
10G 84s
100G 12min10s
1000G 74min23s

8 connections, cpu and memory used:
POD NAME CPU(cores) MEMORY(bytes)
csi-azurefile-controller-5675f96dc9-rs4zh azurefile 60m 66Mi

cpu: around 50-70, memory: around 60-70

Release note:

none

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 22, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @umagnus. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 22, 2023
@k8s-ci-robot k8s-ci-robot requested review from cvvz and ZeroMagic May 22, 2023 07:09
@andyzhangx
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 23, 2023
@umagnus umagnus force-pushed the add_volume_cloning branch 2 times, most recently from 8071cef to 24d8940 Compare May 24, 2023 03:12
@umagnus
Copy link
Contributor Author

umagnus commented May 24, 2023

/retest

Comment on lines 932 to 933
klog.V(2).Infof("copy fileshare %s to %s", srcFileShareName, dstFileShareName)
out, copyErr = exec.Command("azcopy", "copy", srcPath, dstPath, "--recursive", "--check-length=false").CombinedOutput()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a note to docs that it needs azcopy and thus physically copies data? Maybe with a small table about how fast it could be - if I have a volume with 1TB of data (in say 100 MB files), how long will CreateVolume take?

if err := d.accountSearchCache.Delete(lockKey); err != nil {
return nil, status.Errorf(codes.Internal, err.Error())
if req.GetVolumeContentSource() != nil {
if err := d.copyVolume(ctx, req, accountKey, shareOptions, storageEndpointSuffix, secret); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be blind, where is the destination share actually created? And how do you make sure it's deleted when azcopy permanently fails and you want to return a final error to the external-provisioner? Because the external-provisioner may not call CreateVolume again if user has deleted their PVC, as the volume is not needed any longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destination share is created in func CopyFileShare in azurefile.go by azcopy. If azcopy fails, it will return error in func copyVolume and func CreateVolume will return a final error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but where is the share deleted after failed CopyFileShare?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new file share would be created by azcopy, and if azcopy returns failure, it does not makes sense to delete the file share explicitly by the csi driver since the csi driver would retry volume cloning after the failure within a period of time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means that the driver leaks a volume when user deletes corresponding pvc. I understand the interface between the provisioner and the driver does not allow much options here, perhaps we can enhance the protocol if you have a smart idea.

@@ -21,6 +21,11 @@ COPY ${binary} /azurefileplugin
RUN apk upgrade --available --no-cache && \
apk add --no-cache ca-certificates cifs-utils util-linux e2fsprogs-extra udev xfsprogs-extra nfs-utils

RUN apk add gcompat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment for this package install

@@ -21,6 +21,11 @@ COPY ${binary} /azurefileplugin
RUN apk upgrade --available --no-cache && \
apk add --no-cache ca-certificates cifs-utils util-linux e2fsprogs-extra udev xfsprogs-extra nfs-utils

RUN apk add gcompat
RUN wget -c https://azcopyvnext.azureedge.net/release20230420/azcopy_linux_amd64_10.18.1.tar.gz && tar -zxvf azcopy_linux_amd64_10.18.1.tar.gz && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an arm64 support for this binary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not supported because azcopy has another arm64 preview binary (https://aka.ms/downloadazcopy-v10-linux-arm64).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then copy arm64 binary if it's arm64 image

@umagnus umagnus force-pushed the add_volume_cloning branch from e883746 to e132808 Compare June 9, 2023 08:39
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 30, 2023
@umagnus umagnus force-pushed the add_volume_cloning branch from a99867c to ba0b890 Compare June 30, 2023 09:19
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 30, 2023
@umagnus umagnus force-pushed the add_volume_cloning branch from b75bb56 to 7e278a7 Compare June 30, 2023 09:57
@umagnus
Copy link
Contributor Author

umagnus commented Jun 30, 2023

/retest

1 similar comment
@umagnus
Copy link
Contributor Author

umagnus commented Jun 30, 2023

/retest

@umagnus umagnus force-pushed the add_volume_cloning branch 5 times, most recently from d850744 to 2e666d4 Compare July 6, 2023 08:13
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2023
@umagnus umagnus force-pushed the add_volume_cloning branch 3 times, most recently from 6003e6c to 2454d9e Compare September 22, 2023 03:34
@andyzhangx andyzhangx changed the title [WIP] feat: support volume cloning in Azure File CSI driver feat: support volume cloning in Azure File CSI driver Sep 22, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2023
@andyzhangx andyzhangx changed the title feat: support volume cloning in Azure File CSI driver feat: support volume cloning Sep 22, 2023
@umagnus
Copy link
Contributor Author

umagnus commented Sep 22, 2023

/retest

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in L526, it's

mc := metrics.NewMetricContext(azureFileCSIDriverName, "controller_create_volume", d.cloud.ResourceGroup, subsID, d.Name)

let's make it separate, for volume cloning, use controller_create_volume_from_volume, and for snapshot restore, use controller_create_volume_from_snapshot in the future.

// logging the job status if it's volume cloning
if req.GetVolumeContentSource() != nil {
jobState, percent, err := getAzcopyJob(volName)
klog.V(2).Infof("azcopy job status: %s, copy percent: %s%, error: %v", jobState, percent, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy percent: %.1f%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

percent is a string parsed by azcopy and it is already %1.f format.

Copy link
Contributor Author

@umagnus umagnus Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use copy percent: %s%% to fix print nil error

select {
case <-timeTick:
jobState, percent, err := getAzcopyJob(dstFileShareName)
klog.V(2).Infof("azcopy job status: %s, copy percent: %s%, error: %v", jobState, percent, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy percent: %.1f%

dstPath := fmt.Sprintf("https://%s.file.%s/%s%s", accountName, storageEndpointSuffix, dstFileShareName, accountSasToken)

jobState, percent, err := getAzcopyJob(dstFileShareName)
klog.V(2).Infof("azcopy job status: %s, copy percent: %s%, error: %v", jobState, percent, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy percent: %.1f%

@umagnus umagnus force-pushed the add_volume_cloning branch 2 times, most recently from c4bfab7 to 20188a7 Compare September 25, 2023 06:34
@umagnus
Copy link
Contributor Author

umagnus commented Sep 25, 2023

/retest

2 similar comments
@andyzhangx
Copy link
Member

/retest

@andyzhangx
Copy link
Member

/retest

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, umagnus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2023
@k8s-ci-robot k8s-ci-robot merged commit 938f084 into kubernetes-sigs:master Sep 25, 2023
@andyzhangx
Copy link
Member

/cherrypick release-1.29

@k8s-infra-cherrypick-robot

@andyzhangx: new pull request created: #1462

In response to this:

/cherrypick release-1.29

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx
Copy link
Member

/cherrypick release-1.28

@k8s-infra-cherrypick-robot

@andyzhangx: #1278 failed to apply on top of branch "release-1.28":

Applying: support volume cloning in Azure File CSI driver
.git/rebase-apply/patch:4683: trailing whitespace.
        } 
warning: 1 line adds whitespace errors.
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 support volume cloning in Azure File CSI driver
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-1.28

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants