Skip to content

Commit ad288fb

Browse files
tasherif-msftjhendrixMSFTealsurrichardpark-msftazure-sdk
authored
[AzDatalake] File Client Upload/Download Support (#21261)
* Enable gocritic during linting (#20715) Enabled gocritic's evalOrder to catch dependencies on undefined behavior on return statements. Updated to latest version of golangci-lint. Fixed issue in azblob flagged by latest linter. * Cosmos DB: Enable merge support (#20716) * Adding header and value * Wiring and tests * format * Fixing value * change log * [azservicebus, azeventhubs] Stress test and logging improvement (#20710) Logging improvements: * Updating the logging to print more tracing information (per-link) in prep for the bigger release coming up. * Trimming out some of the verbose logging, seeing if I can get it a bit more reasonable. Stress tests: * Add a timestamp to the log name we generate and also default to append, not overwrite. * Use 0.5 cores, 0.5GB as our baseline. Some pods use more and I'll tune them more later. * update proxy version (#20712) Co-authored-by: Scott Beddall <[email protected]> * Return an error when you try to send a message that's too large. (#20721) This now works just like the message batch - you'll get an ErrMessageTooLarge if you attempt to send a message that's too large for the link's configured size. NOTE: there's a patch to `internal/go-amqp/Sender.go` to match what's in go-amqp's main so it returns a programmatically useful error when the message is too large. Fixes #20647 * Changes in test that is failing in pipeline (#20693) * [azservicebus, azeventhubs] Treat 'entity full' as a fatal error (#20722) When the remote entity is full we get a resource-limit-exceeded condition. This isn't something we should keep retrying on and it's best to just abort and let the user know immediately, rather than hoping it might eventually clear out. This affected both Event Hubs and Service Bus. Fixes #20647 * [azservicebus/azeventhubs] Redirect stderr and stdout to tee (#20726) * Update changelog with latest features (#20730) * Update changelog with latest features Prepare for upcoming release. * bump minor version * pass along the artifact name so we can override it later (#20732) Co-authored-by: scbedd <[email protected]> * [azeventhubs] Fixing checkpoint store race condition (#20727) The checkpoint store wasn't guarding against multiple owners claiming for the first time - fixing this by using IfNoneMatch Fixes #20717 * Fix azidentity troubleshooting guide link (#20736) * [Release] sdk/resourcemanager/paloaltonetworksngfw/armpanngfw/0.1.0 (#20437) * [Release] sdk/resourcemanager/paloaltonetworksngfw/armpanngfw/0.1.0 generation from spec commit: 85fb4ac6f8bfefd179e6c2632976a154b5c9ff04 * client factory * fix * fix * update * add sdk/resourcemanager/postgresql/armpostgresql live test (#20685) * add sdk/resourcemanager/postgresql/armpostgresql live test * update assets.json * set subscriptionId default value * format * add sdk/resourcemanager/eventhub/armeventhub live test (#20686) * add sdk/resourcemanager/eventhub/armeventhub live test * update assets * add sdk/resourcemanager/compute/armcompute live test (#20048) * add sdk/resourcemanager/compute/armcompute live test * skus filter * fix subscriptionId default value * fix * gofmt * update recording * sdk/resourcemanager/network/armnetwork live test (#20331) * sdk/resourcemanager/network/armnetwork live test * update subscriptionId default value * update recording * add sdk/resourcemanager/cosmos/armcosmos live test (#20705) * add sdk/resourcemanager/cosmos/armcosmos live test * update assets.json * update assets.json * update assets.json * update assets.json * Increment package version after release of azcore (#20740) * [azeventhubs] Improperly resetting etag in the checkpoint store (#20737) We shouldn't be resetting the etag to nil - it's what we use to enforce a "single winner" when doing ownership claims. The bug here was two-fold: I had bad logic in my previous claim ownership, which I fixed in a previous PR, but we need to reflect that same constraint properly in our in-memory checkpoint store for these tests. * Eng workflows sync and branch cleanup additions (#20743) Co-authored-by: James Suplizio <[email protected]> * [azeventhubs] Latest start position can also be inclusive (ie, get the latest message) (#20744) * Update GitHubEventProcessor version and remove pull_request_review procesing (#20751) Co-authored-by: James Suplizio <[email protected]> * Rename DisableAuthorityValidationAndInstanceDiscovery (#20746) * fix (#20707) * AzFile (#20739) * azfile: Fixing connection string parsing logic (#20798) * Fixing connection string parse logic * Update README * [azadmin] fix flaky test (#20758) * fix flaky test * charles suggestion * Prepare azidentity v1.3.0 for release (#20756) * Fix broken podman link (#20801) Co-authored-by: Wes Haggard <[email protected]> * [azquery] update doc comments (#20755) * update doc comments * update statistics and visualization generation * prep-for-release * Fixed contribution section (#20752) Co-authored-by: Bob Tabor <[email protected]> * [azeventhubs,azservicebus] Some API cleanup, renames (#20754) * Adding options to UpdateCheckpoint(), just for future potential expansion * Make Offset an int64, not a *int64 (it's not optional, it'll always come back with ReceivedEvents) * Adding more logging into the checkpoint store. * Point all imports at the production go-amqp * Add supporting features to enable distributed tracing (#20301) (#20708) * Add supporting features to enable distributed tracing This includes new internal pipeline policies and other supporting types. See the changelog for a full description. Added some missing doc comments. * fix linter issue * add net.peer.name trace attribute sequence custom HTTP header policy before logging policy. sequence logging policy after HTTP trace policy. keep body download policy at the end. * add span for iterating over pages * Restore ARM CAE support for azcore beta (#20657) This reverts commit 9020972. * Upgrade to stable azcore (#20808) * Increment package version after release of data/azcosmos (#20807) * Updating changelog (#20810) * Add fake package to azcore (#20711) * Add fake package to azcore This is the supporting infrastructure for the generated SDK fakes. * fix doc comment * Updating CHANGELOG.md (#20809) * changelog (#20811) * Increment package version after release of storage/azfile (#20813) * Update changelog (azblob) (#20815) * Updating CHANGELOG.md * Update the changelog with correct version * [azquery] migration guide (#20742) * migration guide * Charles feedback * Richard feedback --------- Co-authored-by: Charles Lowell <[email protected]> * Increment package version after release of monitor/azquery (#20820) * [keyvault] prep for release (#20819) * prep for release * perf tests * update date * added all upload methods * added more tests for upload stream * added more tests * added downloaders * added more tests * cleanup * feedback --------- Co-authored-by: Joel Hendrix <[email protected]> Co-authored-by: Matias Quaranta <[email protected]> Co-authored-by: Richard Park <[email protected]> Co-authored-by: Azure SDK Bot <[email protected]> Co-authored-by: Scott Beddall <[email protected]> Co-authored-by: siminsavani-msft <[email protected]> Co-authored-by: scbedd <[email protected]> Co-authored-by: Charles Lowell <[email protected]> Co-authored-by: Peng Jiahui <[email protected]> Co-authored-by: James Suplizio <[email protected]> Co-authored-by: Sourav Gupta <[email protected]> Co-authored-by: gracewilcox <[email protected]> Co-authored-by: Wes Haggard <[email protected]> Co-authored-by: Bob Tabor <[email protected]> Co-authored-by: Bob Tabor <[email protected]>
1 parent 68d465f commit ad288fb

18 files changed

+2277
-46
lines changed
+193
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
//go:build go1.18
2+
// +build go1.18
3+
4+
// Copyright (c) Microsoft Corporation. All rights reserved.
5+
// Licensed under the MIT License. See License.txt in the project root for license information.
6+
7+
package file
8+
9+
import (
10+
"bytes"
11+
"context"
12+
"errors"
13+
"github.com/Azure/azure-sdk-for-go/sdk/azcore/streaming"
14+
"io"
15+
"sync"
16+
)
17+
18+
// chunkWriter provides methods to upload chunks that represent a file to a server.
19+
// This allows us to provide a local implementation that fakes the server for hermetic testing.
20+
type chunkWriter interface {
21+
AppendData(context.Context, int64, io.ReadSeekCloser, *AppendDataOptions) (AppendDataResponse, error)
22+
FlushData(context.Context, int64, *FlushDataOptions) (FlushDataResponse, error)
23+
}
24+
25+
// bufferManager provides an abstraction for the management of buffers.
26+
// this is mostly for testing purposes, but does allow for different implementations without changing the algorithm.
27+
type bufferManager[T ~[]byte] interface {
28+
// Acquire returns the channel that contains the pool of buffers.
29+
Acquire() <-chan T
30+
31+
// Release releases the buffer back to the pool for reuse/cleanup.
32+
Release(T)
33+
34+
// Grow grows the number of buffers, up to the predefined max.
35+
// It returns the total number of buffers or an error.
36+
// No error is returned if the number of buffers has reached max.
37+
// This is called only from the reading goroutine.
38+
Grow() (int, error)
39+
40+
// Free cleans up all buffers.
41+
Free()
42+
}
43+
44+
// copyFromReader copies a source io.Reader to file storage using concurrent uploads.
45+
func copyFromReader[T ~[]byte](ctx context.Context, src io.Reader, dst chunkWriter, options UploadStreamOptions, getBufferManager func(maxBuffers int, bufferSize int64) bufferManager[T]) error {
46+
options.setDefaults()
47+
actualSize := int64(0)
48+
wg := sync.WaitGroup{} // Used to know when all outgoing chunks have finished processing
49+
errCh := make(chan error, 1) // contains the first error encountered during processing
50+
var err error
51+
52+
buffers := getBufferManager(int(options.Concurrency), options.ChunkSize)
53+
defer buffers.Free()
54+
55+
// this controls the lifetime of the uploading goroutines.
56+
// if an error is encountered, cancel() is called which will terminate all uploads.
57+
// NOTE: the ordering is important here. cancel MUST execute before
58+
// cleaning up the buffers so that any uploading goroutines exit first,
59+
// releasing their buffers back to the pool for cleanup.
60+
ctx, cancel := context.WithCancel(ctx)
61+
defer cancel()
62+
63+
// This goroutine grabs a buffer, reads from the stream into the buffer,
64+
// then creates a goroutine to upload/stage the chunk.
65+
for chunkNum := uint32(0); true; chunkNum++ {
66+
var buffer T
67+
select {
68+
case buffer = <-buffers.Acquire():
69+
// got a buffer
70+
default:
71+
// no buffer available; allocate a new buffer if possible
72+
if _, err := buffers.Grow(); err != nil {
73+
return err
74+
}
75+
76+
// either grab the newly allocated buffer or wait for one to become available
77+
buffer = <-buffers.Acquire()
78+
}
79+
80+
var n int
81+
n, err = io.ReadFull(src, buffer)
82+
83+
if n > 0 {
84+
// some data was read, upload it
85+
wg.Add(1) // We're posting a buffer to be sent
86+
87+
// NOTE: we must pass chunkNum as an arg to our goroutine else
88+
// it's captured by reference and can change underneath us!
89+
go func(chunkNum uint32) {
90+
// Upload the outgoing chunk, matching the number of bytes read
91+
offset := int64(chunkNum) * options.ChunkSize
92+
appendDataOpts := options.getAppendDataOptions()
93+
actualSize += int64(len(buffer[:n]))
94+
_, err := dst.AppendData(ctx, offset, streaming.NopCloser(bytes.NewReader(buffer[:n])), appendDataOpts)
95+
if err != nil {
96+
select {
97+
case errCh <- err:
98+
// error was set
99+
default:
100+
// some other error is already set
101+
}
102+
cancel()
103+
}
104+
buffers.Release(buffer) // The goroutine reading from the stream can reuse this buffer now
105+
106+
// signal that the chunk has been staged.
107+
// we MUST do this after attempting to write to errCh
108+
// to avoid it racing with the reading goroutine.
109+
wg.Done()
110+
}(chunkNum)
111+
} else {
112+
// nothing was read so the buffer is empty, send it back for reuse/clean-up.
113+
buffers.Release(buffer)
114+
}
115+
116+
if err != nil { // The reader is done, no more outgoing buffers
117+
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
118+
// these are expected errors, we don't surface those
119+
err = nil
120+
} else {
121+
// some other error happened, terminate any outstanding uploads
122+
cancel()
123+
}
124+
break
125+
}
126+
}
127+
128+
wg.Wait() // Wait for all outgoing chunks to complete
129+
130+
if err != nil {
131+
// there was an error reading from src, favor this error over any error during staging
132+
return err
133+
}
134+
135+
select {
136+
case err = <-errCh:
137+
// there was an error during staging
138+
return err
139+
default:
140+
// no error was encountered
141+
}
142+
143+
// All chunks uploaded, return nil error
144+
flushOpts := options.getFlushDataOptions()
145+
_, err = dst.FlushData(ctx, actualSize, flushOpts)
146+
return err
147+
}
148+
149+
// mmbPool implements the bufferManager interface.
150+
// it uses anonymous memory mapped files for buffers.
151+
// don't use this type directly, use newMMBPool() instead.
152+
type mmbPool struct {
153+
buffers chan mmb
154+
count int
155+
max int
156+
size int64
157+
}
158+
159+
func newMMBPool(maxBuffers int, bufferSize int64) bufferManager[mmb] {
160+
return &mmbPool{
161+
buffers: make(chan mmb, maxBuffers),
162+
max: maxBuffers,
163+
size: bufferSize,
164+
}
165+
}
166+
167+
func (pool *mmbPool) Acquire() <-chan mmb {
168+
return pool.buffers
169+
}
170+
171+
func (pool *mmbPool) Grow() (int, error) {
172+
if pool.count < pool.max {
173+
buffer, err := newMMB(pool.size)
174+
if err != nil {
175+
return 0, err
176+
}
177+
pool.buffers <- buffer
178+
pool.count++
179+
}
180+
return pool.count, nil
181+
}
182+
183+
func (pool *mmbPool) Release(buffer mmb) {
184+
pool.buffers <- buffer
185+
}
186+
187+
func (pool *mmbPool) Free() {
188+
for i := 0; i < pool.count; i++ {
189+
buffer := <-pool.buffers
190+
buffer.delete()
191+
}
192+
pool.count = 0
193+
}

0 commit comments

Comments
 (0)