Release 1.4.0 · GoogleCloudPlatform/gcs-connector-for-pytorch

What's Changed

Lightning multinode parquet by @jdnurme in #73
Install pytest on running continuous test for dataflux pypi package. by @akansha1812 in #75
Make DatafluxPytTrain a wrapper of DataFluxMapStyleDataset by @abhibyreddi in #74
updated base docker, example yaml, added readme by @jdnurme in #76
Fix DatafluxPytTrain.getitem by @abhibyreddi in #77
Run continuous test on the pypi installed package on presubmit by @akansha1812 in #78
Add code to make it possible to deploy training on a multi-node GKE cluster by @abhibyreddi in #81
Configure shared memory size by @abhibyreddi in #82
Reorder Dockerfile and add dockerignore to speed up builds by @MattIrv in #84
Correcting the checkpointing functions to handle the Path object. by @Yash9060 in #83
Parse bucket name from ckpt directory name instead of separate parameter for bucket name by @Yash9060 in #85
Make Lightning checkpoint demo work with Bernard's GKE framework and with FSDP strategy by @MattIrv in #86
Initialize new storage_client.bucket on every request by @Yash9060 in #87
Add README file for lightning image segmentation workload by @abhibyreddi in #89
Check in initial Parquet benchmark based on MaxText data loading benchmark by @MattIrv in #90
Add GKE deployment for MaxText Parquet training benchmark by @MattIrv in #91
Skip training when demo is run to benchmark Dataflux by @abhibyreddi in #92
Update the definition of the local flag by @abhibyreddi in #93
Allow running demo code in listing-only mode by @abhibyreddi in #95
Raise exception when ADC are missing by @abhibyreddi in #94
Update defaults for batch_size and num_workers by @abhibyreddi in #96
Faster Lightning Checkpoint download by @MattIrv in #99
Adding custom GCS Writer. by @Yash9060 in #98
update to latest dataflux client by @jdnurme in #101
add continuous benchmark with kokoro by @jdnurme in #102
Run image training demo as part of continuous integ tests by @abhibyreddi in #104
Adding GCS Custom reader by @Yash9060 in #105
MultiNode demo by @Yash9060 in #106
add benchmark code and update kokoro scripts by @jdnurme in #108
Parameterizing min_epochs, max_epochs & max_steps by @Yash9060 in #107
Add a helper method to create storage_client when needed. by @awonak in #109
Make step time configurable by @abhibyreddi in #110
Remove client initialization for fast listing from dataflux-pytorch by @akansha1812 in #111
Multipart checkpoint upload by @jdnurme in #114
adds unit tests, adds presubmit integration test, updated demo code by @jdnurme in #117
Add code to clear kernel cache after saving checkpoints by @abhibyreddi in #122
update continuous to run full benchmark by @jdnurme in #123
Adding benchmarking code for multi node checkpointing. by @Yash9060 in #121
set multipart upload to default behavior by @jdnurme in #127
Introduce AsyncCheckpointIO option for non-blocking checkpoint saves by @awonak in #116
Print average times to save and load checkpoints together by @abhibyreddi in #129
Changing hardcoded values to placeholders by @Yash9060 in #128
Make num_nodes configurable by @abhibyreddi in #130
update lightning bench with multipart and 10k info by @jdnurme in #131
update default dataflux to use multipart by @jdnurme in #133
Run unit tests on x86 Mac by @abhibyreddi in #115
implement fast download for df checkpoint by @jdnurme in #134
Add image segmentation benchmark results to README by @abhibyreddi in #118
Add single node async benchmark execution to integration tests by @awonak in #135
Refactor benchmark tables by @awonak in #136
add option to run benchmark without lightning by @jdnurme in #137
Fix AsyncCheckpointIO race condition by @awonak in #138
Update image segmentation benchmark README by @abhibyreddi in #139
add upload and download improvements to multinode by @jdnurme in #141
Update documented step time by @abhibyreddi in #142
CPU simulated benchmarking for GKE cluster. by @Yash9060 in #143
Simulated CPU benchmarking code by @Yash9060 in #145
Add support for multi-node checkpointing with fsspec by @abhibyreddi in #144
Correcting the code for simulated benchmarks by @Yash9060 in #146
Multi-node checkpoint benchmark improvements by @MattIrv in #149
Set pytorch version to 2.3.1 by @abhibyreddi in #148
update main readme with checkpointing bench results by @jdnurme in #150
Add support to benchmark multi-node checkpointing with default FSDP strategy by @abhibyreddi in #151
Remove duplicative pip install instructions from multi-node checkpoint benchmark readme by @MattIrv in #152
Skip saving checkpoints during training by @abhibyreddi in #153
Install checkpoint benchmark dependencies before running the benchmark by @abhibyreddi in #155
Update checkpoint readmes by @MattIrv in #159
Implement a custom FSDP strategy for benchmarking loads from boot disk by @abhibyreddi in #157
Added debug flag to GCSReader/Writer by @Yash9060 in #154
Correcting load_checkpoint for simulated benchmarks. by @Yash9060 in #161
Add support for benchmarking checkpoint save/restore to/from distributed filesystems by @abhibyreddi in #162
Correct table header row by @abhibyreddi in #163
Adding option to use FSspec with simulated benchmarks by @Yash9060 in #164
Create client for each process by @akansha1812 in #166
update bench script to run simulated multinode bench by @jdnurme in #167
Move client initialization to getitem and getitems by @akansha1812 in #170
Add additional timing info to multi-node simulated demo/benchmark. by @MattIrv in #172
add simple llama load benchmark and results by @jdnurme in #171
Revert "add simple llama load benchmark and results" by @jdnurme in #177
Update save/load print statements for FSDP benchmark by @MattIrv in #176
Update auto wrap policy and remove duplicate load in trainer.fit by @MattIrv in #175
Refactor DemoTransformer model to match Lightning demo by @MattIrv in #174
Remove model and path arguments from FSDP strategy constructors by @MattIrv in #173
PyTorch distributed checkpoint async save demo by @awonak in #168
updated readme to include async and multinode features by @jdnurme in #183
Update multi-node checkpoint benchmark instructions by @abhibyreddi in #180
Adding simulated benchmark stuff. by @Yash9060 in #181
Add 24-hour triage and PyTorch naming lines to README by @MattIrv in #185
Updating .gitignore by @Yash9060 in #184
move demo model class to common lib by @akansha1812 in #182
Reference const, not value by @awonak in #189
Update README.md with consistent naming by @MattIrv in #192
Remove image segmentation duplicates by @akansha1812 in #186
update benchmark with multinode numbers by @jdnurme in #191
Add demo.image_segmentation.model by @akansha1812 in #195
LLAMA2 Simulated version by @Yash9060 in #198
release 1.4.0 by @jdnurme in #199

New Contributors

@Yash9060 made their first contribution in #80
@awonak made their first contribution in #109

Full Changelog: v1.3.0...v1.4.0
