Open
Description
I'm experimenting with the new implementation of CUDA acceleration for quantized models and wondering how to use sharded tensors in this context. I'm having a hard time adapting the ShardedVarBuilder to load like quantized_var_builder::VarBuilder::from_gguf.
Do you have any recommendations on the best approach in this case?