Implement Secure & Fast Inference Checkpoints with Safetensors #250

JesperDramsch opened this issue Apr 10, 2025 · 0 comments

Our current checkpoint system has several limitations:

  • Uses the pickle format, which is insecure (deserialisation can execute arbitrary code; see the sketch after this list)
  • Slow loading, especially on CPU, which hurts inference startup time
  • Embedded traceability metadata breaks some checkpoints
  • Limited modularity (the entire model must be loaded)
  • Backward compatibility is hard to maintain across breaking changes
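
A minimal illustration of the pickle risk, with a hypothetical payload (this shows the mechanism, not a real exploit):

```python
import os
import pickle


class Malicious:
    # pickle calls __reduce__ to decide how to rebuild an object, so a
    # crafted checkpoint can hand back any callable plus its arguments
    def __reduce__(self):
        return (os.system, ("echo arbitrary code executed",))


payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the shell command runs during deserialisation
```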

Proposed Solution

I plan to implement a new inference checkpoint system using safetensors format to address these issues:

Features:

  1. Secure & Fast Checkpoints

    • Replace pickle with the safetensors format (see the sketch after this list)
    • Gain ~76x faster loading on CPU and ~2x faster on GPU
    • Eliminate the arbitrary-code-execution risk of loading untrusted models
  2. Traceability & Metadata

    • Store model metadata (version, training params, etc.)
    • Enable inspection of model architecture without loading weights
  3. Modular Loading

    • Support for lazy loading of specific weights
    • Allow partial model loading for distributed inference
  4. Backward Compatibility

    • Define "noop"/"identity" operations for new features
    • Handle missing tensors in older checkpoints gracefully
    • Maintain compatibility across (most) updates
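
A sketch of the intended save/load path using the safetensors API (tensor names and metadata values below are hypothetical; note that safetensors metadata is a flat string-to-string mapping):

```python
import torch
from safetensors.torch import save_file, load_file

# Hypothetical checkpoint contents
tensors = {"encoder.weight": torch.randn(1024, 512)}
metadata = {"anemoi_version": "0.4.0", "schema_version": "1"}

save_file(tensors, "checkpoint.safetensors", metadata=metadata)

# Loading parses tensors only; no code from the file is ever executed
restored = load_file("checkpoint.safetensors", device="cpu")
```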

Consideration of Alternatives

When selecting a storage format for ML model weights, especially for inference, we need to balance multiple critical factors: security, performance, compatibility, and future-proofing. Let's analyze our options, starting with general-purpose formats before examining ML-specific solutions.

General-Purpose Storage Formats

Arrow

  • Strengths: Industry standard for columnar data, excellent for tabular data processing
  • Limitations:
    • Not designed with neural network weights in mind
    • Lacks native support for ML-specific datatypes like BFloat16
    • Optimization targets are data analytics workloads, not ML inference patterns
  • For our use case: Would require custom extensions to handle our ML-specific needs

Cap'n Proto

  • Strengths: Zero-copy capabilities, high security standard
  • Limitations:
    • Missing native support for key ML datatypes (Float16/BFloat16)
    • Designed for RPC and general serialization rather than ML weight storage
    • Implementation complexity exceeds our specific requirements
  • For our use case: Would require workarounds for modern ML datatype support

Numerical Computing Formats

NumPy (npy/npz)

  • Strengths: Widespread in scientific computing, relatively simple format
  • Limitations:
    • NPZ format susceptible to zip bombs (security concern)
    • No zero-copy capabilities, impacting performance
    • Limited datatype support (no BFloat16)
    • Internal format wasn't designed for partial access patterns
  • For our use case: Performance and security issues make it inadequate for production

ML-Specific Formats

PyTorch Pickle (Our Current Solution)

  • Critical issues:
    • Security vulnerabilities that allow arbitrary code execution
    • Performance bottlenecks: ~76x slower loading on CPU and ~2x slower on GPU than safetensors
    • No support for metadata inspection without loading the full model
    • Difficult to implement backward compatibility without loading the model
  • For our use case: The security risks alone justify migration to a safer format

HDF5 (TensorFlow)

  • Strengths: Better metadata handling, partial loading capability
  • Limitations:
    • Complex codebase (~210k lines) introducing security surface area
    • History of CVEs and memory safety issues
    • Memory overhead due to lack of zero-copy capabilities
    • Even TensorFlow is moving away from this format
  • For our use case: Unnecessary complexity with suboptimal performance characteristics

Protobuf (ONNX)

  • Strengths: Cross-platform compatibility, well-established format
  • Limitations:
    • 2GB file size limit makes it impractical for larger weather models
    • Not zero-copy, adding memory overhead during loading
    • Limited layout control for efficient distributed loading
  • For our use case: The size limitation is a dealbreaker for many of our models

MsgPack (Flax)

  • Strengths: Simple, lightweight binary serialization
  • Limitations:
    • Lacks layout control needed for efficient tensor access patterns
    • No metadata scheme for model inspection without loading
    • Not optimized for large model loading workflows
  • For our use case: Would require significant extensions to meet our needs

Key Advantages of Safetensors for Anemoi

Safetensors uniquely addresses the specific requirements for Anemoi's inference checkpoint system:

  1. Security and Performance Balance

    • Provides guaranteed safety without arbitrary code execution risks
    • Zero-copy architecture delivers dramatic speedups (76x on CPU, 2x on GPU)
    • Small codebase (~400 lines) minimizes security surface area
  2. Weather-Specific Model Requirements

    • Support for large tensor dimensions typical in weather models
    • No file size limitations
    • Efficient architecture for distributed inference patterns
  3. Practical Implementation Advantages

    • JSON header allows inspection without loading the entire model (see the sketch after this list)
    • Standardized metadata system for version tracking and model lineage
    • Layout control minimizes disk I/O during distributed loading
  4. Backward Compatibility Enabler

    • Format structure makes it easy to implement "identity operations" for backward compatibility
    • Clear path for handling architecture changes between versions
    • Metadata can store transformation information for version migration
  5. Future-Proofing

    • Support for modern datatypes (FP16, BFloat16, FP8)
    • Growing ecosystem adoption ensures continued maintenance
    • Cross-framework support preserves our flexibility for future architecture changes
    • Efficient lazy loading for distributed settings
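
A sketch of header inspection and lazy loading via `safe_open` (file and tensor names are assumptions):

```python
from safetensors import safe_open

with safe_open("checkpoint.safetensors", framework="pt") as f:
    # Inspect metadata and shapes without materialising any weights
    print(f.metadata())  # e.g. {"anemoi_version": "0.4.0"}
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())

    # Partial loading for distributed inference: read a single shard
    shard = f.get_slice("encoder.weight")[:256]
```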

To steal a summary from the safetensors docs:

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16/Fp8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 |
| H5 (Tensorflow) | 🗸 | ✗ | 🗸 | 🗸 | ~ | ~ | ✗ |
| SavedModel (Tensorflow) | 🗸 | ✗ | ✗ | 🗸 | 🗸 | ✗ | 🗸 |
| MsgPack (flax) | 🗸 | 🗸 | ✗ | 🗸 | ✗ | ✗ | 🗸 |
| Protobuf (ONNX) | 🗸 | ✗ | ✗ | ✗ | ✗ | ✗ | 🗸 |
| Cap'n'Proto | 🗸 | 🗸 | ~ | 🗸 | 🗸 | ~ | ✗ |
| Arrow | ? | ? | ? | ? | ? | ? | ✗ |
| Numpy (npy,npz) | 🗸 | ? | ? | 🗸 | ✗ | ✗ | ✗ |
| pdparams (Paddle) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 |
| SafeTensors | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | ✗ | 🗸 |

While several alternatives provide some of the benefits we need, safetensors is the only solution that comprehensively addresses all our requirements: security, performance, lazy loading, and framework agnosticism.

The implementation costs are low compared to alternatives, with high ROI in terms of security improvements, performance gains, and enabling future architecture flexibility.

The format's adoption by major ML projects (Hugging Face, MLX, StabilityAI) demonstrates its viability for production environments. Its simplicity and focus on ML-specific requirements make it far more suitable than general-purpose formats or older ML solutions with known limitations.

For Anemoi's inference checkpoints, safetensors offers the most direct path to secure, efficient, and future-proof model storage with minimal implementation overhead.

Practical Implementation Considerations

  1. Migration Path

    • Established conversion patterns from pickle to safetensors
    • Conversion tools available (HF spaces, scripts) and adaptable (see the sketch after this list)
  2. Metadata Handling

    • JSON-based header enables simple parsing and extension
    • Ability to store traceability information vital for MLOps
  3. Weather-Specific Requirements

    • Format's simplicity allows for domain-specific metadata
    • Handles large tensors efficiently, which is important for weather models
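
A hypothetical one-off conversion from an existing pickle checkpoint (paths are assumptions; `weights_only=True` keeps torch from executing pickled code while legacy files are still being read):

```python
import torch
from safetensors.torch import save_file

state_dict = torch.load("checkpoint.ckpt", map_location="cpu", weights_only=True)

# Clone to break shared-memory aliasing (which save_file rejects)
# and keep only the tensor entries
tensors = {
    k: v.detach().clone().contiguous()
    for k, v in state_dict.items()
    if isinstance(v, torch.Tensor)
}

save_file(tensors, "checkpoint.safetensors", metadata={"converted_from": "checkpoint.ckpt"})
```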

Implementation Plan

Phase 1: Checkpoint Layout

  • Integrate safetensors into Anemoi training
  • Implement checkpoint saving utilities
  • Prepare model schema versioning

Phase 2: Metadata Layout

  • Implement metadata layout in safetensors
  • Ensure compatibility with inference toolset
  • Verify full traceability incl. model versioning

Phase 3: Model Loading

  • Develop model loading capability for safetensors checkpoints
  • Create adapter layer for backward compatibility (see the sketch after this list)
  • Implement tensor mapping for architecture changes
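
One way the adapter could handle missing tensors, as a hedged sketch (the function and its heuristics are hypothetical): parameters absent from an older checkpoint are initialised as identity/no-op so newer features default to "off":

```python
import torch


def load_with_identity_fallback(model: torch.nn.Module, tensors: dict[str, torch.Tensor]) -> None:
    state = model.state_dict()
    for name, param in state.items():
        if name in tensors:
            state[name] = tensors[name]
        elif param.ndim == 2 and param.shape[0] == param.shape[1]:
            # square weight introduced after the checkpoint was written: identity
            state[name] = torch.eye(param.shape[0], dtype=param.dtype)
        else:
            # new biases or other parameters: zeros act as a no-op
            state[name] = torch.zeros_like(param)
    model.load_state_dict(state)
```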

Phase 4: Inference Integration and Interoperability

Phase 5: Testing & Migration

  • Benchmark performance
  • Validate all existing models with new system
  • Create migration guide for users
  • Create conversion utilities (pickle → safetensors)

Technical Details

  • Format: safetensors (see safetensors schema)
  • Metadata: Will include model version, architecture config, creation date (full traceability from dataset to inference)
  • Backward Compatibility: Will implement model schema versioning
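
Because safetensors metadata is a flat string-to-string mapping, nested information would need to be JSON-encoded into individual fields. A hypothetical layout (all keys and values here are assumptions):

```python
import json

metadata = {
    "model_version": "1.2.0",
    "schema_version": "1",
    "created": "2025-04-10T12:00:00Z",
    "architecture": json.dumps({"hidden_dim": 1024, "num_layers": 16}),
    "training_dataset": "example-dataset-id",  # hypothetical identifier
}
```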

Safetensors format (see the diagram in the safetensors docs)

For model loading, I would suggest implementing a loading capability similar to Hugging Face's.

```python
from pathlib import Path

from safetensors import safe_open


class AnemoiModel:
    ...

    def load_model(self, path: str | Path):
        # Read the JSON header first; weights stay lazy until requested
        with safe_open(str(path), framework="pt") as f:
            metadata = f.metadata()
            tensors = {key: f.get_tensor(key) for key in f.keys()}
        self._initialise(metadata)
        self._super_smart_magic_loading_that_is_backwards_compatible(tensors)
```

Questions

  • Should we support both formats during a transition period?
  • How should we handle custom user extensions?

Resources

Additional Considerations

I believe this is in line with design considerations and with discussions held with some member states during the onboarding process.

I also believe that, if implemented correctly, this will solve multiple backward-compatibility issues, which would improve the overall health of the project and reduce tech debt. It could even serve as a blueprint for certain aspects of training checkpoints, though that is out of scope for this specific work.
