Implement Secure & Fast Inference Checkpoints with Safetensors #250

JesperDramsch opened this issue Apr 10, 2025 · 0 comments

Our current checkpoint system has several limitations:

  • Uses the pickle format, which is insecure (deserialisation can execute arbitrary code; see the sketch after this list)
  • Slow loading, especially on CPU, which hurts inference startup time
  • Embedded traceability metadata breaks some checkpoints
  • Limited modularity (the entire model must be loaded)
  • Backward compatibility is hard to maintain across breaking changes
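
A minimal illustration of the pickle risk, with a hypothetical payload (this shows the mechanism, not a real exploit):

```python
import os
import pickle


class Malicious:
    # pickle calls __reduce__ to decide how to rebuild an object, so a
    # crafted checkpoint can hand back any callable plus its arguments
    def __reduce__(self):
        return (os.system, ("echo arbitrary code executed",))


payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the shell command runs during deserialisation
```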

Proposed Solution

I plan to implement a new inference checkpoint system using safetensors format to address these issues:

Features:

  1. Secure & Fast Checkpoints

    • Replace pickle with the safetensors format (see the sketch after this list)
    • Gain ~76x faster loading on CPU and ~2x faster on GPU
    • Eliminate the arbitrary-code-execution risk of loading untrusted models
  2. Traceability & Metadata

    • Store model metadata (version, training params, etc.)
    • Enable inspection of model architecture without loading weights
  3. Modular Loading

    • Support for lazy loading of specific weights
    • Allow partial model loading for distributed inference
  4. Backward Compatibility

    • Define "noop"/"identity" operations for new features
    • Handle missing tensors in older checkpoints gracefully
    • Maintain compatibility across (most) updates
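
A sketch of the intended save/load path using the safetensors API (tensor names and metadata values below are hypothetical; note that safetensors metadata is a flat string-to-string mapping):

```python
import torch
from safetensors.torch import save_file, load_file

# Hypothetical checkpoint contents
tensors = {"encoder.weight": torch.randn(1024, 512)}
metadata = {"anemoi_version": "0.4.0", "schema_version": "1"}

save_file(tensors, "checkpoint.safetensors", metadata=metadata)

# Loading parses tensors only; no code from the file is ever executed
restored = load_file("checkpoint.safetensors", device="cpu")
```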

Consideration of Alternatives

When selecting a storage format for ML model weights, especially for inference, we need to balance multiple critical factors: security, performance, compatibility, and future-proofing. Let's analyze our options, starting with general-purpose formats before examining ML-specific solutions.

General-Purpose Storage Formats

Arrow

  • Strengths: Industry standard for columnar data, excellent for tabular data processing
  • Limitations:
    • Not designed with neural network weights in mind
    • Lacks native support for ML-specific datatypes like BFloat16
    • Optimization targets are data analytics workloads, not ML inference patterns
  • For our use case: Would require custom extensions to handle our ML-specific needs

Cap'n Proto

  • Strengths: Zero-copy capabilities, high security standard
  • Limitations:
    • Missing native support for key ML datatypes (Float16/BFloat16)
    • Designed for RPC and general serialization rather than ML weight storage
    • Implementation complexity exceeds our specific requirements
  • For our use case: Would require workarounds for modern ML datatype support

Numerical Computing Formats

NumPy (npy/npz)

  • Strengths: Widespread in scientific computing, relatively simple format
  • Limitations:
    • NPZ format susceptible to zip bombs (security concern)
    • No zero-copy capabilities, impacting performance
    • Limited datatype support (no BFloat16)
    • Internal format wasn't designed for partial access patterns
  • For our use case: Performance and security issues make it inadequate for production

ML-Specific Formats

PyTorch Pickle (Our Current Solution)

  • Critical issues:
    • Security vulnerabilities that allow arbitrary code execution
    • Performance bottlenecks: ~76x slower loading on CPU and ~2x slower on GPU than safetensors
    • No support for metadata inspection without loading the full model
    • Difficult to implement backward compatibility without loading the model
  • For our use case: The security risks alone justify migration to a safer format

HDF5 (TensorFlow)

  • Strengths: Better metadata handling, partial loading capability
  • Limitations:
    • Complex codebase (~210k lines) introducing security surface area
    • History of CVEs and memory safety issues
    • Memory overhead due to lack of zero-copy capabilities
    • Even TensorFlow is moving away from this format
  • For our use case: Unnecessary complexity with suboptimal performance characteristics

Protobuf (ONNX)

  • Strengths: Cross-platform compatibility, well-established format
  • Limitations:
    • 2GB file size limit makes it impractical for larger weather models
    • Not zero-copy, adding memory overhead during loading
    • Limited layout control for efficient distributed loading
  • For our use case: The size limitation is a dealbreaker for many of our models

MsgPack (Flax)

  • Strengths: Simple, lightweight binary serialization
  • Limitations:
    • Lacks layout control needed for efficient tensor access patterns
    • No metadata scheme for model inspection without loading
    • Not optimized for large model loading workflows
  • For our use case: Would require significant extensions to meet our needs

Key Advantages of Safetensors for Anemoi

Safetensors uniquely addresses the specific requirements for Anemoi's inference checkpoint system:

  1. Security and Performance Balance

    • Provides guaranteed safety without arbitrary code execution risks
    • Zero-copy architecture delivers dramatic speedups (76x on CPU, 2x on GPU)
    • Small codebase (~400 lines) minimizes security surface area
  2. Weather-Specific Model Requirements

    • Support for large tensor dimensions typical in weather models
    • No file size limitations
    • Efficient architecture for distributed inference patterns
  3. Practical Implementation Advantages

    • JSON header allows inspection without loading the entire model (see the sketch after this list)
    • Standardized metadata system for version tracking and model lineage
    • Layout control minimizes disk I/O during distributed loading
  4. Backward Compatibility Enabler

    • Format structure makes it easy to implement "identity operations" for backward compatibility
    • Clear path for handling architecture changes between versions
    • Metadata can store transformation information for version migration
  5. Future-Proofing

    • Support for modern datatypes (FP16, BFloat16, FP8)
    • Growing ecosystem adoption ensures continued maintenance
    • Cross-framework support preserves our flexibility for future architecture changes
    • Efficient lazy loading for distributed settings
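
A sketch of header inspection and lazy loading via `safe_open` (file and tensor names are assumptions):

```python
from safetensors import safe_open

with safe_open("checkpoint.safetensors", framework="pt") as f:
    # Inspect metadata and shapes without materialising any weights
    print(f.metadata())  # e.g. {"anemoi_version": "0.4.0"}
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())

    # Partial loading for distributed inference: read a single shard
    shard = f.get_slice("encoder.weight")[:256]
```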

To steal a summary from the safetensors docs:

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16/Fp8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 |
| H5 (Tensorflow) | 🗸 | ✗ | 🗸 | 🗸 | ~ | ~ | ✗ |
| SavedModel (Tensorflow) | 🗸 | ✗ | ✗ | 🗸 | 🗸 | ✗ | 🗸 |
| MsgPack (flax) | 🗸 | 🗸 | ✗ | 🗸 | ✗ | ✗ | 🗸 |
| Protobuf (ONNX) | 🗸 | ✗ | ✗ | ✗ | ✗ | ✗ | 🗸 |
| Cap'n'Proto | 🗸 | 🗸 | ~ | 🗸 | 🗸 | ~ | ✗ |
| Arrow | ? | ? | ? | ? | ? | ? | ✗ |
| Numpy (npy,npz) | 🗸 | ? | ? | 🗸 | ✗ | ✗ | ✗ |
| pdparams (Paddle) | ✗ | ✗ | ✗ | 🗸 | ✗ | 🗸 | 🗸 |
| SafeTensors | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | ✗ | 🗸 |

While several alternatives provide some of the benefits we need, safetensors is the only solution that comprehensively addresses all our requirements: security, performance, lazy loading, and framework agnosticism.

The implementation costs are low compared to alternatives, with high ROI in terms of security improvements, performance gains, and enabling future architecture flexibility.

The format's adoption by major ML projects (Hugging Face, MLX, StabilityAI) demonstrates its viability for production environments. Its simplicity and focus on ML-specific requirements make it far more suitable than general-purpose formats or older ML solutions with known limitations.

For Anemoi's inference checkpoints, safetensors offers the most direct path to secure, efficient, and future-proof model storage with minimal implementation overhead.

Practical Implementation Considerations

  1. Migration Path

    • Established conversion patterns from pickle to safetensors
    • Conversion tools available (HF spaces, scripts) and adaptable (see the sketch after this list)
  2. Metadata Handling

    • JSON-based header enables simple parsing and extension
    • Ability to store traceability information vital for MLOps
  3. Weather-Specific Requirements

    • Format's simplicity allows for domain-specific metadata
    • Handles large tensors efficiently, which is important for weather models
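
A hypothetical one-off conversion from an existing pickle checkpoint (paths are assumptions; `weights_only=True` keeps torch from executing pickled code while legacy files are still being read):

```python
import torch
from safetensors.torch import save_file

state_dict = torch.load("checkpoint.ckpt", map_location="cpu", weights_only=True)

# Clone to break shared-memory aliasing (which save_file rejects)
# and keep only the tensor entries
tensors = {
    k: v.detach().clone().contiguous()
    for k, v in state_dict.items()
    if isinstance(v, torch.Tensor)
}

save_file(tensors, "checkpoint.safetensors", metadata={"converted_from": "checkpoint.ckpt"})
```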

Implementation Plan

Phase 1: Checkpoint Layout

  • Integrate safetensors into Anemoi training
  • Implement checkpoint saving utilities
  • Prepare model schema versioning

Phase 2: Metadata Layout

  • Implement metadata layout in safetensors
  • Ensure compatibility with inference toolset
  • Verify full traceability incl. model versioning

Phase 3: Model Loading

  • Develop model loading capability for safetensors checkpoints
  • Create adapter layer for backward compatibility (see the sketch after this list)
  • Implement tensor mapping for architecture changes
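
One way the adapter could handle missing tensors, as a hedged sketch (the function and its heuristics are hypothetical): parameters absent from an older checkpoint are initialised as identity/no-op so newer features default to "off":

```python
import torch


def load_with_identity_fallback(model: torch.nn.Module, tensors: dict[str, torch.Tensor]) -> None:
    state = model.state_dict()
    for name, param in state.items():
        if name in tensors:
            state[name] = tensors[name]
        elif param.ndim == 2 and param.shape[0] == param.shape[1]:
            # square weight introduced after the checkpoint was written: identity
            state[name] = torch.eye(param.shape[0], dtype=param.dtype)
        else:
            # new biases or other parameters: zeros act as a no-op
            state[name] = torch.zeros_like(param)
    model.load_state_dict(state)
```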

Phase 4: Inference Integration and Interoperability

Phase 5: Testing & Migration

  • Benchmark performance
  • Validate all existing models with new system
  • Create migration guide for users
  • Create conversion utilities (pickle → safetensors)

Technical Details

  • Format: safetensors (see safetensors schema)
  • Metadata: Will include model version, architecture config, creation date (full traceability from dataset to inference)
  • Backward Compatibility: Will implement model schema versioning
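
Because safetensors metadata is a flat string-to-string mapping, nested information would need to be JSON-encoded into individual fields. A hypothetical layout (all keys and values here are assumptions):

```python
import json

metadata = {
    "model_version": "1.2.0",
    "schema_version": "1",
    "created": "2025-04-10T12:00:00Z",
    "architecture": json.dumps({"hidden_dim": 1024, "num_layers": 16}),
    "training_dataset": "example-dataset-id",  # hypothetical identifier
}
```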

Safetensors format (see the diagram in the safetensors docs)

For model loading, I would suggest implementing a loading capability similar to Hugging Face's.

```python
from pathlib import Path

from safetensors import safe_open


class AnemoiModel:
    ...

    def load_model(self, path: str | Path):
        # Read the JSON header first; weights stay lazy until requested
        with safe_open(str(path), framework="pt") as f:
            metadata = f.metadata()
            tensors = {key: f.get_tensor(key) for key in f.keys()}
        self._initialise(metadata)
        self._super_smart_magic_loading_that_is_backwards_compatible(tensors)
```

Questions

  • Should we support both formats during a transition period?
  • How should we handle custom user extensions?

Resources

Additional Considerations

I believe this is in line with design considerations and with discussions held with some member states during the onboarding process.

I also believe that, if implemented correctly, this will solve multiple backward-compatibility issues, which would improve the overall health of the project and reduce tech debt. It could even serve as a blueprint for certain aspects of training checkpoints, though that is out of scope for this specific work.
