System Info

transformers-cli env result:

- transformers version: 4.48.0
- Using distributed or parallel set-up in script?: No

Information

- The official example scripts
- My own modified scripts

Reproduction

The behavior occurs when running the script below, which benchmarks weight loading and a single forward pass on CPU:

```python
import datetime
import warnings

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Ignore warnings
warnings.filterwarnings("ignore")

# Download model files
sf_filename = hf_hub_download("gpt2", filename="model.safetensors")
pt_filename = hf_hub_download("gpt2", filename="pytorch_model.bin")

# Load safetensors weights
start_st = datetime.datetime.now()
weights_st = load_file(sf_filename, device="cpu")
load_time_st = datetime.datetime.now() - start_st
print(f"Loaded safetensors {load_time_st}")

# Load pytorch weights
start_pt = datetime.datetime.now()
weights_pt = torch.load(pt_filename, map_location="cpu")
load_time_pt = datetime.datetime.now() - start_pt
print(f"Loaded pytorch {load_time_pt}")

print(f"on CPU, safetensors is faster than pytorch by: {load_time_pt / load_time_st:.1f}X")

# Initialize tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Benchmark inference with safetensors
model_st = GPT2LMHeadModel.from_pretrained("gpt2", state_dict=weights_st)
input_ids = tokenizer.encode("Hello, world!", return_tensors="pt")
start_infer_st = datetime.datetime.now()
output_st = model_st(input_ids)
infer_time_st = datetime.datetime.now() - start_infer_st
print(f"Inference time with safetensors: {infer_time_st}")

# Benchmark inference with pytorch
model_pt = GPT2LMHeadModel.from_pretrained("gpt2", state_dict=weights_pt)
start_infer_pt = datetime.datetime.now()
output_pt = model_pt(input_ids)
infer_time_pt = datetime.datetime.now() - start_infer_pt
print(f"Inference time with pytorch: {infer_time_pt}")

print(f"on CPU, safetensors inference is faster than pytorch by: {infer_time_pt / infer_time_st:.1f}X")
```

Expected behavior

I ran the script multiple times; on the last two runs the inference comparison flipped:

```
$ python3 main.py
Loaded safetensors 0:00:00.021458
Loaded pytorch 0:00:00.256295
on CPU, safetensors is faster than pytorch by: 11.9X
Inference time with safetensors: 0:00:00.053771
Inference time with pytorch: 0:00:00.095637
on CPU, safetensors inference is faster than pytorch by: 1.8X

$ python3 main.py
Loaded safetensors 0:00:00.020366
Loaded pytorch 0:00:00.259216
on CPU, safetensors is faster than pytorch by: 12.7X
Inference time with safetensors: 0:00:00.099727
Inference time with pytorch: 0:00:00.094022
on CPU, safetensors inference is faster than pytorch by: 0.9X
```

The inference speedup seems inconsistent: loading is consistently ~12X faster with safetensors, but the inference comparison swings between 1.8X faster and 0.9X (slightly slower). I will also attach a screenshot.
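
One plausible explanation (an assumption on my part, not something I have verified) is that a one-shot `datetime` measurement of a single forward pass is too noisy on CPU: the first call pays one-time costs such as allocator warm-up and thread-pool start-up, and the safetensors model always runs first here. Below is a rough sketch of a steadier measurement with warm-up runs and a median over repeats, reusing the variables from the script above; the warm-up and iteration counts are arbitrary values I picked:

```python
import statistics
import time

def bench(model, input_ids, warmup=3, iters=20):
    """Return the median forward-pass latency in seconds, after warm-up."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # absorb one-time first-call costs
            model(input_ids)
        times = []
        for _ in range(iters):
            start = time.perf_counter()  # monotonic, high-resolution clock
            model(input_ids)
            times.append(time.perf_counter() - start)
    return statistics.median(times)

t_st = bench(model_st, input_ids)
t_pt = bench(model_pt, input_ids)
print(f"median inference: safetensors {t_st:.4f}s, pytorch {t_pt:.4f}s "
      f"({t_pt / t_st:.2f}X)")
```

If the weights really are identical after loading, I would expect the two medians to be essentially the same, which would confirm that the swings above are measurement noise rather than a real safetensors-vs-pytorch inference difference.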