add building compute-runtime UMD in benchmarks jobs #2577

pbalcer · 2025-01-16T17:38:44Z

No description provided.

github-actions · 2025-01-16T17:40:35Z

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/12814518956

github-actions · 2025-01-16T18:14:05Z

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/12814518956
Job status: success. Test status: success.

Summary

Total 83 benchmarks in mean.
Geomean 98.284%.
Improved 4 Regressed 24 (threshold 2.00%)

(in each comparison row, the better result is the value printed with six decimal places)

Performance change in benchmark groups

Relative perf in group api (11): 98.747%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_ur SubmitKernel out of order	15.705000 μs	15.896 μs	101.22%	1.22%	.
api_overhead_benchmark_ur SubmitKernel in order	16.558000 μs	16.663 μs	100.63%	0.63%	.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.674000 μs	1.675 μs	100.06%	0.06%	.
api_overhead_benchmark_l0 SubmitKernel out of order	11.582 μs	11.528000 μs	99.53%	-0.47%	.
api_overhead_benchmark_sycl SubmitKernel in order	25.082 μs	24.844000 μs	99.05%	-0.95%	.
api_overhead_benchmark_sycl SubmitKernel out of order	23.944 μs	23.678000 μs	98.89%	-1.11%	.
api_overhead_benchmark_ur SubmitKernel out of order CPU count	105483.000 instr	101923.000000 instr	96.63%	-3.37%	-
api_overhead_benchmark_ur SubmitKernel in order CPU count	110835.000 instr	107041.000000 instr	96.58%	-3.42%	-
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.200 μs	2.118000 μs	96.27%	-3.73%	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	124015.000000 instr	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion	21.533000 μs	-

Relative perf in group memory (4): 85.407%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	257.869 μs	253.805000 μs	98.42%	-1.58%	.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.056 GB/s	3.151000 GB/s	96.99%	-3.01%	-
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.962 μs	5.638000 μs	94.57%	-5.43%	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	225.520 μs	132.929000 μs	58.94%	-41.06%	----------
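
The group figure above can be reproduced from the four rows. This is a minimal sketch, not the benchmark scripts' actual code. It assumes that relative perf is baseline / this PR for time metrics (lower is better) and this PR / baseline for throughput metrics (higher is better), and that the group number is the geometric mean of the per-row values. The helper names `relative_perf` and `geomean` are made up for illustration.

```python
from math import prod

# (this_pr, baseline, higher_is_better), copied from the memory group rows.
rows = [
    (257.869, 253.805, False),  # QueueInOrderMemcpy Device to Device, μs
    (3.056, 3.151, True),       # StreamMemory Triad, GB/s
    (5.962, 5.638, False),      # QueueMemcpy Device to Device, μs
    (225.520, 132.929, False),  # QueueInOrderMemcpy Host to Device, μs
]


def relative_perf(this_pr, baseline, higher_is_better):
    """Relative performance in percent; above 100 means this PR is better."""
    ratio = this_pr / baseline if higher_is_better else baseline / this_pr
    return 100.0 * ratio


def geomean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


perfs = [relative_perf(*row) for row in rows]
group = geomean(perfs)

for (this_pr, baseline, _), perf in zip(rows, perfs):
    print(f"{this_pr} vs {baseline}: {perf:.2f}% ({perf - 100.0:+.2f}%)")
print(f"group geomean: {group:.3f}%")
```

Under these assumptions the per-row values and the group geomean agree with the table, which shows that the single Host to Device outlier (58.94%) accounts for most of the drop in the group score.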

Relative perf in group miscellaneous (1): 100.000%

Benchmark	This PR	baseline	Relative perf	Change	-
miscellaneous_benchmark_sycl VectorSum	858.609000 bw GB/s	858.609 bw GB/s	100.00%	0.00%	.

Relative perf in group multithread (10): 98.165%

Benchmark	This PR	baseline	Relative perf	Change	-
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	16958.141000 μs	17316.620 μs	102.11%	2.11%	+
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	47437.164000 μs	47907.007 μs	100.99%	0.99%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	6891.437000 μs	6935.535 μs	100.64%	0.64%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	2043.209 μs	2022.915000 μs	99.01%	-0.99%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	8708.383 μs	8555.721000 μs	98.25%	-1.75%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	111169.587 μs	108338.415000 μs	97.45%	-2.55%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	1191.024 μs	1157.521000 μs	97.19%	-2.81%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	42636.515 μs	40973.625000 μs	96.10%	-3.90%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	26765.936 μs	25543.132000 μs	95.43%	-4.57%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	7865.020 μs	7452.758000 μs	94.76%	-5.24%	-

Relative perf in group Velocity-Bench (9): 99.521%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench Easywave	228.000000 ms	229.000 ms	100.44%	0.44%	.
Velocity-Bench QuickSilver	117.640000 MMS/CTT	117.490 MMS/CTT	100.13%	0.13%	.
Velocity-Bench Sobel Filter	603.242 ms	602.045000 ms	99.80%	-0.20%	.
Velocity-Bench Bitcracker	35.254 s	35.129800 s	99.65%	-0.35%	.
Velocity-Bench CudaSift	203.197 ms	201.142000 ms	98.99%	-1.01%	.
Velocity-Bench Hashtable	355.764 M keys/sec	362.504819 M keys/sec	98.14%	-1.86%	.
Velocity-Bench dl-cifar	-	23.743900 s
Velocity-Bench dl-mnist	-	2.720000 s
Velocity-Bench svm	-	0.139900 s

Relative perf in group Runtime (8): 100.459%

Benchmark	This PR	baseline	Relative perf	Change	-
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	271.946000 ms	278.916 ms	102.56%	2.56%	+
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	272.793000 ms	278.736 ms	102.18%	2.18%	+
Runtime_DAGTaskThroughput_BasicParallelFor	1724.390000 ms	1746.233 ms	101.27%	1.27%	.
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1707.455000 ms	1725.256 ms	101.04%	1.04%	.
Runtime_DAGTaskThroughput_SingleTask	1667.011000 ms	1678.732 ms	100.70%	0.70%	.
Runtime_DAGTaskThroughput_NDRangeParallelFor	1689.333000 ms	1695.816 ms	100.38%	0.38%	.
Runtime_IndependentDAGTaskThroughput_SingleTask	264.721 ms	259.395000 ms	97.99%	-2.01%	.
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	281.980 ms	275.382000 ms	97.66%	-2.34%	-

Relative perf in group MicroBench (14): 97.193%

Benchmark	This PR	baseline	Relative perf	Change	-
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	4.518000 ms	4.547 ms	100.64%	0.64%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	617.206000 ms	617.523 ms	100.05%	0.05%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	617.773000 ms	617.994 ms	100.04%	0.04%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	617.770000 ms	617.954 ms	100.03%	0.03%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	617.095000 ms	617.254 ms	100.03%	0.03%	.
MicroBench_LocalMem_int32_4096	29.902 ms	29.866000 ms	99.88%	-0.12%	.
MicroBench_LocalMem_fp32_4096	29.877 ms	29.833000 ms	99.85%	-0.15%	.
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	4.759 ms	4.702000 ms	98.80%	-1.20%	.
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	4.707 ms	4.574000 ms	97.17%	-2.83%	-
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	4.951 ms	4.781000 ms	96.57%	-3.43%	-
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	4.733 ms	4.414000 ms	93.26%	-6.74%	--
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	4.703 ms	4.322000 ms	91.90%	-8.10%	--
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	4.612 ms	4.238000 ms	91.89%	-8.11%	--
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	4.720 ms	4.317000 ms	91.46%	-8.54%	--

Relative perf in group Pattern (10): 99.958%

Benchmark	This PR	baseline	Relative perf	Change	-
Pattern_SegmentedReduction_Hierarchical_int32	11.584000 ms	11.599 ms	100.13%	0.13%	.
Pattern_SegmentedReduction_Hierarchical_int64	11.764000 ms	11.779 ms	100.13%	0.13%	.
Pattern_SegmentedReduction_NDRange_int16	2.263000 ms	2.264 ms	100.04%	0.04%	.
Pattern_SegmentedReduction_Hierarchical_int16	11.798000 ms	11.801 ms	100.03%	0.03%	.
Pattern_SegmentedReduction_Hierarchical_fp32	11.587000 ms	11.589 ms	100.02%	0.02%	.
Pattern_SegmentedReduction_NDRange_int32	2.164000 ms	2.164 ms	100.00%	0.00%	.
Pattern_SegmentedReduction_NDRange_int64	2.337 ms	2.336000 ms	99.96%	-0.04%	.
Pattern_SegmentedReduction_NDRange_fp32	2.166 ms	2.163000 ms	99.86%	-0.14%	.
Pattern_Reduction_Hierarchical_int32	16.440 ms	16.411000 ms	99.82%	-0.18%	.
Pattern_Reduction_NDRange_int32	16.228 ms	16.163000 ms	99.60%	-0.40%	.

Relative perf in group ScalarProduct (6): 99.867%

Benchmark	This PR	baseline	Relative perf	Change	-
ScalarProduct_NDRange_fp32	3.744000 ms	3.759 ms	100.40%	0.40%	.
ScalarProduct_Hierarchical_fp32	10.142000 ms	10.170 ms	100.28%	0.28%	.
ScalarProduct_Hierarchical_int64	11.487000 ms	11.490 ms	100.03%	0.03%	.
ScalarProduct_Hierarchical_int32	10.522000 ms	10.523 ms	100.01%	0.01%	.
ScalarProduct_NDRange_int64	5.460 ms	5.456000 ms	99.93%	-0.07%	.
ScalarProduct_NDRange_int32	3.787 ms	3.733000 ms	98.57%	-1.43%	.

Relative perf in group USM (7): 99.402%

Benchmark	This PR	baseline	Relative perf	Change	-
USM_Allocation_latency_fp32_shared	0.060000 ms	0.066 ms	110.00%	10.00%	++
USM_Allocation_latency_fp32_device	0.067000 ms	0.068 ms	101.49%	1.49%	.
USM_Allocation_latency_fp32_host	37.754000 ms	37.899 ms	100.38%	0.38%	.
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.853 ms	1.814000 ms	97.90%	-2.10%	-
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.720 ms	1.661000 ms	96.57%	-3.43%	-
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	1.097 ms	1.046000 ms	95.35%	-4.65%	-
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.259 ms	1.195000 ms	94.92%	-5.08%	-

Relative perf in group VectorAddition (3): 99.941%

Benchmark	This PR	baseline	Relative perf	Change	-
VectorAddition_int64	3.108000 ms	3.139 ms	101.00%	1.00%	.
VectorAddition_fp32	1.447 ms	1.445000 ms	99.86%	-0.14%	.
VectorAddition_int32	1.463 ms	1.448000 ms	98.97%	-1.03%	.

Relative perf in group Polybench (3): 100.191%

Benchmark	This PR	baseline	Relative perf	Change	-
Polybench_2mm	1.211000 ms	1.216 ms	100.41%	0.41%	.
Polybench_Atax	6.869000 ms	6.880 ms	100.16%	0.16%	.
Polybench_3mm	1.727000 ms	1.727 ms	100.00%	0.00%	.

Relative perf in group Kmeans (1): 100.287%

Benchmark	This PR	baseline	Relative perf	Change	-
Kmeans_fp32	16.037000 ms	16.083 ms	100.29%	0.29%	.

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline	Relative perf	Change	-
LinearRegressionCoeff_fp32	840.429000 ms	-

Relative perf in group MolecularDynamics (1): 93.333%

Benchmark	This PR	baseline	Relative perf	Change	-
MolecularDynamics	0.030 ms	0.028000 ms	93.33%	-6.67%	--

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	2437.540000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	2184.030000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	2936.560000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	301.359000 ns	-

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	694.985000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	192.264000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	263.332000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	205.812000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	1206.980000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	1838.860000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3143.580000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	260.359000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	720.620000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	188.160000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	298.914000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	200.241000 ns	-

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	826.120000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	177.484000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	953.037000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	934.560000 ns	-

Relative perf in group multiple (24): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	30422.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	4349.980000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	138635.000000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	31499.300000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	1196650.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	157278.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	1255370.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	140676.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	42120.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	14824.500000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	75328.800000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	25524.100000 ns	-
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	838.869803 token/s
llama.cpp Text Generation Batched 128	-	63.338561 token/s
llama.cpp Prompt Processing Batched 256	-	872.377637 token/s
llama.cpp Text Generation Batched 256	-	63.361520 token/s
llama.cpp Prompt Processing Batched 512	-	434.541716 token/s
llama.cpp Text Generation Batched 512	-	63.295460 token/s

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00381339 s
bitcracker - total time for whole calculation: 35.2537 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1255 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1094 1264 29.704% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1251 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1261 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1271 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1111 1254 30.1656% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1270 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1108 1280 30.0842% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1264 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1258 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1186 1255 32.202% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1270 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1253 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1258 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1083 1266 29.4054% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1090 1262 29.5954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1266 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1120 1259 30.41% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1204 1257 32.6907% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1277 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1276 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1053 1267 28.5908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1062 1259 28.8352% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1127 1263 30.6001% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1045 1269 28.3736% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1254 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1274 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1102 1271 29.9213% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1082 1258 29.3782% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1126 1266 30.5729% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1097 1255 29.7855% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1132 1267 30.7358% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1266 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1264 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1258 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1248 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1118 1266 30.3557% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1080 1265 29.3239% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1266 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1263 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.197 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.334930e-01 6.279490e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.654610e-01 7.708770e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.621910e-01 7.803650e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.612940e-01 8.338790e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.323040e-01 7.907160e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.320770e-01 7.641590e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.314860e-01 7.637800e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.313930e-01 7.842600e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.331700e-01 7.826970e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.316040e-01 7.594260e-01 0.000000e+00

Timer Name	Cumulative number of calls	Cumulative microSecs min	Cumulative microSecs avg	Cumulative microSecs max	Cumulative microSecs stddev	Cumulative Efficiency Rating
main	1	1.117e+07	1.117e+07	1.117e+07	0.000e+00	100.00
cycleInit	10	3.514e+06	3.514e+06	3.514e+06	0.000e+00	100.00
cycleTracking	10	7.658e+06	7.658e+06	7.658e+06	0.000e+00	100.00
cycleTracking_Kernel	104	4.927e+06	4.927e+06	4.927e+06	0.000e+00	100.00
cycleTracking_MPI	117	2.074e+05	2.074e+05	2.074e+05	0.000e+00	100.00
cycleTracking_Test_Done	0	0.000e+00	0.000e+00	0.000e+00	0.000e+00	0.00
cycleFinalize	20	4.100e+02	4.100e+02	4.100e+02	0.000e+00	100.00
Figure Of Merit 117.64 [Num Mega Segments / Cycle Tracking Time]
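
The figure of merit can be cross-checked against the per-cycle rows in the log. The sketch below assumes it is the total `num_seg` count, in millions, divided by the summed `cycleTracking` time in seconds. That reading is inferred from the label `[Num Mega Segments / Cycle Tracking Time]`, not taken from the QuickSilver source, and the function name `figure_of_merit` is hypothetical. Both lists are copied from the ten cycle rows above.

```python
# num_seg column from cycles 0..9 of the QuickSilver output.
NUM_SEG = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# cycleTracking column (seconds) from the same rows.
CYCLE_TRACKING_S = [
    6.279490e-01, 7.708770e-01, 7.803650e-01, 8.338790e-01, 7.907160e-01,
    7.641590e-01, 7.637800e-01, 7.842600e-01, 7.826970e-01, 7.594260e-01,
]


def figure_of_merit(num_segments, tracking_seconds):
    """Assumed definition: mega segments per second of cycle tracking."""
    mega_segments = sum(num_segments) / 1e6
    return mega_segments / sum(tracking_seconds)


if __name__ == "__main__":
    print(f"{figure_of_merit(NUM_SEG, CYCLE_TRACKING_S):.2f}")
```

Under that assumption the result agrees with the printed 117.64 to two decimals. The summed `cycleTracking` column is also consistent with the 7.658e+06 microSecs reported for the `cycleTracking` timer.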

Velocity-Bench Sobel Filter

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2437.54,1817.97,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,688.508,688.505,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1184.15,1093.31,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,695.751,695.754,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,753.346,727.617,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.995,173.994,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2184.03,2182.57,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,192.264,192.22,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1870.97,1870.63,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.16,188.155,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3328.5,3279.48,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,262.784,262.778,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3410.01,3357.55,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,294.458,294.451,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,301.359,298.08,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,214.39,214.39,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,260.359,259.494,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,202.78,202.775,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,953.037,949.463,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,934.56,934.549,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30027,28169.9,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4349.98,4349.83,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,137289,86382,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,31499.3,31499.1,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.27019e+06,1.26961e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,157278,157277,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.27141e+06,1.27046e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,140676,140669,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41618.2,40749.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14590.7,14590.4,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,73967.8,73947,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25524.1,25523.4,ns,,,,,
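
The CSV above uses the column layout given in its header row. As a minimal sketch (not the CI script itself), the rows can be read with Python's `csv` module and regrouped so that each allocator's `real_time` sits under its benchmark configuration, which appears to be how the summary tables in this thread label them. The helper name `group_by_config` and the split at the first `/` are assumptions made for illustration. The two sample rows are copied from the output above.

```python
import csv
import io
from collections import defaultdict

SAMPLE = '''name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,688.508,688.505,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,192.264,192.22,ns,,,,,
'''


def group_by_config(csv_text):
    """Map benchmark configuration -> {allocator: real_time}."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the text before the first "/" names the allocator,
        # and the remainder identifies the benchmark configuration.
        allocator, _, config = row["name"].partition("/")
        grouped[config][allocator] = float(row["real_time"])
    return dict(grouped)


if __name__ == "__main__":
    for config, times in group_by_config(SAMPLE).items():
        print(config, times)
```

Grouping by configuration rather than by allocator makes it easier to compare `glibc`, `os_provider` and the pool variants for the same parameters side by side.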

github-actions · 2025-01-17T08:55:40Z

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/12825782769

github-actions · 2025-01-17T09:27:50Z

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/12825782769
Job status: success. Test status: success.

Summary

Total 83 benchmarks in mean.
Geomean 97.918%.
Improved 7 Regressed 29 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 98.992%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_ur SubmitKernel in order	15.982000 μs	16.663 μs	104.26%	4.26%	+
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.655000 μs	1.675 μs	101.21%	1.21%	.
api_overhead_benchmark_ur SubmitKernel out of order	15.750000 μs	15.896 μs	100.93%	0.93%	.
api_overhead_benchmark_l0 SubmitKernel out of order	11.672 μs	11.528000 μs	98.77%	-1.23%	.
api_overhead_benchmark_sycl SubmitKernel in order	25.375 μs	24.844000 μs	97.91%	-2.09%	-
api_overhead_benchmark_ur SubmitKernel out of order CPU count	104663.000 instr	101923.000000 instr	97.38%	-2.62%	-
api_overhead_benchmark_ur SubmitKernel in order CPU count	110006.000 instr	107041.000000 instr	97.30%	-2.70%	-
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.187 μs	2.118000 μs	96.84%	-3.16%	-
api_overhead_benchmark_sycl SubmitKernel out of order	24.514 μs	23.678000 μs	96.59%	-3.41%	-
api_overhead_benchmark_l0 SubmitKernel in order	11.162000 μs	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	123353.000000 instr	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion	21.568000 μs	-

Relative perf in group memory (4): 86.496%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	255.419 μs	253.805000 μs	99.37%	-0.63%	.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.039 GB/s	3.151000 GB/s	96.45%	-3.55%	-
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.853 μs	5.638000 μs	96.33%	-3.67%	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	219.235 μs	132.929000 μs	60.63%	-39.37%	----------
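
The percentages in this table can be reproduced from the raw values. The sketch below assumes that relative perf is `baseline / this PR` for lower-is-better units (μs), `this PR / baseline` for higher-is-better units (GB/s), and that the group figure is the geometric mean of the per-row ratios. These rules are inferred from the numbers in this report, not from the reporting script, and the direction flag is set by hand per row. The function names are hypothetical, and the row labels are shortened for readability.

```python
import math

# (label, this_pr, baseline, lower_is_better), values copied from the memory group.
MEMORY_ROWS = [
    ("QueueInOrderMemcpy Device to Device", 255.419, 253.805, True),
    ("StreamMemory Triad", 3.039, 3.151, False),
    ("QueueMemcpy Device to Device", 5.853, 5.638, True),
    ("QueueInOrderMemcpy Host to Device", 219.235, 132.929, True),
]


def relative_perf(this_pr, baseline, lower_is_better):
    """Relative performance in percent; above 100 means the PR is better."""
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return 100.0 * ratio


def geomean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    perfs = [relative_perf(pr, base, lower) for _, pr, base, lower in MEMORY_ROWS]
    for (label, *_), perf in zip(MEMORY_ROWS, perfs):
        print(f"{label}: {perf:.2f}%")
    print(f"group geomean: {geomean(perfs):.3f}%")
```

Under these assumptions the per-row values and the group figure agree with the table. The single Host to Device row at about 60.6% is what pulls the group mean down to roughly 86.5%.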

Relative perf in group miscellaneous (1): 106.585%

Benchmark	This PR	baseline	Relative perf	Change	-
miscellaneous_benchmark_sycl VectorSum	805.564000 bw GB/s	858.609 bw GB/s	106.58%	6.58%	++

Relative perf in group multithread (10): 97.484%

Benchmark	This PR	baseline	Relative perf	Change	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	46713.383000 μs	47907.007 μs	102.56%	2.56%	+
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	17003.876000 μs	17316.620 μs	101.84%	1.84%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	6972.475 μs	6935.535000 μs	99.47%	-0.53%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	2059.267 μs	2022.915000 μs	98.23%	-1.77%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	42021.477 μs	40973.625000 μs	97.51%	-2.49%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	1203.256 μs	1157.521000 μs	96.20%	-3.80%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	8924.850 μs	8555.721000 μs	95.86%	-4.14%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	7847.707 μs	7452.758000 μs	94.97%	-5.03%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	114323.514 μs	108338.415000 μs	94.76%	-5.24%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	27217.618 μs	25543.132000 μs	93.85%	-6.15%	--

Relative perf in group Velocity-Bench (9): 99.365%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench QuickSilver	117.900000 MMS/CTT	117.490 MMS/CTT	100.35%	0.35%	.
Velocity-Bench Easywave	229.000000 ms	229.000 ms	100.00%	0.00%	.
Velocity-Bench Bitcracker	35.177 s	35.129800 s	99.87%	-0.13%	.
Velocity-Bench CudaSift	203.399 ms	201.142000 ms	98.89%	-1.11%	.
Velocity-Bench Sobel Filter	610.897 ms	602.045000 ms	98.55%	-1.45%	.
Velocity-Bench Hashtable	357.248 M keys/sec	362.504819 M keys/sec	98.55%	-1.45%	.
Velocity-Bench dl-cifar	-	23.743900 s
Velocity-Bench dl-mnist	-	2.720000 s
Velocity-Bench svm	-	0.139900 s

Relative perf in group Runtime (8): 99.366%

Benchmark	This PR	baseline	Relative perf	Change	-
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	276.096000 ms	278.736 ms	100.96%	0.96%	.
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	276.307000 ms	278.916 ms	100.94%	0.94%	.
Runtime_IndependentDAGTaskThroughput_SingleTask	259.484 ms	259.395000 ms	99.97%	-0.03%	.
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1726.594 ms	1725.256000 ms	99.92%	-0.08%	.
Runtime_DAGTaskThroughput_SingleTask	1682.600 ms	1678.732000 ms	99.77%	-0.23%	.
Runtime_DAGTaskThroughput_NDRangeParallelFor	1705.339 ms	1695.816000 ms	99.44%	-0.56%	.
Runtime_DAGTaskThroughput_BasicParallelFor	1763.156 ms	1746.233000 ms	99.04%	-0.96%	.
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	289.823 ms	275.382000 ms	95.02%	-4.98%	-

Relative perf in group MicroBench (14): 94.368%

Benchmark	This PR	baseline	Relative perf	Change	-
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	617.273000 ms	617.523 ms	100.04%	0.04%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	617.915000 ms	617.994 ms	100.01%	0.01%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	617.953000 ms	617.954 ms	100.00%	0.00%	.
MicroBench_LocalMem_int32_4096	29.867 ms	29.866000 ms	100.00%	-0.00%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	617.297 ms	617.254000 ms	99.99%	-0.01%	.
MicroBench_LocalMem_fp32_4096	29.903 ms	29.833000 ms	99.77%	-0.23%	.
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	5.067 ms	4.781000 ms	94.36%	-5.64%	-
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	4.849 ms	4.547000 ms	93.77%	-6.23%	--
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	4.816 ms	4.414000 ms	91.65%	-8.35%	--
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	4.731 ms	4.322000 ms	91.35%	-8.65%	--
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	5.013 ms	4.574000 ms	91.24%	-8.76%	--
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	4.735 ms	4.317000 ms	91.17%	-8.83%	--
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	5.209 ms	4.702000 ms	90.27%	-9.73%	--
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	5.298 ms	4.238000 ms	79.99%	-20.01%	-----

Relative perf in group Pattern (10): 102.072%

Benchmark	This PR	baseline	Relative perf	Change	-
Pattern_Reduction_Hierarchical_int32	14.720000 ms	16.411 ms	111.49%	11.49%	+++
Pattern_Reduction_NDRange_int32	14.547000 ms	16.163 ms	111.11%	11.11%	+++
Pattern_SegmentedReduction_Hierarchical_int32	11.590000 ms	11.599 ms	100.08%	0.08%	.
Pattern_SegmentedReduction_Hierarchical_fp32	11.594 ms	11.589000 ms	99.96%	-0.04%	.
Pattern_SegmentedReduction_NDRange_int16	2.265 ms	2.264000 ms	99.96%	-0.04%	.
Pattern_SegmentedReduction_Hierarchical_int64	11.785 ms	11.779000 ms	99.95%	-0.05%	.
Pattern_SegmentedReduction_Hierarchical_int16	11.808 ms	11.801000 ms	99.94%	-0.06%	.
Pattern_SegmentedReduction_NDRange_int64	2.338 ms	2.336000 ms	99.91%	-0.09%	.
Pattern_SegmentedReduction_NDRange_fp32	2.168 ms	2.163000 ms	99.77%	-0.23%	.
Pattern_SegmentedReduction_NDRange_int32	2.174 ms	2.164000 ms	99.54%	-0.46%	.

Relative perf in group ScalarProduct (6): 99.931%

Benchmark	This PR	baseline	Relative perf	Change	-
ScalarProduct_Hierarchical_int64	11.458000 ms	11.490 ms	100.28%	0.28%	.
ScalarProduct_Hierarchical_int32	10.516000 ms	10.523 ms	100.07%	0.07%	.
ScalarProduct_Hierarchical_fp32	10.170000 ms	10.170 ms	100.00%	0.00%	.
ScalarProduct_NDRange_fp32	3.761 ms	3.759000 ms	99.95%	-0.05%	.
ScalarProduct_NDRange_int32	3.743 ms	3.733000 ms	99.73%	-0.27%	.
ScalarProduct_NDRange_int64	5.480 ms	5.456000 ms	99.56%	-0.44%	.

Relative perf in group USM (7): 97.014%

Benchmark	This PR	baseline	Relative perf	Change	-
USM_Allocation_latency_fp32_host	37.748000 ms	37.899 ms	100.40%	0.40%	.
USM_Allocation_latency_fp32_shared	0.066000 ms	0.066 ms	100.00%	0.00%	.
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.701 ms	1.661000 ms	97.65%	-2.35%	-
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.873 ms	1.814000 ms	96.85%	-3.15%	-
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	1.093 ms	1.046000 ms	95.70%	-4.30%	-
USM_Allocation_latency_fp32_device	0.072 ms	0.068000 ms	94.44%	-5.56%	-
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.268 ms	1.195000 ms	94.24%	-5.76%	-

Relative perf in group VectorAddition (3): 101.177%

Benchmark	This PR	baseline	Relative perf	Change	-
VectorAddition_int64	3.056000 ms	3.139 ms	102.72%	2.72%	+
VectorAddition_int32	1.440000 ms	1.448 ms	100.56%	0.56%	.
VectorAddition_fp32	1.441000 ms	1.445 ms	100.28%	0.28%	.

Relative perf in group Polybench (3): 100.864%

Benchmark	This PR	baseline	Relative perf	Change	-
Polybench_Atax	6.713000 ms	6.880 ms	102.49%	2.49%	+
Polybench_2mm	1.211000 ms	1.216 ms	100.41%	0.41%	.
Polybench_3mm	1.732 ms	1.727000 ms	99.71%	-0.29%	.

Relative perf in group Kmeans (1): 100.274%

Benchmark	This PR	baseline	Relative perf	Change	-
Kmeans_fp32	16.039000 ms	16.083 ms	100.27%	0.27%	.

Relative perf in group MolecularDynamics (1): 96.552%

Benchmark	This PR	baseline	Relative perf	Change	-
MolecularDynamics	0.029 ms	0.028000 ms	96.55%	-3.45%	-

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	2738.350000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	2018.430000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3019.300000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	297.523000 ns	-

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	723.083000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	191.762000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	270.528000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	216.749000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	1257.310000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	1834.630000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3343.820000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	254.273000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	737.976000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	191.662000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	285.897000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	200.089000 ns	-

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	818.174000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	177.341000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	964.486000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	1045.300000 ns	-

Relative perf in group multiple (24): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	35693.200000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	4319.450000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	140646.000000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	31390.100000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	1167170.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	156232.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	1191460.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	138974.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	43158.900000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	15529.200000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	75604.000000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	27973.700000 ns	-
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	838.869803 token/s
llama.cpp Text Generation Batched 128	-	63.338561 token/s
llama.cpp Prompt Processing Batched 256	-	872.377637 token/s
llama.cpp Text Generation Batched 256	-	63.361520 token/s
llama.cpp Prompt Processing Batched 512	-	434.541716 token/s
llama.cpp Text Generation Batched 512	-	63.295460 token/s

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00402114 s
bitcracker - total time for whole calculation: 35.177 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1274 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1140 1275 30.953% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1273 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1271 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1130 1273 30.6815% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1256 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1125 1265 30.5458% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1256 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1273 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1093 1267 29.6769% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1085 1258 29.4597% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1251 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1094 1267 29.704% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1081 1253 29.3511% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1269 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1270 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1119 1273 30.3828% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1254 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1111 1273 30.1656% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1077 1268 29.2425% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1253 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1272 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1259 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1273 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1268 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1152 1262 31.2788% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1208 1261 32.7993% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1066 1262 28.9438% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1144 1272 31.0616% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1272 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1266 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1276 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1254 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1126 1269 30.5729% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1083 1255 29.4054% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1079 1264 29.2968% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1261 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1165 1267 31.6318% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1268 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1132 1273 30.7358% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1261 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1244 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1094 1267 29.704% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.399 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.32224)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.390690e-01 6.297750e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.738790e-01 7.622130e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.439430e-01 7.783380e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.669560e-01 8.332190e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.386240e-01 7.892990e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.421230e-01 7.638260e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.430190e-01 7.623900e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.405080e-01 7.822750e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.421380e-01 7.819560e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.405340e-01 7.577380e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.121e+07 1.121e+07 1.121e+07 0.000e+00 100.00
cycleInit 10 3.571e+06 3.571e+06 3.571e+06 0.000e+00 100.00
cycleTracking 10 7.641e+06 7.641e+06 7.641e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.916e+06 4.916e+06 4.916e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.013e+05 2.013e+05 2.013e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.010e+02 4.010e+02 4.010e+02 0.000e+00 100.00
Figure Of Merit 117.90 [Num Mega Segments / Cycle Tracking Time]
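
The Figure Of Merit line can be cross-checked against the cycle table above. The sketch below assumes, based only on the bracketed label in the output, that the figure is the total of the num_seg column (in millions) divided by the summed cycleTracking time in seconds. The function name and the hand-copied lists are illustrative and are not part of Quicksilver.

```python
# Values copied by hand from the Quicksilver cycle table above:
# the num_seg column and the per-cycle cycleTracking column (seconds).
num_seg = [55527935, 94633679, 95010375, 94953591, 94599337,
           94148236, 93689264, 93216931, 92768273, 92324678]
cycle_tracking_s = [0.629775, 0.762213, 0.778338, 0.833219, 0.789299,
                    0.763826, 0.762390, 0.782275, 0.781956, 0.757738]


def figure_of_merit(segments, tracking_seconds):
    """Mega segments per second of cycle tracking time (assumed definition)."""
    mega_segments = sum(segments) / 1e6
    return mega_segments / sum(tracking_seconds)


fom = figure_of_merit(num_seg, cycle_tracking_s)
print(f"{fom:.2f} [Num Mega Segments / Cycle Tracking Time]")
```

Under that assumption the result agrees with the reported 117.90 to two decimal places, and the summed cycleTracking column (about 7.641 s) matches the 7.641e+06 microSecs in the timer table.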

Velocity-Bench Sobel Filter

Environment Variables:

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2671.2,1996.49,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,728.916,728.863,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1257.31,1135.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,737.976,737.937,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,836.689,760.399,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.341,177.074,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2018.43,2017.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.389,194.382,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.32,1877.71,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,191.662,191.657,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3019.3,2972.1,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,270.528,270.475,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3343.82,3301.6,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,282.153,282.147,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,293.94,291.856,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,216.749,216.746,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,254.273,248.566,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,192.063,192.058,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1079.66,1051.36,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,1045.3,1045.25,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,36899,34664.8,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4319.45,4319.35,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,142032,89595.3,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,31390.1,31389.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.16717e+06,1.16672e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,155031,155027,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19146e+06,1.19112e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,138974,138970,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,43158.9,42398.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14773.8,14773.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,75604,75115.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,27973.7,27973.1,ns,,,,,
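
The CSV above can be read with Python's standard csv module. The sketch below is a minimal, hypothetical example: it embeds two rows copied from this output, keys them by the text before the first "/" in the name column, and compares real_time. It is not how the benchmark scripts process the file, and a single run is not enough to draw conclusions about the allocators.

```python
import csv
import io

# Header plus two rows copied verbatim from the umf-benchmark output above.
sample = '''name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2671.2,1996.49,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,293.94,291.856,ns,,,,,
'''

rows = list(csv.DictReader(io.StringIO(sample)))

# Map the allocator name (text before the first "/") to real_time in ns.
times = {row["name"].split("/")[0]: float(row["real_time"]) for row in rows}

ratio = times["glibc"] / times["scalable_pool<os_provider>"]
print(f"glibc / scalable_pool real_time ratio: {ratio:.2f}")
```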

github-actions · 2025-01-17T09:43:30Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826547948

github-actions · 2025-01-17T09:47:38Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826547948
Job status: failure. Test status: failure.

github-actions · 2025-01-17T09:53:46Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826701422

github-actions · 2025-01-17T09:57:51Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826701422
Job status: failure. Test status: failure.

github-actions · 2025-01-17T10:10:21Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826967462

github-actions · 2025-01-17T10:19:27Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12826967462
Job status: failure. Test status: failure.

github-actions · 2025-01-17T10:22:07Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827163375

github-actions · 2025-01-17T10:34:22Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827163375
Job status: success. Test status: success.

Summary

Total 7 benchmarks in mean.
Geomean 99.840%.
Improved 1 Regressed 1 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group Velocity-Bench (9): 99.840%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench svm	0.137000 s	0.140 s	102.12%	2.12%	+++++++++
Velocity-Bench QuickSilver	118.840000 MMS/CTT	117.490 MMS/CTT	101.15%	1.15%	.
Velocity-Bench Bitcracker	35.029000 s	35.130 s	100.29%	0.29%	.
Velocity-Bench Easywave	230.000 ms	229.000000 ms	99.57%	-0.43%	.
Velocity-Bench CudaSift	202.856 ms	201.142000 ms	99.16%	-0.84%	.
Velocity-Bench Hashtable	358.943 M keys/sec	362.504819 M keys/sec	99.02%	-0.98%	.
Velocity-Bench Sobel Filter	616.496 ms	602.045000 ms	97.66%	-2.34%	----------
Velocity-Bench dl-cifar	-	23.743900 s
Velocity-Bench dl-mnist	-	2.720000 s
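
The summary numbers can be reproduced from the table above. The following is a sketch, not the reporting script itself: it assumes relative perf is baseline / PR for time-based results and PR / baseline for throughput results (MMS/CTT, M keys/sec), that the geomean is taken over the seven "Relative perf" values, and that a change beyond ±2.00% counts as improved or regressed. All function and variable names are hypothetical.

```python
import math


def relative_perf(pr, baseline, lower_is_better=True):
    """Relative performance in percent; above 100 means the PR is better."""
    ratio = baseline / pr if lower_is_better else pr / baseline
    return 100.0 * ratio


def geomean(percentages):
    """Geometric mean of a list of positive percentages."""
    return math.exp(sum(math.log(p) for p in percentages) / len(percentages))


# Two rows recomputed from the raw values in the table.
sobel = relative_perf(616.496, 602.045)                                # time in ms
quicksilver = relative_perf(118.84, 117.49, lower_is_better=False)     # MMS/CTT

# "Relative perf" column as printed for the seven comparable benchmarks.
rel = {
    "svm": 102.12, "QuickSilver": 101.15, "Bitcracker": 100.29,
    "Easywave": 99.57, "CudaSift": 99.16, "Hashtable": 99.02,
    "Sobel Filter": 97.66,
}

threshold = 2.0  # percent, as stated in the summary
improved = [name for name, p in rel.items() if p - 100.0 > threshold]
regressed = [name for name, p in rel.items() if 100.0 - p > threshold]

print(f"Sobel Filter {sobel:.2f}%, QuickSilver {quicksilver:.2f}%")
print(f"Geomean {geomean(list(rel.values())):.2f}%")
print(f"Improved {len(improved)} Regressed {len(regressed)}")
```

With these assumptions the geomean comes out at about 99.84%, with one improved benchmark (svm) and one regressed (Sobel Filter), matching the summary. The svm row does not recompute exactly from the printed 0.137 s and 0.140 s, presumably because those values are rounded for display. dl-cifar and dl-mnist have no result for this PR, so they are left out of the mean.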

Relative perf in group api (9): cannot calculate

Benchmark	This PR	baseline
api_overhead_benchmark_l0 SubmitKernel out of order	-	11.528000 μs
api_overhead_benchmark_sycl SubmitKernel out of order	-	23.678000 μs
api_overhead_benchmark_sycl SubmitKernel in order	-	24.844000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.118000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.675000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count	-	101923.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order	-	15.896000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count	-	107041.000000 instr
api_overhead_benchmark_ur SubmitKernel in order	-	16.663000 μs

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	baseline
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	-	253.805000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	-	132.929000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	-	5.638000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	-	3.151000 GB/s

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	858.609000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6935.535000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17316.620000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47907.007000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2022.915000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7452.758000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	8555.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	25543.132000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1157.521000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	40973.625000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	108338.415000 μs

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	259.395000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	275.382000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	278.916000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	278.736000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1678.732000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1746.233000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1725.256000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1695.816000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	4.238000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.317000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.322000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.414000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	617.994000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	617.954000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.547000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	4.781000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	4.574000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	4.702000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.523000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.254000 ms
MicroBench_LocalMem_int32_4096	-	29.866000 ms
MicroBench_LocalMem_fp32_4096	-	29.833000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.163000 ms
Pattern_Reduction_Hierarchical_int32	-	16.411000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.264000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.336000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.599000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.779000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.589000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.733000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.759000 ms
ScalarProduct_Hierarchical_int32	-	10.523000 ms
ScalarProduct_Hierarchical_int64	-	11.490000 ms
ScalarProduct_Hierarchical_fp32	-	10.170000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.899000 ms
USM_Allocation_latency_fp32_shared	-	0.066000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.661000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.046000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.814000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.195000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.448000 ms
VectorAddition_int64	-	3.139000 ms
VectorAddition_fp32	-	1.445000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.216000 ms
Polybench_3mm	-	1.727000 ms
Polybench_Atax	-	6.880000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.083000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.028000 ms

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	838.869803 token/s
llama.cpp Text Generation Batched 128	-	63.338561 token/s
llama.cpp Prompt Processing Batched 256	-	872.377637 token/s
llama.cpp Text Generation Batched 256	-	63.361520 token/s
llama.cpp Prompt Processing Batched 512	-	434.541716 token/s
llama.cpp Text Generation Batched 512	-	63.295460 token/s

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

Details

Benchmark details - environment, command, output...

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.373925 s
358.942858 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00411444 s
bitcracker - total time for whole calculation: 35.029 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1263 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1256 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1248 1284 33.8854% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1267 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1267 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1259 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1162 1268 31.5504% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1268 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1096 1266 29.7583% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1249 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1252 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1273 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1271 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1252 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1151 1260 31.2517% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1259 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1081 1270 29.3511% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1080 1253 29.3239% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1120 1260 30.41% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1258 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1259 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1269 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1257 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1270 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1268 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1256 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1276 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1204 1271 32.6907% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1273 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1067 1272 28.9709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1252 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1066 1264 28.9438% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1094 1271 29.704% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1164 1259 31.6047% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1260 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1272 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1154 1270 31.3332% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1260 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1256 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1054 1270 28.618% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1097 1268 29.7855% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1073 1267 29.1339% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 202.856 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.732040e-01 6.035540e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.386820e-01 7.433280e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.354010e-01 7.606540e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.671940e-01 8.153080e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.340240e-01 7.877320e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.326670e-01 7.637150e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.337340e-01 7.621170e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.335410e-01 7.835010e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.354200e-01 7.854990e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.340620e-01 7.749550e-01 0.000000e+00

Timer name	Number of calls	Min (microSecs)	Avg (microSecs)	Max (microSecs)	Stddev (microSecs)	Efficiency rating
main	1	1.100e+07	1.100e+07	1.100e+07	0.000e+00	100.00
cycleInit	10	3.418e+06	3.418e+06	3.418e+06	0.000e+00	100.00
cycleTracking	10	7.580e+06	7.580e+06	7.580e+06	0.000e+00	100.00
cycleTracking_Kernel	104	4.917e+06	4.917e+06	4.917e+06	0.000e+00	100.00
cycleTracking_MPI	117	1.934e+05	1.934e+05	1.934e+05	0.000e+00	100.00
cycleTracking_Test_Done	0	0.000e+00	0.000e+00	0.000e+00	0.000e+00	0.00
cycleFinalize	20	4.000e+02	4.000e+02	4.000e+02	0.000e+00	100.00
Figure Of Merit 118.84 [Num Mega Segments / Cycle Tracking Time]
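
The Figure Of Merit above can be cross-checked against the per-cycle table. The following is a minimal sketch, not part of the QuickSilver output. It assumes the `cycleTracking` column is in seconds, which is consistent with the 7.580e+06 microSecs cumulative `cycleTracking` timer. The variable names are illustrative only.

```python
# num_seg column (13th field) of the ten per-cycle rows above.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# cycleTracking column (16th field) of the same rows; assumed to be seconds.
cycle_tracking_s = [
    6.035540e-01, 7.433280e-01, 7.606540e-01, 8.153080e-01, 7.877320e-01,
    7.637150e-01, 7.621170e-01, 7.835010e-01, 7.854990e-01, 7.749550e-01,
]

# "Num Mega Segments / Cycle Tracking Time", as named in the log line.
mega_segments = sum(num_seg) / 1e6
tracking_time_s = sum(cycle_tracking_s)
figure_of_merit = mega_segments / tracking_time_s

print(f"Figure Of Merit: {figure_of_merit:.2f}")
```

With these inputs the ratio comes out at about 118.84, matching the reported value.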

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.54385 s
sobelfilter - total time for whole calculation: 0.616496 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Output:

Number of args 3
Using cuSVM (Carpenter)...

Buffering input text file (6989624 B).
Load Done
Starting Training
_C 1.000000
Workgroup Size: 1024
nbrCtas 80
elemsPerCta 1248
threadsPerCta 128
Total run time: 0.065121 seconds
Iter:100
M:97683
N:123
Train done. Calulate Vector counts
Training done

Loading elapsed time : 0.0646 s
Processing elapsed time : 0.0703 s
Storing elapsed time : 0.0021 s
Total elapsed time : 0.1370 s
Result's are correct: 0.0551

github-actions · 2025-01-17T11:02:05Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827777489

github-actions · 2025-01-17T11:07:37Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827777489
Job status: success. Test status: success.

github-actions · 2025-01-17T11:09:27Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827897391

github-actions · 2025-01-17T11:15:54Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12827897391
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (9): cannot calculate

Benchmark	This PR	baseline
api_overhead_benchmark_l0 SubmitKernel out of order	-	11.528000 μs
api_overhead_benchmark_sycl SubmitKernel out of order	-	23.678000 μs
api_overhead_benchmark_sycl SubmitKernel in order	-	24.844000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.118000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.675000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count	-	101923.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order	-	15.896000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count	-	107041.000000 instr
api_overhead_benchmark_ur SubmitKernel in order	-	16.663000 μs

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	baseline
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	-	253.805000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	-	132.929000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	-	5.638000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	-	3.151000 GB/s

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	858.609000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6935.535000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17316.620000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47907.007000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2022.915000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7452.758000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	8555.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	25543.132000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1157.521000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	40973.625000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	108338.415000 μs

Relative perf in group Velocity-Bench (9): cannot calculate

Benchmark	This PR	baseline
Velocity-Bench Hashtable	-	362.504819 M keys/sec
Velocity-Bench Bitcracker	-	35.129800 s
Velocity-Bench CudaSift	-	201.142000 ms
Velocity-Bench Easywave	-	229.000000 ms
Velocity-Bench QuickSilver	-	117.490000 MMS/CTT
Velocity-Bench Sobel Filter	-	602.045000 ms
Velocity-Bench dl-cifar	-	23.743900 s
Velocity-Bench dl-mnist	-	2.720000 s
Velocity-Bench svm	-	0.139900 s

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	259.395000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	275.382000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	278.916000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	278.736000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1678.732000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1746.233000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1725.256000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1695.816000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	4.238000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.317000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.322000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.414000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	617.994000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	617.954000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.547000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	4.781000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	4.574000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	4.702000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.523000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.254000 ms
MicroBench_LocalMem_int32_4096	-	29.866000 ms
MicroBench_LocalMem_fp32_4096	-	29.833000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.163000 ms
Pattern_Reduction_Hierarchical_int32	-	16.411000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.264000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.336000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.599000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.779000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.589000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.733000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.759000 ms
ScalarProduct_Hierarchical_int32	-	10.523000 ms
ScalarProduct_Hierarchical_int64	-	11.490000 ms
ScalarProduct_Hierarchical_fp32	-	10.170000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.899000 ms
USM_Allocation_latency_fp32_shared	-	0.066000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.661000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.046000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.814000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.195000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.448000 ms
VectorAddition_int64	-	3.139000 ms
VectorAddition_fp32	-	1.445000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.216000 ms
Polybench_3mm	-	1.727000 ms
Polybench_Atax	-	6.880000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.083000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.028000 ms

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	838.869803 token/s
llama.cpp Text Generation Batched 128	-	63.338561 token/s
llama.cpp Prompt Processing Batched 256	-	872.377637 token/s
llama.cpp Text Generation Batched 256	-	63.361520 token/s
llama.cpp Prompt Processing Batched 512	-	434.541716 token/s
llama.cpp Text Generation Batched 512	-	63.295460 token/s

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

github-actions · 2025-01-17T14:22:09Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12830785458

github-actions · 2025-01-17T14:34:12Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12831096136

github-actions · 2025-01-17T14:52:36Z

Compute Benchmarks level_zero run (--filter "Velocity|llama"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12831096136
Job status: success. Test status: success.

Summary

Total 15 benchmarks in mean.
Geomean 98.711%.
Improved 1 Regressed 2 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group Velocity-Bench (9): 98.502%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench dl-mnist	2.380000 s	2.720 s	114.29%	14.29%	+++++++
Velocity-Bench Bitcracker	35.024000 s	35.130 s	100.30%	0.30%	.
Velocity-Bench QuickSilver	117.820000 MMS/CTT	117.490 MMS/CTT	100.28%	0.28%	.
Velocity-Bench dl-cifar	23.768 s	23.743900 s	99.90%	-0.10%	.
Velocity-Bench Sobel Filter	603.153 ms	602.045000 ms	99.82%	-0.18%	.
Velocity-Bench svm	0.140 s	0.139900 s	99.79%	-0.21%	.
Velocity-Bench Hashtable	357.004 M keys/sec	362.504819 M keys/sec	98.48%	-1.52%	.
Velocity-Bench CudaSift	204.243 ms	201.142000 ms	98.48%	-1.52%	.
Velocity-Bench Easywave	291.000 ms	229.000000 ms	78.69%	-21.31%	----------

Relative perf in group llama.cpp (6): 99.024%

Benchmark	This PR	baseline	Relative perf	Change	-
llama.cpp Prompt Processing Batched 256	876.139255 token/s	872.378 token/s	100.43%	0.43%	.
llama.cpp Prompt Processing Batched 512	433.964 token/s	434.541716 token/s	99.87%	-0.13%	.
llama.cpp Text Generation Batched 512	62.489 token/s	63.295460 token/s	98.73%	-1.27%	.
llama.cpp Text Generation Batched 128	62.505 token/s	63.338561 token/s	98.68%	-1.32%	.
llama.cpp Text Generation Batched 256	62.508 token/s	63.361520 token/s	98.65%	-1.35%	.
llama.cpp Prompt Processing Batched 128	820.469 token/s	838.869803 token/s	97.81%	-2.19%	-
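
The summary figures for this run can be approximately reproduced from the "Relative perf" columns of the two tables above. This is a sketch under stated assumptions, not the code of the benchmark scripts: it assumes the reported geomean is a plain geometric mean over all 15 relative-perf values, and that a benchmark counts as improved or regressed when its change exceeds the 2.00% threshold in either direction. Because the inputs are the rounded percentages shown in the tables, the last digit may differ from the bot's figures.

```python
import math

THRESHOLD = 2.0  # percent, from the summary line "(threshold 2.00%)"

# "Relative perf" percentages copied from the tables above.
velocity_bench = [114.29, 100.30, 100.28, 99.90, 99.82, 99.79, 98.48, 98.48, 78.69]
llama_cpp = [100.43, 99.87, 98.73, 98.68, 98.65, 97.81]


def geomean(percentages):
    """Geometric mean of percentages, computed in log space."""
    return math.exp(sum(math.log(p) for p in percentages) / len(percentages))


all_results = velocity_bench + llama_cpp

# A change is the distance from 100%; only changes beyond the threshold count.
improved = sum(1 for p in all_results if p - 100.0 > THRESHOLD)
regressed = sum(1 for p in all_results if 100.0 - p > THRESHOLD)

print(f"Velocity-Bench geomean: {geomean(velocity_bench):.2f}%")
print(f"llama.cpp geomean:      {geomean(llama_cpp):.2f}%")
print(f"Overall geomean:        {geomean(all_results):.2f}%")
print(f"Improved {improved} Regressed {regressed}")
```

Under these assumptions the counts agree with "Improved 1 Regressed 2" (dl-mnist improved; Easywave and llama.cpp Prompt Processing Batched 128 regressed), and the geomeans land within rounding distance of the reported 98.502%, 99.024% and 98.711%. Using a geometric rather than arithmetic mean keeps a single large outlier such as Easywave from dominating the overall ratio.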

Relative perf in group api (9): cannot calculate

Benchmark	This PR	baseline
api_overhead_benchmark_l0 SubmitKernel out of order	-	11.528000 μs
api_overhead_benchmark_sycl SubmitKernel out of order	-	23.678000 μs
api_overhead_benchmark_sycl SubmitKernel in order	-	24.844000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.118000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.675000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count	-	101923.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order	-	15.896000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count	-	107041.000000 instr
api_overhead_benchmark_ur SubmitKernel in order	-	16.663000 μs

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	baseline
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	-	253.805000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	-	132.929000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	-	5.638000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	-	3.151000 GB/s

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	858.609000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6935.535000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17316.620000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47907.007000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2022.915000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7452.758000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	8555.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	25543.132000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1157.521000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	40973.625000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	108338.415000 μs

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	259.395000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	275.382000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	278.916000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	278.736000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1678.732000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1746.233000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1725.256000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1695.816000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	4.238000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.317000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.322000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.414000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	617.994000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	617.954000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.547000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	4.781000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	4.574000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	4.702000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.523000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.254000 ms
MicroBench_LocalMem_int32_4096	-	29.866000 ms
MicroBench_LocalMem_fp32_4096	-	29.833000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.163000 ms
Pattern_Reduction_Hierarchical_int32	-	16.411000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.264000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.336000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.599000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.779000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.589000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.733000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.759000 ms
ScalarProduct_Hierarchical_int32	-	10.523000 ms
ScalarProduct_Hierarchical_int64	-	11.490000 ms
ScalarProduct_Hierarchical_fp32	-	10.170000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.899000 ms
USM_Allocation_latency_fp32_shared	-	0.066000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.661000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.046000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.814000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.195000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.448000 ms
VectorAddition_int64	-	3.139000 ms
VectorAddition_fp32	-	1.445000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.216000 ms
Polybench_3mm	-	1.727000 ms
Polybench_Atax	-	6.880000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.083000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.028000 ms

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

Details

Benchmark details - environment, command, output...

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.375956 s
357.003659 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00378219 s
bitcracker - total time for whole calculation: 35.024 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1264 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1207 1256 32.7722% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1269 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1058 1271 28.7266% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1264 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1259 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1077 1272 29.2425% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1257 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1141 1250 30.9802% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1269 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1264 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1276 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1148 1258 31.1702% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1265 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1105 1256 30.0027% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1083 1253 29.4054% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1279 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1151 1262 31.2517% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1277 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1159 1265 31.4689% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1267 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1260 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1254 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1253 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1153 1261 31.306% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1076 1261 29.2153% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1033 1261 28.0478% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1127 1285 30.6001% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1180 1266 32.0391% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1265 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1255 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1267 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1120 1271 30.41% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1269 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1097 1265 29.7855% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1260 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1258 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1100 1257 29.867% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1054 1264 28.618% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1113 1277 30.2199% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1252 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1245 1277 33.804% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1271 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1152 1275 31.2788% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1259 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1269 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 204.243 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.683340e-01 6.247260e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.335790e-01 7.658220e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.327160e-01 7.809610e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.647570e-01 8.337120e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.289860e-01 7.894370e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.308450e-01 7.648480e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.285020e-01 7.632040e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.289090e-01 7.833600e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.299260e-01 7.820970e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.297780e-01 7.583050e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.102e+07 1.102e+07 1.102e+07 0.000e+00 100.00
cycleInit 10 3.376e+06 3.376e+06 3.376e+06 0.000e+00 100.00
cycleTracking 10 7.646e+06 7.646e+06 7.646e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.923e+06 4.923e+06 4.923e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.983e+05 1.983e+05 1.983e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 3.940e+02 3.940e+02 3.940e+02 0.000e+00 100.00
Figure Of Merit 117.82 [Num Mega Segments / Cycle Tracking Time]
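
The Figure Of Merit reported above can be reproduced from the per-cycle table: summing the `num_seg` column and dividing by the summed `cycleTracking` time gives the same number. A minimal Python sketch follows, with both columns copied from the output above. The variable names are illustrative and not part of QuickSilver, and the unit of the `cycleTracking` column is an assumption inferred from the timer table.

```python
# num_seg column copied from the per-cycle QuickSilver table above.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# cycleTracking column copied from the same table.
# Assumption: the values are in seconds, because their sum matches the
# 7.646e+06 microSecs reported for cycleTracking in the timer table.
cycle_tracking_s = [
    0.624726, 0.765822, 0.780961, 0.833712, 0.789437,
    0.764848, 0.763204, 0.783360, 0.782097, 0.758305,
]

mega_segments = sum(num_seg) / 1e6        # "Num Mega Segments"
tracking_time_s = sum(cycle_tracking_s)   # "Cycle Tracking Time"
figure_of_merit = mega_segments / tracking_time_s

print(f"segments: {mega_segments:.2f} M, tracking time: {tracking_time_s:.3f} s")
print(f"figure of merit: {figure_of_merit:.2f}")
```

The result agrees with the `117.82` printed by the benchmark to two decimal places.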

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.47787 s
sobelfilter - total time for whole calculation: 0.603153 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.3e-05 s
dl-cifar - total time for whole calculation: 23.7676 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.38 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-17T14:47:46Z","629702900","27468397","814.276980","34.257272"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-17T14:47:51Z","2044416365","2766885","62.609647","0.084592"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-17T14:48:01Z","576340317","1953063","888.372175","3.009091"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-17T14:48:05Z","2046070008","1956101","62.559000","0.059740"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-17T14:48:15Z","1171789578","5961410","436.947551","2.219789"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-17T14:48:22Z","2048353129","1381979","62.489248","0.042126"

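The CSV above is easier to compare once it is reduced to throughput per batch size. A minimal Python sketch follows, using only the `n_batch`, `n_prompt`, `n_gen`, `avg_ts` and `stddev_ts` columns with values copied from the rows above. It assumes that `avg_ts` is tokens per second (512 prompt tokens over an `avg_ns` of about 0.63 s gives roughly 813, close to the reported 814.28); the dictionary names are illustrative.

```python
import csv
import io

# Columns trimmed from the llama-bench CSV output above; values copied verbatim.
CSV_TEXT = """n_batch,n_prompt,n_gen,avg_ts,stddev_ts
128,512,0,814.276980,34.257272
128,0,128,62.609647,0.084592
256,512,0,888.372175,3.009091
256,0,128,62.559000,0.059740
512,512,0,436.947551,2.219789
512,0,128,62.489248,0.042126
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Rows with n_prompt > 0 measure prompt processing,
# rows with n_gen > 0 measure token generation.
prompt = {int(r["n_batch"]): float(r["avg_ts"]) for r in rows if int(r["n_prompt"]) > 0}
generation = {int(r["n_batch"]): float(r["avg_ts"]) for r in rows if int(r["n_gen"]) > 0}

for n_batch in sorted(prompt):
    print(
        f"n_batch={n_batch}: prompt {prompt[n_batch]:.1f} t/s, "
        f"generation {generation[n_batch]:.1f} t/s"
    )

best_prompt_batch = max(prompt, key=prompt.get)
print("fastest prompt processing at n_batch =", best_prompt_batch)
```

In this run generation throughput is nearly independent of `n_batch`, while prompt processing at `n_batch=512` is about half of what it is at `n_batch=256`.
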
github-actions · 2025-01-17T14:54:26Z

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/12831425140

github-actions · 2025-01-17T15:27:04Z

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/12831425140
Job status: success. Test status: success.

Summary

Total 92 benchmarks in mean.
Geomean 97.944%.
Improved 3 Regressed 30 (threshold 2.00%)

(in each row, the better result is the one printed with six decimal places)

Performance change in benchmark groups

Relative perf in group api (12): 97.031%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_sycl SubmitKernel in order	24.437000 μs	24.844 μs	101.67%	1.67%	.
api_overhead_benchmark_sycl SubmitKernel out of order	23.425000 μs	23.678 μs	101.08%	1.08%	.
api_overhead_benchmark_ur SubmitKernel in order	16.573000 μs	16.663 μs	100.54%	0.54%	.
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.144 μs	2.118000 μs	98.79%	-1.21%	.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.703 μs	1.675000 μs	98.36%	-1.64%	.
api_overhead_benchmark_l0 SubmitKernel out of order	11.868 μs	11.528000 μs	97.14%	-2.86%	-
api_overhead_benchmark_ur SubmitKernel out of order CPU count	105463.000 instr	101923.000000 instr	96.64%	-3.36%	-
api_overhead_benchmark_ur SubmitKernel in order CPU count	110815.000 instr	107041.000000 instr	96.59%	-3.41%	-
api_overhead_benchmark_ur SubmitKernel out of order	18.979 μs	15.896000 μs	83.76%	-16.24%	----
api_overhead_benchmark_l0 SubmitKernel in order	11.709000 μs	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	123991.000000 instr	-
api_overhead_benchmark_ur SubmitKernel in order with measure completion	21.478000 μs	-
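
The group figure can be checked against the rows above. The sketch below assumes that relative perf for these lower-is-better metrics is `baseline / this PR`, and that the group value is the geometric mean over the rows that have both values; both assumptions are inferred from the numbers in this table, not taken from the benchmark scripts. The function names are illustrative.

```python
import math

# (this_pr, baseline) pairs copied from the api table above. Rows without a
# baseline are omitted. All metrics here are times or instruction counts,
# so lower is assumed to be better.
api_rows = {
    "sycl SubmitKernel in order": (24.437, 24.844),
    "sycl SubmitKernel out of order": (23.425, 23.678),
    "ur SubmitKernel in order": (16.573, 16.663),
    "sycl ExecImmediateCopyQueue out of order D2D": (2.144, 2.118),
    "sycl ExecImmediateCopyQueue in order D2H": (1.703, 1.675),
    "l0 SubmitKernel out of order": (11.868, 11.528),
    "ur SubmitKernel out of order CPU count": (105463.0, 101923.0),
    "ur SubmitKernel in order CPU count": (110815.0, 107041.0),
    "ur SubmitKernel out of order": (18.979, 15.896),
}


def relative_perf(this_pr: float, baseline: float) -> float:
    """Relative performance in percent for a lower-is-better metric."""
    return baseline / this_pr * 100.0


def geomean(values: list[float]) -> float:
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


rel = {name: relative_perf(*pair) for name, pair in api_rows.items()}
group_geomean = geomean(list(rel.values()))

for name, value in rel.items():
    print(f"{name}: {value:.2f}%")
print(f"group geomean: {group_geomean:.3f}%")
```

Under these assumptions the nine comparable rows give a geomean that matches the reported 97.031% to within display rounding; the three rows without a baseline do not contribute.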

Relative perf in group memory (4): 86.077%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	258.349 μs	253.805000 μs	98.24%	-1.76%	.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.059 GB/s	3.151000 GB/s	97.08%	-2.92%	-
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.859 μs	5.638000 μs	96.23%	-3.77%	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	222.228 μs	132.929000 μs	59.82%	-40.18%	----------

Relative perf in group miscellaneous (1): 100.034%

Benchmark	This PR	baseline	Relative perf	Change	-
miscellaneous_benchmark_sycl VectorSum	858.316000 bw GB/s	858.609 bw GB/s	100.03%	0.03%	.

Relative perf in group multithread (10): 97.418%

Benchmark	This PR	baseline	Relative perf	Change	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	46984.823000 μs	47907.007 μs	101.96%	1.96%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	17286.617000 μs	17316.620 μs	100.17%	0.17%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	6927.714000 μs	6935.535 μs	100.11%	0.11%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	2053.062 μs	2022.915000 μs	98.53%	-1.47%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	8846.523 μs	8555.721000 μs	96.71%	-3.29%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	42548.498 μs	40973.625000 μs	96.30%	-3.70%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	7775.891 μs	7452.758000 μs	95.84%	-4.16%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	1211.537 μs	1157.521000 μs	95.54%	-4.46%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	113757.337 μs	108338.415000 μs	95.24%	-4.76%	-
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	27153.283 μs	25543.132000 μs	94.07%	-5.93%	-

Relative perf in group Velocity-Bench (9): 98.289%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench dl-mnist	2.380000 s	2.720 s	114.29%	14.29%	++++
Velocity-Bench QuickSilver	118.360000 MMS/CTT	117.490 MMS/CTT	100.74%	0.74%	.
Velocity-Bench svm	0.139700 s	0.140 s	100.14%	0.14%	.
Velocity-Bench Bitcracker	35.185 s	35.129800 s	99.84%	-0.16%	.
Velocity-Bench dl-cifar	23.848 s	23.743900 s	99.56%	-0.44%	.
Velocity-Bench CudaSift	203.736 ms	201.142000 ms	98.73%	-1.27%	.
Velocity-Bench Hashtable	355.823 M keys/sec	362.504819 M keys/sec	98.16%	-1.84%	.
Velocity-Bench Sobel Filter	618.891 ms	602.045000 ms	97.28%	-2.72%	-
Velocity-Bench Easywave	289.000 ms	229.000000 ms	79.24%	-20.76%	-----
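
This group mixes time units with throughput units, so the direction of "better" differs per row. The sketch below recomputes the Change column and applies the 2.00% threshold from the summary. It assumes that `s` and `ms` are lower-is-better, that `MMS/CTT` and `M keys/sec` are higher-is-better, and that a row counts as improved or regressed only when the change is strictly beyond the threshold; these rules are inferred from the table, not taken from the benchmark scripts.

```python
THRESHOLD_PCT = 2.0  # threshold quoted in the summary above

# (this_pr, baseline, higher_is_better) copied from the Velocity-Bench table.
rows = {
    "dl-mnist": (2.380, 2.720, False),
    "QuickSilver": (118.360, 117.490, True),
    "svm": (0.1397, 0.140, False),
    "Bitcracker": (35.185, 35.1298, False),
    "dl-cifar": (23.848, 23.7439, False),
    "CudaSift": (203.736, 201.142, False),
    "Hashtable": (355.823, 362.504819, True),
    "Sobel Filter": (618.891, 602.045, False),
    "Easywave": (289.0, 229.0, False),
}


def change_pct(this_pr: float, baseline: float, higher_is_better: bool) -> float:
    """Signed change in percent; positive means this PR is better."""
    ratio = this_pr / baseline if higher_is_better else baseline / this_pr
    return (ratio - 1.0) * 100.0


changes = {name: change_pct(*row) for name, row in rows.items()}
improved = [n for n, c in changes.items() if c > THRESHOLD_PCT]
regressed = [n for n, c in changes.items() if c < -THRESHOLD_PCT]

for name, c in changes.items():
    print(f"{name}: {c:+.2f}%")
print("improved:", improved)
print("regressed:", regressed)
```

Because the inputs are the rounded values shown in the table, a recomputed change can differ slightly from the Change column (the svm row is one such case), but the classification against the threshold is unaffected here.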

Relative perf in group Runtime (8): 100.672%

Benchmark	This PR	baseline	Relative perf	Change	-
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	273.737000 ms	278.916 ms	101.89%	1.89%	.
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	273.604000 ms	278.736 ms	101.88%	1.88%	.
Runtime_IndependentDAGTaskThroughput_SingleTask	255.713000 ms	259.395 ms	101.44%	1.44%	.
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	273.291000 ms	275.382 ms	100.77%	0.77%	.
Runtime_DAGTaskThroughput_SingleTask	1679.346 ms	1678.732000 ms	99.96%	-0.04%	.
Runtime_DAGTaskThroughput_NDRangeParallelFor	1696.629 ms	1695.816000 ms	99.95%	-0.05%	.
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1726.628 ms	1725.256000 ms	99.92%	-0.08%	.
Runtime_DAGTaskThroughput_BasicParallelFor	1753.295 ms	1746.233000 ms	99.60%	-0.40%	.

Relative perf in group MicroBench (14): 95.453%

Benchmark	This PR	baseline	Relative perf	Change	-
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	617.446000 ms	617.523 ms	100.01%	0.01%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	618.095 ms	617.994000 ms	99.98%	-0.02%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	618.113 ms	617.954000 ms	99.97%	-0.03%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	617.470 ms	617.254000 ms	99.97%	-0.03%	.
MicroBench_LocalMem_int32_4096	29.887 ms	29.866000 ms	99.93%	-0.07%	.
MicroBench_LocalMem_fp32_4096	29.897 ms	29.833000 ms	99.79%	-0.21%	.
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	4.890 ms	4.702000 ms	96.16%	-3.84%	-
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	4.746 ms	4.414000 ms	93.00%	-7.00%	--
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	4.650 ms	4.317000 ms	92.84%	-7.16%	--
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	4.662 ms	4.322000 ms	92.71%	-7.29%	--
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	4.583 ms	4.238000 ms	92.47%	-7.53%	--
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	4.923 ms	4.547000 ms	92.36%	-7.64%	--
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	5.241 ms	4.781000 ms	91.22%	-8.78%	--
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	5.244 ms	4.574000 ms	87.22%	-12.78%	---

Relative perf in group Pattern (10): 100.237%

Benchmark	This PR	baseline	Relative perf	Change	-
Pattern_Reduction_Hierarchical_int32	16.187000 ms	16.411 ms	101.38%	1.38%	.
Pattern_Reduction_NDRange_int32	15.977000 ms	16.163 ms	101.16%	1.16%	.
Pattern_SegmentedReduction_Hierarchical_int32	11.589000 ms	11.599 ms	100.09%	0.09%	.
Pattern_SegmentedReduction_Hierarchical_fp32	11.586000 ms	11.589 ms	100.03%	0.03%	.
Pattern_SegmentedReduction_Hierarchical_int16	11.800000 ms	11.801 ms	100.01%	0.01%	.
Pattern_SegmentedReduction_NDRange_int32	2.164000 ms	2.164 ms	100.00%	0.00%	.
Pattern_SegmentedReduction_Hierarchical_int64	11.781 ms	11.779000 ms	99.98%	-0.02%	.
Pattern_SegmentedReduction_NDRange_int16	2.265 ms	2.264000 ms	99.96%	-0.04%	.
Pattern_SegmentedReduction_NDRange_int64	2.338 ms	2.336000 ms	99.91%	-0.09%	.
Pattern_SegmentedReduction_NDRange_fp32	2.166 ms	2.163000 ms	99.86%	-0.14%	.

Relative perf in group ScalarProduct (6): 99.988%

Benchmark	This PR	baseline	Relative perf	Change	-
ScalarProduct_NDRange_int64	5.445000 ms	5.456 ms	100.20%	0.20%	.
ScalarProduct_NDRange_fp32	3.754000 ms	3.759 ms	100.13%	0.13%	.
ScalarProduct_Hierarchical_int64	11.486000 ms	11.490 ms	100.03%	0.03%	.
ScalarProduct_Hierarchical_fp32	10.174 ms	10.170000 ms	99.96%	-0.04%	.
ScalarProduct_Hierarchical_int32	10.529 ms	10.523000 ms	99.94%	-0.06%	.
ScalarProduct_NDRange_int32	3.746 ms	3.733000 ms	99.65%	-0.35%	.

Relative perf in group USM (7): 101.259%

Benchmark	This PR	baseline	Relative perf	Change	-
USM_Allocation_latency_fp32_shared	0.056000 ms	0.066 ms	117.86%	17.86%	++++
USM_Allocation_latency_fp32_device	0.065000 ms	0.068 ms	104.62%	4.62%	+
USM_Allocation_latency_fp32_host	37.781000 ms	37.899 ms	100.31%	0.31%	.
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.701 ms	1.661000 ms	97.65%	-2.35%	-
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.868 ms	1.814000 ms	97.11%	-2.89%	-
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	1.084 ms	1.046000 ms	96.49%	-3.51%	-
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.239 ms	1.195000 ms	96.45%	-3.55%	-

Relative perf in group VectorAddition (3): 98.848%

Benchmark	This PR	baseline	Relative perf	Change	-
VectorAddition_int64	3.107000 ms	3.139 ms	101.03%	1.03%	.
VectorAddition_int32	1.464 ms	1.448000 ms	98.91%	-1.09%	.
VectorAddition_fp32	1.495 ms	1.445000 ms	96.66%	-3.34%	-
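
The numbers in these tables look consistent with a simple rule, sketched below as an illustration only (this is not taken from the benchmark scripts). For time-based metrics the relative perf appears to be baseline / This PR, and the group figure appears to be the geometric mean of the per-benchmark ratios. The function names `relative_perf` and `geomean` are hypothetical; the inputs are the VectorAddition rows above.

```python
import math

def relative_perf(pr, baseline, lower_is_better=True):
    """Ratio above 1.0 means this PR is better (assumed convention)."""
    return baseline / pr if lower_is_better else pr / baseline

def geomean(ratios):
    """Geometric mean computed through logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# (This PR, baseline) in ms, copied from the VectorAddition rows.
rows = {
    "VectorAddition_int64": (3.107, 3.139),
    "VectorAddition_int32": (1.464, 1.448),
    "VectorAddition_fp32": (1.495, 1.445),
}

ratios = {name: relative_perf(pr, base) for name, (pr, base) in rows.items()}
for name, r in ratios.items():
    # Relative perf and change, formatted like the table columns.
    print(f"{name}: {r * 100:.2f}% ({(r - 1) * 100:+.2f}%)")

group = geomean(list(ratios.values()))
print(f"group: {group * 100:.3f}%")
```

For throughput metrics where higher is better (for example the `M keys/sec` and `token/s` rows), the ratio seems to be inverted, which corresponds to `lower_is_better=False` in this sketch.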

Relative perf in group Polybench (3): 100.099%

Benchmark	This PR	baseline	Relative perf	Change	-
Polybench_Atax	6.851000 ms	6.880 ms	100.42%	0.42%	.
Polybench_2mm	1.214000 ms	1.216 ms	100.16%	0.16%	.
Polybench_3mm	1.732 ms	1.727000 ms	99.71%	-0.29%	.

Relative perf in group Kmeans (1): 99.907%

Benchmark	This PR	baseline	Relative perf	Change	-
Kmeans_fp32	16.098 ms	16.083000 ms	99.91%	-0.09%	.

Relative perf in group MolecularDynamics (1): 96.552%

Benchmark	This PR	baseline	Relative perf	Change	-
MolecularDynamics	0.029 ms	0.028000 ms	96.55%	-3.45%	-

Relative perf in group llama.cpp (6): 98.866%

Benchmark	This PR	baseline	Relative perf	Change	-
llama.cpp Prompt Processing Batched 512	434.334 token/s	434.541716 token/s	99.95%	-0.05%	.
llama.cpp Prompt Processing Batched 256	867.383 token/s	872.377637 token/s	99.43%	-0.57%	.
llama.cpp Text Generation Batched 512	62.469 token/s	63.295460 token/s	98.69%	-1.31%	.
llama.cpp Text Generation Batched 256	62.518 token/s	63.361520 token/s	98.67%	-1.33%	.
llama.cpp Text Generation Batched 128	62.493 token/s	63.338561 token/s	98.66%	-1.34%	.
llama.cpp Prompt Processing Batched 128	820.452 token/s	838.869803 token/s	97.80%	-2.20%	-

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	2658.030000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	2303.340000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3040.100000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	290.618000 ns	-

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	698.246000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	201.409000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	266.979000 ns	-
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	217.624000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	1236.290000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	1955.980000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3471.010000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	262.192000 ns	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	727.605000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	190.480000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	305.486000 ns	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	201.022000 ns	-

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	792.249000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	173.996000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	1105.120000 ns	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	973.144000 ns	-

Relative perf in group multiple (24): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	32383.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	4236.290000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	137759.000000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	31771.100000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	1179870.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	165010.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	1186630.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	146181.000000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	41351.400000 ns	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	14804.300000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	76055.000000 ns	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	28517.100000 ns	-
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 glibc	-	32574.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 glibc	-	4128.530000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 glibc	-	138399.000000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 glibc	-	28197.400000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1161430.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	161766.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 os_provider	-	1166110.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 os_provider	-	141737.000000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42212.800000 ns
multiple_malloc_free/max_allocs:10000/size:4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	14889.200000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:4 scalable_pool<os_provider>	-	72778.500000 ns
multiple_malloc_free/max_allocs:10000/min size:8/max size:65536/granularity:8/iterations:2000/threads:1 scalable_pool<os_provider>	-	27538.700000 ns

Relative perf in group alloc/max (20): cannot calculate

Benchmark	This PR	baseline
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 glibc	-	2589.180000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 glibc	-	710.936000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 glibc	-	1188.310000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 glibc	-	716.901000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 glibc	-	861.597000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 glibc	-	175.935000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 os_provider	-	2246.790000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 os_provider	-	187.819000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 os_provider	-	1690.250000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 os_provider	-	189.702000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	4441.700000 ns
alloc/max_allocs:1000/pre_allocs:0/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	256.696000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3268.220000 ns
alloc/max_allocs:1000/pre_allocs:100000/size:4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	306.439000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	299.852000 ns
alloc/max_allocs:10000/pre_allocs:0/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.534000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	263.904000 ns
alloc/max_allocs:10000/pre_allocs:100000/size:4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	197.833000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1051.720000 ns
alloc/max_allocs:10000/pre_allocs:0/min size:8/max size:65536/granularity:8/iterations:200000/threads:1 scalable_pool<os_provider>	-	952.492000 ns

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00383903 s
bitcracker - total time for whole calculation: 35.1849 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1260 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1261 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1267 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1276 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1261 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1164 1273 31.6047% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1139 1261 30.9259% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1102 1251 29.9213% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1259 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1144 1250 31.0616% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1269 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1269 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1105 1259 30.0027% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1274 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1270 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1247 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1270 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1127 1268 30.6001% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1266 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1049 1251 28.4822% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1265 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1264 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1272 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1264 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1260 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1266 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1087 1270 29.514% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1125 1265 30.5458% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1273 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1096 1256 29.7583% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1271 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1257 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1066 1258 28.9438% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1259 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1111 1264 30.1656% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1057 1259 28.6994% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1176 1259 31.9305% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1272 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1052 1261 28.5637% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1254 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1149 1265 31.1974% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1254 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.736 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.324920e-01 6.081300e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.365440e-01 7.448300e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.357800e-01 7.619450e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.668010e-01 8.262260e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.424160e-01 7.907400e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.318040e-01 7.644380e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.318630e-01 7.632580e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.309150e-01 7.844910e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.309060e-01 7.912120e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.310230e-01 7.763190e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.108e+07 1.108e+07 1.108e+07 0.000e+00 100.00
cycleInit 10 3.471e+06 3.471e+06 3.471e+06 0.000e+00 100.00
cycleTracking 10 7.612e+06 7.612e+06 7.612e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.923e+06 4.923e+06 4.923e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.016e+05 2.016e+05 2.016e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.060e+02 4.060e+02 4.060e+02 0.000e+00 100.00
Figure Of Merit 118.36 [Num Mega Segments / Cycle Tracking Time]
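
As a cross-check, the Figure Of Merit can be approximately recomputed from the per-cycle table above. This is a minimal sketch that assumes the per-cycle `cycleTracking` column is in seconds and that the figure is total `num_seg` (in millions) divided by total cycle tracking time; both are inferences from the log labels, not from the QuickSilver source. The column indices are counted from the header line.

```python
# Per-cycle rows copied from the QuickSilver output above.
CYCLES = """\
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.324920e-01 6.081300e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.365440e-01 7.448300e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.357800e-01 7.619450e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.668010e-01 8.262260e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.424160e-01 7.907400e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.318040e-01 7.644380e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.318630e-01 7.632580e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.309150e-01 7.844910e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.309060e-01 7.912120e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.310230e-01 7.763190e-01 0.000000e+00
"""

NUM_SEG_COL = 12          # "num_seg" (0-based position in the header line)
CYCLE_TRACKING_COL = 15   # "cycleTracking", assumed to be seconds

rows = [line.split() for line in CYCLES.strip().splitlines()]
total_segments = sum(int(r[NUM_SEG_COL]) for r in rows)
tracking_seconds = sum(float(r[CYCLE_TRACKING_COL]) for r in rows)

# Mega segments per second of cycle tracking time.
figure_of_merit = total_segments / 1e6 / tracking_seconds
print(f"segments={total_segments} tracking={tracking_seconds:.3f}s FOM={figure_of_merit:.2f}")
```

The summed tracking time also lines up with the `cycleTracking` timer row (7.612e+06 microseconds), which supports the unit assumption.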

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.52563 s
sobelfilter - total time for whole calculation: 0.618891 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.2e-05 s
dl-cifar - total time for whole calculation: 23.8482 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.38 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2214.69,1802.42,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,707.46,707.464,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1253.12,1205.35,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,731.641,731.644,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,783.132,755.561,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,180.129,180.128,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2094.73,2093.89,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,182.433,182.428,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1836.89,1836.02,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.48,190.474,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2974.34,2925.19,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,266.979,266.973,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3262.58,3213.53,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,305.486,305.479,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,290.618,286.766,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,217.624,217.617,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,273.484,272.369,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,201.022,201.016,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,989.795,983.79,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,953.613,953.604,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,31956.6,30055.2,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4236.29,4236.16,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,136051,86006,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30213.8,30213.7,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.15223e+06,1.15173e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,167135,167133,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.18271e+06,1.18219e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,154165,154164,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42905.8,41871.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14804.3,14804,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77513,77053,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,28517.1,28516.5,ns,,,,,

pbalcer requested a review from a team as a code owner January 16, 2025 17:38

github-actions bot added the ci/cd Continuous integration/delivery label Jan 16, 2025

pbalcer force-pushed the bench-build-umd branch from 88b2880 to 39061e6 Compare January 17, 2025 08:54

pbalcer force-pushed the bench-build-umd branch from 39061e6 to 4ee22db Compare January 17, 2025 09:36

oneapi-src deleted a comment from github-actions bot Jan 17, 2025

pbalcer force-pushed the bench-build-umd branch from 4ee22db to f6eed13 Compare January 17, 2025 09:52

pbalcer force-pushed the bench-build-umd branch from f6eed13 to 058043e Compare January 17, 2025 10:09

pbalcer force-pushed the bench-build-umd branch from 058043e to 2aadfa1 Compare January 17, 2025 10:20

pbalcer force-pushed the bench-build-umd branch from 2aadfa1 to 0585566 Compare January 17, 2025 11:00

pbalcer force-pushed the bench-build-umd branch from 0585566 to 9e06068 Compare January 17, 2025 11:08

add building compute-runtime UMD in benchmarks jobs

261f6f1

pbalcer force-pushed the bench-build-umd branch from 9e06068 to 261f6f1 Compare January 17, 2025 14:15

igchor merged commit ed09541 into main Jan 17, 2025
26 of 149 checks passed

pbalcer deleted the bench-build-umd branch January 20, 2025 09:56

Environment Variables:

Command:

==================================
Retrieving Info

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!