Performance is impacted when using both qml.capture.enable and autograph=True #1736

Open
mlxd opened this issue May 12, 2025 · 1 comment
Comments

@mlxd
Member

mlxd commented May 12, 2025

When running a workload with both Catalyst's autograph and PennyLane capture enabled, the runtime is on par with having both options disabled. Enabling either option individually improves the runtime.

from timeit import default_timer as timer
import pennylane as qml
from catalyst import accelerate
import jax
from jax import numpy as jnp
import sys

@accelerate(dev=jax.devices("cpu")[0])
def classical_fn(x):
    return jnp.sin(x) ** 2

def program(r1, r2, op, **kwargs):
    @qml.qjit(autograph=kwargs["use_ag"])
    def func(r1, r2):

        dev = qml.device("lightning.qubit", wires=12)
        @qml.qnode(dev)
        def circuit(r1, r2):
            for j in range(1):
                for i in range(10):
                    op(wires=i)
                    qml.Rot(*r1, wires=i)
                    qml.Rot(*r2, wires=i)
                    qml.Hadamard(wires=i)
                
            return [qml.expval(qml.PauliZ(i)) for i in range(1)]

        return circuit(r1, r2)

    return classical_fn(jnp.sum(jnp.array(func(r1, r2))))

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Run the script as: `python script.py X Y`, where X=0/1 disables/enables program capture and Y=0/1 disables/enables QJIT autograph")
        sys.exit(1)
    
    if int(sys.argv[1])==1:
        use_capture = True
        qml.capture.enable()
    else:
        use_capture = False

    if int(sys.argv[2])==1:
        use_ag = True
    else:
        use_ag = False

    # Number of independent job runs
    num_batches = 140 
    r1 = [jnp.array([0.4, 0.5, 0.6])]*num_batches
    r2 = [jnp.array([0.4, 0.5, 0.6])]*num_batches
    ops = [qml.PauliX, qml.PauliY, qml.PauliZ, qml.Hadamard]*(num_batches//4)

    # Run each batch sequentially through `program` and time the whole loop (compilation included)
    results = []
    kwargs = {"use_capture" : use_capture, "use_ag" : use_ag}
    start = timer()
    for r1a,r2a,opa in zip(r1,r2,ops):
        results.append(jnp.array(program(r1a, r2a, opa, **kwargs)))
    end = timer()

    print(jnp.sum(jnp.array(results), axis=0), end-start)

Run the script as: python script.py X Y, where X=0/1 disables/enables program capture and Y=0/1 enables/disables QJIT autograph.
The workload above is an example that should run faster with autograph and capture, and it does when either option is enabled on its own.
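
To compare all four configurations in one go, a small driver along these lines may help. It is only a sketch: it assumes the reproducer above is saved as `script.py` in the current directory, and `build_commands` and `run_all` are hypothetical helper names, not part of Catalyst or PennyLane.

```python
import itertools
import subprocess
import sys


def build_commands(script="script.py"):
    """Return one argv list per (capture, autograph) combination."""
    commands = []
    for capture, autograph in itertools.product((0, 1), repeat=2):
        # argv layout matches the reproducer: X = capture flag, Y = autograph flag
        commands.append([sys.executable, script, str(capture), str(autograph)])
    return commands


def run_all(script="script.py"):
    """Run every configuration in a fresh interpreter and print its output."""
    for cmd in build_commands(script):
        completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # The reproducer prints the summed result followed by the elapsed time
        print(" ".join(cmd[1:]), "->", completed.stdout.strip())
```

Calling `run_all()` launches each configuration in its own process. That choice is deliberate: `qml.capture.enable()` changes global state, so separate interpreters keep one configuration from leaking into the next.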

Locally, I see the following runtimes and potential locations for investigation:

  • QJIT.compile: Entry-point="catalyst/jit.py:778".
    • No Catalyst AG, no QML Capture: 9.491 s
    • No Catalyst AG, QML Capture: 8.873 s
    • Catalyst AG, no QML Capture: 8.854 s
    • Catalyst AG, QML Capture: 9.538 s
  • QJIT.capture: Entry-point="catalyst/jit.py:695".
    • No Catalyst AG, no QML Capture: 4.845 s
    • No Catalyst AG, QML Capture: 1.842 s
    • Catalyst AG, no QML Capture: 1.901 s
    • Catalyst AG, QML Capture: 4.921 s
  • QJIT.generate_ir: Entry-point="catalyst/jit.py:762".
    • No Catalyst AG, no QML Capture: 2.279 s
    • No Catalyst AG, QML Capture: 0.953 s
    • Catalyst AG, no QML Capture: 1.025 s
    • Catalyst AG, QML Capture: 2.223 s
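
To make the pattern in these numbers easier to see, the timings can be expressed as speedups over the baseline with both options off. The sketch below only restates the figures listed above; the dictionary layout and the `speedups` helper are illustrative choices.

```python
# Timings in seconds, copied from the list above.
timings = {
    "QJIT.compile": {
        "no AG, no capture": 9.491,
        "no AG, capture": 8.873,
        "AG, no capture": 8.854,
        "AG, capture": 9.538,
    },
    "QJIT.capture": {
        "no AG, no capture": 4.845,
        "no AG, capture": 1.842,
        "AG, no capture": 1.901,
        "AG, capture": 4.921,
    },
    "QJIT.generate_ir": {
        "no AG, no capture": 2.279,
        "no AG, capture": 0.953,
        "AG, no capture": 1.025,
        "AG, capture": 2.223,
    },
}


def speedups(stage_timings, baseline="no AG, no capture"):
    """Return baseline_time / time for every configuration of one stage."""
    base = stage_timings[baseline]
    return {config: base / seconds for config, seconds in stage_timings.items()}


for stage, stage_timings in timings.items():
    for config, factor in speedups(stage_timings).items():
        print(f"{stage:18s} {config:20s} {factor:.2f}x")
```

In `QJIT.capture`, either option alone is roughly 2.5x to 2.6x faster than the baseline, while enabling both is marginally slower than the baseline.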

The issue could also be upstream in PennyLane, but since the profiling indicated the above as the main candidates, I think the investigation can start there.

@dime10
Contributor

dime10 commented May 12, 2025

If this implies what I think it does then it is actually a bug. My reading is that using qml.capture prevents autograph from being applied?
Especially given the data you mentioned this morning: if I remember correctly, you said the IR size is the same with autograph disabled as with autograph and plxpr both enabled, but smaller when only autograph is enabled.
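
One way to check the IR-size observation would be to dump the IR text for each configuration and compare line counts. The following is a rough sketch under stated assumptions: `ir_sizes` and `same_size` are hypothetical helpers, and it assumes the IR text of each configuration has already been collected as a string (for example from the `mlir` attribute of the compiled `qml.qjit` function, gathered in a separate process per configuration).

```python
def ir_sizes(ir_by_config):
    """Map each configuration name to the number of non-empty lines in its IR text."""
    return {
        name: sum(1 for line in text.splitlines() if line.strip())
        for name, text in ir_by_config.items()
    }


def same_size(sizes, first, second):
    """Return True when two configurations produced IR with the same line count."""
    return sizes[first] == sizes[second]


# Hypothetical usage, with each IR string collected in its own run:
#   sizes = ir_sizes({"neither": ir_a, "autograph only": ir_b, "both": ir_c})
#   same_size(sizes, "neither", "both")  # True would support the hypothesis
```

If the "both" configuration matches "neither" while "autograph only" is smaller, that would be consistent with autograph not being applied when capture is enabled. It would not prove it on its own.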
