"Fast-forward" or "forced" tokens are tokens that are fully determined by the current state of a constraint: the constraint allows only one continuation, so there is nothing for the model to choose.
One example where fast-forward tokens are useful is generating data that adheres to a given JSON schema. The constraint first forces `{"name":"` to be generated, then the model generates `John"`, the controller forces `,\n"age":`, the model generates `42`, and so on. Another example is chain-of-thought reasoning: after the model has generated a sentence, the controller forces more instructions for the model, the model generates more text, and so on. When used, fast-forward tokens significantly speed up generation, because a whole forced run can be processed in a single forward pass, just like the prompt.
One can think of them as 100% accurate speculation.
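To make the idea concrete, here is a rough sketch of a decode loop that uses forced tokens. Everything in it is made up for illustration: the `Constraint` trait, the `Template` toy constraint, and `ToyModel` are hypothetical and are not the llguidance or mistral.rs APIs. The only point is that forced tokens get appended without sampling and are then fed to the model together in the next forward pass.

```rust
/// Hypothetical constraint interface (an assumption, not the llguidance API).
trait Constraint {
    /// Tokens fully determined by the current state; empty if the model must choose.
    fn forced_tokens(&self) -> Vec<u32>;
    /// Advance the constraint state by one token.
    fn commit(&mut self, token: u32);
    fn is_done(&self) -> bool;
}

/// Toy constraint: a fixed template where `None` marks a one-token slot
/// that the model fills in and `Some(t)` is a forced token.
struct Template {
    slots: Vec<Option<u32>>,
    pos: usize,
}

impl Constraint for Template {
    fn forced_tokens(&self) -> Vec<u32> {
        // Collect the run of forced tokens up to the next free slot.
        self.slots[self.pos..].iter().map_while(|s| *s).collect()
    }
    fn commit(&mut self, _token: u32) {
        self.pos += 1;
    }
    fn is_done(&self) -> bool {
        self.pos >= self.slots.len()
    }
}

/// Stand-in for a language model that only counts forward passes.
struct ToyModel {
    forward_passes: usize,
}

impl ToyModel {
    /// One forward pass over any number of tokens (like a prompt chunk).
    /// A real model would attend over `_tokens`; this one returns a dummy token id.
    fn forward(&mut self, _tokens: &[u32]) -> u32 {
        self.forward_passes += 1;
        900 + self.forward_passes as u32
    }
}

fn generate(model: &mut ToyModel, constraint: &mut impl Constraint, prompt: &[u32]) -> Vec<u32> {
    let mut output = Vec::new();
    // Tokens that have been decided but not yet fed to the model.
    let mut pending: Vec<u32> = prompt.to_vec();
    while !constraint.is_done() {
        let forced = constraint.forced_tokens();
        if !forced.is_empty() {
            // Fast-forward: no sampling needed, just queue the tokens.
            for &t in &forced {
                constraint.commit(t);
            }
            output.extend_from_slice(&forced);
            pending.extend_from_slice(&forced);
            continue;
        }
        // The model has a real choice: one pass over everything pending.
        let sampled = model.forward(&pending);
        pending.clear();
        constraint.commit(sampled);
        output.push(sampled);
        pending.push(sampled);
    }
    output
}
```

With a template of two forced tokens, a free slot, two forced tokens, a free slot, and one forced token, `generate` produces seven tokens but calls `forward` only twice, once per free slot. A loop that samples every token would need one forward pass per token instead.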
The llguidance library supports computing them, so it would be nice to allow them in mistral.rs.