-
Notifications
You must be signed in to change notification settings - Fork 98
perf(weave): add pre group by ref filtering #4198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=ac8e3549a92db053bdd91f9fb031f8be8467fb27 |
@@ -1077,7 +1085,7 @@ def process_operand(operand: "tsi_query.Operand") -> str: | |||
) | |||
|
|||
|
|||
def process_op_name_filter_to_conditions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better names
Related to the query builder, I'm seeing double conditions in certain cases. It likely doesn't affect performance but makes for longer messier queries.
WITH filtered_calls AS
(SELECT calls_merged.id AS id
FROM calls_merged
WHERE calls_merged.project_id = 'UHJvamVjdEludGVybmFsSWQ6NDEyOTg5ODE='
GROUP BY (calls_merged.project_id,
calls_merged.id)
HAVING (((any(calls_merged.deleted_at) IS NULL))
AND ((NOT ((any(calls_merged.started_at) IS NULL))))
AND (any(calls_merged.parent_id) IS NULL)))
SELECT any(calls_merged.attributes_dump) AS attributes_dump,
any(calls_merged.deleted_at) AS deleted_at,
argMaxMerge(calls_merged.display_name) AS display_name,
any(calls_merged.ended_at) AS ended_at,
any(calls_merged.exception) AS
exception,
calls_merged.id AS id,
array_concat_agg(calls_merged.input_refs) AS input_refs,
any(calls_merged.inputs_dump) AS inputs_dump,
any(calls_merged.op_name) AS op_name,
any(calls_merged.output_dump) AS output_dump,
array_concat_agg(calls_merged.output_refs) AS output_refs,
any(calls_merged.parent_id) AS parent_id,
calls_merged.project_id AS project_id,
any(calls_merged.started_at) AS started_at,
any(calls_merged.summary_dump) AS summary_dump,
any(calls_merged.trace_id) AS trace_id,
any(calls_merged.wb_run_id) AS wb_run_id,
any(calls_merged.wb_user_id) AS wb_user_id
FROM calls_merged
LEFT JOIN feedback ON (feedback.weave_ref = concat('weave-trace-internal:///', 'UHJvamVjdEludGVybmFsSWQ6NDEyOTg5ODE=', '/call/', calls_merged.id))
WHERE calls_merged.project_id = 'UHJvamVjdEludGVybmFsSWQ6NDEyOTg5ODE='
AND calls_merged.project_id = 'UHJvamVjdEludGVybmFsSWQ6NDEyOTg5ODE='
AND (calls_merged.id IN filtered_calls)
GROUP BY (calls_merged.project_id,
calls_merged.id)
HAVING (any(feedback.trigger_ref) = 'weave-trace-internal:///UHJvamVjdEludGVybmFsSWQ6NDEyOTg5ODE=/object/monitor-test:sQijAwWRX6NmaIkMEpfOZhHbjqGw3UZwz70jD9msilI')
ORDER BY any(calls_merged.started_at) DESC
LIMIT 50
OFFSET 0 |
That is not good, investigating |
This is apparently by design, introduced 6 months ago in this pr. Looks like I stamped it :/. As you mentioned, it shouldn't hurt performance but it is indeed pretty ugly. It should be entirely unrelated to the changes in this pr; I will put up another pr to fix that. |
Fix pr for @neutralino1's comment: #4245 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
Description
WB-24547
Add a ref filtering condition pre-group by.
Technically we do not need
AND (hasAny(array_concat_agg(calls_merged.input_refs), ['ref'])))
as we are guaranteed that the one call start row contain any refs, so any call starts included at the group by will be valid. Lets include it for now just to be safe, it should not add meaningful latency.Without this change we get frequent memory exceeded errors, trace link in ticket. This query is the single most errored query in prod the last week, with over 50 error queries (2.2% failure rate).
Testing
(prod, then branch)