Skip to content

Stream TopN panics #8570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
xxchan opened this issue Mar 15, 2023 · 8 comments · Fixed by #8659
Closed

Stream TopN panics #8570

xxchan opened this issue Mar 15, 2023 · 8 comments · Fixed by #8659
Assignees
Labels
type/bug Type: Bug. Only for issues.
Milestone

Comments

@xxchan
Copy link
Member

xxchan commented Mar 15, 2023

Describe the bug

panicked at 'assertion failed: self.middle.len() <= self.limit', /risingwave/src/stream/src/executor/top_n/top_n_cache.rs:140:13",

To Reproduce

create table hackernews_newstories( 
   "id" INTEGER,
   "title" VARCHAR,
   "author" VARCHAR,
   "score" INTEGER,
   "url" VARCHAR,
   "type" VARCHAR,
   "summary" VARCHAR,
   "keyphrases" VARCHAR[],
    "timestamp" TIMESTAMP)
WITH (
   connector = 'datagen' );



CREATE MATERIALIZED VIEW top_keyphrases_latest_5min AS WITH now_timestamp AS (
    SELECT MAX(timestamp) AS now_timestamp
    FROM hackernews_newstories
),
top_keyphrases_hop_5s_5min AS (
    SELECT count(*) as count,
        keyphrases,
        window_start,
        window_end
    FROM (
            SELECT unnest(keyphrases) as keyphrases,
                timestamp,
                window_start,
                window_end
            FROM HOP (
                    hackernews_newstories,
                    timestamp,
                    INTERVAL '5' SECOND,
                    INTERVAL '5' MINUTE
                )
        )
    GROUP BY keyphrases,
        window_start,
        window_end
    ORDER BY window_start DESC
)
SELECT count,
    keyphrases,
    window_end
FROM top_keyphrases_hop_5s_5min
WHERE window_end = (
        SELECT window_end
        FROM top_keyphrases_hop_5s_5min
        WHERE window_end >= (
                SELECT now_timestamp
                FROM now_timestamp
            )
        ORDER BY window_end ASC
        LIMIT 1
    )
ORDER BY window_end DESC;

Then wait a while. easily reproduable.

Expected behavior

No response

Additional context

No response

@xxchan xxchan added the type/bug Type: Bug. Only for issues. label Mar 15, 2023
@github-actions github-actions bot added this to the release-0.1.18 milestone Mar 15, 2023
@xxchan xxchan self-assigned this Mar 15, 2023
@xxchan
Copy link
Member Author

xxchan commented Mar 15, 2023

The only suspecious place I could find now is:

self.middle.remove(&cache_key);
res_ops.push(Op::Delete);
res_rows.push((&row).into());
// Bring one element, if any, from high cache to middle cache
if !self.high.is_empty() {
let high_first = self.high.pop_first().unwrap();
res_ops.push(Op::Insert);
res_rows.push(high_first.1.clone());
self.middle.insert(high_first.0, high_first.1);

where L258 didn't remove a row in middle when it's full, and L267 inserts a row. (Is this possible?)

@xxchan
Copy link
Member Author

xxchan commented Mar 15, 2023

The root cause is:

  1. delete a row existing in high cache
  2. insert a row. It's directly inserted to high cache (wrong).

We should either refill high cache after 1 or before 2

@xxchan
Copy link
Member Author

xxchan commented Mar 15, 2023

Here's a reproduce:

create table t(x int);
create materialized view mv as select * from t order by x limit 1;
insert into t values  (1), (2), (3), (4);
delete from t where x = 2;
insert into t values (5);
delete from t where x = 1;
insert into t values (6);
delete from t where x = 3;
delete from t where x = 4;
insert into t values (1); -- panics!

illustration:

insert into t values  (1), (2), (3), (4);

(mid)
1
---
(high)
2
3
---
(storage\cache)
4

delete from t where x = 2;
insert into t values (5);

(mid)
1
---
(high)
3
5
---
(storage\cache)
4



delete from t where x = 1;

(mid)
3
---
(high)
5
---
(storage\cache)
4



insert into t values (6);

(mid)
3
---
(high)
5
6
---
(storage\cache)
4



delete from t where x = 3;

(mid)
5
---
(high)
6
---
(storage\cache)
4

delete from t where x = 4;

Oops!

(mid)
5
6
---
(high)
---
(storage\cache)



insert into t values (1); -- panics!

@xxchan
Copy link
Member Author

xxchan commented Mar 15, 2023

I guess this could happen in higher possibility when the TopN's input has frequent deletes. In the given example, it's a GroupTopN, which has many deletes. 🥲

@BugenZhao
Copy link
Member

The root cause is:

  1. delete a row existing in high cache

  2. insert a row. It's directly inserted to high cache (wrong).

We should either refill high cache after 1 or before 2

This reminds me of the sync flag in the Min/Max aggregation cache. We only extend the cache if we're sure it's correct.

@st1page
Copy link
Contributor

st1page commented Mar 16, 2023

The root cause is:

  1. delete a row existing in high cache
  2. insert a row. It's directly inserted to high cache (wrong).

We should either refill high cache after 1 or before 2

We need to make sure that every cache should be "full" forever which means there is only three kinds of states:

  1. the cache is filled with N number of rows.
  2. the cache is not filled and its next-level cache is empty

To maintain it, we need to do the delete operation in the cache cascade.

@xxchan

This comment was marked as resolved.

@xxchan

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Type: Bug. Only for issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants