Duplicated sample points returned by query-frontend #3920

Closed
alvinlin123 opened this issue Mar 8, 2021 · 12 comments

alvinlin123 (Contributor) commented Mar 8, 2021

Describe the bug

I have a continuous test running that queries Cortex. After upgrading from commit af9e20c to dd6dbf9, we observed that around 16:00:00 Pacific time our test would fail because duplicate sample points were returned:

curl --location -g --request POST 'http://localhost:8080/prometheus/api/v1/query_range?query=my_metrics_name&start=1614556800&end=1614557160&step=60s'
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics_name"
        },
        "values": [
          [
            1614556800,
            "96"
          ],
          [
            1614556860,
            "141"
          ],
          [
            1614556800,
            "96"
          ],
          [
            1614556860,
            "141"
          ],
          [
            1614556920,
            "141"
          ],
          [
            1614556980,
            "141"
          ],
          [
            1614557040,
            "141"
          ],
          [
            1614557100,
            "141"
          ],
          [
            1614557160,
            "185"
          ]
        ]
      }
    ]
  }
} 

Notice how the sample points with timestamps 1614556800 and 1614556860 are duplicated.

Our continuous test is quite simple: it runs periodically to push some data to and query some data from Cortex. Our test is not using HA mode.

To Reproduce
I am not sure how to reproduce it, but one pattern is that it seems to happen every day around 16:00:00 Pacific time (00:00:00 GMT). The queries I had issues with are queries for past data points, such as the following (a sketch of issuing such a query follows the list):

  • query data from 24 hours ago to 24 hours + 6 minutes ago (this query will land in the ingesters).
  • query data from 48 hours ago to 48 hours + 6 minutes ago (this query will not land in the ingesters).
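For reference, here is a minimal sketch (in Go, with a placeholder metric name and endpoint taken from the curl examples in this issue, not our actual test code) of issuing such a query:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

func main() {
	// Query a 6-minute window ending 24 hours ago; changing 24 to 48 gives the
	// second case, which no longer hits the ingesters.
	end := time.Now().Add(-24 * time.Hour).Truncate(time.Minute)
	start := end.Add(-6 * time.Minute)

	form := url.Values{}
	form.Set("query", "my_metrics_name") // placeholder metric name
	form.Set("start", fmt.Sprintf("%d", start.Unix()))
	form.Set("end", fmt.Sprintf("%d", end.Unix()))
	form.Set("step", "60s")

	// Placeholder endpoint, matching the curl examples above.
	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:8080/prometheus/api/v1/query_range",
		strings.NewReader(form.Encode()))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("X-Scope-OrgID", "tenant_id") // only needed when auth is enabled

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```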

Also, if I keep making the same query multiple times, the issue goes away after a few tries for that specific query; the problem comes back the next day at 16:00:00 Pacific time.

Expected behavior
I don't expect sample points to be duplicated because:

  1. Cortex rejects duplicated sample points on the write path when not using HA mode.
  2. The query path should deduplicate results from different sources (see the sketch after this list).
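To illustrate point 2, here is a minimal sketch of what I mean by timestamp-based de-duplication when merging samples from two sources; the types and function are simplified stand-ins, not Cortex's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a simplified (timestamp, value) pair, analogous to one entry in
// the "values" arrays above.
type sample struct {
	ts  int64
	val string
}

// mergeDedup merges two already-sorted sample slices (e.g. results from the
// ingesters and from long-term storage) and keeps one sample per timestamp.
// This only sketches the expected behaviour.
func mergeDedup(a, b []sample) []sample {
	merged := append(append([]sample{}, a...), b...)
	sort.Slice(merged, func(i, j int) bool { return merged[i].ts < merged[j].ts })

	out := merged[:0]
	for _, s := range merged {
		if len(out) == 0 || out[len(out)-1].ts != s.ts {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	ingester := []sample{{1614556800, "96"}, {1614556860, "141"}}
	store := []sample{{1614556800, "96"}, {1614556860, "141"}, {1614556920, "141"}}
	fmt.Println(mergeDedup(ingester, store)) // one sample per timestamp
}
```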

Environment:

  • Infrastructure: k8s
  • Deployment tool: helm
  • I am using query front-end

Storage Engine

  • Blocks
  • Chunks

bboreham (Contributor) commented Mar 8, 2021

Could you clarify whether you are using the query-frontend component, and if so, whether the symptom persists if you call the querier directly?

jeromeinsf (Contributor) commented Mar 8, 2021

Yes, query-frontend is being used.

alvinlin123 (Contributor, Author) commented:

> Could you clarify whether you are using the query-frontend component, and if so, whether the symptom persists if you call the querier directly?

I tried querying all the queriers directly after I saw the bug through the query-frontend, but was not able to reproduce the bug.

alvinlin123 (Contributor, Author) commented Mar 10, 2021

One more data point. I was able to reproduce the issue only when querying the query-frontend. As soon as I saw duplicate sample points from the query-frontend, I made the exact same query against all the queriers, but the queriers never returned duplicated samples. So basically here is what I did:

  1. port-forward to query-frontend-x
  2. make a query_range call against the query-frontend; got duplicate samples
  3. make the same query_range call against ALL the queriers; didn't get duplicate samples
  4. make the same query_range call against the same query-frontend; got duplicate samples again

Below are the curl commands I ran. Port 7777 goes to a querier, whereas port 9999 goes to a single query-frontend pod.

▶ curl --location -g --request POST "http://localhost:9999/prometheus/api/v1/query_range?query=my_metrics&start=1615161360&end=1615161720&step=60s" -H "X-Scope-OrgId: tenant_id"  -d '{}' -v  | jq

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics"
        },
        "values": [
          [
            1615161360,
            "242"
          ],
          [
            1615161420,
            "29"
          ],
          [
            1615161480,
            "29"
          ],
          [
            1615161540,
            "29"
          ],
          [
            1615161600,
            "29"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ]
        ]
      }
    ]
  }
}

▶ curl --location -g --request POST "http://localhost:7777/prometheus/api/v1/query_range?query=my_metrics&start=1615161360&end=1615161720&step=60s" -H "X-Scope-OrgId: tenant_id"  -d '{}' -v  | jq

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics"
        },
        "values": [
          [
            1615161360,
            "242"
          ],
          [
            1615161420,
            "29"
          ],
          [
            1615161480,
            "29"
          ],
          [
            1615161540,
            "29"
          ],
          [
            1615161600,
            "29"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ]
        ]
      }
    ]
  }
}

▶ kill -3 $(pgrep -f '^kubectl.* port-forward .* 7777')
▶ curl --location -g --request POST "http://localhost:7777/prometheus/api/v1/query_range?query=my_metrics&start=1615161360&end=1615161720&step=60s" -H "X-Scope-OrgId: tenant_id"  -d '{}' -v  | jq

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics"
        },
        "values": [
          [
            1615161360,
            "242"
          ],
          [
            1615161420,
            "29"
          ],
          [
            1615161480,
            "29"
          ],
          [
            1615161540,
            "29"
          ],
          [
            1615161600,
            "29"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ]
        ]
      }
    ]
  }
}

▶ kill -3 $(pgrep -f '^kubectl.* port-forward .* 7777')

▶ curl --location -g --request POST "http://localhost:7777/prometheus/api/v1/query_range?query=my_metrics&start=1615161360&end=1615161720&step=60s" -H "X-Scope-OrgId: tenant_id"  -d '{}' -v  | jq

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics"
        },
        "values": [
          [
            1615161360,
            "242"
          ],
          [
            1615161420,
            "29"
          ],
          [
            1615161480,
            "29"
          ],
          [
            1615161540,
            "29"
          ],
          [
            1615161600,
            "29"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ]
        ]
      }
    ]
  }
}

▶ curl --location -g --request POST "http://localhost:9999/prometheus/api/v1/query_range?query=my_metrics&start=1615161360&end=1615161720&step=60s" -H "X-Scope-OrgId: tenant_id"  -d '{}' -v  | jq

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "__name__": "my_metrics"
        },
        "values": [
          [
            1615161360,
            "242"
          ],
          [
            1615161420,
            "29"
          ],
          [
            1615161480,
            "29"
          ],
          [
            1615161540,
            "29"
          ],
          [
            1615161600,
            "29"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ],
          [
            1615161660,
            "29"
          ],
          [
            1615161720,
            "73"
          ]
        ]
      }
    ]
  }
}

@gouthamve changed the title from "Duplicated sample points returned by querier" to "Duplicated sample points returned by query-frontend" on Mar 11, 2021

jeromeinsf (Contributor) commented:

From the Cortex community call, wondering if this would be in or around https://github.com/cortexproject/cortex/blob/master/pkg/querier/queryrange/split_by_interval.go#L43

alvinlin123 (Contributor, Author) commented:

OK, one more data point I just discovered: if I set cache_results to true I get the duplicate samples; if I set cache_results to false, I can't reproduce the duplicate sample issue. Hmmm.

alvinlin123 (Contributor, Author) commented Mar 17, 2021

I just noticed something weird. With the old commit af9e20c the size of the cache entry doesn't change. But after I upgraded to commit dd6dbf9, I started to see the cache item size grow every time I make the same query:

key=cc4fd9127b36005e exp=-1 la=1615952893 cas=1181 fetch=no cls=21 size=7806
key=cc4fdc127b360577 exp=-1 la=1615952893 cas=1182 fetch=no cls=22 size=10308
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952896 cas=1183 fetch=no cls=21 size=7860
key=cc4fdc127b360577 exp=-1 la=1615952896 cas=1184 fetch=no cls=22 size=10380
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952896 cas=1183 fetch=no cls=21 size=7860
key=cc4fdc127b360577 exp=-1 la=1615952896 cas=1184 fetch=no cls=22 size=10380
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952902 cas=1188 fetch=no cls=21 size=7968
key=cc4fdc127b360577 exp=-1 la=1615952902 cas=1187 fetch=no cls=22 size=10524
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952902 cas=1188 fetch=no cls=21 size=7968
key=cc4fdc127b360577 exp=-1 la=1615952902 cas=1187 fetch=no cls=22 size=10524
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952908 cas=1191 fetch=no cls=21 size=8076
key=cc4fdc127b360577 exp=-1 la=1615952908 cas=1192 fetch=no cls=22 size=10668
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952908 cas=1191 fetch=no cls=21 size=8076
key=cc4fdc127b360577 exp=-1 la=1615952908 cas=1192 fetch=no cls=22 size=10668
END
lru_crawler metadump all
key=cc4fd9127b36005e exp=-1 la=1615952911 cas=1193 fetch=no cls=21 size=8130
key=cc4fdc127b360577 exp=-1 la=1615952911 cas=1194 fetch=no cls=22 size=10740
END

Note that when cas changes, size increases. I don't think this is normal, is it? I suspect that commit 50ab740 introduced the growing cache entry size behaviour, but I need to take a closer look.

The range query I am making has start=1614556800 and end=1614556860, which crosses a day boundary in GMT.

With the above in mind, I was somehow able to reproduce the duplicate sample issue when I downgraded from the newer commit to the older commit; I guess that is because the cached entry contains lots of duplicate samples?

pracucci (Contributor) commented:

> I suspect that commit 50ab740 introduced the growing cache entry size behaviour, but I need to take a closer look.

We noticed another issue with the behaviour introduced by 50ab740, and Goutham fixed it in #3818. Could you test whether #3818 solves your issue too? I would just like to rule out that the issue has already been (unintentionally) fixed in master.

alvinlin123 (Contributor, Author) commented:

I actually tested with #3818 included. Specifically, I was testing with e02797a and still saw the cache entry size increase.

alvinlin123 (Contributor, Author) commented Mar 17, 2021

OK, so after looking closer, 50ab740 did indeed cause the issue where the cache entry size keeps increasing. Consider the following:

We are making a range query where

  • start: 1615161360000
  • end: 1615161720000
  • step: 60000 (1 minute)

Because the above time range crosses a day boundary in GMT, the query splitter will split it into two queries (a sketch of this splitting follows the list):

  • Start: 1615161360000, End: 1615161540000 (3-minute interval)
  • Start: 1615161600000, End: 1615161720000 (2-minute interval)
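Here is a rough sketch of that day-boundary splitting (not the actual split_by_interval code; timestamps in milliseconds, splits aligned to the step). Running it with the query above reproduces the two sub-ranges:

```go
package main

import "fmt"

// splitByDay sketches the splitting described above: a [start, end] range
// query is cut at UTC day boundaries, with each sub-query's end aligned to
// the step. It is not the real Cortex implementation.
func splitByDay(start, end, step int64) [][2]int64 {
	const day = 24 * 60 * 60 * 1000
	var splits [][2]int64
	for start <= end {
		// Last step-aligned timestamp strictly before the next day boundary.
		boundary := (start/day + 1) * day
		subEnd := boundary - ((boundary - start) % step)
		if subEnd >= boundary {
			subEnd = boundary - step
		}
		if subEnd > end {
			subEnd = end
		}
		splits = append(splits, [2]int64{start, subEnd})
		start = subEnd + step
	}
	return splits
}

func main() {
	// The query from this comment: step = 60s.
	fmt.Println(splitByDay(1615161360000, 1615161720000, 60000))
	// => [[1615161360000 1615161540000] [1615161600000 1615161720000]]
}
```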

Note that both queries cover an interval of less than 5 minutes, so these 2 queries get passed further down to the results cache middleware. Let's assume we have made the same query before, so there are result cache entries for the 2 queries.

Both queries will go through the handleHit method with the existing cached data, and then through the partition method.

Because the queries cover less than 5 minutes, this line of code will not use the cached data, and partition will return a Request object covering each full query.

handleHit then goes ahead and executes the Request returned by partition and appends the result to the existing cached data. But here comes the problem: the existing cached data already contains exactly the same data, so by appending the result of the Request we create a duplicate Extent in the cache entry. The more we execute the same query, the more duplicate Extents get added to the cache entry; hence the ever-growing cache entry size.

Before 50ab740, I think the expectation was that the partition method returns requests only for data that is not in the cache. Commit 50ab740 breaks that expectation because, after the change, partition can return requests for data that is already in the cache.
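To make this concrete, here is a simplified sketch of the failure mode, using toy extent/request types and a hard-coded 5-minute threshold rather than the real results-cache structures:

```go
package main

import "fmt"

// extent is a simplified stand-in for a cached sub-range of query results.
type extent struct{ start, end int64 }

// request is a simplified stand-in for a query_range sub-request.
type request struct{ start, end int64 }

const minCacheExtentMs = 5 * 60 * 1000 // extents shorter than this are ignored

// partition mimics the post-50ab740 behaviour described above: cached extents
// shorter than the 5-minute threshold are skipped, so the whole request is
// returned to be re-executed even though the cache already covers it.
func partition(req request, cached []extent) []request {
	start := req.start
	var requests []request
	for _, ext := range cached {
		if ext.end-ext.start < minCacheExtentMs {
			continue // cached data ignored; nothing marks this range as covered
		}
		if ext.end < start || ext.start > req.end {
			continue
		}
		if ext.start > start {
			requests = append(requests, request{start, ext.start})
		}
		start = ext.end
	}
	if start < req.end {
		requests = append(requests, request{start, req.end})
	}
	return requests
}

func main() {
	// A 3-minute sub-query (below the 5-minute threshold) that is already cached.
	req := request{1615161360000, 1615161540000}
	cached := []extent{{1615161360000, 1615161540000}}

	// handleHit (simplified): execute whatever partition returns and append it
	// to the cached extents. Because partition ignored the short extent, every
	// repeat of the same query appends another identical extent.
	for i := 0; i < 3; i++ {
		for _, r := range partition(req, cached) {
			cached = append(cached, extent{r.start, r.end})
		}
	}
	fmt.Println(len(cached), cached) // the cache entry keeps growing
}
```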

I am working on a PR to fix this.

bboreham (Contributor) commented:

What is left to fix here after #3968?

alvinlin123 (Contributor, Author) commented Aug 20, 2021

I am not seeing any more duplicate sample points now, so I will go ahead and resolve this issue, but I am still not sure how it was fixed :) Something must have changed in the de-duplication logic, together with #3968, that fixes this duplicate sample issue.
