[Bug]: FileBasedDeadLetterQueueReconsumer could result in duplicates #2236

Open
damccorm opened this issue Mar 5, 2025 · 1 comment
Labels
bug, needs triage, p2

Comments


damccorm commented Mar 5, 2025

Related Template(s)

Anything using FileBasedDeadLetterQueueReconsumer (as best I can tell, just the Spanner change streams and Datastream templates: https://github.com/search?q=repo%3AGoogleCloudPlatform%2FDataflowTemplates%20FileBasedDeadLetterQueueReconsumer&type=code)

Template Version

latest (seen in 2023-07-18-00_rc00, and the latest code looks affected based on code inspection)

What happened?

The way FileBasedDeadLetterQueueReconsumer is written, it could result in duplicates if Dataflow experiences any sort of backlog or slowdown. Specifically, the following scenario could happen:

  1. Generate sequence fires, and everything before the reshuffle happens.
  2. Generate sequence fires again, and everything before the reshuffle happens a second time, for the same file.
  3. We remove the file after the reshuffle.
  4. We try to remove the file after the reshuffle again, which logs but succeeds.

Both firings process the same file before either removal runs, so its contents are emitted twice.
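The scenario above can be illustrated with a small, self-contained simulation. This is only a sketch: it does not use Beam or the actual template code, and the class `DlqRaceSketch`, its methods, and the file name are hypothetical stand-ins. It assumes the file is read before the reshuffle and removed after it, as the steps describe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the race; not the real FileBasedDeadLetterQueueReconsumer.
class DlqRaceSketch {
    // In-memory stand-in for the DLQ directory: file name -> records in that file.
    private final Map<String, List<String>> files = new LinkedHashMap<>();
    // Files that were read and are waiting behind the "reshuffle" to be removed.
    private final List<String> pendingRemovals = new ArrayList<>();
    // Every record emitted downstream, in order.
    private final List<String> emitted = new ArrayList<>();
    // Removal attempts that found the file already gone.
    private int redundantRemovals = 0;

    public void addFile(String name, List<String> records) {
        files.put(name, new ArrayList<>(records));
    }

    // One firing of the periodic trigger: match every file still present and read it.
    public void fireSequence() {
        for (Map.Entry<String, List<String>> entry : files.entrySet()) {
            emitted.addAll(entry.getValue());
            pendingRemovals.add(entry.getKey());
        }
    }

    // The step after the reshuffle: remove each file that was read.
    public void drainRemovals() {
        for (String name : pendingRemovals) {
            if (files.remove(name) == null) {
                // File was already removed: counted here, but not treated as a failure.
                redundantRemovals++;
            }
        }
        pendingRemovals.clear();
    }

    public List<String> emitted() {
        return emitted;
    }

    public int redundantRemovals() {
        return redundantRemovals;
    }

    public static void main(String[] args) {
        DlqRaceSketch dlq = new DlqRaceSketch();
        dlq.addFile("error-0001.json", List.of("record-a", "record-b"));

        dlq.fireSequence();  // step 1: file is read
        dlq.fireSequence();  // step 2: removal is backlogged, so the file is read again
        dlq.drainRemovals(); // steps 3 and 4: first removal works, second finds nothing

        System.out.println(dlq.emitted());           // [record-a, record-b, record-a, record-b]
        System.out.println(dlq.redundantRemovals()); // 1
    }
}
```

If `drainRemovals()` runs between the two `fireSequence()` calls, each record is emitted once, which is why the duplicates only show up under backlog or slowdown.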

Relevant log output

damccorm added the bug, needs triage, and p2 labels on Mar 5, 2025

damccorm commented Mar 5, 2025

Tagging some folks who may be interested in taking a look at this. It's a low-likelihood bug, but it can lead to duplicates.

@manitgupta @Deep1998 @dhercher
