Do Not Continue Processing Events in Batch Mode for Kinesis/DDBStreams #1820


Open
belugabehr opened this issue Apr 16, 2025 · 6 comments
Labels
batch, feature-parity (Feature parity with python version), feature-request (New feature or request)

Comments

@belugabehr

belugabehr commented Apr 16, 2025

When processing a DDB Stream in batch mode, I want to stop processing when a failure is reached. Since this is a stream, and the ordering of messages is important to me, processing should stop immediately.

That is to say, if my data is partitioned on Purchase ID, I want to ensure all events related to the same purchase are played in order. If a failure occurs, the processing of the stream should stop and retry later.

Expected Behavior

When an error occurs, the offending event should be checkpointed, and processing should stop.

https://docs.aws.amazon.com/lambda/latest/dg/services-ddb-batchfailurereporting.html
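The linked AWS docs describe returning the failing record's sequence number so that Lambda checkpoints there and retries from that record onwards. As a minimal sketch of the desired stop-on-failure loop (using hypothetical stand-in types rather than the real Lambda/Powertools classes, with records represented by their sequence numbers):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the requested behavior with hypothetical stand-in types:
// records are represented by their sequence numbers, and processing
// stops at the first failure so ordering is preserved.
public class StopOnFirstFailure {
    // Returns the batch item failures to report back to Lambda:
    // at most one entry, the sequence number where processing stopped.
    public static List<String> process(List<String> sequenceNumbers,
                                       Consumer<String> handler) {
        List<String> failures = new ArrayList<>();
        for (String seq : sequenceNumbers) {
            try {
                handler.accept(seq);
            } catch (RuntimeException e) {
                // Checkpoint at the failing record and stop; Lambda will
                // retry from this sequence number onwards.
                failures.add(seq);
                break;
            }
        }
        return failures;
    }
}
```

Note that records after the failure are never handled, so later events for the same partition key cannot be played out of order.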

Current Behavior

For DDB Stream Batch processing, the stream will continue to be reprocessed, and the same messages will be repeated again, and again.

    } catch (Throwable t) {
        String sequenceNumber = record.getDynamodb().getSequenceNumber();
        LOGGER.error("Error while processing record with id {}: {}, adding it to batch item failures",
                sequenceNumber, t.getMessage());
        LOGGER.error("Error was", t);
        batchFailures.add(new StreamsEventResponse.BatchItemFailure(sequenceNumber));
        // Report failure if we have a handler
        if (this.failureHandler != null) {
            // A failing failure handler is no reason to fail the batch
            try {
                this.failureHandler.accept(record, t);
            } catch (Throwable t2) {
                LOGGER.warn("failureHandler threw handling failure", t2);
            }
        }
    }

Possible Solution

Return on any error. Take a look at the following example as a reference:

https://docs.aws.amazon.com/lambda/latest/dg/services-ddb-batchfailurereporting.html

            } catch (Exception e) {
                /* Since we are working with streams, we can return the failed item immediately.
                   Lambda will immediately begin to retry processing from this failed item onwards. */
                batchItemFailures.add(new StreamsEventResponse.BatchItemFailure(curRecordSequenceNumber));
                return new StreamsEventResponse(batchItemFailures);
            }

or a little bit nicer:

                return new StreamsEventResponse(Collections.singletonList(
                        new StreamsEventResponse.BatchItemFailure(curRecordSequenceNumber)));
@phipag
Contributor

phipag commented Apr 17, 2025

Hey @belugabehr,

thank you for opening this issue and even taking the time to find the relevant code inside the library.

I would like to understand your use-case better. Can you elaborate on this? I don't fully understand what you mean by

and the same messages will be repeated again, and again.

If we stopped processing the batch on the first failure the same (failing) message would still be re-processed again. Am I missing something?


Regarding the behavior itself: This is the expected implementation. If a batch fails partially we still finish processing the batch and then report the failed events back to the DDB Streams service so that a checkpoint will be left at the index of the failed item in the batch. We have a nice diagram for this in the Powertools for AWS Lambda (Python) documentation: https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#kinesis-and-dynamodb-streams.

To confirm my understanding of your request: are you looking for the SqsFifoPartialProcessor that we have in Powertools for AWS Lambda (Python)? Can you check out this diagram and let me know if this is what you are looking for? https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#sqs-fifo
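For context, the Python SqsFifoPartialProcessor referenced above short-circuits after the first failure: remaining messages are reported as failed without being processed, so the queue redelivers them in order. A rough Java sketch of that behavior (hypothetical names and stand-in types, not the actual Powertools API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Rough sketch of the SQS FIFO short-circuit idea with stand-in types:
// once one message fails, every remaining message is marked failed
// without being handled, preserving FIFO ordering on redelivery.
public class FifoShortCircuit {
    public static List<String> process(List<String> messageIds,
                                       Consumer<String> handler) {
        List<String> failures = new ArrayList<>();
        boolean failed = false;
        for (String id : messageIds) {
            if (failed) {
                // Short-circuit: do not process, just report as failed.
                failures.add(id);
                continue;
            }
            try {
                handler.accept(id);
            } catch (RuntimeException e) {
                failed = true;
                failures.add(id);
            }
        }
        return failures;
    }
}
```

The difference from the stream case is that SQS needs every unprocessed message listed as a failure, whereas a stream checkpoint only needs the first failing sequence number.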

@phipag phipag added the batch, feature-request (New feature or request), and feature-parity (Feature parity with python version) labels and removed the triage and bug (Something isn't working) labels on Apr 17, 2025
@belugabehr
Author

Hey,

Thanks for the feedback.

Yes, we have a DDB Stream hooked up to an SNS FIFO queue.

The issue we have is if we have three events: (C)reate, (U)pdate, and (D)elete, then we need to process those in order.

For example, if we receive the C, and then we fail to handle the U, we do not want to continue in the stream and process the D. We just need the checkpoint to move up to the U event and wait until the issue clears.

So maybe we just need a flag on the existing batch processor to exit early instead of continuing to process messages (and reset the checkpoint back to the latest queue offset).
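One way such a flag could look, purely as an illustration (the stopOnFirstFailure name and the surrounding types are hypothetical, not the current Powertools API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch only: a stopOnFirstFailure flag switching between
// the current behavior (finish the batch, report all failures) and the
// requested early exit. Names are hypothetical, not the Powertools API.
public class BatchProcessorSketch {
    private final boolean stopOnFirstFailure;

    public BatchProcessorSketch(boolean stopOnFirstFailure) {
        this.stopOnFirstFailure = stopOnFirstFailure;
    }

    // Records are stand-ins represented by sequence number strings.
    public List<String> process(List<String> sequenceNumbers,
                                Consumer<String> handler) {
        List<String> batchItemFailures = new ArrayList<>();
        for (String seq : sequenceNumbers) {
            try {
                handler.accept(seq);
            } catch (RuntimeException e) {
                batchItemFailures.add(seq);
                if (stopOnFirstFailure) {
                    // Checkpoint at the failed item; the remaining records
                    // will be redelivered in order on the next invocation.
                    break;
                }
            }
        }
        return batchItemFailures;
    }
}
```

With the flag off, the sketch matches today's behavior (the whole batch is processed and every failure reported); with it on, processing stops at the first failure.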

@phipag
Contributor

phipag commented Apr 17, 2025

Hey @belugabehr,

thanks for explaining your use-case. I will get back to you with some new information next week and will do some tests in the meantime.

@leandrodamascena

Thanks for opening this issue @belugabehr! This can also happen with a Kinesis stream, and if using the bisect configuration it can have some other side effects.

I have a few additional thoughts here before we make a decision and I hope to share them by Monday.

Thanks

@belugabehr
Author

belugabehr commented Apr 18, 2025 via email

@belugabehr
Author

belugabehr commented Apr 18, 2025 via email

Projects: Status Triage
Development: No branches or pull requests
3 participants