WQ Stream Sequence Reset to 0 on abrupt termination #6881

Closed
@souravagrawal


Observed behavior

I encountered an issue where a NATS stream’s first and last sequence numbers were unexpectedly reset to 0 following an abrupt termination of the NATS server. Interestingly, the consumer remained fully caught up with messages and retained its expected state even after the crash, but the stream itself appeared to have been reset.

Before restarting NATS, I inspected the data directory and observed that the only blk file was 0 bytes. Additionally, the file timestamps showed that index.db had not been updated recently and appeared stale.
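The directory inspection described above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: `describe_msgs_dir` and `empty_blocks` are hypothetical helper names, and the directory passed in is assumed to be the stream's message directory (the one holding the blk files and index.db).

```python
import time
from pathlib import Path


def describe_msgs_dir(msgs_dir):
    """Return (name, size_bytes, age_seconds) for each regular file in msgs_dir."""
    now = time.time()
    rows = []
    for entry in sorted(Path(msgs_dir).iterdir()):
        if entry.is_file():
            st = entry.stat()
            rows.append((entry.name, st.st_size, now - st.st_mtime))
    return rows


def empty_blocks(msgs_dir):
    """Return the names of .blk files that are 0 bytes long."""
    return [name for name, size, _ in describe_msgs_dir(msgs_dir)
            if name.endswith(".blk") and size == 0]
```

A 0-byte blk file next to an index.db with a large age would match the situation described in this report.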

Stream info before the crash

              Subjects: test
              Replicas: 1
               Storage: File
Options:
             Retention: WorkQueue
       Acknowledgments: true
        Discard Policy: New
      Duplicate Window: 2m0s
            Direct Get: true
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false
Limits:
      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: unlimited
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited
State:
              Messages: 0
                 Bytes: 0 B
        First Sequence: 100,001
         Last Sequence: 100,000 @ 2025-05-08 14:54:10
      Active Consumers: 1

Consumer Info Before Crash

Information for Consumer test > test created 2025-05-08T14:54:03+05:30

Configuration:

                    Name: test
               Pull Mode: true
          Deliver Policy: All
              Ack Policy: Explicit
                Ack Wait: 30.00s
           Replay Policy: Instant
         Max Ack Pending: 1,000
       Max Waiting Pulls: 512

State:

  Last Delivered Message: Consumer sequence: 100,000 Stream sequence: 100,000 Last delivery: 2m5s ago
    Acknowledgment Floor: Consumer sequence: 100,000 Stream sequence: 100,000 Last Ack: 2m5s ago
        Outstanding Acks: 0 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 0 of maximum 512

Stream State after crash

State:

              Messages: 0
                 Bytes: 0 B
        First Sequence: 0
         Last Sequence: 0
      Active Consumers: 0

Consumer State after crash

State:

  Last Delivered Message: Consumer sequence: 200,000 Stream sequence: 200,000
    Acknowledgment Floor: Consumer sequence: 200,000 Stream sequence: 200,000
        Outstanding Acks: 0 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 0 of maximum 512
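The inconsistency between the two states above can be expressed as a simple check. This is a sketch only: `stream_behind_consumer` is a hypothetical helper, and the dictionary field names (`last_seq`, `ack_floor`, `stream_seq`) are assumed to mirror the JSON form of the stream and consumer info rather than taken from this report.

```python
def stream_behind_consumer(stream_state, consumer_info):
    """Return True when the consumer's ack floor is past the stream's last sequence."""
    last_seq = stream_state.get("last_seq", 0)
    ack_floor = consumer_info.get("ack_floor", {}).get("stream_seq", 0)
    return ack_floor > last_seq


# Values copied from the "after crash" output in this report.
stream_after = {"messages": 0, "bytes": 0, "first_seq": 0, "last_seq": 0}
consumer_after = {"ack_floor": {"consumer_seq": 200000, "stream_seq": 200000}}

print(stream_behind_consumer(stream_after, consumer_after))  # prints True
```

With the "before crash" values (last sequence 100,000 and ack floor 100,000) the same check returns False, which is the consistent state.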

Expected behavior

Some data loss is expected in such a scenario, but the stream state should not be reset to 0 during recovery.

Server and client version

Server 2.10.25
Client 1.30.0

Host environment

Linux

Steps to reproduce

  1. Create a Work Queue (WQ) stream along with a pull-based consumer.
  2. Publish and consume a few initial messages, then wait approximately 2 minutes to allow index.db to be created.
  3. Next, publish several thousand messages to the stream and consume all of them to ensure the stream and consumer are fully up to date.
  4. Navigate to the stream’s message directory and verify that only a single blk file exists.
  5. To simulate a crash scenario where messages were not flushed to disk, delete the existing blk file and recreate it as an empty file with the same name.
  6. Restart the NATS server.
  7. Observe that upon reboot, the stream state is reset to sequence 0, despite the consumer having previously consumed all messages.
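Step 5 above could be automated along these lines. This is a minimal sketch for a disposable test setup only: `truncate_blocks` is a hypothetical helper, and the caller is assumed to pass the stream's message directory, whose exact location depends on the server's configured store directory.

```python
from pathlib import Path


def truncate_blocks(msgs_dir):
    """Delete each .blk file in msgs_dir and recreate it empty under the same name.

    Returns the names of the files that were replaced. Other files,
    such as index.db, are left untouched.
    """
    replaced = []
    for blk in sorted(Path(msgs_dir).glob("*.blk")):
        blk.unlink()   # remove the existing block file
        blk.touch()    # recreate it as a 0-byte file
        replaced.append(blk.name)
    return replaced
```

Leaving index.db in place is deliberate here: the reproduction relies on a stale index.db sitting next to an empty blk file when the server restarts.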

Labels: defect (suspected defect such as a bug or regression)
