Description
Observed behavior
I encountered an issue where a NATS stream’s first and last sequence numbers were unexpectedly reset to 0 following an abrupt termination of the NATS server. Interestingly, the consumer remained fully caught up with messages and retained its expected state even after the crash, but the stream itself appeared to have been reset.
Before restarting NATS, I inspected the data directory and observed that the only blk file was 0 bytes. Additionally, the file timestamps showed that index.db had not been updated recently and appeared stale.
Below is my stream info before the crash:
Subjects: test
Replicas: 1
Storage: File
Options:
Retention: WorkQueue
Acknowledgments: true
Discard Policy: New
Duplicate Window: 2m0s
Direct Get: true
Allows Msg Delete: true
Allows Purge: true
Allows Rollups: false
Limits:
Maximum Messages: unlimited
Maximum Per Subject: unlimited
Maximum Bytes: unlimited
Maximum Age: unlimited
Maximum Message Size: unlimited
Maximum Consumers: unlimited
State:
Messages: 0
Bytes: 0 B
First Sequence: 100,001
Last Sequence: 100,000 @ 2025-05-08 14:54:10
Active Consumers: 1
Consumer Info Before Crash
Information for Consumer test > test created 2025-05-08T14:54:03+05:30
Configuration:
Name: test
Pull Mode: true
Deliver Policy: All
Ack Policy: Explicit
Ack Wait: 30.00s
Replay Policy: Instant
Max Ack Pending: 1,000
Max Waiting Pulls: 512
State:
Last Delivered Message: Consumer sequence: 100,000 Stream sequence: 100,000 Last delivery: 2m5s ago
Acknowledgment Floor: Consumer sequence: 100,000 Stream sequence: 100,000 Last Ack: 2m5s ago
Outstanding Acks: 0 out of maximum 1,000
Redelivered Messages: 0
Unprocessed Messages: 0
Waiting Pulls: 0 of maximum 512
Stream State after crash
State:
Messages: 0
Bytes: 0 B
First Sequence: 0
Last Sequence: 0
Active Consumers: 0
Consumer State after crash
State:
Last Delivered Message: Consumer sequence: 200,000 Stream sequence: 200,000
Acknowledgment Floor: Consumer sequence: 200,000 Stream sequence: 200,000
Outstanding Acks: 0 out of maximum 1,000
Redelivered Messages: 0
Unprocessed Messages: 0
Waiting Pulls: 0 of maximum 512
Expected behavior
Some data loss is expected in such a scenario, but the stream's first and last sequence numbers should not be reset to 0 during recovery.
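One way to express this expectation as a check is sketched below. It assumes that a recovered stream's last sequence should never be lower than the stream sequence its consumer has already acknowledged. That invariant and the function name `stream_regressed` are my own illustration, not part of the NATS API; the numbers in the usage note are taken from the state output above.

```python
def stream_regressed(stream_last_seq, consumer_ack_floor_stream_seq):
    """Return True if the stream's last sequence is behind the consumer's ack floor.

    Assumption (not a NATS API): a consumer cannot have acknowledged a stream
    sequence that the stream itself has not yet reached, so a True result
    means the stream state went backwards during recovery.
    """
    return stream_last_seq < consumer_ack_floor_stream_seq
```

With the values reported here, `stream_regressed(100_000, 100_000)` is `False` before the crash, while `stream_regressed(0, 200_000)` is `True` after it.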
Server and client version
Server 2.10.25
Client 1.30.0
Host environment
Linux
Steps to reproduce
- Create a Work Queue (WQ) stream along with a pull-based consumer.
- Publish and consume a few initial messages, then wait approximately 2 minutes to allow index.db to be created.
- Next, publish several thousand messages to the stream and consume all of them to ensure the stream and consumer are fully up to date.
- Navigate to the stream’s message directory and verify that only a single blk file exists.
- To simulate a crash scenario where messages were not flushed to disk, delete the existing blk file and recreate it as an empty file with the same name.
- Restart the NATS server.
- Observe that after the restart, the stream's first and last sequence numbers are reset to 0, even though the consumer had previously consumed all messages.
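The blk replacement step above can be scripted. This is a minimal sketch, assuming the server is already stopped and that you pass in the path to the stream's message directory yourself. The helper name `truncate_blk_files` and the `msgs_dir` parameter are hypothetical and only for illustration.

```python
from pathlib import Path


def truncate_blk_files(msgs_dir):
    """Replace every *.blk file in msgs_dir with an empty file of the same name.

    msgs_dir is the stream's message directory (supplied by the caller).
    Other files, such as index.db, are left untouched.
    Returns the sorted names of the files that were emptied.
    """
    emptied = []
    for blk in sorted(Path(msgs_dir).glob("*.blk")):
        blk.unlink()  # delete the existing block file
        blk.touch()   # recreate it as a 0-byte file with the same name
        emptied.append(blk.name)
    return emptied
```

Run it only against a test deployment, since it discards any message data stored in the blk files.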