Crash after disk is full #1801
Hello! In this case, the error is caused by writing to a memory-mapped file: mmap(2) raises SIGBUS because there is insufficient storage space on disk, which leaves the virtual memory content of the file inconsistent with the content on disk. This, however, should not cause an app to crash, so I would suggest recovering and returning an error when this panic is raised; this should be applied to Ristretto's […].

To reproduce it, so far I have been creating a file system with a limited amount of space (2MB):

dd if=/dev/zero of=rawfile bs=1K count=2000
mkfs.ext4 rawfile
mkdir ~/.bfs
sudo mount -o loop rawfile ~/.bfs
sudo chmod -R 777 ~/.bfs

After that, I used this directory as the database path in the following program:

package main

import (
	"crypto/rand"
	"flag"
	"fmt"

	"github.com/dgraph-io/badger/v3"
)

const MB = 1024 * 1024

var rounds *int

func init() {
	rounds = flag.Int("megs", 20, "Number of MBs of data storage")
}

func main() {
	flag.Parse()
	// Mount point of the 2MB loop device created above.
	path := "/home/admin02/.bfs"
	// The With* methods return a modified copy, so the result must be used.
	opts := badger.DefaultOptions(path).WithInMemory(false)
	bdb, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer bdb.Close()
	// Write one 1MB value per round until the disk fills up.
	for i := 0; i < *rounds; i++ {
		func() {
			tx := bdb.NewTransaction(true)
			defer tx.Discard()
			key := make([]byte, 10)
			if _, err := rand.Read(key); err != nil {
				panic(err)
			}
			data := make([]byte, 1*MB)
			if _, err := rand.Read(data); err != nil {
				panic(err)
			}
			fmt.Println(">>> entry:", i, len(data))
			if err := tx.Set(key, data); err != nil {
				panic(err)
			}
			if err := tx.Commit(); err != nil {
				panic(err)
			}
		}()
	}
}

which resulted in the following (when hitting the limit):
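
The "recover and return an error" suggestion above could look roughly like the sketch below. It relies on debug.SetPanicOnFault from the Go standard library, which asks the runtime to panic instead of crashing when the current goroutine faults at a non-nil address, as can happen with memory-mapped files. The helper name safeWrite is hypothetical and is not part of Badger or Ristretto, and the fault is simulated here with an explicit panic rather than a real write to a mapping on a full disk.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeWrite is a hypothetical helper: it runs write and turns a panic
// (including a fault-induced one) into an ordinary error.
func safeWrite(write func()) (err error) {
	// Only affects the current goroutine; restore the old setting on return.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("write to mapped file failed: %v", r)
		}
	}()
	write()
	return nil
}

func main() {
	// Stand-in for a store into a mapped region whose backing blocks
	// cannot be allocated: an explicit panic simulates the fault.
	fmt.Println(safeWrite(func() { panic("simulated fault") }))
	// A write that succeeds returns a nil error.
	fmt.Println(safeWrite(func() {}))
}
```

Whether recovering is safe depends on what state the interrupted write left behind, so this is only a starting point for discussion.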
How about adding a "disk is full" error and letting the client handle it?
Reproduced with jaeger-remote-storage using the badger memTable backend with v3.2103.5 on Linux 5.4.56 in a container:
I will submit an MR soon.
This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.
@fatelei any update on this?
Hi, I've been taking a stab at fixing this. My first attempt was using […].

As an alternative, I swapped the mmap file creation to use fallocate instead of truncate. This thick-provisions any mmapped file, so we cannot trigger a SIGBUS at all. It does mean that we now have to track the size of the mmap file manually: you cannot rely on the file descriptor to accurately describe how much we have written. I accomplished this by modifying the vlog header to include the used size. Any write to the vlog now also updates that header value.

The relevant changes can be found here: https://github.com/astraw38/ristretto/tree/use-fallocate and https://github.com/astraw38/badger/tree/use-fallocate.

Feedback is welcome on the strategy for when to enable fallocate usage. In this case I used build tags, but it could probably be done via options.

Testing

Testing was done using a custom script that would test the following scenarios:
Disk is full on open
Disk is full during operation

The operation that triggered the disk-full condition returns an error.

Additional thoughts are welcome on how/if/when we should unblock writes after hitting disk-full once the disk is freed up.
@ryanfoxtyler (or any maintainer?) could we get this re-opened please?
When the disk is full, the process that opened the badger database crashes. When it starts again, it crashes again:
The application is JuiceFS, which uses badger as the metadata engine.