Commit 48791b0

zstd: Improve best compression's match selection (#705)
The best encoder selects matches based on the criterion

    a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10

If this were computed on the reals, it would be equivalent to a.est < b.est, so the added terms only capture round-off error (this is also why CSE doesn't eliminate them). Changing the formula to

    a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0

captures the intention better, I think, and improves compression:

    enwik9            260989017  259699309  -0.4942%
    silesia/dickens     3233958    3222189  -0.3639%
    silesia/mozilla    16980973   16912341  -0.4042%
    silesia/mr          3505223    3505553   0.0094%
    silesia/nci         2313702    2289871  -1.0300%
    silesia/ooffice     2915199    2896410  -0.6445%
    silesia/osdb        3364752    3390871   0.7763%
    silesia/reymont     1658404    1656006  -0.1446%
    silesia/samba       4330660    4326783  -0.0895%
    silesia/sao         5399736    5416932   0.3185%
    silesia/webster     9987784    9966351  -0.2146%
    silesia/xml          542081     538378  -0.6831%
    silesia/x-ray       5756210    5733061  -0.4022%

... as well as throughput:

    name                              old speed      new speed      delta
    Encoder_EncodeAllSimple/best-8    12.1MB/s ± 1%  12.2MB/s ± 1%  +1.17%  (p=0.000 n=18+20)
    Encoder_EncodeAllSimple4K/best-8  10.4MB/s ± 1%  10.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
1 parent e5c6ce2 commit 48791b0

1 file changed: +1 -1 lines changed

zstd/enc_best.go

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ encodeLoop:
 		}
 
 		bestOf := func(a, b match) match {
-			if a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10 {
+			if a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0 {
 				return a
 			}
 			return b
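
To see how the two comparisons can disagree in integer arithmetic, here is a minimal standalone sketch. It is not the encoder's code: the `match` struct is cut down to the two fields the comparison reads (assumed to be `int32`), `bestOfOld` and `bestOfNew` are hypothetical names for the before and after versions of the closure, and the inputs are made-up values, with `bitsPerByte` assumed to be a fixed-point number with 10 fractional bits because of the `>>10`.

```go
package main

import "fmt"

// match is a pared-down stand-in for the encoder's match type.
// Field meanings are assumptions made for this illustration.
type match struct {
	s   int32 // start position of the match
	est int32 // estimated cost of the match
}

// bestOfOld reproduces the comparison removed by this commit.
func bestOfOld(a, b match, bitsPerByte int32) match {
	if a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10 {
		return a
	}
	return b
}

// bestOfNew reproduces the comparison added by this commit.
func bestOfNew(a, b match, bitsPerByte int32) match {
	if a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0 {
		return a
	}
	return b
}

func main() {
	// Hypothetical inputs: 4096 stands for 4.0 in 10-bit fixed point.
	const bitsPerByte = 4096
	a := match{s: 10, est: 100}
	b := match{s: 5, est: 125}

	// Old: 100 + 20 < 125 - 20 is false, so b is returned.
	fmt.Println("old picks a:", bestOfOld(a, b, bitsPerByte) == a)
	// New: 100 - 125 + 20 = -5 < 0 is true, so a is returned.
	fmt.Println("new picks a:", bestOfNew(a, b, bitsPerByte) == a)
}
```

With these inputs the old comparison returns `b` and the new one returns `a`. The position term `(a.s-b.s)*bitsPerByte>>10` appears on both sides of the old comparison with opposite signs, but only once in the new one.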
