Skip to content

perf: disable WAL for transpiled source cache #18084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 22, 2023

Conversation

bartlomieju
Copy link
Member

@bartlomieju bartlomieju commented Mar 9, 2023

Disable Write-Ahead Log for the cached module source database.

This brings SQLite connection cost on startup from 2.5% to 1.6%.

I'm still hesitant about it as disabling WAL might have negavitve
impact on multiple Deno processes running in parallel (ie. it may
cause module cache to become out of sync between processes
leading to strange transpilation bugs).

@bartlomieju bartlomieju marked this pull request as ready for review March 9, 2023 01:17
@bartlomieju bartlomieju marked this pull request as draft March 9, 2023 01:41
@dsherret
Copy link
Member

dsherret commented Mar 9, 2023

I'm still hesitant about it as disabling WAL might have negavitve
impact on multiple Deno processes running in parallel (ie. it may
cause module cache to become out of sync between processes
leading to strange transpilation bugs).

Deno always passes along a hash of the source text it's loaded and won't use the cache unless it matches, so I don't think that will be an issue.

-- enable write-ahead-logging mode
PRAGMA journal_mode=WAL;
-- Disable write-ahead-logging mode
PRAGMA journal_mode=Off;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster when the SQLite file is large?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear, I think it has more impact on highly concurrent usage.

Copy link
Member

@dsherret dsherret Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering because of this:

The traditional rollback journal works by writing a copy of the original unchanged database content into a separate rollback journal file and then writing changes directly into the database file. In the event of a crash or ROLLBACK, the original content contained in the rollback journal is played back into the database file to revert the database file to its original state. The COMMIT occurs when the rollback journal is deleted.

The WAL approach inverts this. The original content is preserved in the database file and the changes are appended into a separate WAL file. A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original unaltered database while changes are simultaneously being committed into the WAL. Multiple transactions can be appended to the end of a single WAL file.

https://www.sqlite.org/wal.html

@littledivy
Copy link
Member

Can we lazy initialize sqlite instead? Why do we need to open a sqlite file always

@dsherret
Copy link
Member

dsherret commented Mar 9, 2023

Can we lazy initialize sqlite instead? Why do we need to open a sqlite file always

@littledivy it helps with a major performance improvement to startup time which caches the swc dependency analysis for deno_graph #15502 -- maybe there is a better way than sqlite

@bartlomieju
Copy link
Member Author

I've combined executing pragmas and setting up tables into a single batch query. It seems to make SQLite show up on flamegraphs as 1.6% compared to 2.5%.

@bartlomieju bartlomieju marked this pull request as ready for review March 9, 2023 12:02
@bartlomieju bartlomieju requested a review from dsherret March 9, 2023 14:43
@dsherret
Copy link
Member

dsherret commented Mar 9, 2023

This brings SQLite connection cost on startup from 2.5% to 1.6%.

Can you update with how many records were in your sqlite database? It might be good to test this by inserting synthetic data into the database as well then testing at different sizes.

@mmastrac
Copy link
Contributor

mmastrac commented Mar 21, 2023

I cherry-picked these on top of 7cf6e2a

7cf6e2a -- base

e92d5d2196fdcb889641b476a54f2b1fea82f8d3 -- PR w/cherry-pick

There's at least ~.3ms speedup here with a trivial script (~8kB, no imports).

09:03 $ hyperfine --warmup 1000 -M 1000 -L commit 'e92d5d2196fdcb889641b476a54f2b1fea82f8d3,7cf6e2a7a642cbcbb695b0226aba7d24fb84971b' './deno-{commit} run miniscript.js'
Benchmark 1: ./deno-e92d5d2196fdcb889641b476a54f2b1fea82f8d3 run miniscript.js
  Time (mean ± σ):      13.5 ms ±   0.6 ms    [User: 9.3 ms, System: 2.7 ms]
  Range (min … max):    12.3 ms …  16.1 ms    190 runs
 
Benchmark 2: ./deno-7cf6e2a7a642cbcbb695b0226aba7d24fb84971b run miniscript.js
  Time (mean ± σ):      13.8 ms ±   0.5 ms    [User: 9.3 ms, System: 2.9 ms]
  Range (min … max):    12.5 ms …  15.7 ms    183 runs
 
Summary
  './deno-e92d5d2196fdcb889641b476a54f2b1fea82f8d3 run miniscript.js' ran
    1.02 ± 0.06 times faster than './deno-7cf6e2a7a642cbcbb695b0226aba7d24fb84971b run miniscript.js'

@bartlomieju
Copy link
Member Author

@mmastrac could you create a fake DB with 1000/10000 entries to see if there's still an improvement?

@mmastrac
Copy link
Contributor

mmastrac commented Mar 21, 2023

I have a couple of small concerns here:

  • First is w/potential for data corruption with no journal enabled. In the case where the database is un-openable, does the user have enough information to know to delete the correct file? While the journal is no guarantee of freedom-from-corruption, it's possible that no journaling at all might present some additional risk of corruption that even a more basic journal might protect against.

  • Second, I'm not sure if this is the correct way to disable the WAL for an existing database. From what I can understand from research, we're supposed to migrate to WAL-disabled by issuing pragma journal_mode = DELETE. I am not certain if journal_mode = OFF silently drops the WAL (which in theory should just force us to re-cache a bunch of data) or if it is ignored and continues to use the WAL.

I'm building a test database to investigate further.

@mmastrac
Copy link
Contributor

The benchmarks with a large database are virtually the same (benchmark order reversed to maintain the WAL for the test):

10:33 $ export DENO_DIR=~/.deno; hyperfine --warmup 100 -M 100 -L commit 'e92d5d2196fdcb889641b476a54f2b1fea82f8d3,7cf6e2a7a642cbcbb695b0226aba7d24fb84971b' './deno-{commit} run miniscript.js'
Benchmark 1: ./deno-e92d5d2196fdcb889641b476a54f2b1fea82f8d3 run miniscript.js
  Time (mean ± σ):      17.4 ms ±   0.7 ms    [User: 12.4 ms, System: 4.2 ms]
  Range (min … max):    16.2 ms …  20.1 ms    100 runs
 
Benchmark 2: ./deno-7cf6e2a7a642cbcbb695b0226aba7d24fb84971b run miniscript.js
  Time (mean ± σ):      17.8 ms ±   1.0 ms    [User: 12.5 ms, System: 4.5 ms]
  Range (min … max):    16.3 ms …  22.8 ms    100 runs

Copy link
Contributor

@mmastrac mmastrac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR is correct. We may truncate some older cache WAL logs but I don't believe this is harmful long-term. I think there's a slightly higher chance of database corruption but I put together a PR to ensure that we don't silently fail and the user isn't confused:

https://github.com/denoland/deno/pull/18330/files

In the future we may want to investigate one of the alternate journaling modes, but I think this is good for now.

@bartlomieju bartlomieju enabled auto-merge (squash) March 21, 2023 23:45
@bartlomieju bartlomieju merged commit 619806d into denoland:main Mar 22, 2023
@bartlomieju bartlomieju deleted the perf_sqlite_pragma branch March 22, 2023 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants