Skip to content

Create indices for bookshelves_events table #10593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cdrini
Copy link
Collaborator

@cdrini cdrini commented Mar 18, 2025

Noticing ~1/3 of time of /search spent in db queries according to sentry profiles (link):
image

I created 50k dummy data rows locally, and ran the 3 types of queries we run against this table. The one multi-index seems to have removed all Seq Scans and drastically dropped down the query time!

Query Time Before Time After
Query 1 124.643 ms 5.099 ms
Query 2 44.402 ms 1.072 ms
Query 3 47.150 ms 0.108 ms

It might be worth experimenting with adding more indices, but this seems sufficient at the moment. And it would be good to run the below queries on ol-db1 as well before/after.

-- Sample check-in button query
EXPLAIN ANALYZE SELECT id, event_date FROM bookshelves_events
WHERE username='user_42' AND work_id IN (1234, 143)
AND event_type=3
ORDER BY event_date DESC LIMIT 1;

-- Sample filtered reading log query
EXPLAIN ANALYZE SELECT b.work_id, b.created, b.edition_id
FROM bookshelves_books b
INNER JOIN bookshelves_events e
ON b.work_id = e.work_id AND b.username = e.username
WHERE b.username = 'user_42'
AND e.event_date LIKE '2024%'
ORDER BY b.created DESC;

-- Find all books with events for a given user
EXPLAIN ANALYZE SELECT DISTINCT ON (work_id) work_id, * FROM bookshelves_events
WHERE username='user_42'
    AND event_type=3
    AND event_date LIKE '2024%'
ORDER BY work_id, updated DESC;

Results without index:

Query 1:

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------   
 Limit  (cost=1961.71..1961.71 rows=1 width=9) (actual time=120.035..120.035 rows=0 loops=1)
   ->  Sort  (cost=1961.71..1961.71 rows=1 width=9) (actual time=118.147..118.147 rows=0 loops=1)
         Sort Key: event_date
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on bookshelves_events  (cost=0.00..1961.70 rows=1 width=9) (actual time=114.946..114.946 rows=0 loops=1)    
               Filter: ((work_id = ANY ('{1234,143}'::integer[])) AND (username = 'user_42'::text) AND (event_type = 3))
               Rows Removed by Filter: 70497
 Total runtime: 124.643 ms

Query 2:

                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1793.91..1793.92 rows=1 width=16) (actual time=44.339..44.339 rows=0 loops=1)
   Sort Key: b.created
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.41..1793.90 rows=1 width=16) (actual time=44.308..44.308 rows=0 loops=1)
         ->  Seq Scan on bookshelves_events e  (cost=0.00..1785.45 rows=1 width=13) (actual time=39.111..44.266 rows=1 loops=1)
               Filter: ((event_date ~~ '2024%'::text) AND (username = 'user_42'::text))
               Rows Removed by Filter: 70496
         ->  Index Scan using bookshelves_books_pkey on bookshelves_books b  (cost=0.41..8.44 rows=1 width=25) (actual time=0.032..0.032 rows=0 loops=1)
               Index Cond: ((username = 'user_42'::text) AND (work_id = e.work_id))
 Total runtime: 44.402 ms

Query 3:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1961.71..1961.72 rows=1 width=78) (actual time=47.110..47.111 rows=1 loops=1)
   ->  Sort  (cost=1961.71..1961.71 rows=1 width=78) (actual time=47.108..47.108 rows=1 loops=1)
         Sort Key: work_id, updated
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on bookshelves_events  (cost=0.00..1961.70 rows=1 width=78) (actual time=36.892..45.651 rows=1 loops=1)
               Filter: ((event_date ~~ '2024%'::text) AND (username = 'user_42'::text) AND (event_type = 3))
               Rows Removed by Filter: 70496
 Total runtime: 47.150 ms

Results after creating multi-key index:

Query 1:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.88..12.89 rows=1 width=9) (actual time=0.301..0.301 rows=0 loops=1)
   ->  Sort  (cost=12.88..12.89 rows=1 width=9) (actual time=0.299..0.299 rows=0 loops=1)
         Sort Key: event_date
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using bookshelves_events_user_checkins_idx on bookshelves_events  (cost=0.42..12.87 rows=1 width=9) (actual time=0.288..0.288 rows=0 loops=1)
               Index Cond: ((username = 'user_42'::text) AND (work_id = ANY ('{1234,143}'::integer[])) AND (event_type = 3))
 Total runtime: 5.099 ms

Query 2:

                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=38.97..38.98 rows=1 width=16) (actual time=0.120..0.120 rows=0 loops=1)
   Sort Key: b.created
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=4.89..38.96 rows=1 width=16) (actual time=0.109..0.109 rows=0 loops=1)
         ->  Bitmap Heap Scan on bookshelves_events e  (cost=4.47..30.52 rows=1 width=13) (actual time=0.079..0.080 rows=1 loops=1)
               Recheck Cond: (username = 'user_42'::text)
               Filter: (event_date ~~ '2024%'::text)
               Rows Removed by Filter: 6
               ->  Bitmap Index Scan on bookshelves_events_user_checkins_idx  (cost=0.00..4.47 rows=7 width=0) (actual time=0.039..0.039 rows=7 loops=1)
                     Index Cond: (username = 'user_42'::text)
         ->  Index Scan using bookshelves_books_pkey on bookshelves_books b  (cost=0.41..8.44 rows=1 width=25) (actual time=0.025..0.025 rows=0 loops=1)
               Index Cond: ((username = 'user_42'::text) AND (work_id = e.work_id))
 Total runtime: 1.072 ms

Query 3:

                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=26.97..26.98 rows=1 width=78) (actual time=0.065..0.065 rows=1 loops=1)
   ->  Sort  (cost=26.97..26.97 rows=1 width=78) (actual time=0.064..0.064 rows=1 loops=1)
         Sort Key: work_id, updated
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on bookshelves_events  (cost=4.49..26.96 rows=1 width=78) (actual time=0.055..0.055 rows=1 loops=1)
               Recheck Cond: ((username = 'user_42'::text) AND (event_type = 3))
               Filter: (event_date ~~ '2024%'::text)
               Rows Removed by Filter: 5
               ->  Bitmap Index Scan on bookshelves_events_user_checkins_idx  (cost=0.00..4.49 rows=6 width=0) (actual time=0.032..0.032 rows=6 loops=1)
                     Index Cond: ((username = 'user_42'::text) AND (event_type = 3))
 Total runtime: 0.108 ms

Technical

Testing

Screenshot

Stakeholders

@cdrini cdrini added Theme: Performance Issues related to UI or Server performance. [managed] Needs: Special Deploy This PR will need a non-standard deploy to production Needs: Testing labels Mar 18, 2025
@cdrini
Copy link
Collaborator Author

cdrini commented Mar 18, 2025

Tested on prod, it shows a solid improvement in EXPLAIN/ANALYZE!

Before:
Total runtime: 91.175 ms
Total runtime: 92.013 ms
Total runtime: 109.075 ms

After:
Total runtime: 0.111 ms
Total runtime: 0.264 ms
Total runtime: 11.682 ms
On prod before:

Query 1:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8141.30..8141.30 rows=1 width=13) (actual time=91.150..91.150 rows=0 loops=1)
   ->  Sort  (cost=8141.30..8141.30 rows=1 width=13) (actual time=91.150..91.150 rows=0 loops=1)
         Sort Key: event_date
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on bookshelves_events  (cost=0.00..8141.28 rows=1 width=13) (actual time=91.142..91.142 rows=0 loops=1)
               Filter: ((work_id = ANY ('{1234,143}'::integer[])) AND (username = 'ScarTissue'::text) AND (event_type = 3))
               Rows Removed by Filter: 221828
 Total runtime: 91.175 ms

Query 2:

                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7611.91..7611.92 rows=1 width=16) (actual time=91.941..91.942 rows=8 loops=1)
   Sort Key: b.created
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.56..7611.90 rows=1 width=16) (actual time=44.171..91.895 rows=8 loops=1)
         ->  Seq Scan on bookshelves_events e  (cost=0.00..7601.53 rows=4 width=15) (actual time=42.047..87.095 rows=8 loops=1)
               Filter: ((event_date ~~ '2024%'::text) AND (username = 'ScarTissue'::text))
               Rows Removed by Filter: 221820
         ->  Index Scan using bookshelves_books_pkey on bookshelves_books b  (cost=0.56..2.58 rows=1 width=27) (actual time=0.592..0.593 rows=1 loops=8)
               Index Cond: ((username = 'ScarTissue'::text) AND (work_id = e.work_id))
 Total runtime: 92.013 ms

Query 3:

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=8141.32..8141.35 rows=2 width=84) (actual time=109.023..109.031 rows=8 loops=1)
   ->  Sort  (cost=8141.32..8141.34 rows=4 width=84) (actual time=109.020..109.021 rows=8 loops=1)
         Sort Key: work_id, updated
         Sort Method: quicksort  Memory: 26kB
         ->  Seq Scan on bookshelves_events  (cost=0.00..8141.28 rows=4 width=84) (actual time=61.809..107.576 rows=8 loops=1)
               Filter: ((event_date ~~ '2024%'::text) AND (username = 'ScarTissue'::text) AND (event_type = 3))
               Rows Removed by Filter: 221820
 Total runtime: 109.075 ms

After:

Query 1:

                                                                              QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.88..3.89 rows=1 width=13) (actual time=0.049..0.049 rows=0 loops=1)
   ->  Sort  (cost=3.88..3.89 rows=1 width=13) (actual time=0.049..0.049 rows=0 loops=1)
         Sort Key: event_date
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using bookshelves_events_user_checkins_idx on bookshelves_events  (cost=0.42..3.87 rows=1 width=13) (actual time=0.043..0.043 rows=0 loops=1)
               Index Cond: ((username = 'ScarTissue'::text) AND (work_id = ANY ('{1234,143}'::integer[])) AND (event_type = 3))
 Total runtime: 0.111 ms
(7 rows)

Query 2:

                                                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=16.03..16.04 rows=1 width=16) (actual time=0.230..0.231 rows=8 loops=1)
   Sort Key: b.created
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.98..16.02 rows=1 width=16) (actual time=0.095..0.218 rows=8 loops=1)
         ->  Index Only Scan using bookshelves_events_user_checkins_idx on bookshelves_events e  (cost=0.42..5.65 rows=4 width=15) (actual time=0.033..0.050 rows=8 loops=1)
               Index Cond: ((username = 'ScarTissue'::text) AND (event_date >= '2024'::text) AND (event_date < '2025'::text))
               Filter: (event_date ~~ '2024%'::text)
               Heap Fetches: 8
         ->  Index Scan using bookshelves_books_pkey on bookshelves_books b  (cost=0.56..2.58 rows=1 width=27) (actual time=0.019..0.020 rows=1 loops=8)
               Index Cond: ((username = 'ScarTissue'::text) AND (work_id = e.work_id))
 Total runtime: 0.264 ms

Query 3:

                                                                              QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=5.73..5.75 rows=2 width=84) (actual time=11.627..11.641 rows=8 loops=1)
   ->  Sort  (cost=5.73..5.74 rows=4 width=84) (actual time=11.625..11.626 rows=8 loops=1)
         Sort Key: work_id, updated
         Sort Method: quicksort  Memory: 26kB
         ->  Index Scan using bookshelves_events_user_checkins_idx on bookshelves_events  (cost=0.42..5.69 rows=4 width=84) (actual time=2.606..11.603 rows=8 loops=1)
               Index Cond: ((username = 'ScarTissue'::text) AND (event_type = 3) AND (event_date >= '2024'::text) AND (event_date < '2025'::text))
               Filter: (event_date ~~ '2024%'::text)
 Total runtime: 11.682 ms

@mekarpeles mekarpeles merged commit 7691405 into internetarchive:master Mar 18, 2025
4 checks passed
@cdrini cdrini deleted the feature/checkins-db-indices branch March 18, 2025 18:03
@cdrini cdrini removed Needs: Special Deploy This PR will need a non-standard deploy to production Needs: Testing labels Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Theme: Performance Issues related to UI or Server performance. [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants