Remove rest of page limit #5429

Merged: 161 commits into main, Apr 16, 2025

Conversation

@dessalines (Member) commented Feb 14, 2025:

This PR adds a lot, but it's mostly boilerplate.

  • Adding a page_cursor, page_back, and limit to every API list fetch.
    • The cursor_data is always fetched before the DB view list function. Ideally the list function should only ever run one query.
    • Tried to clean up a lot of the pre-fetching in post_view to fit this model.
  • Adding list, next_page, and prev_page to the result of every API list fetch (see the sketch after this list).
  • Adding a few missing indexes.
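
As a rough illustration of the request/response shape this describes (the names here are simplified stand-ins, not the actual Lemmy API structs):

```rust
// Simplified sketch of a cursor-paginated list endpoint; the real
// PaginationCursor and the per-endpoint request/response types live in the
// Lemmy crates and differ in detail.
pub struct PaginationCursor(pub String);

pub struct ListQuery {
    /// Cursor returned by a previous response's next_page or prev_page.
    pub page_cursor: Option<PaginationCursor>,
    /// When true, paginate backwards from the cursor.
    pub page_back: Option<bool>,
    /// Maximum number of items to return.
    pub limit: Option<i64>,
}

pub struct ListResponse<T> {
    /// The fetched page of items.
    pub items: Vec<T>,
    /// Cursor for the following page, if there is one.
    pub next_page: Option<PaginationCursor>,
    /// Cursor for the preceding page, if there is one.
    pub prev_page: Option<PaginationCursor>,
}
```

A client pages forward by passing the previous response's next_page as page_cursor, and backwards by passing prev_page with page_back set.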

Other cleanups:

  • Making most functions in db_schema and db_views return LemmyResult, so no more error wrappers are necessary in the API code.
    • This meant consolidating and removing many error types.
  • Removed some redundant structs for PostTag.
  • Cleaned up the keyword_block code a bit, and made sure it pre-fetches the keywords.

#4517

CouldntCreatePasswordResetRequest,
CouldntCreateLoginToken,
CouldntUpdateLocalSiteUrlBlocklist,
CouldntCreateEmailVerification,
@Nutomic (Member), Apr 10, 2025:

Is it really necessary to have separate error messages for all of these? At least the Create and Update errors could be merged, e.g. into a ChangeTaglineError.

@dessalines (Member Author):

I think it's worth it, mainly because a failed create denotes a constraint conflict, while a failed update means it's missing, or something else. They're already done anyway.

)]
#[cfg_attr(feature = "full", diesel(belongs_to(crate::source::post::Post)))]
#[cfg_attr(feature = "full", diesel(table_name = post_actions))]
#[cfg_attr(feature = "full", diesel(primary_key(person_id, post_id)))]
#[cfg_attr(feature = "full", diesel(check_for_backend(diesel::pg::Pg)))]
#[cfg_attr(feature = "full", ts(export))]
#[cfg_attr(feature = "full", cursor_keys_module(name = post_actions_keys))]
Member:

What is this for? The library doesn't have any docs, unfortunately.

@dessalines (Member Author):

It's for sorting. I use this in two upcoming PRs that are fairly simple and will make it clear.

Collaborator:

The CursorKeysModule derive here creates the db_schema::source::post::post_action_keys module. Until I add missing docs, you might understand it better by looking at references to the module's items. I couldn't quickly find a reference to post_actions_keys, but post view imports a similar module (post_keys as key).

let (_, id) = pids
  .as_slice()
  .first()
  .ok_or(LemmyErrorType::CouldntParsePaginationToken)?;
@Nutomic (Member), Apr 10, 2025:

This logic is used in various places; it should go into a helper method on PaginationCursor.
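
A minimal sketch of what such a helper could look like, using stand-in types rather than the real Lemmy definitions (PaginationCursor, LemmyErrorType, and the tuple layout of the decoded cursor are simplified assumptions here):

```rust
// Stand-ins for illustration only; the real types live in the Lemmy crates.
struct PaginationCursor(String);

#[derive(Debug)]
enum LemmyErrorType {
    CouldntParsePaginationToken,
}

impl PaginationCursor {
    /// Return the id from the first decoded (prefix, id) pair, or a parse error,
    /// so callers don't repeat the first()/ok_or dance everywhere.
    fn first_id<P, I: Copy>(pairs: &[(P, I)]) -> Result<I, LemmyErrorType> {
        pairs
            .first()
            .map(|(_, id)| *id)
            .ok_or(LemmyErrorType::CouldntParsePaginationToken)
    }
}
```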

@dessalines (Member Author):

Good idea, I'll do that now.

let sort = o.sort.unwrap_or_default();
let sort_direction = asc_if(sort == Old || sort == NameAsc);

let mut pq = paginate(query, sort_direction, o.cursor_data, None, o.page_back);
Member:

Why not keep using the query variable here?

@dessalines (Member Author):

It changes type after it passes through paginate, so I always rename it to paginated_query (or pq) to be clearer.
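
For context, a stripped-down illustration of why the variable gets a new name (paginate and the types here are stand-ins, not the real helpers):

```rust
struct QueryBuilder;
struct PaginatedQueryBuilder {
    inner: QueryBuilder,
}

// The stand-in paginate consumes the plain builder and returns a different
// type, so keeping the old `query` name afterwards would be misleading.
fn paginate(query: QueryBuilder) -> PaginatedQueryBuilder {
    PaginatedQueryBuilder { inner: query }
}

fn example() {
    let query = QueryBuilder;
    let pq = paginate(query); // `pq` is a PaginatedQueryBuilder, not a QueryBuilder
    let _ = pq.inner;
}
```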


Ok(res)
.await
.with_lemmy_type(LemmyErrorType::NotFound)
Member:

Why add an explicit NotFound error in so many places? Diesel's NotFound error is automatically converted to LemmyError NotFound, so this shouldn't be necessary at all.

@dessalines (Member Author):

So that all of these can explicitly return lemmy errors instead of diesel ones.

@dessalines (Member Author):

Part of this rework was to try to remove all diesel errors from db_schema and db_views, and return LemmyResult from all the functions. It seems like a bad idea to use diesel results for only these ones; I'd rather have the return types uniformly be Lemmy results.

@dessalines (Member Author):

I tried it, and .map_err(diesel::result::Error::into) works, but that's no less verbose than .with_lemmy_type(LemmyErrorType::NotFound), and it's less explicit.

Collaborator:

The ? operator does the conversion.

@dessalines (Member Author):

Unfortunately that'd require wrapping every result in Ok(...).
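
To illustrate the trade-off being discussed (with stand-in types; the real diesel error, LemmyError, and the .with_lemmy_type() extension live in their own crates and are not reproduced here):

```rust
// Stand-ins for illustration only.
struct DieselError;
struct LemmyError;
type LemmyResult<T> = Result<T, LemmyError>;

impl From<DieselError> for LemmyError {
    fn from(_: DieselError) -> Self {
        LemmyError
    }
}

fn run_query() -> Result<i32, DieselError> {
    Ok(42)
}

// With `?` the From conversion happens automatically, but the function body
// then ends with a plain value that has to be wrapped in Ok(..).
fn read_with_question_mark() -> LemmyResult<i32> {
    let value = run_query()?;
    Ok(value)
}

// Converting the error explicitly keeps the whole Result as the tail
// expression; this is roughly the role .with_lemmy_type(..) plays in the PR,
// which additionally attaches a specific LemmyErrorType.
fn read_with_map_err() -> LemmyResult<i32> {
    run_query().map_err(DieselError::into)
}
```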

local_site,
allowed_instances,
blocked_instances,
}))
})
.await?,
.await.map_err(|_e| LemmyErrorType::NotFound)?,
Member:

It's generally not a good idea to ignore errors, so use .with_lemmy_type() or simply remove the map_err.

@dessalines (Member Author):

I couldn't find a way around this one.

Member:

Right, this one is strange; in other places it is handled like this:

    .await
    .map_err(|e| anyhow::anyhow!("err getting activity: {e:?}"))

@dessalines (Member Author):

Done

@Nutomic (Member) commented Apr 10, 2025:

Is there currently any test coverage for pagination? Something like: create a few dozen posts, get the first page, then get the next few pages and the previous pages. If not, it should be added, whether as a Rust unit test, an API test, or the db perf check.
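
As a sketch of the kind of round-trip check being suggested, written against a simple in-memory list instead of the real Lemmy API (the function and its signature are purely illustrative):

```rust
/// Illustrative cursor pagination over a sorted slice; stands in for a
/// database-backed list endpoint.
fn list_page(items: &[i32], cursor: Option<i32>, back: bool, limit: usize) -> Vec<i32> {
    let mut page: Vec<i32> = items
        .iter()
        .copied()
        .filter(|&x| match cursor {
            Some(c) if back => x < c, // walk away from the cursor, backwards
            Some(c) => x > c,         // walk away from the cursor, forwards
            None => true,
        })
        .collect();
    if back {
        // Keep the `limit` items closest to the cursor, then restore order.
        page.reverse();
        page.truncate(limit);
        page.reverse();
    } else {
        page.truncate(limit);
    }
    page
}

#[test]
fn paginates_forward_and_back() {
    let items: Vec<i32> = (1..=30).collect();

    let first = list_page(&items, None, false, 10);
    assert_eq!(first, (1..=10).collect::<Vec<_>>());

    let second = list_page(&items, first.last().copied(), false, 10);
    assert_eq!(second, (11..=20).collect::<Vec<_>>());

    // Paging back from the start of the second page returns the first page.
    let back = list_page(&items, second.first().copied(), true, 10);
    assert_eq!(back, first);
}
```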

@dessalines (Member Author):

> Is there currently any test coverage for pagination? Something like: create a few dozen posts, get the first page, then get the next few pages and the previous pages. If not, it should be added, whether as a Rust unit test, an API test, or the db perf check.

Here

.optional()?;

if let Some(largest_subscribed) = largest_subscribed {
// TODO Not sure this is correct
Collaborator:

Not correct, because it has to use the same sort as the normal query for correct results, and it has to use the same filters for full optimization.

Probably best to (a rough sketch follows this list):

  • move everything currently in list to a new helper function like list_inner which takes additional parameters cursor_before_data: Option<Post>, community_id_for_prefetch: Option<CommunityId>
  • remove cursor_before_data and community_id_just_for_prefetch from PostQuery
  • make list just be a wrapper function that does mostly the same thing as the prefetch_cursor_before_data function here but uses 2 calls to list_inner (first one is for getting the upper bound)
  • also in the list wrapper function, add back the condition self.listing_type == Some(ListingType::Subscribed) and use if let to replace the self.local_user.is_some() condition, so getting an upper bound is only done for subscribed view
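
A very rough skeleton of that shape, with placeholder types and no real queries (the actual PostQuery, PostView, and prefetch logic live in lemmy_db_views and are not reproduced here):

```rust
// Placeholder types; the real definitions are in the Lemmy crates.
struct Post;
struct PostView;
struct CommunityId(i32);
struct LocalUser;

#[derive(PartialEq)]
enum ListingType {
    Subscribed,
    All,
}

struct PostQuery {
    listing_type: Option<ListingType>,
    local_user: Option<LocalUser>,
    // cursor_before_data and community_id_just_for_prefetch are removed from here.
}

impl PostQuery {
    /// All of the current query-building logic would move here, unchanged.
    fn list_inner(
        &self,
        _cursor_before_data: Option<Post>,
        _community_id_for_prefetch: Option<CommunityId>,
    ) -> Vec<PostView> {
        Vec::new() // placeholder for the real diesel query
    }

    /// Thin wrapper: only for the subscribed listing of a logged-in user,
    /// run list_inner once against the largest subscribed community to get an
    /// upper-bound cursor, then run the real query bounded by it.
    fn list(&self) -> Vec<PostView> {
        if self.listing_type == Some(ListingType::Subscribed) {
            if let Some(_local_user) = &self.local_user {
                let largest_subscribed: Option<CommunityId> = None; // looked up from follows
                let prefetch = self.list_inner(None, largest_subscribed);
                let upper_bound: Option<Post> = prefetch.last().map(|_| Post); // last row becomes the bound
                return self.list_inner(upper_bound, None);
            }
        }
        self.list_inner(None, None)
    }
}
```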

@dessalines (Member Author):

I'm going to make this its own issue and reference your comment, because this needs to be handled separately, and should be done in a clean and clear way that doesn't involve calling the same query twice.

We also need to clarify what specifically this is used for (I think only backwards pagination for the subscribed query?), because there are potentially much faster and clearer ways we could be doing subscribed queries.

Collaborator:

If this will be fixed separately, then temporarily disable the optimization completely in this PR by changing o.cursor_before_data to None in the call to paginate.

Collaborator:

I don't think it's reasonable to remove a significant optimization in the name of "prettying the code" without actually having a plan to add it back. What's the point? Why do you dislike calling a function so much?

It was 3 lines of code, and you made it into 10 lines of code that do something else, worse (incorrect).

> clean and clear way that doesn't involve calling the same query twice

It literally has to be the same query, except for the single change. That's the whole point of it. Otherwise the result will not be valid.

> We also need to clarify what specifically this is used for

It is used whenever multiple communities are queried, which is almost always. I think it's pretty clear from the comment in the code.

Collaborator:

Update the indexes to match the changes made here, including the removed uses of reverse_timestamp_key.

@dessalines (Member Author):

This one I'm not sure about, because I didn't remove any published sort. Anyway, fixing the indexes should probably wait on #5555.

Collaborator:

reverse_timestamp_key(published) must be replaced with published in the indexes

@dessalines (Member Author), Apr 15, 2025:

As an example, for one index:

"idx_post_featured_local_active" btree (featured_local DESC, hot_rank_active DESC, published DESC, id DESC)

You mean that should be changed to published ASC?

There are like 20 of these indexes, so even if they should be changed, it should probably wait on #5555.

@dullbananas (Collaborator), Apr 15, 2025:

No, that index stays the same. Only some indexes, including the one for the "old" sort type, used reverse_timestamp_key. Also I just realized that index changes might also be needed to match the removal of the featured column in the new/old sort. Both new and old sort will use the same index on (published, id), with both columns having the same direction.

These changes may be done later as long as you don't forget. (edit: I mentioned this in the issue you linked)

@dessalines (Member Author):

Oh, gotcha. I'll remove/edit those ones now to all use published DESC then.

@dessalines (Member Author):

Done now.

Collaborator:

(community_id, featured_community DESC, published DESC, id DESC) must be changed to (community_id, published DESC, id DESC) because sorting by featured is now under if sort != New && sort != Old

Also, there might be a similar index, without the community id, that needs to be changed too.

@dessalines (Member Author), Apr 15, 2025:

I'll add one for that specifically.

@dessalines (Member Author):

That's done, but there are a ton of complicated post indexes, and they all need to actually be tested in practice as part of #5555.

@phiresky (Collaborator) left a comment:

Seems like this PR removes a significant optimization for no particular reason (see the above comment).

@phiresky (Collaborator):

The context is linked in the code comment, but I will post it here since people may have forgotten. The TL;DR is that this prefetch is the difference between every page load fetching hundreds of thousands of posts vs fetching ~100, a difference between multiple seconds (remember when we had 60+ second page loads?) and <10ms.

PR: #3872

Important Comment 1: #2877 (comment)
Important Comment 2: #2877 (comment)

Visualization: [image]

@dessalines (Member Author):

My main issue isn't with that logic; it was just with post_view recursively calling itself, rather than prefetching the cursor (which was part of this refactor).

I've added #5621 to the 1.0 milestone, so we'll make sure this gets re-added / fixed in some form before any release.

I'm going to merge this (once CI passes) as it's blocking a lot of other PRs that need review.

@dessalines merged commit e431fce into main on Apr 16, 2025. 2 checks passed.