Remove rest of page limit #5429

Merged: 161 commits into main, Apr 16, 2025

Conversation

@dessalines (Member) commented Feb 14, 2025:

This PR adds a lot, but it's mostly boilerplate.

  • Adding a page_cursor, page_back, and limit to every API list fetch.
    • The cursor_data is always fetched before the DB view list function. Ideally the list function should only ever run one query.
    • Tried to clean up a lot of the pre-fetching in post_view to fit this model.
  • Adding list, next_page, and prev_page to the result of every API list fetch (see the sketch after this list).
  • Adding a few missing indexes.
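
As a rough illustration of the request/response shape this describes (the names here are simplified stand-ins, not the actual Lemmy API structs):

```rust
// Simplified sketch of a cursor-paginated list endpoint; the real
// PaginationCursor and the per-endpoint request/response types live in the
// Lemmy crates and differ in detail.
pub struct PaginationCursor(pub String);

pub struct ListQuery {
    /// Cursor returned by a previous response's next_page or prev_page.
    pub page_cursor: Option<PaginationCursor>,
    /// When true, paginate backwards from the cursor.
    pub page_back: Option<bool>,
    /// Maximum number of items to return.
    pub limit: Option<i64>,
}

pub struct ListResponse<T> {
    /// The fetched page of items.
    pub items: Vec<T>,
    /// Cursor for the following page, if there is one.
    pub next_page: Option<PaginationCursor>,
    /// Cursor for the preceding page, if there is one.
    pub prev_page: Option<PaginationCursor>,
}
```

A client pages forward by passing the previous response's next_page as page_cursor, and backwards by passing prev_page with page_back set.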

Other cleanups:

  • Making most functions in db_schema and db_views return LemmyResult, so no more error wrappers are necessary in the API code.
    • This meant consolidating and removing many error types.
  • Removed some redundant structs for PostTag.
  • Cleaned up the keyword_block code a bit, and made sure it pre-fetches the keywords.

#4517

CouldntCreatePasswordResetRequest,
CouldntCreateLoginToken,
CouldntUpdateLocalSiteUrlBlocklist,
CouldntCreateEmailVerification,
@Nutomic (Member), Apr 10, 2025:

Is it really necessary to have separate error messages for all of these? At least the Create and Update errors could be merged, e.g. into a ChangeTaglineError.

@dessalines (Member Author):

I think it's worth it, mainly because a failed create denotes a constraint conflict, while a failed update means it's missing, or something else. They're already done anyway.

)]
#[cfg_attr(feature = "full", diesel(belongs_to(crate::source::post::Post)))]
#[cfg_attr(feature = "full", diesel(table_name = post_actions))]
#[cfg_attr(feature = "full", diesel(primary_key(person_id, post_id)))]
#[cfg_attr(feature = "full", diesel(check_for_backend(diesel::pg::Pg)))]
#[cfg_attr(feature = "full", ts(export))]
#[cfg_attr(feature = "full", cursor_keys_module(name = post_actions_keys))]
Member:

What is this for? The library doesn't have any docs, unfortunately.

@dessalines (Member Author):

It's for sorting. I use this in two upcoming PRs that are fairly simple and will make it clear.

Collaborator:

The CursorKeysModule derive here creates the db_schema::source::post::post_action_keys module. Until I add missing docs, you might understand it better by looking at references to the module's items. I couldn't quickly find a reference to post_actions_keys, but post view imports a similar module (post_keys as key).

let (_, id) = pids
  .as_slice()
  .first()
  .ok_or(LemmyErrorType::CouldntParsePaginationToken)?;
@Nutomic (Member), Apr 10, 2025:

This logic is used in various places; it should go into a helper method on PaginationCursor.
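
A minimal sketch of what such a helper could look like, using stand-in types rather than the real Lemmy definitions (PaginationCursor, LemmyErrorType, and the tuple layout of the decoded cursor are simplified assumptions here):

```rust
// Stand-ins for illustration only; the real types live in the Lemmy crates.
struct PaginationCursor(String);

#[derive(Debug)]
enum LemmyErrorType {
    CouldntParsePaginationToken,
}

impl PaginationCursor {
    /// Return the id from the first decoded (prefix, id) pair, or a parse error,
    /// so callers don't repeat the first()/ok_or dance everywhere.
    fn first_id<P, I: Copy>(pairs: &[(P, I)]) -> Result<I, LemmyErrorType> {
        pairs
            .first()
            .map(|(_, id)| *id)
            .ok_or(LemmyErrorType::CouldntParsePaginationToken)
    }
}
```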

@dessalines (Member Author):

Good idea, I'll do that now.

let sort = o.sort.unwrap_or_default();
let sort_direction = asc_if(sort == Old || sort == NameAsc);

let mut pq = paginate(query, sort_direction, o.cursor_data, None, o.page_back);
Member:

Why not keep using the query variable here?

@dessalines (Member Author):

It changes type after it passes through paginate, so I always rename it to paginated_query (or pq) to be clearer.
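
For context, a stripped-down illustration of why the variable gets a new name (paginate and the types here are stand-ins, not the real helpers):

```rust
struct QueryBuilder;
struct PaginatedQueryBuilder {
    inner: QueryBuilder,
}

// The stand-in paginate consumes the plain builder and returns a different
// type, so keeping the old `query` name afterwards would be misleading.
fn paginate(query: QueryBuilder) -> PaginatedQueryBuilder {
    PaginatedQueryBuilder { inner: query }
}

fn example() {
    let query = QueryBuilder;
    let pq = paginate(query); // `pq` is a PaginatedQueryBuilder, not a QueryBuilder
    let _ = pq.inner;
}
```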


Ok(res)
.await
.with_lemmy_type(LemmyErrorType::NotFound)
Member:

Why add an explicit NotFound error in so many places? Diesel's NotFound error is automatically converted to LemmyError NotFound, so this shouldn't be necessary at all.

@dessalines (Member Author):

So that all of these can explicitly return lemmy errors instead of diesel ones.

@dessalines (Member Author):

Part of this rework was to try to remove all diesel errors from db_schema and db_views, and return LemmyResult from all the functions. It seems like a bad idea to use diesel results for only these ones; I'd rather have the return types uniformly be Lemmy results.

@dessalines (Member Author):

I tried it, and .map_err(diesel::result::Error::into) works, but that's no less verbose than .with_lemmy_type(LemmyErrorType::NotFound), and it's less explicit.

Collaborator:

The ? operator does the conversion.

@dessalines (Member Author):

Unfortunately that'd require wrapping every result in Ok(...).
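
To illustrate the trade-off being discussed (with stand-in types; the real diesel error, LemmyError, and the .with_lemmy_type() extension live in their own crates and are not reproduced here):

```rust
// Stand-ins for illustration only.
struct DieselError;
struct LemmyError;
type LemmyResult<T> = Result<T, LemmyError>;

impl From<DieselError> for LemmyError {
    fn from(_: DieselError) -> Self {
        LemmyError
    }
}

fn run_query() -> Result<i32, DieselError> {
    Ok(42)
}

// With `?` the From conversion happens automatically, but the function body
// then ends with a plain value that has to be wrapped in Ok(..).
fn read_with_question_mark() -> LemmyResult<i32> {
    let value = run_query()?;
    Ok(value)
}

// Converting the error explicitly keeps the whole Result as the tail
// expression; this is roughly the role .with_lemmy_type(..) plays in the PR,
// which additionally attaches a specific LemmyErrorType.
fn read_with_map_err() -> LemmyResult<i32> {
    run_query().map_err(DieselError::into)
}
```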

local_site,
allowed_instances,
blocked_instances,
}))
})
.await?,
.await.map_err(|_e| LemmyErrorType::NotFound)?,
Member:

It's generally not a good idea to ignore errors, so use .with_lemmy_type() or simply remove the map_err.

@dessalines (Member Author):

I couldn't find a way around this one.

Member:

Right, this one is strange; in other places it is handled like this:

    .await
    .map_err(|e| anyhow::anyhow!("err getting activity: {e:?}"))

@dessalines (Member Author):

Done

@Nutomic (Member) commented Apr 10, 2025:

Is there currently any test coverage for pagination? Something like: create a few dozen posts, get the first page, then get the next few pages and the previous pages. If not, it should be added, whether as a Rust unit test, an API test, or the db perf check.
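
As a sketch of the kind of round-trip check being suggested, written against a simple in-memory list instead of the real Lemmy API (the function and its signature are purely illustrative):

```rust
/// Illustrative cursor pagination over a sorted slice; stands in for a
/// database-backed list endpoint.
fn list_page(items: &[i32], cursor: Option<i32>, back: bool, limit: usize) -> Vec<i32> {
    let mut page: Vec<i32> = items
        .iter()
        .copied()
        .filter(|&x| match cursor {
            Some(c) if back => x < c, // walk away from the cursor, backwards
            Some(c) => x > c,         // walk away from the cursor, forwards
            None => true,
        })
        .collect();
    if back {
        // Keep the `limit` items closest to the cursor, then restore order.
        page.reverse();
        page.truncate(limit);
        page.reverse();
    } else {
        page.truncate(limit);
    }
    page
}

#[test]
fn paginates_forward_and_back() {
    let items: Vec<i32> = (1..=30).collect();

    let first = list_page(&items, None, false, 10);
    assert_eq!(first, (1..=10).collect::<Vec<_>>());

    let second = list_page(&items, first.last().copied(), false, 10);
    assert_eq!(second, (11..=20).collect::<Vec<_>>());

    // Paging back from the start of the second page returns the first page.
    let back = list_page(&items, second.first().copied(), true, 10);
    assert_eq!(back, first);
}
```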

@dessalines (Member Author):

> Is there currently any test coverage for pagination? Something like: create a few dozen posts, get the first page, then get the next few pages and the previous pages. If not, it should be added, whether as a Rust unit test, an API test, or the db perf check.

Here

.optional()?;

if let Some(largest_subscribed) = largest_subscribed {
// TODO Not sure this is correct
Collaborator:

Not correct, because it has to use the same sort as the normal query for correct results, and it has to use the same filters for full optimization.

Probably best to (a rough sketch follows this list):

  • move everything currently in list to a new helper function like list_inner which takes additional parameters cursor_before_data: Option<Post>, community_id_for_prefetch: Option<CommunityId>
  • remove cursor_before_data and community_id_just_for_prefetch from PostQuery
  • make list just be a wrapper function that does mostly the same thing as the prefetch_cursor_before_data function here but uses 2 calls to list_inner (first one is for getting the upper bound)
  • also in the list wrapper function, add back the condition self.listing_type == Some(ListingType::Subscribed) and use if let to replace the self.local_user.is_some() condition, so getting an upper bound is only done for subscribed view
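
A very rough skeleton of that shape, with placeholder types and no real queries (the actual PostQuery, PostView, and prefetch logic live in lemmy_db_views and are not reproduced here):

```rust
// Placeholder types; the real definitions are in the Lemmy crates.
struct Post;
struct PostView;
struct CommunityId(i32);
struct LocalUser;

#[derive(PartialEq)]
enum ListingType {
    Subscribed,
    All,
}

struct PostQuery {
    listing_type: Option<ListingType>,
    local_user: Option<LocalUser>,
    // cursor_before_data and community_id_just_for_prefetch are removed from here.
}

impl PostQuery {
    /// All of the current query-building logic would move here, unchanged.
    fn list_inner(
        &self,
        _cursor_before_data: Option<Post>,
        _community_id_for_prefetch: Option<CommunityId>,
    ) -> Vec<PostView> {
        Vec::new() // placeholder for the real diesel query
    }

    /// Thin wrapper: only for the subscribed listing of a logged-in user,
    /// run list_inner once against the largest subscribed community to get an
    /// upper-bound cursor, then run the real query bounded by it.
    fn list(&self) -> Vec<PostView> {
        if self.listing_type == Some(ListingType::Subscribed) {
            if let Some(_local_user) = &self.local_user {
                let largest_subscribed: Option<CommunityId> = None; // looked up from follows
                let prefetch = self.list_inner(None, largest_subscribed);
                let upper_bound: Option<Post> = prefetch.last().map(|_| Post); // last row becomes the bound
                return self.list_inner(upper_bound, None);
            }
        }
        self.list_inner(None, None)
    }
}
```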

@dessalines (Member Author):

I'm going to make this its own issue and reference your comment, because this needs to be handled separately, and should be done in a clean and clear way that doesn't involve calling the same query twice.

We also need to clarify what specifically this is used for (I think only backwards pagination for the subscribed query?), because there are potentially much faster and clearer ways we could be doing subscribed queries.

Collaborator:

If this will be fixed separately, then temporarily disable the optimization completely in this PR by changing o.cursor_before_data to None in the call to paginate.

Collaborator:

I don't think it's reasonable to remove a significant optimization in the name of "prettying the code" without actually having a plan to add it back. What's the point? Why do you dislike calling a function so much?

It was 3 lines of code, and you made it into 10 lines of code that do something else, worse (incorrect).

> clean and clear way that doesn't involve calling the same query twice

It literally has to be the same query, except for the single change. That's the whole point of it. Otherwise the result will not be valid.

> We also need to clarify what specifically this is used for

It is used whenever multiple communities are queried, which is almost always. I think it's pretty clear from the comment in the code.

Collaborator:

Update the indexes to match the changes made here, including the removed uses of reverse_timestamp_key.

@dessalines (Member Author):

This one I'm not sure about, because I didn't remove any published sort. Anyway, fixing the indexes should probably wait on #5555.

Collaborator:

reverse_timestamp_key(published) must be replaced with published in the indexes

@dessalines (Member Author), Apr 15, 2025:

As an example, for one index:

"idx_post_featured_local_active" btree (featured_local DESC, hot_rank_active DESC, published DESC, id DESC)

You mean that should be changed to published ASC?

There are like 20 of these indexes, so even if they should be changed, it should probably wait on #5555.

@dullbananas (Collaborator), Apr 15, 2025:

No, that index stays the same. Only some indexes, including the one for the "old" sort type, used reverse_timestamp_key. Also I just realized that index changes might also be needed to match the removal of the featured column in the new/old sort. Both new and old sort will use the same index on (published, id), with both columns having the same direction.

These changes may be done later as long as you don't forget. (edit: I mentioned this in the issue you linked)

@dessalines (Member Author):

Oh, gotcha. I'll remove/edit those ones now to all use published DESC then.

@dessalines (Member Author):

Done now.

Collaborator:

(community_id, featured_community DESC, published DESC, id DESC) must be changed to (community_id, published DESC, id DESC) because sorting by featured is now under if sort != New && sort != Old

Also, there might be a similar index, without the community id, that needs to be changed too.

@dessalines (Member Author), Apr 15, 2025:

I'll add one for that specifically.

@dessalines (Member Author):

That's done, but there are a ton of complicated post indexes, and they all need to actually be tested in practice as part of #5555.

@phiresky (Collaborator) left a comment:

Seems like this PR removes a significant optimization for no particular reason (see the above comment).

@phiresky (Collaborator):

The context is linked in the code comment, but I will post it here since people may have forgotten. The TL;DR is that this prefetch is the difference between every page load fetching hundreds of thousands of posts vs fetching ~100, a difference between multiple seconds (remember when we had 60+ second page loads?) and <10ms.

PR: #3872

Important Comment 1: #2877 (comment)
Important Comment 2: #2877 (comment)

Visualization: [image]

@dessalines (Member Author):

My main issue isn't with that logic; it was just with post_view recursively calling itself, rather than prefetching the cursor (which was part of this refactor).

I've added #5621 to the 1.0 milestone, so we'll make sure this gets re-added / fixed in some form before any release.

I'm going to merge this (once CI passes) as it's blocking a lot of other PRs that need review.

@dessalines merged commit e431fce into main on Apr 16, 2025. 2 checks passed.