Skip to content

Community post tags (part 3/3: federation) #5636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Community post tags (part 3/3: federation) #5636

wants to merge 12 commits into from

Conversation

phiresky
Copy link
Collaborator

@phiresky phiresky commented Apr 22, 2025

This is the third and final part of post tags, implementing federation of tags.

No new activities are added - federation is purely done through the existing UpdateCommunity and CreateOrUpdatePost activities.

  • On the group (community), the available tags for posts are attached as a lemmy:tagsForPosts attribute.
  • On each page (post), the associated tags are attached within the tag: attribute (which comes from the ActivityStreams vocabulary)

Example community activity:

{
  "id": "http://lemmy-alpha:8541/activities/update/971bc72f-e316-4efd-a096-523b65ded34c",
  "actor": "http://lemmy-alpha:8541/u/lemmy_alpha",
  "to": [
    "http://lemmy-alpha:8541/c/6PbAFiVY80",
    "https://www.w3.org/ns/activitystreams#Public"
  ],
  "object": {
    "type": "Group",
    "id": "http://lemmy-alpha:8541/c/6PbAFiVY80",
    "name": "6PbAFiVY80",
    /* ... */
    "lemmy:tagsForPosts": [
      {
        "type": "lemmy:CommunityTag",
        "id": "http://lemmy-alpha:8541/c/6PbAFiVY80/tag/nneflhujpm",
        "display_name": "Some Post Tag Name"
      }
    ]
  },
  "cc": ["http://lemmy-alpha:8541/c/6PbAFiVY80"],
  "type": "Update"
}

Example post activity:

{
  "actor": "http://lemmy-alpha:8541/c/6PbAFiVY80",
  "type": "Announce",
  "id": "http://lemmy-alpha:8541/activities/announce/page/9bf77a89-a248-47fd-9778-4f1ba16c6394",
  "to": [
    "http://lemmy-alpha:8541/c/6PbAFiVY80",
    "https://www.w3.org/ns/activitystreams#Public"
  ],
  "cc": ["http://lemmy-alpha:8541/c/6PbAFiVY80/followers"],
  "object": {
    "id": "http://lemmy-alpha:8541/post/1",
    "actor": "http://lemmy-alpha:8541/u/lemmy_alpha",
    "type": "Page",
    /* ... */
    "tag": [
      {
        "type": "lemmy:CommunityTag",
        "id": "http://lemmy-alpha:8541/c/6PbAFiVY80/tag/eomfsnm9om",
        "display_name": "eoMFSNm9Om"
      },
      {
        "type": "lemmy:CommunityTag",
        "id": "http://lemmy-alpha:8541/c/6PbAFiVY80/tag/k8mm66irqo",
        "display_name": "k8Mm66iRQO"
      },
      {
        "href": "http://lemmy-alpha:8541/post/1",
        "name": "#uFKPYSi5d_",
        "type": "Hashtag"
      }
    ]
  }
}

Older Lemmy versions should silently ignore the new properties due to deserialize_skip_error on the Page tag property.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest concern is that there are several conflicting cases of community tag updates, which need to validate authorship and community tags.

Look at crates/api_common/src/tags.rs and possibly move it so that both the apub and API can take advantage of it.


impl PostTag {
pub async fn set(pool: &mut DbPool<'_>, tags: &[PostTagForm]) -> LemmyResult<Vec<Self>> {
let post_id = tags.first().map(|t| t.post_id).unwrap_or_default();
PostTag::delete_for_post(pool, post_id).await?;
PostTag::create_many(pool, tags).await
}
pub async fn set_from_apub(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this and update_post_tags( ?

They seem to be overlapping, and they both need to be validating the tags and the author. update_post_tags does this, but this one doesn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • set_from_apub trusts the federation event, since it is assumed the sending instance has verified authorship of the post changes (should be the same as other post edits I guess? Post::insert_apub also doesn't do author checks). The surrounding function can only be called if the rest of the post is currently being updated, so that can only happen if the permissions fit, no?

  • set_from_apub looks up via ap_id and not id

The is_superset vs community_id check could I think be merged into the same thing of the methods. It would just add an additional unnecessary DB query to the update_post_tags function.

And the lookup via ap_id I don't think could be merged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for is_author or community mod seems unnecessary both via api and federation. In both cases only the author can create or edit a post. But both federation and api need to check that the tag is valid for the community. To convert ap_id to db id can use a separate function which internally calls update_post_tags for the tag validation and db insert.

pool: &mut DbPool<'_>,
community_id: CommunityId,
ap_tags: Vec<TagInsertForm>,
) -> LemmyResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all this below, you could just do an upsert, which we have a lot of over the codebase.

https://github.com/LemmyNet/lemmy/blob/main/crates/db_schema/src/impls/person.rs#L74

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An upsert can't handle deletes, can it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not DB deletes, but here its just the deleted column, which makes it an update, so upsert should work.

Comment on lines +82 to +85
/// We add available post tags as a custom field on the group. another option would be to use the
/// standard "attachments" property, but it seems unlikely for there to be a benefit to that.
#[serde(default, rename = "lemmy:tagsForPosts")]
pub(crate) tags_for_posts: Vec<LemmyCommunityTag>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lemmy: necessary here? For all our other custom values/fields we just put "value": "lemmy:value" in https://join-lemmy.org/context.json

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True the prefix is unnecessary. The name would also be simpler with post_tags.

Copy link
Collaborator Author

@phiresky phiresky Apr 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not strictly necessary, yes. but with how much of a mess apub already is, I think it's important to not make the mess worse.

Lemmy doesn't even send its full context anymore, instead referencing a centrally hosted context.json file. Which is probably good because of how we would otherwise send hundreds of copies of that context around per second. But as a reminder, this context is theoretically essential for being able to correctly parse every activity. And this file is not at all versioned, can not be versioned, and there is no way to actually guarantee correctness - how would a client now how long to cache it? What if there is a (e.g. political) split in the future or the domain goes away?

Really, the use of json-ld in activitypub is (or has become) a farce, with most implementations just completely ignoring it and hoping for the best. It's a wonder it still mostly works. If there was an actual AP test suite, almost all implementations would fail it. There's many discussions about how json-ld should be used (e.g. here), but I think the least we can do is be unambiguous. Directly prefixing an extension to make it clear it is our extension allows the resulting objects to be parsed unambiguously for everyone without actually implementing any JSON-LD-specific shenanigans.

We know that this variable is a lemmy-introduced extension to the spec. As such, it should be clear to every implementor that it is so. The self contained way to do this is to prefix the variable directly. The alternative would be ambiguity, and you can imagine every other AP implementor adding a // this comes from lemmy comment before their tagsForPosts variable, just like we add // this is a peertube extension comment on some other property.

TL;DR: inlining the prefix is a middle ground where both people who hate json-ld and people who think it's essential can meet, since it is unambiguous and future-proof without actually supporting json-ld.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Youre right JSON-LD is practically unused and all the major platforms use simple JSON. So supporting it is not really a priority. If we add lemmy: prefix for our custom fields that should be consistent for all fields, and should be done in a separate issue/pr, not in the middle of an unrelated change. Besides other platforms like Peertube are unlikely to change it either.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not really possible to change later or for the other properties because we don't implement Json-LD, which means the right time to do this is now in order to not make the situation worse

Copy link
Member

@dessalines dessalines May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think consistency is important for maintainabily also. Because then instead of finding 20 different ways of doing things and naming conventions, you keep strictly to one. Then later on, when you realize a rework is necessary, you can fix them all at once, in a single PR. I'm also fairly strict on naming and standardized conventions for that reason.

When you find something that could be better, its best to stick to the existing convention, and open up a separate issue where you can do the rework for all of them at once. This is better than leaving scattered inconsistencies around.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON-LD people seem to think that names of extension properties should be fully qualified:

https://codeberg.org/fediverse/fep/src/branch/main/fep/e229/fep-e229.md#ld-unaware-producers

E.g. https://join-lemmy.org/ns#tagsForPosts

I prefer the shortest form, though. Fully qualified and prefixed forms are only necessary if you expect a conflict with another application.

Copy link
Collaborator Author

@phiresky phiresky May 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, people always prefer shorter because it takes less space and looks cleaner. The problem is that it's ambiguous, and since lemmy doesn't even send the context anymore it's not well defined either because there is no versioning of the context, no integrity checks (sth like SRI), no real caching policy, and it's centralized to an actor/domain that may or may not exist in the future. These are all complex topics that have very fine tuned solutions in e.g. HTTP/HTML, and are basically handled as "i'm sure it will work out" in AP, which makes it important for everyone to be considerate to the other parties involved and help each other out where they can.

Of course within lemmy it will work fine, but it places the burden on everyone else.

if you expect a conflict

That's a terrible way to think about this. It implies that we are aware of every fediverse application that exists and can predict every application that will ever exist. Making something intentionally ambiguous just because it is a tiny bit prettier.

I understand that people have opinions, but it's pretty demotivating to be so misaligned on this, when to me it feels like what everyone here wants is actively hostile to the fediverse as a whole. It feels obvious to me as well to try and not make the situation worse, even if I know that Lemmy intentionally already made the decision to not really care about json-ld.

"I'm sure it will work out" is just not the right attitude to me when it comes to forward compatibility and interoperability with other entities.

Sure, I can open an issue, but it's just not very interesting, especially discussing about consistency with other custom lemmy attributes, which is basically an impossible problem to fix retroactively without disgusting compat layers in the code base, likely incompatibility with other AP software, or actual LD code which is mountains of work that would likely be rejected. Meanwhile in this new code, this was literally 6 characters that will never need to be touched again and, to me, is adequately explained in one line of comment.

I'm not currently willing to to make a decision I personally consider hostile to the fediverse, so it will probably take some time for me to work on this again. If anyone else wants to pick it up meanwhile feel free.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piefed already has tags (flairs in their UI) federated and to not break things I would strongly consider just leaving lemmy:CommunityTag and lemmy:tagsForPosts as is.

Copy link
Member

@Nutomic Nutomic Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Youre right Piefed actually federates these fields with lemmy: prefix, not even using piefed: as prefix. Very surprising to me and I still dont like how inconsistent it is with other fields, but we might as well follow their example.

curl -H 'Accept: application/activity+json' https://piefed.social/c/piefed_meta | jq
curl -H 'Accept: application/activity+json' https://piefed.social/post/900786 | jq

Comment on lines +26 to +32
pub struct LemmyCommunityTag {
#[serde(rename = "type")]
kind: LemmyCommunityTagType,
pub id: Url,
// the name of the tag can be updated by the moderators of the community. The ID is fixed.
pub display_name: String,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

Copy link
Collaborator Author

@phiresky phiresky Apr 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "display_name" is an artifact from discussions in the previous PR about having either an additional id_slug property (which can not be changed and has no special characters or spaces) or an additional name property (which may be displayed or used in URLs without spaces or special characters, as opposed to the display name which can have special characters)

I can rename it back to name but idk what @dessalines thinks.

Copy link
Member

@dessalines dessalines Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This discussion was here, but my recommendation was that just like person, and community, that tags have both names (required, ascii_limited) and optional display_names. Because that was rejected, and you made display names not null, we're getting confused here again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is AP code though, DB considerations shouldn't apply. We also map name to display_name for both users and communities.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename it to name here and keep the DB code and logic as is when I next work on this, unless someone disagrees (and has a better suggestion)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is AP code though, DB considerations shouldn't apply. We also map name to display_name for both users and communities.

That's true, kinda weird that with activitypub name is the optional one, and preferred_username is the required one, so we have to switch them around.

I still think the optional display_name should be added for tags at some point, but it doesn't have to be in this PR.

.await?;
}
}
}
Copy link
Member

@Nutomic Nutomic Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This shouldnt need any extra logic for displayname changes, just insert all tags which are not deleted.
  • Its inefficient to make db calls in a loop, instead pass a vec of forms with on conflict
  • Should be inside a transaction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if transaction is necessary, since you can do multiple upserts in a single statement .values(forms), but ya upsert should be done.

let creator = Person::read(&mut context.pool(), creator_id).await?;
let community_id = self.community_id;
let community = Community::read(&mut context.pool(), community_id).await?;
// not sure why we need the local instance id here...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5515

.collect(),
)
.await
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems pointless, you can convert to tag inside community_override_all_from_apub

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not because the DB schema crate does not depend on the API crate

@@ -95,6 +160,9 @@ test("Update post tags", async () => {
});
expect(postRes.post_view.post.id).toBeDefined();

// Wait post federated
await waitForPost(beta, postRes.post_view.post);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment both community and post are created on alpha so it doesnt fully test the federation. Change it to create a post from beta for better test coverage.

@phiresky
Copy link
Collaborator Author

phiresky commented Apr 26, 2025

I'm getting some test failures in the private_community tests, e.g. here:

not_found thrown in:

./api_tests/src/private_community.spec.ts:162:3:

await waitUntil(
() => resolvePost(user, post0.post_view.post),
p => p?.post?.post.id != undefined,
);

I'm not sure why these are happening because those parts of the code should not be affected. Does anyone know whether something has changed wrt private communities that is maybe unrelated to this PR?

@dessalines
Copy link
Member

Was probably just a CI blip (we still have a few api tests that randomly get errors). I'll rerun now to make sure.

@phiresky
Copy link
Collaborator Author

phiresky commented May 5, 2025 via email

@dessalines
Copy link
Member

Its alright, no rush. We have a ton of other work to get caught on.

@rimu
Copy link

rimu commented May 7, 2025

Consider adding properties to the lemmy:CommunityTag structure for text color and background color? It makes them more visually interesting, as used here - https://piefed.social/c/fediverse

e.g.

      {
        "type": "lemmy:CommunityTag",
        "id": "http://lemmy-alpha:8541/c/6PbAFiVY80/tag/eomfsnm9om",
        "display_name": "eoMFSNm9Om",
        "text_color": '#000000',
        "background_color": '#dddddd'
      },

@rimu
Copy link

rimu commented May 7, 2025

Also possibly you might like to add a weight or order attribute, which determines the order tags are shown in when they are listed in the community sidebar. Not all tags are equal.

@dessalines
Copy link
Member

Not a bad idea, but hard-coding colors like that wouldn't be a good idea, esp for app and platforms that use their own color schemes, and adapt to dark modes.

Priority of tags is definitely a good idea.

I can't find any good field that has priority... so we have to come up with something on our own. Doesn't look like mastodon has any of these either.

@phiresky
Copy link
Collaborator Author

phiresky commented May 10, 2025

There's three pending changes for this PR:

  • using batch delete + upsert for the update of the community available tag list instead of loops
  • making the tests use three instances (separate community, post creator, tag editor)
  • renaming display_name back to name

otherwise it is done. Due to my lack of motivation after the unresolved discussions about the naming of JSON-LD AP extensions I'm likely not going to work on this PR in the near future. Anyone else should feel free to pick this up.

Any extensions (adding mulitple names for a tag, adding hierarchical tags, adding tag colors, priorities, etc) should likely be discussed and implemented separately after the base is finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants