Skip to content

Adding support for private MUC messages to MAM #144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Dec 18, 2020

Conversation

guusdk
Copy link
Member

@guusdk guusdk commented Dec 11, 2020

No description provided.

@guusdk guusdk requested a review from Fishbowler December 11, 2020 13:35
@guusdk guusdk force-pushed the tmp_refactoring-mam-query branch from 0c575d1 to 1154b97 Compare December 11, 2020 14:46
…for MUC archive

The archived MUC messages are persisted to the database more than once: Openfire stores them in ofMucConversationLog, the Monitoring plugin in ofMessageArchive.

This commit introduces a configuration option to switch between the two: conversation.database.use-openfire-tables

The default for this new option is to use the database table provided by the Monitoring plugin, which is a change from the behavior prior to this commit.
…change

The property 'conversation.metadataArchiving' controls wether the plugin is considered 'enabled'. A restart of the plugin shouldn't be required for a change to the value of that property to become effective.
In MUCs, Private Messages (PMs) can be exchanged between two participants. When a participant retrieves its MUC messages, the PMs that they were involved in should also be returned.

The pre-existing implementation stores PMs as regular one-to-one messages. By doing this, the context of them being exchanged in a MUC is lost.

This commit additionally stores the PMs in context of a MUC (an additional database column is created for this).

To be able to query for PMs, it is needed to provide, aside of the archive that's being queried, the person for which messages are queried. Note that this is specific to queries of MUC archives, as for Personal archives, these values are equal (the person owning the archive is the one querying it).
This switches to the replacement API, which makes available a new field.

This does not have functional changes, as the existing code doesn't use the new data.
…able

Historically, Private Messages sent through MUC rooms are stored as if they were one-on-one messages (in the personal archives of users). With igniterealtime#133 these PMs are now also stored in the MUC archive.

Existing functionality might depend on these messages to be stored in the personal archives, or might break when these messages start showing up in the MUC archives.

The changes in igniterealtime#133 retain the old functionality, but also adds the messages to the MUC archives (there's duplication now).

Configuration options should be made available to control this behavior.

Two new properties to be introduced, both take boolean values, both default to true:

- conversation.roomArchiving.PMinPersonalArchive
- conversation.roomArchiving.PMinRoomArchive
When there's an issue interacting with the database while retrieving messages, those issues are currently ignored. This leads to results that are incomplete.

Fail-fast behavior is desired here: when there's a database error, thrown an exception.

It's probably best to make this configurable, to allow administrators to fall back to ignoring errors, for those instances where, for a reason that's currently beyond me, have significant amounts of database issues.

The property that can be used to control this behavior is archive.ignore-retrieval-exceptions. It's value is a boolean. false, which will be the new default, will cause XMPP errors to be generated, while true restores the old behavior of ignoring errors.
@guusdk guusdk force-pushed the tmp_refactoring-mam-query branch from 05ed02f to 385effc Compare December 14, 2020 09:40
@guusdk
Copy link
Member Author

guusdk commented Dec 14, 2020

I've now rebased this to take in @akrherz's work with moving to Github Actions (see #149)

@guusdk
Copy link
Member Author

guusdk commented Dec 14, 2020

This branch is currently largely finished, with the notable exception of Lucene. That needs to be modified to work with private messages.

@Fishbowler
Copy link
Member

Fishbowler commented Dec 15, 2020

Issue: DMs are recorded twice in the database

Setup:

  • Using Openfire Docker Compose in clustered mode
  • 3 users connected (user 1 to node 1, user 2 & 3 to node 2)
  • Everyone joins MUC2
  • Everyone sends a message each to the MUC
  • User1 sends a DM to User2
  • User2 sends a DM to User3
  • User1 sends one more message to the open MUC

When looking at the ofMessageArchive table, the MUC messages are stored once, but the DMs are stored twice with slightly different information in some columns, but the same stanzas.

image

The effect of this is that when a user queries their personal archive, both copies are returned.

@Fishbowler
Copy link
Member

Fishbowler commented Dec 15, 2020

Issue: Using RSM "after" gets no results and only a count.

Setup: As above

Steps:

  • Send the following stanza, where the after is the stanza-id of the 3rd MUC message. Depending on the user, this should include either 1 or 2 DMs plus 1 MUC message.
<iq to="[email protected]" id="dantest1" type="set">
    <query xmlns='urn:xmpp:mam:2' queryid='00003'>
        <x xmlns='jabber:x:data' type='submit'>
            <field var='FORM_TYPE'>
                <value>urn:xmpp:mam:2</value>
            </field>
        </x>
        <set xmlns='http://jabber.org/protocol/rsm'>
            <max>100</max>
            <after>4742b839-11ba-4d1c-af90-30a203ac7e41</after>
        </set>
    </query>
</iq>
  • See the following returned:
<iq type="result" id="dantest1" from="[email protected]" to="[email protected]/DaCa000346.local">
    <fin xmlns="urn:xmpp:mam:2" queryid="00003" complete="true">
        <set xmlns="http://jabber.org/protocol/rsm">
            <count>5</count>
        </set>
    </fin>
</iq>

I assume that the reason this is 5 and not 3 is because of the bug above. Nope. It's 5 when I look for messages after the first message too. 🤔
Modifying the Max value (or omitting it) in the query does not modify the result.

Relevant OF log lines:

2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Finding messages in archive '[email protected]' for user '[email protected]' with start date 'null', end date '2020-12-15T14:42:52.774+0000' with 'null', query: 'null' and resultset 'XmppResultSet{after=4742b839-11ba-4d1c-af90-30a203ac7e41, before=null, index=null, max=100, first=null, firstIndex=null, last=null, count=null, complete=false}', useStableId 'true'.
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Request for message archive of room '[email protected]' did not specify a start date. Using EPOCH.
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Looking for ID of the message with stable/unique stanza ID 4742b839-11ba-4d1c-af90-30a203ac7e41
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Found stable/unique stanza ID 4742b839-11ba-4d1c-af90-30a203ac7e41 in message with ID 53.
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Request for message archive of room '[email protected]' resulted in the following query data: PaginatedMessageQuery{startDate=Thu Jan 01 00:00:00 UTC 1970, endDate=Tue Dec 15 14:42:52 UTC 2020, [email protected], [email protected], with='null'}
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.impl.MucMamPersistenceManager - Request for message archive of room '[email protected]' found a total of 5 applicable messages. Of these, 0 were actually retrieved from the database.
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler - MAM: found: 0 items
2020.12.15 14:42:52 DEBUG [message-archive-handler-12]: com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler - Retrieved 0 messages from archive.

@Fishbowler
Copy link
Member

Fishbowler commented Dec 15, 2020

Issue: The storage of DMs is inconsistent

Setup: as above

Steps: None

Looking at the data stored in ofMessageArchive, there's an inconsistency in the data written to the archive.
Looking at the 2 DMs sent, the stored stanzas are different in that the 2nd message has the "to" attribute translated from roomJID to realJID.
All 3 users were using identical clients, so the only difference I can see is that the first DM (the one I think is correct) is the only one to cross the cluster.

DM from User 1 to User 2

Sent from user1's client:

<message to='[email protected]/user2' id='66B4d-149' type='chat'>
  <thread>W4tZyF</thread>
  <x xmlns='http://jabber.org/protocol/muc#user'></x>
  <body>Whisper 1 to 2</body>
  <x xmlns="jabber:x:event">
    <offline/>
    <composing/>
  </x>
  <active xmlns='http://jabber.org/protocol/chatstates'/>
</message>

Stored in the archive:

<message xmlns="jabber:client" to="[email protected]/user2" id="66B4d-149" type="chat" from="[email protected]/user1">
    <thread>W4tZyF</thread>
    <x xmlns="http://jabber.org/protocol/muc#user"></x>
    <body>Whisper 1 to 2</body>
    <x xmlns="jabber:x:event">
        <offline/>
        <composing/>
    </x>
    <active xmlns="http://jabber.org/protocol/chatstates"></active>
    <stanza-id xmlns="urn:xmpp:sid:0" id="48e68c9a-8f29-4fdb-afa5-3693512f6398" by="[email protected]"></stanza-id>
    <addresses xmlns="http://jabber.org/protocol/address">
        <address type="ofrom" jid="[email protected]"/>
    </addresses>
</message>

DM from User 2 to User 3

Sent from user2's client:

<message to='[email protected]/user3' id='963bH-151' type='chat'>
  <thread>inJ2hF</thread>
  <x xmlns='http://jabber.org/protocol/muc#user'></x>
  <body>Whisper 2 to 3</body>
  <x xmlns="jabber:x:event">
    <offline/>
    <composing/>
  </x>
  <active xmlns='http://jabber.org/protocol/chatstates'/>
</message>

Stored in the archive:

<message xmlns="jabber:client" to="[email protected]/DaCa000346.local" id="963bH-151" type="chat" from="[email protected]/user2">
    <thread>inJ2hF</thread>
    <x xmlns="http://jabber.org/protocol/muc#user"></x>
    <body>Whisper 2 to 3</body>
    <x xmlns="jabber:x:event">
        <offline/>
        <composing/>
    </x>
    <active xmlns="http://jabber.org/protocol/chatstates"></active>
    <stanza-id xmlns="urn:xmpp:sid:0" id="f1216887-bd44-46c4-9380-44d9f4dc84b8" by="[email protected]"></stanza-id>
    <addresses xmlns="http://jabber.org/protocol/address">
        <address type="ofrom" jid="[email protected]"/>
    </addresses>
</message>

…in MUC archives

This adds text-based search for archived messages that were exchanged as private messages in a MUC.

This commit restructures the Lucene index that's used for this purpose, meaning that the index must be rebuild before usage!
With the changes for igniterealtime#146, the database table used for MUC queries is configurable. This commit makes sure that the correct table is used for lookups of some metadata (SSID, first and last date).
@guusdk
Copy link
Member Author

guusdk commented Dec 16, 2020

Issue: DMs are recorded twice in the database

This is intended behavior: Private Messages were initially stored in the personal archives (as if they were regular one-on-one messages). In this branch, I added them to the MUC archives as well, but choose to retain existing functionality (in the hope of not breaking something). In #147 this was made configurable - having both enabled by default. This causes the duplication (but retains backwards compatibility. Maybe we should choose to have only one enabled by default?

Issue: Using RSM "after" gets no results and only a count.

This was caused by the lookup for a database ID based on the SSID to be based on an incorrect table. This has been fixed in fc2a0d0.

The count being fixed is, I believe, desired behavior. You're paging to a result set, but you're not limiting the results in the (complete) resultset (which you'd do if you added a filter, like a 'query' or 'with' field). XEP-0059 RSM specifies: "The responding entity SHOULD also include the number of items in the full result set (which MAY be approximate) encapsulated in a element." That's what I think is happening - the complete result set is 5 messages large, which is still the case even if you request only the last few elements from the result set.

Issue: The storage of DMs is inconsistent

On a single server, I'm seeing that the stanza is saved using the real JID in the to attribute. By looking at the code, I can't immediately explain why the 'to' address should be inconsistent. From what I have found so far is that it should consistently log a real JID (which might be one of multiple, in case the recipient joined the MUC using more than one client using the same nickname). As I've explained above, private messages are stored twice: once in the 'personal archive' and once in the 'MUC archive'. Do both share the same inconsistency?

I agree that this should be consistent, but I'm not sure what variant to prefer. What makes you prefer one over the other?

Of more importance (to the implementation that retrieves messages like this from the database) is the addressing in the other database columns. Are those consistent across the cluster?

@Fishbowler
Copy link
Member

Of more importance ... is the addressing in the other database columns. Are those consistent across the cluster?

Maybe not...?
image

The first row here has the users the wrong way round, right?

@Fishbowler
Copy link
Member

Tested conversation.database.use-openfire-tables - this exhibits the same behaviour as before. Most messages are delivered for MAM requests for MUCs and personal requests, but PMs aren't retrieved.

Tested conversation.roomArchiving.PMinPersonalArchive and conversation.roomArchiving.PMinRoomArchive. With both enabled, it's possible for MAM queries to return both copies. With both disabled, PMs aren't ever stored to the Openfire database, and aren't returned for regular or full-text queries.

Testing "with" values return appropriate results, and return the same MAM results for a MUC regardless of whether a real JID or room JID is used.

@guusdk
Copy link
Member Author

guusdk commented Dec 17, 2020

The inconsistent addressing is undesirable, as it might leak the 'real' JID of an occupant to the sender (that can retrieve the modified message from the archive). This issue is to be addressed in Openfire. See OF-2163 / igniterealtime/Openfire#1770

@guusdk guusdk marked this pull request as ready for review December 17, 2020 11:27
With igniterealtime#133, private messages sent in a MUC are now stored as part of the MUC archive. The old behavior, storing them in the personal archive, is still supported, but has been made configurable.

This commit toggles the default value for that setting (`conversation.roomArchiving.PMinPersonalArchive`) from 'true' to 'false'.
…ional value

The 'is PM for JID' value is optional (not all messages are private messages), so it can be null. Null values cannot be parsed as JIDs.
The Openfire admin console allows administrators to look at a transcript of a conversation. This already includes private messages. Messages that are private messages should be annotated as such, otherwise, things get confusing.
No functional changes, other than that the PDFs that are generated might look slightly different.
@Fishbowler
Copy link
Member

Performed further testing on semi-anonymous MUCs, the effect of upgrades on clients, archiving completely disabled, and another pass looking for any leaked personal info. Nothing untoward at all. With all previous issues resolved, this is ready to merge 👍

@Fishbowler Fishbowler merged commit be1a9d2 into igniterealtime:master Dec 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants