#1423 MAM: storage of "empty" messages

Reporter ge0rg
Owner Nobody
Created
Updated
Stars ★ (1)
Tags
  • Status-New
  • Priority-Medium
  • Type-Defect
  1. ge0rg on

    The built-in mod_mam of prosody 0.11 has sophisticated logic for deciding which message elements to store:

    1. remove "useless" elements (as defined by `dont_archive_namespaces`)
    2. if the message isn't empty, store it.

    This is good for reducing the memory and network footprint of MAM, but it results in the following messages ending up in MAM (real examples, JIDs pseudonymized):

    ```
    <message type='chat' xml:lang='en' from='user@jabberfr.org/poezio' id='b867fc06497a494e945a84b73d565c0a' to='georg@yax.im'>
      <origin-id id='b867fc06497a494e945a84b73d565c0a' xmlns='urn:xmpp:sid:0'/>
    </message>

    <message type='chat' xml:lang='en' to='georg@yax.im/yaxim' id='7c0c163405834bfdb605eaf09333caa3' from='poezio@muc.poez.io/occupant'>
      <origin-id id='7c0c163405834bfdb605eaf09333caa3' xmlns='urn:xmpp:sid:0'/>
      <x xmlns='http://jabber.org/protocol/muc#user'/>
    </message>

    <message type='chat' xml:lang='en' to='georg@yax.im/poezio' id='a0acfeda-d0ed-4f65-a091-b8c5bbb6a31e-21AA7' from='muc@chat.yax.im/occupant'>
      <x xmlns='http://jabber.org/protocol/muc#user'/>
    </message>

    <message type='chat' xmlns='jabber:client' to='georg@yax.im/yaxim' id='Kr510-58' from='contact@yax.im/jitsi-31plbtd'>
      <thread>QWlT06</thread>
    </message>
    ```

    Now obviously, stripping origin-id and thread ID from archived messages would be counter-productive. Therefore I suggest the following approach instead:

    1. make a copy of the message
    2. strip "useless" elements from the copy
    3. if the copy isn't empty, store the original

    Or, if you want to keep the storage reduction of the original solution, introduce a second variable `strip_archive_namespaces`:

    1. strip elements that match `strip_archive_namespaces` (e.g. chat states)
    2. make a copy of the message
    3. strip `dont_archive_namespaces` elements from the copy (e.g. thread, origin-id, muc-x)
    4. if the copy isn't empty, store the "original" from after step 1

    There is still a small issue with this overall approach, regarding mediated MUC invitations <https://xmpp.org/extensions/xep-0045.html#invite-mediated>: if you strip the <x/> element from the copy, the invitation contained within will be gone as well, so an invitation without any other content would be treated as empty and not archived. However, hopefully, all mediated invitation implementations will also provide a legacy body, so the message would end up in MAM nevertheless. At least, we can only hope so.
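The second proposal (strip, copy, probe, store) can be sketched as follows. This is an illustration in Python rather than Prosody's Lua, and it is not mod_mam's actual code: the function `stanza_to_archive`, the helper names, the example namespace sets, and the choice to match `<thread/>` by local name are all assumptions made for this sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical example values, not Prosody defaults.
STRIP_ARCHIVE_NAMESPACES = {"http://jabber.org/protocol/chatstates"}
DONT_ARCHIVE_NAMESPACES = {"urn:xmpp:sid:0", "http://jabber.org/protocol/muc#user"}
# <thread/> has no namespace of its own, so this sketch matches it by name.
DONT_ARCHIVE_NAMES = {"thread"}


def namespace_of(element):
    """Return the namespace URI of an ElementTree element ('' if none)."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def local_name(element):
    """Return the tag name without its '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def stanza_to_archive(message, strip_ns, ignore_ns, ignore_names):
    """Return the stanza to store, or None if it carries no real payload."""
    # Step 1: drop elements that should never be stored (e.g. chat states).
    stored = copy.deepcopy(message)
    for child in list(stored):
        if namespace_of(child) in strip_ns:
            stored.remove(child)

    # Steps 2 and 3: strip the "does not count as content" elements
    # from a throwaway copy only.
    probe = copy.deepcopy(stored)
    for child in list(probe):
        if namespace_of(child) in ignore_ns or local_name(child) in ignore_names:
            probe.remove(child)

    # Step 4: if the copy is empty, store nothing; otherwise store the
    # stanza from after step 1, with origin-id, thread, etc. intact.
    return stored if len(probe) > 0 else None
```

With these example sets, each of the four stanzas quoted above would yield `None`, while a message with a `<body/>` would be stored together with its origin-id and thread.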

  2. ge0rg on

    Update: these messages make up roughly 12% of the data in my server's MAM:

        sqlite> select count(value) from prosodyarchive where host="yax.im" and store="archive2";
        201847
        sqlite> select count(value) from prosodyarchive where host="yax.im" and store="archive2" and not value like "%<body%" and not value like "%<received%" and length(value) <300;
        24655
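The two counts above give 24655 / 201847 ≈ 12.2%. The same measurement could be scripted roughly like this. It is a sketch only: it assumes a table named `prosodyarchive` with `host`, `store` and `value` columns, as used in the queries above, and the function name `empty_share` is made up for this example.

```python
import sqlite3


def empty_share(conn, host, store="archive2", max_len=300):
    """Fraction of archived stanzas that look 'empty' by the heuristic above."""
    total = conn.execute(
        "SELECT count(value) FROM prosodyarchive WHERE host = ? AND store = ?",
        (host, store),
    ).fetchone()[0]
    # Same heuristic as the manual query: no body, no receipt, short stanza.
    small = conn.execute(
        "SELECT count(value) FROM prosodyarchive "
        "WHERE host = ? AND store = ? "
        "AND value NOT LIKE '%<body%' AND value NOT LIKE '%<received%' "
        "AND length(value) < ?",
        (host, store, max_len),
    ).fetchone()[0]
    return small / total if total else 0.0
```

Note that the length and substring checks are only a heuristic, so the result is an estimate rather than an exact count of empty stanzas.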
