The built-in mod_mam of Prosody 0.11 has fairly sophisticated logic for deciding which message elements to store:
1. remove "useless" elements (as defined by `dont_archive_namespaces`)
2. if the message isn't empty, store it.
This is good for reducing the memory and network footprint of MAM, but it results in the following messages ending up in MAM (real examples, JIDs pseudonymized):
```
<message type='chat' xml:lang='en' from='user@jabberfr.org/poezio' id='b867fc06497a494e945a84b73d565c0a' to='georg@yax.im'>
<origin-id id='b867fc06497a494e945a84b73d565c0a' xmlns='urn:xmpp:sid:0'/>
</message>
<message type='chat' xml:lang='en' to='georg@yax.im/yaxim' id='7c0c163405834bfdb605eaf09333caa3' from='poezio@muc.poez.io/occupant'>
<origin-id id='7c0c163405834bfdb605eaf09333caa3' xmlns='urn:xmpp:sid:0'/>
<x xmlns='http://jabber.org/protocol/muc#user'/>
</message>
<message type='chat' xml:lang='en' to='georg@yax.im/poezio' id='a0acfeda-d0ed-4f65-a091-b8c5bbb6a31e-21AA7' from='muc@chat.yax.im/occupant'>
<x xmlns='http://jabber.org/protocol/muc#user'/>
</message>
<message type='chat' xmlns='jabber:client' to='georg@yax.im/yaxim' id='Kr510-58' from='contact@yax.im/jitsi-31plbtd'>
<thread>QWlT06</thread>
</message>
```
Now obviously, stripping origin-id and thread ID from archived messages would be counter-productive. Therefore I suggest the following approach instead:
1. make a copy of the message
2. strip "useless" elements from the copy
3. if the copy isn't empty, store the original
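The three steps above can be sketched roughly as follows. This is an illustration in Python, not Prosody's actual Lua code: the names `DONT_ARCHIVE_NAMESPACES`, `DONT_ARCHIVE_TAGS` and `should_archive` are hypothetical, and matching `<thread/>` by element name rather than by namespace is an assumption made for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the dont_archive_namespaces setting.
DONT_ARCHIVE_NAMESPACES = {
    "urn:xmpp:sid:0",
    "http://jabber.org/protocol/muc#user",
}
# Assumption: <thread/> lives in the stanza namespace, so match it by name.
DONT_ARCHIVE_TAGS = {"thread"}

def namespace_of(element):
    """Return the namespace part of an ElementTree tag, or '' if none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def local_name(element):
    """Return the tag name without its namespace."""
    return element.tag.rsplit("}", 1)[-1]

def should_archive(message):
    """Decide on a stripped copy; the original message is left untouched."""
    probe = copy.deepcopy(message)          # step 1: copy
    for child in list(probe):               # step 2: strip "useless" elements
        if (namespace_of(child) in DONT_ARCHIVE_NAMESPACES
                or local_name(child) in DONT_ARCHIVE_TAGS):
            probe.remove(child)
    return len(probe) > 0                   # step 3: anything left?

empty = ET.fromstring(
    "<message type='chat'><origin-id xmlns='urn:xmpp:sid:0' id='a1'/></message>")
chat = ET.fromstring(
    "<message type='chat'><body>hi</body>"
    "<origin-id xmlns='urn:xmpp:sid:0' id='a2'/></message>")
print(should_archive(empty))  # False
print(should_archive(chat))   # True
```

Because the decision is made on the copy, the message with a body would be stored with its origin-id still attached.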
Or, if you want to benefit from the storage reduction of the original solution, introduce a second variable `strip_archive_namespaces`:
1. strip elements that match `strip_archive_namespaces` (e.g. chat states)
2. make a copy of the message
3. strip `dont_archive_namespaces` elements from the copy (e.g. thread, origin-id, muc-x)
4. if the copy isn't empty, store the "original" from after step 1
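A rough sketch of this two-stage variant, again in Python rather than Prosody's Lua. `STRIP_ARCHIVE_NAMESPACES`, `DONT_ARCHIVE_TAGS` and `prepare_for_archive` are hypothetical names, and working on a copy instead of the live stanza is an assumption made so that delivery is not affected.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical settings: the first set is removed from what gets stored,
# the second set (plus tag names) is only ignored for the emptiness check.
STRIP_ARCHIVE_NAMESPACES = {"http://jabber.org/protocol/chatstates"}
DONT_ARCHIVE_NAMESPACES = {"urn:xmpp:sid:0", "http://jabber.org/protocol/muc#user"}
DONT_ARCHIVE_TAGS = {"thread"}  # assumption: matched by element name

def split_tag(element):
    """Split an ElementTree tag into (namespace, local name)."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, name = tag[1:].split("}", 1)
        return namespace, name
    return "", tag

def prepare_for_archive(message):
    """Return the stanza to store, or None if nothing worth archiving remains."""
    stored = copy.deepcopy(message)         # keep the live stanza untouched
    for child in list(stored):              # step 1: strip e.g. chat states
        if split_tag(child)[0] in STRIP_ARCHIVE_NAMESPACES:
            stored.remove(child)
    probe = copy.deepcopy(stored)           # step 2: copy
    for child in list(probe):               # step 3: strip ignorable elements
        namespace, name = split_tag(child)
        if namespace in DONT_ARCHIVE_NAMESPACES or name in DONT_ARCHIVE_TAGS:
            probe.remove(child)
    return stored if len(probe) > 0 else None   # step 4

msg = ET.fromstring(
    "<message type='chat'><body>hi</body><thread>QWlT06</thread>"
    "<active xmlns='http://jabber.org/protocol/chatstates'/></message>")
result = prepare_for_archive(msg)
print([split_tag(child)[1] for child in result])  # ['body', 'thread']
```

The stored stanza loses the chat state but keeps the thread ID, which is the storage saving of the original logic without dropping useful metadata.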
There is still a small issue with this overall approach, regarding mediated MUC invitations <https://xmpp.org/extensions/xep-0045.html#invite-mediated>:
If you strip the <x/> element, the invitation contained within will be gone as well. Hopefully, all mediated invitation implementations also provide a legacy body, so the message would end up in MAM nevertheless, but that is only a hope, not a guarantee.
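To illustrate the invitation problem, here is a small Python sketch that applies the copy-and-strip check to two mediated invitations, one without and one with a body. The function name `worth_archiving`, the ignored-namespace set and the JIDs are made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

MUC_USER = "http://jabber.org/protocol/muc#user"
# Hypothetical set of namespaces ignored when checking for emptiness.
IGNORED_NAMESPACES = {MUC_USER, "urn:xmpp:sid:0"}

def worth_archiving(message):
    """Strip ignored elements from a copy and report whether anything is left."""
    probe = copy.deepcopy(message)
    for child in list(probe):
        tag = child.tag
        if tag.startswith("{") and tag[1:].split("}", 1)[0] in IGNORED_NAMESPACES:
            probe.remove(child)
    return len(probe) > 0

# Illustrative stanzas; the JIDs are placeholders.
invite_only = ET.fromstring(
    "<message from='muc@chat.yax.im' to='georg@yax.im'>"
    f"<x xmlns='{MUC_USER}'><invite from='contact@yax.im/client'/></x>"
    "</message>"
)
invite_with_body = ET.fromstring(
    "<message from='muc@chat.yax.im' to='georg@yax.im'>"
    "<body>contact@yax.im invited you to muc@chat.yax.im</body>"
    f"<x xmlns='{MUC_USER}'><invite from='contact@yax.im/client'/></x>"
    "</message>"
)
print(worth_archiving(invite_only))       # False
print(worth_archiving(invite_with_body))  # True
```

Only the invitation that carries a body survives the check, which is why the approach depends on senders including one.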
ge0rg
Update: these messages make up roughly 12% of the data in my server's MAM:
sqlite> select count(value) from prosodyarchive where host="yax.im" and store="archive2";
201847
sqlite> select count(value) from prosodyarchive where host="yax.im" and store="archive2" and not value like "%<body%" and not value like "%<received%" and length(value) <300;
24655
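For reference, the "roughly 12%" figure follows directly from the two counts above. Note that the queries count rows, so this is a share of archived entries rather than of bytes. A quick check in Python (variable names are just for illustration):

```python
total_rows = 201847      # all archive2 entries for yax.im
bodyless_rows = 24655    # short entries without <body> or <received>

share = bodyless_rows / total_rows
print(f"{share:.1%}")  # 12.2%
```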