What steps will reproduce the problem?
1. Accumulate a large MAM archive (~2 years of messages)
2. Install and start Dino, which syncs the entire MAM archive (https://github.com/dino/dino/issues/521)
3. Watch `top` and general connection quality.
My MAM configuration is:
```
archive_expires_after = "3y"
default_archive_policy = "roster"
archive_store = "archive2" -- the old data, todo: migrate to 'archive'
```
What is the expected output?
Prosody should remain stable for all my users.
What do you see instead?
My connections are disrupted (my peers report getting "recipient-unavailable" from their servers), my users bounce in and out of MUCs, and Prosody sits at 70-98% CPU usage.
What version of the product are you using? On what operating system?
Arch Linux:
# pacman -Qi prosody
Name : prosody
Version : 1:0.11.5-1
Description : Lightweight and extensible Jabber/XMPP server written in Lua
Architecture : x86_64
URL : https://prosody.im/
Licenses : MIT
Groups : None
Provides : prosody
Depends On : lua52 lua52-socket lua52-expat lua52-filesystem libidn openssl
Optional Deps : lua52-sec: TLS encryption support [installed]
lua52-dbi: SQL storage support
Required By : None
Optional For : None
Conflicts With : prosody
Replaces : None
Installed Size : 1463.54 KiB
Packager : Sergej Pupykin <pupykin.s+arch@gmail.com>
Build Date : Wed 25 Mar 2020 03:51:44 PM CET
Install Date : Sun 29 Mar 2020 12:00:39 AM CET
Install Reason : Explicitly installed
Install Script : Yes
Validated By : Signature
Please provide any additional information below.
This bug is mostly on Dino's side; they're going to work on reducing the load soon. But it shouldn't be possible for one user with a large archive to disrupt an entire server. You could imagine someone making a malicious bot that fills up their MAM archive and then re-requests it all, from several clients, all at once.
Here's the log from a single isolated client on my server, which is enough to spike the CPU:
Apr 06 16:43:05 comms2.kousu.ca prosody[50073]: kousu.ca:mam: Archive query, id nil with anyone from the dawn of time until now)
# sleeps here for ~2s
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: kousu.ca:mam: Archive query nil completed
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <iq to='nick@kousu.ca/dino.106396' id='ec2fe0fb-b63d-431d-97cf-5cda2023cdf8' type='result'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Received[c2s]: <a xmlns='urn:xmpp:sm:3' h='751'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: #queue = 21
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Queuing <r> (in a moment) from handle_a - #queue=21
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending <r> (inside timer, before send) from handle_a - #queue=22
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <r xmlns='urn:xmpp:sm:3'>
Apr 06 16:43:09 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending <r> (inside timer, after send) from handle_a - #queue=22
Apr 06 16:43:10 comms2.kousu.ca prosody[50073]: c2s564227361440: Handled 103 incoming stanzas
Apr 06 16:43:10 comms2.kousu.ca prosody[50073]: c2s564227361440: Received[c2s]: <iq type='set' id='a56482f6-8137-4b3c-9082-79992e6d3121'>
# loops here:
Apr 06 16:43:10 comms2.kousu.ca prosody[50073]: kousu.ca:mam: Archive query, id nil with anyone from the dawn of time until now)
# sleeps here for ~2s
Apr 06 16:43:14 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:14 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
Apr 06 16:43:14 comms2.kousu.ca prosody[50073]: c2s564227361440: Sending[c2s]: <message to='nick@kousu.ca/dino.106396'>
[...]
This seems to be a simple paged download working as designed, since the log rate is constant and the `<iq>` at the end looks like this:
```
<iq to='nick@kousu.ca/dino.106396' id='ec2fe0fb-b63d-431d-97cf-5cda2023cdf8' type='result'>
  <fin xmlns='urn:xmpp:mam:2'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <first>h4xvoaXgQkywVLGE</first>
      <count>136716</count>
      <last>D7Uu42gOgs9QfxsO</last>
    </set>
  </fin>
</iq>
```
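For scale: that count of 136716 messages at 20 per page works out to roughly 6,800 archive queries, and at the ~4-5 s per query visible in the timestamps above, a full sync keeps Prosody grinding for several hours.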
I see https://prosody.im/doc/modules/mod_mam warns "Setting large number is not recommended, as Prosody will be blocked while processing the request and will not be able to do anything else.", but I have it at the default page size of 20 and I'm still having problems, because my client decided to download the entire archive.
Could MAM downloading be simply rate-limited?
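(For reference, the option that warning refers to appears to be `max_archive_query_results`; pinning it explicitly in the config would look something like this, with the value simply mirroring the page size I'm seeing:)
```
-- Presumably the mod_mam option the docs warn about; it caps how many
-- messages one archive query page may return. 20 matches my log above.
max_archive_query_results = 20
```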
Menel:
The new c2s limits resolve this for me (I think it's them).
But of course that's not a real fix.
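Assuming "the new c2s limits" means mod_limits (https://prosody.im/doc/modules/mod_limits), a minimal sketch of that configuration looks like this; the rate and burst values are illustrative, not tuned:
```
-- Throttle client-to-server traffic so one syncing client cannot
-- monopolize the server; example values only.
limits = {
    c2s = {
        rate = "10kb/s"; -- sustained bandwidth per c2s session
        burst = "2s";    -- how long a session may exceed the rate
    };
}
```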
Stephen Paul Weber:
I had similar issues until switching to an SQL storage backend.
kousu:
Several months ago I also switched to
```
storage = "sql"
```
I used https://prosody.im/doc/migrator to move everything from "archive2" to "sql", and now resource usage is much lower. Phew! Prosody was getting kind of unusable.
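For completeness, the full storage block looks roughly like this; the driver and database values here are illustrative placeholders (see https://prosody.im/doc/storage):
```
-- Move all stores, including the MAM archive, to SQL-backed storage.
storage = "sql"
sql = {
    driver = "SQLite3";          -- or "MySQL" / "PostgreSQL"
    database = "prosody.sqlite"; -- example path/name
}
```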