#1711 Memory leak since upgrading to 0.11.12-1~focal1
Reporter
Bert Van de Poel
Owner
Zash
Created
Updated
Stars
★★ (2)
Tags
Type-Defect
Status-Fixed
Priority-Medium
Milestone-0.11
Bert Van de Poel
on
Yesterday we noticed our XMPP VM had had an OOM event that had caused prosody to be killed (and then get restarted). After further investigation, we noticed in our munin graphs that after having upgraded to 0.11.12-1~focal1, memory use has been on the rise for 4 days until we ran out. The same seems to be happening again now. We therefore expect that a memory leak was introduced in the release we upgraded to.
I had a quick look through the logs, but nothing was immediately drawing my attention. I know that's not really helping you guys find this issue, but feel free to tell me what to grep for in the debug logs (they're enormous).
This is on a 2GB RAM VM with Ubuntu 20.04 and prosody 0.11.12-1~focal1 from the repo supplied by prosody. This VM is used exclusively for XMPP related stuff, so we're quite sure this is not caused by other software.
Zash
on
Hi, thanks for the report!
Memory leaks tend not to show up in logs.
We've noticed one potential cause, but to be sure, could you please tell us which modules are enabled, especially whether SQL storage, mod_websocket or any 3rd party modules are in use.
Another way to identify potential sources of leaks is to use https://github.com/zeen/lua-dump to take a memory snapshot. Preferably at least two, some time apart.
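As a rough complement to full snapshots, the total Lua heap size can be sampled at two points in time and compared. The sketch below is only an illustration of that idea using the standard `collectgarbage` function; it is not what lua-dump does, and the helper names (`heap_kib`, `record_sample`, `report`) are made up for this example. Memory that a C library allocates outside Lua's allocator would not show up in these numbers.

```lua
-- Minimal sketch: sample the Lua heap twice and print the growth.
-- All helper names here are hypothetical, not part of Prosody or lua-dump.

-- Returns the Lua heap size in KiB after forcing full collection cycles,
-- so that garbage which is merely not yet collected is not counted.
local function heap_kib()
    collectgarbage("collect")
    collectgarbage("collect")
    return collectgarbage("count")
end

local samples = {}

-- Stores one labelled measurement.
local function record_sample(label)
    samples[#samples + 1] = { label = label, kib = heap_kib(), time = os.time() }
end

-- Prints the difference between each pair of consecutive samples.
local function report()
    for i = 2, #samples do
        local prev, cur = samples[i - 1], samples[i]
        print(string.format("%s -> %s: %+.1f KiB", prev.label, cur.label, cur.kib - prev.kib))
    end
end

record_sample("start")

-- Stand-in for a leak: objects that stay reachable and therefore survive collection.
local hoard = {}
for i = 1, 10000 do
    hoard[i] = { i }
end

record_sample("after allocating")
report()
```

A heap that keeps growing between samples even after forced collections suggests that something is still holding references, which is where a full snapshot helps to find out what.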
Changes
tags Status-NeedInfo Milestone-0.11
owner Zash
Bert Van de Poel
on
To make sure I'm not missing anything, you can find our module configuration below:
-- This is the list of modules Prosody will load on startup.
-- It looks for mod_modulename.lua in the plugins folder, so make sure that exists too.
-- Documentation on modules can be found at: https://prosody.im/doc/modules
modules_enabled = {
    -- Generally required
    "roster"; -- Allow users to have a roster. Recommended ;)
    "saslauth"; -- Authentication for clients and servers. Recommended if you want to log in.
    "tls"; -- Add support for secure TLS on c2s/s2s connections
    "dialback"; -- s2s dialback support
    "disco"; -- Service discovery
    -- Not essential, but recommended
    "carbons"; -- Keep multiple clients in sync
    "pep"; -- Enables users to publish their avatar, mood, activity, playing music and more
    "private"; -- Private XML storage (for room bookmarks, etc.)
    -- "blocklist"; -- Allow users to block communications with other users
    "vcard4"; -- User profiles (stored in PEP)
    "vcard_legacy"; -- Conversion between legacy vCard and PEP Avatar, vcard
    "bookmarks"; -- Maps legacy bookmarks to PEP bookmarks
    -- Nice to have
    "version"; -- Replies to server version requests
    "uptime"; -- Report how long server has been running
    "time"; -- Let others know the time here on this server
    "ping"; -- Replies to XMPP pings with pongs
    "mam"; -- Store messages in an archive and allow users to access it
    "csi_simple"; -- Simple Mobile optimizations
    -- Admin interfaces
    "admin_adhoc"; -- Allows administration via an XMPP client that supports ad-hoc commands
    "admin_telnet"; -- Opens telnet console interface on localhost port 5582
    -- HTTP modules
    "bosh"; -- Enable BOSH clients, aka "Jabber over HTTP"
    --"websocket"; -- XMPP over WebSockets
    --"http_files"; -- Serve static files from a directory over HTTP
    -- Other specific functionality
    --"limits"; -- Enable bandwidth limiting for XMPP connections
    --"groups"; -- Shared roster support
    --"server_contact_info"; -- Publish contact information for this service
    --"announce"; -- Send announcement to all online users
    --"welcome"; -- Welcome users who register accounts
    --"watchregistrations"; -- Alert admins of registrations
    --"motd"; -- Send a message to users when they log in
    --"legacyauth"; -- Legacy authentication. Only used by some old clients and bots.
    --"proxy65"; -- Enables a file transfer proxy service which clients behind NAT can use
    -- ULYSSIS Modules for chat commands
    "auto_scrapbook";
};
-- These modules are auto-loaded, but should you want
-- to disable them then uncomment them here:
modules_disabled = {
    -- "offline"; -- Store offline messages
    -- "c2s"; -- Handle client connections
    "s2s"; -- Handle server-to-server connections
    -- "posix"; -- POSIX functionality, sends server to background, enables syslog, etc.
};
I can imagine auto_scrapbook is drawing your attention right away. This is however a module that we haven't been using the past few days, and it simply takes a message, sends it to an API, and returns the resulting URL (kind of like a pastebin service), so it would be difficult to have a leak in such a simple feature, I would expect.
While no sql module is explicitly loaded, I do know and see we're using one to use postgresql as a backend. Websocket is enabled as you can see.
Concerning the memory dumps. I presume these will contain TLS keys and other sensitive information, so I assume I will have to compare two dumps on my own. Do you have any pointers what I would be looking for?
Zash
on
> --"websocket"; -- XMPP over WebSockets
> Websocket is enabled as you can see.
That is the opposite of what I see.
> using one to use postgresql
This would do it, the affected module is used by SQL storage for archive.
> Concerning the memory dumps.
Ignore it, I could reproduce the issue.
> I presume these will contain TLS keys
No
> and other sensitive information
Yes
---
The problem is in LuaExpat: when a reference loop is created, the way it keeps track of things internally complicates matters for the garbage collector, so the objects in the loop are not collected, which leads to the leak.
We have fixed a different case of this before, for normal XMPP Connections, but so long ago we apparently forgot this.
The real issue in LuaExpat is actually fixed already in http://code.matthewwild.co.uk/lua-expat/rev/a8caec6c5429 but it is not yet in a release.
A fix has been developed.
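To illustrate the kind of reference loop described above, here is a minimal pure-Lua sketch. It does not use LuaExpat itself: the `registry` table is only a stand-in for strong references held on the C side, and `new_parser`, `parse_without_loop` and `parse_with_loop` are hypothetical names. The assumption being modelled is that the callbacks stay strongly referenced for as long as the parser exists, so a callback that captures the parser keeps the whole loop reachable.

```lua
-- Stand-in for references held outside the reach of normal collection:
-- anything stored here stays alive until it is explicitly removed.
local registry = {}

-- Weak-keyed table used only to observe which parsers are still alive.
local alive = setmetatable({}, { __mode = "k" })

-- Hypothetical constructor: the "parser" is a plain table, and its
-- callbacks are pinned in the registry as a C library might pin them.
local function new_parser(callbacks)
    local parser = {}
    registry[#registry + 1] = callbacks
    alive[parser] = true
    return parser
end

-- Forces collection, then counts the parsers that survived.
local function count_alive()
    collectgarbage("collect")
    collectgarbage("collect")
    local n = 0
    for _ in pairs(alive) do
        n = n + 1
    end
    return n
end

-- Callbacks do not refer to the parser: nothing reachable points at it
-- once this function returns, so it can be collected.
local function parse_without_loop()
    local parser = new_parser({ StartElement = function(name) return name end })
    return nil
end

-- A callback captures the parser: registry -> callbacks -> closure -> parser.
-- The parser stays reachable through the registry and is never collected.
local function parse_with_loop()
    local callbacks = {}
    local parser = new_parser(callbacks)
    callbacks.StartElement = function() return parser end
    return nil
end

parse_without_loop()
local after_plain = count_alive()

parse_with_loop()
local after_loop = count_alive()

print(after_plain, after_loop) -- expected: 0, then 1
```

In this model, each parse that closes over its own parser leaves one more object behind, which matches a memory graph that rises steadily with activity. Avoiding the capture (LuaExpat passes the parser to each callback as its first argument) or explicitly releasing the parser when done are two general ways to break such a loop; which approach the actual fix takes is not shown here.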
Thanks for the fix, my bad on not spotting the "--" in front of the websockets module.
Great to hear it's already been fixed upstream. Do you have any idea when this may land in a new release? We currently have a cronjob restarting prosody on a daily basis because, presumably depending on activity, our memory fills up quite quickly now. We would of course like to get rid of such a temporary fix as soon as we can.
Thanks for the very swift responses and the great help. It's much appreciated! :)
Zash
on
Releases are made when they are made.
You could apply the patch to /usr/lib/prosody/util/xml.lua in the meantime. Nightly builds are another option (starting tomorrow).
Bert Van de Poel
on
Thanks for pointing out we could easily patch that file. We've let it run for a few hours and our graph is nice and flat. Thanks for all the help! :D
I deployed 0.11.13, thinking this patch is in there - and the release notes said so - but apparently, it's not.
The patch linked above (https://hg.prosody.im/trunk/rev/e5e0ab93d7f4) was replaced by another commit 1 hour later (https://hg.prosody.im/trunk/rev/ebeb4d959fb3), reverting the change.
The current trunk 0.12.x does not have the patch.
The obvious question is WHY it was replaced/reverted?
And I learned the hard way not to trust the release note.
Fixed (or maybe it should be described as a workaround) in https://hg.prosody.im/trunk/rev/e5e0ab93d7f4
This fix is now released in 0.11.13: https://blog.prosody.im/prosody-0.11.13-released/ Thanks for the report and assisting us with diagnosing the problematic configurations!