#981 Prosody sometimes gets stuck using 100% of the CPU

Reporter Link Mauve
Owner MattJ
Created
Updated
Stars ★★ (2)
Tags
  • Status-Fixed
  • Priority-Medium
  • Type-Defect
  • Patch
  1. Link Mauve on

    What steps will reproduce the problem?
    1. Have the server run for long enough, maybe

    What is the expected output?
    Prosody should continue to run properly.

    What do you see instead?
    Instead, it’s stuck using 100% of a core, not serving any client or s2s, not answering on mod_admin_telnet, not doing any syscall according to strace, but churning in what seems to be Lua land.

    What version of the product are you using? On what operating system?
    Prosody 0.10’s tip as of 2017-06-08, 2017-06-19, 2017-06-30 and 2017-09-07, on both amd64 and ARMv7.

    Please provide any additional information below.
    Sorry, I don’t think I have any. :(

  2. Link Mauve on

    This happens on both Lua 5.1 and Lua 5.2.

  3. Link Mauve on

    This problem has been identified in util.stanza, in maptags(), called by mod_mam_muc to filter out a malicious <stanza-id/>. The number of children and the number of tags seemingly differed, so the loop never met its exit condition and ran forever.

  4. Zash on

    The following code reproduces the problem in maptags:

        local st = require "util.stanza";
        local s = st.message({}, "Hello");
        s.tags[1] = st.clone(s.tags[1]);
        s:maptags(function () end);

    It happens if the top-level stanza object gets out of sync with the subset of its child nodes that are tags, which is kept in the 'tags' field. That should not happen with normal stanza manipulation, so the root cause is still unknown.
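
    The out-of-sync condition described above can be checked for directly. The following is a minimal sketch, not code from Prosody: it uses plain tables as a stand-in for stanzas (text nodes are strings, tags are tables, tag children are mirrored in 'tags'), and tags_in_sync is a hypothetical helper name that does not exist in util.stanza.

```lua
-- Hypothetical helper, not part of util.stanza: returns true when the
-- entries of stanza.tags are exactly the table-valued children found in
-- the array part of the stanza, in the same order.
local function tags_in_sync(stanza)
	local tags = stanza.tags or {};
	local next_tag = 1;
	for i = 1, #stanza do
		local child = stanza[i];
		if type(child) == "table" then
			-- Compare by identity: a clone is a different table.
			if child ~= tags[next_tag] then
				return false, i;
			end
			next_tag = next_tag + 1;
		end
	end
	-- Every child matched; the tags list must not hold extra entries.
	if next_tag - 1 ~= #tags then
		return false, #stanza + 1;
	end
	return true;
end

-- Plain-table stand-in for <message><body>Hello</body></message>.
local body = { name = "body", attr = {}, tags = {}, "Hello" };
local message = { name = "message", attr = {}, tags = { body }, body };
assert(tags_in_sync(message));

-- Same kind of desync as in the reproduction: the entry in 'tags' is
-- replaced by a copy, so it is no longer the same object as the child.
message.tags[1] = { name = "body", attr = {}, tags = {}, "Hello" };
assert(not tags_in_sync(message));
```

    A check like this could be run from a debugging hook on suspect stanzas to narrow down which module produces the broken state; where to call it from is left open here.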

    Changes
    • tags Status-Accepted
  5. MattJ on

    Link Mauve, can you provide a list of loaded modules on the affected server? If you have more than one server, the intersection of each should be enough. My guess is some module that isn't using util.stanza's API for manipulating stanzas. In particular I'm aware of some very suspect code in mod_cloud_notify. I did read it a while back, and couldn't spot a bug, but it's complex enough that there's no guarantee I would. And there may well be others doing similar things...

  6. Link Mauve on

    I never had mod_cloud_notify loaded on linkmauve.fr, and yet it exhibited the same symptoms twice (long ago).

    Here are the modules we have loaded at JabberFR:
    mod_roster mod_saslauth mod_tls mod_dialback mod_disco mod_carbons mod_pep mod_private mod_blocklist mod_vcard mod_version mod_uptime mod_time mod_ping mod_register mod_mam mod_admin_adhoc mod_admin_telnet mod_bosh mod_websocket mod_limits mod_server_contact_info mod_welcome mod_watchregistrations mod_block_registrations mod_checkcerts mod_lastlog mod_smacks mod_smacks_offline mod_cloud_notify mod_csi mod_throttle_unsolicited mod_firewall mod_s2s_blacklist mod_announce_all mod_secure_interfaces mod_serverinfo mod_measure_cpu mod_measure_memory mod_log_auth mod_munin mod_measure_stanza_counts mod_traceback

    And at linkmauve.fr:
    mod_roster mod_saslauth mod_tls mod_dialback mod_disco mod_private mod_vcard mod_blocklist mod_version mod_uptime mod_time mod_ping mod_pep mod_register mod_admin_adhoc mod_admin_telnet mod_bosh mod_http_files mod_announce mod_welcome mod_watchregistrations mod_smacks mod_smacks_offline mod_carbons mod_mam mod_poke_strangers mod_secure_interfaces mod_server_contact_info mod_serverinfo

  7. Link Mauve on

    https://linkmauve.fr/files/prosody-infinite-loop.patch fixes the infinite loop in question. I haven’t managed to identify the module causing the issue, but this patch at least fixes the symptoms.

  8. Zash on

    Changes
    • tags Patch
  9. MattJ on

    "Fixed" in 7df29c5fbb9b. A quick note for the record: the patch above had an off-by-one error, which was caught by unit tests I added. Ideally we will remove the fix once we identify the root cause. Link Mauve is running with a more verbose version of this patch, but in the meantime this commit will prevent anyone from accidentally running into the same issue (whatever it is).
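
    For readers who want to see the general shape of such a guard, here is a minimal sketch. It is not the code from the commit or from the linked patch: it works on plain tables standing in for stanzas (children in the array part, tag children mirrored in 'tags'), and maptags_bounded is a hypothetical name. The point is only that the walk raises an error once it runs past the last child, and that the bound has to be checked against the current child count, which is where an off-by-one can creep in.

```lua
local t_remove = table.remove;

-- Hypothetical bounded variant of a maptags-style walk. The callback
-- returns a replacement tag, or nil to remove the tag.
local function maptags_bounded(stanza, callback)
	local tags = stanza.tags;
	local n_children, n_tags = #stanza, #tags;
	local i, curr_tag = 1, 1;
	while curr_tag <= n_tags do
		-- Running past the last child means an entry in 'tags' has no
		-- matching child, so stop instead of spinning.
		if i > n_children then
			error("stanza children and tags are out of sync");
		end
		if stanza[i] == tags[curr_tag] then
			local ret = callback(stanza[i]);
			if ret == nil then
				-- Remove from both lists; positions i and curr_tag now
				-- refer to the following entries, so do not advance.
				t_remove(stanza, i);
				t_remove(tags, curr_tag);
				n_children, n_tags = n_children - 1, n_tags - 1;
			else
				stanza[i], tags[curr_tag] = ret, ret;
				i, curr_tag = i + 1, curr_tag + 1;
			end
		else
			i = i + 1; -- text node or non-matching child: skip it
		end
	end
	return stanza;
end

-- Usage: a desynced stanza model now fails fast instead of looping.
local body = { name = "body", attr = {}, tags = {}, "Hello" };
local message = { name = "message", attr = {}, tags = { body }, body };
message.tags[1] = { name = "body", attr = {}, tags = {}, "Hello" };
local ok = pcall(maptags_bounded, message, function () end);
assert(not ok);
```

    Raising an error trades a silent hang for a visible failure, which matches the stated intent of keeping the server responsive while the root cause is still being tracked down.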

    Changes
    • tags Status-Fixed
    • owner MattJ
  10. Zash on

    This issue seems to have resurfaced in #1856
