#1688 SystemD modifies the PID file, causes service to not stop/start when using systemctl

Reporter test4d
Owner Nobody
Stars ★★ (2)
  • Type-Defect
  • Milestone-0.12
  • Priority-Medium
  • Status-Accepted
  1. test4d on

    What steps will reproduce the problem? 1. `systemctl stop prosody` 2. `prosodyctl start` 3. `systemctl start prosody` What is the expected output? On step one, the Prosody server should completely stop, meaning that clients and servers cannot contact it. On step two, the Prosody server should restart, and clients and servers should contact it. I am aware that you're not really supposed to use prosodyctl. on step three, I'm not sure what should happen, perhaps do nothing as prosody is already running. What do you see instead? On step one, the pid file is destroyed, which means prosodyctl status reports that the server is off, and systemd reports that the service is off, but the service is still running, because clients/servers can connect to it. On step two, the pid file is created again, but prosody doesn't change as its already running. On step three, the command acts as a switch of sorts of the pid file. Running it destroys the pid file, running it again creates it, running it once more destroys it. This means that systemctl is not correctly commanding prosody to terminate and start as you would expect. What version of the product are you using? On what operating system? I was using Fedora 34 on version 0.11.10, using the prosody-0.11.10-1.fc34.rpm package specifically. Please provide any additional information below. I am monitoring the pid file placed in the default config's path, in /run/prosody. Zash asked me to report this earlier today.

  2. MattJ on

    • tags Milestone-0.12
  3. Zash on

    Thanks for filing the issue. We do need to investigate pidfile handling some more.

    • tags Status-NeedInfo Status-Accepted
  4. RemiBardon on

    Hello, I am investigating a similar issue where I try to run `prosody.util.prosodyctl.reload()` from a custom Prosody module at runtime and it behaves very weirdly. I stumbled upon one line, which I really don't understand. It's been here for 14 years so I'd expect it to be the expected behavior, but it really feels off. It's [prosodyctl.lua#L143](https://github.com/bjc/prosody/blob/a7799e11a9521d33cc322fa8b9cae99134219089/util/prosodyctl.lua#L143), the `if locked then` in: ```lua local file, err = io.open(pidfile, "r+"); -- ... local locked, err = lfs.lock(file, "w"); -- luacheck: ignore 211/err if locked then file:close(); return false, "pidfile-not-locked"; end ``` While trying to debug my issue, I understood that this code (from `local function getpid()`): 1. Opens the PID file in read & write mode - Why write too? To be allowed to lock writes? I don't know POSIX enough but it'd make sense 2. Locks writes to prevent race conditions 3. Reads the file for a PID 4. Closes the file - Is the lock automatically dropped? I suppose This process would make sense, but the code says something else: if the lock succeeds, it closes the file and raises an error saying "pidfile-not-locked" — the exact opposite of what `if locked then` means… [plugins/mod_posix.lua#L66](https://github.com/bjc/prosody/blob/a7799e11a9521d33cc322fa8b9cae99134219089/plugins/mod_posix.lua#L66) is the only other place I could find a `lfs.lock` being used, and it is used as I would expect — aborting if the lock fails (meaning someone else has a lock). I tried switching the condition from `if locked then` to `if not locked then`, and suddenly my module started working as expected, reloading Prosody when `prosodyctl.reload()` is called. However, `prosodyctl reload` broke, and now `lfs.lock` returns `false, "Resource temporarily unavailable"`. I still haven't found what's going on exactly with `prosodyctl reload`, but the code feels more correct, works as expected on my side and since you've also experienced issues with `prosodyctl` and the PID file, I have a feeling the problem is in `prosodyctl`. Hope this will help!

  5. RemiBardon on

    I just found some more information worth sharing: as I had guessed, the PID file is open in read+write intentionally, to get the lock later. MattJ explained it in https://issues.prosody.im/388#comment-2. I will add a comment linking to this answer in my patch.

  6. RemiBardon on

    By the way, `"pidfile-not-locked"` is not defined in `prosodyctl.error_messages` (https://github.com/bjc/prosody/blob/a7799e11a9521d33cc322fa8b9cae99134219089/util/prosodyctl.lua#L30). There is a fallback but this error could easily be better documented.

  7. RemiBardon on

    Ok I'm going back to searching and I also found another evidence that the code doesn't do as expected: the commit which introduced the line ([a664e2](https://github.com/prose-im/prose-pod-server/commit/a664e2ba60678280b37f036f6cb75d05f6276159)) is titled "util.prosodyctl: Report Prosody as not running if the pidfile isn't locked"… while the code clearly does the opposite. I thought I would find an explanation in the commit or PR which introduced the change… but now I really don't understand how this stayed under the radar for 14 years…

  8. RemiBardon on

    I got it. [In `mod_posix`](https://github.com/bjc/prosody/blob/a7799e11a9521d33cc322fa8b9cae99134219089/plugins/mod_posix.lua#L61-L84) the function `write_pidfile` exlusively locks the PID file but never unlocks it. I had supposed (to explain why the existing code was written like this) that the lock was automatically opened when the file was closed, but after some research it seamed I was wrong[^1]. I added `lfs.unlock(pidfile_handle);` before `pidfile_handle:close();` and after `pidfile_handle:flush();` in `write_pidfile()`, `lfs.unlock(file);` before `file:close();` in `prosodyctl.reload()` and switched `if locked then` to `if not locked then`, and now everything works as expected. I will submit my patch to <https://github.com/prose-im/prose-pod-server>, [Prose](https://prose.org/)'s fork of Prosody, so we can test it. If everything works correctly, I will push the patch upstream (I don't know how Mercurial works). [^1]: `writefile` in [keplerproject/orbit/src/orbit/cache.lua](https://github.com/keplerproject/orbit/blob/ff7fcbf26d392ad321931f647547d521cc38e3de/src/orbit/cache.lua#L40-L49) unlocks the file before closing it, and [the LuaFileSystem manual](https://lunarmodules.github.io/luafilesystem/manual.html) doesn't mention an automatic unlock. Also `unlock` needs an open file, so it wouldn't be able to run automatically after the file is closed.

  9. RemiBardon on

    See <https://github.com/prose-im/prose-pod-server/commit/e89ffbd1e668fef7f45b46d21f5c6f288509fd59> and <https://github.com/prose-im/prose-pod-server/commit/651b7e3e411c0489fbbc6d4cd56524e22d068cf2> for the two fixes in `mod_posix` and `prosodyctl.lua` respectively.

New comment

Not published. Used for spam prevention and optional update notifications.