#734 Non-ASCII characters dumped to debug log

Reporter Ge0rG
Owner Zash
Stars ★★ (2)
  • Status-Fixed
  • Milestone-0.12
  • Type-Defect
  • Priority-Medium
  1. Ge0rG on

    When invalid input is detected on a connection, prosody dumps the raw data (with control characters stripped) to the logging facility:

    Aug 31 09:57:28 mod_c2s debug Received invalid XML (not well-formed (invalid token)) 77 bytes: ____H___D__Wƍ�2_M��x>_��a__����#ȱN�_4c��________ ___d_b_________c____�____

    As the character encoding of syslog cannot (and should not) be assumed, the dump should only contain the printable ASCII characters verbatim (codes 32-126; 127 is the DEL control character). Otherwise, it might be possible to craft a byte sequence that results in special or control characters, especially on UTF-8 systems (e.g. an RTL text-flow marker, or even ASCII control characters composed via overlong UTF-8 sequences).

    Ideally, the invalid characters should be replaced by a safe encoding (urlencoded, e.g. %1C, or some other encoding that preserves the whole content and is easy to reconstruct). However, stripping all bytes that are not printable ASCII would be a sufficient first approximation.

    Thanks, Georg
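
To illustrate the "safe encoding" idea from the report, here is a minimal sketch in Python (not Prosody's Lua code, and not what the project implemented). The function names `escape_for_log` and `unescape_from_log` are hypothetical. It assumes that `%` itself must also be escaped so that the original bytes can be recovered unambiguously.

```python
from urllib.parse import unquote_to_bytes


def escape_for_log(data: bytes) -> str:
    """Percent-encode every byte outside printable ASCII (32-126).

    '%' (0x25) is encoded as well, so the output can be decoded
    back to the exact original bytes.
    """
    out = []
    for b in data:
        if 32 <= b <= 126 and b != 0x25:
            out.append(chr(b))       # printable ASCII passes through verbatim
        else:
            out.append(f"%{b:02X}")  # everything else becomes %XX
    return "".join(out)


def unescape_from_log(text: str) -> bytes:
    """Reverse escape_for_log using the standard library decoder."""
    return unquote_to_bytes(text)
```

Because the output contains only bytes 32-126, it renders identically whatever encoding the log viewer assumes, and the round trip preserves the whole payload for later analysis.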

  2. Zash on

    • tags Status-Accepted
  3. Zash on

    FWIW it did escape ASCII codes 0-31 as underscores already. Fixed in https://hg.prosody.im/trunk/rev/7fa273f8869e

    • owner Zash
    • tags Milestone-0.12 Status-Fixed
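
For readers comparing the two behaviours discussed in this thread, the following Python sketch contrasts masking only ASCII codes 0-31 (the earlier behaviour described above) with masking every byte outside printable ASCII (the reporter's "first approximation"). It is an illustration under those assumptions, not a transcription of the linked changeset; the names `mask_control_only` and `mask_non_printable` are hypothetical.

```python
import re

# Matches only ASCII control codes 0-31.
_CONTROL_ONLY = re.compile(rb"[\x00-\x1f]")
# Matches every byte that is not printable ASCII (32-126).
_NON_PRINTABLE = re.compile(rb"[^\x20-\x7e]")


def mask_control_only(data: bytes) -> bytes:
    """Replace codes 0-31 with '_'; bytes >= 127 still pass through."""
    return _CONTROL_ONLY.sub(b"_", data)


def mask_non_printable(data: bytes) -> str:
    """Replace every byte outside 32-126 with '_'; result is pure ASCII."""
    return _NON_PRINTABLE.sub(b"_", data).decode("ascii")
```

The first variant leaves high bytes in the output, which is how sequences that a UTF-8 terminal interprets as special characters can reach the log; the second cannot emit them, at the cost of losing the original byte values.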
