#268 Character set problem with prosody and mysql

Reporter menkovich
Owner MattJ
Created
Updated
Stars ★ (1)
Tags
  • Priority-Medium
  • Type-Defect
  • Milestone-0.10
  • Status-Fixed
  1. menkovich on

    There is a problem with using cyrilic roster and group names In mysql database: {"name":"Проверка","groups":{"Ð¡Ð¾Ð±ÐµÑ ÐµÐ´Ð½Ð¸ÐºÐ¸":true},"subscription":"both"} but should be: {"name":"Проверка","groups":{"Собеседники":true},"subscription":"both"} Seems, that there is some encoding problems with database communication sql dump from prosody table: CREATE TABLE IF NOT EXISTS `prosody` ( `host` text, `user` text, `store` text, `key` text, `type` text, `value` mediumtext, KEY `prosody_index` (`host`(20),`user`(20),`store`(20),`key`(20)) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; There was latin1 in character set client, at the moment when prosody installed, now I set collactions and charsets, so it looks like this: character set client utf8 character set connection utf8 character set database utf8 character set filesystem binary character set results utf8 character set server utf8 character set system utf8 collation connection utf8_general_ci collation database utf8_general_ci collation server utf8_general_ci I restart prosody server, and try to test. But unfortunately, problem do not fix. When I change wrong Проверка to correct Проверка in database, in client it looks like ???????

  2. MattJ on

    Thanks for the report. I'm looking into this. I guess I know the answer, but could you give the output of `mysqld --help --verbose | grep ^default-character-set`? Thanks!

    Changes
    • owner MattJ
  3. menkovich on

    110922 10:40:27 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead. 110922 10:40:27 [Warning] '--default-collation' is deprecated and will be removed in a future release. Please use '--collation-server' instead. 110922 10:40:27 [Warning] Changed limits: max_open_files: 1024 max_connections: 886 table_cache: 64 110922 10:40:27 [Warning] Can't create test file /var/lib/mysql/apollo.lower-test 110922 10:40:27 [Warning] Can't create test file /var/lib/mysql/apollo.lower-test mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13) 110922 10:40:27 [Warning] One can only use the --user switch if running as root 110922 10:40:27 [Note] Plugin 'FEDERATED' is disabled. mysqld: Table 'mysql.plugin' doesn't exist 110922 10:40:27 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. default-character-set utf8

  4. menkovich on

    I have an report, that in latest version of sourcecode this bug was fixed(in gentoo prosody-9999 ebuild), I could not check this in my environment, but if you have additional information about problem - could you say which commit fixed this?

  5. MattJ on

    Changes
    • tags Milestone-0.9 Status-Accepted
  6. MattJ on

    Sorry for the long wait on this issue. I failed to reproduce it until recently. It seems the problem appears in Debian, but not in Ubuntu (at least in mine). We are working on a fix for the problem, and it will be available in 0.9. It will also fix any existing data in the database automatically. Note to myself: This also affects prosody-migrator.

  7. MattJ on

    Changes
    • tags Status-Started
  8. MattJ on

    Officially pushed to 0.10 for now, to avoid delaying the 0.9 release further.

    Changes
    • tags Milestone-0.9 Milestone-0.10
  9. hill.sobaka on

    the problem is in util/sql.lua inside engine:setEncoding function if driver == "MySQL" then set_names_query = set_names_query:gsub(";$", " COLLATE 'utf8_bin';"); local ok, charsets = self:transaction(function() return self:select"SELECT `CHARACTER_SET_NAME` FROM `information_schema`.`CHARACTER_SETS` WHERE `CHARACTER_SET_NAME` LIKE 'utf8%' ORDER BY MAXLEN DESC LIMIT 1;"; end); local row = ok and charsets(); charset = row and row[1] or charset; end self.charset = charset; return self:transaction(function() return self:execute(set_names_query:format(charset)); end); what if i have two utf8 charset: utf8mb4 and utf8? utf8-bin is not a valid collaction for ut8mb4 charset it should be utf8mb4_bin, but accroding to this code you will get a query "SET NAMES 'utf8mb4' COLLATE 'utf8_bin';" which is not valid and mysql just fallback to latin1. the fix is: if driver == "MySQL" then - set_names_query = set_names_query:gsub(";$", " COLLATE 'utf8_bin';"); local ok, charsets = self:transaction(function() return self:select"SELECT `CHARACTER_SET_NAME` FROM `information_schema`.`CHARACTER_SETS` WHERE `CHARACTER_SET_NAME` LIKE 'utf8%' ORDER BY MAXLEN DESC LIMIT 1;"; end); local row = ok and charsets(); charset = row and row[1] or charset; +set_names_query = set_names_query:gsub(";$", (" COLLATE '%s_bin';"):format(charset));

  10. MattJ on

    Just thinking aloud... will that query always pick the '*_bin' collation?

  11. hill.sobaka on

    this query will just select appropriate utf8*_bin collation. utf8mb4 - is for japanese language. hardcoding simple utf8 charset and utf8_bin collation would be enough, but i'm not egoistic, so i found a universal fix ))))

  12. hill.sobaka on

    Another case and final fix. Well previously i wrote about wrong collation for utf8 and how to fix it, but it doesn't resolve problem totally. What will happen if after some time db connection goes wrong (ex. after some time of inactivity)? Evidently you will get smth like "mysql has gone away" and of course sql util honestly restores sql connection, BUT this time it just restores it and do not call engine:setEncoding function. So i think the best way will to call engine:setEncoding function just after local dbh, err = DBI.Connect( params.driver, params.database, params.username, params.password, params.host, params.port ); if not dbh then return nil, err; end dbh:autocommit(false); -- don't commit automatically self.conn = dbh; in engine:connect function, and making engine:setEncoding a private one.

  13. MattJ on

    Hi folks, we pushed a bunch of changes to 0.10 to fix this issue (and automatically upgrade existing databases to the correct encoding). If you have *not* manually changed Prosody's code or the database, there should be no lost or corrupt data as a result of this issue, the migration stage moves everything over to the correct encoding safely. In any case we recommend you have backups of your database before performing any upgrades. Right now the fixes are only in the 'sql2' module, but we will soon be renaming this to replace the original 'sql' module. More information at https://groups.google.com/d/msg/prosody-users/QtUg7TgSAjk/m4yuiLMs-SQJ I'm marking this issue as fixed, but feel free to contact us if you experience this issue after upgrading.

    Changes
    • tags Status-Fixed

New comment

Not published. Used for spam prevention and optional update notifications.