Different character set - login problems

  • Different character set - login problems

    Our old forums had these character settings in the Language Manager. I downloaded the database to use on another domain.
    ------------------------
    oldforums: char set ISO-8859-1
    database: UTF-8 download
    ------------------------

    On the new domain I had two choices, and I chose option 2. That way the character set in the Language Manager can be set to UTF-8.
    -1-----------------------
    database: UTF-8 upload
    newforums: char set ISO-8859-1

    -2-----------------------
    database: ISO-8859-1 upload
    newforums: char set UTF-8
    ------------------------

    Everything looks great. Special and foreign characters display correctly in posts.
    The problem: users with special or foreign characters in their username can't log in unless I save their username in the ADMINCP.


    For example:
    The user is Самохина

    In the database it is stored as #1057-#1086-#1088... (something like that) and login doesn't work. When I search for the username in ADMINCP nothing comes up, but I can find the profile in ADMINCP by searching for the email address.

    The name in the ADMINCP profile is just Самохина. When I save the user's profile in ADMINCP, the username in the database changes to Самохина (instead of #1057;...etc). After that, I can find the username when I search for it in ADMINCP and the user can log in again.


    Is there a fast way to change that for all users who have special or foreign characters in their username, instead of saving them in ADMINCP one by one?

  • #2
    ISO-8859-1 covers only a small part of the character range that UTF-8 can represent, and that part does not include Cyrillic characters. When you use ISO-8859-1, these characters are converted to their HTML entities so they continue to display correctly. This includes usernames.

    With vBulletin 5, the recommended character set is utf8mb4 and the recommended collations are utf8mb4_general_ci or utf8mb4_unicode_ci.
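
To illustrate the entity conversion described above, here is a minimal Python sketch. It is not vBulletin's code: Python's built-in `xmlcharrefreplace` error handler is used as a stand-in for whatever the forum software does internally, and the function name `to_latin1_with_entities` is made up for this example.

```python
# Sketch only: shows what happens to text that does not fit in ISO-8859-1.
# The "xmlcharrefreplace" handler stands in for the forum software's own
# entity conversion (an assumption for illustration).

def to_latin1_with_entities(text: str) -> str:
    """Encode to ISO-8859-1, replacing unsupported characters with &#NNNN; entities."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")

stored = to_latin1_with_entities("Самохина")
print(stored)  # &#1057;&#1072;&#1084;&#1086;&#1093;&#1080;&#1085;&#1072;

# A name made only of ISO-8859-1 characters passes through unchanged.
print(to_latin1_with_entities("Jos\u00e9"))  # José
```

The entity string and the Cyrillic string look the same once a browser renders them, but they are different sequences of characters, so a plain string comparison between them fails.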
    Translations provided by Google.

    Wayne Luke
    The Rabid Badger - a vBulletin Cloud demonstration site.
    vBulletin 5 API



    • #3
      How do I make my forum invisible to the public?
      Sorry, I did not know where else to post this.



      • #4
        With vBulletin 5? Just like in older versions of the software.

        Channel Management → Channel Permissions.

        Set "Can View Channel" to No for all channels they shouldn't see. Children will inherit permissions from their parents. Do not set this for the Home Page "channel" as it will cause API problems and you may not be able to login properly.

        Wayne Luke


        • #5
          Originally posted by Wayne Luke
          ISO-8859-1 covers only a small part of the character range that UTF-8 can represent, and that part does not include Cyrillic characters. When you use ISO-8859-1, these characters are converted to their HTML entities so they continue to display correctly. This includes usernames.

          With vBulletin 5, the recommended character set is utf8mb4 and the recommended collations are utf8mb4_general_ci or utf8mb4_unicode_ci.
          Thanks for the info. We have already applied the settings you recommend on our new domain (utf8mb4_general_ci and the forum character set). Posts and titles with foreign characters (like Russian) display normally. Everything works fine.

          The problem is with usernames that contain foreign characters (for example, a Russian username) in ADMINCP search and at login.
          When I search for such a username in ADMINCP there is no result. Switching the character set in the Language Manager to ISO-8859-1 makes the search work.
          The same goes for login: the user can only log in while the character set is ISO-8859-1. (The permanent workaround is to save the username in ADMINCP, which changes the character numbers in the database to the actual foreign characters.)

          Changing the character set to ISO-8859-1 in the Language Manager is not an option, of course, because foreign characters in posts, titles, etc. would no longer display, or only as '?', and it's not recommended either.


          Do you think I need to change all the username characters in the database?
          Like:

          - from username stored now: #1040-#1085-#1103; &#1057-#1086-#1088-#1086-#1082-#1080-#1085-#1072;

          - to username preferred: Самохина
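
As a rough illustration of the "from / to" change asked about here, the sketch below decodes decimal numeric entities back into characters. It only transforms strings in memory and does not touch a database. The function name `decode_numeric_entities` and the sample list are hypothetical, and the entity string used is the one for Самохина rather than a value copied from a real table. Named entities such as `&amp;` are deliberately left alone, on the assumption that they are stored that way on purpose.

```python
import re

# Matches decimal numeric character references such as &#1057;
NUMERIC_ENTITY = re.compile(r"&#(\d+);")

def decode_numeric_entities(name: str) -> str:
    """Replace decimal numeric entities with the characters they stand for.

    Named entities such as &amp; are deliberately left untouched.
    """
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1))
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point: keep the original text
    return NUMERIC_ENTITY.sub(replace, name)

# Hypothetical sample values, not taken from a real user table.
usernames = [
    "&#1057;&#1072;&#1084;&#1086;&#1093;&#1080;&#1085;&#1072;",
    "Tom &amp; Jerry",
    "plain_user",
]
converted = {old: decode_numeric_entities(old) for old in usernames}
changed = {old: new for old, new in converted.items() if old != new}
print(changed)
```

Only the first sample ends up in `changed`. Anything like this should be tried on a restored backup first, since other stored data may depend on the username.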



          • #6
            This was an upgrade? Did you run the conversion scripts we provide in the Database Tools folder? It is supposed to try and account for these changes and update the data where possible.

            If it is just one user then I suggest just updating their name in the AdminCP and saving it with the UTF-8 character set.

            Wayne Luke


            • #7
              v4.2.5 was transferred to the new domain and then immediately upgraded to v5.6.4.
              I never used anything from the do-not-upload folder except tools.php. I will take a look at the dbtools files to see if they can help. There was nothing in the upgrade manual about this.



              • #8
                On a simple upgrade with the exact same settings as vBulletin 4.2.5, changes to the database shouldn't be needed. Unfortunately, this depends on the state of the data in your database and previous changes that were made over the years. So there are a lot of variables to deal with.

                Wayne Luke


                • #9
                  The vB4.2.5 database was MyISAM with utf8_general_ci, and the character set in AdminCP was set to ISO-8859-1.
                  On the new domain in vB5.6.4 we changed that to InnoDB with utf8mb4_general_ci, and the character set in AdminCP to UTF-8.

                  The data was actually stored incorrectly in the database. Converting the data from latin1 to utf8mb4 did the trick.

                  Code:
                  php utf8convert.phar --connectionCharset=latin1 --charset=utf8mb4 --wipeSearch=1
                  It puzzles me why there was only a problem with login and the AdminCP username search.



                  • #10
                    It seems that what I have said here and in the past is not being understood.

                    Some clarification...
                    • ISO-8859-1 covers only the first 256 Unicode characters and does not support most languages. It supports English and some European languages.
                    • vBulletin 4.2.5 does not support UTF-8 under any circumstance.
                    • vBulletin 5 supports UTF-8 when the database is using a UTF-8 character set.
                    • MySQL's Character Set Collation (i.e. latin1_swedish_ci, utf8_general_ci, and utf8mb4_general_ci) has nothing to do with how data is stored in the database. It controls how data is sorted and compared in queries. Suffixes like ci, cs, ai and as control how accents and character case are handled in that sorting.
                    • Using the utf8 character set is not recommended. Instead you should use the utf8mb4 character set. MySQL's utf8 is an alias for utf8mb3, which stores each character in 1 to 3 bytes and therefore cannot hold characters that need 4 bytes in UTF-8, such as Emoji. PHP, and therefore vBulletin, relies on MySQL knowing where one character ends and another starts, and it has had problems with this in the past. So the utf8mb4 character set was introduced. It stores each character in 1 to 4 bytes and covers the full Unicode range, which allows features like combination characters (used by Emoji).
                    All of this makes understanding the data in vBulletin 4.2.5 difficult, especially when the utf8 character set was used instead of the recommended latin1 (with vBulletin 4.X).
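
For a concrete feel for the byte counts involved, the following Python sketch measures how many bytes a few sample characters take in UTF-8. The 3-byte limit in the comment refers to MySQL's legacy utf8 (utf8mb3) character set. The helper name `utf8_size` and the sample set are just for this example.

```python
def utf8_size(char: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))

samples = {
    "A": "Latin letter",
    "\u00e9": "accented Latin letter (é)",
    "\u0421": "Cyrillic letter (С)",
    "\u20ac": "euro sign",
    "\U0001F600": "Emoji (grinning face)",
}

for char, label in samples.items():
    size = utf8_size(char)
    # MySQL's legacy utf8 (utf8mb3) holds at most 3 bytes per character.
    fits_legacy_utf8 = size <= 3
    print(f"{label}: {size} byte(s), fits legacy utf8: {fits_legacy_utf8}")
```

Only the Emoji needs 4 bytes, which is why it is the usual example of something that utf8mb4 can store and the legacy utf8 character set cannot.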


                    Three variables are in question here, and they determine how the system handles data. In new installations of MySQL, these are all utf8mb4 compatible...
                    • character set: Set at the server, database, table, and column levels. Technically, this determines how text data is stored by MySQL. However, with modern versions of MySQL, all data is actually stored in a format that is utf8mb4 compatible. Examples are latin1, utf8, utf16, and utf8mb4.
                    • collation: Set at the server, database, table, and column levels. Ostensibly based on the character set, but it determines how query results are sorted.
                    • connection charset: This tells MySQL how to handle incoming data from applications like PHP. This allows MySQL to convert it properly for storage.
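
The effect of a mismatched connection charset can be sketched in a few lines of Python. This is an illustration under assumptions, not a model of MySQL itself: Python's `latin-1` codec stands in for MySQL's latin1, which may differ from it for a few byte values.

```python
name = "Самохина"

# The application sends UTF-8 bytes...
sent_bytes = name.encode("utf-8")

# ...but the connection is declared as latin1, so every byte is taken
# to be one character.
misread = sent_bytes.decode("latin-1")

print(len(name), len(misread))  # 8 16

# The bytes themselves were not damaged, only mislabeled.
print(misread.encode("latin-1") == sent_bytes)  # True
```

Each Cyrillic letter takes two bytes in UTF-8, so the 8-letter name turns into 16 unrelated characters when it is read as latin1.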
                    What about the HTML Character Set within vBulletin? This is not relevant to how the software communicates with MySQL. It is only sent to the browser to tell it how to display characters. ISO-8859-1 is obsolete and should not be used with HTML5 at all. The only viable option here is UTF-8. All this does is tell the browser to handle characters as UTF-8 and try to display them based on that specification. The only reason that ISO-8859-1 might work in the AdminCP is because it doesn't use HTML5. It uses XHTML 1.0 (an entirely different problem), which does support ISO-8859-1.

                    Now about your issue

                    #1057-#1086-#1088... does not equal Самохина. The numbers are HTML entity representations of the characters. However, the browser will display the string of HTML entities as the characters you expect. vBulletin 5 does try to determine whether it is dealing with HTML entities or not, but this is also problematic.

                    Code:
                    php utf8convert.phar --connectionCharset=latin1 --charset=utf8mb4 --wipeSearch=1
                    The command line above is an example... It works for 99% of customers because they have never modified their database from the defaults of over a decade ago. For those that have modified their database, like yourself, it is hoped that you know better than we do and know these values. There are so many different server configurations that it is impossible to test against or even track them all.

                    If you were using UTF-8, then you should not have been using a connection character set of latin1, though that is the default on many MySQL installations, even ones that have been upgraded to MySQL 8, which uses utf8mb4 as the default character set, connection character set, and collation.

                    When the scripts run to change your data, they do this (simplified)...
                    1. They look at your connection character set (--connectionCharset=latin1) to get an idea of how your data is stored, regardless of how MySQL says it is stored, because the two can be different.
                    2. They connect to MySQL with the same connection character set instead of the server's default.
                    3. They tell MySQL to convert the char/varchar/text fields to varbinary/blob fields. When stored in binary, each individual character should always be the same regardless of the connection character set or the row's character set. This means that an & will always be an &.
                    4. Convert each row to the specified character set (--charset=utf8mb4).
                    5. Then convert the varbinary/blob fields back to char/varchar/text fields as necessary.
                    6. Repeat steps 3-5 for every row in the table.
                    7. Repeat for every table.
                    8. Update the database itself.
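
The idea behind the binary round trip in the steps above can be sketched in Python. This is not the conversion script's code, and `reinterpret` is a hypothetical helper. It assumes the stored text was really UTF-8 that had been labeled as latin1, and it uses Python's `latin-1` codec as a stand-in for MySQL's latin1.

```python
def reinterpret(text: str, declared: str = "latin-1", actual: str = "utf-8") -> str:
    """Re-read text under `actual`, going through its raw bytes.

    Falls back to the unchanged input when the bytes are not valid in
    `actual`, which suggests the value was not mislabeled to begin with.
    """
    try:
        raw = text.encode(declared)   # comparable to the text -> binary step
        return raw.decode(actual)     # comparable to the binary -> utf8mb4 text step
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# Build a mislabeled value instead of typing mojibake by hand.
mislabeled = "Самохина".encode("utf-8").decode("latin-1")

rows = [mislabeled, "M\u00fcller", "plain_user"]
for row in rows:
    print(repr(reinterpret(row)))
```

The first row comes back as Самохина. The other two are returned unchanged, because "Müller" stored as genuine latin1 is not valid UTF-8 and "plain_user" reads the same either way. Real data can be a mix of both cases, which is one reason a conversion like this should be run against a restored backup and checked afterwards.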
                    In a perfect world, this would accurately translate all data to utf8mb4. However, vBulletin 3 and 4 databases are not perfect. vBulletin 3 should have had UTF-8 support back in 2004. It didn't. UTF-8 support was never added, not even in vBulletin 4.2.5. So all data has to be treated with skepticism rather than assumed to be correct. There is no way around this. In addition, some installations of vBulletin 4.X rely on PHP's mbstring extension to handle international languages. Other installations rely on PHP's iconv extension. It was much later in vBulletin's lifespan that iconv became required. Lack of data sets also hampers testing.

                    Most of the datasets come from Arabic and German customers converting to vBulletin Cloud. As issues are encountered in those conversions, the scripts are updated and then those updates make it to the next released version for other customers.

                    So, here is what you have to do.

                    Determine your connection character set. This should be available in whatever tool you use to access the database outside of vBulletin. Do not use vBulletin to try and determine what this is. The database tools scripts do not use vBulletin's engine or API.

                    Restore the backup of your database. Run the conversion script with the correct connection set.

                    Comment out this line in your /core/includes/config.php file unless vBulletin specifically tells you to change it: $config['Mysqli']['charset'] = 'utf8';

                    If the name hasn't been updated, then you will need to manually change it.


                    Wayne Luke