default UTF-8

  • NLP-er
    replied
    Correction: new texts are OK without the option. With the option, new texts are broken too. Old texts are broken in both configurations. What should I do?
    Last edited by NLP-er; Wed 2 Feb '22, 6:36am.


  • NLP-er
    replied
    After updating from vB4 to vB5, special characters are displayed as ? signs. I had UTF-8 before and have UTF-8 now. I had $config['Mysqli']['charset'] = 'utf8'; uncommented before. Now the ? signs are displayed with and without that setting. New texts are OK but old ones are broken. What should I do?


  • -=|zami|=-
    replied
    probably, then, the problem occurred in migrating from VB4 to VB5, right?

    in practical and objective terms: there is no solution, right?


  • Wayne Luke
    replied
    Yes. Generally if you had the line in your config.php commented out before upgrading, you should leave it that way unless the upgrade script tells you otherwise.


  • -=|zami|=-
    replied
    thanks Wayne!

    when you say "it depends on how VB was communicating with the server before", you mean before the conversion to VB5?


  • Wayne Luke
    replied
    it would be an error if the character_set_database and character_set_connection values were different, right?
    The answer here is maybe. Unfortunately, there are too many variables to answer definitively.

    In a perfect world, these would be the same. However, it really depends on how your vBulletin was communicating with the server before. Changing this could cause text stored before the change to display incorrectly. The problem is that we have no idea how many bytes each stored character occupies. UTF-8 characters take 1-4 bytes of storage, and most non-English characters take 2-3 bytes. Which applies to your data? We have no idea.
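
To make the byte counts concrete, here is a minimal Python sketch that measures a few characters. It assumes Python's cp1252 codec is a close enough stand-in for MySQL's latin1, and the helper name byte_lengths is made up for this illustration.

```python
def byte_lengths(ch):
    """Return (UTF-8 byte count, cp1252 byte count or None if not representable)."""
    utf8_len = len(ch.encode("utf-8"))
    try:
        single_byte_len = len(ch.encode("cp1252"))
    except UnicodeEncodeError:
        # The character does not exist in the single-byte character set.
        single_byte_len = None
    return utf8_len, single_byte_len

for ch in ["a", "é", "€", "😀"]:
    print(repr(ch), byte_lengths(ch))
```

The same character needs a different number of bytes depending on the character set, so the bytes alone do not say how the text was stored.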

    If you were using Latin1, then it is one byte per character and we've converted every character that Latin1 cannot represent to HTML entities. These display properly but aren't searchable.
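
A rough Python sketch of why entity-encoded text displays correctly but is hard to search. This is not vBulletin's actual routine. The function name is hypothetical, and Python's latin-1 codec with the xmlcharrefreplace error handler is used only to imitate the effect.

```python
import html

def store_as_latin1_with_entities(text):
    """Keep Latin-1 characters; turn everything else into numeric HTML entities."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")

stored = store_as_latin1_with_entities("żółw")
print(stored)                 # the entity-encoded form as it would sit in a column
print(html.unescape(stored))  # what a browser shows after decoding the entities
print("żółw" in stored)       # a plain substring search on the stored form
```

The stored form only matches a search if the search term is entity-encoded the same way first.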

    On a new installation of vBulletin, we set everything to utf8mb4. This means that every character can take up to 4 bytes to store. This allows for every UTF-8 character, including the Emoji popular on mobile devices, at the expense of storage space. However, we know without a doubt how the text is encoded, which allows us to handle things a lot better.

    Converting to utf8mb4 without knowing the actual content beforehand is the difficult part. Converting all those stored HTML entities back to characters is a little easier if we know what the character size is. We have our converters working for languages based on Latin characters. These are the easiest because they are heavily used in computer processing, even by nations that don't speak those languages. We're working on finalizing Arabic and other languages as quickly as possible.


  • -=|zami|=-
    replied
    any tips?

    thanks!


  • -=|zami|=-
    replied
    friends, sorry for the delay in answering. Can we resume the discussion?

    According to the manual, it would be an error if the character_set_database and character_set_connection values were different, right?
    https://www.vbulletin.com/docs/html/mysqli

    In my live forum these two values are the same: latin1.
    At first glance, no problem, right?
    However, the ideal is that these values are equal to utf8mb4, correct?

    is there a safe way to make this change?

    thanks!!!



  • Wayne Luke
    replied
    We don't generally recommend changing the Server Character Set on the database. Not unless you have run a complete conversion on the contents of every field in your database. Just changing the field to UTF8MB4 isn't enough because vBulletin 4 stores most non-English characters as HTML entities. The problem is that LATIN1 uses 1 byte per character, MySQL's UTF8 uses 1-3 bytes per character, and UTF8MB4 uses up to 4 bytes per character. By changing the character set, you've changed how the stored bytes are interpreted, and now the system doesn't actually know if it is serving valid characters.
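
The effect of reading stored bytes with a different character set than the one they were written in can be reproduced in a few lines of Python. This is only an illustration of the general problem, not of vBulletin's code path. The helper name is hypothetical.

```python
def read_with_wrong_charset(text, stored_as, read_as):
    """Encode text as `stored_as`, then decode the same bytes as `read_as`."""
    return text.encode(stored_as).decode(read_as, errors="replace")

# UTF-8 bytes read as Latin-1: one character turns into two.
garbled = read_with_wrong_charset("é", "utf-8", "latin-1")
print(garbled)

# The damage is reversible only if we know exactly which mix-up happened.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)

# Latin-1 bytes read as UTF-8: the byte is invalid and becomes U+FFFD.
print(read_with_wrong_charset("é", "latin-1", "utf-8"))
```

The repair step works here because the mix-up is known. On a real database with mixed content, that is the piece of information that is missing.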

    This is the most complicated part of the conversion. Does an & mean ampersand or is it an HTML entity? What happens if a byte in a character ends up translating to an ampersand or, worse, an end-of-file marker? This is the part we're working on in our conversion scripts. The data isn't clean in vBulletin 3 and 4. We have to clean it.
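
A small Python sketch of the ampersand ambiguity. It assumes, purely for illustration, that only decimal numeric entities such as &#380; were generated by the software and that everything else was typed by the author. The sample post and the function name are made up.

```python
import html
import re

NUMERIC_ENTITY = re.compile(r"&#(\d+);")

def decode_numeric_entities(text):
    """Replace only decimal numeric entities (e.g. &#380;) with their characters."""
    def _sub(match):
        codepoint = int(match.group(1))
        # Leave out-of-range values untouched instead of raising ValueError.
        return chr(codepoint) if codepoint <= 0x10FFFF else match.group(0)
    return NUMERIC_ENTITY.sub(_sub, text)

stored = "R&D uses &#380; and the author typed &amp;eacute; literally"

# Blanket unescaping also rewrites the entity the author typed on purpose.
print(html.unescape(stored))

# The narrower rule restores the character and leaves the rest alone.
print(decode_numeric_entities(stored))
```

Neither rule is right for every post, which is why the stored data has to be inspected before it is converted.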


  • Blackhorse
    replied
    6) in the old database: collation = latin1_swedish_ci (all tables) and table types = InnoDB, MEMORY or MyISAM. (InnoDB and MEMORY less than 50% of the tables).
    Variable_name Value
    character_set_client latin1
    character_set_connection latin1
    character_set_database latin1
    character_set_filesystem binary
    character_set_results latin1
    character_set_server latin1
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/
    any tips? thanks!

    To change character_set_database you have to convert all tables and columns to utf8mb4. You can use the vBulletin tools to convert the database, hire a professional, or use your own script or solution. This is the hardest part of the work. Be careful, and keep a clean backup even after a successful conversion. Run the conversion on a copy of the database first and leave the original untouched!

    1st: TAKE a BACKUP

    2nd: run this query on the COPY:

    PHP Code:
    ALTER DATABASE old_db_name
        DEFAULT CHARACTER SET utf8mb4
        DEFAULT COLLATE utf8mb4_general_ci;


    3rd: use the conversion tool for columns and tables.
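
For orientation, per-table conversion in MySQL is normally done with ALTER TABLE ... CONVERT TO CHARACTER SET. The Python sketch below only builds such statements for review. It is not the vBulletin conversion tool, the table names are example values, and it does nothing about stored HTML entities or data that was written with the wrong character set.

```python
def convert_statements(tables, charset="utf8mb4", collation="utf8mb4_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for name in tables:
        # Backtick-quote the identifier and escape any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements

# Example table names only; take the real list from the COPY of the database.
for statement in convert_statements(["post", "thread"]):
    print(statement)
```

Printing the statements instead of executing them keeps the step reviewable before anything touches the copy.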


    4th: If the database is converted successfully, uncomment // $config['Mysqli']['charset'] = 'utf8'; in the config file and change utf8 to utf8mb4, so that it reads: $config['Mysqli']['charset'] = 'utf8mb4';

    5th: upload the language XML file with UTF-8 encoding, if it is not one of the default installed languages.

    6th: change the language settings to UTF-8 and change the locale.

    ======

    You should see the following results:
    Variable_name Value
    character_set_client utf8mb4
    character_set_connection utf8mb4
    character_set_database utf8mb4
    character_set_filesystem binary
    character_set_results utf8mb4
    character_set_server utf8mb4
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/
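
A quick way to compare a pasted variable listing against the expected values is sketched below in Python. It assumes the two-column text format shown in this thread. The function names and the list of checked variables are choices made for this example.

```python
CHECKED = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)

def parse_variables(text):
    """Parse 'name value' lines into a dict, skipping the header row."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] != "Variable_name":
            values[parts[0]] = parts[1]
    return values

def mismatches(text, expected="utf8mb4"):
    """Return the checked variables whose value differs from `expected`."""
    values = parse_variables(text)
    return {name: values.get(name) for name in CHECKED if values.get(name) != expected}

listing = """Variable_name Value
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
"""
print(mismatches(listing))
```

An empty result means the five checked variables all match. character_set_filesystem and character_set_system are deliberately left out because they are expected to stay binary and utf8.
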
    Last edited by Blackhorse; Sun 21 Apr '19, 10:14pm.


  • Blackhorse
    replied
    5) in the new database: collation = utf8mb4_general_ci and table types = InnoDB, in all tables.
    character_set_client utf8mb4
    character_set_connection utf8mb4
    character_set_database utf8mb4
    character_set_filesystem binary
    character_set_results utf8mb4
    character_set_server latin1
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/

    Since you are working on a new installation for testing, there is no harm.
    character_set_server needs to be utf8mb4, not latin1.

    1st: uncomment the charset line in the config file and change utf8 to utf8mb4.

    2nd: we are going to make a global change for now. If you have access to the server (or your host does), change the my.cnf configuration by adding the following:

    PHP Code:
    [mysqld]
    character-set-client-handshake = FALSE
    character-set-server=utf8mb4
    collation-server=utf8mb4_general_ci
    init_connect='SET collation_connection = utf8mb4_general_ci,NAMES utf8mb4'

    [client]
    default-character-set=utf8mb4

    [mysql]
    default-character-set=utf8mb4

    After those steps you should see the following on the test installation:
    character_set_client utf8mb4
    character_set_connection utf8mb4
    character_set_database utf8mb4
    character_set_filesystem binary
    character_set_results utf8mb4
    character_set_server utf8mb4
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/

    After those steps you should see the following on the old live site:
    Variable_name Value
    character_set_client latin1
    character_set_connection latin1
    character_set_database latin1
    character_set_filesystem binary
    character_set_results latin1
    character_set_server utf8mb4
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/

    to be continued


  • -=|zami|=-
    replied
    thank you!
    Following the tips, I did a new installation for testing.

    1) during the preparation for the test installation, I observed the following:
    - in core/includes/config.php, I noticed that the line "// $config['Mysqli']['charset'] = 'utf8';" is commented out. Is that OK?

    2) when I started the installation:
    - I received the following alert:
    "Action Required
    The current database character set is 'latin1'. It is strongly recommended that you use the utf-8 character set when running vBulletin. However, if other applications are using this database, then changing the character set can affect them.
    Do you want to automatically change your database to use utf-8?"

    I clicked YES

    3) in admincp of the new forum: language pack ==> UTF-8

    4) I posted messages on the new forum and found no problems with the accented words

    5) in the new database: collation = utf8mb4_general_ci and table types = InnoDB, in all tables.
    character_set_client utf8mb4
    character_set_connection utf8mb4
    character_set_database utf8mb4
    character_set_filesystem binary
    character_set_results utf8mb4
    character_set_server latin1
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/
    6) in the old database: collation = latin1_swedish_ci (all tables) and table types = InnoDB, MEMORY or MyISAM. (InnoDB and MEMORY less than 50% of the tables).
    Variable_name Value
    character_set_client latin1
    character_set_connection latin1
    character_set_database latin1
    character_set_filesystem binary
    character_set_results latin1
    character_set_server latin1
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/
    any tips? thanks!
    Last edited by -=|zami|=-; Sat 20 Apr '19, 7:09pm.


  • Blackhorse
    replied
    You need to change your database collation and charset: you have to convert every table and column to utf8mb4!


  • Wayne Luke
    replied
    if I use UTF-8 and the text contains accented words, the text will not be visible.
    You're not completely using UTF-8. Something in the configuration is broken. The character set is only the last part of the puzzle. The database has to be UTF-8, preferably UTF8MB4. Your Locale in the language settings has to be set to a UTF-8 locale. You will also need a modern font that is UTF-8 compatible.

    If you install a brand new copy of vBulletin 5 on the same server but with a new database and directory, what happens to the same characters?
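
One possible reason accented text vanishes when the page is switched to UTF-8 is that the stored bytes are Latin-1 and are not valid UTF-8 at all. Whether that is what happens on this forum is an assumption. The Python sketch below only shows the validity check itself, with a made-up helper name.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

word = "ação"
print(is_valid_utf8(word.encode("utf-8")))    # bytes written as UTF-8
print(is_valid_utf8(word.encode("latin-1")))  # bytes written as Latin-1
print(is_valid_utf8("plain".encode("latin-1")))  # pure ASCII passes either way
```

Pure ASCII text passes under both character sets, which fits the observation that only posts with accented words are affected.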


  • -=|zami|=-
    replied
    Thank you both for the help!

    very useful information from you!
    I do not use English only, and I need to allow accented words. however, the text of the posts is only visible and perfectly accented if the HTML Character Set is equal to ISO-8859-1.
    if I use UTF-8 and the text contains accented words, the text will not be visible.

    any tips?
