default UTF-8

  • #31
    A default vBulletin 4.x installation did not use UTF-8 in any way, shape, or form. Any character outside the standard ASCII character set was converted and stored as its HTML entity. It was not stored or processed as UTF-8 at any time.
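
    To illustrate the entity storage described above, here is a minimal Python sketch. It is not vBulletin's actual code; the function name and the sample text are made up for illustration, and it assumes numeric entities are used for every non-ASCII character.

```python
import html


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric HTML entity."""
    # "xmlcharrefreplace" turns every unencodable character into &#NNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


original = "Zażółć"                  # hypothetical sample text
stored = to_html_entities(original)  # pure ASCII, fits any column charset
print(stored)                        # Za&#380;&#243;&#322;&#263;
print(html.unescape(stored) == original)  # True
```

    Because the stored value is plain ASCII, it survives in a latin1 column unchanged, and the browser turns the entities back into characters when the page is rendered.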

    The MySQL connection character set has nothing to do with how data is stored or processed within vBulletin. It tells MySQL how to interpret the data it receives. Even MySQL's utf8 character set cannot represent every character. Why? Because utf8 on MySQL (an alias for utf8mb3) stores each character in 1 to 3 bytes, so characters that need 4 bytes, such as emoji, do not fit. This is why MySQL recommends UTF8MB4, and that is what vBulletin 5 has been developed to use these days. With UTF8MB4, a character can take up to 4 bytes.
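
    As a rough check on byte widths, the Python sketch below measures how many bytes a few sample characters take in UTF-8. The helper names are hypothetical. It relies on the fact that MySQL's legacy utf8 (utf8mb3) holds at most 3 bytes per character, while utf8mb4 holds up to 4.

```python
def utf8_width(char: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (the utf8mb3 limit)."""
    return all(utf8_width(ch) <= 3 for ch in text)


# A (ASCII), e-acute, euro sign, grinning-face emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(repr(ch), utf8_width(ch), "bytes")

print(fits_utf8mb3("caf\u00e9 \u20ac5"))   # True
print(fits_utf8mb3("hi \U0001F600"))       # False: the emoji needs 4 bytes
```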

    vBulletin will create tables with the default collation of your database. If your database's default collation is latin1_swedish_ci (like 99% of installations before MySQL 8), then that is what vBulletin will use. We assume you know more about your database than we do. You would have multiple collations if you changed the collation on some tables and did not update the default collation. The only exception to this rule is new installations of vBulletin 5.x, which force the UTF8MB4 character set and a corresponding collation. On MySQL 5.7, this should be something like utf8mb4_general_ci. On MySQL 8.0 and higher, it should be utf8mb4_0900_ai_ci.

    Language settings for vBulletin are accessed in the AdminCP under Languages → Language Manager. The only HTML Character Set that should be used in 2022 is UTF-8. The locale should match the actual language and be the UTF-8 variant. If you do not specify the locale the text used in dates and times will not match that language.

    It is always complicated converting older databases. They often have a lot of cruft and there is no way to guarantee they haven't been manipulated by external software like plugins.
    Translations provided by Google.

    Wayne Luke
    The Rabid Badger - a vBulletin Cloud demonstration site.
    vBulletin 5 API



    • #32
      The locale is not set at all in the Language Manager. Should I really set its value to UTF-8 or en.UTF-8 (is it case sensitive?)? Should I set it before or after running the database scripts (I restored from backup to the state before the scripts)? Is it even related to the character set issues? If not, let's leave it for the moment and focus on the main issue.

      I do not want to make a new installation; I want to upgrade from vB4. If the characters were stored as HTML entities, why do they not work anymore? They are still the same, so what is the problem, and how do I solve it? Can't they stay as HTML entities? Is there any solution that will work when the database scripts are not enough?

      vBulletin Enterprise Translator (vBET) - your forum in 52 languages!



      • #33
        If you're using English, then your locale should be something like en_US.UTF-8 for US English or en_GB.UTF-8 for UK English. The HTML Character Set and Locale tell the browser how to display characters.

        If you use something like ISO-8859-1 for the HTML character set, then you're restricting display to basically ASCII and a few other characters. With a UTF-8 character set, the browser will use the full breadth of the font available.

        Wayne Luke


        • #34
          This still doesn't explain why the HTML entities stopped working... Something is just wrong here. So I checked it: I restored the vB4 backup, and as far as I can see, what you told me is not true.
          vBulletin 4 WAS USING UTF-8. See the screenshots of the post table and the selected content (many different languages, all in UTF-8 in the database).
          [Screenshots: vb4_encoding.png, vb4_results.png]
          Then I restored to vB5. It is the upgrade procedure that breaks everything. In vB5, the text table (post does not exist anymore) has, for no reason at all, the latin1_swedish_ci collation and broken content. See the ??? signs in the screenshot.
          [Screenshots: vb5_encoding.png, vb5_results.png]

          So the upgrade scripts change the encoding from UTF-8 to some crap and break everything (which is nonsense: you wrote that vB5 uses UTF-8 by default, so why does it break UTF-8 during the upgrade?).
          I have a vB4 backup in UTF-8 and want to upgrade it to vB5. How can I do that without your upgrade scripts breaking everything?
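
          The ??? signs mentioned above are what a lossy conversion into latin1 typically looks like. The Python sketch below only imitates that effect; it is not what MySQL or the upgrade scripts literally run. It assumes Python's "latin-1" codec is a close enough stand-in for MySQL's latin1, and the function name is hypothetical.

```python
def store_in_latin1(text: str) -> str:
    """Imitate writing text into a latin1 column without strict mode."""
    # Characters with no latin1 equivalent are replaced by "?"
    return text.encode("latin-1", errors="replace").decode("latin-1")


original = "Zażółć"               # hypothetical sample text
damaged = store_in_latin1(original)
print(damaged)                    # Za?ó??
print(damaged == original)        # False: the original cannot be recovered
```

          Once a character has been replaced by a literal question mark, the information is gone, so the only fix is to restore from a backup taken before the conversion.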



          • #35
            We would need a copy of the pre-upgrade database via Support Ticket. It could be your plugin that is breaking things. Something about the data is not standard.

            Wayne Luke


            • #36
              Sorry for the response time - I had a busy week.

              There is too much crucial data to send the whole DB. The dump file of the full DB is also large. It is also probably against some law (RODO, the Polish name for GDPR, or another) to send you all of my users' data.

              You wrote that UTF-8 is the default encoding for vB5, so why does vB5 create tables in a different encoding and break strings which are already in UTF-8?
              My plugin has its own cache tables for translations. The snapshot I showed you is of your table with untranslated data - posts written by users. I did a quick check of the plugin code for the postdata_presave hook and I do not see any re-encoding made by my plugin. I can check other places if you point out which plugins could change data before a post is saved.
              Still, the main issue is that the data is already in UTF-8, but vB5 (which is supposed to use UTF-8) writes the data to tables in latin1_swedish_ci during the upgrade. This nonsense is on your side, and you do not need my full database to find out why your scripts create tables in latin1_swedish_ci instead of UTF-8. Why is the table encoding changed to latin1_swedish_ci by the vB5 upgrade scripts?


              • #37
                The default for a new installation is UTF8MB4. vBulletin will never use a UTF8 character set on an install. Even MySQL doesn't recommend using the UTF8 character set anymore. Oracle created the UTF8MB4 character set specifically to address numerous limitations in UTF8.

                On upgrades, we assume you know more about your database than we do. People make changes to their databases all the time, whether they are helpful or not. The upgrade system will use whatever default you have set for the database. So if your database was latin1, then it will be used. If we just created new tables in UTF8MB4, then your site would completely break. You should not mix multiple character sets and collations in a database. To reiterate, if you manually converted data to UTF-8 previously and your database default character set is latin1 and the default collation is latin1_swedish_ci, then vBulletin will use those defaults. The upgrade script can't determine what you intend; it can only use the data available in the database.

                We can't fix the issues with your database when we don't have access to the data. I can't help with the coding of your plugin. I would suggest making sure the actual database is in a good state with the proper data before working on plugins on it.
                Last edited by Wayne Luke; Fri 11 Feb '22, 9:58am.

                Wayne Luke


                • #38
                  This is not about my plugin, but about upgrading vB4 to vB5. You know that my default was NOT latin1, so why are you writing about it? You know that the upgrade system DIDN'T use my default (I didn't have latin1_swedish_ci before the upgrade, but UTF-8), so why are you writing that you just use the default? You also know that you do not need the full database to fix the issue, so why are you asking for it? You also know that you didn't ask for access to the database but for a full copy, so why are you writing that you cannot help without access?

                  Can we focus on what is actually going on and not on what is supposed to happen? It is supposed to be super - I know - but it's not. The upgrade didn't use my default, my database wasn't in latin1, and so on. Why did the vB5 upgrade script change and break the encoding? Maybe I can set something before the upgrade to avoid it happening again?


                  • #39
                    Have you rolled your database back to your backup or are you still on your upgraded database? If still on the upgraded one, restore a copy of your backup to a NEW database, then take screenshots of the table lists and post them here, showing the collations of all tables and the main database collation.

                    As Wayne says, without access to a database to investigate, there's little we can do other than guess - plus this will simply result in "post tennis" as we bounce questions/answers back and forward...
                    Vote for:

                    - *Admin Settable Paid Subscription Reminder Timeframe*
                    -
                    *PM - Add ability to reply to originator only*
                    - Add Admin ability to auto-subscribe users to specific channel(s)
                    - "Quick Route" Interface...



                    • #40
                      I have full system snapshots of both - the current one with the broken vB5 encoding and the one with vB4. I also have a full database backup from vB4 (over 2GB). As I remember, vB5 didn't ask about the encoding during the upgrade, and the upgrade has to run without the previous files - so that could be the reason why vB5 breaks the encoding during the upgrade (it didn't see my $config['Mysqli']['charset'] = 'utf8'; from vB4).

                      I realize that this makes it harder. But as I wrote, I simply cannot send you all of my users' data - I guess it is illegal in the EU (RODO, i.e. GDPR). Plus, as I wrote, it is a LARGE file (over 2GB, with a lot of my plugin's cache tables). Still, my plugin does not change the structure of vB4 tables - I just add my own caches in separate tables, so this is not the issue.
                      I also think that there is a simple bug in the upgrade scripts which makes this happen. As I wrote, I think that vB5 is not using the default encoding because it simply loses this information from the config file, which has to be removed, and does not ask about it in the database connection form (which is visible during the upgrade). And finally, vB5 does not check the database structure (the scripts could simply check the table encoding). As a result, an encoding unrelated to the table being transformed is used, which breaks it.
                      Going this way, it shouldn't be a big issue to update the upgrade scripts and remove the bug. Not only for me. I guess there are still more vB4 users who are simply afraid to upgrade because of such issues.

                      Below are screenshots of the vB4 tables; the main database setting is shown at the end. As I wrote before, the crucial tables (with posts and so on) use UTF-8, and some use latin1. Blog was using latin1, as I see (but I didn't use it) - vB4 treated it as one of your plugins, as I see (same as CMS, which I also didn't use). The forum is what is crucial for me.
                      [Screenshots: Zrzut ekranu (11).png through Zrzut ekranu (20).png]


                      • #41
                        Originally posted by NLP-er
                        This is not about my plugin, but about upgrading vB4 to vB5. You know that my default was NOT latin1, so why are you writing about it? You know that the upgrade system DIDN'T use my default (I didn't have latin1_swedish_ci before the upgrade, but UTF-8), so why are you writing that you just use the default? You also know that you do not need the full database to fix the issue, so why are you asking for it? You also know that you didn't ask for access to the database but for a full copy, so why are you writing that you cannot help without access?

                        Can we focus on what is actually going on and not on what is supposed to happen? It is supposed to be super - I know - but it's not. The upgrade didn't use my default, my database wasn't in latin1, and so on. Why did the vB5 upgrade script change and break the encoding? Maybe I can set something before the upgrade to avoid it happening again?
                        I absolutely know nothing about your database. You have refused to let us see it to provide support. I asked for a full copy because our developers won't be able to fix the issue on your site. The tools they need won't be available. I am not the one that is going to be fixing it.

                        Even if all of your existing tables are UTF-8, the DATABASE has its own default that it will use for new tables unless a character set and collation is defined. vBulletin upgrades do not define a character set and collation when creating new tables on upgrades.

                        Wayne Luke


                        • #42
                          You've got a real mix of collations on your tables there, I'm actually surprised you haven't run into an "Illegal Mix of Collations" error long before now...

                          While some of your tables are utf8_* collations, your main database collation is latin1_swedish_ci - as per the very last line underneath the last table in the list. You asked why new tables were created using this collation - because that's what your database is set to:

                          [Screenshot: tables.png]

                          So the upgrade script is correctly determining what your database collation is and creating tables using that collation. The scripts only force a change on installs as Wayne's previously stated.

                          I would suggest rolling back to vB4 for the moment so you have a working site, then create a test site where you can use a copy of the database to make changes/test converting your data to the same collation first before attempting an upgrade. There are scripts available via sites such as StackOverflow etc that you can run to do this then check that the results on your vB4 test site are correct first before then running the upgrade on the test site:

                          https://www.vbulletin.com/go/testserver

                          Before running the upgrade, make sure that the main database collation is set to the same as all tables post-conversion.
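
                          A small Python sketch of the check suggested above: compare each table's collation with the database default and list the ones that differ. The table names and collations below are made-up sample data, and the function name is hypothetical. On a real server these values could be read from information_schema (TABLES.TABLE_COLLATION and SCHEMATA.DEFAULT_COLLATION_NAME).

```python
from typing import Dict


def find_collation_mismatches(
    database_default: str, table_collations: Dict[str, str]
) -> Dict[str, str]:
    """Return the tables whose collation differs from the database default."""
    return {
        table: collation
        for table, collation in table_collations.items()
        if collation != database_default
    }


# Hypothetical sample data, not taken from any real forum database
tables = {
    "post": "utf8_general_ci",
    "thread": "utf8_general_ci",
    "blog": "latin1_swedish_ci",
}

print(find_collation_mismatches("latin1_swedish_ci", tables))
# {'post': 'utf8_general_ci', 'thread': 'utf8_general_ci'}
```

                          An empty result would mean the tables and the database default agree, which is the state to reach on the test site before running the upgrade.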


                          • #43
                            Originally posted by Wayne Luke

                            I absolutely know nothing about your database. You have refused to let us see it to provide support. I asked for a full copy because our developers won't be able to fix the issue on your site. The tools they need won't be available. I am not the one that is going to be fixing it.

                            Even if all of your existing tables are UTF-8, the DATABASE has its own default that it will use for new tables unless a character set and collation is defined. vBulletin upgrades do not define a character set and collation when creating new tables on upgrades.
                            You know everything you asked about my database.
                            Yes, you do not have my database, and as I already wrote, it is probably ILLEGAL for me to send you my full database (RODO).
                            Developers can identify the places in the scripts which create new structure or rewrite data, and they can make them respect the existing encoding without having my database. I can test it.
                            You even know where the issue is - you described it: "vBulletin upgrades do not define a character set and collation when creating new tables on upgrades"

                            Originally posted by Trevor Hannant
                            You've got a real mix of collations on your tables there, I'm actually surprised you haven't run into an "Illegal Mix of Collations" error long before now...

                            While some of your tables are utf8_* collations, your main database collation is latin1_swedish_ci - as per the very last line underneath the last table in the list. You asked why new tables were created using this collation - because that's what your database is set to:

                            [Screenshot: tables.png]

                            So the upgrade script is correctly determining what your database collation is and creating tables using that collation. The scripts only force a change on installs as Wayne's previously stated.

                            I would suggest rolling back to vB4 for the moment so you have a working site, then create a test site where you can use a copy of the database to make changes/test converting your data to the same collation first before attempting an upgrade. There are scripts available via sites such as StackOverflow etc that you can run to do this then check that the results on your vB4 test site are correct first before then running the upgrade on the test site:

                            https://www.vbulletin.com/go/testserver

                            Before running the upgrade, make sure that the main database collation is set to the same as all tables post-conversion.
                            It is what it is. The database was created by your scripts in vB 3.8 over a decade ago. It went well to 4.x, and now there is a big issue with 5.x. This is a bug in your scripts. You ignore the existing encoding when creating new structure and rewriting old data. It is enough to respect the existing encoding in one of those places and it will be fine. Just update the upgrade scripts.

                            I never had any issues with illegal collations, or at least I do not remember having any.
                            The database setting is not the issue here. The issue here is that you ignore vB settings - your own settings. To be precise, this one:
                            Code:
                            $config['Mysqli']['charset'] = 'utf8';
                            Which you simply lose during a BAD upgrade (old files must be removed, and the script does not ask about it in the database connection form, as I remember).
                            So your answer to my question is wrong. You create tables in latin1 not because of the default database encoding, but because of a BUG in the upgrade procedure which loses a vBulletin setting that clearly says:
                            // ****** MySQLI OPTIONS *****
                            // When using MySQL 4.1+, MySQLi should be used to connect to the database.
                            // If you need to set the default connection charset because your database
                            // is using a charset other than latin1, you can set the charset here.
                            The DEFAULT CONNECTION CHARSET is in a vBulletin setting which you lose. It is not in the database. Not for your code. And it is your developers who put it that way. So please correct the current upgrade procedure to respect it, which hopefully will give appropriate results after the upgrade.

                            Because everything is fine on vB4, I do not have to make any changes on vB4. I just need you to correct the bugs in your upgrade code/procedure. So please do, and let me know when I can do the upgrade again.


                            • #44
                              Connection Character set is not equal to the default character set of the database. It only determines how the application (vBulletin) talks to the database. It has absolutely nothing to do with how the data is actually stored in the database.
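
                              To show what a connection character set mismatch can do on its own, here is a Python sketch. It only imitates the effect: UTF-8 bytes sent by the application are interpreted as latin1 on the other side. The function names are hypothetical, and Python's "latin-1" codec is assumed to be a close enough stand-in for MySQL's latin1.

```python
def misread_as_latin1(text: str) -> str:
    """UTF-8 bytes from the application, interpreted as latin1 by the receiver."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading, which works only if no bytes were lost on the way."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = misread_as_latin1("ó")
print(garbled)           # Ã³  (one character became two)
print(repair(garbled))   # ó
```

                              Unlike a conversion that produces question marks, this kind of garbling keeps every byte, so it can usually be reversed.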

                              The default character set and collation of the database is just a default. You can change the character set of individual tables and fields without changing the default character set and collation of the database. These are all different variables used throughout the system.
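
                              The different levels described above can be pictured with a simplified Python model. It is a sketch of the fallback order only (column, then table, then database, then server), not of how MySQL is implemented. The function name is hypothetical, and the latin1 server default is assumed to be that of MySQL versions before 8.0.

```python
from typing import Optional


def effective_charset(
    column: Optional[str] = None,
    table: Optional[str] = None,
    database: Optional[str] = None,
    server: str = "latin1",  # assumed server default before MySQL 8.0
) -> str:
    """Pick the most specific character set that was explicitly declared."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


# An existing table that declares utf8 keeps it, whatever the database default is
print(effective_charset(table="utf8", database="latin1"))   # utf8

# A new table created without a declared charset inherits the database default
print(effective_charset(database="latin1"))                  # latin1
```

                              The connection character set is deliberately absent from this model, since it affects how data is transferred rather than which character set a table is created with.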

                              https://www.tutorialspoint.com/set-t...r-set-in-mysql

                              As for the developers fixing data they don't have access to, that is impossible. The software is much more than just PHP code, the data is important as well and since your data isn't working with the scripts then there is something different with it and that data is needed to update the code in order to get the correct results. And since changing the character set and collation affects every single byte of data in your database, the database is needed. Preferably the one before the upgrade. I can't see how it can be illegal to have your agents work on your data but I am not a lawyer.

                              Wayne Luke


                              • #45
                                I never wrote that those two things are equal. I wrote that FOR YOUR CODE "The DEFAULT CONNECTION CHARSET is in a vBulletin setting which you lose. It is not in the database".
                                And because you lose the configuration, you use the WRONG charset from the database default. The default charset also does not tell how data is stored in the database, only how it is stored for tables which do not declare a charset. You are losing your own configuration, you are using the wrong configuration, and you are ignoring the table configuration. Correct it.

                                The tutorial you sent is not for me. Give it (or find a better one) to your developers - let them update the upgrade scripts to take all those things into account, and the upgrade will go smoothly in all encoding cases.

                                How could it be illegal to send you my whole (2GB) database? It includes my users' data, and RODO probably has something against sending that data. I'm not a lawyer either. But I am a developer. And I know that when I understand well what a bug is about, I can find the code and change it. I cannot test it in an environment I do not have, but I can test whether it still works in the existing one. For example, if I know which important configuration I lose during the upgrade, I can add an additional field to the form and write it to the configuration file. For example, if I know that I'm using the WRONG configuration and where the appropriate one is, I can use the good one instead of the bad one. You don't need my full database to correct those things. Instead of writing in the name of the developers, just send them a ticket with a description of the issue.

                                Developers do not have to fix my data - it is not broken until the upgrade. Developers just need to fix the upgrade scripts. And it is quite possible not to lose the connection charset, or to check the table encoding, or to re-encode data when rewriting it to another charset (if you feel you must change the original encoding). You just need to check what you are doing in your upgrade scripts. There is nothing different in my data - it was all put there by your code since vB 3.8. Just like your table structures. You have been handling it for over a decade, and now you are unable to put it into a new structure which you also made? Please stop telling me what your developers cannot do, because when you know exactly which configuration you lose during the upgrade and your developers are unable to add 1 field to the form asking for connection details, then change your developers.

                                Please push it instead of bouncing it.