My guide to converting to UTF-8

  • My guide to converting to UTF-8

    I manually upgraded to UTF-8 a few months ago. This was my experience and these are my notes. I hope they are helpful to you or to the vB team.

    I'm not sure why everybody else does so much work to change the database. Unlike others, I upgraded the database to UTF-8 using MySQL's native ability to convert one character set (latin1) to another (UTF-8). It is as simple as an ALTER TABLE statement. For instance:
    ALTER TABLE `forum`.holiday CHARACTER SET utf8 COLLATE utf8_unicode_ci, MODIFY COLUMN recuroption varchar(6) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
    This requires a good understanding of MySQL and Excel.
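
A minimal sketch of how statements like the one above could be generated for many columns at once. The Excel workbook itself is not shown in this thread, so this is not the author's method; the `Column` class and `build_alter` function are hypothetical names, and the column metadata is assumed to come from something like MySQL's `information_schema.COLUMNS`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    """One text column to convert (hypothetical helper type)."""
    name: str
    column_type: str   # e.g. "varchar(6)"
    nullable: bool
    extra: str = ""    # e.g. "DEFAULT ''"; MODIFY COLUMN must restate it or it is lost


def build_alter(schema: str, table: str, columns: List[Column],
                charset: str = "utf8",
                collation: str = "utf8_unicode_ci") -> str:
    """Build one ALTER TABLE statement that changes the table default
    character set and re-declares every listed text column in it."""
    clauses = [f"CHARACTER SET {charset} COLLATE {collation}"]
    for col in columns:
        parts = [
            f"MODIFY COLUMN `{col.name}` {col.column_type}",
            f"CHARACTER SET {charset} COLLATE {collation}",
            "NULL" if col.nullable else "NOT NULL",
            col.extra,
        ]
        # Skip empty parts so no stray spaces end up in the statement.
        clauses.append(" ".join(p for p in parts if p))
    return f"ALTER TABLE `{schema}`.`{table}` " + ", ".join(clauses) + ";"


if __name__ == "__main__":
    print(build_alter("forum", "holiday",
                      [Column("recuroption", "varchar(6)", nullable=False)]))
```

Review the generated statements by hand before running them, and only against a backed-up test copy.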

    The basic steps:
    0. Back up your database. Do this on a test server first. I'm not responsible for you messing up your forums.
    1. Database: Convert the database's tables and columns from latin1 to UTF-8*.
    2. HTML: Convert the connection and HTML character sets to UTF-8.
    3. Convert old threads to UTF-8.

    The zip file contains the Excel workbook and also the three PHP files that are used to convert old usernames, threads and posts to UTF-8.

    Follow the steps in each of the five sheets in the Excel workbook to complete the upgrade. Do it on a test server first.
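
The attached PHP files are not reproduced in this thread, so what follows is only a guess at one task such a conversion might involve. On a latin1 forum, characters outside latin1 are commonly stored as numeric HTML entities such as `&#1055;`; once the database is UTF-8 they can be turned into real characters. A hedged sketch, with `decode_numeric_entities` as a hypothetical name:

```python
import re

# Matches decimal (&#1055;) and hexadecimal (&#x41F;) numeric entities.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|(\d+));")


def decode_numeric_entities(text: str) -> str:
    """Replace numeric HTML entities for non-ASCII characters with the
    characters themselves. ASCII entities (e.g. &#60; for '<') and named
    entities (e.g. &amp;) are left alone so markup meaning is unchanged."""
    def replace(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        if code_point < 128 or code_point > 0x10FFFF or is_surrogate:
            return match.group(0)  # keep the entity as written
        return chr(code_point)
    return _NUMERIC_ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(decode_numeric_entities("&#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &amp; &#60;b&#62;"))
```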

    A few notes:
    - *You can actually stop and rest after converting the database. Every latin1 character can be represented in UTF-8, so having the database in UTF-8 and the HTML in latin1 (ISO-8859-1) is not a problem here: all characters in the database are still within latin1. I ran my forums that way for a week without any reported problems.
    - I was worried about performance problems after the conversion. The only problem I ran into, and it may not be related, was sorting forums with more than 500k threads. I removed the headers that allow people to sort threads by username, thread title, etc, and my forums run extremely well now.
    - I am running vB 3.8.4, but these directions should work for 4.x as well.
    - I use Sphinx for search. If you do not use Sphinx, you probably want to test your forum's search functionality. It will probably work, but I haven't tested it.
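
A small illustration of the first note, under the assumption that MySQL is left to convert between the column character set and the connection character set. The characters are the same in both encodings, but the bytes are not: ASCII text is byte-identical, while accented characters take one byte in latin1 and two in UTF-8. The function names here are made up for the example.

```python
def encode_both(text: str):
    """Return the latin1 and UTF-8 byte encodings of the same text."""
    return text.encode("latin-1"), text.encode("utf-8")


def misread_utf8_as_latin1(text: str) -> str:
    """Show what a latin1 page displays if it is handed raw UTF-8 bytes."""
    return text.encode("utf-8").decode("latin-1")


def latin1_bytes_are_valid_utf8(text: str) -> bool:
    """Check whether the latin1 bytes of `text` also parse as UTF-8."""
    try:
        text.encode("latin-1").decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for sample in ("forum", "café"):
        latin1_bytes, utf8_bytes = encode_both(sample)
        print(sample, latin1_bytes, utf8_bytes, latin1_bytes == utf8_bytes)
    print(misread_utf8_as_latin1("café"))
```

This is why the connection and HTML character sets need to agree with each other in step 2: if bytes in one encoding are interpreted as the other, non-ASCII characters are garbled or rejected.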

    I hope my experience is helpful for some of you. Please report any problems that you have.

    EDIT: This thread has been moved to a forum where conversation is not allowed. Please PM me with any questions or problems.
    Sorry if I have offended vB.
    Though this is now in a vB4 forum, it applies to 3.8 as well.
    Attached Files
    Last edited by mk132; Mon 1st Nov '10, 7:29am.

  • #2
    Hi, before I start testing this I would like to ask a question...

    My current forum has two default collations (according to phpMyAdmin): "utf8_unicode_ci" and "utf8_general_ci".
    Does it seem normal that I have two different collations?
    Do I have to convert just the "general" one?
    Which is the language of your forum?
    How can I work out which collation is best for my language (Italian)?
    Last edited by valerios; Fri 29th Apr '11, 7:28pm.
    VOTE ->Allow to rotate and tag the album pictures
    ~~~~~~~~~~~~~~~~~~~~~~~
    Official vbulletin Italian Group on vbulletin.com, Join us HERE
    ~~~~~~~~~~~~~~~~~~~~~~~
    Helping people to settle in Australia:
    www.australianboard.com
    The community of the people that dream Australia:
    www.australianboardcommunity.com



    • #3
      I was just looking for a way to do this.

      I think vB should recommend that new installations set up their dbs as UTF-8 from the beginning as this seems to be the way we are evolving.

      Thanks so much for putting this information together.



      • #4
        iirc, there are some serialization problems if you do it this way. This is not so simple, and that's why there is no official solution yet.
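
One way such a serialization problem could arise, sketched under assumptions: PHP's `serialize()` stores each string with its byte length (`s:4:"...";`). If serialized data containing non-ASCII characters sits in a text column and the bytes are transcoded from latin1 to UTF-8, the stored length no longer matches. Whether any vBulletin field is actually affected is not established in this thread; the helper names below are hypothetical.

```python
import re


def php_serialize_str(value: str, encoding: str) -> bytes:
    """Mimic PHP's serialize() for a single string: s:<byte length>:"<bytes>";"""
    raw = value.encode(encoding)
    return b's:%d:"%s";' % (len(raw), raw)


_SERIALIZED_STR = re.compile(rb's:(\d+):"(.*)";', re.DOTALL)


def length_prefix_matches(serialized: bytes) -> bool:
    """True if the declared byte length equals the actual payload length."""
    match = _SERIALIZED_STR.fullmatch(serialized)
    if match is None:
        return False
    return int(match.group(1)) == len(match.group(2))


def transcode_latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode stored bytes the way a character set conversion would."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    stored = php_serialize_str("café", "latin-1")
    converted = transcode_latin1_to_utf8(stored)
    print(stored, length_prefix_matches(stored))
    print(converted, length_prefix_matches(converted))
```

Pure-ASCII serialized data keeps a valid length after conversion, which may be why some forums never notice a problem.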



        • #5
          Originally posted by CvP
          iirc, there are some serialization problems if you do it this way. This is not so simple, and that's why there is no official solution yet.
          People have been asking for an official solution from vB to convert a MySQL DB to UTF-8 for 5 years. The response is always "it's very difficult." Time to crowdsource this thing.



          • #6
            Originally posted by feldon23
            People have been asking for an official solution from vB to convert a MySQL DB to UTF-8 for 5 years. The response is always "it's very difficult." Time to crowdsource this thing.
            Someone opened an issue about this, and a vB dev (maybe Ed) replied to it. cba to search.

            They said they'd have it in 4.0; we know how it went.
            Then they said they'd aim for it in 4.1, and when they began to work on it, they said it'd be in 4.2.
            While all these events are hard to accept, I'm willing to wait till 4.2.
            The conversion is not the only problem; there are many other parts of vB that need to be changed too.

            jfyi, if they don't get it done by 4.2, I'll really be pissed off



            • #7
              Originally posted by valerios
              Hi, before I start testing this I would like to ask a question...

              My current forum has two default collations (according to phpMyAdmin): "utf8_unicode_ci" and "utf8_general_ci".
              Does it seem normal that I have two different collations?
              Do I have to convert just the "general" one?
              Which is the language of your forum?
              How can I work out which collation is best for my language (Italian)?
              My forum is a translation forum and uses all languages. How to choose between the two? Look at both collations to see which works best for your language. I recommend utf8_unicode_ci due to its advanced character matching, but I think it might not matter for Italian.

              Originally posted by melbo
              I think vB should recommend that new installations set up their dbs as UTF-8 from the beginning as this seems to be the way we are evolving.
              I agree, and suggested it to them when 4.0 was in beta. They said "there are problems", though I don't see any.
              Originally posted by CvP
              iirc, there are some serialization problems if you do it this way. This is not so simple
              Not that my directions are simple, but I have been running my forums on UTF-8 without any problems for the last six months. Serialization problems? None that I have seen. Are there any bugs you can point at that show this?

              Personally, I think vBulletin doesn't have to do anything more than put all my instructions in PHP code and make it an upgrade option in the AdminCP. Simple as that.



              • #8
                Originally posted by CvP
                if they don't get it done by 4.2, I'll really be pissed off
                it'd kill my faith in vb, and I don't want that to happen
                People ask me why I don't like rats. Sorry, I'm not giving you the answer



                • #9
                  Guys, just follow my directions and stop worrying about it.

                  Or vBulletin could decide "let's do it" and convert my directions to PHP code now, release it as a beta UTF-8 converter on Friday, and have it go gold on New Year's Day 2011.



                  • #10
                    Moved to a more appropriate location. This isn't company feedback nor is it a support request.
                    Translations provided by Google.

                    Wayne Luke
                    The Rabid Badger - a vBulletin Cloud customization and demonstration site.
                    vBulletin 5 Documentation - Updated every Friday. Report issues here.
                    vBulletin 5 API - Full / Mobile
                    I am not currently available for vB Messenger Chats.
