Truncated posts and intermittent garbage characters (possibly UTF8 problem)

  • Truncated posts and intermittent garbage characters (possibly UTF8 problem)

    Background: I just upgraded from 4.1.7 to 5.3.3

    Problem: I have lots of users that copy/paste content from other websites. Lots of those posts are being truncated when they hit post, or portions of their post are garbage. They weren't having these issues on the old version of vb.

    I am guessing this is a UTF8 problem? So I checked the my.cnf in my bitnami lampstack setup and it's showing:

    PHP Code:
    [mysqld]
    ...
    character-set-server=UTF8
    collation-server=utf8_general_ci

    [client]
    ...
    default-character-set=UTF8
    So the database seems to be okay. I am not sure what to do on the vBulletin side to see if things are set right. On AdminCP --> Languages & Phrases --> Language Manager I have the following:

    [Attached image: Capture.PNG]

    I am not sure what to test or change, or anywhere else to look. Do I need to change the HTML Character Set to UTF8 or something? Is there some setting I need to look at in Apache or is it out of the picture?

    Thanks!
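
One way to see why pasted text can break: typographic characters such as the curly apostrophe have different byte representations depending on the page's character set. The sketch below (`hexbytes` is a hypothetical helper name, not part of vBulletin or MySQL) prints the bytes of the same word as Windows-1252, which browsers typically use for pages labeled ISO-8859-1, and as UTF-8. The lone 0x92 byte is not valid UTF-8, and a MySQL utf8 connection in non-strict mode can cut a value off at such a byte, which would match the truncation described above. Treat that as a hypothesis to verify, not a diagnosis.

```shell
# hexbytes: hypothetical helper that prints stdin as a run of hex byte values.
hexbytes() {
  od -An -tx1 | tr -d ' \n'
  echo
}

# The word "It's" with a curly apostrophe, written with octal escapes so the
# bytes are exact regardless of the terminal's own encoding.
printf 'It\222s' | hexbytes            # Windows-1252: the apostrophe is the single byte 92
printf 'It\342\200\231s' | hexbytes    # UTF-8: the apostrophe is the three bytes e2 80 99
```

Running a suspect post's raw bytes through the same helper shows which of the two forms actually reached the server.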



  • #2
    More details. When I look at the old vb4.1.7 database, before the import I see this:

    PHP Code:
    mysql> show variables like '%char%';
    +--------------------------+-------------------------------------+
    | Variable_name            | Value                               |
    +--------------------------+-------------------------------------+
    | character_set_client     | latin1                              |
    | character_set_connection | latin1                              |
    | character_set_database   | utf8                                |
    | character_set_filesystem | binary                              |
    | character_set_results    | latin1                              |
    | character_set_server     | utf8                                |
    | character_set_system     | utf8                                |
    | character_sets_dir       | /usr/share/percona-server/charsets/ |
    +--------------------------+-------------------------------------+
    But the new database shows this:

    PHP Code:
    +--------------------------+-----------------------------------------------+
    | Variable_name            | Value                                         |
    +--------------------------+-----------------------------------------------+
    | character_set_client     | utf8                                          |
    | character_set_connection | utf8                                          |
    | character_set_database   | utf8                                          |
    | character_set_filesystem | binary                                        |
    | character_set_results    | utf8                                          |
    | character_set_server     | utf8                                          |
    | character_set_system     | utf8                                          |
    | character_sets_dir       | /opt/lampstack-7.0.23-0/mysql/share/charsets/ |
    +--------------------------+-----------------------------------------------+

    I have no clue how to begin to fix this problem. Is it as simple as changing the HTML Character Set to UTF-8? I'm hesitant to try that if it's going to cause craziness.
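
To make the difference between the two listings easier to spot, the sketch below compares two saved variable dumps and prints only the entries that changed. It assumes each file holds one `name value` pair per line, for example the output of `mysql -N -B -e "SHOW VARIABLES LIKE 'character_set%'"`; `diff_vars` and the file names are hypothetical.

```shell
# diff_vars OLD NEW: print variables whose value differs between two
# whitespace-separated "name value" listings (hypothetical helper).
diff_vars() {
  awk 'NR == FNR { old[$1] = $2; next }
       ($1 in old) && old[$1] != $2 { print $1 ": " old[$1] " -> " $2 }' "$1" "$2"
}

# Example usage (file names are placeholders):
#   diff_vars old_vb4_vars.txt new_vb5_vars.txt
```

For the two listings above, the client, connection, and results character sets are the entries that moved from latin1 to utf8, so the old site was talking to the database in latin1 even though the database default was utf8.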



    • #3
      It is recommended to use UTF-8 as your HTML Character Set instead of ISO-8859-1. Every character in the latter can also be represented in UTF-8, so it shouldn't cause craziness. Also, make sure you have the mbstring library installed in PHP.

      Wayne Luke


      • #4
        Originally posted by Wayne Luke
        It is recommended to use UTF-8 as your HTML Character Set instead of ISO-8859-1. Every character in the latter can also be represented in UTF-8, so it shouldn't cause craziness. Also, make sure you have the mbstring library installed in PHP.

        Okay, you got me brave enough to change the HTML Character Set in vB to UTF-8. I did verify that mbstring is loaded in PHP. However, there is no change. I used wget --save-headers https:.... and it's still sending this:

        HTTP/1.1 200 OK
        Date: Wed, 18 Oct 2017 18:54:45 GMT
        Server: Apache
        X-Frame-Options: SAMEORIGIN
        X-Powered-By: PHP/7.0.23
        ...
        Vary: Accept-Encoding
        Transfer-Encoding: chunked
        Content-Type: text/html; charset=ISO-8859-1

        So it's still serving up ISO-8859-1.
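
The header check above can be repeated after every configuration change with a small filter that pulls the charset out of a header dump. This is only a sketch: `charset_of` is a hypothetical helper, and the URL in the usage comment is a placeholder for the forum's real address.

```shell
# charset_of: read HTTP response headers on stdin and print the charset
# parameter of the Content-Type header, if there is one (hypothetical helper).
# Assumes the parameter is spelled in lowercase as "charset=".
charset_of() {
  tr -d '\r' |
    grep -i '^content-type:' |
    sed -n 's/.*charset=\([^; ]*\).*/\1/p' |
    head -n 1
}

# Example usage (placeholder URL):
#   curl -s -D - -o /dev/null https://forum.example/ | charset_of
```

Empty output means the Content-Type header carried no charset parameter at all.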

        php.ini has:

        default_charset = "UTF-8"


        Where else should I look? Do I need to bounce my apache service after making the change in the vb AdminCP?
        Last edited by alfreema; Wed 18th Oct '17, 11:06am.



        • #5
          I just won a months-long battle with this kind of issue.

          My guess is that you should:
          - change your language character set to "UTF-8"
          - uncomment the $config['Mysqli']['charset'] = 'utf8'; line in your core/includes/config.php file

          You might also have to change the locale to en.UTF-8.
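
A quick way to confirm that the config line is really active, rather than still commented out, is to search for it. The sketch below assumes the line looks like the one quoted in this post and that comments start with `//` or `#`; `active_mysqli_charset` is a hypothetical helper name.

```shell
# active_mysqli_charset FILE: print Mysqli charset assignments in FILE
# that are not commented out with // or # (hypothetical helper).
active_mysqli_charset() {
  grep -F "\$config['Mysqli']['charset']" "$1" |
    grep -Ev '^[[:space:]]*(//|#)'
}

# Example usage:
#   active_mysqli_charset core/includes/config.php
# No output means the setting is missing or still commented out.
```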



          • #6
            Also, you should check with phpMyAdmin whether you have any tables with a latin1 collation.

            In that case you might have to run an "ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;" statement for each of them.

            After that, you should use the tools.php file to rebuild the user and group caches and the settings cache.
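
The conversion step can be sketched as a small generator that turns a list of table names into ALTER TABLE statements for review before anything is executed. `alter_statements` and the database name `forum_db` are hypothetical; the statement itself is the one quoted in this post. Take a backup first: if a latin1 column actually holds UTF-8 bytes, a plain conversion like this can double-encode the text.

```shell
# alter_statements: read table names on stdin (one per line) and print a
# CONVERT TO statement for each. Nothing is executed (hypothetical helper).
alter_statements() {
  while IFS= read -r table; do
    [ -n "$table" ] || continue   # skip blank lines
    printf 'ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n' "$table"
  done
}

# Example usage (forum_db is a placeholder database name):
#   mysql -N -B -e "SELECT table_name FROM information_schema.tables
#                   WHERE table_schema = 'forum_db'
#                     AND table_collation LIKE 'latin1%'" | alter_statements
```

Printing the statements instead of running them keeps the destructive step a separate, deliberate action.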



            • #7
              Originally posted by plongeur.com
              I just won a months-long battle with this kind of issue.

              My guess is that you should:
              - change your language character set to "UTF-8"
              - uncomment the $config['Mysqli']['charset'] = 'utf8'; line in your core/includes/config.php file

              You might also have to change the locale to en.UTF-8.

              Ok, I have done:

              1) change your language character set to "UTF-8", and
              2) uncomment the $config['Mysqli']['charset'] = 'utf8'; line in your core/includes/config.php file

              ... but I have not tried ...

              3) You might have to change the locale to en.UTF-8

              I will try that. Also, I am going to create my own little test.php file and look at the headers being sent back when I hit that -- that will take vBulletin completely out of the loop and help me boil it down to just my Apache/PHP configuration.



              • #8
                Originally posted by plongeur.com
                Also, you should check with phpMyAdmin whether you have any tables with a latin1 collation.

                In that case you might have to run an "ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;" statement for each of them.

                After that, you should use the tools.php file to rebuild the user and group caches and the settings cache.

                Okay, good idea. I'll report back.



                • #9
                  So this is interesting.

                  In order to take both vB5 and the database out of the loop, I created a "hello.php":

                  PHP Code:
                  <html>
                   <head>
                    <title>PHP Test</title>
                   </head>
                   <body>
                   <?php echo '<p>Hello World</p>'?>
                   </body>
                  </html>
                  Then I ran:

                  PHP Code:
                  curl -i http://localhost/hello.php | head -20 
                  That gave me the following result:

                  PHP Code:
                  HTTP/1.1 200 OK
                  Date: Thu, 19 Oct 2017 10:45:44 GMT
                  Server: Apache
                  X-Frame-Options: SAMEORIGIN
                  X-Powered-By: PHP/7.0.23
                  Vary: Accept-Encoding
                  Connection: keep-alive
                  Content-Length: 96
                  Content-Type: text/html; charset=UTF-8

                  <html>
                   <head>
                    <title>PHP Test</title>
                   </head>
                   <body>
                   <p>Hello World</p>
                   </body>
                  </html>
                  So PHP/Apache IS working and configured properly to spit out UTF-8 by default. However when I run:

                  PHP Code:
                  curl -i http://localhost/hello.php | head -20 
                  Which brings mysql and vBulletin5 into the loop, something is causing the default to be overridden.

                  PHP Code:
                  HTTP/1.1 200 OK
                  Date: Thu, 19 Oct 2017 10:46:44 GMT
                  Server: Apache
                  X-Frame-Options: SAMEORIGIN
                  X-Powered-By: PHP/7.0.23
                  Set-Cookie: sessionhash=xxx; path=/; secure; HttpOnly
                  Set-Cookie: lastvisit=1508410004; path=/; secure; HttpOnly
                  Set-Cookie: lastactivity=1508410004; path=/; secure; HttpOnly
                  X-UA-Compatible: IE=edge,chrome=1
                  Set-Cookie: PHPSESSID=xxx; path=/
                  Expires: Thu, 19 Nov 1981 08:52:00 GMT
                  Cache-Control: no-store, no-cache, must-revalidate
                  Pragma: no-cache
                  Vary: Accept-Encoding
                  Connection: keep-alive
                  Transfer-Encoding: chunked
                  Content-Type: text/html; charset=ISO-8859-1

                  <!DOCTYPE html>
                  <html id="htmlTag" xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xml:lang="en" lang="en" dir="ltr"
                  I tried changing the language to en.UTF-8 and that did not help. I will start diving into the database I guess, but I am not that familiar with mysql so I don't really know what to look for or how to troubleshoot.



                  • #10
                    Ok so I went into the database and ran a few things. From what I can see, things look good, but ... I dunno.

                    PHP Code:
                    mysql> show full columns from text;
                    +----------------+-----------------------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
                    | Field          | Type                        | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
                    +----------------+-----------------------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
                    | nodeid         | int(10) unsigned            | NULL            | NO   | PRI | NULL    |       | select,insert,update,references |         |
                    | previewtext    | varchar(2048)               | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | previewimage   | varchar(256)                | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | previewvideo   | text                        | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | imageheight    | smallint(6)                 | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
                    | imagewidth     | smallint(6)                 | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
                    | rawtext        | mediumtext                  | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | pagetextimages | text                        | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | moderated      | smallint(6)                 | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
                    | pagetext       | mediumtext                  | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
                    | htmlstate      | enum('off','on','on_nl2br') | utf8_general_ci | NO   |     | off     |       | select,insert,update,references |         |
                    | allowsmilie    | smallint(6)                 | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
                    | showsignature  | smallint(6)                 | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
                    | attach         | smallint(5) unsigned        | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
                    | infraction     | smallint(5) unsigned        | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
                    | reportnodeid   | int(10) unsigned            | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
                    +----------------+-----------------------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
                    16 rows in set (0.00 sec)


                    mysql> show variables like 'collation%';
                    +----------------------+-----------------+
                    | Variable_name        | Value           |
                    +----------------------+-----------------+
                    | collation_connection | utf8_general_ci |
                    | collation_database   | utf8_general_ci |
                    | collation_server     | utf8_general_ci |
                    +----------------------+-----------------+
                    3 rows in set (0.00 sec)

                    mysql> show variables like 'character_set%';
                    +--------------------------+-----------------------------------------------+
                    | Variable_name            | Value                                         |
                    +--------------------------+-----------------------------------------------+
                    | character_set_client     | utf8                                          |
                    | character_set_connection | utf8                                          |
                    | character_set_database   | utf8                                          |
                    | character_set_filesystem | binary                                        |
                    | character_set_results    | utf8                                          |
                    | character_set_server     | utf8                                          |
                    | character_set_system     | utf8                                          |
                    | character_sets_dir       | /opt/lampstack-7.0.23-0/mysql/share/charsets/ |
                    +--------------------------+-----------------------------------------------+
                    8 rows in set (0.00 sec)
                    Seems to me like the database is okay?



                    • #11
                      How about the collation of the other tables?

                      The encoding of the page is set by the language charset, so if you set it to UTF-8 you should see the page in UTF-8 in your browser (in Firefox: the information button in the address bar, then More Information, then the General tab).
                      If that is OK, the $config['Mysqli']['charset'] = 'utf8'; line tells vBulletin that the text is in utf8. If the text is in UTF-8 and you do not set this, you should see squares with a ? instead of special characters.
                      From what I understand, the locale mostly affects system text such as dates.



                      • #12
                        I just did another test. I created this test.php file:

                        PHP Code:
                        <?php
                        $servername = "localhost";
                        $username = "xxx";
                        $password = "xxxx";
                        $dbname = "xxx";

                        // Create connection
                        $conn = mysqli_connect($servername, $username, $password, $dbname);
                        // Check connection
                        if (!$conn) {
                            die("Connection failed: " . mysqli_connect_error());
                        }

                        $sql = "SELECT * FROM userlist";
                        $result = mysqli_query($conn, $sql);

                        if (mysqli_num_rows($result) > 0) {
                            // output data of each row
                            while ($row = mysqli_fetch_assoc($result)) {
                                echo "id: " . $row["userid"] . " - Name: " . $row["type"] . " " . $row["friend"] . "<br>";
                            }
                        } else {
                            echo "0 results";
                        }

                        mysqli_close($conn);
                        ?>
                        ... and when I ran it I got the following output ...

                        PHP Code:
                        HTTP/1.1 200 OK
                        Date: Thu, 19 Oct 2017 13:48:50 GMT
                        Server: Apache
                        X-Frame-Options: SAMEORIGIN
                        X-Powered-By: PHP/7.0.23
                        Vary: Accept-Encoding
                        Connection: keep-alive
                        Transfer-Encoding: chunked
                        Content-Type: text/html; charset=UTF-8

                        id: 384 - Name: ignore no<br>id: 58 - Name: follow yes<br>
                        So the plot thickens. Hitting MySQL from PHP and leaving vBulletin out of the loop, I get the proper result. The only time I get ISO-8859-1 is when I request vBulletin 5 pages. So I think I can now definitively say that something specific to vBulletin 5 is causing the default UTF-8 charset to be overridden with ISO-8859-1.

                        Any hints on where to look for that?



                        • #13
                          As we already said, the two things that determine the charset are:
                          - the HTML character set setting in the language options
                          - the $config['Mysqli']['charset'] setting in the config.php file

                          I took the liberty of visiting your site; the text encoding is still latin1, as the screenshot shows.
                          This means the HTML character set setting is not set to UTF-8 (or is not being taken into account).

                          [Attached image: 19-10-2017 16-42-01.jpg]


                          • #14
                            Originally posted by plongeur.com
                            As we already said, the two things that determine the charset are:
                            - the HTML character set setting in the language options
                            - the $config['Mysqli']['charset'] setting in the config.php file

                            I took the liberty of visiting your site; the text encoding is still latin1, as the screenshot shows.
                            This means the HTML character set setting is not set to UTF-8 (or is not being taken into account).

                            Right, that's exactly the symptom. So far, nothing I have changed makes any difference. Maybe I am misunderstanding something you want me to change? Here are my current language settings and config.php settings:

                            [Attached image: language_settings.PNG]
                            Note: I also tried en.UTF-8 in the HTML Character Set, but nothing changed.


                            [Attached image: config_php.PNG]

                            Did I screw something up or misunderstand something you want me to change?



                            • #15
                              Well, if I am on the right site, it keeps sending "<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">"
                              So I guess there is an issue somewhere.
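
The meta tag mentioned here can be checked from the command line as well. The sketch below extracts the charset declared in the first matching meta tag of an HTML document read from stdin; `meta_charset_of` is a hypothetical helper, and the URL in the usage comment is a placeholder for the forum's real address.

```shell
# meta_charset_of: read HTML on stdin and print the charset declared in the
# first <meta ...charset=...> tag found (hypothetical helper). Handles both
# the http-equiv form and the short <meta charset="..."> form.
meta_charset_of() {
  grep -Eio "<meta[^>]*charset=[\"']?[^\"'> ;]*" |
    sed 's/.*=//' |
    tr -d "\"'" |
    head -n 1
}

# Example usage (placeholder URL):
#   curl -s https://forum.example/ | meta_charset_of
```

Comparing this value with the charset in the Content-Type response header shows whether the page body and the HTTP header agree.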
