Apostrophe issues

  • Apostrophe issues

    Using quick reply, apostrophes are fine:

    "testing quickreply apostrophe's. I'm We've They've"

    But when posting a message using "Go Advanced", they get mangled:

    "testing Advanced apostrophe's. I'm We've They've"

    Any ideas on how to fix this? Thanks!

  • #2
    Users are typing UTF-8 characters for apostrophes and quotes, often referred to as "curly quotes." The problem is that vBulletin 4.2.5 does not explicitly support UTF-8. It will attempt to convert these characters into HTML entities (such as &#8217; for the curly apostrophe). You can try changing your character set from ISO-8859-1 (whose characters are a subset of those UTF-8 can represent) to UTF-8 in your AdminCP under Languages & Phrases -> Language Manager. Unfortunately, this may not resolve the issue for you.
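    To illustrate one common way curly quotes end up "mangled", here is a minimal Python sketch of the general mechanism (an assumption for illustration, not vBulletin's own code): if the UTF-8 bytes of a curly apostrophe are decoded as a single-byte character set such as Windows-1252, each apostrophe turns into three odd characters. The second helper shows what converting the same text to numeric HTML entities looks like. Both function names are hypothetical.

```python
# Sketch only (assumption): shows the general mechanism, not vBulletin's code.


def mangle(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as wrong_charset."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    sample = "apostrophe\u2019s I\u2019m"  # U+2019 is the curly apostrophe
    print(mangle(sample))  # apostropheâ€™s Iâ€™m
    print(to_numeric_entities(sample))  # apostrophe&#8217;s I&#8217;m
```

    Plain ASCII apostrophes (') are 1 byte in every one of these character sets, which is why only the curly variants break.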

    With vBulletin 5.5.X, you can convert the database tables to support UTF-8 characters and use a UTF-8 character set, which resolves the issue.

    Wayne Luke
    The Rabid Badger - a vBulletin Cloud demonstration site.
    vBulletin 5 API - Full / Mobile
    Vote for your favorite feature requests and the bugs you want to see fixed.


    • #3
      Thanks Wayne. It's already set to UTF-8, so that probably won't solve it. So upgrading to 5.5.X is the way to go, then?


      • #4
        It doesn't seem so.

        And what about the opening quotation mark, like in this screenshot?

        [Screenshot: Bildschirmfoto 2019-12-16 um 16.12.07.png]

        Do I have any chance of fixing this in the AdminCP when my database uses this collation? Does the collation correspond to the character set?

        [Screenshot: Bildschirmfoto 2019-12-16 um 16.14.05.png]


        • #5
          The character set (e.g. utf8mb4) in the database controls how data is stored.
          The collation (e.g. utf8mb4_general_ci) in the database controls how MySQL compares and sorts text, for example in query results.
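          As a rough analogy in Python (an assumption for illustration; these are not MySQL's actual algorithms), the character set decides how many and which bytes a string takes, while the collation decides ordering. Here `str.casefold` stands in for a case-insensitive ("_ci") collation; the helper names are hypothetical.

```python
# Analogy only (assumption): Python stand-ins, not MySQL's actual algorithms.


def stored_size(text: str, charset: str) -> int:
    """Bytes needed to store text in the given character set."""
    return len(text.encode(charset))


def sort_case_insensitive(words: list[str]) -> list[str]:
    """Rough stand-in for a case-insensitive ("_ci") collation."""
    return sorted(words, key=str.casefold)


if __name__ == "__main__":
    # Character set: same text, different storage.
    print(stored_size("caf\u00e9", "latin-1"))  # 4
    print(stored_size("caf\u00e9", "utf-8"))  # 5 ("é" takes 2 bytes)
    # Collation: same stored data, different ordering.
    words = ["banana", "Cherry", "apple"]
    print(sorted(words))  # ['Cherry', 'apple', 'banana'] (raw code-point order)
    print(sort_case_insensitive(words))  # ['apple', 'banana', 'Cherry']
```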

          The value that controls how the browser displays data is the HTML Character Set. This is defined in the AdminCP under Languages & Phrases -> Language Manager -> Edit Settings for the language in question. On new installs of vBulletin 5, we set this to UTF-8, which means the browser will display UTF-8 characters. If you are upgrading from vBulletin 3, 4, or some older versions of vBulletin 5.X, then it is most likely set to ISO-8859-1, Windows-1252, or Windows-1256. These are legacy single-byte character sets that cover only a small part of what UTF-8 can represent; ISO-8859-1, for example, has no "curly" quote characters at all. We attempt to change such characters into HTML entities, but this doesn't work reliably because Unicode allows characters to be combined (for example, a letter followed by a combining accent) to display variants of a single character.
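          The "variants of a single character" problem can be seen with Python's standard `unicodedata` module. This minimal sketch (the helper `can_encode` is hypothetical) shows that "é" can be one code point or two, that the two forms only compare equal after normalization, and that ISO-8859-1 can hold neither the combining form nor the curly apostrophe.

```python
import unicodedata

# Two ways to write the same visible character "é":
PRECOMPOSED = "\u00e9"  # one code point: LATIN SMALL LETTER E WITH ACUTE
COMBINED = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def can_encode(text: str, charset: str) -> bool:
    """True if every character in text exists in the given character set."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(PRECOMPOSED == COMBINED)  # False: different code points
    print(len(PRECOMPOSED), len(COMBINED))  # 1 2
    print(unicodedata.normalize("NFC", COMBINED) == PRECOMPOSED)  # True
    print(can_encode(PRECOMPOSED, "iso-8859-1"))  # True
    print(can_encode(COMBINED, "iso-8859-1"))  # False: no combining accents
    print(can_encode("\u2019", "iso-8859-1"))  # False: no curly apostrophe
```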

          With single-byte storage (ASCII and its 8-bit extensions such as Latin1), every supported character is stored in 1 byte, so we can tell exactly what we're displaying. However, the HTML Character Set determines which character each byte value maps to, which is how different languages can be displayed. When vBulletin 3 was released over 15 years ago, the developers decided to work around UTF-8 by storing HTML entities. These start with an ampersand (&) and end with a semicolon (;). This worked well unless you used a language other than English. vBulletin 3 and 4 ended up storing every character outside the 256 available in a single-byte character set as an HTML entity.
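          The entity approach described above can be approximated with Python's built-in `xmlcharrefreplace` error handler. This is a hypothetical sketch of the idea, not vBulletin's implementation: characters that fit in Latin-1 are stored as single bytes, everything else becomes a numeric entity such as &#8217;, and `html.unescape` turns the entities back into characters for display.

```python
import html

# Hypothetical helpers (assumption): an approximation of "store everything
# outside a single-byte character set as HTML entities", not vBulletin's code.


def store_single_byte(text: str) -> bytes:
    """Encode as Latin-1; characters outside its 256 become &#NNNN; entities."""
    return text.encode("latin-1", errors="xmlcharrefreplace")


def display(stored: bytes) -> str:
    """Decode the stored bytes and resolve the entities, as a browser would."""
    return html.unescape(stored.decode("latin-1"))


if __name__ == "__main__":
    post = "caf\u00e9 I\u2019m \u03a9"  # é fits in Latin-1; ’ and Ω do not
    stored = store_single_byte(post)
    print(stored)  # b'caf\xe9 I&#8217;m &#937;'
    print(display(stored) == post)  # True
```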

          With basic UTF-8 storage using MySQL's utf8 character set (and its corresponding collations), each character takes 1-3 bytes to store. Characters that need 4 bytes in UTF-8, such as emoji, do not fit and get broken when stored. We have no way of knowing afterwards how many bytes a character was supposed to take, so when we request the data from MySQL, we get the broken characters back.
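          The 1-3 byte limit of MySQL's older utf8 character set can be checked per character. In this Python sketch (helper names are hypothetical), an emoji needs 4 bytes in UTF-8, so it does not fit. The last helper is only a rough model, assuming a server in non-strict SQL mode that keeps the text up to the first character it cannot store, which is one way posts end up cut off.

```python
# Sketch (assumption): models MySQL's 3-byte "utf8" limit in Python.


def utf8_lengths(text: str) -> list[tuple[str, int]]:
    """Each character paired with the number of bytes it needs in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def fits_mysql_utf8(text: str) -> bool:
    """True if no character needs more than 3 bytes (the utf8/utf8mb3 limit)."""
    return all(n <= 3 for _, n in utf8_lengths(text))


def truncate_at_first_4_byte_char(text: str) -> str:
    """Rough model of a non-strict server: keep only the text before the
    first character that needs 4 bytes."""
    for i, (_, n) in enumerate(utf8_lengths(text)):
        if n > 3:
            return text[:i]
    return text


if __name__ == "__main__":
    print([n for _, n in utf8_lengths("a\u00e9\u2019\U0001F600")])  # [1, 2, 3, 4]
    print(fits_mysql_utf8("Great post \U0001F600 thanks"))  # False
    print(repr(truncate_at_first_4_byte_char("Great post \U0001F600 thanks")))  # 'Great post '
```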

          So we recommend using utf8mb4 as the character set. With this character set, every character, including those that need 4 bytes, can be stored. However, we still don't know which characters in your database are which. Is that & a literal ampersand or the start of an HTML entity? Should we convert HTML entities into their UTF-8 characters or leave them alone? It is a very complicated problem. It gets even more complicated when you factor in right-to-left (RTL) languages, and older versions of MySQL complicate it further, because some of them never understood UTF-8 at all.
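          The "ampersand or HTML entity?" question can be made concrete with Python's `html.unescape`. In this sketch the stored strings are hypothetical examples: one is a curly apostrophe saved as an entity, the other is a post where someone deliberately typed "&#8217;" and the & was escaped. One decoding pass is correct for both, but a converter that does not know how many times the data was escaped could easily apply a second pass and change the post.

```python
import html

# Sketch (assumption): why a converter cannot blindly decode entities.


def decode_entities(text: str, passes: int) -> str:
    """Run html.unescape the given number of times."""
    for _ in range(passes):
        text = html.unescape(text)
    return text


if __name__ == "__main__":
    legacy = "I&#8217;m"  # hypothetical: curly apostrophe stored as an entity
    literal = "&amp;#8217;"  # hypothetical: user typed "&#8217;" on purpose
    print(decode_entities(legacy, 1))  # I’m  (one pass is right here)
    print(decode_entities(literal, 1))  # &#8217;  (one pass is right here too)
    print(decode_entities(literal, 2))  # ’  (a second pass changes the post)
```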

          Today, a new installation of vBulletin 5.5.X supports UTF-8 out of the box. We set the database character set to utf8mb4 and the collation to utf8mb4_general_ci, though utf8_general_ci_as would be better for European sites. We set the HTML Character Set to "UTF-8" (yes, it is case sensitive). If all of these are in place and you're using MySQL 5.6 or higher, then everything will work in a new database.
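          The recommended combination above can be written down as a checklist. This is a hypothetical Python checker (the function name and parameters are illustrative, not a vBulletin or MySQL API): it flags a database character set other than utf8mb4, a collation that does not belong to that character set, an HTML Character Set that is not exactly "UTF-8" (compared case-sensitively, as noted above), and MySQL older than 5.6.

```python
# Hypothetical checker (assumption): names are illustrative only.


def check_settings(db_charset: str, db_collation: str, html_charset: str,
                   mysql_version: tuple[int, int]) -> list[str]:
    """Return a list of problems; an empty list means the settings match."""
    problems = []
    if db_charset != "utf8mb4":
        problems.append(f"database character set is {db_charset!r}, not 'utf8mb4'")
    # A MySQL collation name starts with the character set it belongs to.
    if not db_collation.startswith(db_charset + "_"):
        problems.append(f"collation {db_collation!r} does not match {db_charset!r}")
    if html_charset != "UTF-8":  # exact, case-sensitive comparison
        problems.append(f"HTML character set is {html_charset!r}, not 'UTF-8'")
    if mysql_version < (5, 6):
        version = ".".join(map(str, mysql_version))
        problems.append(f"MySQL {version} is older than 5.6")
    return problems


if __name__ == "__main__":
    print(check_settings("utf8mb4", "utf8mb4_general_ci", "UTF-8", (5, 7)))  # []
    for problem in check_settings("latin1", "latin1_swedish_ci", "ISO-8859-1", (5, 5)):
        print(problem)
```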

          If your database was upgraded from an older version of MySQL, then it might not work.
          If you altered the connection character set between PHP and MySQL in the past, then it might not work.
          If your data is stored predominantly as HTML entities, then it might not work.

          To combat this, we have created a series of scripts to help people convert databases to UTF8MB4. Just changing the character set on the tables and columns does not convert the data already in them, and without converting the data, things may break. So we have to do several conversions in MySQL. First, we have to find out your previous connection character set (mostly Latin1) and your current database schema. Then the scripts convert everything into its binary equivalent, so we don't have to worry about how long each character is or rely on MySQL to give us the right character. Finally, we convert all the data to UTF8MB4.
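          Why the binary step matters can be simulated in Python. This is a sketch under stated assumptions, not the vBulletin conversion scripts: it assumes a latin1 column that actually holds UTF-8 bytes (sent over a latin1 connection), and uses Python's "latin-1" codec as a lossless stand-in for MySQL's latin1. Converting the character set directly keeps the wrong characters, while going through binary keeps the bytes and then reads them as UTF-8. The `alter_statements` helper and the table/column names are hypothetical, showing the general two-step ALTER pattern for a single column.

```python
# Simulation (assumption): Python stand-ins for the MySQL steps.

ORIGINAL = "I\u2019m"
# UTF-8 bytes arrived over a latin1 connection, so MySQL believes the column
# holds five latin1 characters: "I", "â", "\x80", "\x99", "m".
AS_LATIN1 = ORIGINAL.encode("utf-8").decode("latin-1")


def convert_directly(latin1_text: str) -> str:
    """Changing the charset in place keeps the (wrong) characters."""
    utf8mb4_bytes = latin1_text.encode("utf-8")  # now double-encoded
    return utf8mb4_bytes.decode("utf-8")


def convert_via_binary(latin1_text: str) -> str:
    """Column to binary keeps the raw bytes; binary to utf8mb4 reads them as UTF-8."""
    raw = latin1_text.encode("latin-1")  # step 1: bytes unchanged
    return raw.decode("utf-8")  # step 2: interpreted as UTF-8


def alter_statements(table: str, column: str) -> list[str]:
    """Hypothetical two-step ALTERs for one MEDIUMTEXT column. Real scripts
    must also restate NULL/DEFAULT attributes and handle indexes."""
    return [
        f"ALTER TABLE {table} MODIFY {column} MEDIUMBLOB;",
        f"ALTER TABLE {table} MODIFY {column} MEDIUMTEXT "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;",
    ]


if __name__ == "__main__":
    print(convert_directly(AS_LATIN1) == ORIGINAL)  # False: still mangled
    print(convert_via_binary(AS_LATIN1) == ORIGINAL)  # True
    for statement in alter_statements("my_table", "my_column"):
        print(statement)
```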


          1. If your site is in English or a similar language without heavy use of accents or diacritics, then setting the HTML Character Set to UTF-8 will work for 75% of vBulletin sites: UTF-8 characters will show, curly quotes work, posts aren't cut off unexpectedly, etc.

          2. If you're using a European language, you might be able to fix the issue by setting the HTML Character Set to UTF-8 and specifying a UTF-8 locale in the Locale field. Setting a locale requires setting all the overrides on the Language settings page. Of the remaining 25% of sites, this fixes about a third.

          3. All other sites will need to use the database conversion scripts included in the download package. However, we did find a few bugs when converting this site; these are being fixed.

          An added bonus of converting to UTF8MB4 is that your site can support the emoji that are popular on mobile devices.

          I can't really tell which of the three groups above you fall into without direct access to your server and database.
          Last edited by Wayne Luke; Mon 16th Dec '19, 10:30am.

          Wayne Luke

