Announcement

Collapse
No announcement yet.

Performance issue and extreme resource usage after upgrading to v5 when moving topic to busy forum

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Performance issue and extreme resource usage after upgrading to v5 when moving topic to busy forum

    Some backstory: I run the servers for one of your clients. The client started on VB3, got a decent level of success, outgrew their dedicated server at Fast Hosts, by fate ran into me and the rest is history I guess. We've since upgraded to V4 a few years back and delayed moving to v5 in the hope that VB would sort out all the kinks. We're now on v5 and are running into some issues.

    Performance wise; I find that v5 is using roughly 4x as much resources on the server. Disk IO usage is massively higher. I run the forum on a pair of HP servers, E3 CPU, RAID 10 SATA storage, plenty of RAM. We're running a hot spare setup. I ended up temporarily moving the forum to a RAID 10 SSD based server at Rackspace as the forum was swamping the IO on the server. The server is running some other small sites and email for a fairly large number of people. Prior to the VB upgrade server load was typically below 2.0 (on a quad core server). Are there any tips to impove the performance of the forum in general especially in regard of performing less writes to the filesystem or preventing MariaDB from creating temp tables on disk which is what appears to be swamping the servers IO?

    My client is experiencing difficulty moving threads from one forum to another. On the production setup they are seeing errors related to the PHP processes going away when they attempt this action. Upon inspection I found that this is caused by PHP attempting to exceed its configured memory limit - so I increased it to 256MB, then 512MB, then 1GB, then decided to move testing away from the production server.

    I copied the forum to a test bed server, NVME SSD, 32GB RAM, Quad Core 5GHZ CPU (yes overclocked). Setup mirrors the live setup - php5.6 mariadb10 nginx. I cleared the systemcache and optimised tables before starting.

    When moving a thread into the main forum; which has about 1.25 million posts, I am seeing extreme resource usage on the server. After increasing various resource limits I see the forum attempting to run a 23+MB query (just the query text) on the database server. The query starts SELECT * FROM cacheevent WHERE `event` IN ('nodeChg_2426203'. After increasing the maximum query size on the server I see mysqld and php-fpm processes using around 3GB of RAM each. After about 2 minutes of CPU limited resource usage the forum thread is finally moved.

    As you can hopefully appreciate the resource limitations required to get this to run far exceed what is workable for most shared environments. Indeed I will not be offering to increase resource usage limits to this degree on my shared cluster as this would allow this user to impact other clients. Hopefully we can optimise the performance of this operation somehow so that the client does not need to either go without the ability to move threads or upgrade to a very high end dedicated server that shall largely be sitting idle and still take several minutes to simply move a thread?

  • #2
    You should probably truncate the cache, cacheevent and searchlog tables. (Don't do this on any other table).
    Also upgrading php to 7.2 would be beneficial.
    MARK.B | vBULLETIN SUPPORT

    TalkNewsUK - My vBulletin 5.6.3 Demo
    AdminAmmo - My Cloud Demo

    Comment


    • #3
      Hi Mark. I ran the truncate cacheevent; truncate cache; and truncate searchlog; commands on the sql database. Unfortunately the performance is just the same afterwards. I did try moving two topics with the same results.

      Comment


      • #4
        You will never find moving topics is particularly fast. As long as it works then I wouldn't be unduly concerned if the rest is ok. Moving topics isn't something that is being done constantly, it should be fairly infrequent at best. Certainly, it's not designed to be a function that is in constant use.
        MARK.B | vBULLETIN SUPPORT

        TalkNewsUK - My vBulletin 5.6.3 Demo
        AdminAmmo - My Cloud Demo

        Comment


        • #5
          OK thanks for confirming this. I cant support that level of resource usage on my shared hosting sadly.

          Regarding the disk io usage will this have been reduced by truncating the cacheevent (had 221k lines) cache (had 18578 lines) tables? Searchlog was already 0 lines.

          Also regarding the database; this database has been upgraded from vb3 to vb4 to vb5. I notice we have a mix of innodb tables and myisam tables. Is this intentional? Would we not see better performance by converting larger tables to innodb?

          Comment


          • #6
            vBulletin creates tables using the default table engine defined in your database settings. If you have a mix of tables, it is because older versions of MySQL were set to default to MyISAM. You should convert any tables necessary to INNODB. All new vBulletin 5 tables will be created as INNODB if that engine is the default in MySQL.

            You should divide your forum into more channels. When when you move a topic, it has to update the closure table. This table contains a parent-child hierarchy for every item in the database. When you have a large chain to update, it will require more time. There is no way around this. Closing large topics and splitting large channels into sub-channels can help alleviate the problem. Making sure that you're running INNODB and that MySQL is properly optimized with buffer sizes to fit your content. You cannot perform MySQL optimizations on shared hosting though.

            Translations provided by Google.

            Wayne Luke
            The Rabid Badger - a vBulletin Cloud demonstration site.
            vBulletin 5 API

            Comment


            • #7
              Thanks for the pointers. The language table will not convert to innodb:

              MariaDB [watrb42]> ALTER TABLE `language` ENGINE=InnoDB;
              ERROR 1118 (42000): Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.

              Comment


              • #8
                We're seeing a considerable amount of temporary tables created on disk (around 30%). I've attempted to optimise this by increasing tmp_table_size and max_heap_table_size however this has not had the desired effect. This leads me to believe that perhaps VB is doing sorts which cannot be completed without tmp tables? The IO this creates is currently less of an issue, as the site is running on a cloud server which has SSD based storage, however the IO this created caused IO starvation on my usual RAID 10 SATA HDD based server. Any pointers re if its possible to optimise this - ie by reducing the amount of tmp tables created on disk would be appreciated.

                Comment


                • #9
                  You cannot use INNODB on the Language or Phrase tables unless you are using MySQL 5.6 or higher. Prior to that version, the INNODB engine does not support fulltext indexes. I don't know what version of MariaDB added fulltext on INNODB. However MariaDB 10+ is supposed to be a fork of MySQL 5.6.

                  Are you keeping a slow query log? If so what are the queries and explains for them? https://dev.mysql.com/doc/refman/8.0/en/explain.html

                  I also recommend upgrading to PHP 7.1.X and installing memcached to use as your regular cache (class[0]) in /core/includes/config.php.

                  You can also get more performance by offloading search to Sphinx as well.
                  Translations provided by Google.

                  Wayne Luke
                  The Rabid Badger - a vBulletin Cloud demonstration site.
                  vBulletin 5 API

                  Comment


                  • #10
                    Interesting re the fulltext indexes. I assume those little tables wont be hurting performance much so will leave them alone.

                    We are keeping a slow query log; however the queries creating the temp tables are not taking more than 1 second, so were not logged. I did try setting the slow query time to 0 seconds and thus logging everything for 10 seconds, but with an average of hundreds of qps this resulted in a huge amount of output which very hard to look through.

                    Will upgrading to php 7.1 result in lower memory usage? Will using memcache result in lower disk io? We're currently using zend opcache and apc for level 1 and 2 caches. Level 0 cache is disabled because it caused posts not to display for many minutes and the client complained about this.
                    Last edited by agoodm; Mon 30 Apr '18, 5:26pm.

                    Comment


                    • #11
                      During my time running the 0 second slow query log the slowest query took around 0,5 seconds. There were other similar queries taking similarly long. The query is:

                      SELECT DISTINCT node.starter AS nodeid
                      FROM node as node
                      LEFT JOIN node AS starter ON starter.nodeid = IF(node.starter = 0, node.nodeid, node.starter)
                      LEFT JOIN sentto AS sentto ON node.nodeid = sentto.nodeid AND sentto.userid = 3733 AND sentto.deleted = 0
                      WHERE starter.sticky <> '1' AND
                      node.contenttypeid <> 29 AND
                      node.parentid = 20 AND
                      node.showpublished > 0 AND
                      node.showapproved > 0 AND
                      (node.viewperms > 0 ) AND
                      node.inlist = 1 AND
                      ( node.contenttypeid <> 15 OR sentto.nodeid IS NOT NULL) AND
                      node.userid NOT IN ('1533','4531','4483','4460','4570','1373','2582','3087','3091','3142','3157','3148','2181 ','5409') AND
                      node.protected <> '1'

                      ORDER BY starter.lastcontent DESC,node.nodeid ASC

                      The explain for this query is:

                      +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+
                      | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
                      +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+
                      | 1 | SIMPLE | node | ref | node_parent,node_user,node_showapproved,node_ctypid_userid_dispo_idx,contenttypeid_parenti d,node_inlist,showpublished | node_parent | 4 | const | 140896 | Using where; Using temporary; Using filesort |
                      | 1 | SIMPLE | sentto | ref | PRIMARY,nodeid,user_read_deleted | PRIMARY | 8 | watrb42.node.nodeid,const | 1 | Using where; Distinct |
                      | 1 | SIMPLE | starter | eq_ref | PRIMARY,node_sticky,nodeid | PRIMARY | 4 | func | 1 | Using where; Distinct |
                      +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+

                      Clearly this uses a temporary table, but its not clear if its an ondisk temp table? If its an on disk temp table what size limit would the in memory limit need setting to in order to avoid it? Or is it not avoidable?

                      Comment


                      • #12
                        I am surprised your system is running at all with the level 0 cache disabled. Probably accounts for all the I/O since most data is pulled from the cache for display. So you're rebuilding all that information every single page load by every single user. MySQL usually isn't the optimization issue that most people think it is. It usually accounts for less than 10th of a second in page generation times.

                        Moving level 2 from memory to APC can also have performance detriments but may lower your memory requirements. How much RAM are you allotting to each PHP Script?
                        Translations provided by Google.

                        Wayne Luke
                        The Rabid Badger - a vBulletin Cloud demonstration site.
                        vBulletin 5 API

                        Comment


                        • #13
                          Originally posted by agoodm View Post
                          During my time running the 0 second slow query log the slowest query took around 0,5 seconds. There were other similar queries taking similarly long. The query is:

                          SELECT DISTINCT node.starter AS nodeid
                          FROM node as node
                          LEFT JOIN node AS starter ON starter.nodeid = IF(node.starter = 0, node.nodeid, node.starter)
                          LEFT JOIN sentto AS sentto ON node.nodeid = sentto.nodeid AND sentto.userid = 3733 AND sentto.deleted = 0
                          WHERE starter.sticky <> '1' AND
                          node.contenttypeid <> 29 AND
                          node.parentid = 20 AND
                          node.showpublished > 0 AND
                          node.showapproved > 0 AND
                          (node.viewperms > 0 ) AND
                          node.inlist = 1 AND
                          ( node.contenttypeid <> 15 OR sentto.nodeid IS NOT NULL) AND
                          node.userid NOT IN ('1533','4531','4483','4460','4570','1373','2582','3087','3091','3142','3157','3148','2181 ','5409') AND
                          node.protected <> '1'

                          ORDER BY starter.lastcontent DESC,node.nodeid ASC

                          The explain for this query is:

                          +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+
                          | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
                          +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+
                          | 1 | SIMPLE | node | ref | node_parent,node_user,node_showapproved,node_ctypid_userid_dispo_idx,contenttypeid_parenti d,node_inlist,showpublished | node_parent | 4 | const | 140896 | Using where; Using temporary; Using filesort |
                          | 1 | SIMPLE | sentto | ref | PRIMARY,nodeid,user_read_deleted | PRIMARY | 8 | watrb42.node.nodeid,const | 1 | Using where; Distinct |
                          | 1 | SIMPLE | starter | eq_ref | PRIMARY,node_sticky,nodeid | PRIMARY | 4 | func | 1 | Using where; Distinct |
                          +------+-------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------+-------------+---------+---------------------------+--------+----------------------------------------------+

                          Clearly this uses a temporary table, but its not clear if its an ondisk temp table? If its an on disk temp table what size limit would the in memory limit need setting to in order to avoid it? Or is it not avoidable?
                          The node table will be one of the largest tables in your database. Not only in size but by row size. Every single piece of content including channels/forums and every single post has a node record. So on your site, it is probably over 1.5 million rows depending on how many attachments, private messages and so forth you have on the site. Each of those is a node as well. I doubt you'll be able to sort 1.5 million rows in memory. This query probably needs to be optimized but I don't how to do that myself. Half a second is too long though. You can create a bug report here:
                          http://tracker.vbulletin.com/secure/...91&issuetype=1

                          What is contenttype.contenttypeid 29 in your database? Private Messages?
                          Translations provided by Google.

                          Wayne Luke
                          The Rabid Badger - a vBulletin Cloud demonstration site.
                          vBulletin 5 API

                          Comment


                          • #14
                            IO is broadly similar with level 0 cache on / off. The IO that was hurting the normal server this runs on appeared to be random write io. Actual throughput wasnt very much (1-2MB/sec maybe) however the RAID10 was completely saturated.

                            Main server was running a memory limit of 256MB per process when I took the production load away from it. I had to increase this to 4GB on the testbed in order to get thread moves and copies to work. Additionally main server had APC limit increased to 256MB. I also experimented with varying settings in mariadb to try and alleviate the disk IO issues but kept reverting to standard settings as things were not improving.

                            Comment


                            • #15

                              "What is contenttype.contenttypeid 29 in your database? Private Messages?"

                              I'm not sure how to answer this question. Contenttype with id 29 fromj the database has class 4368616e6e656c

                              Comment

                              widgetinstance 262 (Related Topics) skipped due to lack of content & hide_module_if_empty option.
                              Working...
                              X