Announcement

Collapse
No announcement yet.

Hourly Cleanup #2 - V3.8.4 taking a *long* time to execute. PMs unusable

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hourly Cleanup #2 - V3.8.4 taking a *long* time to execute. PMs unusable

    Our PM system is unusable for 3 to 9 minute every time the Hourly Cleanup #2 is running, which can take 3 to 9 minutes.

    Any ideas how to make things better? I'm stumped.



    Similar to this persons problem:
    http://www.vbulletin.com/forum/showt...y+cleanup+%232

    In the ./includes/cron/cleanup2.php file is this part that is taking 3.5 to 9 minutes to execute:
    Code:
    // Orphaned pmtext records are removed after one hour.
    // When we delete PMs we only delete the pm record, leaving
    // the pmtext record alone for this script to clean up
    $pmtexts = $vbulletin->db->query_read("
        SELECT pmtext.pmtextid
        FROM " . TABLE_PREFIX . "pmtext AS pmtext
        LEFT JOIN " . TABLE_PREFIX . "pm AS pm USING(pmtextid)
        WHERE pm.pmid IS NULL");
    Here is the process it produces when it is running, here is shows it has been running for 425 seconds:
    Code:
    mysql> show full processlist;
    +--------+-----------------+-----------+--------------------+---------+------+--------------+----------------------------------------------------------------------------------------------------------+
    | Id     | User            | Host      | db                 | Command | Time | State        | Info                                                                                                     |
    +--------+-----------------+-----------+--------------------+---------+------+--------------+----------------------------------------------------------------------------------------------------------+
    | 873254 | root            | localhost | NULL               | Query   |    0 | NULL         | show full processlist                                                                                    |
    | 875770 | db_username     | localhost |  database_name     | Query   |  425 | Sending data | SELECT pmtext.pmtextid
            FROM pmtext AS pmtext
            LEFT JOIN pm AS pm USING(pmtextid)
            WHERE pm.pmid IS NULL |
    +--------+-----------------+-----------+--------------------+---------+------+--------------+----------------------------------------------------------------------------------------------------------+
    2 rows in set (0.05 sec)
    If there is anything regarding PMs that needs to be attended to while that process is running they are marked as 'locked' and will not run. People cannot even view their PM inbox or open a PM while this is happening. The little thingy in the browsers tab spins and spins until this process has completed, which can be up to 9 minutes.

    I went into the phpmyadmin and see there are less than 20K PMs in the entire database (we are a small forum).

    Click image for larger version

Name:	number of pms.JPG
Views:	1
Size:	23.2 KB
ID:	3718046

    Then I query for pm.pmtextid == NULL for all 20k PMs and that takes less than a millisecond (as you can see)
    Click image for larger version

Name:	quiry of NULL.JPG
Views:	1
Size:	87.7 KB
ID:	3718045

    Then I run this command from within the phpmyadmin, and as you can see it takes 331 seconds to execute.
    Code:
    SELECT pmtext.pmtextid
    FROM pmtext AS pmtext
    LEFT JOIN pm AS pm USING(pmtextid)
    WHERE pm.pmid IS NULL
        ;
    Click image for larger version

Name:	LONG query.JPG
Views:	1
Size:	54.7 KB
ID:	3718044

    More info:
    • I am not that technical, it has taken me a while to get to this point.
    • While I understand the concept of what the script is trying to accomplish, I do not understand the syntax of the mysql commands.
    • I have disabled hooks in the config.php (define('DISABLE_HOOKS', true); ) and it makes no difference at all.
    • Removed the few pugins / modifications we did have.
    • I clicked repair database and there were no issues found
    • Contacted our hosting provider and they confirmed the issue, then said because it came back with "Empty set (9 min 31.63 sec)", that although the command took over 9 minutes for them, it did not return an error.


    While this is happening the MySQL is using 100% of the time it can (as far as I can tell). We are on a vps and of course limited to our share of cpu time. Here is part of a top command output while the issue is occurring. Normally our load averages are about 0.10, when this is occurring it goes up to almost 2:

    Code:
    top - 22:22:52 up 26 days, 22:31,  1 user,  load average: 1.28, 0.63, 0.34
    Tasks:  64 total,   1 running,  62 sleeping,   0 stopped,   1 zombie
    Cpu(s):  3.0%us,  5.3%sy,  0.0%ni, 91.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   1572864k total,   428912k used,  1143952k free,        0k buffers
    Swap:        0k total,        0k used,        0k free,        0k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    24052 mysql     15   0 78492  40m 3688 S 100.2  2.6   4633:02 mysqld
     7172 myforum  16   0     0    0    0 Z  5.0  0.0   0:00.15 php <defunct>
        1 root      15   0  2148  668  580 S  0.0  0.0   0:05.73 init
     3881 root      15   0 16732  14m 1376 S  0.0  0.9   0:15.00 lfd
     5776 root      18   0  4700 3000 1664 S  0.0  0.2   0:02.67 authProg
     5961 nobody    15   0  9136 2148  680 S  0.0  0.1   0:00.00 httpd
     6127 nobody    18   0  9136 2144  676 S  0.0  0.1   0:00.00 httpd
     6128 nobody    18   0  9136 2144  676 S  0.0  0.1   0:00.00 httpd
    11568 named     18   0  192m 7260 2092 S  0.0  0.5   2:26.27 named
    11976 root      15   0  4700 2996 1664 S  0.0  0.2   0:02.80 authProg
    13482 root      18   0  1660  428  360 S  0.0  0.0   0:00.00 courierlogger
    13535 root      18   0  2000  628  516 S  0.0  0.0   0:00.01 authdaemond
    14142 root      15   0  2000  372  256 S  0.0  0.0   0:00.75 authdaemond
    14143 root      15   0  2000  372  256 S  0.0  0.0   0:00.82 authdaemond
    16279 nobody    18   0  9272 2864 1280 S  0.0  0.2   0:00.00 httpd
    16370 root      18   0  7808 5164 1744 S  0.0  0.3   0:16.65 tailwatchd
    16380 root      18   0  6728 3676 1380 S  0.0  0.2   0:00.17 cphulkd
    17468 root      19   0 14080 7892 1388 S  0.0  0.5   0:00.00 cpdavd
    17474 root      15   0  5320 3628 1280 S  0.0  0.2   0:37.59 queueprocd
    17501 root      33  18  3800 1528  608 S  0.0  0.1   0:00.18 cpanellogd
    17553 nobody    18   0  9272 2816 1268 S  0.0  0.2   0:00.00 httpd
    17604 root      18   0 15828 8056 1288 S  0.0  0.5   0:15.86 cpsrvd-ssl
    18308 root      15   0 24764 1136  816 S  0.0  0.1   0:40.88 rsyslogd
    18325 nobody    15   0  9272 2840 1280 S  0.0  0.2   0:00.01 httpd
    19592 root      18   0  9136 4072 2628 S  0.0  0.3   1:27.25 httpd
    19621 root      18   0  5272 1452 1104 S  0.0  0.1   0:00.00 pure-ftpd
    19623 root      16   0  4976 1144  900 S  0.0  0.1   0:00.00 pure-authd
    19892 root      15   0 10036 2788 2212 S  0.0  0.2   0:00.05 sshd
    Any input would greatly be appreciated
    Last edited by Abomination; Sat 19th Jun '10, 3:52pm.

  • #2
    Ok, I did the same command:
    SELECT pmtext.pmtextid
    FROM pmtext AS pmtext
    LEFT JOIN pm AS pm USING(pmtextid)
    WHERE pm.pmid IS NULL
    ;
    On an old unused database with 5K PMs in it. It took 0.15 seconds.

    Anyone have a clue what could be causing this? What to look for? Obviously the table could be corrupted, but how to diagnose?





    edit: is there a way of limiting that process to a certain number of pm.pmtextid's?

    Maybe let it run on the first 10k PMs only for debug purposes? Then the 2nd 10k PMs, then try to figure out which PMs might be causing the issue?

    I don't know the syntax for those commands. I'll look them up tomorrow though.
    Last edited by Abomination; Sat 19th Jun '10, 10:29pm.

    Comment


    • #3
      Someone is trying to help me in the background. Anyway I was asked about the Indexes / structure of the table.

      This is the data structure (and I believe index) of the database that has the issue. Notice that 'forumuserid' says 'disabled' in the comment field. Anyone know what that means?



      Here is the data structure (and I believe index) of the database that works, it is an old database. Notice that 'forumuserid' does not say disabled in the comment field.

      Attached Files

      Comment


      • #4
        If an Index is disabled, then it won't be used for any queries and those queries will take long to run. According to the MySQL documentation, one can disable an index if they are making schema changes to a large table in order to increase the speed at which those changes are applied. However the index should be enabled again afterwords. You should re-enable the index...

        ALTER TABLE ... ENABLE KEYS

        Replace ... with the table name.
        Translations provided by Google.

        Wayne Luke
        The Rabid Badger - a vBulletin Cloud demonstration site.
        vBulletin 5 API - Full / Mobile
        Vote for your favorite feature requests and the bugs you want to see fixed.

        Comment


        • #5
          Originally posted by Wayne Luke View Post
          If an Index is disabled, then it won't be used for any queries and those queries will take long to run. According to the MySQL documentation, one can disable an index if they are making schema changes to a large table in order to increase the speed at which those changes are applied. However the index should be enabled again afterwords. You should re-enable the index...

          ALTER TABLE ... ENABLE KEYS

          Replace ... with the table name.
          Regarding this test query
          Code:
          SELECT pmtext.pmtextid
          FROM pmtext AS pmtext
          LEFT JOIN pm AS pm USING(pmtextid)
          WHERE pm.pmid IS NULL
              ;

          On the database without problems I did this:
          ALTER TABLE pmtext DISABLE KEYS
          Confirmed that the forumuserid was DISABLED
          then ran test query.
          Then:
          ALTER TABLE pmtext ENABLE KEYS
          Confirmed that the forumuserid was ENABLED
          then ran test query.

          And I did that multiple times. No difference from my perspective, but obviously the time differences might be difficult to measure using a stop watch.

          On the database with problems I did this:
          ALTER TABLE pmtext DISABLE KEYS
          Confirmed that the forumuserid was DISABLED
          then ran test query 2 times: 371 & 691 seconds.
          Then:
          ALTER TABLE pmtext ENABLE KEYS
          Confirmed that the forumuserid was ENABLED
          then ran test query. 735 seconds.
          During those 3 querys the load averages were about 4.


          Is it possible to run this command on a subset of the PMs?
          Can I make a back up and start testing that, maybe deleting 5K pms at a time and try to find out which ones might be causing the problem?

          Comment


          • #6
            I almost forgot to say, thank you very much for your input Wayne, much appreciated.


            More information:
            • This issue has been going on for a long time, at least for several weeks, possibly for months. The only symptom is slow PMs, and of course higher loads if those are looked at.
            • I created another database and restored a back up less that 24 hours old.
            • I ran the test query on that database, and it worked. It only took a few milliseconds.
            • I am re-running the query on the live database, and it is taking forever again. Probably 10 minutes before it gets done running. EDIT: 337 seconds to make the query.


            Can anyone guess what might be going on?



            EDIT: is it possible that turning off the forum would help for debug? Is there any way to just freeze it then do the query?
            Last edited by Abomination; Mon 21st Jun '10, 2:44pm.

            Comment


            • #7
              Please run this query directly in phpMyAdmin and post the results here:
              Code:
              EXPLAIN SELECT pmtext.pmtextid FROM pmtext AS pmtext
              LEFT JOIN pm AS pm USING(pmtextid) WHERE pm.pmid IS NULL;
              Translations provided by Google.

              Wayne Luke
              The Rabid Badger - a vBulletin Cloud demonstration site.
              vBulletin 5 API - Full / Mobile
              Vote for your favorite feature requests and the bugs you want to see fixed.

              Comment


              • #8
                I appreciate you helping Wayne

                Here is the output:

                Code:
                Generation Time: Jun 21, 2010 at 03:39 PM
                Generated by: phpMyAdmin 3.2.4 / MySQL 5.0.90-community-log
                SQL query: EXPLAIN SELECT pmtext.pmtextid FROM pmtext AS pmtext LEFT JOIN pm AS pm USING(pmtextid) WHERE pm.pmid IS NULL;
                Rows: 2
                id     select_type     table     type     possible_keys     key     key_len     ref     rows     Extra
                1     SIMPLE     pmtext     index     NULL     PRIMARY     4     NULL     19962     Using index
                1     SIMPLE     pm     ALL     NULL     NULL     NULL     NULL     29080     Using where; Not exists
                Attached Files

                Comment


                • #9
                  Ideally this query should return 0 rows. That is you shouldn't have any orphan PMs in the system. I recommend deleting these manually and then running a repair and optimize on the table. This should allow the cron script to run properly in the future.
                  Translations provided by Google.

                  Wayne Luke
                  The Rabid Badger - a vBulletin Cloud demonstration site.
                  vBulletin 5 API - Full / Mobile
                  Vote for your favorite feature requests and the bugs you want to see fixed.

                  Comment


                  • #10
                    Originally posted by Wayne Luke View Post
                    Ideally this query should return 0 rows. That is you shouldn't have any orphan PMs in the system. I recommend deleting these manually and then running a repair and optimize on the table. This should allow the cron script to run properly in the future.
                    Deleting the orphaned PMs sounds great to me, but I've no idea which ones are orphaned.

                    I've been studying how the different tables interact for many hours now. And I've been looking at the output of that EXPLAIN command. From that output, how can I determine which PMs are orphaned?

                    Comment


                    • #11
                      Originally posted by Wayne Luke View Post
                      Ideally this query should return 0 rows. That is you shouldn't have any orphan PMs in the system. I recommend deleting these manually and then running a repair and optimize on the table. This should allow the cron script to run properly in the future.
                      Also, the problem appears to be database related. I took the forums offline, made sure the database was not being used at all by editing the config.php file, and tried running this query:
                      Code:
                      SELECT pmtext.pmtextid 
                      FROM pmtext AS pmtext
                      LEFT JOIN pm AS pm USING(pmtextid)
                      WHERE pm.pmid IS NULL
                       ;
                      And it took about 9 minutes to execute. I *thought* it may have been a problem that only occurred when the database was being used online, but the database itself seems to be the problem.

                      As stated earlier, a recent backup which has this problem was restored into a brand new database using this command:
                      mysql -u dbusername -p databasename < backupname.sql

                      And it worked great

                      Is it possible by backing up and restoring into a clean database, the the orphaned PMs will be deleted, and possibly the tables would be repaired and optimized?

                      Comment


                      • #12
                        Originally posted by Wayne Luke View Post
                        Ideally this query should return 0 rows. That is you shouldn't have any orphan PMs in the system. I recommend deleting these manually and then running a repair and optimize on the table. This should allow the cron script to run properly in the future.
                        I still don't know which ones, if any, are orphaned based on your response.

                        I 'repaired and optimized' the table, that did not help.

                        Comment


                        • #13
                          Code:
                          SELECT pmtext.pmtextid
                          FROM pmtext AS pmtext
                          LEFT JOIN pm AS pm USING(pmtextid)
                          WHERE pm.pmid IS NULL;
                          This query will find the ids of your orphan PMs. You could write a query to delete these PMs, but I'm wary of posting it because it could be dangerous in the hands of someone who isn't totally aware of what they are doing.

                          What we have done on our site, which has 8-10 million pms (we prune unread PMs after 6 months), is to disable this periodic cron job (which is a nasty one due to the way the PM system is designed in 3.x), and write a script that we run once a week to find and delete the orphan PMs.

                          I would post the code we use to do so, but it's sort of proprietary in that we actually slave the query to find the orphan PMs to a MySQL slave server (using query_read_slave()), then batch delete (in 500-1000 PM id increments) the orphan PMs on our master server. We delete about 75,000 orphan PMs a week (on Sunday night) this way.

                          The query to find the orphan PMs is nasty still, but a) it's not running on a regular basis locking people out of doing PM related stuff and b) it's running on the slave server, so it's not as bad.

                          Thankfully, I believe that the PM system was overhauled to fix this design flaw in 4.x, but I could be wrong.
                          Last edited by KrON; Mon 5th Jul '10, 8:06pm.
                          Kyle Christensen
                          PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                          Comment


                          • #14
                            This query will find the ids of your orphan PMs.
                            You have *no* idea how grateful I am for your clearly worded reply!

                            I have been getting other input that was not nearly as easy to understand.

                            Here are the results, it looks to me that there are no orphaned PMs, true?
                            Code:
                            MySQL returned an empty result set (i.e. zero rows). ( Query took  372.3083 sec )
                            What does fix the problem is to create a new database and import a backup of our existing database into it. I've no idea why that fixes it.

                            For the record I know almost nothing about these types of things but have been learning just enough to be dangerous these last few weeks. We only have 20k PMs

                            Comment


                            • #15
                              That does indeed mean you have no orphan PMs. I don't think dumping and reimporting your database fixed it, I think that what probably happened is that the cronjob responsible for Orphan PM cleanup was able to complete successfully at some point.

                              The issue is though, that even though that query returned no orphan PMs, it still took 372 seconds to run, and during that time it's going to be locking your pm and pmtext tables, rendering the PM portion of your site useless for a period of time.

                              Sadly, short of doing what I had to do, I don't know of a good solution to prevent this from being a problem on a regular basis given your setup.
                              Kyle Christensen
                              PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                              Comment

                              widgetinstance 262 (Related Topics) skipped due to lack of content & hide_module_if_empty option.
                              Working...
                              X