Performance, Replication, Slave Server, etc

  • Agg
    replied
    72 hours with no response?

    I'll try again in the Server Configuration forum, I guess. Reposting now.

  • Agg
    replied
    The poll issue seems to be this bug, so I've added some more info there.

    No ideas from anyone about the rest of our issues? Should I re-post in the Server Configuration forum?

  • Agg
    replied
    Actually, on further testing, the poll issue I described above (two paragraphs above question 3) is NOT related to low-priority-updates: I have reproduced it with that option turned off.

    Basically, if a long search is running on the slave while someone votes, they are taken back to the voting screen instead of to the results. So they vote again, and instead of the "you have already voted, press back" screen they should get, they are taken to the voting page again. They can repeat this many times (my record is 9) while the search is running. Then suddenly it says they have already voted, and when you check the poll, ALL their votes were counted when only one should have been. It even lists their name multiple times in the poll results. So I think the vote check should be handled by the master, not the slave, and maybe there should even be some sanity-checking code on the poll results to make sure one person doesn't appear multiple times.

    This seems to be an issue with 3.6.8p2 in master+slave mode, as nobody reported the problem to us under 3.6.4.
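    To illustrate why the check belongs on the master, here is a toy model in Python. It is not vBulletin code; `PollDB`, `vote_checking_slave` and `vote_checking_master` are hypothetical names. It assumes, as suspected above, that the "already voted" check reads a slave that lags behind the master, so every repeat vote passes the check until replication catches up.

```python
from dataclasses import dataclass, field


@dataclass
class PollDB:
    """Toy vote store: the slave copy lags until sync() is called."""
    master_votes: list = field(default_factory=list)  # (poll_id, user_id, option)
    slave_votes: list = field(default_factory=list)

    def sync(self):
        # Stand-in for replication finally catching up.
        self.slave_votes = list(self.master_votes)


def has_voted(votes, poll_id, user_id):
    return any(p == poll_id and u == user_id for p, u, _ in votes)


def vote_checking_slave(db, poll_id, user_id, option):
    # Flawed: the "already voted" check reads the lagging slave.
    if has_voted(db.slave_votes, poll_id, user_id):
        return False
    db.master_votes.append((poll_id, user_id, option))
    return True


def vote_checking_master(db, poll_id, user_id, option):
    # Check and write on the same server, so the check sees every vote.
    if has_voted(db.master_votes, poll_id, user_id):
        return False
    db.master_votes.append((poll_id, user_id, option))
    return True


for vote in (vote_checking_slave, vote_checking_master):
    db = PollDB()
    accepted = [vote(db, 1, 42, "yes") for _ in range(5)]  # five clicks before sync
    db.sync()
    print(vote.__name__, accepted.count(True), len(db.master_votes))
# vote_checking_slave 5 5
# vote_checking_master 1 1
```

    In a real database the in-code check alone still leaves a small race (two requests passing the check at the same moment); for single-choice polls a UNIQUE index over the poll and user columns of the votes table would close that gap as well.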

  • Agg
    started a topic Performance, Replication, Slave Server, etc

    I apologise in advance for the long message and complex series of questions! I'm hoping Jelsoft staff can shed some light. I'm not sure whether it belongs here or in the Server Configuration forum, but here goes.

    I run a reasonably busy forum (up to 900 or so users online most nights) with nearly 6M posts. Over the years I've wrestled with performance issues, as anyone with a big forum does. Lately I've been using VBulletin's slave server setup with some success, but I've noticed some strange things about the way VBulletin uses the slave server.

    Here are our server details:

    Server 1 (main forum server, apache + mysql):
    4x Opteron 850 (2.4GHz, 1MB cache)
    8GB RAM
    4x U320 15k-RPM HDD in RAID10

    Server 2 (originally webserver, but using as a mysql slave):
    2x Xeon 3GHz
    4GB RAM
    2x U320 15k-RPM in RAID 1 (mirror)

    Both servers run SLES9, PHP 4.3.4 and MySQL 4.0.18. VBulletin was recently upgraded from 3.6.4 to 3.6.8p2 (the latest official build).

    Historically, Server 1 has coped fine with the load of our forums, on both the MySQL and the Apache side. The only real issue is table locking when people run intensive searches. A long-running SELECT makes the next UPDATE wait, because the table is locked. But while that UPDATE is pending, all other queries have to wait too, because by default an UPDATE has higher priority than a SELECT. So sometimes hundreds of queries queue up and the forums "hang" and are unresponsive while everyone waits for that one person's search to finish. Apart from this search issue, our server can easily handle 1000+ users.
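    The queueing described above can be reproduced with a toy model of a single table lock. This is a simplified Python sketch, not MySQL's actual scheduler, and `Query` and `simulate` are made-up names. Readers share the lock and a writer needs it exclusively. By default a waiting writer blocks newly arriving readers, which is how one long search stalls everything; the `low_priority_updates` flag models MySQL's low-priority-updates option, where writers wait for readers instead.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    kind: str      # "read" (SELECT) or "write" (UPDATE)
    arrival: int   # tick at which the query is issued
    duration: int  # ticks it holds the table lock once granted


def simulate(queries, low_priority_updates=False):
    """Return {query name: tick it was granted the table lock}.

    Simplified single-table lock: readers share it, a writer needs it alone.
    Default rule: a waiting writer blocks new readers and goes first when
    the lock frees up. With low_priority_updates, waiting readers always go
    first and a writer only runs when no reader holds or wants the lock.
    """
    pending = sorted(queries, key=lambda q: q.arrival)
    waiting, running, start = [], [], {}
    t = 0
    while pending or waiting or running:
        while pending and pending[0].arrival <= t:  # queries issued by now
            waiting.append(pending.pop(0))
        running = [(end, q) for end, q in running if end > t]  # release finished
        writer_running = any(q.kind == "write" for _, q in running)
        readers = [q for q in waiting if q.kind == "read"]
        writers = [q for q in waiting if q.kind == "write"]
        granted = []
        if writer_running:
            pass                           # exclusive lock held: nobody enters
        elif low_priority_updates:
            if readers:
                granted = readers          # reads never queue behind a write
            elif writers and not running:
                granted = [writers[0]]     # write only once reads are done
        elif writers:
            if not running:
                granted = [writers[0]]     # lock free: the writer goes first
            # otherwise readers hold the lock and new readers queue behind the writer
        else:
            granted = readers
        for q in granted:
            start[q.name] = t
            running.append((t + q.duration, q))
            waiting.remove(q)
        t += 1
    return start


queries = [
    Query("search", "read", 0, 10),  # the long-running search SELECT
    Query("update", "write", 1, 1),  # e.g. a new post updating the table
    Query("view", "read", 2, 1),     # ordinary page views arriving afterwards
    Query("other", "read", 3, 1),
]
print(simulate(queries))
# {'search': 0, 'update': 10, 'view': 11, 'other': 11}
print(simulate(queries, low_priority_updates=True))
# {'search': 0, 'view': 2, 'other': 3, 'update': 10}
```

    With a 10-tick search starting at t=0 and an UPDATE at t=1, the page views arriving at t=2 and t=3 don't start until t=11 under the default rule, but start immediately with low_priority_updates; the UPDATE itself starts at t=10 in both cases.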

    So, question 1: I know this table locking issue is a fairly common problem, so if there's an easy-ish solution, maybe we can stop this thread right here. I've heard of changing the table type to one that supports row locking instead of just table locking. Is this something that works with VBulletin? Any other ideas? I know setting UPDATE to a lower priority than SELECT can help, but that's undesirable for reasons I'll explain further down (I've been using low-priority-updates on the slave server and have observed some bad effects).

    Anyway, continuing on, I did some reading and discovered that VBulletin has master/slave capability and that search queries are executed on the slave. This seemed the perfect solution! While the slave server handles that long search SELECT, everyone else reads from and writes to the master server and experiences no delay. So I set up MySQL replication, configured the slave, and all was good.

    However, I noticed that VBB doesn't ONLY run the search queries on the slave. All kinds of queries run on the slave, so we can hit the same problem there as described above: while the search SELECT is running, an UPDATE replicated from the master may queue up, and then a lot of queries queue up behind it. This gives us a strange situation where some pages are delayed (those served from the slave) while others are not (those served from the master). So, for example (I forget the specifics now), we might be able to view a specific thread but not the forum's thread listing, or view threads but not the UserCP or calendar. That is better than not being able to view anything at all, but still not great. Since we upgraded from 3.6.4 to 3.6.8p2 (last night), I notice that even more tasks have been moved over to the slave. So even more things are delayed while the slave is processing a search query, and for us there's not really any benefit in splitting the load across two servers.

    Question 2: Does Jelsoft have any plans to let admins configure which parts of the forums are handled by the master and which by the slave? Even very general settings would do; obviously not per query. For example, I would personally choose to run everything from the master except searching. Ideally, ONLY search queries would run on my slave.
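    The kind of configurability asked for in question 2 could look something like the sketch below. It is a hypothetical Python illustration, not an existing vBulletin setting or API, and the category labels ("search", "thread_listing", "usercp") are invented. Writes always go to the master, because the slave only receives changes through replication; reads go to the slave only for categories the admin opts in.

```python
from enum import Enum


class Server(Enum):
    MASTER = "master"
    SLAVE = "slave"


# Hypothetical admin setting: read categories allowed to use the slave.
SLAVE_READ_CATEGORIES = frozenset({"search"})


def route(category: str, is_write: bool,
          slave_categories: frozenset = SLAVE_READ_CATEGORIES) -> Server:
    """Choose the server for one query.

    Writes always go to the master: the slave only receives changes through
    replication, so writing to it directly would make the two diverge.
    Reads use the slave only if their category has been opted in.
    """
    if is_write:
        return Server.MASTER
    return Server.SLAVE if category in slave_categories else Server.MASTER


print(route("search", is_write=False))          # Server.SLAVE
print(route("thread_listing", is_write=False))  # Server.MASTER
print(route("usercp", is_write=True))           # Server.MASTER
```

    With the default set, only search reads leave the master, which is exactly the split described above; widening the set would move more read traffic onto the slave.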

    After some more reading of the MySQL documentation, I discovered "low-priority-updates". This essentially removes the table-locking bottleneck by stopping SELECTs from queuing behind the UPDATE; instead, the UPDATE has to wait until there are no pending SELECTs on the table. So this seemed ideal for the slave, which is now handling the search queries. But it produces another strange effect. This may have changed in 3.6.8p2, but here's how it worked in 3.6.4 (and I believe it is the same or worse now that the slave does more of the work):

    - Someone submits a post and the system accepts their message.
    - The system returns them to the thread they posted in and they can see their message there (because this page is served to them by the master, which has completed the UPDATE of their post)
    - They return to the forum listing, and notice their reply has not been registered as the last reply of the thread (because THIS page is served from the slave, and on the slave the low priority UPDATE is still waiting for other SELECTS to finish). Cue confusion and complaints.

    Similarly:
    - Person A sends a PM to Person B
    - Person B gets notification of a new PM (presumably this is sent by the master)
    - Person B goes to PM mailbox but there is no new PM (because the PM mailbox is served by the slave, which has not yet processed the UPDATE)
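    Both scenarios are stale reads from a lagging slave. One common mitigation is "read-your-writes" routing: remember the replication position of each user's latest write and send that user's reads to the master until the slave has applied it. The Python sketch below models this with hypothetical names (`Cluster`, `Session`, `read`) and a manual `apply_one()` step standing in for replication; it is not how vBulletin works. It covers the first scenario (the poster reading back their own change) but not, by itself, the PM case, where the reader is a different user.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy master/slave pair; the slave only changes when apply_one() runs."""
    master: dict = field(default_factory=dict)
    slave: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (seq, key, value) written on the master
    slave_seq: int = 0                       # last log position applied on the slave

    def write(self, key, value):
        self.master[key] = value
        self.log.append((len(self.log) + 1, key, value))
        return len(self.log)                 # position of this write in the log

    def apply_one(self):
        """Replicate one pending change (a blocked UPDATE would stall here)."""
        if self.slave_seq < len(self.log):
            _, key, value = self.log[self.slave_seq]
            self.slave[key] = value
            self.slave_seq += 1


@dataclass
class Session:
    last_write_seq: int = 0  # log position of this user's most recent write


def read(cluster, session, key, read_your_writes=True):
    # Use the slave only once it has applied this user's latest write;
    # until then, fall back to the master.
    if read_your_writes and cluster.slave_seq < session.last_write_seq:
        return cluster.master.get(key)
    return cluster.slave.get(key)


c = Cluster()
poster = Session()
poster.last_write_seq = c.write("forum1:last_post", "reply by poster")
print(read(c, poster, "forum1:last_post", read_your_writes=False))  # None (stale slave)
print(read(c, poster, "forum1:last_post"))  # reply by poster (served by master)
c.apply_one()
print(read(c, poster, "forum1:last_post", read_your_writes=False))  # reply by poster
```

    The trade-off is that a user's reads hit the master for as long as the slave is behind, so a slave stalled by a long search shifts that user's load back onto the master rather than showing them stale pages.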

    Obviously this mostly happens during times of relatively high server load, but even during quiet times an intensive search can trigger it. Since we upgraded to 3.6.8p2, a more severe issue has appeared with polls: someone votes, is returned to the poll, and their vote is not shown. Assuming a browser glitch, they vote again, and the same thing happens. We had someone do this 5 times with nothing showing, until suddenly all the UPDATEs were processed and the poll showed them as having voted 5 times for the same option! All their votes were counted, even though you should only be able to vote once in that poll. So that's a big issue.

    So it seems low-priority-updates is not the solution either, although most of the time it is better on than off: when it's off, there's no real benefit to having a slave server, since search queries cause the same forum performance issues as when we only had one server.

    Question 3: Should we swap the roles of our servers? Server 1 is obviously the real database workhorse, while Server 2 was originally just a webserver, so it doesn't have a fast RAID setup etc. But it seems Jelsoft is pushing more and more work onto the slave, so I wonder whether the slave server should be more powerful than the master. Swapping them over would be a big job for us (setting up replication again, etc.), so I'm reluctant to just try it as a test; I'd prefer an official word on that.

    However, the real core issue is the table locking during searches, so maybe there's a solution to that and the rest will sort itself out. Apologies again for the long post!

    edit: On second thought, this is probably better suited to the Server Configuration forum. Feel free to move it if you like.
    Last edited by Agg; Fri 14th Dec '07, 9:00pm.
