phpBB2 -> vbCMS Problems (Waaay too long to import)

  • phpBB2 -> vbCMS Problems (Waaay too long to import)

    I ran an initial import and things went well. I have about 90,000 users and about 400,000 posts. On my first run I set the user import to 100 and the post import to 1000. Importing users took 188 min and importing posts took 16 min.

    Second time around (another test) the numbers have gone through the roof. I haven't even gotten past the user import yet. Based on some samples, it's taking about 1 minute to insert 120 users into vB. 90,000 / 120 = 750 min (12.5 hours). How can it go from a little over 3 hours on the first pass to 12.5 hours on the second?
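The arithmetic above can be checked with a quick sketch. It assumes the sampled rate holds for the whole run, and it also works out the rate implied by the first pass (90,000 users in 188 min) for comparison. The function name `estimate_minutes` is hypothetical.

```php
<?php
// Back-of-the-envelope timing for an Impex stage.
// Assumption: the insert rate measured from a sample holds for the whole run.
function estimate_minutes(int $totalRecords, float $recordsPerMinute): float
{
    return $totalRecords / $recordsPerMinute;
}

// Rate implied by the first pass: 90,000 users in 188 minutes.
$firstPassRate = 90000 / 188;

// Rate observed on the second pass: about 120 users per minute.
$minutes = estimate_minutes(90000, 120);

printf("First pass rate: %.1f users/min\n", $firstPassRate);             // 478.7
printf("Second pass: %.0f min (%.1f hours)\n", $minutes, $minutes / 60); // 750 min (12.5 hours)
```

In other words, the second pass is running at roughly a quarter of the first pass's rate.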

    I've started and stopped the import several times, deleted the session, and restarted the import but I can't seem to get past about 120 users per minute.

    Unfortunately, I didn't check this during the first import, but I've also noticed that the following vb_user fields are botched on the imported users:

    joindate
    birthday
    birthday_search

    Basically, joindate and birthday are set to just a year, and birthday_search is blank. And from a quick glance, the year is wrong.

    I also don't know what's going on with the salt: for imported users it ends up only 3 or 4 characters long, versus around 30 characters for native vB users.

    Any thoughts?
    Community manager at Thailand Friends

  • #2
    BTW, I found the issue with joindate. In 004.php there's a line that says:

    PHP Code:
    if(strpos($user['user_regdate'],',')) 
    But dates are formatted as such:

    2003-08-18 12:15:57

    So the correct code would be:

    PHP Code:
    if (strpos($user['user_regdate'], '-') !== false) 
    I made that change, ran a test import, and the date is now stored in the database as a Unix timestamp.

    The same goes for the user's last activity: wrapping strtotime() around lastvisit turns it into a Unix timestamp, as it's supposed to be.
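A minimal sketch of the kind of conversion described above, not the actual 004.php code. It assumes source values are either numeric Unix timestamps or date strings such as `2003-08-18 12:15:57`, and that strings without a zone should be read as UTC (the source data may use a different zone). The helper `to_unix_timestamp` and the `user_lastvisit` key are hypothetical. Instead of a `strpos()` check on a single separator, it lets PHP's `DateTime` parser handle the string.

```php
<?php
// Sketch: normalize source date values to the Unix timestamps vBulletin stores.
// Assumptions: values are numeric timestamps or parseable date strings;
// zone-less strings are interpreted in $zone (UTC by default).
function to_unix_timestamp($value, string $zone = 'UTC'): int
{
    if (is_int($value) || preg_match('/^\d+$/', (string) $value) === 1) {
        return (int) $value; // already a Unix timestamp
    }
    // DateTime understands "Y-m-d H:i:s" and many other formats,
    // and throws an exception on input it cannot parse.
    $date = new DateTime((string) $value, new DateTimeZone($zone));
    return $date->getTimestamp();
}

// Hypothetical source row, for illustration only.
$user = ['user_regdate' => '2003-08-18 12:15:57', 'user_lastvisit' => '1061208957'];

$joindate     = to_unix_timestamp($user['user_regdate']);   // 1061208957
$lastactivity = to_unix_timestamp($user['user_lastvisit']); // 1061208957
```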

    • #3
      Bill, could you please report this in the vB4 Bug Tracker here:

      http://tracker.vbulletin.com/secure/Dashboard.jspa
      Steve Machol, former vBulletin Customer Support Manager (and NOT retired!)
      Change CKEditor Colors to Match Style (for 4.1.4 and above)

      Steve Machol Photography


      Mankind is the only creature smart enough to know its own history, and dumb enough to ignore it.


      • #4
        Hmmmm . . . how about helping me with my problem first? LOL

        • #5
          Fix what? You said you found a bug in Impex, and I asked you to report it as a bug so the developers can look at the problem and fix it.

          • #6
            The bug is simply annoying . . . importing only 120 users per minute is the problem. I happened to discover the bugs while searching for the reason the import was so slow. Speeding things up is what I'd like help with.

            Alternatively, I could write my own user importer, but how can I skip that stage of the import? (I don't want to write importers for everything else.)

            • #7
              For the Devs to be aware of a bug and investigate it, someone who has the issue needs to report it in the Bug Tracker.

              You should not skip the User import because almost everything that comes after that depends on having accurate user data.

              • #8
                And found another bug while trying to figure out the original problem:

                Impex does not clean up the usertextfield table after a test run. I currently have 241,000 rows in that table but only 90,000 users.

                It looks like when a new user is created, Impex also creates a corresponding row in usertextfield, but it doesn't give that row an import id. So when Impex "cleans up" after an import, it doesn't delete those rows. And on subsequent imports it finds no conflict, because the new userids keep incrementing in the user table, so it just keeps adding new rows.
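To make the mechanism concrete, here is a toy model written for illustration; it is not Impex code. It assumes that cleanup only removes `user` rows carrying an import id, that userids come from an auto-increment counter that never goes back, and that the usertextfield existence check is keyed on the newly assigned userid. The function name and sample ids are made up.

```php
<?php
// Toy model (not Impex code) of repeated test imports.
// user:          userid => importuserid (0 would mean a native vB user)
// usertextfield: userid => row, with no import id column at all
function simulate_reimports(array $sourceUserImportIds, int $runs): array
{
    $user = [];
    $usertextfield = [];
    $nextUserId = 1; // auto-increment never goes backwards

    for ($run = 0; $run < $runs; $run++) {
        // "Clean up": only rows tagged with an import id are removed.
        $user = array_filter($user, fn($importId) => $importId === 0);

        foreach ($sourceUserImportIds as $importId) {
            $userid = $nextUserId++;
            $user[$userid] = $importId;
            // The existence check uses the new userid, so it never matches
            // a leftover row and a fresh row is always inserted.
            if (!isset($usertextfield[$userid])) {
                $usertextfield[$userid] = ['userid' => $userid];
            }
        }
    }
    return ['user' => count($user), 'usertextfield' => count($usertextfield)];
}

print_r(simulate_reimports([101, 102, 103], 3)); // user => 3, usertextfield => 9
```

After three runs the user table still holds one row per source user, while usertextfield has grown by a full set of rows per run, which matches the 241,000 vs 90,000 pattern.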

                • #9
                  Steve,

                  Considering I've spent the last 2 days trying to get an import going, allow me to apologize in advance if my wording sounds terse: I need to get this working, and I simply don't have the option of a 12-hour user import. The original issue I asked for help on (the performance problem) has absolutely nothing whatsoever to do with the bugs I've found. Even after fixing those bugs, the performance of the user import has not changed one single bit. I'm still averaging about 120 users per minute.

                  I have already reported the two bugs I found, but I only mentioned them in the original post as anomalies that might point to the cause of the performance problem. When I investigated further on my own, I found the reason for these anomalies and fixed them, but my problem has not been resolved.

                  So the original issue remains: I'm only averaging 120 users per minute on the user import.

                  Can I get some help on that issue?

                  Bill

                  • #10
                    Looks like I solved the problem myself. I'm posting what worked for me here, since the Impex documentation is wrong on this point.

                    First off, the Impex documentation states that when you do multiple imports (for testing purposes, before your live one):

                    When you re-run a module, for what ever reason it will clean up any previous data imported of that type.
                    This is not true. It does wipe out some of the more obvious data, such as users and posts, but it does not delete the data in the following tables:

                    pm
                    userlist
                    usernote
                    usertextfield

                    In each of these tables I had hundreds of thousands of records on a board that is fresh (okay, I had five test accounts but it was relatively brand spanking new). In the tables that included a userid the userid values ran well into the hundreds of thousands which made it quite evident that they had been created during one of my multiple import attempts.

                    What makes this even worse is that pm and usernote have importids associated with their records, so Impex should be able to match those records to the import and delete them accordingly. Meanwhile, userlist and usertextfield have no importids at all, even though imported data is being inserted into them. So Impex isn't deleting records that do have importids, and it isn't assigning importids to inserts that should get them.

                    I haven't dug deep enough into the code yet, but I'm pretty sure this will be the case for every board imported into vB with Impex, because the problem is in the inserts into and cleanup of the vB database and has nothing to do with the source database.

                    The biggest problem with this is what I mentioned in post #8 above. When you rerun Impex, it checks whether a record already exists for that user in various tables. If the record exists, it updates it; if it doesn't, it inserts a new row. But if a table has no importid, like usertextfield, and Impex looks at the userid column instead, it won't find the user, because on this new import the user has already been assigned a new userid. So every time you rerun it, it just keeps adding rows to the table. Soon you're looking at hundreds of thousands or millions of records (if you have a large enough board).

                    When I deleted all of the data belonging to imported users (after having Impex clean up so I could try another run) I went from 120 users per minute to 1500 users per minute with no other changes made between runs.
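One way to do the cleanup described above, sketched under assumptions: the vB database is MySQL, each of the four tables links to `user` through a `userid` column, and Impex has already removed the imported rows from `user`, so any row pointing at a missing userid is a leftover from an earlier run. The helper name `orphan_cleanup_sql` and the optional table prefix are hypothetical, and it only builds the SQL without running it. Tables that reference users through other columns are not covered. Back up the database and review the statements before executing them.

```php
<?php
// Sketch: build MySQL multi-table DELETE statements that remove rows whose
// owning user no longer exists (leftovers from earlier test imports).
// Assumption: every listed table has a `userid` column referencing `user`.
function orphan_cleanup_sql(array $tables, string $prefix = ''): array
{
    $userTable = $prefix . 'user';
    $sql = [];
    foreach ($tables as $table) {
        $t = $prefix . $table;
        $sql[] = "DELETE t FROM `$t` AS t LEFT JOIN `$userTable` AS u"
               . " ON u.userid = t.userid WHERE u.userid IS NULL";
    }
    return $sql;
}

// Print the statements for review; they could then be run one by one,
// for example with PDO::exec() against the vB database.
foreach (orphan_cleanup_sql(['pm', 'userlist', 'usernote', 'usertextfield']) as $stmt) {
    echo $stmt, ";\n";
}
```

For pm and usernote, which do carry import ids, deleting by import id would be the more precise option if you know the column names in your schema.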

                    • #11
                      Bump, since I haven't received a response in 12 hours.

                      • #12
                        Sorry, I don't have a fix for this. However if I understand you correctly, this still involves what you believe are bugs in the code. Is that not correct?

                        • #13
                          I guess you could say that I believe there are bugs in the code. I would phrase it more like this: Impex seems to be a complete mess.

                          1. There are errors in the actual data manipulation. I mean, come on, the strpos separator is a ","? What time format is separated with a comma? That should be very easy to catch and I've seen many threads complaining about this behavior going back quite a long way.

                          2. Another issue I've seen many questions about in these forums is last activity. This is another rather obvious error: the string isn't being converted to a Unix timestamp. It's one function call to fix.

                          Those were just rather obvious bugs that I ran across while trying to debug the performance issue I was encountering. I'm not even sure I'd call the performance issue a bug; it's a glaring and fundamental flaw. In two of the four tables that don't get cleaned up during a reimport, there are no importids. In my opinion that goes way beyond being a bug: it's a failure to perform one of the basic objectives of the software. The documentation repeatedly states that all imported data has an importid so that it can be cleaned up later. Yet in two tables no importids are assigned, and in the two others, which do have importids, there is no cleanup whatsoever.

                          And that flaw, which is serious by itself, then leads to performance problems that anyone following the advice in your manual will run into on a large enough board or after multiple test runs.

                          So what I'm saying is that instead of ignoring what I've posted for 12 hours or saying "sorry, I don't have a fix for this", you should be flagging this as a major issue and getting a fix out as soon as possible, since it goes to the core of what Impex is supposed to do. It is leaving hundreds of thousands (in some cases millions) of extra rows in the database, which significantly hurts not just the performance of Impex but also the performance of the board once it goes live.

                          • #14
                            Wow, I feel like copy and pasting this entire thread to the bug tracker. Good find and hats off for doing your homework.


                            • #15
                              Wow, that's a big find indeed.
                              Bookmarked!
