Need Info - Custom Importer for XML Source

  • Need Info - Custom Importer for XML Source

    We are moving several projects from disparate forums into vBulletin 3.6.2. The source forums are all different, but one in particular is causing us a bit of grief. There is no access to the backend database, so we screen-scraped all of the forum posts into an XML document, using thread ids, userids, and parent thread ids to maintain the post / reply / reply-to-reply hierarchy.

    Now that we have the posts and users OUT of the old forum, we need to get them into vBulletin. I can manipulate the data to conform to a DTD, or write a program to populate a database with it (given a schema that ImpEx understands?), but I need a schema either way. Is there a default schema that I can conform my data to, then code a custom importer to read that database?

    Thanks in advance for any help!

  • #2
    Just about any database schema that holds the ids of the individual records, plus the ids that link them, would be enough.

    You could try copying any of the source systems' schemas, though literally something like this would be fine:

    User
    userid, username, email

    Forum
    forumid, title, description

    Thread
    threadid, forumid, title

    Post
    threadid, postid, userid, text

    That way they can all be linked.
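
    As a rough illustration of that layout, here is a minimal sketch that loads scraped XML into those four tables. The XML element and attribute names, the sample data, and the use of SQLite are all assumptions made for the example; the real scraped document and target database will differ, and nothing here is part of ImpEx itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real scraped document may look different.
SAMPLE_XML = """
<forumdump>
  <user userid="1" username="gretch" email="gretch@example.com"/>
  <forum forumid="10" title="General" description="General discussion"/>
  <thread threadid="100" forumid="10" title="Welcome"/>
  <post postid="1000" threadid="100" userid="1">First post</post>
  <post postid="1001" threadid="100" userid="1">A reply</post>
</forumdump>
"""

# The four tables from the post above, with the same column names.
SCHEMA = """
CREATE TABLE user   (userid INTEGER PRIMARY KEY, username TEXT, email TEXT);
CREATE TABLE forum  (forumid INTEGER PRIMARY KEY, title TEXT, description TEXT);
CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER, title TEXT);
CREATE TABLE post   (threadid INTEGER, postid INTEGER PRIMARY KEY, userid INTEGER, text TEXT);
"""

def load(xml_text, conn):
    """Create the tables and copy every XML element into its table."""
    root = ET.fromstring(xml_text.strip())
    conn.executescript(SCHEMA)
    for u in root.iter("user"):
        conn.execute("INSERT INTO user VALUES (?, ?, ?)",
                     (int(u.get("userid")), u.get("username"), u.get("email")))
    for f in root.iter("forum"):
        conn.execute("INSERT INTO forum VALUES (?, ?, ?)",
                     (int(f.get("forumid")), f.get("title"), f.get("description")))
    for t in root.iter("thread"):
        conn.execute("INSERT INTO thread VALUES (?, ?, ?)",
                     (int(t.get("threadid")), int(t.get("forumid")), t.get("title")))
    for p in root.iter("post"):
        conn.execute("INSERT INTO post VALUES (?, ?, ?, ?)",
                     (int(p.get("threadid")), int(p.get("postid")),
                      int(p.get("userid")), p.text or ""))
    conn.commit()

conn = sqlite3.connect(":memory:")
load(SAMPLE_XML, conn)

# The shared ids are what let the rows be joined back together.
linked = conn.execute(
    "SELECT p.postid, t.title, u.username FROM post p "
    "JOIN thread t ON t.threadid = p.threadid "
    "JOIN user u ON u.userid = p.userid ORDER BY p.postid"
).fetchall()
print(linked)
```

    The join at the end is only there to show that each post can be traced to its thread and author through the ids.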
    I wrote ImpEx.


    • #3
      Thanks for the start, Jerry! I like your idea of using the schema of an existing source system; then I don't have to figure out how to write an importer.

      Is there a source system you can recommend that has the simplest schema? We only have rudimentary data, like userid, threadid, post, postid, a few dates etc.

      Otherwise I have seen in other threads where you have written the importer if the user provides the database - is this something you could do for us once I transform the data into a database?

      thanks again in advance!
      Gretch


      • #4
        I'll write importers for systems that are commonly available, so that lots of people can use them (i.e. where it makes business sense); everything else is a custom contract.

        phpBB2 is a very common import, you could look at a schema from that.

        Users use the user tables, forums are forums, threads are topics and posts are posts.

        If you install ImpEx and then look at the database you'll see a table called vbfields, using this SQL you can see what the mandatory data needed is:

        Code:
        SELECT fieldname, vbmandatory FROM vbfields WHERE tablename='post' AND product='vbulletin' ORDER BY vbmandatory;
        So that returns:

        Y threadid
        Y importthreadid
        Y userid

        N importpostid
        N attach
        N visible
        N iconid
        N ipaddress
        N showsignature
        N allowsmilie
        N pagetext
        N dateline
        N title
        N username
        N parentid

        A postid

        So you need the top three; the middle ones are nice to have; and postid is marked A, meaning auto_increment, so you don't fill it in because MySQL will. To check another table, just change 'post' in the query to whichever type you want.
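
        As a small sketch of how those flags could be used before running an import, the snippet below checks one candidate post row against the Y / N / A list quoted above. The flags are copied from that output; the `check_row` helper and the sample row are hypothetical and not part of ImpEx.

```python
# Flags copied from the vbfields output quoted above for tablename='post'.
POST_FIELDS = {
    "threadid": "Y", "importthreadid": "Y", "userid": "Y",
    "importpostid": "N", "attach": "N", "visible": "N", "iconid": "N",
    "ipaddress": "N", "showsignature": "N", "allowsmilie": "N",
    "pagetext": "N", "dateline": "N", "title": "N", "username": "N",
    "parentid": "N",
    "postid": "A",
}

def check_row(row, fields=POST_FIELDS):
    """Return (missing mandatory fields, auto-increment fields that were set)."""
    missing = [name for name, flag in fields.items()
               if flag == "Y" and row.get(name) is None]
    auto_set = [name for name, flag in fields.items()
                if flag == "A" and name in row]
    return missing, auto_set

# Hypothetical row: lacks importthreadid and wrongly supplies postid.
missing, auto_set = check_row(
    {"threadid": 5, "userid": 2, "pagetext": "hi", "postid": 9}
)
print(missing, auto_set)
```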

        • #5
          so close.. but threadids are 0 ??

          Hi Jerry,

          I am soooo close with the solution we discussed here. I used a simpleboard schema, because it seemed to support the threaded data that we have. I transformed my forum data into a simpleboard schema. I populated the following tables:

          users
          sb_users
          sb_categories
          sb_messages
          sb_messages_text

          The import seemed to go smoothly, but when I examined the VBulletin forums after the maintenance steps, my threads were available, but not the posts for the threads. I then checked the VBulletin database, and the threadid = 0 in the posts table for my new entries. I looked at the PHP code, I looked over the ids and data in my source database, and all looks like it should work.... but alas, it does not. Would you know of any reason why the threadid in the posts table would end up zero, even though all other fields (importthreadid, parentid, importforumid, etc) are all populated OK?

          Any hints or advice you can extend would be greatly appreciated!

          kind regards,
          Gretchen


          • #6
            One other oddity...

            One other thing I noticed when examining the imported data: in the posts table, every post that is the first post in its thread should have a parentid of 0. In my newly imported entries, though, the parentid of the first post in each thread is 65668. Hmmmm... weird. This doesn't sound familiar to you, by chance?

            thanks for any help,
            Gretchen


            • #7
              The bit that adds the threadid is :

              PHP Code:
              $try->set_value('mandatory', 'threadid', $thread_ids_array["$post_details[threadid]"]);
              Here $post_details[threadid] is the original thread id, and $thread_ids_array is the importthreadid => threadid lookup array used to get the new thread ids for the posts.

              I would double check that is being passed correctly and the lookup is happening.
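
              To make that check concrete, here is a minimal sketch of the same kind of lookup, written in Python rather than PHP purely for illustration. The ids, the `resolve_thread_id` helper, and the idea that an unresolved lookup is what ends up stored as threadid 0 are assumptions, not something taken from the ImpEx source.

```python
# Hypothetical lookup: importthreadid (source id) -> new vBulletin threadid.
thread_ids = {"101": 1, "102": 2}

# Hypothetical source posts; note the mixed key types.
source_posts = [
    {"postid": 1, "threadid": 101},    # int, while the lookup keys are strings
    {"postid": 2, "threadid": "102"},
    {"postid": 3, "threadid": "999"},  # no thread with this id was imported
]

def resolve_thread_id(post, lookup):
    """Look up the new thread id, normalising the key to a string first."""
    return lookup.get(str(post["threadid"]))

# Any post listed here would be imported without a valid thread.
unresolved = [p["postid"] for p in source_posts
              if resolve_thread_id(p, thread_ids) is None]
print(unresolved)
```

              Listing the unresolved posts before importing shows whether the source thread ids actually match the ids recorded for the imported threads.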

              • #8
                Got it!

                Just in case anyone else tries this nonsense that I am attempting, I wanted to let you know I found what my issue was.

                I happened to choose to port data into a database schema that has logical relationships between the messageId and the threadId. It seems that the ids in the sb_messages table have a special pattern for first post in a Simpleboard thread. The following pattern denotes a thread - the message id and the thread id will be the same, and the parent id will be zero. The first post in that thread will have this pattern - unique message id, the parent id = the thread id (which is the message id of the parent). It still hurts my head to follow it, but anyway, a bit of massaging of the ids, and my threads and posts imported fine.
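
                For anyone attempting the same thing, here is a minimal sketch of the kind of id check that would have caught this before the import. It assumes each message row has `id`, `thread`, and `parent` values and encodes my reading of the pattern described above; the key names, the sample rows, and the `check_messages` helper are illustrative assumptions, not taken from the Simpleboard or ImpEx code.

```python
def check_messages(messages):
    """Return (message id, problem) pairs for rows that break the id pattern."""
    by_id = {m["id"]: m for m in messages}
    problems = []
    for m in messages:
        if m["parent"] == 0:
            # Thread starter: message id and thread id must be the same.
            if m["id"] != m["thread"]:
                problems.append((m["id"], "starter row needs id == thread"))
            continue
        # Reply: its thread id must point at a starter row...
        root = by_id.get(m["thread"])
        if root is None or root["parent"] != 0:
            problems.append((m["id"], "thread id is not a starter row"))
        # ...and its parent must be an existing message in the same thread.
        parent = by_id.get(m["parent"])
        if parent is None:
            problems.append((m["id"], "parent does not exist"))
        elif parent["thread"] != m["thread"]:
            problems.append((m["id"], "parent is in a different thread"))
    return problems

# Hypothetical rows: 10-12 follow the pattern, 13 and 14 do not.
sample = [
    {"id": 10, "thread": 10, "parent": 0},
    {"id": 11, "thread": 10, "parent": 10},
    {"id": 12, "thread": 10, "parent": 11},
    {"id": 13, "thread": 0, "parent": 0},
    {"id": 14, "thread": 10, "parent": 999},
]
problems = check_messages(sample)
print(problems)
```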

                Whew! I'm almost there!

                By the way, Jerry, ImpEx and vBulletin are AWESOME! I have a long BI and warehousing background, so migrating and ETL'ing data is easy for me. The part I'm loving about this migration is that vBulletin is doing everything I could have hoped it would do, and it's going to make my community majorly happy that we could preserve so much of this forum info!

                Thanks for a great set of tools and a suuweeet product!
                kind regards,
                G


                • #9
                  Excellent news, well done.