
Forum Harvesting ImpEx?


  • Forum Harvesting ImpEx?

    I am just curious whether any work has been done on a standalone PHP script to harvest third-party forum posts and turn them into vBulletin posts.

    So far, vBulletin has one example of this: an EZBoard harvester, which I was very thankful for many years ago. The code behind it is not available to the public.

    It seems that rather than a reduction (and eventual elimination) in free forums that hold your data prisoner, there has instead been an influx of more of these shady operators who install a modified copy of phpBB that can handle thousands of boards, collect the ad revenue, and meanwhile refuse any request for the post data -- regardless of how much money is offered.

    For a couple of years now, I have been considering trying to write a script on my own that would trawl a forum, grab the threads and posts, and repost them to an empty vBulletin forum. After having to literally copy and paste hundreds of posts from a ForumCo forum, I made my mind up to try to write something!

    Essentially it is a matter of identifying the start and end of each post in a thread, and further identifying HTML snippets which can be converted to bbCode. I have written some test code that correctly identifies each post (and its author) in a thread plus the name and ID of the thread. It would not be too many steps further to use some of the example code at vBulletin.org to make the script post into an empty vBulletin forum. Then the User-Post Associator script can be used to marry all those posts back to users as they sign up on the new forum.
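
A minimal sketch of the "find the start and end of each post" step, written in Python rather than PHP purely for illustration. The markers and class names (`POST_START`, `POST_END`, `class="author"`, `class="body"`) describe an imaginary board and are assumptions; a real board needs its own markers, and nested `<div>` tags inside a post body would defeat the simple body pattern used here.

```python
import re
from html import unescape

# Hypothetical markers for an imaginary board; every real board differs.
POST_START = '<div class="post"'
POST_END = '<!-- /post -->'
POST_ID_RE = re.compile(r'id="post(\d+)"')
AUTHOR_RE = re.compile(r'<span class="author">(.*?)</span>', re.S)
BODY_RE = re.compile(r'<div class="body">(.*?)</div>', re.S)


def extract_posts(page_html):
    """Return one dict per post found between POST_START and POST_END."""
    posts = []
    pos = 0
    while True:
        start = page_html.find(POST_START, pos)
        if start == -1:
            break
        end = page_html.find(POST_END, start)
        if end == -1:
            break  # unterminated post: stop rather than guess
        chunk = page_html[start:end]
        pos = end + len(POST_END)

        post_id = POST_ID_RE.search(chunk)
        author = AUTHOR_RE.search(chunk)
        body = BODY_RE.search(chunk)
        if not (post_id and author and body):
            continue  # skip chunks that do not look like a post
        posts.append({
            "id": int(post_id.group(1)),
            "author": unescape(author.group(1)).strip(),
            "body": unescape(body.group(1)).strip(),
        })
    return posts


SAMPLE = (
    '<div class="post" id="post101"><span class="author">alice</span>'
    '<div class="body">Hello &amp; welcome</div></div><!-- /post -->\n'
    '<div class="post" id="post102"><span class="author">bob</span>'
    '<div class="body">Thanks!</div></div><!-- /post -->'
)

for post in extract_posts(SAMPLE):
    print(post["id"], post["author"], post["body"])
```

Plain string markers are used for the post boundaries so that only the per-field patterns need regular expressions.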

    The original [How-To] Create Threads thread (mentioned here) seems to have vanished without a trace. But fortunately I found How To Create a Post and How to Create a Thread.

    Is there enough interest in a script like this? I know some people have asked for Yuku and ProBoards importers, and this type of script would be perfect for that. It is really just a question of how much time I should invest.

    Also, ideally, the config file would use RegExes but I am not so good with those...
    Last edited by feldon23; Wed 2 Apr '08, 6:15pm.

  • #2
    Well, ImpEx will take care of the forum/thread/post creation; that's the easy part. The hard part is building the spider to go through the sites and grab the content.

    That is hampered by styles, inconsistent markup, variations from the norm, etc.

    As for actually doing it, I started investigating it for one, though it was a bigger project than I first thought. It's also not something that can be released, as it could be used maliciously against boards that the script runner doesn't own.
    I wrote ImpEx.



    • #3
      See, the hard part for me is that OOP makes my eyes cross. The recursive nature of ImpEx is necessary, I understand, but I just can't grok it. So writing a script which captures all those posts and then, as a guest, posts them to a vBulletin forum one at a time may be a kludge, but it's what I know.

      And I realize there is always the chance it will be used for malicious purposes. There is no bulletproof way for Jelsoft to confirm that someone is the owner of a board. It's why I thought this would be available at vBulletin.org and not official. I dunno. It's just a thought I had and I don't know how many folks are stuck on ProBoards, Yuku, etc.

      Each type of forum import would need a PHP file which defines the HTML markup to match against for post ID, post start, post end, thread ID, thread name, user name, user ID, etc. Then there is bbCode cleanup, where I was just going to do something rudimentary like converting <b> to [b]. It really does depend on the forums not using markup that is too complicated.
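
As a rough illustration of that idea, here is a sketch (in Python rather than PHP) of a per-board profile of patterns plus a rudimentary HTML-to-bbCode cleanup. Everything named here is hypothetical: `EXAMPLE_PROFILE`, its keys, and the markup it matches are invented for an imaginary board, and the tag rules cover only a handful of simple cases.

```python
import re
from html import unescape

# Hypothetical profile for an imaginary board type. A real importer would
# ship one such profile per supported board.
EXAMPLE_PROFILE = {
    "thread_id": r'<input type="hidden" name="t" value="(\d+)"',
    "thread_title": r"<h1>(.*?)</h1>",
    "post": (
        r'<div id="p(?P<post_id>\d+)" data-uid="(?P<user_id>\d+)">'
        r'<b class="user">(?P<user_name>.*?)</b>'
        r'<div class="msg">(?P<body>.*?)</div></div>'
    ),
}

# Rudimentary cleanup rules, applied in order.
BBCODE_RULES = [
    (re.compile(r"<(/?)(b|i|u)>", re.I),
     lambda m: "[" + m.group(1) + m.group(2).lower() + "]"),
    (re.compile(r"<(/?)strong>", re.I), r"[\1b]"),
    (re.compile(r"<(/?)em>", re.I), r"[\1i]"),
    (re.compile(r"<br\s*/?>", re.I), "\n"),
    (re.compile(r'<a\s+href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S),
     r"[url=\1]\2[/url]"),
]


def html_to_bbcode(fragment):
    """Convert a few simple tags to bbCode and drop everything else."""
    for pattern, replacement in BBCODE_RULES:
        fragment = pattern.sub(replacement, fragment)
    fragment = re.sub(r"<[^>]+>", "", fragment)  # strip unknown tags
    return unescape(fragment).strip()


def parse_thread(page_html, profile):
    """Apply one board profile to one page of thread HTML."""
    thread_id = re.search(profile["thread_id"], page_html)
    title = re.search(profile["thread_title"], page_html, re.S)
    posts = []
    for match in re.finditer(profile["post"], page_html, re.S):
        post = match.groupdict()
        post["body"] = html_to_bbcode(post["body"])
        posts.append(post)
    return {
        "thread_id": thread_id.group(1) if thread_id else None,
        "thread_title": unescape(title.group(1)).strip() if title else None,
        "posts": posts,
    }


PAGE = (
    '<h1>Welcome thread</h1><input type="hidden" name="t" value="42">'
    '<div id="p7" data-uid="3"><b class="user">carol</b>'
    '<div class="msg"><b>Hi</b> all<br>bye</div></div>'
)
thread = parse_thread(PAGE, EXAMPLE_PROFILE)
print(thread["thread_title"], thread["posts"])
```

Keeping the patterns in a plain dictionary means that supporting another board type would only require a new profile, not new parsing code; inconsistent markup on real boards would still need handling that this sketch omits.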

      From what I know, I think forum admins are just happy to have the posts come over, even if some of the bbCode is broken. They are used to free and crappy, so just taking the posts is a big upgrade.



      • #5
        Well, I've added ProBoards to the poll.

        I was halfway through writing a generic framework for a board spider, though it's on hold at the moment and will probably be picked up after ImpEx2 is in place.


        • #6
          A friend of mine has created a script that does this. I will poke him and see if he wants to post here.



          • #7
            I believe there were open source importers for phpBB and SMF that would import the data from EZBoard and other such sites. You may want to try those and then use ImpEx to import from phpBB or SMF. Ugly solution, but it's an ugly task...
            Plan, Do, Check, Act!



            • #8
              Even in the event of one existing (from us) it could be released and would have to be managed in the same way that the ezboard imports are.


              • #9
                I'm guessing you meant "would not be released".



                • #10
                  Yes, it would be kept under lock and key.