Need an angry mob... I found a bandwidth leecher

  • Need an angry mob... I found a bandwidth leecher

    I recently wrote an anti-email harvester script and installed it on my website. It detects email harvesters through a hidden link that is banned in .htaccess.

    I have logged and banned about a dozen email harvesters in the week that it has been online. The most interesting entry in the log is the following:

    IP Address:
    User Agent: libwww-perl/5.803
    Timestamp: 1141421105
    I was surprised to see an authentic spider user agent (all of the others were spoofs of browsers, almost all of them IE). I went to the blog mentioned in the User Agent and discovered that the guy was intentionally browsing files listed in robots.txt! I immediately thought of requesting an angry mob.

    Anyway, anyone running a website might want to block his IP address, and perhaps contact him with your feelings on bandwidth leeching.
    Last edited by Shining Arcanine; Sat 4 Mar '06, 3:26pm. Reason: Corrected language usage to be more clear
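
For anyone who wants to try the same idea, here is a rough sketch of the bookkeeping side of such a trap. The actual script is not shown in this thread, so everything below is an assumption: the function names are made up, 192.0.2.10 is a documentation-only placeholder address, and `Deny from` is the older Apache 2.2-style access syntax (newer Apache versions use a different directive).

```python
from datetime import datetime, timezone


def format_log_entry(ip: str, user_agent: str, timestamp: int) -> str:
    """Render one trap hit in the same three-field layout as the log above."""
    return (
        f"IP Address: {ip}\n"
        f"User Agent: {user_agent}\n"
        f"Timestamp: {timestamp}\n"
    )


def deny_rule(ip: str) -> str:
    """Build an Apache 2.2-style .htaccess line that blocks one address."""
    return f"Deny from {ip}"


def readable_time(timestamp: int) -> str:
    """Convert a Unix timestamp to a UTC date string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # Placeholder address; the real one is not given in the post.
    print(format_log_entry("192.0.2.10", "libwww-perl/5.803", 1141421105))
    print(deny_rule("192.0.2.10"))
    print(readable_time(1141421105))
```

A real trap page would call something like this when the hidden link is requested and then append the deny line to .htaccess, ideally with file locking so that two hits at once do not corrupt the file.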

  • #2
    I ban any bot that doesn't respect robots.txt. Most, if not all, bots that don't respect it are spam bots. I wrote the scripts about a year ago and so far I have over 500 bots banned. You will also find bots with misspelled user agents. I have seen Widows NT, Mozzilla and the like. I have seen Java, Python and the Perl one like you found, although I haven't seen the exact one you listed.
    Admins Zone - Resources for Forum Administrators
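
A quick way to flag the kinds of user agents mentioned above might look like the sketch below. The token and prefix lists are hypothetical, taken only from the examples in this post, and a match is better treated as a reason to look closer than as proof of a spam bot.

```python
# Hypothetical lists for illustration; tune them against your own logs.
MISSPELLED_TOKENS = ("widows nt", "mozzilla")
LIBRARY_PREFIXES = ("libwww-perl", "java", "python")


def suspicious_user_agent(user_agent: str) -> bool:
    """Return True if the user agent has a known typo or a library default name."""
    ua = user_agent.strip().lower()
    # Misspelled browser tokens suggest a hand-typed, spoofed agent string.
    if any(token in ua for token in MISSPELLED_TOKENS):
        return True
    # Default agent strings of HTTP libraries start with the library name.
    return ua.startswith(LIBRARY_PREFIXES)


if __name__ == "__main__":
    for agent in (
        "Mozzilla/4.0 (compatible; MSIE 6.0; Widows NT 5.1)",
        "libwww-perl/5.803",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ):
        print(agent, "->", suspicious_user_agent(agent))
```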


    • #3
      The strangest I have seen so far is "Syntryx ANT Scout Chassis Pheromone." This particular bot is strange not because it has a user agent, but because the user agent was sent as the referrer.

      I wonder who writes these spiders, since they tend to make odd mistakes.
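
A mistake like that is easy to spot in code. The sketch below assumes a simple rule of my own choosing: a referrer is either empty or an absolute http(s) URL, and anything else (such as a bot name) is malformed.

```python
from urllib.parse import urlsplit


def referrer_looks_valid(referrer: str) -> bool:
    """Return True for an empty referrer or an absolute http(s) URL."""
    if not referrer:
        return True  # many legitimate clients send no referrer at all
    parts = urlsplit(referrer)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(referrer_looks_valid("Syntryx ANT Scout Chassis Pheromone"))
    print(referrer_looks_valid("http://example.com/page.html"))
```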


      • #4
        I get a lot of MSN and Yahoo bots. I don't mind them at all.
        MCSE, MVP, CCIE
        Microsoft Beta Team


        • #5
          Those abide by robots.txt, though. This one not only refuses to abide by it, but also exploits it to gain access to pages that it would not otherwise have known about.
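
To make the difference concrete, here is a minimal sketch that checks one client's request sequence against robots.txt using Python's standard `urllib.robotparser`. The robots.txt content, the paths, and the function names are all made up for illustration; only the "fetch robots.txt, then visit what it disallows" pattern comes from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /trap/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robots_violations(parser, user_agent, requested_paths):
    """Return the requested paths that robots.txt disallows for this agent."""
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


def exploits_robots(parser, user_agent, requested_paths):
    """True if the client fetched robots.txt and later hit a disallowed path."""
    if "/robots.txt" not in requested_paths:
        return False
    start = requested_paths.index("/robots.txt") + 1
    return bool(robots_violations(parser, user_agent, requested_paths[start:]))


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    hits = ["/robots.txt", "/private/a.html", "/trap/"]
    print(exploits_robots(parser, "libwww-perl/5.803", hits))
```

A well-behaved spider would read robots.txt and then stay out of the listed paths, so a client that trips this check has done the opposite.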


          • #6
            Shining Arcanine...
            Thanks, man... we have banned it.

            Do you have any more?


            • #7
              That is the only one I know of that reads robots.txt in order to spider pages it should not know about.

