Need an angry mob... I found a bandwidth leecher

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Shining Arcanine
    replied
    That is the only one I know of that reads robots.txt in order to spider pages it should not know about.



  • Zia
    replied
    Shining Arcanine...
    Thanks, man... we have banned it.

    Do you have any more?



  • Shining Arcanine
    replied
    Those abide by robots.txt, though. This one not only refuses to abide by it, but exploits it to gain access to pages that it would not otherwise have known about.
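
For anyone who wants to spot this behaviour in their own logs, here is a minimal sketch in Python. It assumes you already have your requests as (IP, path) pairs in the order they arrived; the function name `robots_exploiters` and the sample addresses are made up for illustration. It flags any client that fetched robots.txt and afterwards requested a path that robots.txt disallows.

```python
from urllib.robotparser import RobotFileParser


def robots_exploiters(robots_txt, requests):
    """Return the set of IPs that read robots.txt and then hit a disallowed path.

    robots_txt: the text of the site's robots.txt
    requests:   iterable of (ip, path) tuples in arrival order (assumed format)
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    read_robots = set()   # IPs that have fetched /robots.txt so far
    offenders = set()
    for ip, path in requests:
        if path == "/robots.txt":
            read_robots.add(ip)
        elif ip in read_robots and not parser.can_fetch("*", path):
            # The client saw the Disallow rules and went there anyway.
            offenders.add(ip)
    return offenders


# Hypothetical sample data, not taken from any real log.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_REQUESTS = [
    ("192.0.2.1", "/index.html"),
    ("192.0.2.2", "/robots.txt"),
    ("192.0.2.2", "/private/secret.html"),
    ("192.0.2.3", "/robots.txt"),
    ("192.0.2.3", "/index.html"),
]

if __name__ == "__main__":
    print(robots_exploiters(SAMPLE_ROBOTS, SAMPLE_REQUESTS))
```

This only checks the generic `*` rules; a real check would probably also want to match the rules for the bot's own user agent.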



  • Joe Gronlund
    replied
    I get a lot of MSN and Yahoo bots; I don't mind them at all.



  • Shining Arcanine
    replied
    The strangest I have seen so far is "Syntryx ANT Scout Chassis Pheromone." This particular bot is strange not because it has a user agent, but because the user agent was sent as the referrer.

    I wonder who writes these spiders, since they tend to make odd mistakes.



  • AWS
    replied
    I ban any bot that doesn't respect robots.txt. Most, if not all, bots that don't respect it are spam bots. I wrote the scripts about a year ago, and so far I have banned over 500 bots. You will also find bots with misspelled user agents: I have seen Widows NT, Mozzilla and the like. I have also seen Java, Python and the Perl one like you found, although I haven't seen the one you listed.
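
The misspelled user agents mentioned above can be caught with a fuzzy comparison. This is only a rough sketch in Python, not the scripts described in this post: the `KNOWN_TOKENS` list, the 0.85 cutoff and the function name `misspelled_tokens` are all assumptions chosen for illustration. It reports words in a user agent that are close to, but not the same as, a well-known browser token.

```python
import difflib
import re

# Assumed list of legitimate tokens; extend it for your own traffic.
KNOWN_TOKENS = ["mozilla", "windows", "compatible", "gecko", "firefox", "opera"]


def misspelled_tokens(user_agent, cutoff=0.85):
    """Return (token, intended) pairs for words that look like typos of known tokens."""
    suspects = []
    for token in re.findall(r"[A-Za-z]+", user_agent):
        lowered = token.lower()
        if lowered in KNOWN_TOKENS:
            continue  # spelled correctly
        match = difflib.get_close_matches(lowered, KNOWN_TOKENS, n=1, cutoff=cutoff)
        if match:
            suspects.append((token, match[0]))
    return suspects


if __name__ == "__main__":
    print(misspelled_tokens("Mozzilla/4.0 (compatible; MSIE 6.0; Widows NT 5.1)"))
```

A cutoff that is too low will start flagging legitimate words, so treat any hit as a reason to look at the log entry rather than as an automatic ban.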



  • Need an angry mob... I found a bandwidth leecher

    I recently wrote an anti-email harvester script and installed it on my website. It detects email harvesters through a hidden link that is banned in .htaccess.
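
The general idea can be sketched as follows. This is not my actual script, just a minimal Python illustration: the trap URL `/trap/`, the function names, and the Apache 2.2-style `Deny from` lines are assumptions. Any client that requests the hidden link gets logged and added to a ban list, which can then be written out as .htaccess rules.

```python
TRAP_PATH = "/trap/"  # hypothetical URL of the hidden link


def format_log_entry(ip, user_agent, referrer, timestamp):
    """Format one log record with the four fields the script records."""
    return (
        f"IP Address: {ip}\n"
        f"User Agent: {user_agent}\n"
        f"Referrer: {referrer}\n"
        f"Timestamp: {timestamp}"
    )


def check_trap(path, ip, user_agent, referrer, timestamp, banned):
    """If the request hit the trap, ban the IP and return a log entry; else None."""
    if path != TRAP_PATH:
        return None
    banned.add(ip)
    return format_log_entry(ip, user_agent, referrer, timestamp)


def htaccess_rules(banned):
    """Render the ban list as Apache 2.2-style access rules (assumed syntax)."""
    return "\n".join(f"Deny from {ip}" for ip in sorted(banned))


if __name__ == "__main__":
    banned = set()
    entry = check_trap("/trap/", "192.0.2.10", "ExampleHarvester/1.0", "", 1141421105, banned)
    print(entry)
    print(htaccess_rules(banned))
```

Ordinary visitors never see the hidden link, so in principle only automated clients that follow every href end up on the list.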

    I have logged and banned about a dozen email harvesters in the week that it has been online. The most interesting entry in the log is the following:

    IP Address: 216.179.125.69
    User Agent: WebVulnCrawl.blogspot.com/1.0 libwww-perl/5.803
    Referrer:
    Timestamp: 1141421105
    I was surprised that there was an authentic spider user agent (all of the others were spoofs of browsers, almost all being IE). I went to the blog mentioned in the User Agent and discovered that the guy was intentionally browsing files listed in robots.txt! I immediately thought of requesting an angry mob.

    Anyway, anyone running a website might want to block his IP address, and perhaps contact him with your feelings on bandwidth leeching.
    Last edited by Shining Arcanine; Sat 4 Mar '06, 3:26pm. Reason: Corrected language usage to be more clear