How to block offline browsers A.K.A site rippers


  • jdelasko
    replied
    Originally posted by Darkblade
    Thanks for the post!

    About the last line, do I have to replace "RewriteRule ^.* - [F,L]" with "RewriteRule /*$ http://english-111745432354.spampoison.com [L,R]"? I'm kinda confused here and wanted to make sure.

    You can use either line, but NOT both. The two rewrite rules simply do different things.


    RewriteRule ^.* - [F,L]

    If used, this rule simply returns an HTTP 403 (Forbidden) error for every page that the site ripper requests. The site ripper may spend a little time on your site, but will give up after enough 403 errors.


    RewriteRule /*$ http://english-111745432354.spampoison.com [L,R]

    This one does something a little nastier. As soon as one of the site rippers shows up on your site, it is redirected immediately to the URL specified in this line. The particular URL I use, *.spampoison.com, is a website that will recognize the visitor as a site ripper and provide it with an endless supply of dynamically generated pages that are full of dynamically generated, useless email addresses. The site ripper thinks it has struck gold, but will produce an email list that is useless and of no commercial value.

    This last rewrite rule is just a way of fighting back at these bandwidth thieves. You can use any URL you want in this last rule. Perhaps there's a site that sends you tons of annoying email spam that you'd like to redirect site rippers to.
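
    To make the "either, but not both" point concrete, here is a minimal sketch with only two of the user agents and both endings side by side. The www.example.com URL is a placeholder, and the `^` pattern and explicit `R=302` are my own simplifications, not the lines from the original post. The reason for not keeping both: RewriteCond lines only apply to the RewriteRule that immediately follows them, so a second rule placed after the first would have no conditions and would match every visitor.

    ```apache
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]

    # Option 1: refuse the request with 403 Forbidden.
    # ([F] already implies [L], so [F,L] and [F] behave the same.)
    RewriteRule ^ - [F]

    # Option 2: redirect the ripper elsewhere (placeholder URL).
    # To use it, delete the Option 1 rule and uncomment this line.
    # RewriteRule ^ http://www.example.com/ [R=302,L]
    ```

    Either variant can be checked by requesting a page with a spoofed User-Agent header (for example with curl's -A option) and looking at the status code that comes back.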


  • ChrisLM2001
    replied
    Originally posted by jdelasko
    Any of these companies that say these tools are for legitimate purposes are just blowing smoke. For instance, what good would downloading an entire vBulletin site be to the average user?
    Yeah, the average user won't be downloading a whole site, but a designer might, if they're sick of a site with seizure-ridden Flash ads, and a DTD and markup from the Stone Age.

    Somewhere out there is a program that allows redesigning sites based on your specifications (more so than what browsers allow with a simple stylesheet replacement). That proggie's name escapes me, but it is also a good alternative.

    I use a site archiver on a couple of sites, mainly because I feel those sites won't be around much longer (this is especially true of videogame sites, which seem to vanish within 3 years after a game is released). These tools spider the site to your specifications (including deep linking), and let you choose between an exact copy and a modified one (for example, with the ads removed).


  • Darkblade
    replied
    Thanks for the post!

    About the last line, do I have to replace "RewriteRule ^.* - [F,L]" with "RewriteRule /*$ http://english-111745432354.spampoison.com [L,R]"? I'm kinda confused here and wanted to make sure.


  • jdelasko
    replied
    Originally posted by noppid
    jdelasko, actually, they get the site delivered to them in static HTML and can publish it and it will work fine. It just won't be dynamic or update unless they pull the updates and add them too.

    The threat is real. I use geo IP and IP blocking of rogue datacenters to do the same. Since I have done this, the site is much better off.

    Actually, they will download and store just about any file type.


  • noppid
    replied
    jdelasko, actually, they get the site delivered to them in static HTML and can publish it and it will work fine. It just won't be dynamic or update unless they pull the updates and add them too.

    The threat is real. I use geo IP and IP blocking of rogue datacenters to do the same. Since I have done this, the site is much better off.
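
    The post doesn't include the actual configuration, so the following is only a sketch of what IP-range blocking could look like in the same .htaccess file. It assumes Apache 2.4 (mod_authz_core), and the two ranges are reserved documentation ranges used as placeholders, not real datacenter ranges. Geo IP blocking needs an extra module and is not shown here.

    ```apache
    # Assumption: Apache 2.4 with AllowOverride AuthConfig enabled.
    # 192.0.2.0/24 and 198.51.100.0/24 are placeholder ranges;
    # substitute the ranges you actually want to block.
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.0/24
        Require not ip 198.51.100.0/24
    </RequireAll>
    ```

    Unlike user-agent matching, this still works when a ripper pretends to be a normal browser, but it only helps while the ripper keeps coming from the listed ranges. On Apache 2.2 the older Order and Deny from directives would be used instead.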


  • jdelasko
    replied
    Originally posted by Reece^B
    What's the point of site rippers? Is it to purposely slow down your site, or is it for archive purposes like waybackmachine.org?
    Most of the software companies that provide offline browsers market them by saying something like "Download entire websites and view them at your convenience".

    The truth is, offline browsers are primarily used by hackers and spammers. These people use these tools to download every single file they can from your website in an attempt to gather any personal information they can, or simply email addresses. Any of these companies that say these tools are for legitimate purposes are just blowing smoke. For instance, what good would downloading an entire vBulletin site be to the average user? The average user isn't going to set up a local PHP server to be able to view a site which is based on PHP, and besides, without the SQL databases the files themselves are useless. The main things site rippers are after include email addresses, photos and multimedia, or any other personal information they can get their hands on.
    Last edited by jdelasko; Sat 27th Oct '07, 8:02am.


  • Reece^B
    replied
    What's the point of site rippers? Is it to purposely slow down your site, or is it for archive purposes like waybackmachine.org?


  • Dean C
    replied
    Nice post indeed; your rules could be optimized quite a lot though. Also, wouldn't it be nice if all these site rippers delivered a custom user-agent like the "honest" ones in your list?
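
    The reply doesn't say which optimization is meant. One common approach, sketched here as an assumption rather than as the intended one, is to fold the long run of RewriteCond lines from the original post into a few alternation groups. Only a handful of the agent names are shown; the rest of the list would be added to the first group the same way. Note that [NC] makes the anchored names case-insensitive, which the original list did not do.

    ```apache
    RewriteEngine On
    # Agents that the original list anchors at the start of the string.
    # (Abbreviated: add the remaining names, separated by |.)
    RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|ChinaClaw|Download\ Demon|Teleport\ Pro|WebZIP|Wget) [NC,OR]
    # Agents that the original list matches anywhere in the string.
    RewriteCond %{HTTP_USER_AGENT} (HTTrack|Indy\ Library) [NC]
    RewriteRule ^ - [F]
    ```

    Fewer conditions means fewer regex evaluations per request, and new agents can be added without touching the [OR] chain.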


  • devilsown
    replied
    Great post


  • jdelasko
    started a topic How to block offline browsers A.K.A site rippers

    There are a lot of offline browsers that will sit on your site and download every single file. You don't need one of these on your site sucking up your bandwidth and slowing your site down for legitimate users. Here's how to deal with them:

    Put the following lines in a .htaccess file in your site root directory. They will give ANY IP address using one of these offline browsers a 403 error. Included below are most currently known offline browsers:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
    RewriteRule ^.* - [F,L]



    Replace the last line above with this optional rewrite rule and any offline browser will be redirected to the site you specify. This is handy if you simply want them off your site immediately. You can redirect them to that site that emails you all that prescription drug spam, for instance:


    RewriteRule /*$ http://www.yourdomain.com [L,R]


    Just replace the 'yourdomain.com' with whatever web site you choose and the site ripper will be redirected there.

    Edit: in my original post, somehow a lone formatting tag, [/small], snuck in at the end of the above line of code. Sorry about that.... remove it or the code won't work... you'll get an internal server error. The code above is all correct and tested.

    Below is the RewriteRule I am currently using on my site:

    RewriteRule /*$ http://english-111745432354.spampoison.com [L,R]


    This rule redirects these bots to a site that will provide them with an endless supply of dynamically generated fake email addresses so that the user is provided with a gigantic collection of useless email addresses that has no commercial value. It's an excellent line to use if you want to waste a lot of their time.
    Last edited by jdelasko; Sat 27th Oct '07, 8:13am.
