Keeping Phorm from spying on your web site's visitors

by The_Walrus @ 05 May. 2008 - 22:39:09

Now that is well worth doing, if you have any respect for the people who look at your web site. I'm assuming you have a site that is not designed to rip people off, ram advertising at them, etc.

It says in this excellent document...

39. When a website is first visited (by any ISP customer) the pages are not inspected. Instead, a request is queued to fetch the site's "robots.txt" file; viz: a file maintained by the website owner which tells web crawlers and other automated systems which parts of the website should not be indexed or processed.

40. Once the robots.txt file (if any) has been fetched, it will be cached. The cache retention period will be value set by the website using standard HTTP cache-control mechanisms, or for one month if no period is specified. The minimum period that the file will be cached for is two hours.

41. The robots.txt file will be inspected and URLs that fall within forbidden areas of the website will not be processed by the Phorm system.

42. This mechanism, which will permit website owners to opt their pages out of the Phorm system, does not seem to have been previously described in any of Phorm's documentation. They were unable to provide an explanation as to why this had not previously been disclosed.
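The caching rule in paragraph 40 can be put as a few lines of Python. This is only a sketch of the rule as quoted, not Phorm's actual code: the function name `robots_cache_seconds` is made up for illustration, "one month" is assumed to mean 30 days, and only the `max-age` directive of the `Cache-Control` header is considered.

```python
import re

TWO_HOURS = 2 * 60 * 60            # minimum retention stated in paragraph 40
ONE_MONTH = 30 * 24 * 60 * 60      # assumption: "one month" taken as 30 days


def robots_cache_seconds(cache_control):
    """Return how long (in seconds) robots.txt would be kept under the quoted rule.

    cache_control is the value of the site's Cache-Control header, or None
    if the site sent no such header.
    """
    if cache_control:
        # Assumption: only max-age is honoured; other directives are ignored here.
        match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
        if match:
            # A period set by the site is used, but never less than two hours.
            return max(int(match.group(1)), TWO_HOURS)
    # No period specified: fall back to one month.
    return ONE_MONTH


if __name__ == "__main__":
    for header in (None, "max-age=60", "public, max-age=86400"):
        print(header, "->", robots_cache_seconds(header), "seconds")
```

If the rule really works this way, a change to your robots.txt could take up to a month to be noticed unless your server sends a shorter cache period with the file.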

It is pretty damned clear why they kept quiet about it. I shall be fixing my web site so people who view it are not spied on.
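For anyone wanting to do the same, here is a rough sketch of what a blanket opt-out looks like and how a well-behaved automated system would read it, using Python's standard `urllib.robotparser`. The quoted paragraphs do not say what name Phorm's system identifies itself by, so the rule below uses `*`, which forbids every compliant robot, search engines included. The agent name `ExampleScanner` and the example.org addresses are placeholders, not anything Phorm uses.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that puts the whole site off limits to every compliant robot.
# "*" is used because the name Phorm's system goes by is not given in the
# quoted document; note this also turns away search engine crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to process url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleScanner" is a hypothetical agent name, for illustration only.
    url = "http://www.example.org/index.html"
    print(is_allowed(ROBOTS_TXT, "ExampleScanner", url))
```

The file itself goes at the top level of the site, as /robots.txt. Whether Phorm honours it in practice is something only they can answer; this only shows what the file says to a system that does.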

Another thing that has not been mentioned, and you can see why: anyone who thinks BT won't also pass all this data to the Home Office, MI5, MI6 and the CIA is a fool. That is the most likely reason the Home Office refuses to prevent this from being done to us.

This man wants to spy on you.

Comments:

Useful data - I have taken note. :)

loiswakeman [Member]
http://lois.co.uk
2008-05-06 @ 11:18

How did you find out the user agent that Phorm uses? I don't want to disallow everything for everyone.

Other ways to keep them out (from reading that very useful paper, for which thanks) would be to change the port (not practicable); to change the MIME type of your pages to application/xhtml+xml (could be done, but might cause problems for older browsers?); or to spoof your browser's user agent as, for example, Lynx or Konqueror. That last is easily done, but might degrade the experience on sites that do browser sniffing, though few probably check for obscure browsers, so the result depends on the fall-back.

The_Walrus [Member]
http://www.doctor-dark.co.uk
2008-05-06 @ 14:14

Errm, have I found out the "user agent"? I'm not that web-techy. I do Unix, C, AWK, and dear old Z80 assembler, when I can remember who I am.

I was thinking it might be good to start using cookies, if I could write some JavaScript that would spot that Phorm had piggybacked itself onto them, and just serve a warning page.

But the simple truth has to be that if BT go ahead with this crime, they can do without our money, for I will shift to an honest ISP. I don't care if I have to go all the way to a two-way satellite link; I will not let them do this to me.

loiswakeman [Member]
http://lois.co.uk
2008-05-06 @ 14:43

Sorry old bean: I assumed as you were spouting knowledgeably about robots.txt, you were going to use that route to block access to the Phorm robot if it came calling!

I agree that this whole thing is utterly shabby and reprehensible.

The_Walrus [Member]
http://www.doctor-dark.co.uk
2008-05-06 @ 14:50

I have a vague idea, gained yesterday, of what robots.txt does, and intend to use it for the benefit of visitors to my site, many of whom will not want Phorm passing on information to anyone about their interest in this kind of equipment, and the songs that go with it...

A musical instrument, obviously.