Wednesday, February 8, 2012

The Internet Knows Where you Live

Let’s establish some scope here. The objective of this article is to help you prevent third parties from obtaining information about you using legal methods. Out of scope is protecting you against those using illegal means: phishing, viruses, worms, etc. Hopefully everyone knows how to protect themselves from the latter category: run anti-virus, don’t give out your password, update/patch all software, and drink plenty of fluids.

With the upcoming IPO of Facebook it seems that online privacy is the topic du jour. I’ve been hearing a lot of talk and some mischaracterizations in these news pieces. It’s occurred to me that a lot of what’s being discussed about online privacy is not widely understood by most of us. This being (the) one topic I’m somewhat qualified to speak on, I’d like to take a crack at helping those unfamiliar with this subject understand it a little better (and protect themselves in the meantime).

This is a huge topic, and I could easily write pages about each of the items I touch upon, so distilling them into something brief yet understandable is difficult. Some of what I’m writing is a significant simplification, done to make this more digestible for everyone. Additionally, I broke each section into “What it is” and “How to stop it” parts in case you want to skip to the important bits.

The way I see it, most of the information you’re hemorrhaging to third parties on the internet is coming from about five main sources:

Cookies 
What it is
Cookies are pieces of data that your browser sends to a web server whenever you request a page. They are specific to a web site, so any cookies you have for www.google.com will only be sent to pages in google.com. They’re a sleight of hand to make the stateless internet feel stateful. It’s okay if that sounds like nonsense; the point is that it’s data your computer sends to a website every time you request one of its pages. Your browser was told to store that data by some webpage within that site on a prior visit.

Here’s an example of a cookie:
  • www.SomeSite.com username=JSMITH 
Every time you request a page from www.SomeSite.com, “username=JSMITH” is sent along with the request. That’s really handy when you have a login form on your website and you don’t want people to have to keep typing in their user name.
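
If you like seeing things in code, here is a minimal Python sketch of that round trip. It only uses the standard library's http.cookies module; the helper names (cookie_header, read_cookie_header) are made up for illustration, and a real browser does a lot more bookkeeping (domains, paths, expiry dates) than this.

```python
from http.cookies import SimpleCookie

# What the server asked the browser to store on an earlier visit
# (the name and value come from the example above).
stored = SimpleCookie()
stored["username"] = "JSMITH"


def cookie_header(jar: SimpleCookie) -> str:
    """Browser side: build the Cookie header attached to each request."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def read_cookie_header(header: str) -> dict:
    """Server side: parse the Cookie header back into name/value pairs."""
    parsed = SimpleCookie()
    parsed.load(header)
    return {name: morsel.value for name, morsel in parsed.items()}


header = cookie_header(stored)
print("Cookie:", header)
print(read_cookie_header(header))
```

The server never has to remember you between requests; your browser simply reminds it who you are every single time.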

Cookies are also exceptionally good at helping companies track your movement across the internet. Here’s how:
  • You type in a URL, let’s say it’s www.SomeSite.com
  • That website returns a webpage to your browser. It’s really only sending back text and information on how to display it (usually no images or videos). Within that text are the URLs of other resources you’ll need to view the page (like pictures, videos, and scripts). Your browser will go out and download them all. Every time it requests one of those items, it’ll pass along all the cookies it has for the site hosting that item.
    • Pretend one of those resources is “www.MeanNastyWebTracker.com/SomeImage.gif” 
    • When you request that image, part of your request header (the “Referer”) tells MeanNastyWebTracker that the request came from a page on “www.SomeSite.com”.
    • MeanNastyWebTracker notices you don’t have a cookie for its website, so it creates a random identifier for you (say, 293), puts it in a cookie, and returns it to you along with the image you requested.
      www.MeanNastyWebTracker.com UserId=293
    • Now you visit another, completely unrelated website, “www.SomeOtherSite.com”, and the same process happens again. The page it returns also refers to that image. Only this time when you request SomeImage.gif, MeanNastyWebTracker sees your ID (293) and now knows that whoever 293 is has visited both of these websites.
    Do that over and over again and MeanNastyWebTracker begins to learn a lot about you. In this example, both SomeSite and SomeOtherSite are in cahoots with MeanNastyWebTracker because they enjoy learning things about the people visiting their site. This becomes even more evil and pernicious when MeanNastyWebTracker knows that 293 is actually John Bumbletuck. Google and Facebook do exactly this. Facebook knows about every webpage you ever visited that has a “like” button on it.

    Something you may not realize is that even if you don't have a Facebook account or haven't signed into it on your current computer/browser, Facebook still tracks you through these sites. Instead of associating the traffic with you, John Bumbletuck, they create a placeholder account and log the traffic to that account. I heard an NPR commentator refer to it as a "shadow account", which has just the right sinister, fear-mongering tone, I think.
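
The MeanNastyWebTracker steps above can be sketched as a toy simulation. This is only an illustration under my own assumptions: the Tracker class and its method names are invented, the "cookies" are plain dictionaries, and a counter starting at 293 stands in for the random identifier a real tracker would generate.

```python
import itertools


class Tracker:
    """Toy model of MeanNastyWebTracker: one cookie ID per browser, one log per ID."""

    def __init__(self, first_id=293):
        # A counter stands in for the random identifiers a real tracker would use.
        self._next_id = itertools.count(first_id)
        self.visits = {}  # user ID -> list of sites that embedded the image

    def serve_image(self, cookies, referer):
        """Handle a request for SomeImage.gif and return the cookies to store."""
        user_id = cookies.get("UserId")
        if user_id is None:
            # No cookie yet, so this browser gets a fresh identifier.
            user_id = str(next(self._next_id))
        # The Referer tells the tracker which page embedded the image.
        self.visits.setdefault(user_id, []).append(referer)
        return {"UserId": user_id}


tracker = Tracker()
browser_cookies = {}  # cookies this browser holds for the tracker's domain

for site in ["www.SomeSite.com", "www.SomeOtherSite.com"]:
    # The browser sends its tracker cookies plus a Referer naming the page.
    browser_cookies.update(tracker.serve_image(browser_cookies, referer=site))

print(tracker.visits)
```

After two page loads the tracker's log already ties both sites to the same ID, and nothing in the process required you to log in anywhere.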

    How to stop it
    Let’s start with your first line of protection: AdBlock. AdBlock is available for Chrome, Firefox and probably a bunch of other browsers you shouldn’t be using. Remember when we typed in the URL “www.SomeSite.com” and got back the text, layout information, and the list of other resources needed to display the page? AdBlock combs through that list of other resources first and removes requests to advertiser content. No more ads on your pages, and MeanNastyWebTracker never knows you visited SomeSite. AdBlock maintains a giant list of URLs of advertisers and trackers, so it knows which resources to shoot down and which to allow.

    It’s also expandable: you can block specific sites yourself or use filter lists others have made. Go into the options for AdBlock, then to Filter Lists, and you can enter this URL:
    • http://www.squirrelconspiracy.net/abp/facebook-privacy-list.txt 
    This one stops Facebook from tracking you through websites with Facebook “Like” buttons (i.e. every news site on the planet). You can still use Facebook, but Facebook won't know about anything you do outside their site.
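
The filtering idea behind a blocker like AdBlock can be sketched in a few lines of Python. To be clear, this is not how AdBlock is actually implemented: real filter lists use a much richer rule syntax, and the blocklist, function names, and the logo.png and site.js resource URLs below are all made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real filter lists are far larger and use richer rules.
BLOCKED_HOSTS = {"www.meannastywebtracker.com"}


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """Return True if the resource lives on a host we refuse to contact."""
    host = (urlsplit(url).hostname or "").lower()
    return host in blocked_hosts


def allowed_resources(resource_urls):
    """Keep only the resources the browser should actually request."""
    return [url for url in resource_urls if not is_blocked(url)]


page_resources = [
    "http://www.SomeSite.com/logo.png",
    "http://www.MeanNastyWebTracker.com/SomeImage.gif",
    "http://www.SomeSite.com/site.js",
]
print(allowed_resources(page_resources))
```

Because the request to the tracker is never sent, there is no cookie exchange and no Referer for the tracker to log.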

    Most modern browsers have a second line of defense as well: private mode (Firefox)/incognito mode (Chrome). Among other things, this tells the browser not to persist these cookies. It’s a big hammer because you lose the good stuff cookies do (like remembering that you’re logged in), but a lot of the time you don’t need that. If you’re planning on researching divorce procedures and don’t want your credit to suffer, this goes a long way toward keeping it private. Google and Facebook will see you doing stuff, but without those cookies they won’t easily know you’re the same person.

    Network Sniffing
    What it is
    Network sniffing is using tools to inspect web traffic. Google and Facebook wouldn’t do this, but your employer or ISP (Internet Service Provider) probably does. Let’s introduce an inappropriate metaphor: if the internet is the US postal service, your unencrypted HTTP web traffic is a postcard. Anyone who handles your postcard (like your ISP or the Gateway/Proxy Server at your office) can read its contents easily.



    How to stop it
    When you request web pages using SSL (Secure Sockets Layer) it becomes more difficult (but not impossible) for anyone handling the message to actually read it. Extending our USPS metaphor, your message is now in an envelope. You know you’re using SSL because the URL will be prefixed with “https” and there’ll usually be a lock icon in your browser somewhere. The only information those handling the SSL messages can easily see is the address of the recipient.
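
Here is a rough Python sketch of the postcard-versus-envelope difference. It is a deliberate simplification: the function name is invented, it only looks at the URL, and it ignores other things an eavesdropper can learn (such as how much data you transfer and when).

```python
from urllib.parse import urlsplit


def visible_to_observer(url):
    """Roughly what someone relaying the request can read from the URL."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # The envelope: only the recipient's address is readable.
        return {"host": parts.hostname}
    # The postcard: the whole request is readable.
    return {"host": parts.hostname, "path": parts.path, "query": parts.query}


print(visible_to_observer("http://www.google.com/search?q=divorce+procedures"))
print(visible_to_observer("https://www.google.com/search?q=divorce+procedures"))
```

With plain HTTP your search terms travel in the open; with HTTPS the people in the middle can tell you talked to Google but not what you asked.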

    There are ways to inspect encrypted web traffic, but they require money and effort (some ISPs and employers already do some of this). See Deep Packet Inspection for more info. For those of you with tinfoil hats, there are tools that can thwart even these, like VPN tunneling. This is basically an arms race, and you have to decide if not letting your ISP know you’re Googling Brony literature is really worth it.

    Extra reading: SSL does more than just encryption; it also verifies that a site is who it says it is, preventing a “man in the middle” attack. It also uses a technique called asymmetric encryption, which is very interesting.

    So now you’re saying “SSL sounds great, what am I supposed to do with this? I want actionable information!” The solution is to use SSL as much as you can, whenever you can. Thankfully Google is now SSL by default for most people (you should make sure by looking for that “lock” and HTTPS prefix). It becomes very difficult for your employer or ISP to tell what you’re Googling when you use SSL, and exceptionally easy when you don’t. Many public libraries and schools block access to Google over SSL precisely because it is so difficult to censor content delivered securely.

    If you use Firefox, you have access to one of the coolest browser plugins available: HTTPS Everywhere. Basically, it notices whenever you’re about to request an unsecured page and checks to see if it knows about a secured version of that page. For example, all requests to http://www.wikipedia.org/ are rewritten as https://www.wikipedia.org/. It makes sure you’re always using the secure versions of many popular websites.
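
The rewriting idea is simple enough to sketch in Python. This is an assumption-laden toy, not the plugin's actual code: the real thing ships per-site rulesets, while this version just keeps a hypothetical set of hosts known to support HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of hosts known to serve the same content over HTTPS.
HTTPS_CAPABLE_HOSTS = {"www.wikipedia.org"}


def upgrade(url):
    """Rewrite http:// to https:// when the host is known to support it."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in HTTPS_CAPABLE_HOSTS:
        return urlunsplit(parts._replace(scheme="https"))
    # Unknown host: leave the URL alone rather than risk breaking the page.
    return url


print(upgrade("http://www.wikipedia.org/"))
print(upgrade("http://www.SomeSite.com/"))
```

Only sites on the list get upgraded, which is why a plugin like this helps on popular websites but can't secure a site that never offered HTTPS in the first place.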

    Extra reading: The organization which makes HTTPS Everywhere is the Electronic Frontier Foundation. They’re a non-profit trying to (among other things) protect you from all these tracking techniques I’ve been talking about. Go to their website and learn about how they’ve already protected you. Give them lots of money.

    IP Address Locations
    What it is
    Ever been to a website and noticed an advertisement which seemed to know where you were (approximately)? This bit of magic is achieved via your IP address. A(nother) quick lesson in how “teh intenetz” works:
    • You type www.SomeSite.com into your browser. The host “www.somesite.com” is translated into an IP address, which is basically a unique identifier for a networked computer (in this case, the server for SomeSite).
    • Your request is sent to that IP address.
    • That server now needs to reply! Luckily you sent your own IP address with the request, so the server can reply to you.
    Turns out those IP addresses are not completely arbitrary or untraceable. Translating an IP address into a city is pretty trivial. Don’t believe me? Go to IPAddressLocation.org and scroll down to see the city you’re in right now. The good news is that by using proxy services and anonymizers, these lookups are pretty easy to trick. Of all the tracking techniques I’m outlining this is probably the least evil; most of the time this information is used to localize ads. Some online banking sites use it as part of their fraud detection schemes (you’re not logging in from Serbia?). If you’ve ever tried to stream video from television network sites from outside the United States, you’ve likely run into trouble: the server for Hulu saw that you had an IP address outside the United States and put the kibosh on your plans.
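
To give a feel for how trivial the IP-to-city translation is, here is a small Python sketch. Everything in the table is invented: the address blocks are ranges reserved for documentation, the cities are placeholders, and real geolocation services rely on large commercial databases rather than a two-row list.

```python
import ipaddress

# Made-up lookup table: documentation-only address blocks mapped to placeholder cities.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Springfield"),
    (ipaddress.ip_network("198.51.100.0/24"), "Shelbyville"),
]


def city_for(ip):
    """Return the city whose address block contains this IP, if any."""
    addr = ipaddress.ip_address(ip)
    for network, city in GEO_TABLE:
        if addr in network:
            return city
    return None  # address not in our table


print(city_for("192.0.2.44"))
print(city_for("203.0.113.9"))
```

Blocks of addresses are handed out to ISPs region by region, so a lookup like this gets you to the right city without ever knowing who you are.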

    Extra Reading: That page I linked above did something else a little deceptive you may or may not have noticed. It knew other stuff about you too, like the type of computer you were on and your browser. Although the page is named “IPAddressLocation.org” it didn’t get any of that other information from your IP address. When your browser sent the request for that web page it also sent some information about itself in the header of the message in a location known as User Agent (the browser is an agent of you, the user). It sent information about the browser and computer so the remote server would know what features your computer/browser supported. There are privacy implications to this, but since it’s difficult to track people based on this info we’ll call it out of scope. 
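
For the curious, here is a crude Python sketch of how a server might read the User Agent header. The sample string is typical of what Firefox on Windows sent at the time of writing; the function and its substring checks are my own simplification, and real sites use far more thorough parsing.

```python
# A sample User Agent string of the kind Firefox on Windows sends.
SAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0"
)


def describe_user_agent(user_agent):
    """Guess the operating system and browser from simple substring checks."""
    systems = [("Windows", "Windows"), ("Mac OS X", "Mac"), ("Linux", "Linux")]
    # Chrome is checked before Safari because Chrome's string also mentions Safari.
    browsers = [("Firefox/", "Firefox"), ("Chrome/", "Chrome"), ("Safari/", "Safari")]
    os_name = next((label for token, label in systems if token in user_agent), "unknown")
    browser = next((label for token, label in browsers if token in user_agent), "unknown")
    return {"os": os_name, "browser": browser}


print(describe_user_agent(SAMPLE_USER_AGENT))
```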

    How to stop it
    If you’re uncomfortable with those websites knowing where you are (approximately), or are merely looking for singles in a different area, they’re not too hard to trick. The services that help with this are usually referred to as anonymous proxies. They receive all your internet traffic and retransmit it so it appears as though the traffic is coming from them. These services have legitimate uses; for one, they can help people circumvent government restrictions on the internet (I've heard about Iranian bloggers using these services).
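
Most software that talks to the web can be pointed at a proxy. As a minimal sketch, here is how that looks with Python's standard urllib; the proxy address is hypothetical (you would substitute one from a service you trust), and the code only builds the configuration without sending anything.

```python
import urllib.request

# Hypothetical proxy address; replace it with a proxy you actually trust.
PROXY_URL = "http://proxy.example:8080"

# Route both plain and secure web requests through the proxy.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})
opener = urllib.request.build_opener(proxy_handler)

# Nothing is sent yet. A later call such as opener.open("http://www.SomeSite.com/")
# would reach the site via the proxy, so the site would see the proxy's IP address.
print(proxy_handler.proxies)
```

Keep in mind that the proxy itself sees all of your traffic, so you are moving your trust from your ISP to whoever runs the proxy.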

    In the interest of full disclosure, modern browsers also have functionality to allow websites to know where you are more precisely (via HTML5) but the browser will usually ask you before it will disclose that information (so far I’ve only seen it used on mobile browsers). Visit Google on a mobile phone and you’ll likely get one of these prompts.

    User Names
    What it is
    Many people share a common user name across many services, and usually a quick Googling of that handle will yield all sorts of interesting information about a person. The more unique your handle, the more effective this technique is. For example, not too long ago the public execution squad known as Reddit (deservedly) got its claws into some marketing guy who sent flame e-mails to a customer. The details of the incident are not important, but if you’re curious, Google “Ocean Marketing.” Armed with nothing but Google and a user name, a Reddit user used the technique I just mentioned and discovered that the marketing guy had posted on a forum for users of anabolic steroids.

    How to stop it
    I guess the moral of the story here is if you’re going to create an identity somewhere that you don’t want connected to your other online identities, use a different handle.

    Yourself
    What it is
    You feel compelled to disclose your entire life on social networking sites. 

    How to stop it
    Stop posting on Facebook so much.

    Those are the big ones; there are scores of others I didn’t cover:
    • Opting out of marketing with service providers (to opt out of allowing Google to share your info, you can go here).
    • Actually reading privacy policies. This is probably a good thing, but since I haven’t passed the bar most of the time these don’t help me.
    • How cell phone carriers gather information about you. Sometimes they take it without telling you, as is the case with smart phones running CarrierIQ, and sometimes they hide opt-out procedures. Look around.
    Let me know in the comments if you think of any others.

    2 comments:

    1. There is a plugin for Chrome called 'Disconnect' which purports to block the tracking technology used by Facebook, Google, Digg, Twitter, and Yahoo.

      I don't know enough about it to know whether or how it works; I mainly use it because it displays the number of requests it blocks on the page you're visiting, which gives me a nice 'fight the power' vibe.

      1. Thanks, I just installed it. It catches the google cookies that I don't have a rule for in AdBlock. It brags about the cookies it blocks though, AdBlock is a little more humble.
