Tuesday, 10 September 2013

Countering the surveillance

The extent of the recently exposed worldwide surveillance by the USA and the UK (and probably by all the other nations as well) is not very surprising to those who have some technical knowledge. But it seems to have surprised those who lack this knowledge. And to be honest, the extent to which we are all snooped on did surprise me somewhat.

OK, here we are now: we're being watched, we're terrorists until proven otherwise. What can we do about that?

Image of a Cataract by Rakesh Ahuja, MD, CC BY SA


I'll first cover what is needed for snooping to be successful, then I'll go into what we could do to mitigate the snooping power of big multi-billion-dollar agencies like the NSA or the GCHQ.
Any person or agency that snoops relies on basic properties of the data and on assumptions about the data. These basic properties and assumptions are:
  • Technical assumptions
    • the data can be retrieved
    • the data can be read and its information can be extracted
    • the signal in the data can be separated from noise
    • all this can be done with the available capacities 
  • Legal properties
    • either there exists the legal permission to do the snooping
    • or there is no entity which has the power to check and enforce the compliance 

Data retrieval

Data retrieval seems to be well under control by NSA and GCHQ and their befriended intelligence agencies. Data is gathered directly from the providers and from submarine cables as well as from satellites. This should cover most of the data especially if the most important hubs are controlled. 

Reading and extracting the information

The requirement for being able to do this is that the data (or at least parts of it, such as metadata) is not encrypted or can be decrypted. Nowadays most of the data is not encrypted, and the little data which is encrypted might potentially be decrypted by the NSA etc. (nobody knows whether unpublished attack vectors exist against the usual encryption techniques and tools). We can assume here that, except in very special circumstances where people explicitly want to keep information secret, the intelligence agencies can read the data and gather the information.

Separation of signal from noise

What is the signal?

Of all the data which is collected, it is the signal which is of major interest. And typically each piece of signal is hidden within vast amounts of noise. The first question to ask is "what is the signal?". Whilst official statements always insist that the signal is terrorists, there is very good reason to doubt this. Why? Because the first step in data analysis is to search for signal in data sets where signal is likely to be found. But most of the data sets in which intelligence agencies are snooping are citizens' and companies' interactions, countries which are "friends" and "allies" (I put these words in quotation marks because friends and allies would normally not be spied on), politicians of trade "partners", the UN, the EU. These are all places where it is highly unlikely to find terrorists. That leads to the question of why the mentioned data sources are used. Well, governments and institutions like the UN and the EU are most likely targeted to commit industrial espionage and to gain an edge in negotiations of treaties like the ones currently being negotiated (TPP etc.). The citizens are snooped on to discover people with dissenting opinions. If some terrorists are found within all the data and analysis, then so be it, but I'd reckon that this happens mostly by accident and is not at all the top priority.

What is the difference between signal and noise?

Once it has been established what the signal is, the data analyst can go on to search for the differences between signal and noise. Let's assume for a moment that the signal is "dissenting voices". If all those people used a site like dissentingvoices.com (I made this up) for chatting, emailing, videoconferencing etc., then it would be easy as long as the site is publicly accessible. Just take all these people and snoop on them. Then crack down on these people and you've eliminated the opposition. But real life is not that easy. Services like Facebook and Google are used by all types of people. Among them are some who might have opposing views. Altogether, the separation of signal and noise will probably be based on what sites you've visited, what comments you've written, what sites you are posting on and what your friends are doing. If some of your friends are visiting football sites frequently, you might be into football as well. If friends send around a party invitation, you might go to this party as well. If some of your friends oppose fracking and pipelines, you might oppose fracking and pipelines as well. If you do, then you are signal. If you actually are not opposed to fracking and pipelines, you are noise from the point of view of data analysis. You might be flagged as signal, but actually you are not. You are a false positive. Obviously there are true positives (the opposition in our example), then there are true negatives (people you've identified as not dissenting and who are wholeheartedly in favor of fracking and pipelines), but there are also false positives (people whom your data analysis would put into the "opposition" bin, but who do not belong there) and false negatives (people who oppose these things, but whom your algorithm didn't catch). And often there's not just true and false, but there will be a lot of gray area. You might be against fracking, but only if it is near your house.
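
The four outcomes can be counted with a few lines of Python. This is only a sketch: the people, their real opinions and the flags are invented for illustration, and the `outcome` helper is a hypothetical name, not something any agency is known to use.

```python
from collections import Counter

def outcome(is_dissenter: bool, flagged: bool) -> str:
    """Name the outcome for one person, seen from the analyst's side."""
    if flagged:
        return "true positive" if is_dissenter else "false positive"
    return "false negative" if is_dissenter else "true negative"

# (name, really opposes fracking?, flagged by the analysis?) -- invented data
people = [
    ("Ann",   True,  True),
    ("Bob",   False, True),   # only has friends who oppose fracking
    ("Carla", True,  False),  # opposes it, but never says so online
    ("Dave",  False, False),
    ("Eve",   False, False),
]

counts = Counter(outcome(truth, flag) for _, truth, flag in people)
for name in ("true positive", "false positive", "false negative", "true negative"):
    print(f"{name}: {counts[name]}")
```

Bob is the false positive from the text: he is in the "opposition" bin only because of what his friends do.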

Signal efficiency and purity

If you'd like to catch all the signal, you just declare every communication to be "signal" and you are sure to get all of it. But then you're swamped with data which you cannot dig through, even with big data centers, and you certainly cannot follow up on all of it, because there you are limited by manpower. I presume that they take all the data they can work on with their available capacity and try to get this data as pure as possible. Still, the NSA (and their befriended snooping services) will get false positives (communications flagged as suspicious which in reality aren't) and false negatives (communications which are flagged as OK, but which should be signal).
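
The trade-off can be made concrete with a small sketch. I assume here that every message gets some suspicion score and that everything above a threshold is flagged; the scores and labels below are invented. Efficiency is the fraction of all signal that gets flagged (often called recall), purity is the fraction of flagged messages that really are signal (often called precision).

```python
def efficiency_and_purity(scores, labels, threshold):
    """Flag every message with score >= threshold and rate the selection.

    efficiency = flagged signal / all signal
    purity     = flagged signal / all flagged
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))
    fp = sum(f and not l for f, l in zip(flagged, labels))
    fn = sum((not f) and l for f, l in zip(flagged, labels))
    efficiency = tp / (tp + fn) if tp + fn else 0.0
    purity = tp / (tp + fp) if tp + fp else 0.0
    return efficiency, purity

# Invented suspicion scores (0..1) and ground truth (True = signal).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, False, True, False]

for threshold in (0.0, 0.5, 0.9):
    eff, pur = efficiency_and_purity(scores, labels, threshold)
    print(f"threshold {threshold:.1f}: efficiency {eff:.2f}, purity {pur:.2f}")
```

With a threshold of 0.0 everything is "signal": the efficiency is 1.00, but the purity is only 0.50. Raising the threshold to 0.9 gives a purity of 1.00, but the efficiency drops to 0.25.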

What can be done to counter snooping


Encrypting all communication (chat, email, web surfing, voice, etc.) by everyone would be an obvious response that would work. Encryption can be done on the service provider level and/or on a personal level.

    Encryption on the provider level

Encryption on the provider level can be organized to be reasonably convenient for the user, just as more secure authentication methods like two-factor authentication are not at all difficult to use and are barely noticed once set up. Encryption on a provider level is for sure a good thing, but whilst it helps against petty criminals, it has been shown that it doesn't help against government-backed snooping. It has made their quest for snooping more difficult, but it certainly hasn't stopped it. The service providers are either bought, coerced or forced into cooperating with the intelligence agencies. With the encryption happening on the providers' side, all the data will find its way to the snooping agencies.

    Encryption on the personal level

Encryption on a personal level is less convenient. A major hurdle for encryption is the adoption rate. As long as only a couple of geeks use encryption, it is practically useless except to keep very specific data secret. One cannot send encrypted emails to a person who has no public key. So much for encryption as a way to spoil surveillance. But *if* we all did use encryption and *if* the NSA could decrypt at least some encryption techniques (which is probable), their computers would still have to work more on each message. Working more means more computing time spent, and the results would be obtained more slowly. This is like hitting the brakes of the NSA. Yes, they wouldn't be stopped completely, but they could not analyse as much data.

Be noisy

The snooping analysis can be screwed up by adding noise to the data. This makes distinguishing signal from background harder. More data has to be sifted through and less of the valuable data is found.

Adding noise means transforming ordinary messages from ones which the NSA can discard quickly into ones which have to be analyzed thoroughly, thus "stealing" the NSA's computing time. An easy way is to just add a couple of keywords to each email, maybe just put them into the signature. Something like

"Dear NSA, This email is important! That's why I want you to read this carefully:
Exposure to so much knowledge over social media is infectious–one of life's great joys. I am so enriched."
This page helps you to pick nice phrases: http://nsa.motherboard.tv/
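
Adding such a signature automatically could look roughly like this. It is a sketch only: the keyword list is made up (the real selectors of any agency are not public) and `add_noisy_signature` is a hypothetical helper, not part of any email program.

```python
import random

# Hypothetical keyword list; the real selectors used by any agency are not public.
KEYWORDS = ["infectious", "exposure", "social media", "enriched"]

def add_noisy_signature(body: str, n_keywords: int = 3, rng=None) -> str:
    """Append a signature containing a few randomly chosen keywords."""
    rng = rng or random.Random()
    chosen = rng.sample(KEYWORDS, k=min(n_keywords, len(KEYWORDS)))
    signature = "Dear NSA, this email is important! Keywords: " + ", ".join(chosen)
    # "-- " on its own line is the conventional signature separator.
    return f"{body}\n\n-- \n{signature}"

print(add_noisy_signature("Hey Tom, how about cinema tonight?", rng=random.Random(1)))
```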

Adding noise also means adding random connections and communications. Imagine that for every email you send, you'd send another email with a suspicious message (containing probable NSA keywords) to a random email address around the globe. The NSA would have to dig through double the amount of emails and they would have to filter out all these messages. They couldn't just throw them out, because they would contain all the keywords which they are searching for. They would have to add and analyze tons of new connections between people who do not have any real connection.
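
A toy simulation shows the effect on the amount of mail and on the contact graph. Everything here is assumed for illustration: the addresses are made up, `add_decoys` is a hypothetical helper, and the "real"/"decoy" label is only visible to us, not to whoever collects the emails.

```python
import random

def add_decoys(real_emails, address_pool, rng):
    """For every real (sender, recipient) pair add one decoy to a random address."""
    observed = []
    for sender, recipient in real_emails:
        observed.append((sender, recipient, "real"))
        candidates = [a for a in address_pool if a not in (sender, recipient)]
        observed.append((sender, rng.choice(candidates), "decoy"))
    return observed

rng = random.Random(42)
addresses = [f"user{i}@example.org" for i in range(50)]   # made-up addresses
# 100 real emails from a small group of senders to a small group of recipients.
real = [(rng.choice(addresses[:10]), rng.choice(addresses[10:20])) for _ in range(100)]

observed = add_decoys(real, addresses, rng)
real_links = {(s, r) for s, r, kind in observed if kind == "real"}
all_links = {(s, r) for s, r, _ in observed}

print(f"emails to process: {len(real)} -> {len(observed)}")
print(f"distinct links in the contact graph: {len(real_links)} -> {len(all_links)}")
```

The number of emails doubles by construction. The more interesting number is the second one: the decoys create links between people who never had anything to do with each other.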

It would be even better to add non-random noise. If all the messages had senders and receivers which followed a bigger pattern (i.e. looked like a network of people exchanging suspicious messages), it would be even more difficult for the NSA to filter that out and not take it for the real thing.

Of course nobody will go to the trouble of sending random emails to random people. But imagine if there were a computer virus which did that. Instead of sending spam advertising crappy products, the virus would send suspicious emails from imaginary people to imaginary people.

No email left behind

My emails are important. Period. My fear is that the importance of my emails (e.g. "Hey Tom, how about cinema tonight?") is underrated by the NSA and thus an email like this is thrown out instantly and never gets to see a decent data analyst. I think this is deeply unfair. Maybe the importance could be enhanced (in addition to adding keywords) by sending the email directly (BCC or CC) to the intended recipients at the NSA or to the politicians who are in favor of snooping. I mean, if they didn't want to read my emails, they would oppose total surveillance. Hence, they want to read the emails, and that's why they should read my emails. The nice side effect of this is that if I were a bad person (translation: an identified target of the NSA for some reason), I'd add a connection to the politician or NSA worker. If the NSA goes two to three layers of separation deep, they would now find this person already in the first layer. Great!
Politicians probably get a lot of email and therefore the politicians are probably soon taken out of the equation. The same is true for the NSA boss. But NSA employees (and GCHQ employees or those of any other of the implicated snooping agencies) would add a nice angle into the agency itself. And it would be very justified, because my emails should be seen by a human data analyst. It's disrespectful if my emails are seen by software alone. 
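
The "layers of separation" are simply hop counts in the contact graph, which a breadth-first search computes. The graph below is invented, and treating every email as an undirected link is my simplifying assumption:

```python
from collections import deque

def hops_from(graph, start, max_hops):
    """Breadth-first search: return {person: number of hops from start}."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not look beyond the last layer
        for contact in graph.get(person, ()):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance

# Invented contact graph: who has emailed whom (treated as undirected).
contacts = {
    "target": {"tom"},
    "tom": {"target", "anna"},
    "anna": {"tom", "analyst"},
    "analyst": {"anna"},
}
print(hops_from(contacts, "target", 3)["analyst"])   # third layer

# The target now CCs the analyst on one email.
contacts["target"].add("analyst")
contacts["analyst"].add("target")
print(hops_from(contacts, "target", 3)["analyst"])   # first layer
```

A single CC is enough to move the analyst from the third layer into the first one.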

"Encrypt" the data for computers, not for people

The NSA can filter emails and messages if they are easily readable by a computer. If there were a plugin for the email program which transformed the text you just wrote into a JPEG image with a nice flower background, your recipient could still read the email. But the NSA could not. A single analyst could, and they could employ OCR programs to read the text, but they can't do that for all the emails because it would cost them too much computing time. The drawback would be that you couldn't search your emails for text and your email program couldn't do intelligent filtering by message content. That makes emails less convenient. But on the other hand, you're less snooped on.
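
To illustrate the idea without any image library, here is a sketch that renders a message into a plain PBM bitmap instead of a JPEG (PBM is swapped in only because it can be written by hand; there is no flower background). The tiny 5x5 font is hand-made and covers just the letters used in the example.

```python
# A tiny hand-made 5x5 bitmap font covering only the letters used below.
FONT = {
    "H": ["#...#", "#...#", "#####", "#...#", "#...#"],
    "I": ["#####", "..#..", "..#..", "..#..", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "O": [".###.", "#...#", "#...#", "#...#", ".###."],
    "M": ["#...#", "##.##", "#.#.#", "#...#", "#...#"],
    " ": [".....", ".....", ".....", ".....", "....."],
}

def render_pbm(text: str) -> str:
    """Render text as a plain (ASCII) PBM image, one pixel per font dot."""
    glyphs = [FONT[ch] for ch in text.upper()]
    rows = []
    for y in range(5):
        # One blank column after each glyph as letter spacing.
        line = "".join(glyph[y] + "." for glyph in glyphs)
        rows.append("".join("1" if dot == "#" else "0" for dot in line))
    width = 6 * len(glyphs)
    return f"P1\n{width} 5\n" + "\n".join(rows) + "\n"

message = "HI TOM"
image = render_pbm(message)

print("TOM" in message)  # the keyword is found in the plain text
print("TOM" in image)    # the image holds only a header and pixel values
print(image)
```

A keyword filter that scans the raw bytes finds nothing in the image. To get the text back, the pixels have to be interpreted, which is exactly the extra work OCR would have to do for every single email.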

Use surveillance against politicians

What if all the surveillance capacity were used to track politicians and discover corrupt behavior? If we tracked where the 1000 most important politicians of a country are, whom they are talking to, what they are talking about, what emails they are writing, what websites they are looking at, whom they are talking to by telephone, etc., well then, we could ensure that none of their behavior is related to corruption. It is certainly easier and cheaper to watch the steps of politicians than to watch the steps of all the people of the world. And I presume that it would be much more effective in maintaining the freedom of the people and democracy. We could even include the 1000 wealthiest persons of a country in the mandatory surveillance list.

If such laws were enacted, imagine how fast politicians would work on crippling the snooping capability of the NSA and other intelligence agencies. I predict that within four weeks there would be laws and an efficient oversight entity which would limit the NSA's reach and data retention.

You are very welcome to leave your comments! Let me know what you think.

Countering the surveillance by Peter Speckmayer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.