With all of the hacks hitting governments and corporations, we as a society need some defensive techniques for preserving our data, especially when it’s used to create AI. We all know today’s technology knows a lot of information about us, maybe more than we know about ourselves. If you want to retain your free will and not let tech companies (and non-corporate actors) manipulate you into being a “good customer” (or any other “good role” they see fit for you), then you have to learn what information you are actually giving up, how AI learns, how they may implement influence campaigns, and how your network may be used against you. Here we will go over techniques for reducing the information they can learn, understanding the potential flow of your data among these organizations, and attractive attack angles for manipulation campaigns.
Since we’re going to talk a lot about data, what is data? Data is, according to dictionary.com, ‘individual facts, statistics, or items of information’. And information is ‘knowledge gained through study, communication, research, instruction, etc.’. For clarity, here we will use data to mean digital representations of events and attributes. Information will be used to mean useful knowledge that may have been derived from data. So, what comes to mind when you think about your data?
Some sources of data/information you are creating (whether you realize it or not) are location data that can be extracted from your cell tower connections (and thus includes things like who you hang out with, what stores you frequent, etc.), attention span (what topics you regularly comment on or how long you watch a video), areas of interest or lack of knowledge (Google searches give these sorts of insights, as do the TV shows you watch), user behavior, political leanings (from your liked posts or the talk shows you listen to), your mood (from your posts, which posts you’re liking, and, if you don’t have a webcam cover, possibly scans of your face), and which friends and family you are or aren’t close to (social media posts and likes have been shown to give these types of insight). That’s just to name a few. With such sensitive information hidden in your data, we must be concerned with how it’s being protected and used.
Since data and technology are being used for so many different reasons and in so many different contexts, you might want to consider implementing these techniques in all of your technological interactions. Obviously it’s impractical to take into account all the relevant considerations listed here each and every second you interact with any technology. Instead, use this information to create engagement strategies for the devices and software you use. That is, think about how you could alter your interactions with the devices and software you use on a regular basis to preserve your privacy and cognitive autonomy. You use each piece of software for a different purpose, and each collects different data for its own purposes, so you need a unique approach for each. Think of it as creating your own ‘algorithm’ based on the information you gain here, crafting how you use the software for your needs in a way that reduces ‘off-the-cuff’ information sharing. For example, say the #mood hashtag pops up and you feel you just have to join in and document all the different moods you do and don’t experience and their contexts. A mindful engagement strategy of not revealing emotional triggers may persuade you not to participate in that hashtag.
Who Wants Your Data And Why?
When it comes to manipulation and invasion of privacy, the first question we must ask is what type of info is likely shared about you. Data has value, but certain types of data are more valuable than others. So take a moment to think about which organizations (companies, employers, governments, or even criminal syndicates) are interested in learning about you and what they would like to learn. Then think about how the technology you engage with may have data points related to that information.
Now that you’ve had time to come up with a few options on your own (or with a group), let’s discuss a few examples. Some organizations that might be interested in your information are insurance companies, data-marts (we’ll discuss data-marts below), ISPs (internet service providers), retailers, social media companies, and political campaigns. These organizations often collect data from you and your devices directly, or indirectly through data-marts. Data-marts collect data on individuals from all over the internet to sell to interested parties. Often this is public data you posted, data bought from organizations, data that can be extracted from tech such as IP addresses or cell towers, and information that is inferred (how they infer information is covered in the “Giving More Information Than You Thought” section). That raises the question: why would they want your data, and why might you share it?
They may want your information to sell you ads, tailor insurance costs to your lifestyle, or even just to better understand their customer base and tailor their services to it. If you really are as ‘trendy’ and ahead of the curve as you think you are, investment firms may want your data to predict stock market trends. Governments may want your information to determine how good a citizen they consider you to be, or whether you have any connections to people they consider enemies. Organizations may also use your information in hiring, to determine whether you fit the profile of a high performer.
So what can these organizations do with all this information? It can be used by employers to decide whether they’ll hire you, by law enforcement for investigations or threat scores, and even by insurers for quotes. In combination with the software on your devices, it can lead to even more manipulative techniques.
The Circle of Data
How does data flow between organizations to enable such automated manipulation and information extraction? Many organizations scan the internet themselves for public data, buy from data-marts, or give your devices ‘cookies’ when you visit their site to track you across the internet and across devices. Your personal tech can be hacked, leaving a hot mic or an active camera recording information. When companies have their (and our) data hacked, this information can end up on the underground market, where it can be bought by criminal organizations or possibly make its way to other legitimate organizations through reselling. If two organizations that wish to swap data are complementary in nature (such as an advertiser and a retailer), they may simply ‘share’ data as ‘third parties’. Platforms like Twitter may also have a program already set up (called an API) that lets people connect to their databases, which is why it’s important to know your privacy settings.
Throughout the process of surfing, posting, or searching on the internet, your device connects with multiple organizations. These include the maker of the device’s operating system, which may send data to the parent company; the local cell tower, which may be compromised by law enforcement or criminals; the ISP (such as AT&T or Comcast) that connects you with the site; the DNS (domain name system) service that directs your device to the site; and finally the organization that owns the site. Any of these organizations could have a copy of the information sent over the internet connection (especially if it’s unencrypted). Even if you have multiple online personas, there are efforts underway to coalesce those personas into one identity. This goes to show that you have to be aware not only of what you post but also of which other devices, email accounts, and locations are connected to what you’re posting.
Decoupling data can be a difficult task. However, if you feel it is important enough (for instance, maybe you work at Google and don’t want your employer to know all your controversial views or religious beliefs), it is essential to use different devices for different internet activity, at different locations (though maybe at the same time), with different IDs and passwords. It is just as essential to make sure the devices do not connect to each other or to the same third party via text, email, Wi-Fi, cell tower, Facebook account, etc. This way the data generated during your internet activities is separated and siloed (insulated from connections) among the various sites and companies used, decreasing the likelihood of that data being connected in any meaningful way to your other data. Even subscribing to the same sites can give away your identity if they are very unique sites, or if you subscribe to so many sites that they effectively make up a digital fingerprint. After all, everyone has to do brand management, and some personal data may not be appropriate for employers, retailers, insurance companies, etc.
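To make the ‘digital fingerprint’ idea concrete, here is a minimal sketch in Python. The account names, site names, and the `unique_fingerprints` helper are all made up for illustration. It only shows the basic intuition: if nobody else subscribes to exactly the same set of sites, that set alone is enough to pick you out.

```python
from collections import Counter

# Hypothetical subscription lists for five accounts (illustrative only).
SUBSCRIPTIONS = {
    "account_a": {"news", "sports"},
    "account_b": {"news", "sports"},
    "account_c": {"news", "cooking"},
    "account_d": {"news", "cooking"},
    "account_e": {"news", "sports", "rare-orchid-forum"},
}

def unique_fingerprints(subscriptions):
    """Return accounts whose exact set of subscriptions nobody else shares."""
    counts = Counter(frozenset(sites) for sites in subscriptions.values())
    return sorted(
        name
        for name, sites in subscriptions.items()
        if counts[frozenset(sites)] == 1
    )

print(unique_fingerprints(SUBSCRIPTIONS))  # ['account_e']
```

Here one unusual subscription is enough to make `account_e` stand out, even though every other site on its list is common.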
Giving More Information Than You Thought
As you can tell, some of the information about you isn’t explicitly given but is inferred from the data they do have on you. When they infer information about you, they use statistics, machine learning techniques, and other data that has been gathered.
An example of other data being used to infer information about you is store locations. If I have your location via cell towers and see that you’re at the same location as Barnes & Noble (Barnes & Noble’s location is the ‘other data’) every Sunday afternoon, then I can infer that you go to Barnes & Noble to read every Sunday. Or if you’re there for 4 hours or more every few days, I may infer you work there.
However, when it comes to statistical inference (which includes machine learning), the exact explanation can be quite complex, so for simplicity’s sake I’ll explain it as follows:
Analysts and machines learn about you through your data by looking at the frequency or presence of events or qualities under the various circumstances/contexts they are testing for (e.g., do you read political articles or fashion articles faster? Do people similar to you like Instagram posts with cats or with dogs?). The variety of data gives them more circumstances/contexts in which to understand your frequencies and attributes.
One of the ways these programs can infer information about you is by finding others who behave like you or have similar characteristics but have more information available about the more private aspects of their lives. It’s basically the ‘birds of a feather’ idea programmed into computers using statistics.
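One simple way to program the ‘birds of a feather’ idea is a nearest-neighbor vote, sketched below in Python. This is only one possible technique, not necessarily what any given company uses. The behavior numbers, the labels, and the `infer_private_label` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical public behavior per person: (hours of cooking videos per week,
# hours of fitness videos per week), plus a private label that these
# particular people happened to disclose.
KNOWN_USERS = [
    ((9.0, 1.0), "vegetarian"),
    ((8.0, 2.0), "vegetarian"),
    ((7.5, 0.5), "vegetarian"),
    ((1.0, 8.0), "not vegetarian"),
    ((2.0, 9.0), "not vegetarian"),
    ((0.5, 7.0), "not vegetarian"),
]

def squared_distance(a, b):
    """How different two behavior profiles are."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def infer_private_label(behavior, known_users, k=3):
    """Guess an undisclosed label from the k most similar known users."""
    nearest = sorted(
        known_users, key=lambda user: squared_distance(behavior, user[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# You never said anything about your diet, but your viewing habits did.
print(infer_private_label((8.5, 1.5), KNOWN_USERS))  # vegetarian
```

The person being profiled never disclosed the label; it is borrowed from the people whose behavior looks most like theirs.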
Also, by looking at your behavior (if there is enough of it) across different contexts such as locations, times, topics, etc., they can figure out which factors influence the behavior in question. This is why having a variety of data and factors is so important for AI. After all, if I type ‘1’ fifty times and send that to an analyst, there’s not much they can learn from those fifty ones.
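The fifty ones example can be put into numbers with Shannon entropy, a standard measure of how much information a set of observations carries. The short Python sketch below is my own illustration; the `shannon_entropy` helper name is made up.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Average bits of information per observation in `values`."""
    counts = Counter(values)
    total = len(values)
    # Sum of p * log2(1 / p) over every distinct value.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

fifty_ones = "1" * 50
varied = "1234" * 10

print(shannon_entropy(fifty_ones))  # 0.0
print(shannon_entropy(varied))      # 2.0
```

Fifty identical entries carry zero bits each, while a stream that varies among four values carries two bits per entry. Varied data simply gives an analyst more to work with.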
Another way they infer information about you is by determining what hidden characteristic best explains your behavior. A classic example: if we know you eat ice cream pretty regularly, one can assume you eat less ice cream when it’s cold outside, and that if it’s cold one day then it’s more likely to be cold the next day. Therefore, any time there are multiple consecutive days on which you didn’t eat ice cream, we can infer those days were cold. The temperature of the day acts as a hidden signal in your data. And just as this example learns about the unknown temperature of the day through other data, they will use your data to infer more hidden signals or attributes about you.
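One common way to formalize the ice cream example is a hidden Markov model, decoded with the Viterbi algorithm. The Python sketch below assumes that framing. All of the probabilities are numbers I made up for illustration, and `most_likely_weather` is a hypothetical helper name.

```python
import math

STATES = ("cold", "warm")
START = {"cold": 0.5, "warm": 0.5}

# Assumed: weather tends to persist from one day to the next.
TRANSITION = {
    "cold": {"cold": 0.8, "warm": 0.2},
    "warm": {"cold": 0.2, "warm": 0.8},
}

# Assumed: you eat ice cream less often on cold days.
EMISSION = {
    "cold": {"ice cream": 0.1, "none": 0.9},
    "warm": {"ice cream": 0.7, "none": 0.3},
}

def most_likely_weather(observations):
    """Viterbi decoding: the most probable hidden weather sequence."""
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {
        s: (math.log(START[s]) + math.log(EMISSION[s][observations[0]]), [s])
        for s in STATES
    }
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the best previous state to arrive from.
            score, path = max(
                (best[p][0] + math.log(TRANSITION[p][s]), best[p][1])
                for p in STATES
            )
            step[s] = (score + math.log(EMISSION[s][obs]), path + [s])
        best = step
    return max(best.values())[1]

diary = ["ice cream", "ice cream", "none", "none", "none", "ice cream"]
print(most_likely_weather(diary))
# ['warm', 'warm', 'cold', 'cold', 'cold', 'warm']
```

The diary never mentions the weather, yet the run of days without ice cream is decoded as a cold spell. Swap ‘weather’ for any attribute you keep private and the same logic applies.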
Tech To Protect
With all these methods of gaining more information about you, sometimes even more than you explicitly gave, how do you try to protect it? The simplest and most impractical way is to stop using any technology that has a CPU, memory, or connectors to other devices, AND to not communicate with or be around people who do. The more practical options range from technological evasion to adjusting how you interact with such devices and software.
One technology you can use to help protect your information is encryption. Encryption translates your data into unreadable gibberish based on the password you provide the program. Once the data is sent to your intended audience, they can use a password you provided them to translate the gibberish back into readable information. This is definitely useful when passing information directly between peers. However, your location data and website traffic cannot be hidden from your ISP or the site you’re visiting unless you use a VPN (virtual private network).
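To show the password-in, gibberish-out idea described above, here is a toy sketch using only Python’s standard library. It is for illustration only and should not be used to protect real secrets. Real tools rely on vetted algorithms such as AES and also check that the message was not tampered with, which this sketch does not do. The function names (`encrypt`, `decrypt`, `_keystream`) and the iteration count are my own choices.

```python
import hashlib
import hmac
import os

def _keystream(key, nonce, length):
    """Pseudorandom bytes from HMAC-SHA256 run in counter mode (toy cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(password, message):
    """Turn `message` (bytes) into gibberish that depends on `password`."""
    salt, nonce = os.urandom(16), os.urandom(16)
    # Stretch the password into a 32-byte key.
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    stream = _keystream(key, nonce, len(message))
    gibberish = bytes(m ^ s for m, s in zip(message, stream))
    return salt + nonce + gibberish

def decrypt(password, blob):
    """Reverse `encrypt`; a wrong password just yields more gibberish."""
    salt, nonce, gibberish = blob[:16], blob[16:32], blob[32:]
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    stream = _keystream(key, nonce, len(gibberish))
    return bytes(g ^ s for g, s in zip(gibberish, stream))

secret = encrypt("correct horse", b"meet at noon")
print(secret[32:])                       # unreadable bytes, different each run
print(decrypt("correct horse", secret))  # b'meet at noon'
```

Only someone who knows the password can turn the bytes back into the message, which is why you have to share that password with your intended audience over a channel you trust.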
VPNs are a set of computers that pass along your computer’s requests to the internet as if they were their own. This causes the site you are visiting, say Amazon, to believe another computer (and thus another user) is accessing the site, unless you sign into your account. VPNs are also useful because they can let you access internet sites as if you were in a foreign country like Brazil or India, to see what media is showcased to its citizens. However, neither of these techniques will prevent Instagram from inferring that you follow self-helpers because you want to kick your drug habit.
To protect private information from being inferred from the data generated while you use various software and devices, you have to be more mindful of what you post and how you use such technology. There are also some tactics to use once you’re aware that data is about to be generated that can be connected to private information. Being aware of what private information is or may be connected to the data being generated is the first step. For instance, if you usually go out every day, especially on the weekends, and suddenly your device’s location is at home most of every day for a week, it’s easy to infer that you are sick or depressed. Or if you normally like spiritual posts but never like explicitly Christian posts, then it can be inferred you’re not Christian. Or if you comment on every video you see but don’t comment on videos about racial hate crimes, it can be inferred that you’re sympathetic to those crimes.
Mindful Tech Engagements
One thing to be mindful of is the amount of interaction you have with any technology. This includes not only how long, how often, and where you use the technology, but also how immersive the interaction is and how diverse the content you’re interacting with is. Are you interacting with content ranging from philosophy, stocks, food, and politics to religion, sensuality, art, and music? If so, there are a lot of contexts from which the technology owner can infer information about you and your ideas and beliefs.
In particular, be mindful of how immersive the technology is. Are you simply pressing a button when a video ends, or are you continuously moving the mouse, clicking, typing, looking (in the case of VR or eye tracking), and speaking? These highly immersive interactions are particularly sensitive, as this is the kind of environment psychologists use in their experiments to test cognition, reactions to certain material, and mental associations (just to name a few).
Another thing to be mindful of is what private information the technology owner would like to collect and what they can collect, are likely collecting, or are actually collecting. It’s particularly important to understand the types of data that are capable of being collected. Motion sensors such as gyroscopes, for example, can help indicate not only where you are in your home but also the direction and angle you’re at. Most people don’t realize some touch screens can detect how conductive your skin is, and skin conductivity can indicate whether you are under stress. Face tracking technology can even detect what part of the screen you’re looking at. And that’s just the explicit sensory data being collected. You should also be aware of what private information the technology owner (or their partners) would like to infer from that data for their own purposes (profit, influence, retweets, etc.).
Understanding what data may be collected, how it can be collected, and what information someone may attempt to infer from it will give you what you need to decide how and when to implement the following techniques to reduce the invasion of your privacy.
Altering Data For Preserving Privacy
One common method of preserving privacy is aggregation. In the data world, that means releasing only summaries of the data instead of each individual data point. From a user’s perspective, this could mean things like posting your thoughts on a topic (say, if you’re on Twitter) only at certain predetermined times: for instance, once when the topic is first brought up and once when the topic seems to be going away. This way you avoid tweeting every few minutes, reacting to every tweet about the topic, and giving away how slightly different forms of the topic would affect your opinion.
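In data terms, aggregation can be as simple as the Python sketch below. The reaction scores and the `aggregate` helper are invented for illustration. The point is that only the summary leaves your hands, not the minute-by-minute record.

```python
from statistics import mean

# Hypothetical raw data: one reaction score per tweet, logged every few minutes.
raw_reactions = [
    ("09:00", 0.9), ("09:05", 0.2), ("09:10", 0.7),
    ("09:15", 0.1), ("17:00", 0.6), ("17:05", 0.5),
]

def aggregate(records):
    """Release only a summary (count and average), not each data point."""
    scores = [score for _, score in records]
    return {"count": len(scores), "average": round(mean(scores), 2)}

print(aggregate(raw_reactions))  # {'count': 6, 'average': 0.5}
```

Anyone who sees only the summary learns that you reacted six times with a middling average, but not which specific tweet set you off at 9:00.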
Another technique is to either not give any information at all or give only one consistent response. As mentioned earlier, not giving any information can at times be impractical; however, giving a consistent response can be useful in certain contexts. An example of not giving any information is Pandora: instead of thumbing every song up or down, just keep different playlists. That way you don’t train their algorithms as much, yet you still get a diverse set of songs. An example of giving one consistent response is having a default choice in choose-your-own-adventure games and movies. With Netflix’s Bandersnatch, for instance, you could always choose the left option. Obviously that’s not the most fun way to experience the movie, but again, these are options.
A more sophisticated and risky technique is giving random responses, particularly when you know the response is valuable for understanding the rest of the data. This obviously requires insight into what is being inferred and what data is being used to infer it. It’s also risky because there are ways to detect false data points that don’t fit with the rest of the data.
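To see why random responses are risky, here is a small Python sketch of one standard outlier check, the modified z-score based on the median absolute deviation. I picked this method as an example; it is not the only way analysts catch odd data points. The reading times and the `flag_outliers` helper are hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Flag points far from the median, using the median absolute deviation."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Every typical value is identical, so anything different stands out.
        return [v for v in values if v != center]
    # 0.6745 rescales the deviation so the score is comparable to a z-score.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical minutes spent per article; 240 is a made-up "random" entry.
reading_minutes = [4, 5, 6, 5, 4, 6, 5, 240, 5, 4]
print(flag_outliers(reading_minutes))  # [240]
```

A random response that is wildly out of character gets flagged and thrown away, so to be effective the false data has to look plausible next to your real data.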
The last method is based on how many ways you can interact with the device or software. The idea is to submit information in a form that is more costly to analyze and store, or in a form that holds less information. Numeric data is the easiest for computers to analyze, followed by text, then audio, and then pictures and videos. That means that if you can choose how to interact with a device or software, using pictures and videos will force the organization analyzing your data to use more hardware for storing and analyzing it. That being said, pictures and videos also contain more information, such as body language, tone, where you focus your eyes, tempo, angle, and more. Although it may be difficult, if not impossible, to infer ALL of those bits of information NOW, it may become quite practical in the future.
Going forward, we have to continue the fight for privacy as individuals and as a community. We can start implementing protective measures by encouraging each other to change the culture toward higher expectations for digital privacy, and by holding each other accountable. Things like webcam and phone camera covers can help reduce the effectiveness of programs designed to hack into the camera and read your mood from your facial expressions. We can even expect any new contact to communicate using encryption and encrypted apps like WhatsApp, ProtonMail, or even Snapchat (I’ll admit I could be better at this one). Maybe give your friends small Faraday cages (boxes that cut off all incoming and outgoing electrical or radio signals) to hold their devices when they are not in use, to prevent cell-tower hacks, or during ‘privacy hours’ (time devoted to direct human-to-human interaction). But this all starts with assessing the amount and types of interactions we have with and through technology, to better understand the private information that is implicit in those interactions and thus can be inferred. That understanding is the key to really valuing our data more than data collectors monetarily value it.
Encrypt yo face:
Corporate and Inter-Governmental Spying:
EFF (2018) “Responsibility Deflected, the CLOUD Act Passes” By David Ruiz
Business Insider (2017) “Trump just killed Obama's internet-privacy rules — here's what that means for you” by Jeff Dunn
Replicating the Human Brain in Computers Based on Your Information:
TEDxSiliconAlley (2013) “How To Create A Mind” by Ray Kurzweil https://www.youtube.com/watch?v=RIkxVci-R4k
MIT Technology Review “With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart.” by Robert D. Hof
Algorithmic Law Enforcement and Justice System:
Time (2017) “The Police Are Using Computer Algorithms to Tell If You’re a Threat” by Andrew Guthrie Ferguson
Business Insider (2017) “The first bill to examine 'algorithmic bias' in government agencies has just passed in New York City” by Zoë Bernard https://www.businessinsider.com/algorithmic-bias-accountability-bill-passes-in-new-york-city-2017-12
ProPublica (2016) “Machine Bias” by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner
Sensors (Basel) (2012) “A Stress Sensor Based on Galvanic Skin Response (GSR) Controlled by ZigBee” by María Viqueira Villarejo, Begoña García Zapirain, and Amaia Méndez Zorrilla
Privacy Preserving Data Mining:
SpringerPlus (2015) “A comprehensive review on privacy preserving data mining” by Yousra Abdul Alsahib S. Aldeen, Mazleena Salleh, and Mohammad Abdur Razzaque