AI, or more accurately advanced analytics, has commonly been referred to as one of the hottest jobs around. With the craze around startups, which often use AI, and the crash of cryptocurrencies, one must wonder whether there is an AI bubble. Anecdotally, many professionals in adjacent industries (and sometimes in not-so-adjacent industries, people who simply happen to have access to massive amounts of data) are incorporating more and more advanced analytics into their portfolios to widen their job search and command a higher wage. Many startups also use ‘AI’ as a selling point to offer features previously thought unattainable. These startups are said to have almost as much funding as the wellness service industry is projected to have.
Many ‘AI’ startups focus on spreading advanced analytics (much as AWS focuses on computer storage and processing), on robotics, or on applying ‘AI’ to industry-specific problems. Most AI features are based on neural nets, which need correctly labeled data before they can learn to label new data. That means these companies have to spend a lot on hiring people to label data before their analysts can train the model on it, and that data might not even be tailored to a particular client’s business processes and customers. Also, since the technique is based on the human brain, it has a ceiling on how much it can advance; after all, our brains aren’t getting much more advanced anytime soon. With these caps on growth potential, you have to wonder how these companies can protect themselves from ‘analytics superstores’ and horizontal mergers.
These limitations have a few consequences that will lead to reduced usage and effectiveness and to a consolidation of providers. With so many companies specializing in making ‘AI’ easier to use and implement, it is only a matter of time before a select few companies corner the market and streamline the offering to its most efficient and profitable state (like Ford did with mass-produced cars, or Apple and Google with the iPhone and Android for cell phones). That will make it easier for potential clients to simply have their software engineers import the latest AWS package for their needs. AWS and Google are already at it, with cloud computing platforms that also specialize in deep learning, and so is Microsoft with its Azure platform.
Another issue is that these features are often not adaptable to changes in the business terrain, since they were trained on old data. Even with reinforcement learning capabilities, businesses still have to label the new data anyway. This makes the AI program less relevant (because humans still do the work alongside the ‘AI’) and makes it more apparent that it can’t address every technological problem one might have. In some problem areas, say new types of cars or fashion, having humans label data regularly will feel like an additional burden. That burden could be offloaded to Amazon Mechanical Turk (a service where humans label data en masse), which reinforces the issue mentioned earlier.
Finally, privacy and data-footprint awareness is coming, no matter how many decision makers in tech want to stick their heads in the sand. The metadata and aggregate information that are the primary revenue source for some companies will have to be reworked. Their clients (often other companies) will want to make sure their data isn’t being used to inform their competitors, and governments are realizing that metadata can reveal personal data. This means the ‘AI’ startup will either have to be more selective about its clients or put in additional work to disentangle the personal information in its models for clients it is competing over with Google, Microsoft, or AWS (which will also reduce the total value its end product can give the client).
These factors are why the ‘AI’ market has a cap (in addition to a lack of training, diversity, and innovation). ‘AI’, or advanced analytics, will continue to be the hammer of 21st-century technological innovation, and it is important for us as a community to advance and grow it. But you do that through better math education, interdisciplinary collaboration, and access to technology (not more data, spying, and generic packages). Now we just have to hope that all of the investments in this industry have been as carefully weighed and analyzed as everyday loan applications are, rather than judged simply by charisma and buzzwords.
Machine learning and AI are being implemented in every technology we use, and we use technology for everything. In some places, AI is even replacing human workers and interactions. So how are these tools, which are replacing so many human-to-human interactions, affecting our society and even people as individuals? If people are being replaced with AI, how will our culture change, and can AI give us the authenticity needed for challenging ideas and diversity of thought? Let’s take a look at some of the motives and effects of outsourcing authenticity.
Ever hear someone say “Why are you asking me? I’m not Google” and then turn around and complain that “everyone is always on their phone”? Welcome to self-replacement! The thing is, the interaction that would have happened actually served a societal function. Asking other people questions causes the respondent to exercise the logic, memories, and understanding associated with the question in order to explain it to you. It also helps preserve the variety of viewpoints and interpretations on more subjective topics, since you remember the respondent’s particular viewpoint from the conversation. Also, running to search engines with your questions makes it easier for tech companies like Google to understand what interests you and what you do or don’t know. That, in turn, will help make social chatbots a real contender for real-life conversations.
This is one example of the underlying ‘self’-replacement motive and effect of AI, and of the growing global monoculture. Part of the appeal of AI is that it has human-like (artificial) intelligence and thus can execute tasks more cheaply and/or more quickly than actual humans, hence self-replacement. Having one monoculture would mean we’d have more assumptions and biases that are harder to find and fewer sources of creativity. Spotting hidden assumptions and being creative are essential when trying to overcome any new challenge. To be clear, analytics (the use of math and statistics on data), which is the foundation of AI, has many benefits, such as optimization and effective representation. However, replacing and centralizing ideas, knowledge, and ‘human-like interactions’ in effect creates a monoculture.
One area of optimization that everyday people like is social interaction. Whether it’s creating elevator pitches, making small talk at office parties, or talking with that uncle at Thanksgiving, we all have some interactions that are rote, superficially engaging, and yet necessary. Many cultures try to regulate social interaction and reduce awkwardness via rules of etiquette, ice breakers, and codes of conduct. The aversion to awkwardness is so extreme that it’s even described as painful in phrases like ‘painfully awkward’. Such awkwardness can be created by silence, often due to a lack of mutual interests, or by broaching topics that have unpleasant implications. To the extent these interactions happen digitally, what better way to handle these necessary routine conversations than chatbots that can quickly import data from both people’s social media profiles? Obviously such a tool would save people time and effort. But it may also make it more difficult to develop the social skills and empathy needed in critical areas where AI can’t or won’t be used, such as conflict resolution.
Even if it’s not directly replacing interactions, we have to be aware of these new tools and their filtering effects on our knowledge and perspective. Whether it’s the use of AI to create movie scripts or to write news articles, we have to wonder if steps are being taken to ensure different viewpoints, rhetorical devices, subject matter, or even sides of an issue are represented (for instance, articles with an independent slant rather than a Republican or Democratic one). We already know about the problems media have in representing Americans accurately behind and in front of the camera. These filters on authenticity also dull our expectation of uniqueness. For example, coining a new word might seem profound after decades of people communicating with AI that has a limited vocabulary and a limited ability to infer the meaning of words.
Some potential solutions to these problems follow. Still go through the rote social interactions at least some of the time, and focus on the other person’s body language to make them more interesting. For AI that creates articles and movies, don’t have it replace authors; have it aid authors in producing the volume needed. Have a human review the final draft (as the Post has done with its AI system), and use different systems that will give different results. Even when it comes to telling others to ‘ask Google’, don’t push the asker away. Instead, take a minute to think about a possible answer, explain what you think might be the case, and then refer them to a search engine like DuckDuckGo.
The issue is that, given a profit or time motive, the tools to outsource work can and will leak into daily life (for instance, the DeepNude app), making our lives more robotic and less imaginative. So we must deliberately think about where organic authenticity exists and where it should be salvaged and protected. And remember: next time your friend asks you a question, be happy they’d rather talk to you than to the internet. Please comment below about where you feel authenticity should be protected.
Many times when people ask me what I do and I tell them, they say “Oh, you’re in IT.” I then give a contemplative pause and reply ‘Not really’. Saying an analyst or statistician is an IT employee is like saying an HR rep is in IT because they use a computer for forms. It is also like saying a communications expert is in IT because they put speeches on everyone’s computer.
IT is Information Technology. Oftentimes we think of computers as being IT (which they are), but so are cell phones, faxes, and printers. Again, these are tools everyone uses and IT professionals specialize in. IT professionals are usually the ones setting up those tools, maintaining them, or even tweaking them for new purposes.
Analytics (and thus the statistical and mathematical side of data science) is focused on discovering information by way of applied mathematical statistics. Thus, the analyst isn’t by default concerned with setting up and maintaining the technology that transmits information, like phones, faxes, and computers. Analysts may use any technology for data collection, and computers in particular for data processing. But this does not mean that an analyst MUST use computers for their analysis. In some situations, if the data is small enough or the analysis is simple, the processing can be done with pencil and paper. One example is determining how much material a business should reorder given inventory costs and historical demand (note: the end result of this analysis is a decision, not just a display of information).
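The reorder example can be made concrete with the textbook economic order quantity (EOQ) model. EOQ is one common way to do this, not necessarily the only one. It assumes steady annual demand D (estimated from historical demand), a fixed cost S per order, and a holding cost H per unit per year. It gives the order size Q* = sqrt(2DS/H). The sketch below uses hypothetical numbers, and the function name is illustrative.

```python
import math


def economic_order_quantity(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classic EOQ: the order size that minimizes total ordering plus holding cost.

    Assumes demand is steady over the year (for example, estimated from historical demand).
    """
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


# Hypothetical numbers: 1,200 units/year demand, $50 per order, $3 per unit per year to store.
q = economic_order_quantity(1200, 50, 3)
print(round(q))  # 200
```

By hand, this is 2 × 1,200 × 50 / 3 = 40,000, and the square root of 40,000 is 200 units per order. That is exactly the sort of calculation that fits on a sheet of paper.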
Although the techniques and data used by analysts generally require computers for speedy processing, this does not count as doing IT work. Analytics and applied statistics can reveal useful information and relationships between factors in many domains, and IT can be used to store data across many domains as well. Since each can impact the other, let’s discuss how they DO intersect.
Statistics can be applied to computer processes, such as determining which programs are likely to run at a certain time of day or which memory sections tend to be accessed together. Statistics can also be a part of IT, such as auto-sorting emails by expected importance or suggesting new programs based on your computer usage. IT, in turn, can impact statistical analysis by providing faster processors, more memory, or even access to data.
All in all, confusing a new profession with some of the tools used in that profession is understandable. Compounding this, more and more data scientists have an IT background rather than the statistics background they used to have, which contributes to the confusion. But ultimately, just remember this: analytics and science are based on analysis, and analysis can be done by hand (i.e., without technology, like when you did your math homework) or with a computer. IT is based on technology, and most of us don’t use tech to conduct analysis when we are on Netflix or surfing the web.
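As a rough illustration of the first case, the following is a minimal sketch that assumes a hypothetical log of (hour of day, program) launch events. It estimates how likely each program is to run at a given hour simply by counting. The function names and data are invented for illustration.

```python
from collections import Counter, defaultdict


def programs_by_hour(launch_log):
    """Estimate P(program | hour of day) from (hour, program) launch events."""
    counts = defaultdict(Counter)
    for hour, program in launch_log:
        counts[hour][program] += 1
    probabilities = {}
    for hour, counter in counts.items():
        total = sum(counter.values())
        probabilities[hour] = {prog: n / total for prog, n in counter.items()}
    return probabilities


def most_likely_program(probabilities, hour):
    """Return the program with the highest estimated probability at this hour."""
    options = probabilities[hour]
    return max(options, key=options.get)


# Hypothetical launch log: (hour of day, program name)
log = [(9, "email"), (9, "email"), (9, "spreadsheet"), (9, "email"),
       (20, "browser"), (20, "media_player"), (20, "browser")]
probs = programs_by_hour(log)
print(probs[9]["email"])               # 0.75
print(most_likely_program(probs, 20))  # browser
```

An operating system could, in principle, use estimates like these to preload a program shortly before it is usually opened.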
In the land of the free, why are all of our newsfeeds on any particular platform controlled by the same algorithm?
We should each determine what news, posts, images, or blogs (collectively referred to as content) are brought to us. This inability to do so isn’t just an annoyance, but also contributes to group-think, blind-sightedness of the masses, and manipulation by foreign or domestic entities. The influence of algorithms on our personal interactions and media intake can lead to improper manipulation by foreign powers or even domestic terrorist hackers, who “game” the system. Even when said system of influence is created by the governing body or platform creator, it oftentimes lacks the ability to properly represent the values (which may change constantly) of each individual customer or the capability to determine which pieces of information the user may consider relevant.
If these platforms really care about improving the quality of their users’ lives and leaving them with memorable, favorable AND beneficial experiences, then they need to include an option for users to customize how their newsfeed is aggregated. In this blog post, we'll look into how enabling users to determine their own information suggestion algorithm can reduce gaming of the system by bad actors, group-think, tribalism, and accidental information suppression by the moderator.
Let’s be clear. Many social media companies do allow any account to connect to their platform and run code that extracts information and posts content, though many people cannot code. These interfaces are extremely useful to programmers like myself. However, they don’t change the experience or newsfeed you see when you open the app while you’re out and about, or when you’re just using the platform for fun and friends. Still, letting each user program their feed within the app through a simpler mechanism seems entirely within the realm of possibility, since the platforms essentially offer something similar already. Also, many of these platforms already label posts that are displayed because they are ads, so enabling people to customize their feeds shouldn’t affect the ads that will be promoted regardless. Even if the platforms just use the user’s algorithm in combination with their primary algorithm, it will still help preserve intellectual diversity. In the worst-case scenario, most people use the same settings. Then, when it comes to the question of how newsfeeds and exposure are manipulated, the responsibility will ultimately rest with the collective users.
How does this impact us/U.S.
This issue is important because of the need for diversity: more specifically, diversity of thought, diversity of knowledge, and diversity of perspectives. Many people get information from these platforms, and that information shapes our thoughts and views on topics ranging from what clothes to wear to a party to what laws should be passed. Allowing each user to customize their feed will make the flow of information on these platforms more dynamic. It will be more difficult for foreign or domestic influencers to push narrow-minded propaganda, and easier for naturally occurring, nuanced, and dynamic information to spread. It will be an additional obstacle for bad actors to overcome. They will have to tailor content convincingly to varying personality types, and also shape it so that it gets picked up by many diverse newsfeed algorithms.
Not having this capability is how Facebook’s newsfeed came to promote the Ice Bucket Challenge over the Ferguson protests. That sort of disconnect is helping to increase division and blind-sightedness. Suppose people had had their own custom newsfeed algorithms within each social network. That would have increased the odds that the information showed up on at least one network member’s feed, and that this member spread it to their network.
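A back-of-the-envelope calculation shows why this matters. If one shared algorithm decides whether a story surfaces, the whole network sees it or misses it together. Now suppose each of n friends runs their own algorithm. As a simplifying assumption, each independently surfaces the story with probability p. Then the chance that at least one friend sees it is 1 - (1 - p)^n. Real feeds are correlated, so treat the numbers below (which are hypothetical) as illustrative only.

```python
def prob_at_least_one_sees(p_each: float, n_friends: int) -> float:
    """Chance that at least one of n friends sees an item, assuming each friend's
    own feed algorithm surfaces it independently with probability p_each."""
    return 1 - (1 - p_each) ** n_friends


# Hypothetical: a story has a 5% chance of surfacing in any given feed.
shared_algorithm = 0.05                                # one shared decision for the whole network
custom_algorithms = prob_at_least_one_sees(0.05, 50)  # 50 friends, independent feeds
print(round(custom_algorithms, 2))                    # 0.92
```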
Within each user’s network there is a certain level of “research diversity.” That is, the sort of information one person may look for is not the same sort of information another person may look for. When the different members of the user’s network do finally communicate, they share some of this information, since sharing information or disinformation is the fundamental purpose of communication. This essentially funnels any important or relevant information through the network to the person who is thought to perceive it as important or relevant. And during the process of being confronted by a “trusted” person with new, conflicting information, the user HAS TO engage their critical thinking skills (which helps them be resistant to propaganda) to do any or all of the following:
-Assess if the information is truly important or relevant to their point
-Logically reason if the new information truly supports or negates their previously held position
-Check for mitigating or dismissive factors
-Search for other, counterfactual information
-Finally accept or reject the information’s impact on their previously held beliefs
-Resolve any cognitive dissonance that may arise between their newly changed position and other previously held positions and/or beliefs
Yes, some of these steps may happen so quickly that it almost seems unconscious, but even in that instance they are still exercising those critical thinking skills. However, the important part is that this conflict is started by a trusted party, and thus will likely be seen as valuable enough to engage in the critical thinking tasks. The other (and maybe more important) part is that this can happen with positions on topics users are not as “dug in” on. In exercising their humility muscle, users will hopefully be more open to the idea of being wrong about the more “tribal” issues.
If the overall diversity of information being exposed to anyone in the user’s network is reduced, they all essentially end up viewing the same content on their individual feeds. This is not what makes a well-informed and educated populace. Studies have shown that critical thinking skills are what make disinformation and propaganda less convincing.
How can non-programmers program?
So of course we have to wonder how people could customize their newsfeed in a way that’s not as complex as using programming languages. It should still offer as much flexibility as possible in determining which features of an image, tweet, or post get promoted. We’ll go over some user interfaces (UIs), that is, ways users could interact with the code, in the next few paragraphs.
Surveys of information (supervised learning) vs likes, glance/views, and comments
The simplest way for the user and the platform to create custom algorithms would be surveys of what the user would like in their newsfeed. You may wonder how this differs from how many of the existing algorithms work. Well, many of those algorithms work by looking at what you like, view, and comment on. The issue with these metrics is that you may not ‘like’ some posts because they are bad news. You may not comment on some posts because they involve a controversial topic. None of these situations means the topic isn’t newsworthy. Even just using what you viewed can be ineffective. Sometimes you view something because of a misleading or unclear headline, for fun instead of for information, or even because your mouse slipped.
A survey, by contrast, could ask the user what sorts of topics they do or don’t want flooding their feed. It could even ask the user about content they have engaged with in the past, and whether they want it used to decide if other content is worth putting on their feed. Another option is a comparative test: asking the user a series of questions comparing the newsfeed value of one piece of content to another.
Having an explicit survey of what users want or don’t want in their newsfeed will give the platform guidelines for discriminating between content. If done regularly, it will also enable the platform’s AI to automatically search its database for content you would likely approve of being exposed to. That is different from simply showing you whatever will get you to engage more with the platform.
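The following is a minimal sketch of the comparative-test idea. It assumes the survey answers arrive as hypothetical (more newsworthy topic, less newsworthy topic) pairs. Each topic's weight is its win rate, and posts are then ranked by those weights rather than by likes. The field names, topics, and functions are invented for illustration.

```python
from collections import Counter


def weights_from_comparisons(comparisons):
    """Turn 'A is more newsworthy than B' survey answers into per-topic win rates."""
    wins, appearances = Counter(), Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {topic: wins[topic] / appearances[topic] for topic in appearances}


def rank_feed(posts, topic_weights):
    """Order posts by the user's stated topic preferences; the 'likes' field is ignored."""
    return sorted(posts, key=lambda post: topic_weights.get(post["topic"], 0.0), reverse=True)


# Hypothetical comparative-survey answers: (more newsworthy, less newsworthy)
answers = [("local_news", "celebrity"), ("local_news", "sports"), ("sports", "celebrity")]
weights = weights_from_comparisons(answers)

posts = [
    {"id": 1, "topic": "celebrity", "likes": 9000},
    {"id": 2, "topic": "local_news", "likes": 12},
    {"id": 3, "topic": "sports", "likes": 300},
]
print([p["id"] for p in rank_feed(posts, weights)])  # [2, 3, 1]
```

A plain win rate is the simplest scoring choice. A platform might prefer a proper pairwise-comparison model such as Bradley-Terry, but the point is the same: the ordering comes from what the user said they value, not from what drew the most clicks.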
Block Models to Build Feeds
One way non-programmers have programmed in the past is by sequentially connecting blocks on a screen. Each block corresponds to a particular function (be it a type of filter, comparison, feature type, outlier type, etc.), and the connections between them determine the order in which the functions are executed.
For instance, a block could do a transformation such as replacing the number of views a post has with its place by views (1st, 2nd, 3rd, 4th, 5th, etc.). The user could then set a filter that removes posts ranked below 1,000th place (say, out of a total of 5,000 posts). This avoids using the number of views the thousandth-place post had as the cut-off.
Thus, each user could filter at different steps based on various features, and transform these features (for instance, a feature could be the number of times content has the word “bully” and a transformation would be if that count is above or below average) or even combine features to then filter or sort the feed results. This will give the users a way to create and execute flexible code in a way that capitalizes on visual representation and intuition.
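Blocks like these can be sketched as small functions that each take a list of posts and return a new list, chained in the order the user connected them. This is a minimal illustration under that assumption. The block names, post fields, and sample posts are hypothetical, not any platform’s actual interface.

```python
from statistics import mean


def rank_by_views(posts):
    """Transform block: record each post's place by views (1 = most viewed)."""
    ordered = sorted(posts, key=lambda p: p["views"], reverse=True)
    return [dict(p, view_rank=place) for place, p in enumerate(ordered, start=1)]


def keep_top_places(cutoff):
    """Filter block: keep posts ranked at or above the cutoff place."""
    def block(posts):
        return [p for p in posts if p["view_rank"] <= cutoff]
    return block


def word_above_average(word):
    """Transform block: flag posts that use `word` more often than the average post."""
    def block(posts):
        counts = [p["text"].lower().split().count(word) for p in posts]
        average = mean(counts) if counts else 0
        return [dict(p, **{word + "_above_avg": c > average}) for p, c in zip(posts, counts)]
    return block


def run_blocks(posts, blocks):
    """Execute the connected blocks in order, passing each block's output to the next."""
    for block in blocks:
        posts = block(posts)
    return posts


# Hypothetical posts and a three-block feed
posts = [
    {"id": 1, "views": 10, "text": "a bully at school"},
    {"id": 2, "views": 500, "text": "bully bully report"},
    {"id": 3, "views": 50, "text": "cute cat photo"},
    {"id": 4, "views": 300, "text": "weekend plans"},
]
feed = run_blocks(posts, [rank_by_views, keep_top_places(2), word_above_average("bully")])
print([(p["id"], p["bully_above_avg"]) for p in feed])  # [(2, True), (4, False)]
```

With 5,000 posts, the thousandth-place filter from the earlier example would be `keep_top_places(1000)`. Block order matters: computing the "bully" average before or after the filter gives different flags, and that is exactly the kind of choice the connections between blocks express.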
Ask your computer to make the algorithm
The simplest way for a user to customize their feed would be to ask their computer to filter their feed based on some criteria. Similar to speaking to Cortana, Siri, or Alexa, there are frameworks and techniques that enable a voice interface to databases. Users will have to understand which features and other functions (similar to the blocks’ functions) can be used when speaking to their computer. These programs are usually a bit computationally intensive, so it may be impractical for a social media platform to implement and maintain such a feature. However, that doesn’t mean we should give up on this idea. The platform could have users download the program to run on their own computer. Once the program outputs a file containing the newsfeed algorithm, they can simply upload the file to the platform like they do pictures or videos.
Swapping feed algorithms
This last one is a bit meta. But once some people have created feed algorithms, they could send them to others or upload them for sharing. This would create a sort of coding competition even among the users, and it would encourage new and different techniques to produce new and different information streams. These feed algorithms would be the new app store on these platforms.
For these companies, having users generate their own newsfeed algorithm can help create a platform marketplace of newsfeed apps. It will also increase regular discussions on knowledge, data, and their relative value in the population, pushing our everyday citizens to the forefront of the information age. Some of these algorithms could even be repurposed for the platform.
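One way such shareable feed algorithms could work is as a plain-data "recipe" that lists which blocks to run and with what settings. Arbitrary code would not be shared. The following is a minimal sketch under that assumption. The JSON layout, block names, and sample posts are all hypothetical.

```python
import json


# Hypothetical block library a platform might expose; these names are illustrative.
def min_views(posts, threshold):
    return [p for p in posts if p["views"] >= threshold]


def exclude_topic(posts, topic):
    return [p for p in posts if p["topic"] != topic]


def sort_by(posts, field, descending=True):
    return sorted(posts, key=lambda p: p[field], reverse=descending)


BLOCKS = {"min_views": min_views, "exclude_topic": exclude_topic, "sort_by": sort_by}


def export_recipe(steps):
    """Save a feed algorithm as plain JSON text that can be uploaded or sent to a friend."""
    return json.dumps({"version": 1, "steps": steps})


def apply_recipe(recipe_json, posts):
    """Run a shared recipe; only blocks from the known library are allowed."""
    recipe = json.loads(recipe_json)
    for step in recipe["steps"]:
        block = BLOCKS.get(step["block"])
        if block is None:
            raise ValueError(f"unknown block: {step['block']}")
        posts = block(posts, **step.get("params", {}))
    return posts


shared = export_recipe([
    {"block": "exclude_topic", "params": {"topic": "celebrity"}},
    {"block": "min_views", "params": {"threshold": 100}},
    {"block": "sort_by", "params": {"field": "views"}},
])
posts = [
    {"id": 1, "topic": "celebrity", "views": 900},
    {"id": 2, "topic": "science", "views": 150},
    {"id": 3, "topic": "local", "views": 40},
    {"id": 4, "topic": "local", "views": 400},
]
print([p["id"] for p in apply_recipe(shared, posts)])  # [4, 2]
```

Sharing a declarative recipe rather than a program is a deliberate design choice in this sketch. It would let a platform run algorithms written by strangers without executing untrusted code, and rejecting unknown block names keeps the marketplace limited to functions the platform has vetted.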
Our feeds are a new part of life and should be respected for the impact they CAN and DO have on our lives. Now is the time to realize the feeds do more than simply drive engagement and ads. What we can do as consumers is talk about these concerns on social media so the platform owners hear us, and support companies that do enable user-customized newsfeeds. As consumers in any nation, we have to realize how these feeds are shaping our minds and relationships, which are the very fabric of civilization and even our humanity.
With all of the hacks of governments and corporations, we as a society need some defensive techniques for preserving our data, especially when it’s used to create AI. We all know today’s technology knows a lot of information about us, maybe more than we know about ourselves. Suppose you want to retain your free will and not let tech companies (and non-corporate actors) manipulate you into being “good customers” (or any other “good role” they see fit for you). Then you have to learn what information you are actually giving up, what AI learns from it, how influence campaigns may be implemented, and how your network may be used against you. Here we will go over techniques for reducing the information they can learn, understanding the potential flow of your data among these organizations, and attractive attack angles for manipulation campaigns.
Since we’re going to talk a lot about data, what is data? Data is, according to dictionary.com, ‘individual facts, statistics, or items of information’. And information is ‘knowledge gained through study, communication, research, instruction, etc.’. For clarity here we will use data to mean digital representation of events and attributes. Information will be used to mean useful knowledge that may have been derived from data. So, what comes to mind when you think about your data?
Some sources of data/information you are creating (whether you realize it or not) are:
- location data that can be extracted from your cell tower connections (which reveals things like who you hang out with, which stores you frequent, etc.)
- attention span (which topics you regularly comment on, or how long you watch a video)
- areas of interest or lack of knowledge (Google searches give these sorts of insights, as do the TV shows you watch)
- user behavior
- political leanings (from your liked posts or the talk shows you listen to)
- your mood (from your posts and the posts you’re liking; if you don’t have a webcam cover, your face may be scanned to detect your mood as well)
- which friends and family you are or aren’t close to (social media posts and likes have been shown to give these types of insight)

That’s just to name a few. With such sensitive information hidden in your data, we must be concerned with how it’s being protected and used.
Since data and technology are being used for so many different reasons and in so many different contexts, you might want to consider applying these techniques to all of your technological interactions. Obviously, it’s impractical to take into account every consideration listed here each and every second you interact with technology. Instead, use this information to create engagement strategies for the devices and software you use. That is, think about how you could alter your interactions with the devices and software you use regularly to preserve your privacy and cognitive autonomy. You use each piece of software for different purposes, and each collects different data for different purposes, so you need a distinct approach for each. Think of it as creating your own ‘algorithm’, based on the information you gain here, for using software to meet your needs in a way that reduces ‘off-the-cuff’ information sharing. For example, say the #mood hashtag popped up and you felt you just had to join in and document all the different moods you do and don’t experience and their contexts. A mindful engagement strategy of not revealing emotional triggers may have persuaded you not to participate in that hashtag.
Who Wants Your Data And Why?
When it comes to manipulation and invasion of privacy, the first question we must ask is: what type of info do you think is likely being shared about you? Data has value, but certain types of data are more valuable than others. So take a moment to think about which organizations (companies, employers, governments, or even criminal syndicates) are interested in learning about you and what they would like to learn. Then think about how the technology you engage with may have data points related to that information.
Now that you’ve had time to come up with a few options on your own (or with a group), let’s discuss a few examples. Some organizations that might be interested in your information are insurance companies, data-marts (we’ll discuss data-marts later), ISPs (internet service providers), retailers, social media companies, and political campaigns. These organizations often collect data from you and your devices, directly or indirectly through data-marts. Data-marts collect data on individuals all over the internet to sell to interested parties. This data can be public data you posted, data bought from organizations, data that can be extracted from tech such as IP addresses or cell towers, and information that is inferred (how they infer information is discussed in the “Giving More Information Than You Thought” section). But that raises the question: why might they, or you, share your data?
They may want your information to sell you ads, to tailor insurance costs to your lifestyle, or simply to better understand their customer base and how to tailor their services to it. If you really are as ‘trendy’ and ahead of the curve as you think you are, investment firms may want your data to predict stock market trends. Governments may want your information to determine how good a citizen they consider you to be and whether you have any connections to people they consider enemies. Organizations may also use your information when hiring, to determine if you fit the profile of their high performers.
So what can these organizations do with all this information? Employers can use it to decide whether to hire you, law enforcement can use it for investigations or threat scores, and it can even affect insurance quotes. This, in combination with the software on your devices, can lead to even more manipulative techniques like:
The Circle of Data
How does data flow between organizations to enable such automated manipulation and information extraction? Many organizations scan the internet themselves for public data. They may also buy from data-marts or give your devices ‘cookies’ when you visit their site to track you across the internet and across devices. Your personal tech can be hacked and have a hot mic or active camera recording information. When companies have their (and our) data hacked, this information can end up in the underground market and be bought by criminal organizations, or possibly make its way to legitimate organizations through reselling. Two organizations that wish to swap data may be complementary in nature (such as an advertiser and a retailer); if so, they may simply ‘share’ data as ‘third parties’. Alternatively, platforms like Twitter may already have a program set up (called an API) that allows people to connect to their databases, which, in the Twitter example, is why it’s important to know your privacy settings.
Throughout the process of surfing, posting, or searching, your device connects with multiple organizations:
- the operating system of the device, which may send data to its parent company
- the local cell tower, which may be compromised by law enforcement or criminals
- the ISP (such as AT&T or Comcast) that connects you with the site
- the DNR, which holds the site
- finally, the organization that owns the site

These organizations could have a copy of the information sent over the internet connection (especially if it’s unencrypted). Even if you have multiple online personas, there are efforts underway to coalesce those personas into one identity. This shows you not only have to be aware of what you post, but also of what other devices, email accounts, and locations are connected to what you’re posting.
Decoupling data can be a difficult task. Still, you may feel it is important enough; for instance, maybe you work at Google and don’t want your employer to know all of your controversial views or religious beliefs. If so, it is essential to use different devices for different internet activity at different locations (but maybe at the same time), with different IDs and passwords. Also make sure the devices do not connect to each other or to the same third party via text, email, Wi-Fi, cell tower, Facebook account, etc. This way, the data generated during your internet activities is separated and siloed (insulated from connections) across the various sites and companies you use, decreasing the likelihood of that data being connected in any meaningful way to your other data. Even subscribing to the same sites can give away your identity if the sites are very unusual, or if you subscribe to so many sites that together they make up a digital fingerprint. After all, everyone has to do brand management, and some personal data may not be appropriate for employers, retailers, insurance companies, etc.
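One common way two complementary organizations can match their records is to normalize and hash a shared identifier such as an email address, then join on the hash. The following sketch uses invented records and hypothetical field names. It shows why hashing does not keep the data apart: the hash itself becomes the link.

```python
import hashlib


def hashed_key(email: str) -> str:
    """Normalize an email and hash it so records can be matched without swapping raw addresses."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def join_shared_data(retailer_rows, advertiser_rows):
    """Combine two organizations' records about the same person by matching hashed emails."""
    ads_by_key = {hashed_key(row["email"]): row for row in advertiser_rows}
    merged = []
    for row in retailer_rows:
        match = ads_by_key.get(hashed_key(row["email"]))
        if match is not None:
            merged.append({"purchases": row["purchases"], "ads_clicked": match["ads_clicked"]})
    return merged


# Hypothetical records held by two different organizations
retailer = [{"email": "Pat@Example.com", "purchases": ["running shoes"]},
            {"email": "sam@example.com", "purchases": ["novel"]}]
advertiser = [{"email": " pat@example.com", "ads_clicked": ["marathon signup"]}]
print(join_shared_data(retailer, advertiser))
# [{'purchases': ['running shoes'], 'ads_clicked': ['marathon signup']}]
```

Normalizing before hashing is what lets "Pat@Example.com" and " pat@example.com" land on the same key, so small differences in how you type your address do not keep your profiles separate.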
Giving More Information Than You Thought
As you can tell, some of the information about you isn’t explicitly given but is inferred from the data they do have on you. To infer it, they use statistics, machine learning techniques, and other data that has been gathered.
One example of other data used to infer information about you is store locations. If I have your location via cell towers and see that you’re at the same location as a Barnes&Noble (the store’s location is the ‘other data’) every Sunday afternoon, then I can infer that you go to Barnes&Noble to read every Sunday. Or if you’re there for 4 hours or more every few days, I may infer that you work there.
However, when it comes to statistical inference (which includes machine learning), the exact explanation can be quite complex, so for simplicity’s sake I’ll explain it as follows:
Analysts and machines learn about you through your data by looking at the frequency or presence of events or qualities under the various circumstances or contexts they are testing for (i.e., do you read political articles or fashion articles faster, and do people similar to you like Instagram posts with cats or with dogs?). A variety of data gives them more circumstances and contexts in which to understand your frequencies and attributes.
One way these programs can infer information about you is by finding others who behave like you or have similar characteristics, but who have more information available about the private aspects of their lives. It’s basically the ‘birds of a feather’ idea programmed into computers using statistics.
They can also look at your behavior (if there is enough of it) across different contexts such as locations, times, and topics to figure out which factors influence the behavior in question. This is why having a variety of data and factors is so important for AI. After all, if I type ‘1’ fifty times and send that to an analyst, there’s not much they can learn from those fifty ones.
Another way they infer information about you is by determining what hidden characteristic best explains your behavior. A classic example: if we know you eat ice cream pretty regularly, we can assume you eat less ice cream when it’s cold outside, and that if it’s cold one day, it’s more likely to be cold the next day. Therefore, any time you go several consecutive days without eating ice cream, we can infer those days were cold. The temperature of the day acts as a hidden signal in your data. Just as this example learns about the unknown temperature of the day through other data, they will use your data to infer hidden signals or attributes about you.
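To make the Barnes&Noble example concrete, here is a minimal Python sketch of rule-based inference from location data. It assumes the visits have already been matched against a public list of store locations. The `Visit` record and the thresholds (3 Sundays, 4-hour stays, 3 long stays) are hypothetical choices for illustration, not what any real company is known to use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    """One stay at a place, e.g. reconstructed from cell-tower pings (hypothetical format)."""
    weekday: str  # "Mon" ... "Sun"
    place: str    # matched against public store locations (the 'other data')
    hours: float  # how long the device stayed there


def infer_habits(visits, min_sundays=3, work_hours=4.0, min_long_stays=3):
    """Turn raw visits into simple inferences; all thresholds are illustrative assumptions."""
    sundays = Counter(v.place for v in visits if v.weekday == "Sun")
    long_stays = Counter(v.place for v in visits if v.hours >= work_hours)
    inferences = []
    for place, n in long_stays.items():
        if n >= min_long_stays:
            inferences.append(f"probably works at {place}")
    for place, n in sundays.items():
        # A Counter returns 0 for places never seen in long_stays.
        if n >= min_sundays and long_stays[place] < min_long_stays:
            inferences.append(f"goes to {place} every Sunday")
    return inferences


visits = [Visit("Sun", "Barnes&Noble", 2.0) for _ in range(4)]
visits += [Visit(day, "Cafe", 8.0) for day in ("Mon", "Wed", "Fri")]
print(infer_habits(visits))
# ['probably works at Cafe', 'goes to Barnes&Noble every Sunday']
```

Nothing here is sophisticated. The power comes entirely from joining your location trail with someone else's public data.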
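The ‘birds of a feather’ idea is essentially nearest-neighbor inference. The Python sketch below assumes each user is described by a few hypothetical numeric features (say, weekly hours spent on sports, cooking, and fitness content), and that some users have a known private attribute. It guesses that attribute for a new user by majority vote among the k most similar users. Real systems use far more features and more sophisticated models.

```python
import math
from collections import Counter


def knn_infer(target, labeled_users, k=3):
    """Guess a private attribute for `target` by majority vote of its k nearest neighbors.

    target:        tuple of numeric features (hypothetical, e.g. hours/week per topic)
    labeled_users: list of (features, attribute) pairs where the attribute is known
    """
    nearest = sorted(labeled_users, key=lambda user: math.dist(target, user[0]))[:k]
    votes = Counter(attribute for _, attribute in nearest)
    return votes.most_common(1)[0][0]


# Features: (sports, cooking, fitness) hours per week; made-up numbers.
labeled = [
    ((5, 0, 1), "not dieting"),
    ((4, 1, 0), "not dieting"),
    ((0, 5, 4), "dieting"),
    ((1, 4, 5), "dieting"),
    ((0, 6, 5), "dieting"),
]
print(knn_infer((1, 5, 5), labeled))  # dieting
```

The target user never said anything about dieting. The label comes entirely from the people whose behavior looks like theirs.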
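The ice cream example is the textbook case for a hidden Markov model. The Viterbi algorithm finds the most likely sequence of hidden states (here, cold or warm days) behind the observations. The probabilities below are made-up assumptions that encode the paragraph's reasoning: ice cream is likelier on warm days, and the weather tends to persist from one day to the next.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for the observations."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {
            s: max(
                (best[prev][0] * trans_p[prev][s] * emit_p[s][obs], best[prev][1] + [s])
                for prev in states
            )
            for s in states
        }
    return max(best.values())[1]


states = ("cold", "warm")
start_p = {"cold": 0.5, "warm": 0.5}
trans_p = {"cold": {"cold": 0.7, "warm": 0.3}, "warm": {"cold": 0.3, "warm": 0.7}}
emit_p = {"cold": {"ice cream": 0.2, "none": 0.8}, "warm": {"ice cream": 0.9, "none": 0.1}}

days = ["ice cream", "ice cream", "none", "none", "ice cream"]
print(viterbi(days, states, start_p, trans_p, emit_p))
# ['warm', 'warm', 'cold', 'cold', 'warm']
```

The two consecutive days without ice cream come out as cold days, even though temperature never appears in the data.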
Tech To Protect
With all these methods of gaining information about you, sometimes even more than you explicitly gave, how do you protect it? The simplest and most impractical way is to stop using any technology that has a CPU, memory, or connections to other devices, AND to avoid communicating with or being around people who do. Again, this is the most impractical option. The more practical options range from technological evasion to changing how you interact with such devices and software.
One technology you can use to help protect your information is encryption. Encryption translates your data into unreadable gibberish based on the password (key) you give the program. Once the data is sent to your intended audience, they can use a password you provided them to translate the gibberish back into readable information. This is definitely useful when passing information directly between peers. However, encryption alone can’t hide your location or the websites you visit from your ISP, or your network location from the sites you visit; for that you need a VPN (virtual private network).
A VPN is a set of computers that pass along your computer’s requests to the internet as if they were their own. This causes the site you are visiting, say Amazon, to believe another computer (and thus another user) is accessing it, unless you sign into your account. VPNs are also useful because they let you access internet sites as if you were in a foreign country like Brazil or India, so you can see what media is showcased to their citizens. However, neither of these techniques will prevent Instagram from inferring that you follow self-help accounts because you want to kick a drug habit.
To keep private information from being inferred from the data generated while you use various software and devices, you have to be more mindful of what you post and how you use such technology. There are also tactics you can use once you’re aware that data is about to be generated that could be connected to private information. Being aware of what private information is, or may be, connected to the data being generated is the first step. For instance, if you usually go out every day, especially on weekends, and suddenly your device’s location is at home most of every day for a week, it’s easy to infer that you are sick or depressed. Or if you normally like spiritual posts but never like explicitly Christian posts, it can be inferred that you’re not Christian; and if you comment on every video you see but don’t comment on videos about racial hate crimes, it can be inferred that you’re sympathetic to those crimes.
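To illustrate ‘password in, gibberish out, same password back’, here is a deliberately simple Python sketch that uses only the standard library. The password is stretched into a key with PBKDF2, and the data is XORed with a keystream generated by HMAC-SHA256 in a counter-mode style. It is a toy for explanation only. It has no integrity check, so a wrong password silently yields garbage, and the iteration count is an arbitrary assumption. Real apps should rely on vetted, authenticated ciphers such as AES-GCM from an established library.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudo-random bytes from HMAC-SHA256(key, nonce || counter), one block per counter value."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _derive_key(password: str, salt: bytes) -> bytes:
    # Stretch the password into a 32-byte key (iteration count is an illustrative choice).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def encrypt(plaintext: bytes, password: str) -> bytes:
    salt, nonce = os.urandom(16), os.urandom(16)
    key = _derive_key(password, salt)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return salt + nonce + body  # salt and nonce are not secret; they travel with the message


def decrypt(blob: bytes, password: str) -> bytes:
    salt, nonce, body = blob[:16], blob[16:32], blob[32:]
    key = _derive_key(password, salt)
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


blob = encrypt(b"meet at noon", "correct horse")
print(blob[32:])                       # unreadable bytes, different on every run
print(decrypt(blob, "correct horse"))  # b'meet at noon'
```

Because XOR undoes itself, decryption is the same operation as encryption once both sides derive the same key from the same password.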
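The ‘suddenly home all week’ inference needs nothing more sophisticated than comparing one week against a personal baseline. This Python sketch assumes hypothetical daily hours-at-home figures and uses a crude z-score rule. The two-standard-deviation threshold is an illustrative assumption.

```python
from statistics import mean, stdev


def unusual_week(baseline_days, this_week, threshold=2.0):
    """Flag a week whose average hours at home sit far above the person's usual level.

    baseline_days: daily hours at home over previous weeks (hypothetical data)
    threshold:     how many standard deviations counts as 'unusual' (an assumption)
    """
    mu, sigma = mean(baseline_days), stdev(baseline_days)
    z = (mean(this_week) - mu) / sigma
    return z > threshold, round(z, 1)


baseline = [10, 12, 9, 11, 14, 8, 10, 12, 11, 13]  # usually out for much of the day
sick_week = [22, 23, 21, 24, 22, 23, 22]            # suddenly home almost all day
print(unusual_week(baseline, sick_week))            # (True, 6.3)
```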
Mindful Tech Engagements
One thing to be mindful of is the amount of interaction you have with any technology. This includes not only how long, how often, and where you use the technology, but also how immersive the interaction is and how diverse the content you’re interacting with is. If you are interacting with content ranging from philosophy, stocks, food, and politics to religion, sensuality, art, and music, there are a lot of contexts for the technology owner to infer information about you and your ideas and beliefs.
In particular, be mindful of how immersive the technology is. Are you simply pressing a button when a video ends, or are you continuously moving the mouse, clicking, typing, looking (in the case of VR or eye tracking), and speaking? These highly immersive interactions are particularly sensitive, as this is the kind of environment psychologists use in their experiments to test cognition, reactions to certain material, and mental associations (just to name a few).
Another thing to be mindful of is what private information the technology owner would like to collect and can collect, is likely collecting, or is actually collecting. In particular, you need to understand the types of data that can be collected. For example, a phone’s gyroscope can report not only that you’re moving but also the direction and angle at which you’re holding the device. Most people don’t realize some touch screens can detect how conductive your skin is, and skin conductivity can indicate whether you are under stress. Face tracking technology can even detect which part of the screen you’re looking at. That’s just the explicit sensory data being collected. You should also be aware of what private information the technology owner (or their partners) would like to infer from that data for their purposes (profit, influence, retweets, etc.).
Understanding what data may be collected, how it can be collected, and what information someone may try to infer from it will help you decide how and when to apply the following techniques to reduce the invasion of your privacy.
Altering Data For Preserving Privacy
One common method to preserve privacy is aggregation. In the data world, that means releasing only summaries of the data instead of each individual data point. From a user’s perspective, this could mean posting your thoughts on a topic (say, if you’re on Twitter) only at certain predetermined times: for instance, once when the topic is first brought up and once when it seems to be going away. This way you avoid tweeting every few minutes, which would give away how slightly different forms of the topic affect your opinion, along with your reaction to every tweet about it.
Another technique is to either give no information at all or give only one consistent response. As mentioned earlier, giving no information can at times be impractical; however, giving a consistent response can be useful in certain contexts. An example of giving no information is on Pandora: instead of giving every song a thumbs up or down, just have different playlists, so you don’t train their algorithms as much but still get a diverse set of songs. An example of giving one consistent response is having a default choice in choose-your-own-adventure games and movies. With Netflix’s Bandersnatch, for example, you could always choose the left option. Obviously that’s not the most fun way to experience the movie, but again, these are options.
A more sophisticated and risky technique is giving random responses, particularly when you know a response is valuable for understanding the rest of the data. This obviously requires insight into what is being inferred and what data is being used to infer it. It’s also risky because there are ways to detect false data points that don’t fit with the rest of the data.
The last method depends on how many ways you can interact with the device or software: submit information in a form that is more costly to analyze and store, or in a form that holds less information. Numeric data is the easiest for computers to analyze, followed by text, then audio, and then pictures and videos. That means if you can choose how to interact with a device or software, pictures and videos will force the organization analyzing your data to use more hardware to store and analyze it. That said, pictures and videos also contain more information, such as body language, tone, where you focus your eyes, tempo, angle, and more. Although it may be difficult, if not impossible, to infer ALL of that information NOW, it may become quite practical in the future.
Going forward, we have to continue the fight for privacy as individuals and as a community. We can start implementing protective measures by encouraging each other to change the culture toward higher expectations for digital privacy, and by holding each other accountable. Things like webcam and phone camera covers can reduce the effectiveness of programs designed to hack into the camera and read your mood from your facial expressions. We can even expect any new contact to communicate using encryption and encrypted apps like WhatsApp, ProtonMail, or even Snapchat (I’ll admit I could be better at this one). Maybe give your friends small Faraday cages (enclosures that block incoming and outgoing electrical or radio signals) to hold their devices when they are not in use, to prevent cell-tower hacks, or during ‘privacy hours’ (time devoted to direct human-to-human interaction). But this all starts with assessing the amount and types of interactions we have with and through technology, so we better understand the private information that is implicit in those interactions and can therefore be inferred. That understanding is the key to valuing our data more than data collectors value it monetarily.
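On the data side, aggregation means publishing group summaries instead of rows. Here is a minimal Python sketch, assuming hypothetical records with a group field and a numeric value. Groups smaller than a threshold are suppressed, because a ‘summary’ of one or two people is nearly their raw data. The threshold of 5 is an illustrative choice.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, group_key, value_key, min_group_size=5):
    """Release only per-group counts and means, suppressing groups that are too small."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: {"count": len(values), "mean": round(mean(values), 2)}
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


records = [{"zip": "A", "age": a} for a in (30, 32, 34, 36, 38)]
records += [{"zip": "B", "age": a} for a in (50, 60)]
print(aggregate(records, "zip", "age"))  # only group A is released
```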
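One way to see why a consistent response leaks less is Shannon entropy, which measures how much variety, and so potential information, a stream of responses carries. This Python sketch is only an illustration. A viewer who always picks ‘left’ produces zero bits per choice, while someone who splits evenly between two options produces one bit per choice. Entropy measures variability, not exactly what an observer learns about you, so treat it as a rough proxy.

```python
import math
from collections import Counter


def entropy_bits(responses):
    """Shannon entropy, in bits per response, of the observed responses."""
    counts = Counter(responses)
    n = len(responses)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


always_left = ["left"] * 8
mixed = ["left", "right"] * 4
print(entropy_bits(always_left), entropy_bits(mixed))  # 0.0 1.0
```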
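The statistical version of this idea is the classic randomized response technique. Each answer is replaced by a coin flip with some probability, so no single answer can be trusted. However, an analyst who knows that probability can still estimate the overall rate, so random noise protects individuals more than it protects aggregate patterns. In this Python sketch, the 50% honesty rate and the 30% true rate are assumptions for illustration.

```python
import random


def randomized_answer(truth, p_honest, rng):
    """With probability p_honest answer truthfully; otherwise answer with a fair coin flip."""
    if rng.random() < p_honest:
        return truth
    return rng.random() < 0.5


def estimate_rate(answers, p_honest):
    """Undo the noise: observed = p*rate + (1 - p)/2, so rate = (observed - (1 - p)/2) / p."""
    observed = sum(answers) / len(answers)
    return (observed - (1 - p_honest) / 2) / p_honest


rng = random.Random(42)
truths = [i < 30_000 for i in range(100_000)]  # 30% of people truly have the trait
answers = [randomized_answer(t, 0.5, rng) for t in truths]
print(round(estimate_rate(answers, 0.5), 2))   # close to 0.3
```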
Encrypt yo face:
Corporate and Inter-Governmental Spying:
EFF (2018) “Responsibility Deflected, the CLOUD Act Passes” By David Ruiz
Business Insider (2017) “Trump just killed Obama's internet-privacy rules — here's what that means for you” by Jeff Dunn
Replicating the Human Brain in Computers Based on Your Information:
TEDxSiliconAlley (2013) “How To Create A Mind” by Ray Kurzweil https://www.youtube.com/watch?v=RIkxVci-R4k
MIT Technology Review “With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart.” by Robert D. Hof
Algorithmic Law enforcement and justice system:
Time (2017) “The Police Are Using Computer Algorithms to Tell If You’re a Threat” by Andrew Guthrie Ferguson
Business Insider (2017) “The first bill to examine 'algorithmic bias' in government agencies has just passed in New York City” by Zoë Bernard https://www.businessinsider.com/algorithmic-bias-accountability-bill-passes-in-new-york-city-2017-12
ProPublica (2016) “Machine Bias” by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner
Sensors (Basel) (2012) “A Stress Sensor Based on Galvanic Skin Response (GSR) Controlled by ZigBee” by María Viqueira Villarejo, Begoña García Zapirain, and Amaia Méndez Zorrilla
Privacy Preserving Data Mining:
SpringerPlus (2015) “A comprehensive review on privacy preserving data mining” by Yousra Abdul Alsahib S. Aldeen, Mazleena Salleh, and Mohammad Abdur Razzaque
Given fields like quantitative psychology, resources such as data marts, risk-reward-seeking habits combined with social influence and acceptance, and the Cambridge Analytica scandal, it is apparent that many programs contain algorithms designed to discover and take advantage of our psychological constitution. And since many of these programs are profit-driven, or at least thrive (directly or indirectly) on user engagement, it is reasonable to conclude that these apps are designed to be addictive.
The algorithms you are most likely to encounter come in the form of apps or programs on your devices, and the device with the most access to you is your phone. So in this article, when I mention phones and apps, I’m referring to their use of algorithms to get users addicted.
Even though increased phone usage has been correlated with symptoms of psychological disorders, addiction is an extreme version of a habit that involves dependency and harmful effects (and how harmful it is often depends on how extreme the habit and dependency are). So in this post we’ll largely focus on habits that could turn into an addiction (if they haven’t already), to encourage proactivity in the largest audience. I’ll talk about symptoms, suggested ways to break free of the habit, and potential impacts on our society.
So you may be wondering, “How do I know if I’m addicted?” One thing that should at least raise your eyebrows is often finding yourself opening your phone and tapping on the screen for no particular reason. That shows you’ve already habituated the behavior to the point that you turn to your phone regularly and unconsciously.
Another eyebrow-raising condition is phantom phone syndrome: feeling (usually in the area where you normally keep your phone) as though your phone rang or vibrated even when the phone isn’t there. This is your body acting the way your mind does in the previous example of checking your phone. I believe this is your body automatically reacting to a stimulus it assumes will be there.
Then there is addiction-like behavior, such as significantly disengaging from real-life social interactions in favor of tech or online engagement. This shows a harm element: replacing in-person relationships (which often provide real benefits like touch, compassion, financial support, etc.) with digital ones that are oftentimes superficial and fleeting.
How to stop
The first step is to reduce notifications. It’s much more difficult to quit a habit, for instance a shopping addiction, if you’re constantly being told about the newest sale at the local mall. It’s similar to the idea of ‘out of sight, out of mind’. And if you stop your phone from vibrating or making noise when notifications arrive, you will be interrupted less.
Replace the reward. If you’re always checking your phone because of social media or email, turn off your phone and go to a networking or social event. If you’re always checking Instagram, try replacing it with movies or DIY shows. Studies suggest that such short, attention-grabbing tasks harm your attention span.
As usual, you can try the cold turkey approach as well: send out posts, emails, etc. to everyone saying you’ll be going off the grid for however long you’ve decided on. A compromise is to reduce the storage on your phone, since less storage means fewer apps.
The CNBC article “These simple steps will help you stop checking your phone so much” suggests using greyscale to make images less appealing, and using accountability apps to keep you aware of how much time you’re spending on your phone. Greyscale seems ideal if you spend a lot of time on visual-centered apps like Instagram, YouTube, etc., but it may not be as useful for apps like Twitter or email.
Potential Impacts on our Society
With any new widespread device or trend, one has to wonder how it will change our society, even if only a little. Social media and apps are no exception, and these concerns go further than spending money on the latest in-app purchases.
Increased addiction to phones and social media also increases the impact of algorithmic influence. Increased phone addiction has been correlated with antisocial personality disorder. With data markets as they are (sharing your info for profit, directly or indirectly), each second of interaction with these programs means more information about you to sell and more influence over you to sell.
Decreased attention spans and increased anxiety make people easier to distract and more likely to act on impulse. Those are the ideal characteristics of a target for manipulative actors or con artists. We’ve already seen foreign governments use this to suppress voter turnout (depression makes people less likely to leave their homes), and Facebook influencing people’s moods. These are just a couple of reasons why these addictions and algorithmic influences matter for national security.
Crime syndicates (including terrorists) may start bricking the phones of people they know are addicted, whether to extort money or because those phones may hold richer data on other people in their network. And even more fundamentally, as a culture we have to worry about how we can maintain independence when a country like China can get data on users and use it to send emotionally compromising media to everyone it deems a key community member.
What will that sermon be like if, right before service starts, the pastor sees a YouTube ad about pools when she was traumatized by nearly drowning as a child? How well will your attorney argue for you in the FISA court if they just got a notification about a string of memes on nihilism and free will as an illusion? What will the young aspiring domestic terrorist do when he hears a fake news story on Facebook claiming that the loud crash he heard wasn’t a plane crashing because of an inexperienced pilot’s error, but the start of a ‘race war’ in his town?
I admit some of these problems can only be fixed once those in power respect us and our data enough to enact regulations and break up data monopolies, or once they face mass consumer backlash. However, the steps suggested in this article are ways to reduce and mitigate these harms. Enjoy your tech, just enjoy it responsibly.
Time. (2015) You Now Have a Shorter Attention Span Than a Goldfish
Retrieved from http://time.com/3858309/attention-spans-goldfish/ 12/31
Medium. (2017) The Addiction Algorithm
Retrieved from https://medium.com/@jeffeinstein1/the-addiction-algorithm-864aff96795
New York Times. (2013) Addicted to Apps
Retrieved from https://www.nytimes.com/2013/08/25/sunday-review/addicted-to-apps.html
UCLA, Psychology Department
Retrieved from https://www.psych.ucla.edu/graduate/areas-of-study/quantitative-psychology
Psychology Today. (2013) Phantom Pocket Vibration Syndrome
Retrieved from https://www.psychologytoday.com/us/blog/rewired-the-psychology-technology/201305/phantom-pocket-vibration-syndrome 12/31
Science Daily. (2018) Why we fail to understand our smartphone use
Retrieved from https://www.sciencedaily.com/releases/2018/05/180523104246.htm 12/31
CNBC. (2018) These simple steps will help you stop checking your phone so much
The Economist 1843 Magazine. (2016) The scientists who make apps addictive
Retrieved from https://www.1843magazine.com/features/the-scientists-who-make-apps-addictive
The Guardian. (2018) Mobile phone addiction? It’s time to take back control
Retrieved from https://www.theguardian.com/technology/2018/jan/27/mobile-phone-addiction-apps-break-the-habit-take-back-control
Webmd. (2012) Addicted to Your Smartphone? Here's What to Do
Retrieved from https://www.webmd.com/balance/guide/addicted-your-smartphone-what-to-do#1
Business Insider. (2018) These are the sneaky ways apps like Instagram, Facebook, Tinder lure you in and get you 'addicted'
Retrieved from https://www.businessinsider.com/how-app-developers-keep-us-addicted-to-our-smartphones-2018-1
You’re walking into work after getting your usual morning drink. Everyone is working as always, with the normal office noise you’ve become used to. You check your Key Performance Indicators (KPIs) to make sure everything is running smoothly, and you notice that although your system is processing information as usual, it’s classifying most of the data into one particular category. In a fit of confusion and suspense, you rush to review some of the records... So, what happens now?
That depends on whether you used an opaque model or a transparent model. If you used a transparent model, either the data really does belong in that category (or within that range, if you’re using numeric data), or the input parameters would have warned you that something important in your system was changing before you arrived at the office, for example new data with extreme values that weren’t in the training set. At worst, you’ll discover that your model didn’t account for some sub-process you didn’t know existed. At least then you’ll gain a better understanding of the actual system and may be able to leverage that knowledge for other projects.
If you used an opaque model, you may see that the data isn’t in that class (or range) and still not know what went wrong. After many hours of playing with the data and your model, at best you might learn that “some” parameter caused the output to become erroneous once your business got the particular combination of inputs it currently has. The remedy is simply to retrain the model on the new data (once you hand-label at least a few thousand records), because opaque models just fit themselves to the data and do not validate any specific hypothesis or structure... Does that sound like a good explanation of what went wrong and why, or even how to fix it? Does this experience sound like something that would build confidence in your techniques, or in your understanding of the business and mathematics?
Transparent models give you the power to understand exactly what is going on in your system from top to bottom; however, they require more research and an advanced understanding of mathematics (particularly statistics) and of the business itself. This is required because transparent models are based on specific mathematical patterns, which need to be matched to the corresponding business processes. On the other hand, opaque models are much more flexible and, in some cases, more accurate (knowing advanced mathematics can help, but only so much). Opaque models are also good for automating tasks that already have many labeled records, and in that sense they are built from the bottom up.
Opaque models aren’t centered on a particular pattern or test the way transparent models are. Opaque models get their power from having so many parameters to tune that the model behaves like clay and molds its structure to any dataset. As a result, little can be said with certainty about the final form it takes, or even about which patterns are statistically significant in the domain. This is why opaque models tend to be accurate but impractical when you need to explain them.
When doing quantitative analysis and data mining for business needs, people often fail to recognize the strengths and weaknesses of opaque and transparent models. Depending on your use case, these differences may not matter. For example, if you just want to identify which grocery bills are outliers in terms of the number of loaves of bread bought, most methods will get you the needed result. But if you’re trying to learn how flexible a business process is, or how long it will take a supplier to get their product to you, then you need to determine which model is more effective and appropriate, since in one situation the model is meant to find and explain a real-life process and in the other it is just meant to predict accurately.
The capability you want will usually require a combination of transparent and opaque models. The important part is to understand the system being created. That way, when you do use opaque models, you can place them strategically to mitigate risks should the system unexpectedly malfunction.
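The ‘input parameters would have let you know’ idea can be as simple as remembering the range of every input seen in training and flagging new records that fall outside it. Here is a minimal Python sketch. The feature names (`order_size`, `lead_time_days`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeGuard:
    """Remembers the min and max of each input seen in training."""
    mins: dict
    maxs: dict

    @classmethod
    def fit(cls, rows):
        keys = rows[0].keys()
        return cls(
            mins={k: min(row[k] for row in rows) for k in keys},
            maxs={k: max(row[k] for row in rows) for k in keys},
        )

    def out_of_range(self, row):
        """Names of inputs in `row` that fall outside the training range."""
        return [k for k, v in row.items() if not self.mins[k] <= v <= self.maxs[k]]


training = [{"order_size": s, "lead_time_days": d} for s, d in [(1, 2), (20, 5), (50, 10)]]
guard = RangeGuard.fit(training)
print(guard.out_of_range({"order_size": 500, "lead_time_days": 4}))  # ['order_size']
```

Running a guard like this before the model scores each record turns ‘the model suddenly went strange’ into ‘order sizes jumped past anything we trained on’.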
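The ‘clay’ effect can be seen even without a neural network. This Python sketch uses a stand-in for an opaque model: a polynomial with one parameter per data point, which passes exactly through all six training points. It is compared with a two-parameter least-squares line. The order sizes and delivery times are made-up numbers.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x: two parameters, each with a plain meaning."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


def interpolating_polynomial(xs, ys):
    """Lagrange polynomial with one parameter per data point: it passes through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


sizes = [1, 2, 3, 4, 5, 6]               # hypothetical order sizes (hundreds of units)
days = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # hypothetical delivery times in days

a, b = fit_line(sizes, days)
clay = interpolating_polynomial(sizes, days)
print(round(b, 2))                             # 1.99
print(round(a + b * 8, 1), round(clay(8), 1))  # 16.0 -32.9
```

The polynomial has zero training error, yet for an order size of 8 it predicts a negative delivery time. The line is slightly off on every training point, but its slope ("about two extra days per extra hundred units") is a claim you can check against the real business process.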
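For the bread example, almost any outlier rule works. Here is a minimal Python sketch using Tukey's fences, which flag values lying more than 1.5 interquartile ranges outside the middle half of the data, on made-up loaf counts:

```python
from statistics import quantiles


def iqr_outliers(counts, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(counts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [c for c in counts if c < low or c > high]


loaves_per_bill = [1, 2, 1, 0, 2, 1, 3, 1, 2, 40]
print(iqr_outliers(loaves_per_bill))  # [40]
```

Nobody needs to understand why a bill has 40 loaves to flag it, which is why the choice of model matters so little here.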
Many people today understand that algorithms can control a variety of machines, software, social networks, financial transactions, etc. but there are few regulations for these algorithms. The most known regulations impacting algorithms were made for the financial sector, and focused on complying with already standing laws or not stopping functionality of the trading environment.
Upon researching this issue I am proposing a methodology to compare algorithms systemically, and create understandable risk metrics to ensure safety and compliance with the law. So, how can we expect to benefit from regulating algorithmic transparency? There has been many proposed methods for reverse-engineering of algorithms and analysis of the problems algorithms have created, but I think an institutional approach to dealing with abuses and standards for algorithms in the increasingly automated information age is critical for any society in the years to come.
Reports of algorithmic social-technical biases and organizational maleficence such as Facebook’s newsfeed promoting the icebreaker challenge over the Ferguson protests for the former (reference 15) and Google’s anti-trust investigation around their ranking algorithms and their algorithms race-name driven search results for the latter (reference 15). There was also an algorithm used by Staple’s website that generated greater discounts for wealthier customers (2), thus helping to maintain the wealth gap among communities. These types of instances have shown the public the potential problematic effect our increasingly algorithmic world can have when left unchecked.
Many cite the complexity of algorithms as barriers to regulation. Often times the length and sophistication of the code and the mathematics behind it scare those thinking about the creation of regulations, as well as algorithms that change or evolve based on the data it has and the systems it has interacted with. Since it is common for many people to have a dis-interest and animus for mathematics and computer science, explanations by the creators of the algorithm and/or field experts are un-interpretable, assuming they are able to give an explanation themselves. This obstacle is exacerbated by the act that some algorithms are trade secrets. These issues are some of the reasons for the opaqueness of algorithms.
Although algorithms can indeed be complex that is not a good reason to let incidences like google’s ranking algorithm (mentioned earlier) go unchecked. Cell phones are also complex, and sometimes large, but we have been able to regulate them. One key idea I suggest to keep in mind is that not every detail must be accounted for all at once, it’s ok to strategically break down these complex systems into smaller parts and then create “check points” where key performance indicators are inspected. Commercial web metrics like time on site and analysis of the user website paths, as mentioned in "Open Data and Algorithmic Regulation.", can aid in determining what key performance indicators are best for certain types checks.
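The ‘checkpoint’ idea can be sketched in Python as a list of KPI checks, each attached to one stage of a larger system. The stage names, KPIs, and acceptable ranges below are hypothetical; a real regulator would choose them per industry.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Checkpoint:
    stage: str                               # which part of the system to inspect
    kpi: Callable[[Sequence[float]], float]  # how to summarize that stage's output
    low: float                               # acceptable range for the KPI
    high: float


def run_checkpoints(stage_outputs, checkpoints):
    """Return (stage, value) for every KPI that falls outside its acceptable range."""
    failures = []
    for cp in checkpoints:
        value = cp.kpi(stage_outputs[cp.stage])
        if not cp.low <= value <= cp.high:
            failures.append((cp.stage, value))
    return failures


outputs = {
    "minutes_on_site": [3.0, 4.5, 5.0],   # hypothetical web metric per session
    "discount_rate": [0.05, 0.30, 0.10],  # hypothetical discounts offered per customer
}
checks = [
    Checkpoint("minutes_on_site", mean, 1.0, 10.0),
    Checkpoint("discount_rate", max, 0.0, 0.25),
]
print(run_checkpoints(outputs, checks))  # [('discount_rate', 0.3)]
```

Each check only needs the output of one stage, so reviewers can inspect a large system piece by piece without understanding all of it at once.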
However one must wonder, what are practical methods for fostering and enforcing algorithmic transparency. There have been suggestions of strategies for implementing algorithm disclosures. Zeynep Tufekci at the Centre for internet and human rights proposed standards for transparency that included:
Specific policies for an industry should be created by the relative governing body for that industry (trade commission for trading algorithms, FDA for medical data handling algorithms, etc.) and the algorithm review board for reviewing algorithms. The algorithm review board could be external to the government as it would review government algorithms like those mentioned in the conclusion, and be run by an elected official. This will be needed since algorithms can be so complex and cover so many industries and process so much information, their analysis will need specialized knowledge as well. These regulators have to focus their efforts wisely because of the massive and diverse aspects of algorithms. They will have to understand the purpose of the algorithm and consider the complexity of the source code, the organizations explanation of it, and the test results, once the organization discloses the source code* to the regulating body.
*Note: Anytime code disclosures are referenced, it’s meant to include the creator’s and tester’s notes and any other supplemental material.
As there is a need to focus resources efficiently, utilizing the code length of an algorithm in a standard coding language as a proxy for complexity in conjunction with the organization’s notes and descriptions can ensure we are tackling relevant algorithms and risks (rather than trivial algorithms, like the one used to invert picture colors) relative to their potential for harm consistently.
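Counting ‘code length’ can be as simple as counting the lines that are neither blank nor comments, then weighting the count by potential for harm. This Python sketch assumes `#`-style comments and hypothetical harm weights assigned by reviewers. It is one possible triage rule, not an established standard.

```python
def logical_lines(source, comment_prefix="#"):
    """Count lines that are neither blank nor pure comments."""
    return sum(
        1 for line in source.splitlines()
        if line.strip() and not line.strip().startswith(comment_prefix)
    )


def triage(algorithms):
    """Order algorithms for review by code length times a reviewer-assigned harm weight.

    algorithms: name -> (source_code, harm_weight); the weights are hypothetical.
    """
    scores = {name: logical_lines(src) * weight for name, (src, weight) in algorithms.items()}
    return sorted(scores, key=scores.get, reverse=True)


invert_colors = "def invert(pixel):\n    # flip each channel\n    return [255 - c for c in pixel]\n"
score_applicant = (
    "def score(applicant):\n"
    "    s = 0\n"
    "    if applicant['late_payments'] > 2:\n"
    "        s -= 50\n"
    "\n"
    "    return s\n"
)
print(triage({"invert_colors": (invert_colors, 0.1), "score_applicant": (score_applicant, 1.0)}))
# ['score_applicant', 'invert_colors']
```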
Once an appropriate method for reviewing and testing the organization’s algorithms has been chosen, regulators will need to determine the specific impact areas to be reviewed, inferred data about the consumer, the time interval needed and acceptable error rates before they begin the tests. If a certain sub-process is needed (and risks have been reasonably mitigated) but is not as predictable because of the sensitive information it maybe privy to, it can be tested using the scraping audit method suggested in "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms."( Sandvig, Christian et al reference 5). Where a researcher might issue repeated queries to a platform and get the results for needed metric and statistics by key use cases and functions.
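A scraping audit boils down to querying the platform many times on behalf of different groups and comparing the results. In this Python sketch, the ‘platform’ is a hypothetical stand-in function with a hidden pricing rule, loosely modeled on the Staples example. A real audit would query the live site, respect rate limits, and use far more careful sampling.

```python
import random
from statistics import mean


def platform_price(zip_code, rng):
    """Hypothetical stand-in for the platform under audit, with a hidden rule to detect."""
    discount = 3.0 if zip_code.startswith("9") else 0.0
    return 20.0 - discount + rng.uniform(-0.5, 0.5)


def scraping_audit(query, groups, repeats=100, seed=0):
    """Query the platform repeatedly for each group's personas and average the results."""
    rng = random.Random(seed)
    return {
        group: round(mean(query(z, rng) for z in zips for _ in range(repeats)), 2)
        for group, zips in groups.items()
    }


groups = {"group A": ["91111", "92222"], "group B": ["11111", "12222"]}  # made-up zip codes
print(scraping_audit(platform_price, groups))  # group A averages about 3 less than group B
```

The auditor never sees the pricing code. The difference emerges only from comparing many answers across groups.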
After all the tests are done then the report would be released. This report will give the regulator’s a through explanation of the algorithm’s purpose, the data utilized and inferred by the algorithm and how the algorithm achieves its goals. Some specific information to be presented in the report is what systems are controlled by the algorithm, what are potential errors and how these errors are prevented and the systems impact to the user and environment. Parts of these reports (excluding the risks of harm)can be redacted for special cases where the code is a trade secrete or can only work properly when users are unaware.
With the big data boom, the steady progression of machine learning, and the overall increase in internet usage, it has become clear that data and algorithms have power. Going forward, governments will also be expected to use algorithms and data for egalitarian and altruistic goals, not just for mass surveillance and weaponry. Such uses can also be regulated using the guidelines presented here. This shows how necessary algorithmic regulation and transparency are now, and how much more they will be needed in the future.
Current State and Stakeholders
Algorithms are typically a combination of basic business and computer logic, mathematics and statistics, and specific use cases. They can be implemented mechanically, but they are normally implemented in software to be faster and to collect more data. These algorithms often interact with, and collect data from, other algorithms as well as human beings.
Organizations need these automated processes to cut costs and better serve their stakeholders and customers. The data collected from such systems provides insight into routines and trends far better than any person or group could, as some of these calculations involve millions or even billions of variables or observations. This automatic control and information retrieval is paramount to the progress of the world, but it also widens the power (and wage) gap between ordinary citizens and affluent organizations (governments, corporations, nonprofits, etc.).
As of now, U.S. regulation of algorithms has been designed only for algorithms used in trading. While this is definitely needed, these laws are more about ensuring that algorithms do not crash the trading environment or place impractical buy or sell orders than about ensuring people are not unfairly manipulated, discriminated against, or harmed by algorithms. These laws do not even take into account the internet of things, or how everything from military drones to the number of patty-pies on Walmart shelves is controlled or influenced by algorithms. Still, they are useful for understanding some of the failures to consider in an algorithm even when it works on data from both the consumer and the owner, such as over-efficiency, when algorithms trade massive volumes in nanoseconds.
With data about anything or anyone being collected by organizations (whether it is today's weather, your doctor visit, or even which Instagram posts you look at), it is not enough to know what data is collected; we also need to know why. Algorithms are the means by which collected data is put to use, beyond simple record-keeping and summary information. Understanding the algorithm that uses (or manipulates) the data, and what information can be inferred from that data, can help people decide whether the data should be collected and whether the purpose and method for which it is used are legitimate. In the case of Cambridge Analytica, these insights could have changed some users' decision to give the app access.
One of the obstacles that prevents the public from understanding the need for algorithmic regulation, and that is thus partly responsible for the lack of political pressure to regulate data, is the abstract nature of algorithms. This may be why it took so long for even the FTC to create an office for algorithmic transparency (Noyes, Katherine, reference 7). Algorithms are not tangible objects that can be seen or held, and to make matters worse, their effects are often subtle and diverse, ranging from simple list sorting to deciding which video to suggest. Many algorithms manage background systems, such as internet protocols, which means the consumer will not see any change in the service or goods until the algorithm fails. As the saying goes, "out of sight, out of mind," and as mentioned in "The Ethics of Algorithms: From Radical Content to Self-driving Cars" (reference 15), many people did not even realize their Facebook newsfeed is controlled by an algorithm. These complex tasks are often accomplished through advanced mathematics and highly sophisticated or massive amounts of code.
The complexity of algorithms makes them difficult for technicians to explain to non-technical audiences, and it discourages more accomplished communicators from tackling the topic. This complexity also allows many algorithms to change or evolve, making their behavior even harder to understand or predict, sometimes even for their creators (hence the terms "black-box" or opaque models). This mystery and power are exactly why it is so important to have these systems reviewed and regulated, to ensure they work as intended and lawfully.
Without knowledge of how an algorithm works, or even of its existence, we cannot assure its safety, efficacy, adherence to industry standards, legitimacy, or the informed consent of the user. Since these programs control real systems and affect real people, they can break real laws and cause real harm. For instance, Google's search algorithm has been suspected of placing Google's own links at the top of results for business queries (Sandvig, Christian et al.). To make sure regulatory resources are used efficiently, algorithms should be reviewed primarily in terms of potentially illegal or hazardous results.
Another issue to be aware of is that some algorithms are proprietary or trade secrets. A program that can do tasks never done before is innovative and should be rewarded; however, that does not mean the software should go unregulated, only that it should be regulated in a way that takes its value to the organization into account. Think of how some food products (like Coke) have secret ingredients, yet they are still regulated and inspected to make sure the food is safe to eat.
Despite all the difficulties discussed so far, there are still ways to ensure that the rights of all citizens are protected and their general well-being is supported. By testing the software and requiring disclosure of the source code, regulators can establish industry benchmarks, create safety measures, regulate risks, and create and maintain general best practices.
As regulations and transparency are meant to protect various types of rights, the specific metrics and techniques used to prevent the volatile effects of algorithms will need to be implemented by the relevant government agencies for the domain in which each algorithm is used. For instance, algorithms governing medical data and products would be regulated by the FDA, and systems controlling chemical processes and their waste products would be regulated by the EPA. Social media data would be regulated by the Department of Health and Human Services.
These regulating bodies already know what information is not meant to be used in their industries because of rules on insider trading, collusion, HIPAA, and discrimination. However, there may still be a need for an organization dedicated to algorithmic transparency, to review and tackle problems that are specific to algorithms, such as scalability, emergent bias, and human controls.
This regulatory body could also review government algorithms in much the same way it would review corporate algorithms, since many of the government's own systems would be affected by the standards developed for algorithmic regulation. Ideally, this new institution could be financed by selling subscriptions to the industry benchmarks it publishes and by charging certification fees, much as the Federal Reserve funds itself.
One basic right consumers are entitled to is control of their data, and the ability to both correct and remove that data is a basic right. These systems may collect data, and even infer data that was never directly given, but not all data is captured correctly. Consumers will be more willing to engage with the internet and with a business's products when they know they can act proactively, as partners with their favorite brands, products, or businesses, to protect their own reputation and legacy. Infringing on this right is akin to allowing an organization to publish false information or slanderous content. You have the right to contest your credit score, so why not your data?
Europe has already implemented laws giving users control over their data, including the right "to be forgotten." These laws enable consumers to have their data deleted and to be notified when data is being collected. Many companies have been able to comply with these regulations, although many lawsuits were filed.
Algorithmic transparency is alluded to in the EU laws as well, since users must be notified of how their data will be used. But more information about algorithms specifically needs to be advocated for. Any algorithm that a business uses for critical operations or for public-facing services needs to be fully understood, tested, and reviewed. We do not allow bridges to be built without an explanation of their function or with unknown safety measures. Understanding the function of an algorithm is essential to its use and control.
Zeynep Tufekci at the Center for Internet and Human Rights proposed transparency standards that included consumer control of data and algorithmic transparency, as well as state-owned backdoors, or access points, particularly in infrastructure. However, state-owned backdoors would let not only the government but also bad actors randomly test the software (assuming they would not need interpretation from the organization itself), obtain copies of people's private data, and potentially disrupt essential services.
Even when auditors know all of the data an algorithm uses, the code can be complex and obscure. This is why algorithmic transparency and algorithm disclosures (such as producing the source code) are fundamental social responsibilities of any entity. Making C the standard language in which code is reviewed creates a common basis for comparing code. That makes the code easier for regulators to understand, letting them focus on other areas of the algorithm, while the length of the code gives a useful metric for estimating complexity.
The mechanism and purpose of an algorithm should determine the scope and methods of its regulation. This gives regulators and organizations a framework for deciding which areas need the most focus. For example, a system that suggests friends on social media may be complex, but its purpose is obvious, and the damage a malfunction could cause is limited in intensity (ultimately it is up to the user to approve friends) and narrow in scope, since the algorithm primarily affects people socially; such a system would therefore need comparatively light review. To reiterate, this example applies to social media, not to professional networking sites like LinkedIn, where a biased suggestion would be akin to employment discrimination.
For algorithms with more hazardous effects, error rates should be no higher than those of a human operator. These error rates can be determined during the final verification and validation of the program, where real or simulated data is used. Error rates are one type of key performance indicator. Commercial metrics such as time on site and analysis of users' paths through a website are a good start and can also help determine which other key performance indicators are best for certain types of checks (as mentioned in "Open Data and Algorithmic Regulation"). These methods provide regulators with specific frequencies and benchmarks for the algorithm.
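A minimal sketch of the human-operator comparison, assuming the verification runs produce a simple count of errors out of a number of trials. Rather than comparing the raw error rate, it uses the upper end of a Wilson score confidence interval, a conservative choice made for this sketch (the text does not prescribe it) so that a small test run cannot pass by luck. The 2% human baseline and the trial counts are invented for illustration.

```python
import math


def error_rate(errors: int, trials: int) -> float:
    """Observed error rate from verification and validation runs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return errors / trials


def wilson_upper_bound(errors: int, trials: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for the true error rate (about 95% for z=1.96)."""
    p = error_rate(errors, trials)
    z2 = z * z
    denominator = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denominator
    return center + half_width


def meets_human_benchmark(errors: int, trials: int, human_error_rate: float) -> bool:
    """Pass only if even the pessimistic estimate is no worse than the human operator."""
    return wilson_upper_bound(errors, trials) <= human_error_rate


# Hypothetical validation results compared against a hypothetical 2% human error rate.
print(meets_human_benchmark(5, 1000, 0.02))   # 0.5% observed
print(meets_human_benchmark(15, 1000, 0.02))  # 1.5% observed, but too uncertain
```

With 15 errors in 1,000 trials the observed rate (1.5%) is below the baseline, yet the check fails because that many trials cannot rule out a true rate above 2%; more validation data would be needed.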
Since error rates and the frequency of audits are expected to depend on how often the algorithm is executed, an algorithm's expected execution rate should be disclosed. Because technology changes rapidly, auditors have to flag the areas of the code that will be re-examined after the algorithm "learns" or "evolves." This is particularly necessary for large, complex systems that are designed to adapt to incoming data with a certain level of uncertainty.
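One way the disclosed execution rate could feed into audit frequency: multiply it by the validated error rate to get expected errors per day, then schedule the next audit before those errors exceed a tolerated amount. The tolerated-error budget, the one-year cap, and the rule that any model update triggers an immediate re-audit are hypothetical policy parameters chosen for this sketch, not figures from the text.

```python
import math


def audit_interval_days(executions_per_day: float, error_rate: float,
                        tolerated_errors: float, max_interval_days: int = 365) -> int:
    """Days until the expected number of errors reaches the tolerated amount (at least 1 day)."""
    expected_errors_per_day = executions_per_day * error_rate
    if expected_errors_per_day <= 0:
        return max_interval_days
    days = math.floor(tolerated_errors / expected_errors_per_day)
    return max(1, min(days, max_interval_days))


def needs_reaudit(days_since_last_audit: int, interval_days: int, model_updated: bool) -> bool:
    """Re-audit on schedule, or immediately after the algorithm 'learns' (its model changes)."""
    return model_updated or days_since_last_audit >= interval_days


# Hypothetical disclosure: 10,000 executions per day at a 0.1% error rate,
# with a regulator tolerating 500 expected errors between audits.
interval = audit_interval_days(10_000, 0.001, 500)
print(interval, needs_reaudit(3, interval, model_updated=True))
```

Under this rule, a system executed a million times a day exhausts its error budget far sooner than one executed ten times a day, which is why the execution rate matters when setting audit frequency.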
Many algorithms can provide previously unknown insights into the subject of analysis, particularly when that subject is a human. The organization using such an algorithm needs to be aware that this inferred data is being collected so that it can protect it as needed (for example, health information or religious identity). For systems that infer information about a consumer, the inferred data should be disclosed to the consumer so that they can give informed consent.
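The disclosure of inferred data described above, together with the consumer rights to correct and remove data discussed earlier, can be sketched as a small record store that keeps provided and inferred data apart. The class and method names are hypothetical, and a real system would also need authentication, audit logs, and deletion from backups, none of which this sketch covers.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConsumerRecord:
    provided: Dict[str, str] = field(default_factory=dict)  # data the consumer gave directly
    inferred: Dict[str, str] = field(default_factory=dict)  # data an algorithm derived


class ConsumerDataStore:
    """Hypothetical store supporting disclosure, correction, and deletion of consumer data."""

    def __init__(self) -> None:
        self._records: Dict[str, ConsumerRecord] = {}

    def record(self, user_id: str) -> ConsumerRecord:
        return self._records.setdefault(user_id, ConsumerRecord())

    def disclose(self, user_id: str) -> Dict[str, Dict[str, str]]:
        """Show the consumer both what they provided and what was inferred about them."""
        rec = self._records.get(user_id, ConsumerRecord())
        return {"provided": dict(rec.provided), "inferred": dict(rec.inferred)}

    def correct(self, user_id: str, key: str, value: str) -> None:
        """Let the consumer overwrite a provided or inferred value."""
        rec = self.record(user_id)
        target = rec.inferred if key in rec.inferred else rec.provided
        target[key] = value

    def forget(self, user_id: str) -> None:
        """Delete everything held about the consumer, inferred data included."""
        self._records.pop(user_id, None)


store = ConsumerDataStore()
store.record("u1").provided["zip_code"] = "00001"
store.record("u1").inferred["health_interest"] = "diabetes"
print(store.disclose("u1"))
store.correct("u1", "health_interest", "none")
store.forget("u1")
print(store.disclose("u1"))
```

Keeping inferred data in its own field is the design choice that matters here: it lets the disclosure show consumers exactly what was derived about them, not just what they typed in.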
Methods for analyzing algorithms can be useful for understanding consistent behavior, and therefore for spotting inconsistent behavior. Similar methods are already used to create performance metrics, for example in benchmarking GPUs and CPUs. Many methods use simulated data for programs that do not influence humans, or surveys and cohorts for systems that do. Sandvig et al. proposed five methods to test and audit algorithms.
Statistical inferences about an algorithm's behavior can be drawn from the sample data collected and from the code disclosure. This can highlight errors that occur more often than expected, surface trends that were too subtle to notice, and reveal when inferred information about the subject is being used or affected (e.g., an algorithm that discriminates by zip code when those zip codes strongly correlate with race). From there, appropriate mitigation plans can be created and approved. This proactive approach could have helped prevent many incidents, such as the Staples pricing algorithm that gave discounts to the wealthy.
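A sketch of two statistics an auditor might compute from sampled data to catch the zip-code problem. proxy_accuracy measures how well a feature the algorithm uses (here, zip code) predicts a protected attribute it is not supposed to use; if it beats baseline_accuracy by a wide margin, the feature may be acting as a proxy. selection_rate_ratio compares how often each group receives a favorable outcome; a ratio below 0.8 corresponds to the "four-fifths" screening heuristic from U.S. employment guidelines, borrowed here only as one possible threshold. The data are invented, and these are simple screening checks rather than a complete fairness analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def proxy_accuracy(feature: Sequence[str], protected: Sequence[str]) -> float:
    """How often the protected group can be guessed from the feature alone
    (guessing the most common group within each feature value)."""
    groups_by_value: Dict[str, Counter] = defaultdict(Counter)
    for value, group in zip(feature, protected):
        groups_by_value[value][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups_by_value.values())
    return correct / len(protected)


def baseline_accuracy(protected: Sequence[str]) -> float:
    """Accuracy of always guessing the overall most common group (no feature used)."""
    return Counter(protected).most_common(1)[0][1] / len(protected)


def selection_rate_ratio(selected: Sequence[bool], protected: Sequence[str]) -> float:
    """Lowest group's rate of favorable outcomes divided by the highest group's rate."""
    totals = Counter(protected)
    favorable = Counter(group for chosen, group in zip(selected, protected) if chosen)
    rates = [favorable[group] / totals[group] for group in totals]
    return 1.0 if max(rates) == 0 else min(rates) / max(rates)


# Invented sample: zip code used by a discount algorithm, group membership, discount given.
zip_codes = ["A", "A", "A", "B", "B", "B"]
groups = ["x", "x", "x", "y", "y", "y"]
discounted = [True, True, True, False, False, True]

print(proxy_accuracy(zip_codes, groups), baseline_accuracy(groups))
print(selection_rate_ratio(discounted, groups))
```

In the invented data, zip code predicts group membership perfectly (1.0 versus a 0.5 baseline) and group y receives discounts at one third the rate of group x, so both checks would flag the algorithm for a mitigation plan.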
Once the tests have concluded, a purpose and methodology report can be published. This report will be presented to the governing body to explain the data utilized and inferred by the algorithm, its purpose, and how the data and the algorithm achieve that purpose. The report should also address the organization's expectations of the algorithm's long-term effects on consumers and the environment, as well as any other risks. In this address, the organization will report the number of people affected by the algorithm over a time span appropriate to how long the algorithm is expected to be in use. This is in addition to sections on potential failure modes, error rates, how they are mitigated, and their potential impact on users and the environment.
The address will also cover any harmful effects that the algorithm's efficiency and protocol have on the users and environment involved (its socio-technical impact). As we have seen, even when an algorithm is known, understood, and operating as expected, it can still have tremendous power, as explored in studies by Robert Epstein and Ronald E. Robertson (13). This requirement will encourage study of an algorithm's long-term impact and help prevent unexpected damage.
These regulations can best protect an algorithm's proprietary information by ensuring the source code is reviewed only by trustworthy government officials. The organization must then be able to review the final system's "Purpose and Methods" disclosure and suggest redactions before it is published. Any rationale given for a proposed redaction must be backed by supporting evidence. Under these guidelines, any entity, large or small, will benefit from higher standards and benchmark reporting, algorithmic research, and security for its proprietary information.
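The redaction rules can be expressed as a short check run before the "Purpose and Methods" disclosure is published: every requested redaction must carry a rationale, and the risks of harm can never be redacted (as stated earlier in this section). The section names and the rationale format are hypothetical, and the sketch only enforces these two mechanical rules; judging whether a rationale is actually supported by evidence remains the regulator's job.

```python
from typing import Dict

# Hypothetical section key; per the proposal, the risks of harm may never be redacted.
NON_REDACTABLE_SECTIONS = {"risks_of_harm"}
REDACTION_MARK = "[REDACTED]"


def apply_redactions(report: Dict[str, str], requests: Dict[str, str]) -> Dict[str, str]:
    """Return a publishable copy of the report with requested sections redacted.

    `requests` maps a section name to the organization's supporting rationale.
    Requests for sections not present in the report are ignored.
    """
    published = dict(report)
    for section, rationale in requests.items():
        if section in NON_REDACTABLE_SECTIONS:
            raise ValueError(f"section {section!r} cannot be redacted")
        if not rationale.strip():
            raise ValueError(f"redacting {section!r} requires a supporting rationale")
        if section in published:
            published[section] = REDACTION_MARK
    return published


report = {
    "purpose": "Rank products in search results.",
    "ranking_details": "Weights and features of the ranking model.",
    "risks_of_harm": "May bury products from small sellers.",
}
public = apply_redactions(report, {"ranking_details": "Trade secret: novel ranking method."})
print(public["ranking_details"], "|", public["risks_of_harm"])
```

The original report is left untouched, so the regulator keeps the full version while only the redacted copy is published.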
The new and abundant data and processing power available to algorithms make it clear that some algorithms may be able to help society and should therefore be created, implemented, and regulated for the public. These social engineering algorithms may help raise the quality of life for people all around the world.
The WHO could provide a free app that quickly educates people about common and serious diseases, and use the queries about communicable diseases to get better estimates and predictive variables for preventing the spread of disease. Renewable energy businesses could use traffic volume data and weather data to better estimate energy usage and prepare their energy generation for weather events. A simple existing example of algorithms serving energy generation is solar panels that adjust to follow the direction of the sun.
With all of the data being collected, edited, and deleted, and with algorithms using and manipulating it, people's exposure to and understanding of algorithms are sure to rise. Publicly reviewed and tested algorithms are already setting standards in their respective domains. Examples include open-source software and commonly used methods such as k-means clustering or regression. These programs are created and maintained by everyday users yet are still used by many large companies, showing that algorithmic transparency has had, has, and will continue to have a place in our society.
Editorial. "Algorithmic Transparency: End Secret Profiling." EPIC. Electronic Privacy Information Center, 10 Dec. 2015. Web. 20 Jan. 2016.
Rosenblat, Alex, Tamara Kneese, and Danah Boyd. "Workshop Primer: Algorithmic Accountability." SSRN Electronic Journal (n.d.): n. pag. Algorithmic Accountability. Data & Society Research Institute, 17 Mar. 2014. Web. 26 Jan. 2016.
Dwoskin, Elizabeth. "Trends to Watch in 2015: From Algorithmic Accountability to the Uber of X." Digits RSS. Wall Street Journal, 08 Dec. 2014. Web. 23 Jan. 2016.
Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms." (n.d.): n. pag. Center for Media, Data and Society, 17 Nov. 2014. Web. 26 Jan. 2016.
Matias, J. Nathan. "Uncovering Algorithms: Looking Inside the Facebook Newsfeed." MIT Center for Civic Media, 22 July 2014. Web. 28 Jan. 2016.
Noyes, Katherine. "Don't Assume Your Facebook Friends Are Ignoring You; It Could Simply Be the Site's Algorithms at Work." ComputerWorld, 9 Apr. 2015. Web. 30 Jan. 2016.
McFarlane, Greg. "High-Frequency Trading Regulations (ETFC) | Investopedia." Investopedia. Investopedia, 15 Apr. 2015. Web. 1 Feb. 2016.
Diakopoulos, Nick. "Algorithmic Defamation: The Case of the Shameless Autocomplete." Musings on Media. NickDiakopoulos, 6 Aug. 2013. Web. 1 Feb. 2016.
Diakopoulos, Nicholas. "Algorithmic Accountability." Digital Journalism 3.3 (2014): 398-415. Musings on Media. Musings on Media, 7 Nov. 2014. Web. 3 Feb. 2016.
O'Reilly, Tim. "Open Data and Algorithmic Regulation." Open Data and Algorithmic Regulation. Beyond Transparency, 16 Oct. 2013. Web. 4 Feb. 2016.
Grimes, Seth. "10 Text, Sentiment, and Social Analytics Trends For 2016." VentureBeat. Venture Beat, 12 Jan. 2016. Web. 4 Feb. 2016.
Epstein, Robert, and Ronald E. Robertson. "The Search Engine Manipulation Effect (SEME) and Its Possible Impact on the Outcomes of Elections." PNAS 112.33 (2015): E4512-E4521. PNAS. American Institute for Behavioral Research and Technology, 4 Aug. 2015. Web. 5 Feb. 2016.
Lohr, Steve. "Netflix Cancels Contest After Concerns Are Raised About Privacy." The New York Times. The New York Times, 12 Mar. 2010. Web. 6 Feb. 2016.
Tufekci, Zeynep, Jillian C. York, Ben Wagner, and Frederike Kaltheuner. "The Ethics of Algorithms: From Radical Content to Self-driving Cars." Center for Internet and Human Rights (n.d.): n. pag. GCCS 2015. Web. 5 Feb. 2016.
Hakim, Danny. "Right to Be Forgotten? Not That Easy." The New York Times. The New York Times, 29 May 2014. Web. 6 Feb. 2016.
Editorial. "Europe Moves Ahead on Privacy." The New York Times. The New York Times, 03 Feb. 2013. Web. 6 Feb. 2016.