Regulating Algorithms: Current Practices, Issues and Possible Solutions

8/23/2018

Executive Summary:

Many people today understand that algorithms control a variety of machines, software, social networks, financial transactions, and more, yet few regulations govern these algorithms. The best-known regulations affecting algorithms were written for the financial sector, and they focus on complying with existing laws or on keeping the trading environment functioning.
After researching this issue, I am proposing a methodology to compare algorithms systematically and to create understandable risk metrics that ensure safety and compliance with the law. So, how can we expect to benefit from regulating algorithmic transparency? There have been many proposed methods for reverse-engineering algorithms and for analyzing the problems algorithms have created, but I think an institutional approach to abuses and standards for algorithms in the increasingly automated information age is critical for any society in the years to come.

There have been reports of both algorithmic socio-technical bias and organizational malfeasance. An example of the former is Facebook’s newsfeed promoting the Ice Bucket Challenge over the Ferguson protests (reference 15); examples of the latter are Google’s antitrust investigation over its ranking algorithms and its algorithms’ race- and name-driven search results (reference 15). There was also an algorithm used by Staples’ website that generated greater discounts for wealthier customers (reference 2), thus helping to maintain the wealth gap among communities. Instances like these have shown the public the problems our increasingly algorithmic world can cause when left unchecked.
Many cite the complexity of algorithms as a barrier to regulation. Often the length and sophistication of the code and the mathematics behind it scare those thinking about creating regulations, as do algorithms that change or evolve based on the data they have and the systems they have interacted with. Because many people are uninterested in, or averse to, mathematics and computer science, explanations by an algorithm’s creators or by field experts often cannot be interpreted, assuming the creators are able to give an explanation at all. This obstacle is exacerbated by the fact that some algorithms are trade secrets. These issues are some of the reasons for the opacity of algorithms.
Although algorithms can indeed be complex, that is not a good reason to let incidents like Google’s ranking algorithm (mentioned earlier) go unchecked. Cell phones are also complex, and sometimes large, but we have been able to regulate them. One key idea I suggest keeping in mind is that not every detail must be accounted for at once; it’s OK to strategically break these complex systems down into smaller parts and then create “check points” where key performance indicators are inspected. Commercial web metrics like time on site and analysis of users’ website paths, as mentioned in "Open Data and Algorithmic Regulation" (reference 11), can aid in determining which key performance indicators are best for certain types of checks.
However, one must wonder: what are practical methods for fostering and enforcing algorithmic transparency? There have been suggestions of strategies for implementing algorithm disclosures. Zeynep Tufekci at the Centre for Internet and Human Rights proposed standards for transparency that included:

Consumer control of data
Transparency via:
1. Source code publication and explanation
2. Investigation of code by reverse engineering
State-owned backdoors or access points, particularly in infrastructure

The enforcement of transparency is further explored by Christian Sandvig et al. (reference 5), who propose five ways of conducting algorithm inspections:

Code audit
Noninvasive user audit (survey)
Scraping audit (systematic queries)
Sock puppet audit (pretend to be users)
Crowdsourced audit / collaborative audit (hire users)

These methods of enforcing transparency offer many benefits, such as not necessarily needing the consent of the algorithm’s owner and being able to generate statistical reports on the system as a whole. However, these methods make no claim as to which algorithms are harmful, nor do they offer criteria for comparing algorithms. The backdoor option suggested by Kilian Vieth could further exacerbate the theft of citizens’ and corporations’ private data and metadata.
Specific policies for an industry should be created by the relevant governing body for that industry (a trade commission for trading algorithms, the FDA for algorithms handling medical data, etc.) together with an algorithm review board dedicated to reviewing algorithms. The algorithm review board could be external to the government, as it would review government algorithms like those mentioned in the conclusion, and be run by an elected official. This will be needed because algorithms can be so complex, cover so many industries and process so much information that their analysis requires specialized knowledge. These regulators have to focus their efforts wisely because of the massive and diverse aspects of algorithms. They will have to understand the purpose of the algorithm and consider the complexity of the source code, the organization’s explanation of it, and the test results, once the organization discloses the source code* to the regulating body.
*Note: Any time code disclosures are referenced, they are meant to include the creators’ and testers’ notes and any other supplemental material.

Because resources must be focused efficiently, the length of an algorithm’s code in a standard coding language can serve as a proxy for its complexity. Used together with the organization’s notes and descriptions, this helps ensure we consistently tackle relevant algorithms and risks (rather than trivial algorithms, like one used to invert picture colors) in proportion to their potential for harm.

Once an appropriate method for reviewing and testing the organization’s algorithms has been chosen, regulators will need to determine the specific impact areas to be reviewed, the data inferred about the consumer, the time interval needed and the acceptable error rates before they begin the tests. If a certain sub-process is needed (and its risks have been reasonably mitigated) but is less predictable because of the sensitive information it may be privy to, it can be tested using the scraping audit method suggested in "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms" (Sandvig, Christian et al., reference 5), in which a researcher issues repeated queries to a platform and collects the results to compute the needed metrics and statistics for key use cases and functions.
After all the tests are done, the report would be released. This report will give the regulators a thorough explanation of the algorithm’s purpose, the data utilized and inferred by the algorithm, and how the algorithm achieves its goals. Specific information to be presented in the report includes which systems are controlled by the algorithm, what the potential errors are and how they are prevented, and the system’s impact on the user and the environment. Parts of these reports (excluding the risks of harm) can be redacted in special cases where the code is a trade secret or can only work properly when users are unaware of it.
With the big data boom, the steady progression of machine learning and the overall increased usage of the internet, it has become clear that data and algorithms have power. Going forward, governments will be expected to use algorithms and data for more egalitarian and altruistic goals rather than just mass surveillance and weaponry. Such uses can also be regulated using the guidelines presented here. This shows how necessary algorithmic regulation and transparency are now, and how much more they will be needed in the future.

Current State and Stakeholders

Algorithms are typically a combination of basic business and computer logic, mathematics and statistics, and specific use cases. They can be implemented mechanically, but normally they are implemented in software so they run faster and collect more data. These algorithms often interact with, and collect data from, other algorithms as well as human beings.

Organizations need these automated processes to save resources and better serve their stakeholders and/or customers. The data collected from such systems provides far better insight into routines and trends than any person (or group) could, as some of the calculations involve millions or even billions of variables and/or observations. This automatic control and information retrieval is paramount to the progress of the world, but it also exacerbates the power (and wage) gap between ordinary citizens and affluent organizations (governments, corporations, nonprofits, etc.).
As of now, the regulation of algorithms in the U.S.A. has only been designed for algorithms used for trading. While this is definitely needed, these laws are more about ensuring the algorithms don’t crash the trading environment or place impractical buy or sell orders than about ensuring people are not being unfairly manipulated, discriminated against or harmed by the algorithms. These laws do not even take into account the internet of things and how everything from military drones to how many patty-pies are on Walmart shelves is controlled or influenced by algorithms. Still, they are useful for understanding some of the failures to consider in an algorithm even when it is working on data from both the consumer and the owner, such as over-efficiency, when algorithms trade massive volumes in nanoseconds.
With all the data about anything or anyone being collected by organizations (whether it’s today’s weather, your doctor visit, or even which Instagram post you look at), it’s not enough to know what data is collected; we also need to know why. Algorithms are the means by which the collected data is utilized (before or after it is used simply for records and summary information). Understanding the algorithm that uses (or manipulates) the data, and what information can be inferred from that data, can better help people decide whether the data should be collected and whether the purpose and method it’s being used for are legitimate. In the case of Cambridge Analytica, these insights could have changed some users’ decision to allow the app access.
Issues:

One of the obstacles that prevents the public from understanding the need for algorithmic regulation, and that is thus partially responsible for the lack of political pressure for regulation of data, is the abstract nature of algorithms. This may be why it took so long for even the FTC to create an office for algorithmic transparency (Noyes, Katherine, reference 7). Algorithms are not tangible objects that can be seen or held, and to make matters worse, their effects are often subtle and diverse, ranging from simple list sorting to determining what video to suggest. Many algorithms manage systems, for instance internet protocols. This means the consumer will not see any change in the service or goods until the algorithm fails. As the saying goes, “out of sight, out of mind”, and as mentioned in "The Ethics of Algorithms: From Radical Content to Self-driving Cars" (reference 15), many people didn’t even realize their Facebook newsfeed is controlled by an algorithm. These complex tasks are often accomplished with advanced mathematics and highly sophisticated or massive amounts of code.
The complexity of an algorithm makes it difficult for technicians to explain to non-technicians, and it intimidates more accomplished communicators from attempting to tackle the topic. This complexity also allows many algorithms to change or evolve, making their actions even more difficult to understand or predict, sometimes even for their creators (hence the terms “black-box” or opaque models). All of this mystery and power is exactly why it is so important to have these systems reviewed and regulated, to ensure each algorithm is working as intended and lawfully.
Without knowledge of how an algorithm works, or even of its existence, we cannot assure its safety, efficacy, adherence to industry standards, legitimacy, or the informed consent of the user. Since these programs control real systems and affect real people, they can break real laws and cause real harm. For instance, Google’s indexing algorithm is suspected of placing its own links atop the results of business inquiries (Sandvig, Christian et al., reference 5). To make sure regulatory resources are used efficiently, algorithms should be reviewed primarily in terms of potentially illegal or hazardous results.
Another issue to be aware of is that some algorithms are proprietary or a trade secret. Having a program that can do tasks never done before is innovative and should be rewarded; however, that does not mean the software should go unregulated. It should be regulated in a way that takes into account its value to the organization. One might think of how some food products (like Coke) have secret ingredients but are still regulated and inspected to make sure the food is safe to eat.
Potential Solutions

Even with all the difficulties discussed so far, there are still ways we can ensure the rights of all citizens are protected and support their general well-being. By testing the software and having the source code disclosed, industry benchmarks can be established, safety measures created, risks regulated, and general best practices created and maintained.
As regulations and transparency are meant to protect various types of rights, the specific metrics and techniques used to prevent volatile effects of algorithms will need to be implemented via the relevant government agencies for the domain in which the algorithm is being used. For instance, algorithms governing medical data and products would be regulated by the FDA, and systems controlling chemical processes and their waste products would be regulated by the EPA. Social media data would be regulated by Health and Human Services.
These regulating bodies already know what information is not meant to be used in their industry because of things like insider trading, collusion, HIPAA laws or discrimination. However, there may still be a need for an organization dedicated to algorithmic transparency, to review and tackle problems that are specific to algorithms, such as scalability, emergent bias, human controls, etc.
This regulatory body could also review government algorithms in much the same way it would corporate algorithms. Many of the government’s own systems would be impacted by the standards developed for algorithmic regulation. Ideally, this new institution could be financed by publishing industry benchmarks (and gaining subscribers) and by certification fees, similar to the Fed.
Control of their data is a basic right that consumers are entitled to, including the ability to both correct and remove that data. These systems may collect data, and even infer more data that wasn’t directly given, but not all data is captured correctly. Consumers will be more open to engaging with the internet and with a business’s products when they know they can be proactive partners with their favorite brand (or product or business) in protecting their own reputation and legacy. Infringing on this right is akin to allowing an organization to publish false information or slanderous content. You have the right to contest your credit score, so why not your data?
Europe has already implemented laws giving users control of their data and the right “to be forgotten”, enabling consumers to have their data deleted and to be notified when data is being collected. Many companies have been able to comply with these regulations, although many lawsuits were filed.
Algorithmic transparency is alluded to in the EU laws as well, since users have to be notified of how their data will be used. But more information about algorithms specifically needs to be advocated for. Any algorithm that a business uses for critical operations, or that is in public use, needs to be fully understood, tested, and reviewed. We do not let bridges be built without an explanation of their function, or with plans whose safety measures are unknown. Understanding the function of an algorithm is essential to its use and control.
Zeynep Tufekci at the Centre for Internet and Human Rights proposed standards for transparency that included consumer control of data and algorithmic transparency, as well as state-owned backdoors or access points, particularly in infrastructure. However, state-owned backdoors would mean that the government and bad actors could not only randomly test an organization’s software (assuming they wouldn’t need interpretation from the organization itself) but also obtain copies of people’s private data and potentially disrupt essential services.
Even when auditors know all of the data being used by an algorithm, the code can be complex and obscure. This is why algorithmic transparency and algorithm disclosures (such as producing the source code) are fundamental social responsibilities of any entity. Making C the standard language in which code is reviewed creates a common basis for comparing code. This makes the code easier for regulators to understand, allowing them to focus on other areas of the algorithm, while the length of the code gives a useful metric for estimating complexity.
The mechanism and purpose of the algorithm should be used to determine the scope and methods for regulation. This provides regulators and organizations with a framework to understand which areas will need the most focus. A system created for suggesting friends on social media may be complex, but it has an obvious purpose, and the potential damage from a malfunction is limited in intensity (ultimately it is up to the user to approve friends) and specific in the areas it affects, because the algorithm primarily affects people socially. (To reiterate, this example is for social media and not for sites like LinkedIn, which are geared toward professional networking, where a biased suggestion would be akin to employment discrimination.)
The error rates of the algorithms should be no higher than those of a human operator, particularly for more hazardous effects. These error rates can be determined during the final verification and validation of the program, where real or simulated data is used. These error rates are one type of key performance indicator. Commercial metrics like time on site and analysis of users’ website paths are a good start and can aid in determining what other key performance indicators are best for certain types of checks as well (as mentioned in "Open Data and Algorithmic Regulation", reference 11). These methods provide the regulators with specific frequencies and benchmarks for the algorithm.
Since error rates and the frequency of audits are expected to be influenced by the pace at which the algorithm is executed, an algorithm’s expected execution rate should be disclosed. As technology changes rapidly, the auditors have to highlight areas of the code that will be re-examined after it “learns” or “evolves”. This is particularly necessary for large, complex systems that are designed to adapt to incoming data that has a certain level of uncertainty.
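To make the code-length idea concrete, here is a minimal sketch of how a reviewer might count the lines of a C submission that actually contain code and sort submissions into review tiers. The function names and the tier thresholds (200 and 5000 lines) are my own illustrative assumptions, not part of any existing standard, and the counter is deliberately simple: it does not handle comment markers that appear inside string literals.

```python
def count_code_lines(c_source: str) -> int:
    """Count lines containing code, skipping blank lines and C comments.

    Simplification (assumption): comment markers inside string literals
    are not treated specially.
    """
    count = 0
    in_block_comment = False
    for raw in c_source.splitlines():
        line = raw.strip()
        if in_block_comment:
            if "*/" not in line:
                continue  # still inside a /* ... */ comment
            in_block_comment = False
            line = line.split("*/", 1)[1].strip()
        # Strip /* ... */ comments that start on this line.
        while "/*" in line:
            before, _, rest = line.partition("/*")
            if "*/" in rest:
                line = (before + rest.split("*/", 1)[1]).strip()
            else:
                line = before.strip()
                in_block_comment = True
                break
        line = line.split("//", 1)[0].strip()  # strip // comments
        if line:
            count += 1
    return count


def review_tier(line_count: int, small: int = 200, large: int = 5000) -> str:
    """Map a line count to a review tier; thresholds are hypothetical."""
    if line_count < small:
        return "low"
    if line_count < large:
        return "medium"
    return "high"


SAMPLE = """
/* invert a colour value */
#include <stdio.h>

int main(void) {
    // entry point
    int x = 255; /* max channel value */
    int y = 255 - x;
    return y;
}
"""

lines = count_code_lines(SAMPLE)
print(lines, review_tier(lines))
```

A count like this would only be a first filter. As argued above, it has to be read alongside the organization’s notes and the algorithm’s purpose, since a short program can still control a hazardous system.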
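As a rough sketch of how such a comparison could be made during validation, the check below asks whether an algorithm’s error rate is credibly no higher than a human baseline, rather than just comparing the raw observed rate. It uses the Wilson score upper bound, which is one standard choice among several; the function names, the one-sided 95% level (z = 1.645) and the example numbers are assumptions for illustration only.

```python
import math


def wilson_upper_bound(errors: int, trials: int, z: float = 1.645) -> float:
    """One-sided Wilson score upper bound on the true error rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre + margin) / denom


def meets_human_baseline(errors: int, trials: int, human_error_rate: float) -> bool:
    """True if even the upper bound on the algorithm's error rate is at or
    below the human operator's error rate."""
    return wilson_upper_bound(errors, trials) <= human_error_rate


# Hypothetical validation run: 10 errors in 1000 test cases.
print(round(wilson_upper_bound(10, 1000), 4))
print(meets_human_baseline(10, 1000, human_error_rate=0.02))
print(meets_human_baseline(10, 1000, human_error_rate=0.012))
```

The last call shows why the bound matters: an observed rate of 1% looks better than a 1.2% human baseline, but with only 1000 test cases the evidence is not yet strong enough to say so.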
Many algorithms can provide previously unknown insights into the subject of analysis, particularly if that subject is a human. The organization using that algorithm needs to be aware that this inferred data is being collected so as to protect it as needed (for example health information, or religious identity, etc.). For systems that infer information about a consumer, that inferred data should be disclosed to the consumer in order for them to give informed consent.
Methods for analyzing algorithms can be useful for understanding consistent behaviors and thus inconsistent behavior. These methods have already been used for creating metrics for algorithms, such as the testing of GPUs and CPUs. Many methods involve using simulated data for programs that don’t influence humans, or surveys and cohorts for systems that do. Sandvig, Hamilton et al. (reference 5) proposed five methods to test and audit algorithms.

Disclosing source code
This method involves the organization turning over all code, software, documentation, and general reports.
Noninvasive user audit (survey users)
After users interface with the program they are given surveys to complete. The surveys can help shed light on the users’ experience and consistent actions.
Scraping audit (auditors script queries to the platform)
An auditor can directly query databases and programs, similar to a temporary government backdoor.
Sock puppets (dummy accounts)
Fake accounts are created to allow the auditor to test the system with various types of user information and record the system’s response.
Crowdsourcing
This final method relies on recruiting real users to test the algorithm. After they have reported back, the information can be compiled and analyzed.

Three of these methods rely on the assumption that the system in question has some effect on users specifically. However, if one extends that paradigm to algorithms and their interaction with data (inputs and outputs), whether simulated or real, then surveying (or collecting samples of) the process’s actions is useful. Thus the methods still stand as a reasonable methodology for auditing algorithms and for testing and confirming algorithmic transparency.
Statistical inferences about the algorithm’s behaviors can be drawn from the sample data collected and from the code disclosure. This will highlight errors that happen more frequently than expected, reveal trends that were too subtle to notice, or uncover inferred information about the subject that is being used or impacted (e.g. algorithms that discriminate based on zip codes when those zip codes strongly correlate with race). From there, appropriate mitigation plans can be created and approved. This proactive approach could have helped prevent many incidents, such as the Staples algorithm giving discounts to the wealthy.
Once the tests have been concluded, a purpose and methodology report can be published. This report will be presented to the governing body to explain the data utilized and inferred by the algorithm, its purpose, and how the data and the algorithm achieve that purpose. In the report, the organization’s expectation of the algorithm’s long-term effects on consumers and the environment is to be addressed, as well as any other risks. The organization will also report the number of people affected by the algorithm over whatever time span is determined appropriate given how long the algorithm is expected to be in use. This is in addition to sections on potential failure modes, error rates and how they are mitigated, and their potential impact on the user and the environment.
The report will also cover any harmful effects of the algorithm’s efficiency and protocol on the users and environment involved (socio-technical impact). As we’ve seen, even when an algorithm is known, understood and operates as expected, it can still have tremendous power, as best explored in studies done by Robert Epstein and Ronald E. Robertson (reference 13). This will encourage studying the long-term impact of an algorithm and help prevent unexpected damage.
These regulations can best keep the algorithm’s proprietary information protected by ensuring the source code is only reviewed by trustworthy government officials. The organization then has to be able to review and suggest redactions for the final system’s “Purpose and Methods” disclosure before it is published. The proposed redactions must have supporting evidence for any rationale provided. Using these guidelines, any large or small entity will benefit from increased standards and benchmark reporting, algorithmic research, and security of its proprietary information.
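A sock puppet or scraping audit can be sketched in a few lines. In the example below the audited system is replaced by a simulated stand-in, since I am not assuming access to any real platform: `simulated_platform`, the `income_band` field and the discount values are all hypothetical, chosen only to mirror the kind of pricing behavior discussed earlier. The auditor sends repeated queries from dummy profiles and compares the average response per group.

```python
import random
from collections import defaultdict


def simulated_platform(profile, rng):
    """Hypothetical stand-in for the audited system: returns a quoted discount."""
    discount = 0.05
    if profile["income_band"] == "high":  # hidden rule the audit should surface
        discount += 0.05
    return discount + rng.uniform(-0.01, 0.01)  # responses vary from query to query


def run_sock_puppet_audit(query, profiles, repeats=50, seed=0):
    """Query the system repeatedly with each dummy profile and return the
    mean response per income band."""
    rng = random.Random(seed)
    responses = defaultdict(list)
    for profile in profiles:
        for _ in range(repeats):
            responses[profile["income_band"]].append(query(profile, rng))
    return {band: sum(vals) / len(vals) for band, vals in responses.items()}


profiles = [{"income_band": "low"}, {"income_band": "high"}]
mean_discount = run_sock_puppet_audit(simulated_platform, profiles)
gap = mean_discount["high"] - mean_discount["low"]
print(f"mean discount by band: {mean_discount}, gap: {gap:.3f}")
```

In a real audit the `query` argument would wrap whatever access the auditor has been granted to the system, and the profiles would vary along every attribute the regulator has chosen to review.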
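The zip code example can be illustrated with a small sketch. Here a made-up decision rule looks only at zip code, yet the favourable-outcome rates differ sharply between two demographic groups because the groups are unevenly spread across zip codes. The zip codes, group labels, sample sizes and the 0.8 review threshold are all assumptions for illustration; the threshold echoes the “four-fifths” rule of thumb used in U.S. employment contexts and is not a proposed standard.

```python
from collections import Counter


def favourable_rates(records):
    """records: iterable of (group, favourable) pairs.
    Returns the share of favourable outcomes for each group."""
    totals, favourable = Counter(), Counter()
    for group, ok in records:
        totals[group] += 1
        if ok:
            favourable[group] += 1
    return {group: favourable[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest


def zip_only_decision(zip_code):
    """Hypothetical algorithm under audit: it never sees the group label."""
    return zip_code == "11111"


# Hypothetical audit sample of (zip code, demographic group) pairs.
sample = (
    [("11111", "X")] * 45 + [("11111", "Y")] * 5
    + [("22222", "X")] * 5 + [("22222", "Y")] * 45
)
records = [(group, zip_only_decision(zip_code)) for zip_code, group in sample]
rates = favourable_rates(records)
ratio = impact_ratio(rates)
print(rates, round(ratio, 3), "flag for review" if ratio < 0.8 else "ok")
```

A check like this only needs the sampled inputs and outputs plus demographic data the auditor joins in afterwards, so it can be run even when the source code is unavailable.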
Down Stream

The new and abundant amounts of data and processing power available to algorithms make it clear that some algorithms may be able to help society and therefore should be created, implemented and regulated for the public. These social engineering algorithms may help raise the quality of life for people all around the world.

The WHO could provide a free app to quickly educate people on common and serious diseases, and use the queries about communicable diseases to get better estimates and predictive variables for preventing the spread of disease. Renewable energy businesses can use traffic volume data and weather data to better estimate energy usage and prepare for weather events that affect energy creation. An example of this is how solar panels adjust to the direction of the sun.

With all of the data being collected, edited and deleted, and the algorithms using and manipulating it, people’s exposure to and understanding of algorithms are sure to rise. Publicly reviewed and tested algorithms are creating standards in their respective domains. Examples include open-source software and commonly used methodologies such as k-means or regression. These programs are created for and maintained by everyday users but are still used by many large companies as well, showing that algorithmic transparency has had, has, and will have a place in our society.

References:

1.
https://epic.org/algorithmic-transparency/
Editorial. "Algorithmic Transparency: End Secret Profiling." EPIC. Electronic Privacy Information Center, 10 Dec. 2015. Web. 20 Jan. 2016.

2.
http://www.datasociety.net/pubs/2014-0317/AlgorithmicAccountabilityPrimer.pdf
Rosenblat, Alex, Tamara Kneese, and Danah Boyd. "Workshop Primer: Algorithmic Accountability." SSRN Electronic Journal SSRN Journal (n.d.): n. pag. Algorithmic Accountability. Data & Society Research Institute, 17 Mar. 2014. Web. 26 Jan. 2016.

3.

http://blogs.wsj.com/digits/2014/12/08/trends-to-watch-in-2015-from-algorithmic-accountability-to-the-uber-of-x/
Dwoskin, Elizabeth. "Trends to Watch in 2015: From Algorithmic Accountability to the Uber of X." Digits RSS. Wall Street Journal, 08 Dec. 2014. Web. 23 Jan. 2016.

4.
Blank

5.
http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms." (n.d.): n. pag. Center for Media, Data and Society, 17 Nov. 2014. Web. 26 Jan. 2016.

6.
https://civic.mit.edu/blog/natematias/uncovering-algorithms-looking-inside-the-facebook-news-feed
Matias, Nathan J. "Uncovering Algorithms: Looking Inside the Facebook Newsfeed." MIT Center for Civic Media, 22 July 2014. Web. 28 Jan. 2016.

7.
http://wwftcw.computerworld.com/article/2908157/the-ftc-is-worried-about-algorithmic-transparency-and-you-should-be-too.html
Noyes, Katherine. "Don't Assume Your Facebook Friends Are Ignoring You; It Could Simply Be the Site's Algorithms at Work." ComputerWorld, 9 Apr. 2015. Web. 30 Jan. 2016.

8.
http://www.investopedia.com/articles/active-trading/041515/highfrequency-trading-regulations.asp
McFarlane, Greg. "High-Frequency Trading Regulations (ETFC) | Investopedia." Investopedia. Investopedia, 15 Apr. 2015. Web. 1 Feb. 2016.

9.
http://www.nickdiakopoulos.com/2013/08/06/algorithmic-defamation-the-case-of-the-shameless-autocomplete/
Diakopoulos, Nick. "Algorithmic Defamation: The Case of the Shameless Autocomplete." Musings on Media. NickDiakopoulos, 6 Aug. 2013. Web. 1 Feb. 2016.

10.
http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/algorithmic_accountability_final.pdf
Diakopoulos, Nicholas. "Algorithmic Accountability." Digital Journalism 3.3 (2014): 398-415. Musings on Media. Musings on Media, 7 Nov. 2014. Web. 3 Feb. 2016.

11.
http://beyondtransparency.org/chapters/part-5/open-data-and-algorithmic-regulation/
O'Reilly, Tim. "Open Data and Algorithmic Regulation." Open Data and Algorithmic Regulation. Beyond Transparency, 16 Oct. 2013. Web. 4 Feb. 2016.

12.
http://venturebeat.com/2016/01/12/10-text-sentiment-and-social-analytics-trends-for-2016/
Grimes, Seth. "10 Text, Sentiment, and Social Analytics Trends For 2016." VentureBeat. Venture Beat, 12 Jan. 2016. Web. 4 Feb. 2016.

13.
http://www.pnas.org/content/112/33/E4512.full.pdf
Epstein, Robert, and Ronald E. Robertson. "The Search Engine Manipulation Effect (SEME) and Its Possible Impact on the Outcomes of Elections." PNAS 112.33 (2015): E4512-E4521. PNAS. American Institute for Behavioral Research and Technology, 4 Aug. 2015. Web. 5 Feb. 2016.

14.
http://www.nytimes.com/2010/03/13/technology/13netflix.html?_r=0
Lohr, Steve. "Netflix Cancels Contest After Concerns Are Raised About Privacy." The New York Times. The New York Times, 12 Mar. 2010. Web. 6 Feb. 2016.

15.
Tufekci, Zeynep, Jillian C. York, Ben Wagner, and Frederike Kaltheuner. "The Ethics of Algorithms: From Radical Content to Self-driving Cars." Center for Internet and Human Rights (n.d.): n. pag. GCCS 2015. Web. 5 Feb. 2016.

16.
http://www.nytimes.com/2014/05/30/business/international/on-the-internet-the-right-to-forget-vs-the-right-to-know.html?_r=0
Hakim, Danny. "Right to Be Forgotten? Not That Easy." The New York Times. The New York Times, 29 May 2014. Web. 6 Feb. 2016.

17.
http://www.nytimes.com/2013/02/04/opinion/europe-moves-ahead-on-privacy-laws.html
Editorial. "Europe Moves Ahead on Privacy." The New York Times. The New York Times, 03 Feb. 2013. Web. 6 Feb. 2016.
