
Big data ethics

From Wikipedia, the free encyclopedia

On closer inspection, datasets often reveal details that are not superficially visible, as in this case, where corneal reflections on the eye of the photographed person provide information about bystanders, including the photographer. Data ethics considers the implications.

Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data.[1] Since the dawn of the Internet, the sheer quantity and quality of data have increased dramatically and continue to do so exponentially. Big data describes data so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records, and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics grows in relevance as the quantity of data increases, because the scale of its impact grows with it.

Big data ethics differs from information ethics: information ethics is more concerned with issues of intellectual property and with concerns relating to librarians, archivists, and information professionals, while big data ethics is more concerned with collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence or machine learning systems are regularly built using big data sets, discussions surrounding data ethics are often intertwined with those in the ethics of artificial intelligence.[2] More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.

Principles


Data ethics is concerned with the following principles:[3]

  • Ownership – Individuals own their personal data.
  • Transaction transparency – If an individual's personal data is used, they should have transparent access to the algorithm design used to generate aggregate data sets.
  • Consent – If an individual or legal entity would like to use personal data, they need the informed and explicitly expressed consent of the data's owner regarding what personal data moves to whom, when, and for what purpose.
  • Privacy – If data transactions occur, all reasonable effort needs to be made to preserve privacy.
  • Currency – Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
  • Openness – Aggregate data sets should be freely available.

Ownership


Ownership of data involves determining rights and duties over property, such as the ability to exercise individual control over (including limiting the sharing of) personal data comprising one's digital identity. The question of data ownership arises when someone records observations on an individual person. The observer and the observed both state a claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant as the Internet magnifies the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership and intellectual property.[4]

In the European Union, some people argue that the General Data Protection Regulation indicates that individuals own their personal data, although this is contested.[5]

Transaction transparency


Concerns have been raised about how biases can be integrated into algorithm design, resulting in systematic oppression,[6] whether consciously or unconsciously. Such outcomes often stem from biases in the data, the design of the algorithm, or the underlying goals of the organization deploying it. One major cause of algorithmic bias is that algorithms learn from historical data, which may perpetuate existing inequities. In many cases, algorithms exhibit reduced accuracy when applied to individuals from marginalized or underrepresented communities. A notable example of this is pulse oximetry, which has shown reduced reliability for certain demographic groups due to a lack of sufficient testing or information on these populations.[7] Additionally, many algorithms are designed to maximize specific metrics, such as engagement or profit, without adequately considering ethical implications. For instance, companies like Facebook and Twitter have been criticized for providing anonymity to harassers and for allowing racist content disguised as humor to proliferate, as such content often increases engagement.[8] These challenges are compounded by the fact that many algorithms operate as "black boxes" for proprietary reasons, meaning that the reasoning behind their outputs is not fully understood by users. This opacity makes it more difficult to identify and address algorithmic bias.

In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms.[9]

Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors.[10] This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination.[10] For example, predictive policing highlights certain groups or neighborhoods that should be watched more closely than others, which leads to more sanctions in these areas and closer surveillance of those who fit the same profiles as those who are sanctioned.[11]

The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed.[10] This practice is seen with airline industry data which has been repurposed for profiling and managing security risks at airports.[10]

Privacy


Privacy has been presented as a limitation to data usage, a limitation which could itself be considered unethical.[12] For example, the sharing of healthcare data can shed light on the causes of diseases and the effects of treatments, and can allow for tailored analyses based on individuals' needs.[12] This is of ethical significance in the big data ethics field because, while many value privacy, the affordances of data sharing are also quite valuable, although they may contradict one's conception of privacy. Attitudes against data sharing may be based in a perceived loss of control over data and a fear of the exploitation of personal data.[12] However, it is possible to extract the value of data without compromising privacy.
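One simple way to illustrate extracting aggregate value while limiting the exposure of individuals is small-count suppression: only group counts at or above a threshold are published. The following Python sketch is an illustration under stated assumptions, not a method described in the cited sources. The function name `suppressed_counts`, the threshold of 5, and the sample data are all hypothetical, and suppression alone does not guarantee privacy.

```python
from collections import Counter


def suppressed_counts(values, min_count=5):
    """Count occurrences of each value, dropping groups smaller than min_count.

    Small groups are withheld because a very small count can point to
    identifiable individuals. The threshold is an assumed, illustrative value.
    """
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= min_count}


# Hypothetical records: one diagnosis label per patient.
diagnoses = ["asthma"] * 7 + ["diabetes"] * 6 + ["rare condition"] * 2

released = suppressed_counts(diagnoses, min_count=5)
print(released)
```

In this toy data set the two larger groups are released as counts, while the group with only two members is withheld.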

Government surveillance of big data has the potential to undermine individual privacy by collecting and storing data on phone calls, internet activity, and geolocation, among other things. For example, the NSA’s collection of metadata exposed in global surveillance disclosures raised concerns about whether privacy was adequately protected, even when the content of communications was not analyzed. The right to privacy is often complicated by legal frameworks that grant governments broad authority over data collection for “national security” purposes. In the United States, the Supreme Court has not recognized a general right to "informational privacy," or control over personal information, though legislators have addressed the issue selectively through specific statutes.[13] From an equity perspective, government surveillance and privacy violations tend to disproportionately harm marginalized communities. Historically, activists involved in the Civil rights movement were frequently targets of government surveillance as they were perceived as subversive elements. Programs such as COINTELPRO exemplified this pattern, involving espionage against civil rights leaders. This pattern persists today, with evidence of ongoing surveillance of activists and organizations.[14]

Additionally, the use of algorithms by governments to act on data obtained without consent introduces significant concerns about algorithmic bias. Predictive policing tools, for example, utilize historical crime data to predict “risky” areas or individuals, but these tools have been shown to disproportionately target minority communities.[15] One such tool, the COMPAS system, is a notable example; Black defendants are twice as likely to be misclassified as high risk compared to white defendants, and Hispanic defendants are similarly more likely to be classified as high risk than their white counterparts.[16] Marginalized communities often lack the resources or education needed to challenge these privacy violations or protect their data from nonconsensual use. Furthermore, there is a psychological toll, known as the “chilling effect,” where the constant awareness of being surveilled disproportionately impacts communities already facing societal discrimination. This effect can deter individuals from engaging in legal but potentially "risky" activities, such as protesting or seeking legal assistance, further limiting their freedoms and exacerbating existing inequities.
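The kind of disparity described above is often expressed as a difference in false positive rates between groups, that is, the share of people who did not reoffend but were nevertheless labelled high risk. The following Python sketch shows how such a comparison could be computed. It is an illustration only: the function name `false_positive_rates`, the record layout, and the toy records are hypothetical and are not taken from the COMPAS data or the cited analyses.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: false positive rate}.

    Each record is (group, predicted_high_risk, reoffended).
    A false positive is a person labelled high risk who did not reoffend.
    """
    false_positives = defaultdict(int)
    non_reoffenders = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            non_reoffenders[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in non_reoffenders.items()}


# Hypothetical toy records for two groups, "A" and "B".
records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]

rates = false_positive_rates(records)
disparity = rates["A"] / rates["B"]
print(rates, disparity)
```

In this toy example, half of group A's non-reoffenders are labelled high risk compared with a quarter of group B's, so group A is misclassified twice as often.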

Some scholars, such as Jonathan H. King and Neil M. Richards, are redefining the traditional meaning of privacy, while others question whether privacy still exists.[9] In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of regulations which govern and control the use of personal information.[9] In the European Union, the right to be forgotten entitles EU countries to force the removal or de-linking of personal data from databases at an individual's request if the information is deemed irrelevant or out of date.[17] According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and the ability to govern personal data in the digital age.[18] In the United States, citizens have the right to delete voluntarily submitted data.[17] This is very different from the right to be forgotten because much of the data produced using big data technologies and platforms is not voluntarily submitted.[17] While traditional notions of privacy are under scrutiny, different legal frameworks related to privacy in the EU and US demonstrate how countries are grappling with these concerns in the context of big data. For example, the "right to be forgotten" in the EU and the right to delete voluntarily submitted data in the US illustrate the varying approaches to privacy regulation in the digital age.[19]

How much data is worth


The gap between the value of the services that tech companies provide and the equity value of those companies can be read as the gap between the exchange rate offered to the citizen and the "market rate" for their data. Scientifically, there are many holes in this rudimentary calculation: the financial figures of tax-evading companies are unreliable; it is unclear whether revenue or profit is the more appropriate measure; it is unclear how a user should be defined; a large number of individuals is needed for the data to be valuable; and prices may be tiered for different people in different countries. Although these calculations are crude, they serve to make the monetary value of data more tangible. Another approach is to find the data trading rates in the black market. RSA publishes a yearly cybersecurity shopping list that takes this approach.[20]
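The rudimentary calculation described above can be written out as simple arithmetic. The following Python sketch is an illustration under stated assumptions: the function name `implied_value_per_user` and all figures are hypothetical, and the result inherits every weakness listed above, such as unreliable financial figures and the ambiguous definition of a user.

```python
def implied_value_per_user(equity_value, service_value, users):
    """Crude per-user estimate of the gap between what a company is worth
    and the value of the services it provides to its users.

    All inputs are in the same currency unit; users must be positive.
    """
    if users <= 0:
        raise ValueError("users must be a positive number")
    return (equity_value - service_value) / users


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
equity_value = 500_000_000_000   # assumed equity value of the company
service_value = 100_000_000_000  # assumed total value of services provided
users = 2_000_000_000            # assumed number of users

per_user = implied_value_per_user(equity_value, service_value, users)
print(per_user)  # 200.0
```

With these assumed figures, the implied gap is 200 currency units per user. Swapping equity value for revenue or profit, or changing how users are counted, changes the result substantially, which is why the estimate should be treated as a rough indication only.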

This raises the economic question of whether receiving free tech services in exchange for personal data is a worthwhile implicit exchange for the consumer. In the personal data trading model, rather than companies selling data, an owner can sell their personal data and keep the profit.[21]

Openness


The idea of open data is centered around the argument that data should be freely available and should not be subject to restrictions that would prohibit its use, such as copyright laws. As of 2014, many governments had begun to move towards publishing open datasets for the purpose of transparency and accountability.[22] This movement has gained traction via "open data activists" who have called for governments to make datasets available so that citizens can extract meaning from the data and perform checks and balances themselves.[22][9] King and Richards have argued that this call for transparency includes a tension between openness and secrecy.[9]

Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate.[23] To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency.[23]

Open Knowledge Foundation (OKF) lists several dataset types it argues should be provided by governments for them to be truly open.[24] OKF has a tool called the Global Open Data Index (GODI), a crowd-sourced survey for measuring the openness of governments,[24] based on its Open Definition. GODI aims to be a tool for providing feedback to governments about the quality of their open datasets.[25]

Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials.[26]

Historical cases


Snowden disclosures


The fallout from Edward Snowden’s disclosures in 2013 significantly reshaped public discourse around data collection and the privacy principle of big data ethics. The case revealed that governments controlled and possessed far more information about civilians than previously understood, violating the principle of ownership, particularly in ways that disproportionately affected disadvantaged communities. For instance, activists were frequently targeted, including members of movements such as Occupy Wall Street and Black Lives Matter.[14] This revelation prompted governments and organizations to revisit data collection and storage practices to better protect individual privacy while also addressing national security concerns. The case also exposed widespread online surveillance of other countries and their citizens, raising important questions about data sovereignty and ownership. In response, some countries, such as Brazil and Germany, took action to push back against these practices.[14] However, many developing nations lacked the necessary technological independence, or were too dependent on the nations surveilling them, to resist such surveillance, leaving them at a disadvantage in addressing these concerns.

Cambridge Analytica scandal


The Cambridge Analytica scandal highlighted significant ethical concerns in the use of big data. Data was harvested from approximately 87 million Facebook users without their explicit consent and used to display targeted political advertisements. This violated the currency principle of big data ethics, as individuals were initially unaware of how their data was being exploited. The scandal revealed how data collected for one purpose could be repurposed for entirely different uses, bypassing users' consent and emphasizing the need for explicit and informed consent in data usage.[27] Additionally, the algorithms used for ad delivery were opaque, challenging the principles of transaction transparency and openness. In some cases, the political ads spread misinformation,[27] often disproportionately targeting disadvantaged groups and contributing to knowledge gaps. Marginalized communities and individuals with lower digital literacy were disproportionately affected as they were less likely to recognize or act against exploitation. In contrast, users with more resources or digital literacy could better safeguard their data, exacerbating existing power imbalances.



Footnotes

  1. ^ Kitchin, Rob (August 18, 2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE. p. 27. ISBN 9781473908253.
  2. ^ Floridi, Luciano; Taddeo, Mariarosaria (December 28, 2016). "What is data ethics?". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 374 (2083): 20160360. Bibcode:2016RSPTA.37460360F. doi:10.1098/rsta.2016.0360. ISSN 1364-503X. PMC 5124072. PMID 28336805.
  3. ^ Cote, Catherine (March 16, 2021). "5 Principles of Data Ethics for Business". Harvard Business School Online. Retrieved September 7, 2022.
  4. ^ Cofone, Ignacio (2021). "Beyond Data Ownership". Cardozo Law Review. p. 501.
  5. ^ van Ooijen, I.; Vrabec, Helena U. (December 11, 2018). "Does the GDPR Enhance Consumers' Control over Personal Data? An Analysis from a Behavioural Perspective". Journal of Consumer Policy. 42 (1): 91–107. doi:10.1007/s10603-018-9399-7. hdl:2066/216801. ISSN 0168-7034. S2CID 158945891.
  6. ^ O'Neil, Cathy (2016). Weapons of Math Destruction. Crown Books. ISBN 978-0553418811.
  7. ^ Buolamwini, Joy; Gebru, Timnit (2018). "Gender shades: Intersectional accuracy disparities in commercial gender classification" (PDF). Proceedings of the Conference on Fairness, Accountability, and Transparency. 81: 1–15. Retrieved December 11, 2024.
  8. ^ Farkas, Johan; Matamoros-Fernandez, Ariadna (January 22, 2021). "Racism, Hate Speech, and Social Media: A Systematic Review and Critique". Television & New Media. 22 (2): 205–224. Retrieved December 11, 2024.
  9. ^ a b c d e Richards, Neil M.; King, Jonathan H. (2014). "Big data ethics". Wake Forest Law Review. 49: 393–432. SSRN 2384174.
  10. ^ a b c d Kitchin, Rob (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications. pp. 178–179.
  11. ^ Zwitter, A. (2014). "Big Data Ethics". Big Data & Society. 1 (2): 4. doi:10.1177/2053951714559253.
  12. ^ a b c Kostkova, Patty; Brewer, Helen; de Lusignan, Simon; Fottrell, Edward; Goldacre, Ben; Hart, Graham; Koczan, Phil; Knight, Peter; Marsolier, Corinne; McKendry, Rachel A.; Ross, Emma; Sasse, Angela; Sullivan, Ralph; Chaytor, Sarah; Stevenson, Olivia; Velho, Raquel; Tooke, John (February 17, 2016). "Who Owns the Data? Open Data for Healthcare". Frontiers in Public Health. 4: 7. doi:10.3389/fpubh.2016.00007. PMC 4756607. PMID 26925395.
  13. ^ Gellman, Barton; Adler-Bell, Sam. "The Disparate Impact of Surveillance". The Century Foundation. Retrieved December 11, 2024.
  14. ^ a b c Von Solms, Sune; Van Heerden, Renier (2015). "The consequences of Edward Snowden NSA related information disclosures". Proceedings of the 10th International Conference on Cyber Warfare and Security, ICCWS 2015: 358–368. Retrieved December 11, 2024.
  15. ^ Larson, Jeff; Mattu, Surya; Kirchner, Lauren; Angwin, Julia. "How We Analyzed the COMPAS Recidivism Algorithm". ProPublica. Retrieved December 11, 2024.
  16. ^ Hamilton, Melissa (2019). "The biased algorithm: Evidence of disparate impact on Hispanics" (PDF). American Criminal Law Review. 56 (4).
  17. ^ a b c Walker, R. K. (2012). "The Right to be Forgotten". Hastings Law Journal. 64: 257–261.
  18. ^ Hoskins, Andrew (November 4, 2014). "Digital Memory Studies". memorystudies-frankfurt.com. Retrieved November 28, 2017.
  19. ^ "ERRATUM". Ethics & Human Research. 44 (1): 17. January 2022. doi:10.1002/eahr.500113. ISSN 2578-2355. PMID 34910377.
  20. ^ RSA (2018). "2018 Cybersecurity Shopping List" (PDF).
  21. ^ László, Mitzi (November 1, 2017). "Personal Data trading Application to the New Shape Prize of the Global Challenges Foundation". online: Global Challenges Foundation. p. 27. Archived from the original on June 20, 2018. Retrieved June 20, 2018.
  22. ^ a b Kalin, Ian (2014). "Open Data Policy Improves Democracy". SAIS Review of International Affairs. 34 (1): 59–70. doi:10.1353/sais.2014.0006. S2CID 154068669.
  23. ^ a b Baack, Stefan (December 27, 2015). "Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism". Big Data & Society. 2 (2): 205395171559463. doi:10.1177/2053951715594634. S2CID 55542891.
  24. ^ a b Open Knowledge. "Methodology - Global Open Data Index". index.okfn.org. Archived from the original on March 8, 2021. Retrieved November 23, 2017.
  25. ^ Open Knowledge. "About - Global Open Data Index". index.okfn.org. Archived from the original on April 21, 2021. Retrieved November 23, 2017.
  26. ^ Emerce. "Babyboomers willen gegevens niet delen". emerce.nl. Retrieved May 12, 2016.
  27. ^ a b Isaak, Jim; Hanna, Mina J. (August 14, 2018). "User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection". Computer. 51 (8): 56–59.
