How should firms navigate data ethics?
We often see news articles on how the most recent cyber-attack has exposed huge swathes of sensitive data, compromising user accounts, bank details and passwords. However, has anyone ever questioned the way in which companies collect and use this ocean of data as opposed to how they are protecting it?
Data ethics denotes the moral principles and guidelines that govern the collection, handling, and use of data. With data generation expected to reach 180 zettabytes by 2025, up from 64.2 zettabytes in 2020, it has become more important than ever to consider the ethical implications of how and why data is used. Firms collecting this data need to be aware of how they can avoid falling into unethical practices and inappropriate data usage.
The key problems with modern data usage
Data capture and storage have evolved in parallel with the digital revolution. Think of technology as the tap and plumbing, and data as the water that flows through it. In the past, collecting data required significant effort and resources, which restricted the activity to cases where there was a specific use for it, such as a national census. With the proliferation of apps and websites, however, data is now captured effortlessly and at almost zero cost, and as a result often without due consideration.
The key problems with modern data usage include:
The practice of recording everything, partly because the data might be useful in the future and partly because it is so cheap to record and store. This approach is contrary to the data minimisation principle: data should only be stored if it is necessary for some purpose (GDPR chapter II, article 5 requires personal data to be “adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed”). It represents a fundamentally different attitude to data: possibility driven rather than purpose driven.
Nobody really knows how the data will be used in the future, or what other data it will be linked with. Emerging technologies such as Artificial Intelligence (AI) can merge, analyse and rework multiple datasets to produce remarkable insights, but they can also produce harmful and biased outputs. This means we cannot usefully classify datasets by sensitivity or by potential use, since the potential uses are unlimited and unforeseeable. It is therefore not the data per se that raises ethical issues, but the use to which it is put and the analysis it is subjected to, usually at a future date.
Cambridge Analytica provides a well-documented example. The political consulting firm harvested the personal data of millions of Facebook users without their consent and used it to target political adverts aimed at influencing the outcome of the 2016 US presidential election. Although both obtaining the personal data of millions and using it for political advantage were unethical and violated privacy, the way the data was applied had far greater implications than the act of harvesting it.
Regardless of the scientific, financial or political advantages a dataset may seem to offer, such benefits need to be weighed against their ethical implications. Violations of consumers’ data rights and unjust uses of data can easily be overlooked. This is a common dilemma for data scientists, who have to question the quality and validity of the data they receive before analysing it.
How must data ethics evolve with AI?
A US study conducted in 2019 highlights a case of algorithmic discrimination in the health care system: an algorithm widely used in US hospitals to allocate health care to patients had been systematically discriminating against black people. The algorithm was found to be less likely to refer black people than equally sick white people to programmes that aim to improve care for patients with complex medical needs. It was reportedly used to help manage care for about 200 million people in the US each year.
The study found that the algorithm assigned risk scores to patients based on their total healthcare costs accrued in one year. This is believed to have disadvantaged black patients, who spent less on healthcare than white patients but were substantially sicker. Data scientists attributed this to systemic racism, which was unknowingly passed on to the algorithm's results. Only 17.7% of the patients that the algorithm assigned to receive extra care were black; the proportion would be 46.5% if the algorithm were unbiased.
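The mechanism described above, where cost stands in for health need, can be illustrated with a small sketch. Everything here is a hypothetical assumption made for illustration: the `Patient` fields, the group labels "A" and "B", and the synthetic figures are not taken from the study. The sketch only shows how ranking by spend can under-select a group that spends less at the same level of illness.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str        # hypothetical group label, "A" or "B"
    conditions: int   # number of chronic conditions, a stand-in for health need
    annual_cost: int  # healthcare spend over one year


def flagged_share(patients, score, group, top_n):
    """Share of `group` among the top_n patients when ranked by `score`."""
    ranked = sorted(patients, key=score, reverse=True)[:top_n]
    return sum(p.group == group for p in ranked) / top_n


# Synthetic cohort (assumption): at every level of illness,
# group B spends less than group A.
patients = []
for conditions in range(1, 6):
    patients.append(Patient("A", conditions, 1000 * conditions))
    patients.append(Patient("B", conditions, 550 * conditions))

# Rank by cost (the proxy) versus by conditions (the need itself).
share_by_cost = flagged_share(patients, lambda p: p.annual_cost, "B", top_n=4)
share_by_need = flagged_share(patients, lambda p: p.conditions, "B", top_n=4)

print(f"Group B share when ranking by cost: {share_by_cost:.0%}")
print(f"Group B share when ranking by need: {share_by_need:.0%}")
```

In this toy cohort, group B makes up a quarter of the patients flagged by cost but half of those flagged by need, even though both groups are equally sick. A real audit would compare the proxy against clinical measures of need rather than a single synthetic field.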
This is an example of a black box algorithm, where the user and developer cannot easily decipher the inner workings of the algorithm and are therefore ‘boxed’ out from fully understanding how it has reached its decisions. As the AI within such algorithms learns from the data itself, it can pick up unintentional biases that are hidden within its inner workings and can result in discrimination. These issues have raised concerns about the ethical impact and unintended consequences of new technologies for society across every sector where data-driven innovation is taking place. [5, 6]
Black box algorithms must be tested in numerous ways to ensure they will work properly in the real world. Once the models are deployed, only a very limited number of people will be able to check and correct them.
Consequently, the ethics of AI has become a major challenge for governments and societies, and many of the world’s leading universities have recently created multidisciplinary research centres focused primarily on responsible AI. For example, the IEEE, widely known for developing technological standards, has established a Global Initiative on Ethics of Autonomous and Intelligent Systems. Furthermore, the Information Commissioner's Office (ICO) has updated its Guidance on AI and Data Protection to provide further clarification on fairness in AI, at the request of UK industry. This underlines the growing emphasis on developing ethical standards for the use of AI.
How should companies be tackling data ethics?
Not only are the ethical implications of these data practices of concern to the people affected by them, but they can also affect the brand image, reputation and credibility of the firms that employ them. Data must therefore be considered from both a commercial and an ethical perspective. How, then, should firms manage the fine line between profitable data use and unethical data use?
We must turn our attention to a growing source of data bias and discrimination: our ML and AI models. There is growing emphasis on understanding these algorithms and designing them with ethical considerations in mind from the outset. Explainable artificial intelligence (XAI) is a field which aims to make AI models more explainable, intuitive, and understandable to human users without sacrificing performance or prediction accuracy. It does this by adding an extra layer of algorithms that helps users understand the factors that positively or negatively affect the AI model's outcome. There is heightened interest in this from many consumer advocacy groups, counterparties and internal stakeholders at firms. Some businesses have already started rolling this technology out, such as Google Cloud’s XAI platform, Flowcast and Fiddler Labs. Firms could see more of their pilot projects come to light as a result, since a lack of explainability can be a major hurdle for deploying AI models.
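The idea of showing which factors push a model's outcome up or down can be sketched in its simplest form with a linear scoring model, where each feature's contribution is its weight multiplied by how far the input deviates from a baseline. This is only an illustrative assumption, not how any of the platforms named above work; the feature names, weights and baseline values are hypothetical.

```python
def explain_linear(weights, baseline, applicant):
    """Per-feature contribution to the score, relative to a baseline input."""
    return {
        name: weights[name] * (applicant[name] - baseline[name])
        for name in weights
    }


# Hypothetical credit-scoring model: points added or removed per unit of each feature.
weights = {"income_k": 2, "missed_payments": -50, "years_at_address": 5}
# Hypothetical "typical applicant" used as the reference point.
baseline = {"income_k": 40, "missed_payments": 1, "years_at_address": 5}
applicant = {"income_k": 55, "missed_payments": 3, "years_at_address": 2}

contributions = explain_linear(weights, baseline, applicant)

# Order the factors by the size of their effect, largest first.
ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)

for name, points in ranked:
    direction = "raises" if points > 0 else "lowers"
    print(f"{name} {direction} the score by {abs(points)} points")
```

For genuinely opaque models the contributions cannot be read off the weights like this, which is why XAI tools typically estimate them with additional algorithms layered on top of the model.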
However, the root cause of poor data-ethics programs in firms also needs to be addressed: a culture that fails to evolve. Firms need to embed data-ethics programs across the entire C-suite. A culture of adherence to strict data ethics principles prevents them from being overlooked. Furthermore, setting expectations up front around data usage is vital for protecting customers’ data from unethical use. This can be achieved with a clear identity and access-management system that ensures only those with the right training and privileges can access sensitive data.
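As a rough sketch of the access rule described above, the check below grants access to sensitive data only when a user's role covers that tier of data and the user has completed the relevant training. The role names, the two sensitivity tiers and the `ethics_training_done` flag are all hypothetical; a real identity and access-management system would be considerably richer.

```python
from dataclasses import dataclass

# Hypothetical mapping from role to the data sensitivity tiers it may read.
ROLE_TIERS = {
    "analyst": {"aggregated"},
    "data_steward": {"aggregated", "personal"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    ethics_training_done: bool


def can_access(user: User, tier: str) -> bool:
    """Allow access only if the role covers the tier and, for personal data,
    the user has completed training. Unknown roles are denied by default."""
    if tier not in ROLE_TIERS.get(user.role, set()):
        return False
    if tier == "personal":
        return user.ethics_training_done
    return True


alice = User("alice", "data_steward", ethics_training_done=True)
bob = User("bob", "data_steward", ethics_training_done=False)
carol = User("carol", "analyst", ethics_training_done=True)

for user in (alice, bob, carol):
    print(user.name, "may read personal data:", can_access(user, "personal"))
```

Denying by default when a role is unknown keeps the rule conservative: access has to be granted explicitly rather than assumed.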
Finally, to ensure these values are implemented and an organisation truly benefits from applying data ethics principles, a governance body is needed: a data-ethics board. This would be a cross-functional committee composed of representatives from across the business, legal, IT, analysis and the C-suite, with the aim of defining data standards and upholding them at all levels of the business. It is still key for data owners and business departments to comply with adopted policies, but now key stakeholders from all levels of the business have oversight over whether these standards are being observed.
Transparency is perhaps the key ingredient to tackling data discrimination, privacy, and security issues. Firms should know, at the very least, what information is being gathered, for what purposes, who has access to it, and what kinds of security controls are in place. Regularly evaluating the validity of models will also ensure issues with black box algorithms are identified at an earlier stage.
The remarkable benefits accruing to firms, organisations and consumers through data-driven capabilities will continue to expand for many years to come. When data ethics align with a firm’s data activities, they enhance the firm’s capabilities rather than restricting them. Strict adherence to the principles of fairness and ethics in big data analytics will only strengthen a firm’s reputation as a brand. Firms can then carry out their data-driven projects with confidence and cutting-edge innovation, knowing they have put procedures in place to prevent negative outcomes and to ensure ethics are upheld.
About the author
Subhan is a data science apprentice at Be | Shaping the Future UK and currently studying for a BSc in Digital and Technology Solutions.
Throughout his career to date he has developed a well-rounded understanding of data science principles and how to apply them to work-based projects, having worked on a variety of projects in the financial services sector.
References
[1] P. Taylor, “Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025,” Statista, 8 September 2022. [Online]. Available: https://www.statista.com/statistics/871513/worldwide-data-created/.
[2] D. J. Hand, “Aspects of Data Ethics in a Changing World: Where Are We Now?,” Big Data, 2018, pp. 176-190.
[3] C. Cadwalladr and E. Graham-Harrison, The Guardian, 17 March 2018. [Online]. Available: https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election.
[4] Z. Obermeyer et al., “Dissecting racial bias in an algorithm used to manage the health of populations,” Science, 2019, pp. 447-453.
[5] J. Ayling and A. Chapman, “Putting AI ethics to work: are the tools fit for purpose?,” AI Ethics, vol. 2, pp. 405-429, 2022.
[6] A. Jobin, M. Ienca and E. Vayena, “The global landscape of AI ethics guidelines,” Nature Machine Intelligence, vol. 1, pp. 389-399, 2019.
[7] A. Surkov, J. Gregorie and V. Srinivas, “Unleashing the power of machine learning models in banking through explainable artificial intelligence (XAI),” 2022.