We need to talk about AI
Ethical and Regulatory questions facing AI
Regardless of area of expertise, most of us are probably already aware of the momentum around Artificial Intelligence (AI). Between self-driving cars, home assistants (Alexa, Google Home, et al.) and the growing capabilities of our mobile devices, there is no escaping the ever-looming presence of AI in our lives.
Furthermore, it seems unlikely that this will slow down anytime soon. A recent Narrative Science study found that AI adoption grew by 60% in the last year, with 61% of organisations reporting that they had implemented AI within their business, and a Gartner report predicted that by 2020 85% of customer interactions will be managed without human intervention.
But despite this growth, there is still a question mark over whether, and if so, how, the field should be regulated. Having been brought up on decades of sci-fi about AI going rogue and robots enslaving the human race, it feels like there is both fear of this possible future and a sceptical sense that these fears are only the stuff of movies. Elon Musk has famously warned of the future risks of AI: “I think we should be very careful about artificial intelligence. If I had to guess at what our biggest existential threat is, it’s probably that”, whilst others, including Mark Zuckerberg, have downplayed the claims of doomsday scenarios as irresponsible.
So what's the big deal? AI already permeates so many aspects of life and business, but considering for a moment that these technologies could be being used to control autonomous cars on public roads, determine people’s credit score or suitability for a job, to detect illness or even in policing and judicial decision making - it is pretty clear that we should have a good understanding of these technologies and clear systems of accountability and control in place. In all these examples getting a decision wrong has the potential to ruin lives, yet there is still limited regulation, control or even understanding of the algorithms, the data and their usage.
A common analogy is with other heavily regulated industries: big pharma companies can’t release drugs without thorough testing and approval, yet several big tech companies have already started testing autonomous vehicles on public roads with limited regulatory controls. That’s not to say that they have had a completely free pass: there are varying levels of regulation, depending on the region. Arizona has long been promoting itself as an AI-friendly state to try to attract business from big tech, making it as easy as possible for companies to test self-driving cars with minimal regulatory friction, and it recently saw the first fatality from a self-driving car.
In its 2017 report, the AI Now Institute recommended that AI be banned outright from use in any high-risk areas, such as criminal justice, healthcare, welfare and education, along with further measures for other domains - which, given the potential impact of errors in these domains, seems like a fairly sensible starting point.
Uncertainty and the unknown
One key aspect that is especially troubling is the lack of understanding of both the data and the underlying technology. This isn’t necessarily a surprise - we have computers being trained on millions of data points, to the point of being able to outperform humans at their tasks, so it follows that both the inner workings and the end results could be beyond easy comprehension.
This problem has been demonstrated by several high-profile mishaps from large tech companies, showing that even companies that have a wealth of resources and technical expertise in the domain can be caught out - such as Microsoft’s AI chatbot Tay, which quickly became racist when released into the wild. Clearly Microsoft had neither intended nor envisaged that end result. Similarly, when Google Translate revealed gender bias in pairing “he” with “hardworking” and “she” with “lazy”, it clearly wasn’t an intentional or foreseen behaviour, but it eventually revealed itself with wider usage.
Understanding where bias in AI comes from
To get a better understanding of where these biases and blind spots come from, let’s take a look at how AI learns. Broadly speaking, there are three primary approaches to training AI: Supervised, Unsupervised and Reinforcement.
Unsupervised learning is where the AI is fed very large amounts of raw data - for example an entire corpus of fictional texts - and it is left to work out patterns or groupings. That is, it doesn’t know a right or wrong answer, but can identify related things from the dataset and group them together (for example, AI reading popular fiction might group together terms such as “batman” and “wonder woman”, but it would have no knowledge of what these terms actually mean).
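To make the grouping idea concrete, here is a minimal sketch in Python. It is not how any production system works: the four-sentence corpus is made up, the token `wonder_woman` is joined with an underscore purely for convenience, and the helper names (`context_vectors`, `cosine`, `most_similar`) are hypothetical. Each word is described only by the words that appear alongside it, and words with similar neighbours end up grouped together without the program knowing what any of them mean.

```python
from collections import Counter
from math import sqrt

# Tiny made-up corpus standing in for a large body of fiction (an assumption).
corpus = [
    "batman fights crime in the city",
    "wonder_woman fights crime in the city",
    "the baker sells bread in the village",
    "the farmer sells bread in the village",
]

def context_vectors(sentences):
    """Describe each word by counts of the other words in its sentences."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            others = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(others)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(word, vectors, candidates):
    """Return the candidate whose contexts look most like those of `word`."""
    others = [c for c in candidates if c != word]
    return max(others, key=lambda c: cosine(vectors[word], vectors[c]))

vectors = context_vectors(corpus)
names = ["batman", "wonder_woman", "baker", "farmer"]
for name in names:
    print(name, "is grouped with", most_similar(name, vectors, names))
```

No labels are supplied anywhere: the pairing of "batman" with "wonder_woman" falls out purely from the shared contexts in the data, which is also why whatever patterns the data contains, good or bad, are what the model picks up.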
Supervised learning is where the AI is fed very large amounts of marked up data - that is, for each input, it also gets passed the expected output. An example of this is if you had a large set of photos (say Google Photos) which are pre-tagged with descriptions of what is in the photo, the dataset could be used to train an AI to identify contents of a photo.
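As a rough illustration of learning from marked-up data, the sketch below uses a 1-nearest-neighbour classifier, which is just one simple supervised technique chosen for brevity. The two numeric features per photo (imagine "fraction of green pixels" and "fraction of blue pixels"), the labels and the `predict` function are all invented for the example and are not taken from any real photo service.

```python
# Each training example pairs an input (two made-up features per photo)
# with the expected output (a human-supplied tag).
labelled_photos = [
    ((0.80, 0.10), "forest"),
    ((0.70, 0.20), "forest"),
    ((0.10, 0.85), "sea"),
    ((0.20, 0.70), "sea"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features, training_data):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(training_data, key=lambda ex: squared_distance(ex[0], features))
    return label

print(predict((0.75, 0.15), labelled_photos))  # forest
print(predict((0.15, 0.80), labelled_photos))  # sea
```

The classifier can only ever repeat the tags it was given, so any skew in who or what was photographed and how it was tagged is carried straight through to its predictions.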
Reinforcement learning is similar to supervised learning inasmuch as the algorithm gets information as to whether or not it is performing well (like knowing the answer for a given input), but that information comes in the form of a feedback loop, and the process works more like trial and error (it might have a general fitness score function that the algorithm can use to determine whether its response to a given input has been successful, and to adjust its response for the next cycle). A well-known example of this is something like AlphaGo/AlphaZero, where an algorithm learns to play a game like Go or chess by trial and error and gets feedback on its attempted response from the game itself.
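The feedback loop can be sketched with something far simpler than AlphaGo: an epsilon-greedy "multi-armed bandit". This is a toy stand-in, not the method used by AlphaGo/AlphaZero, and the payoffs, noise range, step count and function name `run_bandit` are all assumptions made for illustration. The agent repeatedly picks one of three actions, receives a noisy score, and nudges its estimate of that action towards the score it saw.

```python
import random

def run_bandit(payoffs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a fixed set of actions.

    payoffs: hidden average reward of each action (unknown to the agent).
    Returns the agent's learned estimates and how often it tried each action.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)
    pulls = [0] * len(payoffs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(payoffs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(payoffs)), key=lambda a: estimates[a])
        # Feedback from the environment: the hidden payoff plus a little noise.
        reward = payoffs[action] + rng.uniform(-0.05, 0.05)
        pulls[action] += 1
        # Move the running average for this action towards the observed reward.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls

estimates, pulls = run_bandit([0.2, 0.5, 0.9])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("learned estimates:", [round(e, 2) for e in estimates])
print("preferred action:", best, "tried", pulls[best], "times")
```

Nobody tells the agent which action is correct; it only ever sees a score, so whatever the scoring function rewards is what it learns to do.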
Both Supervised and Unsupervised learning cases require vast amounts of data to accurately train AI, which really leads us to one of the primary challenges for building fair and ethical AI: sourcing the data to train on. AI is dependent on these huge datasets, and finely tuned to all the details and subtle underlying patterns, regardless of whether we are aware of them or not, and as we will see, getting objective, raw data sets of sufficient magnitude is rife with challenges.
Institutional bias
Similar to the concept of Conway’s Law, which states “any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure”, the data we naturally generate in action, conversation and interactions as a society or organisation will naturally reflect the values, beliefs and structure of the society (or organisation). There is an intrinsic and inescapable subjectivity in all big data, best described by Lisa Gitelman in her book Raw Data is an Oxymoron:
“Objectivity is situated and historically specific; it comes from somewhere and is the result of ongoing changes to the conditions of inquiry, conditions that are at once material, social, and ethical.”
A simple example of this could be in criminal statistics: if a police force stops and searches a particular demographic more heavily than others, then that will be reflected in the numbers, and so that cultural subjectivity influences the dataset. This subjectivity will then naturally carry over to, and likely be amplified by, the trained AI as it becomes finely tuned to the data. An example of this was seen where some software used to inform sentencing decisions relied on data that had institutional bias, which resulted in a racial bias in the risk assessment - strengthening the AI Now report’s proposal of banning AI use in these areas.
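A back-of-the-envelope sketch shows how this happens. All the numbers below are invented for illustration, and `recorded_offences` is a hypothetical helper: two groups are given exactly the same underlying offence rate, but one is searched four times as often, so four times as many of its offences make it into the dataset.

```python
def recorded_offences(population, true_offence_rate, search_rate):
    """Offences only enter the dataset when a person is actually searched."""
    searched = population * search_rate
    return searched * true_offence_rate

# Identical behaviour in both groups; only the policing intensity differs.
group_a = recorded_offences(population=10_000, true_offence_rate=0.05, search_rate=0.20)
group_b = recorded_offences(population=10_000, true_offence_rate=0.05, search_rate=0.05)

print("recorded offences, group A:", round(group_a))
print("recorded offences, group B:", round(group_b))
print("apparent risk ratio A/B:", round(group_a / group_b, 2))
```

A model trained on the recorded figures would see group A as roughly four times riskier, even though the true offence rate in this made-up scenario is the same for both groups.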
Finding complete & representative data
Compounding this problem is the fact that researchers working in AI face the challenge of finding datasets that are both big enough and permitted for such use. These can be hard to come by, meaning researchers often make do with incomplete or skewed datasets. For example, the popular community discussion website Reddit makes its vast historic dataset publicly available, which is a rich source of natural text and conversation, and makes for a very tempting dataset for engineers and researchers to take advantage of. However, Reddit represents a very specific subset of the internet, and of the real-world demographic, meaning that whilst there is undoubtedly a lot that can be learnt from that wealth of data, any AI trained on it will be heavily subjective.
There have been several reports finding that these incomplete or skewed data sets just further add to the bias. The 2017 AI Now report said:
“data can easily privilege socioeconomically advantaged populations, those with greater access to connected devices and online services”
Which is to be expected when you think about it really - always-connected people with mobile devices will naturally be generating a lot more data than those without easy access to computers. On a very simple level, the core regular users of Reddit, for example, will likely have access to mobile devices, or at the very least have access to computers and the internet - which rules out large parts of the population - not to mention the inclination to partake in the online community.
There are also other challenges that are intrinsic to the way AI currently works: if we have a dataset where a particular demographic makes up only 1% of the data, then the AI could claim to achieve 99% accuracy whilst being completely inaccurate for all of that 1% minority. Furthermore, we know that there is a strong relationship between the amount of training data and the accuracy of AI, so even in the scenario where we have a perfect representation of the population, all minority groups will, by definition, have fewer data points to train on, and so the AI will inevitably perform worse for them.
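The arithmetic behind that 99% figure is easy to check. The sketch below uses a made-up set of 1,000 predictions (990 for a majority group, all correct, and 10 for a minority group, all wrong); the record layout and the `accuracy` helper are assumptions for illustration only.

```python
def accuracy(records):
    """Fraction of records where the prediction matches the true value."""
    correct = sum(1 for r in records if r["predicted"] == r["actual"])
    return correct / len(records)

# Made-up results: every majority prediction is right, every minority one is wrong.
records = (
    [{"group": "majority", "predicted": 1, "actual": 1}] * 990
    + [{"group": "minority", "predicted": 1, "actual": 0}] * 10
)

overall = accuracy(records)
by_group = {
    group: accuracy([r for r in records if r["group"] == group])
    for group in ("majority", "minority")
}

print("overall accuracy:", overall)    # 0.99
print("per-group accuracy:", by_group)  # {'majority': 1.0, 'minority': 0.0}
```

Reporting accuracy per group, rather than as a single headline number, is one straightforward way to surface this kind of failure.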
Finally, let’s consider again that we have a huge, rich dataset (the ideal scenario), and we try to intentionally exclude sensitive features that might explicitly encode bias: race, gender, age, etc. There are still plenty of data points that may act as an indirect proxy for these features, so even without including race, gender and age in the input data, it is easy to see how these features can get encoded in other data points such as names, location, interests and communication style. This makes it even harder to detect and prevent bias in our datasets.
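A small sketch shows how little it takes for a proxy to leak a removed feature. Everything here is hypothetical: the names, the records and the `learn_proxy` helper are invented, and real proxies are usually noisier and spread across many fields. The gender column has been dropped from the "anonymised" data, yet a simple lookup learnt from other records recovers it from first names alone.

```python
from collections import Counter, defaultdict

# Hypothetical historical records that still carry the sensitive attribute.
historical = [
    ("james", "male"), ("james", "male"), ("oliver", "male"),
    ("emma", "female"), ("emma", "female"), ("sophie", "female"),
]

# The dataset we intend to train on: gender removed, names left in.
anonymised = [{"name": "james"}, {"name": "emma"}, {"name": "sophie"}, {"name": "oliver"}]

def learn_proxy(pairs):
    """Map each proxy value to the sensitive value it most often appears with."""
    counts = defaultdict(Counter)
    for proxy_value, sensitive_value in pairs:
        counts[proxy_value][sensitive_value] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

proxy = learn_proxy(historical)
recovered = [proxy.get(row["name"], "unknown") for row in anonymised]
print(recovered)  # ['male', 'female', 'female', 'male']
```

A learning algorithm does not need to build this lookup explicitly; if the correlation exists in the data, the model can pick it up implicitly, which is why simply deleting a column is rarely enough.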
There is no objectivity in big data.
How can we address the problem?
Some of these examples might have clearer cases of existing bias that we need to address in training our AI, but a tougher challenge is how we can address the more subtle biases hidden in the cultural objectivity that we might not even be aware of. We all carry our own opinions and biases that subconsciously affect our attitudes toward things - but if we are not consciously aware of those, we need to think about how we can ensure that developers training AI have the foresight to engineer around these biases.
This issue highlights one often recommended approach to tackling the problem: placing a greater emphasis on diversity in the teams building AI, both in terms of individual identities and in terms of cross-functional teams. Statistically and broadly speaking, AI is often developed by teams of engineers with limited diversity, which results in a limited range of views when thinking about the dataset and about what goals are optimised for in the training process. The 2017 AI Now report recommended:
“stakeholders in the AI field should release data on the participation of women, minorities and other marginalised groups within AI research and development”
Aside from trying to recognise subtle bias in the data, we also need to consider that the objective norm, and what we consider to be OK at the moment, is changing. Going back to Lisa Gitelman’s quote: “Objectivity is situated and historically specific”. If you could get a dataset from even just two decades ago, it’s not hard to imagine that AI trained on it would have unacceptable biases, because the societal norm and general attitudes to race, gender, identity, etc. have changed significantly since then.
As a simple example, take the motor insurance industry. For decades, insurance companies identified young male drivers as a particularly high accident risk, so traditionally charged much higher premiums for that demographic - previously a widely accepted approach, and one based in statistics: young male drivers were statistically more likely to have an accident behind the wheel. But then, in 2012, EU gender discrimination regulation came into effect that prevented companies charging men more than women, so insurers have now stopped using that categorisation for pricing despite the data being available. If that pricing were driven by AI, the model would need to be re-trained with a modified dataset, with gender probably removed from the data and thought put into other data points that would also need to be removed (names, for example, might very easily be a broad proxy for gender). Whilst this is a simpler example, as it's a binary change in legislation with clear requirements, there are also the more gradual shifts in attitude where it becomes a lot fuzzier - like the changes in attitudes on race, gender and sexuality over the last thirty years.
We previously discussed the idea that even if we exclude socially salient data points, such as gender, those features can still get encoded via other proxies in the data, and this example of the change in EU regulation and its effect on the insurance industry provides an interesting case study in exactly that phenomenon. An article in the Guardian following the EU ruling explained that, despite the ruling meaning insurers couldn’t charge more because a driver was male, male premiums have actually increased in comparison to female premiums since. The reasoning the article provides is that rather than classifying on the crude data point of gender, the system instead places greater importance on a wider set of data points, and it turns out that these other data points are really just acting as encoded proxies (it lists car size, occupation and vehicle modifications). The article notes that MoneySupermarket released a study showing that 8 out of the worst 10 occupations for drink/drug drive incidents were in the building trade, with midwives being the least likely to have a drink/drug drive offence, the suggestion being that the building trade is predominantly male, and midwives predominantly female.
It certainly seems to me like there are still lots of challenges as to how we can foresee potential problems and how to tackle them. A key starting point will be ensuring teams working in the area have a good understanding of the dataset they are working with: where it comes from, any inherent bias or blind spots and which of the data points might need modifying or weighting due to their contextual/social salience. This will need to be driven through agreed best practices and AI development standards from organisations like AI Now and from academia, as well as a need for appropriate regulatory controls (although these face their own challenges, which I will discuss in a later article).
I also believe that these challenges mean an even greater need for diversity in the teams - both in terms of the race, background, gender, etc. of the team, and in terms of cross-functional members: not just engineers, but also domain experts for the specific field working closely alongside them.
Photo credits:
Heading Photo by Alex Knight on Unsplash
Anonymous person Photo by Andrew Worley on Unsplash