Banks must be aware of “biases” in data used to train ML models

Financial institutions need to be conscious of biases in the historical data used to train machine learning (ML) models, particularly in credit underwriting, according to the chief digital officer at HSBC.

“We lend to more creditworthy individuals than less creditworthy individuals. That is part of our business model, but where you draw the line in that lending is what we are talking around right now. What do we do to make sure that we don’t violate privacy or even take a line that in hindsight may be considered to be unethical?” said HSBC’s Rick Hawkins, who was speaking on a panel at the Financial Information Management conference in central London this week.

“As we move into machine learning and AI, and we look at some of the new technologies that are available to us, I think that is going to be an increasing challenge for legacy organisations like ourselves who have a back book of lending of, let’s face it, white, middle-aged men, because that is technically who we’ve lent to. If we give that base set of data, in that way, we will teach our machines to be biased, and that is a huge challenge for us going forward.”

Hawkins pointed to accusations of bias in the algorithm Goldman Sachs uses to set credit limits for Apple Card customers. In early November, tech entrepreneur David Heinemeier Hansson tweeted that the Apple Card had offered him twenty times the credit limit offered to his wife.

In response to the accusation, the bank told Reuters: “Goldman Sachs has not and will never make decisions based on factors such as gender, race, age, sexual orientation or any other legally prohibited factors when determining creditworthiness.”

By first acknowledging the biases in the data, banks can set controls and monitor what the data is teaching the ML models, according to Hawkins. On customer data and privacy, he said the insights that can be drawn from seemingly innocuous pieces of data are evolving rapidly, and banks must be cautious.
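One simple control of the kind Hawkins alludes to is checking historical approval rates across groups before the data is used for training. The sketch below is illustrative only — the group labels, data, and threshold are invented, not HSBC’s actual process:

```python
# Hypothetical pre-training check: compare historical approval rates across
# a protected attribute and flag a large gap for human review.
# Group names, records, and the 0.2 threshold are illustrative assumptions.

def approval_rates(records):
    """Return the approval rate per group from (group, approved) pairs."""
    totals, approved = {}, {}
    for group, ok in records:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Largest difference in approval rate between any two groups."""
    values = list(rates.values())
    return max(values) - min(values)

# Toy back book: group_a was approved far more often than group_b.
history = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = approval_rates(history)          # {'group_a': 0.75, 'group_b': 0.25}
needs_review = parity_gap(rates) > 0.2   # True -- escalate before training
```

A gap this size does not prove discrimination, but it is exactly the kind of signal a bank would want surfaced before the data teaches a model to reproduce the pattern.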

“From a list of transactions, given a merchant ID and given an amount, we can glean some pretty interesting high correlations as to who you are, where you are, what you do. Certainly, we can get demographics pretty good.”
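The point is easy to demonstrate with a toy example: even a bare list of merchant categories and amounts carries demographic signal. The categories, spending profiles, and segments below are invented for illustration:

```python
# Toy illustration of Hawkins's point: matching a customer's transaction
# categories against hypothetical segment spending profiles.
# All profiles and segment names are made up for this sketch.
from collections import Counter

PROFILES = {
    "segment_x": {"nursery": 0.4, "supermarket": 0.4, "toy_shop": 0.2},
    "segment_y": {"bar": 0.5, "gym": 0.3, "supermarket": 0.2},
}

def likely_segment(transactions):
    """Pick the profile whose category mix best overlaps the observed one."""
    counts = Counter(category for category, _amount in transactions)
    total = sum(counts.values())
    freqs = {cat: n / total for cat, n in counts.items()}
    scores = {
        seg: sum(min(freqs.get(cat, 0.0), weight) for cat, weight in profile.items())
        for seg, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)

txns = [("nursery", 32.0), ("toy_shop", 15.5),
        ("supermarket", 60.0), ("supermarket", 42.0)]
# likely_segment(txns) -> "segment_x"
```

Real inference pipelines are far more sophisticated, but the mechanism is the same: transaction metadata alone narrows down who a customer probably is — which is why Hawkins argues banks must be cautious.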

When asked whether the bank had checks in place to stop ideas believed to involve inappropriate use of customer data from moving forward, Hawkins said such a process was “being established”.

“It is working reasonably well at the moment,” he said. “We’re applying it to our innovation pipeline, we are doing experimentation before we do proof of concepts, and we are doing some proof of value before we do proof of concepts.

“So, we are going through this kind of staged approach, and it is at that point that if I am going to start spending money, I then ask those questions.”
