June 26

Your face is no longer your own. Nor is your voice.

Fraudsters can steal them both from the Internet. And, using the power of AI, they can create a deepfake version of you that’s so convincing it’ll beat bank security.

Banks and financial institutions increasingly use voice biometrics to confirm a caller’s identity.

And criminals increasingly exploit those voice biometrics in two ways.

They clone a customer’s voice to fool the bank’s computer into believing they’re that existing customer.

And they create a completely fake identity to open a new account.

Corsound, a startup based in Tel Aviv, has a solution for both problems. First, it has developed technology to detect a cloned voice.

Criminals take a voice sample from social media, a YouTube video or even an online meeting.

They then clone it using free online software to impersonate the real customer.

Corsound can tell the difference between a real voice and a cloned one.

Second, it can tell whether a photo of the customer opening a new account matches the person’s voice.

So if a fraudster uses a picture of A and the voice of B, it will immediately reject the application.

That’s because it knows how a person will sound, based on looks.

Creating a face from a voice

That’s pretty amazing if you translate it to your own life experience. You and I pick up the phone to a stranger and have no idea what the stranger looks like. But Corsound does.

And it’s currently working on cutting-edge technology that will actually sketch somebody’s face, based on nothing more than five or six seconds of a voice recording.

A sketch generated from a voice recording. Photo courtesy of Corsound

It will listen to you over the phone – where audio quality is often poor – and create a picture of how you look. It’s not matching your voice to an existing photo in a database. It’s generating a picture from scratch, instantly.

The company describes it as “magic” and “the only technology in the world that can create a face from just a voice.”

Orel Agmon Halido, head of sales at Corsound, stresses that this is still in development – which is why he has to disappoint me when I ask for a personal demo – but he does provide a glimpse into how it works.

“We are like a musical instrument,” he says. “Our voice comes through our lungs and throat, and it’s affected by the shape of our face, the mouth cavity, the nasal canals, the lips, all this area. Basically, we train our model to distinguish between all those differences.”

The AI learns, by processing countless examples, which faces match which voices. And which don’t.

“If you do it multiple times, thousands and tens of thousands of times, you can understand what a voice looks like,” says Halido.

“Our goal basically is to sketch a person’s face, but I will be very honest with you. We still need a lot, a lot of data to do so.”

Moving to commercial mode

The technology is based on 2019 research at MIT (Massachusetts Institute of Technology) into the correlation between voice and face.

Researchers found that a quick snatch of audio reveals a huge amount of data about a person’s gender, age, ethnicity, skin tone, nose and jaw shape, and more.

Corsound was founded in June 2020, as a subsidiary of Tel Aviv-based AI company Cortica, to develop ways to implement that technology. It currently has 17 staff in Tel Aviv, most of them working in R&D and drawing on more than 200 AI patents belonging to its parent company.

Corsound has so far raised $3.5 million from Canadian venture capital firm Awz Ventures and the Israel Innovation Authority.

With a few POCs (proofs of concept) ready, according to Halido, the company is now moving from development mode to commercial mode and seeking clients.

“It’s the first time, as far as we know, that there is a technology that can prevent identity theft,” he says.

The threats posed by generative AI are already huge, and they’re getting bigger.

“Banks should do everything to protect customers. And that’s why they need to embrace new technologies to understand the threats that come in with the AI technology.”

Synthetic voice alert

Although banks are traditionally quite conservative about embracing technology, he says, banking and finance companies are Corsound’s biggest target market.

The company demonstrated its technology to potential bank customers in February at the Finovate 2024 conference in London, with simulated calls from customers.

In the first, an Australian woman calls to transfer $2,000 to her husband. The voice is a fake, but it fools existing call technology which, according to Corsound, currently has a 90 percent market share.

The same recording is then played to Corsound’s technology and is immediately flagged as “Fraud: Synthetic Voice Alert” because it knows the voice has been digitally manipulated.

Halido then simulates the onboarding process for a new bank customer using his company’s technology.

“My identity is secure because my voice is my password,” he tells the bank’s computer. “Please verify me.” Corsound instantly verifies his voice as genuine – because it is.

He then uploads a “stolen picture” – an image of company CEO Gal Haselkorn – rather than a picture of himself.

The AI immediately rejects his application because it knows that the face he’s provided does not match the voice.
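For readers curious how a face-versus-voice check of this kind could work in principle, here is a minimal sketch in Python. It is not Corsound’s method, which the company has not published. It assumes, purely for illustration, that some trained model has already turned the voice and the photo into numeric vectors ("embeddings") in a shared space, and that a match is declared when the two vectors point in a similar direction. The function names, the toy vectors and the 0.8 threshold are all hypothetical.

```python
import math

# Hypothetical cut-off: how similar two embeddings must be to count as a match.
MATCH_THRESHOLD = 0.8


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def face_matches_voice(voice_embedding, face_embedding, threshold=MATCH_THRESHOLD):
    """Accept the pair only if the two embeddings are similar enough."""
    if len(voice_embedding) != len(face_embedding):
        raise ValueError("embeddings must have the same length")
    return cosine_similarity(voice_embedding, face_embedding) >= threshold


# Toy vectors standing in for the output of an assumed, already trained model.
caller_voice = [0.9, 0.1, 0.3]
caller_face = [0.8, 0.2, 0.4]    # photo of the same person
stolen_face = [0.1, 0.9, -0.2]   # photo of someone else

genuine_ok = face_matches_voice(caller_voice, caller_face)
stolen_ok = face_matches_voice(caller_voice, stolen_face)
print("genuine photo accepted:", genuine_ok)
print("stolen photo accepted:", stolen_ok)
```

In a real system the hard part is the model that produces the embeddings, trained on the many thousands of voice and face pairs Halido describes. The comparison step itself can be this simple.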

Corsound’s technology will, says Halido, save the finance and banking industry billions of dollars a year in fraudulent transactions.

The voice-to-face technology also has wider applications and could revolutionize law-enforcement investigations, generating facial structures and facial sketches just from voice recordings.