Will subject matter expertise accelerate the progress of voice-activated tech?

Despite huge leaps forward in conversational artificial intelligence (CAI), our demand for one-size-fits-all voice-activated technology has left no one under the illusion they are talking to anything other than a robot.

If CAI is going to take its place at the top table of future tech, it needs to embrace its human side – and the race is on to develop the technology that does just that.

To mark the publication of our latest Artificial Intelligence M&A Market Report 1H 2019, we ask if narrowing the focus of CAI models is the answer.

“With the growth of many industries dependent on the development of AI technologies, strategic investors are stepping up their acquisitions and investments […]. This is one of the fastest-moving sectors, with great momentum and game-changing outcomes.”

The problem with current voice-activated AI

CAI has come a long way since one-dimensional applications designed to answer simple, pre-defined questions became commonplace just five or ten years ago.

Sophisticated, voice-activated assistant systems built on applied machine learning and natural language processing (NLP) are a world apart from the first simple chatbots. And they are popular.

In the third quarter of 2018 alone, Amazon shipped 6.3 million Alexa-powered Echo devices. Such technology, it seems, has captured the imagination of consumers – despite its well-known shortcomings.

Misheard and out-of-context requests have become a familiar source of comedy because we expect to interact with voice-activated AI on our own terms.

For example, this tech doesn’t, for the moment at least, understand when a user moves between different tasks, such as dictating a shopping list or changing the music. It can’t understand how frustrating it is to restart a telephone helpdesk menu after making a simple mistake at the very last option.

These functions rely on understanding context – and that’s difficult to train an algorithm to do.

Human conversation is incredibly complex and unpredictable, so how can we programme an algorithm with all the data it needs to make sense of our multiple demands?

These are issues that will only become more prominent as the Internet of Things (IoT) grows in popularity and influence, and we expect to have flawless conversations with everything from our cars to our fridges.

“The language analysis sub-sector has experienced impressive growth over the last 30 months, as it sees its application extend to and grow in new verticals. In fact, Q1 2019 has proven to be a stellar quarter for language analysis.”

Personalisation is humanisation

Despite the runaway success of Alexa and Siri’s one-size-fits-all approach, leading ML and NLP experts have known for some time that training a more “human” CAI relies on personalisation.

It will take an estimated 15 to 20 years to develop NLP and CAI capable of understanding the true complexities of human conversation.

In the meantime, advocates of personalisation say that it makes sense to build and train CAI for specific tasks.

Reducing the dataset that programs need to adapt and learn from makes it easier for them to pick up patterns and understand what might be asked of them.

By defining the most appropriate use case and identifying data-rich areas of business, developers can provide ready-made answers to FAQs that allow for a more human-seeming interaction.

It’s also about defining, and responding to, the metrics that are relevant to both the user and the use case. Algorithms designed to encourage healthy eating, for example, succeed through multiple engagements, whereas businesses do not want people coming back to their complaint-handling software time and again.
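To make the narrow-domain idea more concrete, here is a minimal sketch in Python of an assistant that only knows one subject. It is an illustration under stated assumptions, not how any product mentioned in this article works: the restaurant FAQ entries, the function names and the 0.3 threshold are all hypothetical, and simple word overlap stands in for the far richer NLP a real system would use.

```python
import re

# Hypothetical FAQ entries for one narrow domain (restaurant bookings).
FAQS = {
    "What time do you open?": "We open at 12 noon every day.",
    "Can I book a table for tonight?": "Yes, tell me the time and party size.",
    "Do you have vegetarian options?": "Yes, around half the menu is vegetarian.",
}


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a, b):
    """Share of words two token sets have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer(query, faqs=FAQS, threshold=0.3):
    """Return the answer to the closest known question, or None if nothing is close."""
    q = tokens(query)
    best = max(faqs, key=lambda known: overlap(q, tokens(known)))
    if overlap(q, tokens(best)) < threshold:
        return None  # outside the domain: hand over to a person or a fallback
    return faqs[best]


print(answer("What time do you open today?"))
print(answer("Play some jazz"))
```

The first request sits inside the domain and gets the ready-made answer. The second does not, so the sketch returns nothing rather than guessing, which is the trade-off a subject-specific assistant accepts.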

Personalisation is a significant departure from the current expectation that smart speakers and intelligent phone-based virtual assistants should do everything for us.

Subject matter experts

Our report shows impressive growth in the language analysis sub-sector of AI over the last 30 months, and a large part of this is underpinned by the need to improve voice recognition ability.

In November, for example, Wluper raised $1.3 million in a seed funding round. The London-based start-up believes voice assistants work much better when the underlying AI is tasked with becoming an expert in a narrower and more specialised domain.

Co-founder Hami Bahraynian told TechCrunch: “When we think of intelligent assistants like Alexa or Siri, the only time you’ll believe they’re really good is if they understand you properly. Most of the time, they simply can’t.

“It is not the speech recognition which fails. It is the missing focus and lacking reasoning of these systems, because they all can do a lot of things reasonably well, but nothing perfectly.”

In the US, Google has integrated Duplex with Google Assistant to enable users of its Pixel 3 smartphone to make restaurant bookings.

While it is a perfect example of the benefits of the personalisation approach, the AI-driven automated reservation system may be a little ‘too’ human.

It has raised questions over the ethics of robotic voices not identifying themselves as such to users, which could be the next big challenge to overcome.

Master of one trade, Jack of none

We are already increasingly reliant on CAI, and that reliance will only become more entrenched as the IoT becomes more prevalent.

Ultimately, improving voice recognition relies on reducing the amount of data we expect CAI to process and learn from. And it seems obvious that training algorithms to be subject experts is a perfect way to do that.

It’s clear that to embrace the potential of this technology, we need to train AI to better understand the complex context and nuance of conversation.

But while solving these problems, we also need to recognise that making AI more “human” will bring its own inherent challenges.