The future of voice assistants: a personal digital clone?? Part 2


This article is a part of the Battery001 issue: An issue dedicated to Voice!

This is the second part of a series about the voice assistant of the future. If you have not read part 1, you probably should. Or not, depending on where you come from. Anyway, here it is!

A conversation with my future assistant

If Duplex can mimic a human conversation, why can’t we make our stupid voice assistants truly useful? Let’s extrapolate! Here is a transcript of a conversation with my digital assistant from the not-too-distant future, anonymized of course:

Hey [beep], let’s do some catching up!
Sure! How can I be of help?
Remember the conference in Portland?
Yes?
There were two guys I met at one of the evenings, I think they were from Canada, do we have their contact info?
Was this on Tuesday or Wednesday night?
Let me think… [pause] …which was the last day of the conference?
It was Friday.
Then it must have been Tuesday.
There were two…
[interrupts] No wait, it was actually Wednesday!
I have a note about two “guys” with whom you discussed opportunities in the media space, but neither of them seems to be Canadian.
Never mind, that’s them! What were their names again?
[beep] and [beep]
Cool! Could you please send a standard mail: nice meeting you blah blah let’s keep in touch, or something like that?
I found their emails at [beep.com]. Should I send separate emails?
What do you mean?
Do you want me to send a separate mail to each person, or include both in one?
Ahh, good point, let’s do them separate.
Ok. Do you want me to read it before sending?
Nah, I trust you!
Ok, sent!
Thanks!
You’re welcome!

That is an assistant that fits the description “Conversational AI”. So is it intelligent? Do we have to wait for Artificial General Intelligence before this can become reality?

Not at all! It is just math 😜! Here is what might be happening under the hood.

Hey [beep], let’s do some catching up!

Activated! Intent: “to catch up”, context unknown, ask for clarification

Sure! How can I be of help?
Remember the conference in Portland?

Context: References to Conference Portland April 14–17 found in several applications, setting as context, verify

Yes?
There were two guys I met at one of the evenings, I think they were from Canada, do we have their contact info?

Searching notes from the conference for conversations with two persons: found two conversations on Tuesday and one on Wednesday. A wide and shallow person search finds none related to Canada; ignoring this, since “think” indicates that country is not a required entity. Ask for clarification on day.

Was this on Tuesday or Wednesday night?
Let me think… [pause]

“Let me think” equals a pause

…which was the last day of the conference?

Retrieving conference schedule.
Potential secondary contexts: conference schedule / Friday

It was Friday.
Then it must have been Tuesday.

Two partial hits, ask for clarification

There were two…
[interrupts] No wait, it was actually Wednesday!

Retrieve one partial hit, deliver a summary of meeting notes, comment on nationality

I have a note about two “guys” with whom you discussed opportunities in the media space, but neither of them seems to be Canadian.
Never mind, that’s them! What were their names again?

Deliver participants’ names, drop secondary context

[beep] and [beep]
Cool! Could you please send a standard mail: nice meeting you blah blah let’s keep in touch, or something like that?

Search for email addresses at [beep.com], ask for complementary information

I found their emails at [beep.com]. Should I send separate emails?
What do you mean?

Question not understood by user, rephrase and clarify

Do you want me to send a separate mail to each person, or include both in one?
Ahh, good point, let’s do them separate.

Populate mail template. Ask for send approval

Ok. Do you want me to read it before sending?
Nah, I trust you!

Send mail

Ok, sent!
Thanks!
You’re welcome!

Done! That wasn’t too hard, was it 😜? Obviously this is not an implementation specification, but there are no challenges here that are impossible to overcome. There is no need for any real intelligence (whatever that is) to execute any of the underlying functions. It does require tight integration with all the functionality (notes, search, mail and so on), but that is just work.
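To make the “just work” claim a bit more concrete, here is a minimal sketch of the narrowing-down steps from the annotated transcript: extract constraints, treat a hedged entity (“I think”) as optional, filter the notes, and decide whether to answer or ask. Everything in it is an assumption made for illustration: the `Note` record, the toy notes, the keyword matching that stands in for real language understanding, and the action names. It is not how Duplex or any existing assistant is built.

```python
from dataclasses import dataclass

# Hypothetical note record; the field names are assumptions for this sketch.
@dataclass
class Note:
    day: str
    people: int
    summary: str

# Toy data mirroring the transcript: two conversations Tuesday, one Wednesday.
NOTES = [
    Note("Tuesday", 2, "chat about podcast tooling"),
    Note("Tuesday", 2, "chat about game audio"),
    Note("Wednesday", 2, "opportunities in the media space"),
]

HEDGES = ("i think", "maybe", "probably")

def parse_constraints(utterance):
    """Crude keyword matching, standing in for real language understanding.

    Returns {field: (value, required)}.
    """
    text = utterance.lower()
    hedged = any(h in text for h in HEDGES)
    constraints = {}
    if "two guys" in text:
        constraints["people"] = (2, True)
    if "canada" in text:
        # A hedge such as "I think" downgrades the entity to optional.
        constraints["country"] = ("Canada", not hedged)
    for day in ("tuesday", "wednesday"):
        if day in text:
            constraints["day"] = (day.capitalize(), True)
    return constraints

def filter_notes(notes, constraints):
    """Keep only the notes that satisfy every required constraint."""
    hits = list(notes)
    for field, (value, required) in constraints.items():
        if required:
            hits = [n for n in hits if getattr(n, field, None) == value]
    return hits

def next_action(hits):
    """Decide whether to answer or to ask a clarifying question."""
    if not hits:
        return "report_no_match"
    if len(hits) == 1:
        return "deliver_summary"
    days = sorted({n.day for n in hits})
    if len(days) > 1:
        return "ask_day: " + " or ".join(days)
    return "ask_clarification"
```

Feeding it the first question leaves three candidate notes, so the next action is to ask about the day. Merging in the constraints from “No wait, it was actually Wednesday!” leaves a single note, and the action becomes delivering the summary.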

And for the understanding and the speech generation, we already know that it is feasible through Duplex. So we’re almost there then? Well…

Data, we need more data!

As the Google team states, Duplex has to be “deeply” trained in a “closed domain” that is “narrow enough to explore extensively”. It has to cover the absolute majority of use cases to be useful. For this it needs a lot of data, in this case transcripts of calls to restaurants and hair salons. A lot of them.

To realize the assistant of the future in the same way as Duplex, we must multiply the amount of data needed by the number of domains that we want the assistant to handle. “A lot” times “a lot” equals “will not happen”.

Do the math. It is just unmanageable.

…unless the assistant can learn by itself

Maybe you have heard about AlphaGo, which in 2016 beat a world champion in Go. Things have moved on since then, and the latest iteration, AlphaGo Zero, was able to beat all previous versions after just 40 days of training. Without any human game data, just by playing against itself.

In the case of Go it is quite easy to know if the training works or not. Did it win the game? Then the black box did something good, if not… That makes it possible to let it play against different versions of itself and learn.
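The “did it win?” signal is easy to show on a much smaller game. Below is a toy sketch of self-play learning on a Nim-style game (take 1 to 3 stones, whoever takes the last stone wins). It is in no way AlphaGo Zero’s algorithm, which combines deep networks with tree search; the game, the table of values and all parameter names are assumptions chosen to keep the example short.

```python
import random

def train_self_play(pile=10, episodes=20000, epsilon=0.2, lr=0.1, seed=0):
    """Learn a Nim policy purely from games the learner plays against itself."""
    rng = random.Random(seed)
    # (stones_left, move) -> running estimate of how often the mover wins
    value = {}

    def choose(stones, explore):
        moves = [m for m in (1, 2, 3) if m <= stones]
        if explore and rng.random() < epsilon:
            return rng.choice(moves)  # occasionally try something new
        return max(moves, key=lambda m: value.get((stones, m), 0.5))

    for _ in range(episodes):
        stones, history = pile, []
        while stones > 0:
            move = choose(stones, explore=True)
            history.append((stones, move))
            stones -= move
        # The only feedback is the final result: the player who made the
        # last move won (reward 1), the other player lost (reward 0).
        # Walking backwards, the reward alternates between the two players.
        reward = 1.0
        for key in reversed(history):
            old = value.get(key, 0.5)
            value[key] = old + lr * (reward - old)
            reward = 1.0 - reward

    # Return the greedy policy learned from self-play.
    return lambda stones: choose(stones, explore=False)
```

Calling `policy = train_self_play()` and then `policy(3)` asks the trained player what it would do with three stones left. Nobody told it the rules of good play, only who won each game.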

Compare that to a conversation. If you think it is about winning, then you probably don’t have many friends. In many cases there is no clear way to say if a decision was good or bad. Even the outcome can be impossible to judge other than on a subjective, human level.

Duplex solves this with what they call real-time supervised training. Essentially, humans interact with Duplex and reinforce the “right” decisions; the human “teaches” Duplex how it should react. This reduces ambiguity and the time needed for training.

So who is going to teach our assistant right from wrong?

Customization vs. generalization

A customized solution is created to solve a specific problem, and will generally produce a higher quality result than a general solution. The problem is scaling: there is no way to develop customized solutions for every use case.

Duplex is a customized system. It does what it is supposed to do amazingly well, but will utterly fail on everything else. Our existing voice assistants, on the other hand, are generalized. They are supposed to handle many different tasks, which they do in a generally crappy way.

There is something oddly backwards about the process of training our black boxes. First we take data from as large a number of individuals as possible, then we classify and analyze a specific aspect of their behavior, and finally we use this data to train the assistant. The result is a statistical application calibrated to be “good enough” for anybody, but not really perfect for anyone.

The future assistant should not be general, trying to do everything for everyone. It should be customized for you, it should be trained to solve the tasks that are unique to you.

It should be YOUR assistant.

The customization is you!

A large data set is needed to set an average ground truth because of the individual variation. If the data set is you, then the variation will approach zero. Your favorite color is always blue. Your favorite food is sushi, but sometimes a beer and a burger is the obvious choice. You enjoy talking with the same people, you complain about the same things, and you choose the same clothes, except on rainy days.

In short, you are predictable.
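A tiny sketch of what “the variation will approach zero” can mean in practice: measure how much of the data is covered by the single most common choice. The lunch data below is made up purely for illustration, and `top_choice_share` is a hypothetical helper, not an established metric.

```python
from collections import Counter

def top_choice_share(choices):
    """Fraction of observations covered by the single most common choice."""
    counts = Counter(choices)
    return counts.most_common(1)[0][1] / len(choices)

# Made-up toy data: one lunch choice each from eight different people ...
population = ["sushi", "pizza", "burger", "salad",
              "tacos", "ramen", "curry", "pasta"]
# ... versus ten lunches from one predictable person.
me = ["sushi"] * 8 + ["burger"] * 2

population_share = top_choice_share(population)
my_share = top_choice_share(me)
```

With this toy data the best single guess covers one lunch in eight for the population, but eight lunches in ten for the individual. The more concentrated the choices, the fewer observations are needed before a guess becomes useful.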

If the data set is just you, then the amount of data needed is vastly smaller, something like 8 billion times smaller. Maybe small enough for your assistant to live on your mobile phone. It is no longer a general assistant shared by everyone, it is your personal assistant. To make the distinction even clearer:

it is your private assistant.

Private, meaning that only you have access to the data and functionality. Seriously. As in: not in the cloud.

So how does the Assistant of the Future work?

The assistant collects as much data as it can get from our activity in both the physical and digital world. Everything we say or write, the responses we get, every meal we eat and flu we have. Everything. The more data points the better.

From that unstructured data it uses unsupervised learning to identify correlations and patterns that will become useful in decision making in the future.

The real “training” and “learning” will happen in the daily interaction with the assistant. In the same way that the Duplex team uses real-time supervised training, our behavior and direct guidance will “educate” the assistant and reinforce the right decisions.

The assistant will have to be “curious”. Whenever the situation is unfamiliar or actions cannot be performed it should ask questions. “What are you doing now?”, “Can you please rephrase?”, “Why is that person screaming?”.
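The combination of being “curious” and being taught by the user can be sketched as a simple loop: act when past guidance is consistent enough, otherwise ask, and store the answer. The class name, the thresholds and the wording of the question are all assumptions for illustration. A real assistant would need far richer situation matching than exact string keys.

```python
from collections import Counter, defaultdict

class CuriousAssistant:
    """Acts only when past guidance is clear; otherwise asks the user."""

    def __init__(self, threshold=0.7, min_examples=3):
        self.threshold = threshold        # required share of the top action
        self.min_examples = min_examples  # guidance needed before acting
        self._guidance = defaultdict(Counter)

    def teach(self, situation, action):
        """Record what the user did, or said should be done."""
        self._guidance[situation][action] += 1

    def decide(self, situation):
        """Return ("act", action) when confident, else ("ask", question)."""
        question = f"What should I do when: {situation}?"
        seen = self._guidance.get(situation)
        if not seen or sum(seen.values()) < self.min_examples:
            return ("ask", question)      # unfamiliar situation
        action, count = seen.most_common(1)[0]
        if count / sum(seen.values()) >= self.threshold:
            return ("act", action)
        return ("ask", question)          # guidance so far is contradictory
```

A fresh assistant asks about everything. After the user has answered the same way a few times, it starts acting on its own, and it falls back to asking again if the answers start to contradict each other.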

You are the teacher!

Not only that…

The Assistant of the Future is … you

So where will all this lead us? It looks to me as if it all points in the same direction. So let’s extrapolate the future!

Your future assistant consists of every experience you have had, it has been around since your birth, it learned everything together with you, how to walk and talk. It increased your learning speed and capacity by providing instant access to infinite knowledge and processing power. It has always been there for you, through all the ups and downs that a physical body is blessed with.

Voice is no longer needed. It is already possible today to reconstruct images based on brain waves, so most of the data will be collected directly from the brain, detecting intent before there are words. And it will work in the other direction too: the assistant can imprint patterns in your brain directly. The physical and the digital merge.

The assistant could live its own parallel life, be active far beyond booking haircuts and dinners. Then during your sleep it integrates its experiences and memories with yours, they become indistinguishable from your own.

The assistant would, for all practical and legal purposes, be you.

You have long since forgotten that your assistant is just a statistical application, a complex but deterministic system of probabilities.

And when we get there, what will we talk about?
