Do you already use a voice assistant? At the beginning of March, B. Amsterdam turned its focus squarely on Voice, with a range of experts sharing their vision for, and general outlook on, Voice Technology. Our Chief of Sound, Gijs, and Audio Engineer Max were there to take care of the Emerce podcasts. Time for some questions about the future of Voice. And you can also read all about the Voice solution we have just developed.
To get straight to the point: do you have a smart speaker at home?
Gijs: “Yeah, sure, I have one at home. A Google Home to be precise. I use it to quickly check what the traffic is like, or what is on my agenda for the next couple of hours, for example. It’s very handy, you can just call out while you’re getting your last bits and pieces together before going out the door. My kids use it mainly as a joke book! But, after about 3 or 4 jokes, they’re usually tired of it.”
Is such a smart speaker just a nice gadget, or does it have a serious future?
“The developments are moving fast. In the future you’ll just ask your phone or your speaker: will you make me an appointment at the hairdresser on Saturday, at 9 o’clock? The virtual assistant then calls the hairdresser directly, in the background. Google gave a demo of just this kind of functionality. It has to be said, though, that this is a best-case scenario. It is easy to imagine plenty of moments where the assistant doesn’t understand what the other person says, and the whole conversation falls apart. I sometimes ask myself how long it will be before there is another assistant on the other end of the phone. Then there will be two Google Assistants talking to each other. How well will that go? I think it could still take a good while before assistants are really making hairdresser appointments the way you see it happen in the video. But it is definitely going that way.”
Google Duplex: A.I. Assistant calls local company to make an appointment (2018)
What other opportunities do you see for companies?
“I think there will definitely be all sorts of solutions to think about. I just don’t know whether my answers have, in reality, already been overtaken.
“Groundbreaking voice use cases could suddenly be added tomorrow. I do see coupling voice solutions to other smart devices as a particularly interesting next step. Imagine that Google looks in your kitchen cupboard and says: I see that the chocolate is nearly finished. Shall I order you some more? I’ll show you the order. Is that OK? You’ll see your shopping list appear on your phone. You will just need to say ‘yes’ and the next day everything will be delivered to your home.
“You know that the general text on websites needs to be easy to read. You have read it here before, but here it comes again: write the way you speak. So, I expect that in addition to SEO (Search Engine Optimisation) you will later have to watch out for your AO (Assistant Optimisation). How do you make sure that the text on your website is still easy to understand if it’s read aloud by an assistant?”
What do the developments in Voice Technology mean for text to speech?
“The majority of text to speech services still sound very synthetic. The stress is put in the wrong place and the spe-ed, sometimes does not me-et the de-ma-nd. Google is working on an API at the moment that will have to be a lot better than existing text to speech tools. The expectation is that it will come to market, and be available open source, this year. We’re not there yet, but the developments are going at an exponential rate. I’m very curious about it.”
When do you think that computer voices will no longer be really distinguishable from ours?
“That’s really reading the tea leaves. A number of speakers from Google compared the moment we find ourselves in right now with the moment when the iPhone first came out in 2007. Why? The iPhone changed an awful lot in our lives, of course. A lot of people at Emerce Update expect Voice Technology to have just as big an impact as the iPhone. We are really only just at the beginning. What we can already say is that Voice Control will be the method to interact with your computer. You won’t need to type everything; that will be old-fashioned. And you will no longer need to click, you can just talk. This is of course a much more natural way to communicate. You remove a barrier. Adobe showed in a demo that they only need a few words to replicate someone’s voice. That works in that specific example, perhaps. But at the moment you need to let someone record ten thousand strange sentences to be able to gather all the ingredients for the ‘voice generator’. Let’s have another look in six months’ time. We’ll be a good way on by then, no doubt.”
Preview of a new tool from Adobe, that can adapt the spoken words in a voice over (2016)
And now you guys have your own Voice solution?
“Yeah! We’ve been busy with Voice Technology for a while and have had a look at the possibilities for the communications and content industries. We saw potential for testing voice over scripts and making guide tracks. So, for these we’ve developed our voice over generator. As a video maker, you type in the script, choose the right voice in the right language, and then you can also adjust the pitch and the speed. Of course it doesn’t produce a voice that is ready for broadcast, but it is a really handy tool for editing the video in terms of timing. Before the voice over has even been recorded, you can put the voice under the video and see and hear how your message comes across. If necessary you can replace words or change the syntax, and only once you’re happy that the video is ready do you have the real voice over recorded. That saves time and retakes. I think that everyone will be happy with that.”
Would you like to be among the first to try out our free voice over generator? There is no quicker way to test a voice over text for timing and impact. We’re curious to know what you think of it.