NEXT 2019 Podcast Series

NEXT 2019 Pre-Conference Series – Stuart Crane & Paul Cornwell – Voice Metrics

The 2019 NEXT pre-conference series is giving listeners an inside look into companies such as IBM, Voice Metrics, Ipsos, and Pulse Labs. Join insight leaders on June 13-14 in Chicago for NEXT, where you can discover how technology and innovation are changing the market research industry. In this episode, Jamin Brazil interviews Stuart Crane, Founder and CEO of Voice Metrics, and Paul Cornwell, CTO of Voice Metrics.

Find Stuart and Paul Online:

Stuart’s LinkedIn

Paul’s Linkedin



Hi, I’m Jamin Brazil, and you’re listening to the Happy Market Research podcast. This is a special episode connected to the upcoming Insights Association NEXT conference, which takes place in Chicago on June 13th and 14th. I do a lot of these conferences, both inside and adjacent to the market research industry. I think this particular NEXT conference is a must-attend if you’re interested in learning about what’s coming up “next.” Maybe that’s how they came up with the name. My guests today are Stuart Crane, the founder and CEO of Voice Metrics, which helps companies leverage voice, as well as Paul Cornwell. Did I say your last name right, Paul?


Yeah, you got it.


Voice Metrics’ CTO. Guys, thanks very much for joining me on the Happy Market Research podcast today!


–Glad to be here.

–Yes, thanks for having us, Jamin!


You guys are speaking at the NEXT conference on how to integrate voice into the total customer experience. I’m really curious, given your backgrounds: when did you first recognize that voice was important?


Voice, I’ve been interested in for quite some time, back in the day when I would listen to cassettes and CDs in the car. I was really interested in voice recognition: recognizing voice with Dragon Dictate and that sort of thing. But when I realized it was really going to be big is actually when I got an Amazon Echo, I think it was for Christmas in 2015, and just being able to talk to this cylinder, have it talk back to you and start songs, and still talk to it while music is playing. And obviously Siri was out there. But now, it’s basically an ambient voice conversation. It just blew my mind! And then I found out that you can actually write software for it. You can write programs for the Amazon Echo. Back then, it was just called Echo. Now it’s obviously “Alexa,” and it’s a big ecosystem and everything. So I really recognized that being able to talk to devices, with the full features of computers behind them, is going to transform things. Not that it’s going to take away the capabilities of mobile or anything like that, but it supplements them in such a great way. I started looking at ways that we could program voice, got involved very early in Alexa’s software development ecosystem, and just took it from there.


All right, great. So Paul?


Yeah. So I came from an AI and machine learning background prior to getting into voice, and that was sort of my segue into voice and where the interest came from. So actually, before I met Stuart, I was pretty hot and heavy for Alexa and the idea of building these interactive experiences. I was looking a lot at Lex and Alexa on the Amazon side, and it just seemed like a natural segue coming from that AI background, thinking about how these devices and experiences can be more conversational. The technology caught up to where my head was. With the opportunity with Stuart, who had this vision, at the very beginning of what we’ve built, everything just seemed to align.


So I’m going to go ahead and share, and I apologize, I don’t mean to hijack the point, but for me I recognized it was really important with my daughter and the iPhone I got her when she was 11 years old. We were on a three-hour drive, so I just started making small talk. I asked her about her best friends, and one of her top three surprised me: it was Siri. I wasn’t sure if she was making a joke, but we dove into that. And she said: “Oh, you know, Siri, she’s always there. She’s talking to me.” In the context of an 11-year-old’s world perception, she really did not understand this concept of AI or a bot. For her, it’s a voice that’s got a name and is communicating with her. Sometimes it doesn’t make any sense; in fact, maybe that was a lot of the time, especially in the early days. But now you can fast-forward to where we are. I also have younger kids, a 2- and a 3-year-old, and one of their favorite things to do is interact with Alexa, playing the hide-and-seek game. I don’t know if you guys have done that or not.





It’s just this construct where you don’t have a tangible game or UX (we are still thinking about what that looks like) in a voice context. For me, as I fast-forward two or three years from now, I don’t know exactly what voice is going to look like, but it feels like the opportunities for us are significant.


Yeah, absolutely. We were out in San Francisco speaking a couple of weeks ago, and what we noticed just walking around the streets is that half the people, probably more than half, maybe 80%, have their AirPods or headphones on. So once those have voice assistants built right into them, which they’re starting to do (obviously Siri is built into the AirPods), it’s going to be huge. It’s just all over. It’s everywhere.


So you’ve worked with a couple of market research agencies on voice surveys. What do you see as really exciting in that space? And what do you see as a material challenge at this point?


Yes, that’s a good question.

The companies that we are starting to work with are in a very exploratory phase, which I’m sure we’re going to see at the NEXT conference. People want to see how they can utilize the voice assistants, whether it’s Alexa, Google Assistant, or Siri at some point, to get data, get information, get feedback, and offer surveys to take. So the agencies we’re working with right now are taking our survey platform, which is called Survey Line, and they’re basically building surveys similar to the way you would build a SurveyMonkey survey in the web app, and they’re showing them to their clients, which may be big consumer product manufacturers or just product companies that have panels of testers out there. And they’re basically helping them to say: “Some of the things you’re doing to collect data and do market research and consumer research can now potentially be done by voice.” So they’re looking at cases where they may have people coming into homes and doing surveys by hand, and they want to lower the cost of doing a survey and also consider the convenience factor for the panelist. One thing we’re finding right now is that the agencies are looking at doing very interactive surveys that have a real voice behind them. So you actually have a voice actor, a voice talking the person through a product: “Pick up the product.” “Hold it in your hand.” “How does it feel in your hand?” It’s done through the voice assistant that way, and they’re building some of the longer interactions. Some of the challenges we are working on right now are just the cadence, the pausing, and the staging, because sometimes you might want to pause and say: “Well, do this for a little while, and then come back and tell us what you thought about it.” Those things aren’t as intuitive on a voice assistant, because it wants to work just back and forth, back and forth.
We have got some things that we are modifying to make it work in an environment where the market research agencies essentially want a hands-free experience. They do not want the person to go to a phone or a laptop or any kind of tactile interface at all. They want it hands-free, and that’s what’s perfect for surveys by voice. But in some situations the panelist does something with a product and then comes back and tells us about it. So some of the challenges, like I said, are related to cadence, pauses, and delays, and just getting that interaction as natural as possible, knowing that you’re still dealing with essentially a computer. As you know, IVR has been around since the ’80s. So we’re taking what that had done and saying: “Hey, this could be done on a voice assistant,” and done even better, because you have full programming capabilities, you have a real voice behind it, and so forth.


Stuart, I want to get to an example, if you guys have one, of a voice-based survey. But before we do, Paul, I have a question in the context of AI. It’s a term that we have heard a lot in market research over the last few years, and actually nailing it down, in terms of how it applies to and improves an outcome, has been a little bit squishy in our space. Can you talk to us a little bit about the role of AI in a voice context?


Yes, absolutely. So I think, out of the box, Alexa and Google Assistant do a lot of things very well. A lot of the reason they’re improving over time is the machine learning and artificial intelligence that Amazon and Google are leveraging themselves. But we have found that there is still a gap. What we have tried to build, and I think what is successful, and what developers of voice solutions are doing, is building their own sort of contextual AI. Using surveys as an example, we have actually created sort of our secret sauce to make the survey experience much smoother for the user, because out of the box you run into a lot of situations with Alexa skills and Google actions where she doesn’t understand exactly what you’re trying to do. And if what you say, or what she heard, doesn’t match exactly what’s been predefined in those skills and actions, then it can fall down. So coming at it from a pure voice developer’s standpoint, to me artificial intelligence, which can be a buzzy sort of word (we hear that term all the time), just means having a layer of algorithms and logic that can make sense of what the user is actually trying to do, what the intended action is, and give them that result. So that’s how we approach it, and I hope that answers it.
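The matching layer Paul describes might look something like this minimal sketch: normalizing whatever the assistant heard into one of the answers the survey expects. The function name, synonym lists, and fuzzy-matching threshold are illustrative assumptions, not Voice Metrics’ actual implementation.

```python
# A sketch of a "contextual AI" layer: map a raw transcription onto the
# answer the survey expects, tolerating synonyms and near-miss hearings.
from difflib import get_close_matches

YES_WORDS = {"yes", "yeah", "yep", "sure", "definitely", "absolutely"}
NO_WORDS = {"no", "nope", "nah", "never"}

def interpret(heard, expected):
    """Return the expected answer that `heard` most likely means, or None."""
    text = heard.strip().lower()
    # Collapse common yes/no synonyms first.
    if text in YES_WORDS:
        text = "yes"
    elif text in NO_WORDS:
        text = "no"
    if text in expected:
        return text
    # Tolerate small transcription errors ("about once a wek").
    match = get_close_matches(text, expected, n=1, cutoff=0.8)
    return match[0] if match else None
```

For example, `interpret("Yep", ["yes", "no"])` resolves to `"yes"`, while an utterance that matches nothing returns `None`, which is the cue to re-prompt the user rather than fail.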


Got it. Yeah, that makes sense. Again, going back to the simplistic example of the hide-and-seek game on Alexa: in that framework, it doesn’t feel 100% human, but it also doesn’t feel like, going back to Stuart’s point, IVR in the ’80s. It’s definitely a lot closer to the human interaction experience. All right, Stuart. Do we have an example?


Yeah, yeah. So what I’m going to show now is an example we put together for the Voice of the Car Summit, which was a few weeks ago, and which brought together people who are involved in bringing voice assistants into the car. Obviously, there are a lot of companies involved in that: the car companies themselves, the voice assistants, and so forth. So this example is related to a survey that would be done after you go to a retailer or a restaurant, in this case a fast-food restaurant. Everybody’s familiar with the receipt you get at a movie theater, retailer, or restaurant, which says: “Oh, thanks for coming. Please take our survey,” and has a long web URL that you have to type into your phone, or what have you. Who does that? Who is going to go take the survey and type all that in from a printed receipt? So the idea is that we can just ask our voice assistant to take it. The receipt would say: “Just start the McDonald’s survey,” or whatever it happens to be. In this case, we did a demo of a McDonald’s fast-food survey. So I am going to say the wake word, and it is going to start the survey, which is what someone would do in the car as they are pulling out of the drive-through. So let’s give it a shot!

–Alexa, start the McDonald’s survey.

–Hello, thanks for visiting McDonald’s today. We want to hear about your experience. This survey should take less than a minute, so let’s get started!

How often do you visit this McDonald’s location?

–About once a week.

–For the next few questions, rate your satisfaction on a scale of 1 to 5, where 5 is excellent.

Rate the speed of your food delivery.


–Rate the quality of the food.


–Rate the friendliness of the McDonald’s staff.


–Finally, yes or no. Would you recommend this McDonald’s to your friends and family? –Yes.

–That’s great. We’re happy that you’ll recommend us! Thanks again for stopping at McDonald’s and taking our survey. Next time, try our new Mushroom and Swiss signature burger.


You had to throw in the advertisement at the end, of course. But I really like this. There are a couple of things that popped for me. I will start at the beginning. First of all, thinking about programming that survey, is that hard? I know that creating Alexa skills in general at a basic level is relatively…


-I will let Paul hit that question.

-Yeah, it’s hard to make it extremely flexible. So we built a platform to do that. I think anyone could probably build an Alexa skill or a Google action that is built around a very specific set of questions and collects responses. To take it to the next level, we really wanted to build something that was self-service, something that we call a platform. But I would say the challenges were probably with supporting those different question types and collecting responses in a way that matches what the survey creator was trying to get. So if they are looking for a rating, we have a lot of validation around that. If, instead of 1 through 5, the person said “6,” we’ve got to make sure we come back and gently tell that person: “Okay, that’s not a valid answer,” and then maybe play the question again, things like that. So it’s really about making the experience as conversational as it can be. Then, from a programming side, it was really building the platform to support basically any type of question-and-answer back and forth that someone wants. And we tried to make it as conversational as possible.
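The out-of-range handling Paul describes (“if the person said 6, come back gently and replay the question”) can be sketched roughly as follows. The function names and prompt wording are hypothetical illustrations, not the Survey Line API.

```python
# A sketch of rating validation with a gentle re-prompt: accept spoken or
# digit answers inside the scale, otherwise correct the user and re-ask.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def parse_rating(heard, low=1, high=5):
    """Return the rating as an int, or None if unparseable or out of range."""
    text = heard.strip().lower()
    value = NUMBER_WORDS.get(text)
    if value is None and text.isdigit():
        value = int(text)
    if value is not None and low <= value <= high:
        return value
    return None

def respond(heard, question):
    """Build the assistant's next utterance for a rating question."""
    rating = parse_rating(heard)
    if rating is None:
        # Gentle correction, then replay the question.
        return f"Sorry, please answer with a number from 1 to 5. {question}"
    return "Thanks!"
```

So an answer of “6” triggers the correction-and-replay path, while “four” or “4” is accepted as a valid rating.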


I do think it would be really funny if you did an outtake version where the correction was something like: “Hey, jackass, it only goes to five.”


Yes, that would be good.


That really feeds into the point about the impact of user experience in the context of feedback. You really have an opportunity to reinforce brand inside of consumer feedback nowadays. In truth, we always did; I think we are just actually starting to pay attention to it more as an industry now. But having that friendly voice is such a better experience, to your earlier example, than having to manually put a URL into a web browser, which is just like filing taxes.


Yes, for sure. We think there are multiple benefits to it. We just think it’s another way; obviously there are other ways to take surveys. But one of the things we really like is the creativity we have seen from some of our customers, who are doing things like having a user do a feedback session while they are experiencing the product, which is difficult to do any other way. Voice lets you do that. So maybe while you’re trying a shampoo, or whatever, and you’ve got an Echo in the bathroom, you could actually be answering questions: How does it feel? How does it lather? Things like that.

We are seeing some creative stuff, and we just love that. That’s why we try to build it to be as open as possible.


In-context feedback, I think, is the part that is going to be interesting for market researchers, and I mean that in a broad way, not a narrow one: anybody that’s interested in consumer feedback is going to find it tremendously valuable, because the in-the-moment comment is the most valuable feedback, versus the degradation of feedback caused by the delay between the experience and the Q&A.




What kinds of content or insights are you actually capturing beyond the obvious answers to the questions?


It is all centered on the responses, as you are probably aware. Alexa and Google don’t give anyone raw access to the audio itself. You can’t set up a skill and then get the audio file for exactly what the user said; “I want to hear their voice” doesn’t work like that, for privacy and good reasons. But they do a great job with speech-to-text. So it relies heavily on Alexa’s and Google’s natural language processing and speech-to-text capabilities. We have different question types that we support: asking a user for a rating from 1 to 10 or 1 to 5, whatever they want to set up; yes/no; multiple choice, of course; and then free-form, which is really wide open. So if you just want to ask the user for some comments, things like that, we have that capability. And we have just added a new question type. We call it “mobile phone,” but it’s basically the ability to collect contact information from the user. The way we’re implementing it right now is, if the user wants to supply that, they get a text sent to their phone, and that sort of makes the connection with the brand or whoever is conducting the survey. So we’re looking at different ways to provide value there. But as far as the actual insights, we are just looking at providing as accurate a set of data as we can per survey, and then our customers will glean the insights they want from that data.
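The question types Paul lists (rating, yes/no, multiple choice, free-form, and the new contact-collection type) could be modeled in a survey definition roughly like this. The field names and structure are assumptions for illustration, not the actual Survey Line schema.

```python
# A sketch of a voice-survey definition covering the question types
# described: rating, yes/no, multiple choice, free-form, and contact.
from dataclasses import dataclass, field

@dataclass
class Question:
    prompt: str
    kind: str                       # "rating" | "yes_no" | "choice" | "free_form" | "contact"
    scale: tuple = (1, 5)           # used only by "rating" questions
    choices: list = field(default_factory=list)  # used only by "choice"

survey = [
    Question("How often do you visit this location?", "choice",
             choices=["daily", "about once a week", "rarely"]),
    Question("Rate the quality of the food.", "rating"),
    Question("Would you recommend us?", "yes_no"),
    Question("Any other comments?", "free_form"),
]
```

A survey engine would walk this list, render each prompt through the assistant’s text-to-speech, and validate the reply according to the question’s `kind`.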


Is the data set something like a CSV file?


Yes, exactly. So right now that’s how it happens.


So really easy to integrate into whatever platform they’re using for their analytics. Is there any additional metadata that you are gathering, like in a traditional web-based platform? You know, you’ve got a host of stuff like timestamps and browser version, maybe even location?


Yes. Well, we can get what device they are using, whether it’s Google Home or Alexa, and then within that, which type of device it is. We can basically only get whatever we are given by the voice platform, whether that’s Alexa or Google Assistant. But there is some metadata, and Paul and I work with that and provide it to some of the clients.


Got it. Let me run through a scenario: I literally just purchased my breakfast this morning from McDonald’s. That is maybe not an endorsement of my health, but I do like McDonald’s a lot. So I finish going through the checkout. How do I get that survey? What’s the trigger event? Is it in the car? Is it later? How does that survey get served up to me, so to speak?


Yeah. That’s basically going to be the challenge going forward, Jamin; it’s what we call the “voice call to action.” And the call to action could take so many different forms. It could be, like in the example we gave, on the receipt, and it just says: “Launch the McDonald’s survey,” or whatever engagement or voice action they want to start. It could be printed on a product: “Tell us what you think. Just say XYZ to your voice assistant,” and we would obviously rebrand it to that product or company or whatever they want it to say. So that’s the challenge going forward: how do you implement that call to action? We are working with a company now that does direct marketing. They have huge brands like Wells Fargo, companies that do massive amounts of direct marketing, and they’re adding voice response into it, because somebody could get something in the mail that says “Go to our website” or “Call our 1-800 number.” Well, now it could say “Just interact with us through voice,” and that would launch essentially a voice interaction, which could be a survey asking a few questions; and based on how they answered those questions, it could do different things and contact them that way.

But I think it is going to be tricky because it is going to take time for companies to say:  “Where do we want to put this call to action and what should it say?” And that’s something that we can help with to a certain degree. But we’re not the experts on that so much as they are.


It seems like that’s a big partnership opportunity that you’re talking about.

I am thinking about the market research space, where we have a host of very large sample providers, whether it’s Dynata or others (I don’t know how big that industry is, somewhere between $2 billion and $4 billion). If they had a voice-enabled device as a variable inside their profiles, then maybe there is that trigger event that could happen. There are a lot of ifs in this scenario, but assume there was an app that was tracking geo.


So just think of e-mails, and how many people take SurveyMonkey or Qualtrics or Zoho surveys. Most of the time they are asking people to take their survey by sending out an e-mail, or it’s on social media or somewhere. That could be supplemented, maybe not replaced, but supplemented with “Do you want to do it with your voice assistant? Just say ‘Launch bla-bla’ or ‘Start the XYZ survey,’” or whatever it is. Now, obviously, surveys need to be friendlier in a voice context, so you can’t take every SurveyMonkey or web-based survey and just copy and paste it into a voice survey, because there are nuances, and cadence like we talked about before, that are necessary. It’s sometimes better on a screen to sort things or see a lot of multiple-choice answers, and that doesn’t lend itself to voice. But the call to action could work in a similar way to how SurveyMonkey and online surveys are done.


So it’s projected that by 2023 about $80 billion in purchases will be made through voice. That is, for me, a massive number, and I’m seeing in my own user behavior that I buy stuff through my Alexa device, more of the CPG type of stuff. Google and Amazon are both very aggressive in gobbling up the generic brands; I know that has been well documented. So, generic paper towels, I believe, are now an Amazon brand.


Yes, Amazon Basics.


So, in a voice-based consumer journey, which is invisible, I don’t have any opportunity to intercept the consumer if I’m Scotts, for example. Why isn’t voice a bigger deal right now for the CPG space? Or if it is, are they just operating in secret? As a consumer, I’m just not seeing a lot of investment, and as a practitioner, a lot of research. There is not a lot of noise about investment being made in this invisible consumer journey.


It is a good question. Because we’re in the voice industry, we do see a lot of internal investment by companies that are building things now, but they don’t want to just rush them out to market. I was in the healthcare space for a long time, and they actually want to get voice capabilities for patients and for doctors and so forth. But the brands, like you said, are taking a slower approach; they’re doing a lot of internal testing and building things. They’re also looking at “How do we get on there?” Because Amazon and Google basically own the native interface, and as soon as you start talking, Google and Amazon know who you are. But until you open a skill or an action or some interface with a brand, that brand still doesn’t know who you are, until you somehow give them permission. So it’s much more difficult for anyone beyond Amazon and Google to get that.

So that’s why we’re building in things like getting contact information right through it, and doing look-ups with codes and so forth so you could just put a code in. But there is a lot of investment going on by brands and also ad agencies. The agencies are basically thinking: “How can we get into voice?” And it’s slow for a couple of reasons, mainly because they are just trying to figure out how it all works. But also they want to be careful not to roll something out that’s half baked.


I did some analysis earlier this year on voice app ratings. I was using ratings as a surrogate for app utilization in voice. Unfortunately, there isn’t a true corollary: the number of ratings, for example, doesn’t equal the utilization of the app. But having said that, it’s still really interesting to see which apps are being used in a voice-based context, just on frequency. One of the things I thought was really cool was that, I believe it was GM, that has an auto-start voice skill for the vehicle. Again, I am intuiting: I live in California, but when it’s cold outside I’d like to start my vehicle ahead of time so it can warm up. And that’s the extent of the skill, which is very highly rated. But they were the only automobile manufacturer, even including Tesla, that had any voice-based app in the top 100, I think is what I pulled. So it feels like transparency, in terms of what apps are being used and by whom, could be a big opportunity, whether for a company like yours or even a company like Nielsen, for communicating to the industry what is trending from a user-experience perspective.


Yes, exactly. I think it all gets back to that “voice call to action” until people know what to say to their voice assistants, what to ask of their voice assistants, and have a prompt.

It is going to take time, because people know how to say: “What’s the weather like outside?” or “What’s the sports score?” They can turn their lights on and off; I have all that set up in my smart home, and it’s a great way to play songs. That’s the biggest use case of all for the smart speaker. But I think once brands and companies and different entities start doing a voice call to action, where they say: “Well, here’s our website. But if you want to reach us by voice, say this,” you will see that take effect. And then it’s going to take some time, like you said: 2023, $80 billion. I think by 2023 you will just see a lot more calls to action: “Hey, engage us by voice!”


Yeah, that’s right. It is the whole user journey that has to trickle down to just the knowledge of how to interact, because it’s invisible. You don’t have the user prompts that you would otherwise have. I go to the Star Trek example, right, where you had the computer, there was a constant interaction with it, and they would give commands to the computer to transfer controls to whatever. Are you seeing that as one of the maturing use cases, or potential use cases, where there is a voice-based Instagram feed and then the person asks to transfer it to their phone, or something along those lines?


Not yet. It’s just kind of too much of a reach for somebody to know to do that. But I think once there is a good use case, it actually gets habitual. It’s all about habit! Turning on lights and doing things that are IoT: if people do them often, they build that habit, and then you get there. But if somebody doesn’t know to do something, to your point before, there is really no visual interface in most cases; if you have an Echo Show, you have some visual. Actually, that brings up a good point, Jamin: I do actually give it commands when I see things on my Echo Show, when it prompts me with an article or something to do. Right now it needs that prompting or call to action. As more companies put them out there, you’ll see more use cases, and then people won’t even need to be prompted. They will just use them. But we are still in pretty early stages on people doing things like transferring. It will come, but it’s going to take some time.


So what is one practical take-away that our listeners can glean right now from your upcoming talk at NEXT?


I think the biggest thing is that feedback and surveys, or just getting anything from a consumer, an end user, or an audience, is doable by voice with your own branding, and that’s just now becoming possible. So what we are going to show at the NEXT conference is basically a platform that allows you to create a survey like SurveyMonkey, but branded for yourself, so it’s your own voice. It’s not the Alexa voice or Google Home voice; it’s your own voice throughout the whole thing, like I showed, and you get the data, you get to ask what you want to ask, and the user is happy with the experience. So that’s what we’re going to show, and it’s evolving; it is still very early stages. But as we improve our platform, we are leveraging the capabilities being improved by Amazon Alexa. And then there are also obviously Cortana and Bixby and some others, and Siri. Once Siri can be programmed, we will have that as well.


If somebody wants to get in contact with you, how would they do that?


Yes, the best way is just to go to our site, which is


Got it. My guests today have been Stuart Crane and Paul Cornwell of Voice Metrics. Thank you both for joining me on the Happy Market Research podcast today.


–Thanks a lot. We enjoyed it.

–Great. Thanks for having us, Jamin.


Everyone else, for more information on the Insights Association’s NEXT conference, and to hear speakers like these fantastic gentlemen and others, please join us in Chicago June 13th and 14th. You can also find more information on our website. Have a great rest of your day, and I hope to see you there!