Mark Wyner – How Speech Technologies Will Change the UX Landscape @ UX New Zealand 2017

(audience clapping) – Welcome. I'll try to follow up all these amazing speakers today. Really good conference. I heard a lot of good information. People touched, interestingly, on a lot of things I'm gonna talk about today. So, let's get into it. Voice UX is not your mama's UX, okay. This is an evolution of communication. And that is at the core of
voice-based aural UX systems. And that’s what I want
to talk about today. So the first method, if we go back and we look at
the origins of being human and the origins of human communication, we find ourselves at speech, okay. And this first method of
communication was two-way. I speak, you listen. You speak, I listen. And writing came along much later, you know, on the timeline. And when it did, what they've learned is that the earliest forms of writing, the style of writing, were in the manner of talking. It was still written in
this two-way fashion, this two-way style. It wasn’t until much later we got the Gutenberg press, and we had the ability
to produce publications that could reach far and wide. Later we had electricity
and other technologies, radio and television, which enabled us to communicate
to much broader audiences over greater distances, but in this new manner where
it was one way communication. Information was broadcast, we would communicate, and that’s where that
conversation would end. And thus it wasn’t even a conversation. But it was still communication. Then the internet comes along and we have this whole new
world of communication. It’s sort of a blend of this two-way and one-way communication. Because now it’s interactive and we’re given tools
like keyboards and mice and other ways of reaching vast audiences. But now we are communicating in a two-way fashion. And so now we are communicating with a huge audience of people. We have access to millions of other human beings, in this new capacity, in this new vehicle, yet in the original form of speech, which is two-way communication. So today we have these emerging aural UX, voice-based UX systems. We mostly remove the visuals, the displays. We mostly remove the tools that we've had. And we return to this primal
form of communication, this two-way dialogue. Only this time, it’s between human and machine. We are now talking with machines. But we have this primal instinct about the way that we communicate
with other human beings. And when we have that conversation, there are some nuances
that we need to get into, especially as UX designers. This brings about many new considerations, especially for us. You know, as Ash has
talked about the ethics, and the responsibilities that we have as designers and what we
put out there in the world, we have to think about
this very carefully. So let’s explore this. Primarily the art of conversation, okay. As we return to voice based communication, we return to conversation, which is different from
just communication. Conversation, very important distinction. And conversation changes everything about user experience design. So verbal conversation, verbal communication is our
most natural and primal form of communication, as I
just mentioned, right. But when you’re talking
about system communication, it’s very awkward. It’s not something we’re used to. And we have very sophisticated systems that we can communicate with, like Siri, and Google Assistant, and Alexa, and we can have seemingly
normal conversations with these machines. And it feels natural,
and it seems natural. But we have this array of other devices, like our Honda Odyssey,
which isn’t that old, which has a very clunky
system inside of it. And every time I want to say anything, I have to push a button
on the steering wheel, wait for a prompt, and then say something. And then if the automobile speaks to me and I want to respond and I want to reply, I have to go through
that whole process again. And that’s an important
distinction because it changes how we communicate. It changes this natural primitive way that we have conversation. But we now have these subtle variances and the awkwardness of how
we have those conversations because of the systems
that we’re designing. But when you think about
natural language processing, which is at the core of how
we communicate with machines, how natural is that anyways? This right here, this phrase: "Time flies like an arrow; fruit flies like a banana." This is something called a syntactic ambiguity.
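To make that ambiguity concrete, here's a tiny, hypothetical sketch of the two readings a natural language system has to choose between for the second half of that phrase. The structure and field names are purely illustrative, not any real parser's output.

```typescript
// Hypothetical sketch: the two competing parses of "fruit flies like a banana".
// Nothing in the words themselves settles which one is meant; only context does.
interface Reading {
  subject: string;
  verb: string;
  complement: string;
  gloss: string;
}

const fruitFliesLikeABanana: Reading[] = [
  {
    subject: "fruit flies",      // the insects
    verb: "like",                // meaning "enjoy"
    complement: "a banana",
    gloss: "The insects known as fruit flies enjoy a banana.",
  },
  {
    subject: "fruit",
    verb: "flies",               // meaning "moves through the air"
    complement: "like a banana", // "in the manner of a banana"
    gloss: "Fruit travels through the air the way a banana does.",
  },
];
```

A human resolves this instantly; a machine has to weigh both.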
And this is something that linguists use when they're assessing the dialogue and the dialects of various languages, and they're studying the evolution of communication. They look at something like this and say, how does this transpire across all these different languages? Like, how is our communication evolving? And not just across different languages but within one language. Because as our own language evolves, we have all of these other considerations. We have context. We have colloquialisms. We have slang. It's all this evolutionary dialogue that we're creating. And it changes conversations. John McWhorter is a linguist. And he has this TED Talk, "Txtng is killing language. JK!!!" It's absolutely awesome. I highly recommend it. Check it out. He's an incredibly smart human being. And in this talk, he talks about this
evolution of human language. He talks about conversation. And he talks about something
that linguists refer to as pragmatic particles. Pragmatic particles, an example of one is L-O-L. Laugh Out Loud, right. And he talks about how what they look for, linguists, when they’re looking for
the evolution of a language, they look for these pragmatic particles that create what they refer to as markers. And in this case, L-O-L, he makes a reference to how L-O-L has evolved and changed
the way we actually speak to each other as human beings. Because L-O-L has evolved from Laugh Out Loud, which was its original meaning, to what linguists refer to as a marker of empathy. And he cites this conversation. I just sent you an email. L-O-L, I see it. So what's up? L-O-L, I have to write a 10-page paper. And he's talking about no
one’s guffawing here, right. You know there’s nothing
funny about sending an email, and there’s certainly nothing funny about having to write a 10 page paper, right. And this is an evolution. This is how we communicate. And then we see this
move out into the world beyond the technology that we’re using. So you look at hashtags, which were originally
designed for tagging content for easier findability
and indexability right. But like my 10 year old son would say, I’ve done something #LikeABoss. It’s part of communication. And you in this room would understand. You know, potentially my
grandparents may not understand. Potentially, people in
a developing country that don’t use the internet, you know. As Alessandra referred to earlier, we were talking about
ways of communicating which are unnatural to people
who don't have that context. You have something like this hashtag where there's no relevance. My daughter, my eight-year-old, was texting me. And she loves this sheep here, this pink sheep. And you know, it's this YouTube video thing they're watching. He always says prankster gangster. And so she's texting, she was teaching me how to text is what she was doing. And so she said #prankstergangster. And then she says, "There you go, you're getting better at this." You know, you're learning how to text, Dad, because now you referenced this hashtag. And it was really funny. But it's something which is not a part of this. It has no relevance in this context. This hashtag does nothing
for this conversation. But it’s a way of communicating
that we understand, and my eight year old understands. So when we’re talking about conversation, a really important component
of that is perspective. And this is really important. I’m gonna get into why this
is important in a moment. In UX, UX designers, we often assume perspective, and we assume context. And we do our best to truly understand the perspective of our audience. But a lot of times we make assumptions. This image right here, immediately there was something about it, and I didn't know what it was, it took me a second, and then I figured it out. Sitting in a physician's office, waiting for the doctor to come in, I saw this map on the wall. And this is a fire escape route. So this is really important information. You're gonna fucking die if
you can’t follow this map. This is not a board game, right. And so this is really important. And I’m looking at this
and I’m thinking okay, this is on the wall, adjacent to the door where
I’m leaving the room. Now if you look at that map, and you consider this very carefully, this is the perspective I need. Because it’s on the wall, and that’s the door that I’m exiting, and I need to turn left, not right, not like this. I’m probably gonna figure that out. Many people will figure that out. I’m not doubting the
intelligence of human beings. But I am gonna get into
this idea of cognitive load, especially in situations which are paramount to our survival. And something like perspective
and how that can impact everything about our moment in time. So here’s another element of perspective, which is really interesting. And this bothers me all the time. I ask Siri, “Remind me to pick up
my book at Powell’s.” And Siri says, “OK, I’ll remind you”, and says pick up my book from Powell’s. And when I get this reminder
the next day at 2:00 p.m., I’m gonna read it and it’s gonna say pick up my book at Powell’s and that’s gonna make sense to me. And I’m gonna pick up my
book ’cause it’s my book. But I’m having a conversation. In this moment when I’m
creating this reminder, I am having a conversation
with this machine. And when I’m having this
conversation with the machine, there’s a very important
thing here that’s happening. If I ask Ash to remind me tomorrow that I need to pick up
my book at Powell’s, he’s not gonna say, “Sure, I’ll remind you to
pick up my book at Powell’s.” He’s gonna say, “I’ll
remind you to pick up your book at Powell's." Now this is subtle. This is a nuance and it may not seem huge. And weird shit like this bothers me all the time, stuff that nobody else thinks about. But think about the cognitive load that is required. As subtle as it may seem, it changes our perspective on this conversation that we're having with a machine. I'm asking it to remind me about my book. But the machine is saying, pick up my book. And so there's this element. Sometimes it's subtle. Sometimes it's subliminal. But it's there. And it impacts our cognitive load, and it impacts our ability to communicate with the machine, because we're trying to get shit done.
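As an aside, here's a minimal, hypothetical sketch of what that perspective shift could look like, assuming a simple word-level substitution. It's not how Siri actually does it, just an illustration of flipping first-person pronouns to second person before the assistant echoes a request back.

```typescript
// Hypothetical sketch: flip first-person pronouns to second person when an
// assistant echoes the user's own words back to them.
const PRONOUN_FLIPS: Record<string, string> = {
  my: "your",
  mine: "yours",
  i: "you",
  me: "you",
};

function echoFromAssistantPerspective(utterance: string): string {
  return utterance
    .split(/\b/) // split on word boundaries so spacing and punctuation are preserved
    .map((token) => PRONOUN_FLIPS[token.toLowerCase()] ?? token)
    .join("");
}

// "pick up my book at Powell's" -> "pick up your book at Powell's"
console.log(`OK, I'll remind you to ${echoFromAssistantPerspective("pick up my book at Powell's")}.`);
```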
So let's talk about this: cognitive load and working memory. Working memory, some people confuse it with its sibling, short-term memory. But it's not the same thing. Part of cognitive load is working memory. Working memory is the human
brain’s version of RAM. It is this temporary data
storage that we have. And many, if not all of you, have experienced at some point in time, walking into another room, and going into that room, and saying you know I’m going over here, and there’s this thing that I’m gonna get, or something I’m gonna say. And you get to that room and you’re like I have no idea why I’m
in this room, right? This has happened. And that’s failure of working memory. And that’s a perfect
example of something that’s incredibly critical when
we’re thinking about interfacing with a machine using a voice-based system. So, something that I learned when I was researching for this talk, a very interesting thing, researchers have learned that
there’s a direct correlation between cognitive load and our pupil dilation. Isn’t that fascinating? You can literally measure the cognitive load of a human being by measuring the dilation
of his or her pupils. So during this process, they wanted to use this measurement. They used this measurement system to research the cognitive load of responding to aural versus visual tasks. So somebody telling you go do this thing, versus seeing something written or printed, you know, or even on a screen that says go do this task. And they learned that the cognitive load is much higher when receiving aural tasks. So if you think about
this high cognitive load, and you think about the ability for us and our working memory to fail, and you think about now I'm
interfacing with a system where nothing about it is visual, you now have something
that’s very challenging, something that UX designers
need to respond to. And this other element of
cognitive load, we choke. Athletes choke, speakers choke, they forget what they want to say. It happens all the time. This is related to cognitive load. Sian Beilock is a professor of psychology and she notes this. She says choking is
suboptimal performance, not just poor performance. It’s a performance that is inferior to what you can do, and what you have done in the past. And it occurs when you feel pressured to get everything right. So she’s talking about shit that’s easy for you to do, that you do all the time. And when you feel that pressure, and the cognitive load goes up, and your pupils dilate, and all these things are happening, and you forget, and you fail, and you don’t do things that
you know you can do. And this is important, especially in voice-based UX. Because voice-based UX is based on timers. It's a system that requires
prompts for call and response. And when you don’t meet that timer, the pressure is on. And you can fail. This is performance anxiety. So this is something we’re accustomed to in the origin of the web, right. You have this blinking cursor. That blinking cursor is
gonna wait all day long. You can go to lunch. You can go to the pub. You can come back. Wake your computer from sleep. Blink. What do you want? What are you searching for? That's not gonna happen in voice-based UX. Voice-based UX asks us to make haste. Siri, you know, so you
gotta think about this. Like how do you reduce
this cognitive load? And you think about this
element like Siri, right. And you think about
like, and I’ve done this. It’s a prompt, the timer goes, I hear the beep and I go, “Remind me tomorrow to pick up that book.” Shit what was it called. Oh, and then she heard
me and then there it is, and there’s my reminder. And sometimes I’ll just
leave them that way ’cause I’m like fuck, I
know what I needed to do. I know it's a book. But then I see the prompt. But this happens. This is real. Many of you may have experienced, if not all of you, this type of performance anxiety. And this is related to cognitive load.
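To make the contrast concrete, here's a rough, hypothetical sketch of that difference: a blinking cursor waits indefinitely, while a voice prompt typically only listens for a short window before giving up. The timeout value and function names are assumptions for illustration, not any platform's real API.

```typescript
// Hypothetical sketch: a voice prompt that stops listening after a fixed
// window, unlike a text field that will wait all day for input.
const LISTEN_WINDOW_MS = 8_000; // illustrative timeout; real platforms vary

function listenForUtterance(captureSpeech: () => Promise<string>): Promise<string> {
  const windowClosed = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("Listening window closed")), LISTEN_WINDOW_MS)
  );
  // Whichever settles first wins: the user's utterance or the timer.
  return Promise.race([captureSpeech(), windowClosed]);
}
```

That race is where the pressure comes from: if you haven't finished formulating your request before the timer fires, the system moves on without you.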
So interestingly, earlier we saw another slide title: path of least resistance, right. Mental models, that's how we as UX designers create this path of least resistance. This is how we decrease cognitive load. And so let's talk about that. There's a pianist by the name of Clariece Paulk, and she says, "People don't know what they like. They like what they know." Love that quote. I think it's beautiful, and I think it's highly accurate. For everyday living and for what we do for a living. Because we are creatures of comfort. We love comfort. And mental models feed our comfort. Mental models make the new seem familiar. I'll say it again. Mental models make the new seem familiar. Mental models are how we create comfort. Mental models are how we create
a path of least resistance. Mental models help us
decrease that cognitive load, decrease that choke-ability, decrease that performance anxiety, which is something
that's inherently natural as a part of voice-based UX systems. So think about this. Tesla, they released this first car. Their very first, that was the Model S, I think. Unprecedented touchscreen
environment, this dashboard. This didn’t exist in automobiles. And they were like, you know what, let’s take a tablet. Everybody likes touchscreens, let’s like throw that shit on
the dashboard and there we go. Like, we're done, right. And I'm not even gonna get into why it's highly inadvisable to be fucking with a touchscreen while you're going like 60 miles an hour. Like, I don't know where
the volume is for this. But you know so, that’s a whole other talk. I’ll give that next year. But there’s no mental model
for this touchscreen right. And so they refer to
a design mental model. They go back to this UI design
model of skeuomorphic design. And they’re like well, if we make it look like a button, then people will know how
to use this in the car. And this is very important. And what’s cool about this
is when we’re thinking about these mental models and how we apply them, search is a big part. That’s like a great place
we can look at this. Search is an integral part of
everything we do on the web. Search is a big part of
our voice-based UX systems. We search. Search was a big part of
the web just to begin with. We use the web to search for information. This is inherent, core, origin of the internet. So let’s look at visual,
aural, and multimodal search. Visual search, let's begin there. This is something we're all accustomed to. This is where we've been. This is where we began. Lindsey Horan, my favorite midfielder in the NWSL, United States of America. She plays for our Portland Thorns. Amazing human being. And if I want to find more
information about Lindsey, and using this search engine, I get all these results. I can scan this page. I can whip through it. There are mental models for
what this page looks like and the layout and all this information. It’s quick, it’s easy. It’s done, right, no problem. Now, you have people
with visual impairments. And they can’t scan this page. So screen readers. That’s the alternative. That’s the adaptability. That’s what we used to help people and to accommodate, right. And so this is what a screen reader, which is distilling headers
only, would return, okay. Now, is this reasonable for an aural return for screen readers? They said yes, and this is what you return. But if you're thinking about these new voice-based systems, which aren't necessarily for people with visual disabilities and impairments, then you have this whole other system where now lots of people get involved, and now they're really concerned, and they want to really
make an awesome system. So is this a reasonable aural return? Probably not. So Google says, well what
is a reasonable return on something like this? This is a multimodal experience. So you have Google
Assistant on your phone, you speak into the phone, and you get the visual results. And Google says well, for one tiny screen, let's limit this space. Do we want all these results? How can we simplify this? Single return, the right information, I can probably guess. They did, she's there, done. Now, here you have something
like Google Assistant, which now crosses two modalities. So you first have your mobile phone where you have a multimodal experience. So I speak and I see a visual return. And then you have the Google Home which has no visual return. So it's a single aural-based modality. And now, you have these
multiple modalities that you have to go across, and we think about mental models. And Google says well, okay. That aural return, the list of the headers, do we want to return that on Google Home? Do we want this list? And then if we return this
list of all these headers when we’re searching for Lindsey Horan, how do we reference which one? And where do we want to go from here? So Google says well let’s distill this down to one. One result, the most meaningful information, done. And then we do that on the mobile phone. Now you have a mental
model for how you can take this with you across
multiple modalities. This is a smart, adaptable decision that Google made to handle this type of design decision-making.
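Here's a loose, hypothetical sketch of that kind of decision, assuming a simple relevance-ranked result list. The types and thresholds are made up for illustration, not Google's actual logic.

```typescript
// Hypothetical sketch: adapt a search response to the output modality.
// A screen can show a scannable list; a voice-only device distills the
// results down to the single most meaningful answer.
type Modality = "visual" | "multimodal" | "aural";

interface SearchResult {
  title: string;
  summary: string;
  relevance: number; // 0..1, higher is better
}

function buildResponse(results: SearchResult[], modality: Modality): string[] {
  const ranked = [...results].sort((a, b) => b.relevance - a.relevance);
  switch (modality) {
    case "visual":
      // Full page of results: the user can scan and pick.
      return ranked.map((r) => `${r.title}: ${r.summary}`);
    case "multimodal":
      // Small screen plus voice: a handful of results at most.
      return ranked.slice(0, 3).map((r) => r.title);
    case "aural":
      // Voice only: one spoken answer with the most meaningful information.
      return [ranked[0].summary];
  }
}
```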
Accessibility. We gotta talk about accessibility. Talked about it, touched on it with screen readers, it's really important. This is my son, Cassius. He is 10 years old. And he can only read and
write a handful of words because he has a developmental delay. 14% of Americans, according to the Department of Education, can only read at his level and write at his level. 14% of all American adults are reading at a level that is the same as my 10-year-old Cass, who by academic standards in America is well behind the other
children of his age. Okay, and that’s a
significant number of people that we have to accommodate. This changed his life. This voice-based UX system. The ability for him to go find
Star Wars Lego mini figures on his own, you know. To be able to find videos. To be able to just take
control of this system where every 10 year old boy wants to watch these great Star Wars episodes, and whatever else, and Lord of the Rings. This is important. And this is really life changing. This aural-based UX
system that we’re creating has an immediate benefit for people, for the illiterate population. Now, an important element here when we’re talking about accessibility. We’re accustomed to a visual web. This is where we began. Now we have this visual. We made accommodations for people with visual impairments, and we created screen
readers, and VoiceOver, and other things, right. They adapt reasonably well. They're not amazing but they do adapt. For hearing impairments, that's a much bigger mountain to climb. How, visually, do we accommodate a voice-based UX system? So, something like this. Amazon Echo. They create the Echo, you speak. They create this thing called VoiceCast. And you can pair it with a Kindle, a visual, and so now it's got this makeshift multimodal experience. I speak, I get the visual return. Not ideal, a loose accommodation, but
it's an accommodation. And then, just now, they released this Echo Spot. This is a true multimodal experience. So they're saying okay, this is now an accommodation, you know, for people with
hearing impairments. And it’s a great benefit
to everyone else, right. This is a really intelligent
UX design decision, saying this is gonna meet everyone’s needs all at once. And this, you know it seems so simple, and it seems so obvious. But not everyone gets to this point. And they made a really smart decision when they went through this process. So let’s talk about the language barrier. This is a whole other arena. And Alessandra talked
about this earlier today. She talked about this language barrier. There are over 7,000
languages spoken worldwide. Spoken, written, read, worldwide. 7,000 languages. Devices can be localized, right. We have lots of devices, different computers, things. You can ship the same device to 10 different countries, and each one can be ready to go and localized in that
particular region’s language. So I want to talk about that. This slide’s for you Ian,
wherever you are, Ian. This is one of my favorite
football defenders in the entire world. He plays for Manchester United. This is his name. Now, I can type his name
into a visual search engine and voilà, there he is. There's my man, okay. But this is not Eric Bailly as you might pronounce it. He's a French-speaking man from Ivory Coast. And his name is pronounced Eric Bye-ee. Now this brought a shitstorm into my life when I was communicating with my devices, I will tell you right now. (audience laughing) Hey Siri, what national team does
Eric Bye-ee play for? That’s what I got. Not quite, not quite. Let’s try, okay. Let’s just try Google. Okay Google, what national team does
Eric Bye-ee play for? Eric Berry. We got sports, but we’re
not quite there yet. What if I add some context? What national team does the
soccer defender Eric Bye-ee play for? There he is. Google knows. It knows that I searched
for soccer players. Now it gets it. There’s some context. It can deduce all of this information and say he’s probably not
looking for Eric Berry. He’s probably looking for Eric Bailly. I misheard him because
I’m a fucking computer that can’t understand Eric
Bailly’s pronunciation, right. So this is artificial intelligence using its machine learning, using its context and understanding that I search for soccer
players all the time. It’s my favorite sport in the world and I research all the
time and now it knows. And the next time I ask
it without this context, what national team does
Eric Bailly play for, there he is. I get the results. Now, this is really important. Originally, this search didn't have a return. It took me providing some context, and then asking again, to get back to this place where I want to have a natural, primal, comfortable conversation with a machine. It put the onus on me to adapt my search query to get the results that I wanted. That's something for us to consider. That's a language barrier.
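Here's a rough, hypothetical sketch of the kind of context-aware re-ranking being described: boosting speech-recognition candidates that match what the system already knows the user cares about, so the user doesn't have to rephrase. The scores, domains, and boost value are invented for illustration.

```typescript
// Hypothetical sketch: re-rank ambiguous speech-recognition candidates using
// the user's known interests instead of acoustics alone.
interface Candidate {
  entity: string;
  domain: string;        // e.g. "soccer", "american-football"
  acousticScore: number; // how well the audio matched, 0..1
}

function rerankWithContext(candidates: Candidate[], interests: Set<string>): Candidate[] {
  const score = (c: Candidate) =>
    c.acousticScore + (interests.has(c.domain) ? 0.3 : 0); // illustrative boost
  return [...candidates].sort((a, b) => score(b) - score(a));
}

// "Eric Bye-ee" sounds closer to "Eric Berry", but a user who searches for
// soccer all the time probably means Eric Bailly.
const best = rerankWithContext(
  [
    { entity: "Eric Berry", domain: "american-football", acousticScore: 0.8 },
    { entity: "Eric Bailly", domain: "soccer", acousticScore: 0.6 },
  ],
  new Set(["soccer"])
)[0];
console.log(best.entity); // "Eric Bailly"
```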
So let's look at context. Censorship specifically, privacy, things like that. You all know this. Or at least those of you with potty mouths like me, right. This is the bane of our texting existence. (audience member woo-ing) (audience laughing) So, visual environments, I can see what's trying
to be censored, right. Now I know what I’m typing. He’s not a duck, right. Sort of. But let’s just, you know. I can tap that little x and it’s gone. And then okay, now we’re getting back
to business here right. Speech to text where I don’t really know
what’s gonna be printed. Not so much. This is a text I got
from a friend of mine. Holy F, just hilarious, right. I’m thinking that’s hilarious, you know. What’s the deal here? She says, “I love the
way my voice activation doesn't spell the word F out." What? I'm thinking about this and I'm like, I thought she was just kinda being sensitive. I mean, I don't know why she would be sensitive with me and cuss words. But she was just like, you know, and I thought, okay, this is happening. No, no. She said that the machine is censoring what she is saying and not printing her cuss words into the text message. I'm like, what? What kind of device are you using? I'm gonna use this in my talk in New Zealand. F'n Droid. My Droid is an A-hole. But that's a story for another day. (audience laughing)
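A small, hypothetical sketch of how that could be handled better: make profanity masking a user preference in the speech-to-text layer rather than a decision the device makes for you. The word list and setting name are placeholders, not any real Android API.

```typescript
// Hypothetical sketch: speech-to-text output masks profanity only if the
// user has asked it to, keeping the choice in the user's hands.
const EXPLICIT_WORDS = new Set(["fuck", "shit"]); // illustrative list only

interface TranscriptionPrefs {
  maskProfanity: boolean; // a user setting, not a hard-coded default
}

function renderTranscript(words: string[], prefs: TranscriptionPrefs): string {
  return words
    .map((word) => {
      const explicit = EXPLICIT_WORDS.has(word.toLowerCase());
      if (explicit && prefs.maskProfanity) {
        // Keep the first letter so the user still knows what they said.
        return word[0] + "*".repeat(word.length - 1);
      }
      return word;
    })
    .join(" ");
}
```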
So let's talk about privacy. Public spaces where privacy or auditory services are a concern. I don't want to say my PIN number. I don't want to say my password when I'm talking into a
system or a credit card number or something like that. I don’t want to be disrespectful when I’m talking to other human beings. So how do we adapt to that? We have to think about
that as UX designers. So Google Assistant, the app on our phone, great, you tap this little keyboard icon, suddenly you don’t have to use your voice and you can tap. That’s an adaptability. This is a good multimodal experience that’s been converted to
a single modal experience because of the context, because of the understanding
that these UX designers had. And this is something which I’ve seen and I know a lot of you have seen this. Over the past couple years, especially feeds, like social media feeds, you’re scrolling through and videos started auto playing, right. This is a newer sort of thing where you used to have
to actually press play and tap on that. But now they're automatically playing. So this brought something new into the spectrum of watching these videos, where they're saying, well, it may be an inopportune time for audio to start just blaring out. If we're gonna auto-play videos, we gotta keep 'em silent, right. But then you now have these video makers, and people are hanging out on the airplane, or they're on the can or whatever, and they're just like flipping through and seeing this video. And they're watching it, maybe on silent, maybe looking at it, or maybe they'll forget about
it or won’t watch it later and they’ll miss it because
it gets buried in their feed. So the video makers are saying, well, let’s take this. Let’s think about this for a minute. What if we hard code some transcriptions right into the videos? And I’m not talking
about closed captioning. Because if I’m not… If my hearing is fine, if I don’t have any hearing impairments, I’m not gonna have closed captioning on. But if these are hard coded. If these are printed into the video, now I can see them, and now I can watch a video on silent. And that’s a good UX decision. And thinking about this
single modal experience of watching a video. This is really intelligent. So I want to get into
this idea of personality and trust and control. And these for me are key components when we’re talking about
voice-based UX systems. Kah talked about personality earlier and he talked about it with words and words creating this
personality for us. He’s absolutely correct. And it is paramount in
voice-based UX environments. And I’ll tell you why. Conversations change the psychological and emotional relationships
that we have with machines because it makes those
conversations suddenly personal. These dialogues that we’re
having with machines, they trigger our Darwinian buttons. All the way back to the roots
of who we are as human beings, and how we communicate with each other. Suddenly, it’s personal because
we’re having conversation. This idea of speech is
very, very important. As subtle or subliminal as it may seem, the psychological and emotional
impact of that is huge. And especially when
you bring in the factor that we’re talking to machines
and not other human beings. So how do we get around that? Personality. Personality is a key component. It’s a key UX component
when we’re thinking about our relationships with these machines. Because when we think about personality, we begin to investigate a lot of things in this personal dialogue. The personality of this machine, the intonation of the voice, and other nonverbal communication cues. Because nonverbal communication
is a significant part of how we communicate, how we converse with each other. It’s very important. When you’re talking about
our facial expressions, our pose, our stance, we fold our arms, we roll our eyes when someone
says some ridiculous shit. And this is, you know, you have Wonder Woman. Wonder Woman's stance is famous. And Amy Cuddy, I think is her name, gives TED Talks on this whole thing, on the power stance and how
it changes us psychologically. And it changes the perspective of us when people are looking at us. You know, like right now, look. See, now look at me. You all think I’m Wonder
Woman right now suddenly. It's like this magic. But it does, it really does. And this is important. Nonverbal communication cues are currently undetectable by AI in environments
where there is no visual feedback mechanism. But for what is verbal, we can explore. We can begin to explore personality. You know that’s how we can sort of play with these cues that we might
be missing in other ways. But personality can harm
as much as it can help. Personality says, Ricky Baker anyone? Come on. This is New Zealand. He had to find his way into this. Personality says, “Do you trust me?” ♫ Ricky Baker ah ah Sorry, sorry, I’m done now. So there’s this thing called
the aesthetic usability effect. Please raise your hand if you’ve heard of the aesthetic usability effect. Holy shit, such a small
number of you have heard. This is amazing. Every single person in this
room needs to go look this up because this was life changing
for me as a UX designer, as a UI designer, anything I’ve ever done. The aesthetic usability effect. This is the idea that aesthetic designs are perceived as easier to use, and that their shortcomings
are more likely to be forgiven. There was a study by the Hitachi Design Center on human-computer interaction in 1995. The results of that study, they noted this in their report: "Users are strongly influenced by the aesthetics of any given interface, even when they try to evaluate the underlying functionality." So, people, they were saying
hey here’s this interface. Use it. Was it easy to use? That was the fundamental question. The pretty ones, more often than not they said,
“Yeah it was easy to use.” The ones that were ugly, they said, “Nah, I had some problems.” This is very real. This is a very real impact. And this because humans
like pretty things. We love beautiful art. We love museums, we love music. We love each other. We love to look at each other’s faces. We love to talk to each other. In any element of beauty, aesthetic is really important to us. And when we’re talking about this and the personality, and the realm of personality
in voice-based systems, we’re talking about the biomimicry of UX design. And there’s this designer
named Bert Brautigam who talks about biomimicry
being at the core of voice UX. And there’s a skeuomorphic layer we’re talking about at this core level of just how we interact. And then there’s this core level at the very center of this. And he says in this
article he wrote about it, he notes, "Biomimicry manifests itself at a much deeper level than skeuomorphism and concerns the how and the why a product solves human problems while skeuomorphism is the initial and temporary literalism for
human interaction with them.” Personality is a critical component of UX. Because trust, right. At a minimum we need trust. I love this. Trust that UX systems will
be usable, functional, and delightful. As Ash talked about, you know. And this is the thing. Personality drives that trust. There’s been research on this. This is a big part of it. And you think about that. And you think well, personality applies to voice-based UX the way aesthetics apply to visual-based UX. Personality is the
aesthetic usability effect of aural UX systems. There’s a qualitative study
on personality and trust and the correlation between the two. And Robert Sicora, who's one of the researchers directing this research project, he says, "The findings showed that agreeableness was found to be a significant predictor of propensity to trust. The higher an individual's agreeableness and emotional stability, the higher their propensity to trust." So, when we're thinking
about this, UX designers, we spend all this time
making things easier to use, making them more accessible
and usable to people. But will personality supersede
all of our best intentions to make a system easier to use when we’re talking about
a voice-based system? Will people trust systems
that they don't like? Just like they think systems whose UI isn't beautiful aren't very usable. There's an element of control. Human beings love to control. We are controlling human beings. And when we have control, we feel safe, we feel secure, and we want to continue on that path. And when we don't have control, things get scary for us, right? So control is an element
of having some trust with a machine. And when we have got that trust because we love its personality, it's a great recipe. It's a great cycle: personality, trust. So I want to read this passage from Old Man's War, a book called Old Man's War, which is incredible. I'm an AI enthusiast. I give some talks on it. John Scalzi wrote this book. And there's this passage in there which is highly relevant. And I think it's very realistic about where we will go with these types of systems. And we're talking about control. And it's a system, and in this particular chapter, this character is setting up his new voice-based AI system. So this is an aural UX AI system. And he doesn't want to use it, and he's being forced to use it. And he's like, no, I don't
want any part of this. I don’t trust this thing, I don’t want it. And they say, well let’s design
it so that you have control of the system. And this is what happens. "Many BrainPal users find it useful to give their BrainPal a name other than BrainPal. Would you like to name your BrainPal at this time?" "Yes," I said. "Please speak the name you would like to give your BrainPal." "Asshole," I said. "You have selected 'Asshole'. Be aware that many recruits have selected this name for their BrainPal. Would you like to choose a different name?" "No," I said, and was proud that so many of my fellow recruits also felt this way about BrainPal. "Your BrainPal is now 'Asshole'. Now you must choose an access phrase to activate Asshole. Please say your activation phrase now." "Hey, Asshole," I said. (audience laughing) "You have chosen 'Hey, Asshole.' Please say it again to confirm." I did. Then it asked me to choose a deactivation phrase. I chose (of course) "Go away, Asshole." "Would you like Asshole to refer to itself in the first person?" "Absolutely," I said. "I am Asshole." (audience laughing) "Of course you are." Personality, trust, and control. I think that's a very realistic passage. I think it's very realistic. You provide control. You think about this
character in the book. You think about this character, it’s just like a real tough grease monkey, blue collar, military guy who’s got these recruits, and he’s like, “You’re
handing me this technology. Shit, I don't want to use it," you know. And then he names it Asshole. And that makes him feel really good. You know, 'cause he's in control. He knows, like, I have the upper hand. I am better than this machine. And that's really important. And now he has trust. 'Cause he can say, "Hey Asshole, what's the weather like later?" And he can snicker or whatever. And that makes him feel comfortable. And that's a very realistic way that we can think about UX
in the form of personalities and how that impacts trust. This is not your mama’s UX. This is an entirely new landscape of UX. And it’s around the art of conversation, personality, trust, and control. So as we move into this new territory, let us tread wisely. Thank you. (audience clapping)
