
5. The age of conversational UIs

Google, Apple, Microsoft, Amazon, and Facebook bet on Conversational UIs.

 

“In 1950, artificial intelligence pioneer Alan Turing famously proposed what came to be known as the Turing Test: the proposition that a machine had achieved intelligence if it could carry on a conversation that was indistinguishable from a human one (The Washington Post and Dwoskin, 2016)”. Conversational UIs fueled by AI have been omnipresent in science fiction films, TV series, books, and popular culture (Figure 4) for a long time (Brown and BBC America, 2010; Hogan, Whitmore, and The Guardian, 2015).

Figure 4: Sample collection of film and TV show artwork

Human-machine conversation is nothing new, but it has recently experienced a renaissance, with conversational UIs – visual or non-visual – at its core. Early 2016 saw the big five technology companies become the most valuable businesses in the world (Entrecoder, 2016). They now share the top five positions (Figure 5) of the most valuable US public companies ranked by market cap (Business Insider UK and Leswing, 2016).

Figure 5: List of the most valuable US public companies ranked by market cap

These companies are analysed (Figure 6) not only because of their leading positions in the stock market. They have the power to provide the hardware and software ecosystems an entire industry and its consumers rely on. They create dependencies and best practices other companies adopt, and they introduce standards others follow. In 2016, every single one of these companies unveiled its vision for the future of computing, and that vision seems to be conversational. “In the future, we’re going to talk to our computers (The Verge and Vincent, 2016)”. Apple, Google, Microsoft, Amazon, and Facebook are investing heavily in AI technology and launching corresponding products that enable a more human form of man-machine conversation (x.ai and Mortensen, 2016).

Although these companies follow different strategies, they have one thing in common: they embrace conversational UIs, where the main form of interaction is a conversation, using either the written or the spoken word or a combination of both. These companies have not only introduced end-consumer conversational products, but they have also opened up their technology via publicly available APIs to third-party developers and therefore to the public. Apart from smaller companies benefiting from these openly available platforms, many companies are developing their own systems, services and products. Countless startups are also building chatbots that do things like schedule a meeting, conduct a basic Q&A with a job candidate, or collect daily pain reports from a patient.

At this point in 2016, none of these passes the Turing Test yet (The Washington Post and Dwoskin, 2016).

Figure 6: “The Big Five” Overview

5.1. What is a conversational user interface?

“Conversational UI is a very hot topic nowadays, but it is not working for every solution (Stan, 2016)”. To understand what this trend and the leading companies are about, let us define conversational UIs.

Godlewski and Yeti (2016) describe them as any user interface that can mimic human interaction, whether typed or spoken. Spoken or written communication is used as an interface for humans to interact with machines. A more general look at what communication and conversation are also seems relevant. Communication is an established and well-researched field. Shannon’s (1948) often-referenced communication model is a mathematical approach that identifies the various components of communication (Figure 7).

Figure 7: Schematic diagram of a general communication system (Shannon, 1948)

  • Source: The "information source", which "produces a message or sequence of messages to be communicated".

  • Transmitter: The "sender" or "transmitter", which "operates on the message in some way to produce a signal suitable for transmission over the channel".

  • Channel: The channel is "merely the medium used to transmit the signal from transmitter to receiver".

  • Receiver: It "performs the inverse operation of that done by the transmitter, reconstructing the message from the signal."

  • Destination: For Shannon, the destination is "the person (or thing) for whom the message is intended".

  • Message: The concept, information, communication, or statement that is sent in a verbal, written, recorded, or visual form to the recipient.

How does this terminology relate to today’s elements of conversational UIs and related topics, and how might we adopt it? Mortensen and x.ai (2016b) refer to the “sender” or “transmitter” in the form of software programs or intelligent agents. Godlewski and Yeti (2016), on the other hand, put the emphasis on the form of the message in the interaction, whether typed (text) or spoken (voice). Mortensen and x.ai (2016b) add that conversational UIs dispense with the touch gestures we have become accustomed to. Instead, users interact with software on platforms like email, SMS or Slack Messenger, using only text or voice. The main means of interaction is natural language (Mortensen and x.ai, 2016b).

The machine side of a conversational UI may be mapped to Shannon’s (1948) model using the following terms (see the sketch after this list):

  • Information source: artificial intelligence, human intelligence, a scripted database or a mix

  • Transmitter: software, a device or a tool, e.g. a scripted bot, an intelligent agent or a mix of both, in the form of a messaging app, smart voice assistant, etc.

  • Message form: Voice, text, or other forms of signals
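
To make this mapping concrete, the following minimal sketch (a purely illustrative assumption in Python, not part of Shannon's model or any cited source) represents one machine-to-human turn of a hypothetical conversational UI, using Shannon's components as fields:

```python
# Illustrative sketch only: Shannon's (1948) components mapped onto the machine
# side of a hypothetical conversational UI. All field values are assumptions.
from dataclasses import dataclass

@dataclass
class ConversationalTurn:
    information_source: str  # e.g. a scripted database or a machine-learning model
    transmitter: str         # e.g. a messaging-app bot or a smart voice assistant
    channel: str             # e.g. text chat or voice
    message: str             # the natural-language content itself
    receiver: str            # the user's device or app decoding the signal
    destination: str         # the human user the message is intended for

turn = ConversationalTurn(
    information_source="scripted database",
    transmitter="messaging-app bot",
    channel="text chat",
    message="Your meeting is scheduled for 3 pm.",
    receiver="user's chat client",
    destination="end user",
)
print(turn.message)
```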

Early conversational UIs, e.g. AOL chatbots, used to be dismissed as simple programs regurgitating pre-made phrases. Judging by the latest efforts of the big five tech companies, conversational UIs seem set to become the next technological age after the app age. Microsoft calls it "conversation as a platform" and Google says it wants computers to have an "ongoing two-way dialogue" with their users (The Verge and Vincent, 2016).

5.2. The role and importance of conversational UIs and messaging

To reveal the reasons for this shift from touch- or mouse-based graphical user interfaces (GUIs) to speech- or text-based conversational interfaces, a reflection on the human-computer relationship seems valuable.

The main problem between computers and humans is that they do not speak the same language. Unless they are familiar with code and terminal syntax, humans have had to rely on GUIs. But these user interfaces come with a natural barrier: people need to learn how to use them (Mielke and Smashing Magazine, 2016). “The power of the conversational interface is that it shields the end user from having to learn anything new (Eyal, 2015)”.

Graphical user interfaces have shortcomings. Despite efforts to standardise GUIs, applications and websites still use different menus, shortcuts, and processes. This makes it difficult for users to get jobs done. Furthermore, software is often sandboxed, meaning that users cannot control one application from within another (Partyline, no date). “When messaging becomes the UI, you don’t need to deal with a constant stream of new interfaces (Aube and TechCrunch, 2015)”.

Language is the most natural form of interaction for human beings. Speech develops naturally, and written language is a core skill usually learned at an early age and applied in messaging. Going beyond human-human communication, Zumbrunnen (2016) says: “Humans are designed to think in conversation. Therefore it is the most natural way to interact” (Figure 8).

Figure 8: Visual of messaging key numbers

Aube and TechCrunch (2015) claim messaging makes for a better UX than traditional apps because it feels natural and familiar. At the same time, virtual assistants or bots that live inside messaging apps can help users accomplish multiple tasks, e.g. arranging a trip, shopping online, ordering a ride or banking (Mielke and Smashing Magazine, 2016). Surveys and experience point to the fact that people are more comfortable communicating in writing for business than speaking on the phone. Processflows.com states that 32% of people would rather text than phone, and itproportal.com predicts that by 2020, 40% of people will mainly interact with software via chat or voice technologies. Mark Zuckerberg states: “We think you should be able to message a business in the same way you would message a friend” (unit4 and Staven, 2016).

In China, the messaging app WeChat already combines e-commerce and real-world services. “The conversational element of WeChat, for example, is becoming a primary kind of interaction point (Grant, 2016)”. WeChat may be used for almost everything, from paying bills, hailing a taxi and booking a doctor’s appointment to sharing photos and chatting. The app counts about 700 million users in China (Mozur and The New York Times, 2016). At the same time, other messaging apps such as WhatsApp continue to grow (Pew, August 2015).

According to BI Intelligence and Business Insider UK (2016), messaging apps surpassed social media apps in 2015 (Figure 9). They state that the first stage of the chat app revolution was focused on growth. In the next phase, companies will focus on building out services and monetising chat apps’ massive user base. Although media companies and marketers are still focusing on social networks like Facebook and Twitter, messaging services will change this by building out their services and providing more avenues for connecting brands, publishers, and advertisers with users (BI Intelligence and Business Insider UK, 2016).

 

Figure 9: Messaging vs. social networks (BI Intelligence and Business Insider UK, 2016)

For now, it remains unclear how conversational UIs can be made to work in a practical sense (Connolly and Intercom, 2016). Aube and TechCrunch (2015) point out that graphical UIs still outperform messaging for many services and tasks. “Conversational apps are currently good at only a particular set of tasks (Aube and TechCrunch, 2015)”. As it often goes with such technological innovations, trends and hypes, one crucial element of the equation tends to be neglected: the human being who is expected to use the conversational interface (Connolly and Intercom, 2016).

5.3. Bots, Assistants and AI

The previous chapter not only reflects on the role of conversational UIs in today’s technological landscape, it also introduces some of the terminology often used in the field. When it comes to the definition of terms in the field of conversational UI, there seems to be a lack of differentiation. A lot of articles fail to define or explain the meaning behind the terms and their use. This chapter tries to fill that gap, acknowledging that there are and might be overlaps, redundancies or different interpretations. The following terms will be examined:

  • Bots and chatbots (used interchangeably in this context)

  • Smart and intelligent assistants or agents (used interchangeably in this context)

  • Artificial intelligence, including machine learning and deep learning

5.3.1. Bots and chatbots

Relating back to conversational UIs in general, Mortensen and x.ai (2016b) deliver a simple definition of the term bot. They define bots as simple software programs that users interact with via a conversational UI. Bots communicate with the user via some sort of interface, e.g. audio or text. A bot’s capabilities depend on the way it is programmed. Schlicht (2016) differentiates two kinds of bots (Figure 10): one relies on strict, predefined rules, the other on machine learning or artificial intelligence.

Figure 10: Scripted vs. AI fueled bots
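
As an illustration of the first, rule-based kind, the minimal Python sketch below (a hypothetical example, not taken from Schlicht, 2016) hard-codes a handful of patterns and canned replies; anything outside those rules falls back to a default answer, which is precisely the limitation an AI-fueled bot tries to overcome.

```python
# Illustrative sketch of a strictly scripted (rule-based) bot: it only recognises
# the exact patterns it was programmed with; everything else gets a default reply.
RULES = {
    "hello": "Hi there! How can I help you?",
    "opening hours": "We are open Monday to Friday, 9 am to 5 pm.",
    "bye": "Goodbye!",
}

def scripted_bot(message: str) -> str:
    text = message.lower()
    for pattern, reply in RULES.items():
        if pattern in text:
            return reply
    return "Sorry, I did not understand that."

print(scripted_bot("Hello!"))                         # matches the "hello" rule
print(scripted_bot("What are your opening hours?"))   # matches the "opening hours" rule
print(scripted_bot("Can you book a table for two?"))  # no matching rule -> default reply
```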

5.3.2. Smart and Intelligent assistants or agents

Mortensen and x.ai (2016b) point out that bots and intelligent agents are not the same. While bots operate through a conversational UI, intelligent agents sometimes, but not always, do. As an example, he mentions Google’s self-driving car, an intelligent agent that is operated by pushing a start button in addition to speaking to it. Intelligent agents share the characteristic of being autonomous (Mortensen and x.ai, 2016b). Going back to Schlicht’s (2016) definition, it may be pointed out that “smart” or “intelligent” implies the capability to learn and teach oneself.

Whether the differentiation between bots and smart or intelligent assistants or agents is relevant to the end user is debatable and will not be explored any further within the scope of this study. The terms bots, chatbots, smart and intelligent assistants or agents, and related terms will be used interchangeably in this study.

5.3.3. Artificial Intelligence, Machine Learning and Deep Learning

As the Financial Times and Waters (2016) say, “Machine learning has brought artificial intelligence (AI) back into the technology mainstream”. Companies such as Google, Facebook, Microsoft, Amazon, and Apple have been paving the way by making advanced technologies like AI and machine learning, as well as digital assistants, openly available through their platforms. This empowers anyone to deliver new and intelligent applications (unit4 and Staven, 2016). Gartner (2016) lists advanced machine learning as a leading technological trend and describes it as what makes smart machines appear “intelligent” by enabling them to understand concepts in their environment and to learn. A smart machine may change its future behaviour through machine learning.

John McCarthy and Stanford University (2007) describe artificial intelligence as the science and engineering of making intelligent machines – especially intelligent computer programs. They see it as related to the similar task of using computers to understand human intelligence.

Intelligence in this context is the computational part of the ability to achieve goals in the world. Nvidia and Copeland (2016) define the relationship between artificial intelligence, machine learning and deep learning as follows (Figure 11):

 

Figure 11: AI vs. ML vs. DL (Nvidia and Copeland, 2016)

Artificial Intelligence – Human Intelligence Exhibited by Machines

As Nvidia and Copeland (2016) reflect, it was in the 1960s that AI pioneers dreamt of constructing complex machines that possessed the same characteristics as human intelligence. This is the concept known as “General AI”, and so far it has largely remained confined to films. In reality, we mainly encounter so-called “narrow AI”, which handles specific tasks such as image recognition and exhibits some aspects of human intelligence. This intelligence comes from machine learning (Nvidia and Copeland, 2016).

Machine Learning – An Approach to Achieve Artificial Intelligence

Nicholson, Gibson, and deeplearning4j (2016) describe machine learning as a subset of AI: all machine learning counts as AI, but not all AI counts as machine learning, as they conclude. Machine learning is the practice of using algorithms to parse data, learn from it, and then determine or predict something. Instead of manually coding software routines with a specific set of instructions to accomplish a particular task, machines are “trained” using large amounts of data and algorithms, which give them the ability to learn how to perform the task. This still relies on hand-scripted rules to identify certain patterns, and if the source data is corrupted, such machines tend to fail. This is where deep learning comes in (Nvidia and Copeland, 2016).
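
A minimal sketch of this idea in the context of a conversational UI might look as follows (an illustrative example assuming scikit-learn is available, not code from any cited source): instead of hand-coding a rule for every possible phrasing, a small classifier is trained on labelled example messages and then predicts the intent of messages it has never seen.

```python
# Illustrative sketch: "training" an intent classifier from labelled examples
# instead of hand-scripting a rule for every phrasing. Assumes scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A toy labelled data set: message -> intent
messages = [
    "book me a flight to london",
    "i need a plane ticket for tomorrow",
    "what is the weather like today",
    "will it rain this afternoon",
]
intents = ["travel", "travel", "weather", "weather"]

# Bag-of-words features plus a logistic regression classifier, learned from the data
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, intents)

print(model.predict(["will it rain today"]))            # likely ['weather']
print(model.predict(["book a plane ticket to paris"]))  # likely ['travel']
```

In practice, the quality of such a model stands or falls with the amount and quality of its training data, which is exactly the dependency described above.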

Deep learning – A Technique for Implementing Machine Learning

Deep learning is considered a subset of machine learning. "Deep" is a technical term in this sense, as Nicholson, Gibson, and deeplearning4j (2016) state, and it refers to the number of layers in a neural network. Artificial neural networks are an approach from early machine learning. They are inspired by our understanding of the biology of the brain and the interconnections between its neurones. "Artificial neural networks have discrete layers, connections, and directions of data propagation (Nvidia and Copeland, 2016)". Each layer fulfils a specific task and passes its output on to the next layer, until a final layer produces the final output. Each layer measures and weights the correctness of the previous layer’s output. The system depends on a lot of data and training (Nvidia and Copeland, 2016).
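
The layered structure can be illustrated with a tiny, untrained feed-forward network (an assumption-laden NumPy sketch, not code from the cited sources): each layer transforms the output of the previous one and passes it on, and "deep" simply refers to how many such layers are stacked. In a real system, the random weights below would be learned from large amounts of training data.

```python
# Illustrative sketch of the layered structure of a neural network. The weights
# are random here; a real "deep" network would learn them from training data.
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(inputs, n_outputs):
    """One fully connected layer: weighted sum of the inputs plus a non-linearity."""
    weights = rng.normal(size=(inputs.shape[-1], n_outputs))
    bias = np.zeros(n_outputs)
    return np.maximum(0, inputs @ weights + bias)  # ReLU activation

x = rng.normal(size=(1, 8))                   # e.g. 8 input features
hidden_1 = dense_layer(x, 16)                 # first hidden layer
hidden_2 = dense_layer(hidden_1, 16)          # second hidden layer ("deeper")
scores = hidden_2 @ rng.normal(size=(16, 3))  # final layer producing 3 output scores
probabilities = np.exp(scores) / np.exp(scores).sum()  # softmax over the outputs
print(probabilities)
```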

