What is Voice User Interface and why should you care?

Voice user interface (VUI) has been a regular feature of trending technology lists in the last few years. Perhaps that’s no surprise with widespread examples of devices using speech recognition, such as Apple’s Siri and Amazon’s Alexa. But VUI is more than just interacting with devices using your voice - it’s a sophisticated element of the user experience that presents both advantages for the user and challenges for the designer. Read on to find out what VUI is, what it offers, and the unique issues involved in using it.

What is a voice user interface?

A voice user interface uses speech recognition to allow users to give voice commands to a device. The first VUI tech – then called interactive voice response, or IVR – was developed during the 1980s and 1990s, and saw widespread use in the 2000s in the field of customer service. By recognizing common questions, these early systems would direct users to recorded responses – effectively a VUI version of frequently asked questions.

Voice User Interface examples

Right now, we’re in what’s often referred to as the ‘second era’ of VUI technology, combining speech recognition with natural language processing and artificial intelligence. Common uses and applications of current VUI tech are online shopping (e.g. Walmart’s one-sentence voice-ordering service), searching the web for information, music interfaces, and accessing real-time weather and traffic updates.

Popular examples of devices and apps in this second era include Siri, Google Now, Cortana, Amazon Echo and Google Home.

Why is VUI a steadily growing trend?

A simple answer to this question might be found in the popular use of smart speaker ‘personal assistant’ technologies, of which the best-known examples are Amazon’s Alexa and Apple’s Siri – a patient machine, ready at a moment’s notice to answer questions, search out information, make notes or just play your favorite music – all without you needing to leave your sofa or lift a finger. Human-centric convenience is always an effective driving force for the adoption of technology, and smart speaker VUIs offer convenience in spades.

More specifically, interacting with a device or app using a VUI has a range of advantages that are driving up their usage, including…

  • Speed – Think about the difference in speed between saying something out loud and writing or typing it. Apparently, speech is at least four times faster.
  • Ease – No more searching through settings, help functions, or online manuals to find the right feature: with the right VUI, you just need to ask. In general, a good VUI can be much more intuitive to use than even the most straightforward screen-based interface. What’s more, whereas a feature in a traditional UI has a single, defined name, a VUI can be set up to respond to multiple common variations or synonyms, so even someone who has never used a specific VUI is likely to be able to operate it. (This also avoids the user frustration that comes from manufacturers applying odd or unintuitive names just to differentiate their product from the competition.)
  • Safe multi-tasking – Vocal instructions or requests can be made without losing focus on other activities, such as driving or operating machinery, which means less risk. What’s more, a device or app might combine a VUI with a more traditional graphical user interface, effectively allowing simultaneous dual input by the user.
  • Accessible design – There are obvious business benefits to a design that can be used by as broad a cross-section of people as possible. Speech recognition and VUIs enable people with some disabilities (e.g. visual impairments or repetitive strain injuries) to access devices and apps without relying on a visual interface. This accessibility factor also benefits the whole user population in certain circumstances, for example when there is a need to operate hands-free.
  • A more ‘human’ interaction – The act of speaking and conversing with another voice gives the interaction a more ‘human’ feel for the user. This impression can be heightened by designers using more empathic language for the VUI, giving a sense of personality to the exchange. Furthermore, the quality of the interaction with a VUI can be enhanced by programming different tones of voice, making the message more information-rich and, again, increasing the sense of talking to another person.
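The point about a VUI responding to several variations or synonyms of the same request can be illustrated with a minimal sketch. It assumes the speech has already been transcribed to text, and the intent names and phrasings are invented for illustration; a production VUI would typically rely on a trained language-understanding model rather than a phrase table like this.

```python
import re
from typing import Optional

# Hypothetical intents and phrasings, invented for this example.
INTENT_SYNONYMS = {
    "volume_up": ["turn it up", "louder", "increase the volume", "volume up"],
    "volume_down": ["turn it down", "quieter", "decrease the volume", "volume down"],
    "play_music": ["play music", "play some music", "put on a song"],
}


def normalize(utterance: str) -> str:
    """Lowercase the text and drop punctuation so matching is forgiving."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    return " ".join(cleaned.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent with a known phrasing inside the utterance."""
    text = normalize(utterance)
    for intent, phrases in INTENT_SYNONYMS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None  # nothing recognized; a real VUI would ask the user to rephrase


if __name__ == "__main__":
    for spoken in ["Could you turn it up, please?", "Make it LOUDER!", "What's the time?"]:
        print(spoken, "->", match_intent(spoken))
```

Because several phrasings map to one intent, a first-time user does not need to know the "official" name of a feature, which is the ease-of-use advantage described above.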

When to Use Voice User Interfaces (VUIs)

Users typically turn to Voice User Interfaces (VUIs) in several common scenarios where voice commands offer convenience, safety, and accessibility. Here are some of the key contexts in which VUIs are particularly useful:

  • Hands-Free Operation: VUIs are ideal when users need their hands free for other tasks. This is especially useful in scenarios such as driving, cooking, or when engaged in any activity that requires manual attention, allowing users to interact with devices without having to physically touch them.
  • Accessibility: For individuals with disabilities, such as visual impairments or motor restrictions, VUIs provide an essential way to interact with technology. Voice commands can enable these users to access information, communicate, and control their environment more effectively than traditional input methods.
  • Multitasking: VUIs allow users to perform multiple tasks simultaneously without having to stop and type or navigate through a graphical interface. For example, a user can ask a smart assistant to set reminders or send messages while engaged in other activities like watching TV or working on a computer.
  • Efficiency and Speed: Speaking is often faster than typing, particularly for complex commands or when searching for information. VUIs can streamline interactions by quickly interpreting and executing voice commands, making them efficient tools for busy environments or when quick responses are needed.
  • Routine Interactions: Many users incorporate VUIs into their daily routines, using them to check the weather, play music, turn on smart home devices, or get news updates. This convenience is one of the main reasons for the growing popularity of smart speakers and voice-enabled devices.
  • User Experience Enhancement: VUIs often create a more natural and engaging user experience. Speaking is a fundamental form of human communication, and interacting with devices through voice can feel more intuitive and human-like compared to typing or clicking.

In these various contexts, VUIs help to enhance the functionality of technology, making it more accessible, convenient, and integrated into everyday life. Of course, in some digital products, especially those supporting accessibility, VUI can be one of the main features of the application.

Voice user interface design – factors and issues

As you might imagine, designing and developing a VUI involves unique factors and challenges, to the extent that it almost requires a different mindset. So how do you design a voice user interface?

From visual to verbal

Usually, UI design is highly focused on the visual elements of the interface. For a start, there is the basic fact that with a screen, the user can see and know what features and options are available with a minimum of interaction.

With a VUI, the user journeys are different, reliant on vocal commands and feedback instead of clever or esthetic visuals.

Interaction is conversation

Earlier, we mentioned IVR or interactive voice response, the first iteration of VUI technology. Being a simpler version, IVR tended to be a case of the user giving a voice command and the device carrying out the associated action. Modern VUIs, however, go further. Now, the interaction is more conversation-based and that brings a new level of design complexity – the technology interprets what it ‘hears’, even learning specific user preferences and modes of speech. This is what makes your smart speaker ‘smart’, and it brings in artificial intelligence and machine learning as potentially significant design elements.
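To make the contrast between a one-shot command and a conversation more concrete, here is a minimal sketch of a multi-turn exchange in which the system keeps track of what it still needs to know and asks follow-up questions. The "reminder" task, the slot names and the prompts are all hypothetical, and the sketch works on text rather than audio; it leaves out the learning of user preferences mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical pieces of information needed for a hypothetical reminder task.
REQUIRED_SLOTS = ("task", "time")
PROMPTS = {
    "task": "What should I remind you about?",
    "time": "When should I remind you?",
}


@dataclass
class ReminderDialog:
    """Keeps conversation state between turns."""

    slots: Dict[str, str] = field(default_factory=dict)

    def next_missing(self) -> Optional[str]:
        """Return the first slot that has not been filled yet, if any."""
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return name
        return None

    def respond(self, answer: str = "") -> str:
        """Store the answer in the pending slot, then ask for the next one or confirm."""
        pending = self.next_missing()
        if pending is not None and answer.strip():
            self.slots[pending] = answer.strip()
        pending = self.next_missing()
        if pending is not None:
            return PROMPTS[pending]
        return f"OK, I'll remind you to {self.slots['task']} at {self.slots['time']}."


if __name__ == "__main__":
    dialog = ReminderDialog()
    print(dialog.respond())            # opening turn: asks for the task
    print(dialog.respond("call mom"))  # task stored, asks for the time
    print(dialog.respond("6 pm"))      # both slots filled, confirms
```

The design point is that the state lives in the dialog object rather than in any single command, which is what separates this style of interaction from the command-and-action pattern of IVR.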

What did you say?!?

Speech recognition is sophisticated technology, especially when you consider the wide variety of dialects and accents that a VUI must understand and interpret correctly. If your VUI communicates in English, is it UK English or US English? US, you say? In that case, is it northern or southern? East coast or west coast? Let’s face it, different neighborhoods within the same city often have different ways of speaking. The bigger the intended market or target audience for a VUI, the bigger the challenge.

Speech is resource-intensive

This statement applies to VUIs in different ways. First, there is the increased computational and processing power needed for a device or app to run speech recognition and response functions. Then there is the complexity of programming and coding a high quality VUI. Finally, there is the question of development resources: the team creating the app or device must have the skills, knowledge and experience necessary.

Data privacy and security

Encryption and secure data storage are more essential than ever when we are talking about people’s conversations, often taking place in the privacy of their own homes. A factor that puts some people off using smart speaker personal assistants such as Alexa is that, to function, the device is constantly ‘listening’ (or more accurately, capable of detecting speech and key words and phrases), which for some leads to privacy concerns. VUI technology must come with a strong end-to-end encryption infrastructure, and users must be able to fully trust the manufacturer.
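The idea that a device detects key words rather than acting on everything it hears can be sketched as a simple gate. This is an illustration only: real smart speakers typically detect the wake word in the audio signal on the device itself, whereas this sketch works on already transcribed text, and the wake word "computer" is an arbitrary choice.

```python
from typing import Optional

WAKE_WORD = "computer"  # hypothetical wake word chosen for this example


def command_after_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text following the wake word.

    Returns None when the wake word is absent or nothing follows it, so
    speech that was not addressed to the device is never passed on.
    """
    words = [w.strip(",.!?") for w in transcript.lower().split()]
    if wake_word not in words:
        return None
    index = words.index(wake_word)
    command = " ".join(words[index + 1:])
    return command or None


if __name__ == "__main__":
    print(command_after_wake_word("Computer, play some jazz."))
    print(command_after_wake_word("We should buy a new laptop."))
```

In this sketch, anything said without the wake word is dropped instead of being stored or forwarded, which is the behavior that privacy-conscious users would expect from such a device.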

Voice user interface design

As we said earlier, VUIs are trending. Despite the challenges that come with this sophisticated technology, the flexibility of application and ease of use of voice user interfaces make them central to the future of UI and UX design. VUIs are a powerful tool that can greatly enhance and broaden the user experience, making apps and devices more appealing to a wider audience. One day, the keyboard and mouse combo (and maybe even the touchscreen) will be a museum exhibit – maybe sooner than we think…

FAQ

Q: How do VUIs handle multiple languages and accents?

A: Voice User Interfaces (VUIs) manage multiple languages and accents by incorporating advanced speech recognition technologies that are trained on vast datasets containing diverse linguistic patterns. These systems use machine learning algorithms to improve their ability to understand and process a wide range of pronunciations and colloquialisms specific to different languages and regions. Developers often work with linguists and speech experts to enhance the VUI’s ability to recognize and respond accurately to a multitude of languages and dialects.

Q: What are the specific ethical concerns associated with VUIs?

A: The ethical concerns related to VUIs primarily revolve around privacy, data security, and user consent. Since VUIs can constantly listen to users’ surroundings to detect commands, they may inadvertently collect personal and sensitive information. This raises issues about how this data is stored, processed, and used. Strong encryption for data transmission, secure storage, and transparent user policies are essential measures to address these concerns. Furthermore, there must be strict regulations on how data is shared with third parties and how users can control their own information.

Q: How are VUIs integrated into existing technology ecosystems?

A: VUIs are integrated into existing technology ecosystems through APIs and SDKs that allow developers to embed voice interaction capabilities into various applications and devices. This integration is often part of a broader system that includes cloud computing resources, AI algorithms, and sometimes specific hardware components designed to optimize voice processing. In professional settings, VUIs are integrated to work with enterprise software systems, enhancing user interaction by enabling voice commands for various tasks. In smart homes, VUIs connect with other smart devices and systems through home automation platforms, allowing users to control lighting, temperature, security, and more through voice commands. This integration aims to create a seamless and intuitive user experience across different devices and platforms.
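As a rough illustration of the smart home case, the sketch below routes recognized commands to device actions. The `SmartHome` class is a stand-in invented for this example and does not correspond to any real SDK or home automation platform; the command phrases and the starting thermostat value are likewise assumptions.

```python
from typing import Callable, Dict


class SmartHome:
    """Stand-in for a home automation platform; not a real SDK."""

    def __init__(self) -> None:
        self.state = {"lights": "off", "thermostat": 20}

    def lights_on(self) -> str:
        self.state["lights"] = "on"
        return "Lights are on."

    def lights_off(self) -> str:
        self.state["lights"] = "off"
        return "Lights are off."

    def warmer(self) -> str:
        self.state["thermostat"] += 1
        return f"Thermostat set to {self.state['thermostat']} degrees."


def build_dispatcher(home: SmartHome) -> Dict[str, Callable[[], str]]:
    """Map recognized command phrases (hypothetical) to device actions."""
    return {
        "turn on the lights": home.lights_on,
        "turn off the lights": home.lights_off,
        "make it warmer": home.warmer,
    }


def handle(command: str, dispatcher: Dict[str, Callable[[], str]]) -> str:
    """Run the action for a recognized command, or give a spoken fallback."""
    action = dispatcher.get(command.lower().strip())
    if action is None:
        return "Sorry, I can't do that yet."
    return action()


if __name__ == "__main__":
    home = SmartHome()
    dispatcher = build_dispatcher(home)
    print(handle("Turn on the lights", dispatcher))
    print(handle("Make it warmer", dispatcher))
    print(handle("Open the garage", dispatcher))
```

Keeping the voice layer (`handle`) separate from the device layer (`SmartHome`) mirrors the integration pattern described above, where the VUI talks to other systems through a defined interface rather than controlling hardware directly.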