Voice/Automation/Customer service • 5 min read

How to Use a Voice Chatbot & Realtime API for Customer Service and Beyond

What are call center bots and IVR chatbots, and how do leading companies use voice chatbots to transform their customer service? Read all the details in our blog post.

Marie Avandegraund
Feb. 10, 2023. Updated Oct. 29, 2024

At first glance, voice chatbot technology can seem hard and complicated.

However, voice chatbots have already become a solid solution for many call centers. Furthermore, with OpenAI's release of the Realtime API, voice bots have leaped to a new level of efficiency and ease of creation. We've had the chance to put this powerful tool to the test, and we are excited to share all the details about its capabilities and the advantages it brings.

Call Center Performance

The average call center receives 4,400 calls per month. This number includes all picked up, missed, blocked, and dropped calls. 

In numbers: 200 calls per day, 1,000 per week, and 4,400 per month, of which 48 per month are missed.

The volume of calls varies based on factors like business size, geographic location, peak hours, and seasonal shifts such as holidays or special events. To stay efficient, call centers typically implement specific strategies to manage incoming calls, such as interactive voice response (IVR) systems or voice chatbots.

What Is A Voice Chatbot?

A voice chatbot, also known as a voice assistant, voice bot, voice interactive agent, or voice-enabled chatbot, is AI-powered software that interprets inbound calls or requests and answers them using voice. Chatbots with voice recognition are most commonly used in customer service, marketing, and sales campaigns.

Voice bots are traditionally referred to as call center bots, used to handle incoming calls within contact centers. There are two popular types of voice chatbot. For simple, repetitive inquiries like order status or appointments, companies can use IVR bots.

IVR chatbot (Interactive Voice Response) is an automated telephony system interacting with callers through keypad inputs. The system includes a menu of options for callers to navigate, allowing them to access self-service options and transfer calls to a live operator if needed.
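To make the keypad-driven flow concrete, here is a minimal sketch of how an IVR menu might map key presses to actions. The menu entries, action names, and the `route_keypress` helper are all hypothetical and only illustrate the idea; a real deployment would sit behind a telephony platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuOption:
    prompt: str  # what the system reads out for this option
    action: str  # hypothetical action name the telephony layer would run


# Hypothetical menu: keys, prompts, and actions are illustrative only.
MAIN_MENU = {
    "1": MenuOption("For order status, press 1.", "order_status"),
    "2": MenuOption("To book an appointment, press 2.", "book_appointment"),
    "0": MenuOption("To speak to an operator, press 0.", "transfer_to_operator"),
}


def menu_prompt(menu):
    """Join the option prompts into the greeting read to the caller."""
    return " ".join(option.prompt for option in menu.values())


def route_keypress(menu, digit, fallback="repeat_menu"):
    """Map a keypad digit to an action; unknown keys replay the menu."""
    option = menu.get(digit)
    return option.action if option else fallback


print(menu_prompt(MAIN_MENU))
print(route_keypress(MAIN_MENU, "0"))
```

Note the explicit fallback: a caller who presses an unlisted key hears the menu again rather than being dropped, and option 0 always leads to a live operator.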

If a company needs to automate FAQs or provide detailed responses that draw on customers' details, or if its customers don't like pressing keys to resolve their issues, it can implement conversational IVR.

Conversational IVR is an AI-driven software program that is used to simulate conversations with customers to answer questions and provide other services. It utilizes speech recognition and NLP capabilities to understand the customer's query and provide appropriate responses.

Discover the Power of the Realtime API

OpenAI has recently unveiled a beta of the Realtime API, which enables seamless, natural speech-to-speech interactions, featuring 6 preset voices that bring your applications to life.

In the past, crafting a voice assistant experience required developers to juggle multiple steps: transcribing audio using an automatic speech recognition model, feeding the resulting text into a reasoning model, and finally converting the output back into speech with a text-to-speech model. This multi-step process often led to a loss of emotion, nuance, and accents, not to mention frustrating delays.

The Realtime API simplifies this journey into a single API call. Furthermore, it allows audio inputs and outputs to stream seamlessly. This means conversations can flow naturally, so much so that customers may not even recognize they're chatting with an AI, with the added bonus of handling interruptions automatically. With these advancements, developers can now create voice assistants that feel more human and responsive than ever before.

Here are the standout features and enhancements:

Enhanced Natural Language Processing. Experience conversations that flow effortlessly, as the AI better understands context and nuance.

Improved Voice Recognition Accuracy. The AI now captures words more accurately, ensuring clearer communication.

Faster Response Times. With its near-instant response capabilities, the API dramatically slashes wait times, paving the way for seamless, uninterrupted conversations.

Support for Multiple Languages and Accents. Six uniquely crafted voices are available now, each designed to resonate with human warmth and nuance.

Effortless Integrations. The Realtime API effortlessly weaves into various platforms, empowering businesses to seamlessly incorporate AI voice interactions across a multitude of applications.

The Realtime API establishes a seamless WebSocket connection, allowing an ongoing dialogue with GPT-4o. This powerful API embraces function calling, enabling voice assistants to not only engage in conversation but also spring into action based on user requests. 

For instance, a customer can simply ask for the status of their order using their voice. The AI voice agent seamlessly accesses this information via the API and provides an immediate voice response, keeping the client informed in real-time.
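The order-status flow above can be sketched as follows. This is a sketch under stated assumptions, not a definitive integration: `get_order_status`, the `ORDERS` store, and the sample order IDs are hypothetical, and the event shapes (`session.update`, `response.output_item.done`, `conversation.item.create`, `response.create`) follow our reading of the beta documentation and should be checked against the current reference. The code only builds and handles the JSON events; it does not open a WebSocket connection.

```python
import json

# Hypothetical in-memory store standing in for a real order system.
ORDERS = {"A1001": "shipped", "A1002": "processing"}


def get_order_status(order_id):
    """Hypothetical business function the voice agent is allowed to call."""
    return {"order_id": order_id, "status": ORDERS.get(order_id, "not_found")}


# Tool description the client would register with the session (assumed shape).
ORDER_STATUS_TOOL = {
    "type": "function",
    "name": "get_order_status",
    "description": "Look up the current status of a customer's order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

TOOLS = {"get_order_status": get_order_status}


def session_update_event():
    """Client event that registers the tool for the session."""
    return {
        "type": "session.update",
        "session": {"tools": [ORDER_STATUS_TOOL], "tool_choice": "auto"},
    }


def handle_server_event(event):
    """Return the client events to send back for one server event."""
    item = event.get("item", {})
    if event.get("type") != "response.output_item.done":
        return []
    if item.get("type") != "function_call" or item.get("name") not in TOOLS:
        return []  # not a call to a tool we know; nothing to send
    arguments = json.loads(item["arguments"])  # arguments arrive as a JSON string
    result = TOOLS[item["name"]](**arguments)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},  # ask the model to speak the answer
    ]
```

Keeping the dispatcher separate from the network layer means the lookup logic can be tested without an API key, and new tools are added by extending `TOOLS`.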

Benefits of Using a Voice Enabled Chatbot

Considering the current landscape, the most compelling use case for the Realtime API is customer support. In the past, the reluctance to implement voice-enabled chatbots stemmed from their subpar performance. However, with OpenAI's Realtime API, the user experience has significantly improved, enabling chatbots to handle complex customer inquiries with ease.

This technology is also beneficial for small businesses, like local repair services, where teams consist of just 1 to 5 people. Often, these professionals are on-site, unable to respond to customer inquiries. As a result, they risk losing potential leads when clients need urgent assistance but can’t afford to wait for a delayed response.

However, the Realtime API's transformative power extends far beyond customer support. Imagine the ripple effects in industries like healthcare, eCommerce, and banking. The possibilities are truly limitless.

Healthify, the innovative nutrition and fitness coaching app, leverages the Realtime API to bring its AI coach, Ria, to life in engaging, natural conversations. When users need a deeper level of guidance, Ria seamlessly connects them with human dietitians for tailored support, ensuring a personalized journey toward wellness.

Speak, the dynamic language learning app, harnesses the power of the Realtime API to fuel its immersive role-play feature. This interactive element invites users to dive into realistic conversations, making practicing a new language not just effective but also fun and engaging.

It isn't worth using the Realtime API to read out product details: long spoken answers would directly inflate costs. Instead, think of an AI voice agent with the Realtime API integration that effortlessly handles package tracking or bank balance inquiries. In short, the Realtime API is a perfect fit for quick interactions.

Is it safe to implement a Realtime API?

The Realtime API is fortified with robust safety measures designed to minimize the risk of abuse. This includes a vigilant system of automated monitoring complemented by human oversight for flagged model inputs and outputs. 

As with all of OpenAI's API offerings, the Realtime API is covered by OpenAI's enterprise privacy commitments, meaning the company does not train its models on any inputs or outputs from this service without your consent.

As we look ahead, OpenAI is anticipated to reduce the costs associated with the Realtime API, possibly introducing new, more affordable models. However, this evolution may take several months, so staying ahead of the curve is crucial.

Benefits of Powering a Chatbot with Voice Recognition

High Accessibility

Globally, over 2.2 billion people (or more than 20% of the population) live with near or distance vision impairment. While only a small fraction face complete blindness, many still struggle to see clearly or read with ease. There is also dyslexia, a condition that affects the accuracy and speed of word recognition, particularly when reading. The Yale Center for Dyslexia & Creativity reports that it touches the lives of 20% of the population.

This makes it essential to consider accessibility when designing any tool, including chatbots. A voice chatbot offers a powerful solution for individuals with visual impairments, serving as an inclusive alternative to traditional text-based chatbots.

Lower Costs

Voice chatbots can help businesses save on customer service costs by speeding up response times, freeing up agents for more challenging work, and answering up to 80% of routine questions. They let you save money on customer service representatives while offering a great customer experience 24/7.

Lower Abandonment Rate

As reported by Voxco, the typical call abandonment rate in the industry falls between 5–8%. This metric tracks the percentage of callers who hang up before reaching an agent. A high abandonment rate often signals the need for more staff, as customer frustration tends to rise when they are left waiting.
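As a quick illustration of the metric, the abandonment rate is simply abandoned calls divided by total inbound calls. The sketch below uses the 4,400 monthly calls mentioned earlier; the figure of 264 abandoned calls is a made-up example, not data from any report.

```python
def abandonment_rate(abandoned_calls, total_calls):
    """Percentage of callers who hung up before reaching an agent."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return 100.0 * abandoned_calls / total_calls


# Hypothetical month: 264 of 4,400 callers hang up while waiting.
rate = abandonment_rate(264, 4400)
print(f"Abandonment rate: {rate:.1f}%")
```

A result in the 5-8% band would be typical by the figures quoted above; anything well beyond it is a signal to add capacity or automate routine calls.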

Enter voice chatbots — these digital assistants can seamlessly manage incoming calls, handling routine inquiries and escalating only complex issues to human agents. This not only reduces the burden on teams but also ensures customers get quicker responses, improving overall satisfaction.

Fewer Missed Calls

We've all felt that wave of frustration when we are met with phrases like, “Please wait, the page is loading,” “Your call is very important to us, please hold,” or “Leave a message, and we’ll get back to you as soon as possible.” According to a Salesforce report, 69% of consumers prefer to use chatbots because they provide instant responses.

In traditional contact centers, the gold standard is to answer 80% of calls within 20 seconds, while the average speed of answer (ASA) lingers at 34.4 seconds. With voice chatbots, 100% of calls are answered with zero waiting time. Additionally, voice chatbots can increase the resolution rate and cut call duration from an average of 5 minutes to less than 3 minutes.
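For readers who want to track these two metrics themselves, here is a minimal sketch of how service level and ASA can be computed from per-call wait times. The sample wait times are invented for illustration, and the function names are our own.

```python
from statistics import mean


def service_level(wait_times_sec, threshold_sec=20.0):
    """Percentage of calls answered within the threshold (80/20 rule by default)."""
    if not wait_times_sec:
        raise ValueError("need at least one call")
    within = sum(1 for wait in wait_times_sec if wait <= threshold_sec)
    return 100.0 * within / len(wait_times_sec)


def average_speed_of_answer(wait_times_sec):
    """Mean time, in seconds, a caller waits before the call is answered."""
    return mean(wait_times_sec)


# Hypothetical wait times (seconds) for ten answered calls.
waits = [5, 12, 18, 20, 25, 40, 8, 15, 19, 10]
print(f"Service level: {service_level(waits):.0f}% within 20 s")
print(f"ASA: {average_speed_of_answer(waits):.1f} s")
```

A voice bot that picks up immediately would contribute a wait time of 0 to this list, which is how it pulls both numbers toward the ideal.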

Multilingual Support

AI voicebots excel at navigating multiple languages, thanks to advanced voice recognition and translation capabilities. They can interpret and respond to various dialects in near-real time, bridging language gaps in customer interactions.

As businesses expand globally, the demand for multilingual support has surged. However, relying on human employees alone is challenging. For instance, in the U.S., 90% of employers value multilingual skills, yet nearly a third face a talent shortage, with 25% losing opportunities due to this limitation. 

Mastering new languages takes time and effort for employees, but AI sidesteps this challenge with ease. A single machine learning model can fluently handle dozens of languages. Take ChatGPT, for instance, which operates in over 50 languages.

In-Depth Analytics

AI voice chatbots can transform help desk conversations into valuable insights on customer sentiment and employee performance. By analyzing tone and word choice, these tools gauge satisfaction levels without needing formal surveys.

Lastly, voice chatbot solutions cut staffing expenses while guaranteeing seamless, round-the-clock productivity.

Book A Free Consultation and get a live demo of the voice chatbot! We'll dive into your specific use case, guide you toward the best-fit technology and solutions, and provide a tailored demo of similar implementations that match your needs.

When to Use Voice Chatbots

Let's tackle when voice chatbots are especially recommended:

  • Your main channel for customer support is a call center;
  • Your customers prefer calling you over texting;
  • Your customers have visual impairments that make text chatbots harder for them to use;
  • The main point of live chat is to collect customers' contact details so that communication can continue over the phone.

Real-World Voice Chatbot Examples & Their Outcomes

Domino's Pizza: Voice Ordering Assistant. Domino's implemented "Dom," a voice bot for Domino's ordering apps available for both iPhone and Android that allows customers to place pizza orders through voice commands. The system handles menu queries, order placements, and even payments. Users can seamlessly order, customize, and track their pizzas, creating a convenient and hands-free experience.

Nowadays, 70% of Domino's orders are placed online, with AI and natural language processing (NLP) revolutionizing the ordering experience. The company has also reported a remarkable 160% surge in voice-activated orders, showcasing the powerful impact of AI-driven technology on enhancing customer interactions and simplifying the entire process. 

U.S. Bank: Mobile App Voice Assistant. U.S. Bank has integrated a voice assistant into its Android and iOS apps, allowing users to handle their banking needs through conversational language, much like speaking with a live bank teller. 

This feature can process various requests, but if a task exceeds its capabilities, the AI connects users with human bankers via text or a phone call. U.S. Bank has been exploring virtual assistant technology for nearly a decade, refining the service to improve customer experience and streamline digital interactions.

What technologies are used for voice chatbots? Can we use ChatGPT for a voice chatbot?

In general, companies use:

  1. Automatic Speech Recognition (ASR) to capture and recognize the caller's speech;
  2. Speech-to-Text (STT) technology to convert voice phrases into text phrases. Speech recognition accuracy rates are 90% to 95%;
  3. AI algorithms, such as Natural Language Understanding (NLU), to understand the meaning of the phrases;
  4. Finally, Text-to-Speech (TTS) technology to convert text phrases (answers) into voice phrases.
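The steps above can be sketched as a simple chain of stages. This is only an outline of the data flow, assuming each stage is a pluggable function: the `fake_transcribe`, `keyword_nlu`, `canned_reply`, and `fake_synthesize` stand-ins are hypothetical placeholders for real ASR/STT, NLU, and TTS services, and they treat "audio" as plain UTF-8 bytes purely so the example runs.

```python
def run_voice_turn(audio_in, transcribe, understand, respond, synthesize):
    """Run one caller turn through the speech -> text -> meaning -> speech chain."""
    text = transcribe(audio_in)   # steps 1-2: ASR / speech-to-text
    intent = understand(text)     # step 3: NLU
    reply = respond(intent)       # pick or generate the answer text
    return synthesize(reply)      # step 4: text-to-speech


# Stand-in stages (hypothetical): real systems would call speech services here.
def fake_transcribe(audio):
    return audio.decode("utf-8")


def keyword_nlu(text):
    lowered = text.lower()
    if "order" in lowered:
        return "order_status"
    if "appointment" in lowered:
        return "book_appointment"
    return "unknown"


REPLIES = {
    "order_status": "Let me check your order status.",
    "book_appointment": "Sure, let's book an appointment.",
    "unknown": "Sorry, could you rephrase that?",
}


def canned_reply(intent):
    return REPLIES.get(intent, REPLIES["unknown"])


def fake_synthesize(text):
    return text.encode("utf-8")


audio_out = run_voice_turn(
    b"Where is my order?", fake_transcribe, keyword_nlu, canned_reply, fake_synthesize
)
print(audio_out.decode("utf-8"))
```

Because each stage is passed in as a function, the NLU step can be swapped for a generative model without touching the rest of the chain.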

Businesses can also use generative AI models, such as GPT-4o (GPT-4 Omni) or ChatGPT, in place of NLU systems to speed up text comprehension and response generation; the response is then converted into voice output.

It's also worth checking out what ChatGPT and GPT chatbots have to offer. Firstly, GPT supports voice, which makes voice chatbots possible. Secondly, with GPT, it is much faster and cheaper to train a generative AI voice assistant that talks like ChatGPT. There are many ways to prompt a voice-based bot to reduce hallucinations, mistakes, and biases, and to make it behave like a company representative.

Check out this clever voice assistant demo hacked together with GPT and Siri:

How to make the voice sound less robotic?

Nowadays, several AI voice tools can generate top-quality spoken audio for a contact center in any voice, style, or emotion. Much as audio streaming services offer a vast library of music, these tools provide a diverse library of ready-made voices. For example, a voice generator like LOVO.ai can generate 500+ high-quality voices in 100 different languages. And if you want a unique sound and are willing to go the extra mile for your customers, you can hire a voice actor, as we did with Honda Australia and Leo Burnett.

IVR or Conversational IVR?

With the rapid advancement of voice technologies, especially innovations like GPT-4 and ChatGPT, the days of struggling with misunderstood voice inquiries are fading fast. These sophisticated systems ensure more accurate interpretations, offering a seamless experience. It's no wonder that businesses initially adopting traditional IVR are increasingly making the switch to Conversational IVR, drawn by its enhanced capabilities and user-friendly interaction.

However, regardless of which option you choose, BotsCrew can help you with both and expand the voice automation when needed. 



Voice chatbot implementation: BotsCrew edition

At BotsCrew, we explored a range of innovative use cases for voice automation, applying it to apps, social projects, and advertising campaigns. Each project allowed us to push the boundaries of voice technology, enhancing user experiences and driving engagement in unique and impactful ways.

Honda Harvey 

IVR chatbot for an anonymized genetic-testing company

Numbers speak better than a video presentation. Before automation:

  • 25,000 - 30,000 calls per month;
  • No 24/7 support, only Mon-Fri, 9am - 5pm;
  • Customers waiting for days to receive their replies;
  • The client grows 25-30% annually, and so does the number of CS requests;
  • ~35% of call volume was not handled annually.

After phone call automation:

  • 12,000 automated calls per month;
  • $5,544.00 savings per month;
  • 1,400 hours saved per month;
  • 0 min waiting time compared to an average of 4 min;
  • 3 min resolution time compared to an average of 7 min;
  • 0 missed calls compared to 9,000 per month.
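A little arithmetic on the figures above shows how they relate to each other. The inputs come straight from the two lists; the derived per-call and per-hour values are our own back-of-the-envelope calculations, not numbers reported by the client.

```python
# Figures taken from the lists above.
automated_calls = 12_000          # automated calls per month
hours_saved = 1_400               # hours saved per month
savings_usd = 5_544.00            # savings per month
monthly_calls_low, monthly_calls_high = 25_000, 30_000

# Derived values (our own arithmetic).
minutes_saved_per_call = hours_saved * 60 / automated_calls
savings_per_call = savings_usd / automated_calls
savings_per_hour = savings_usd / hours_saved
share_low = automated_calls / monthly_calls_high   # share of calls automated, busiest month
share_high = automated_calls / monthly_calls_low   # share of calls automated, quietest month

print(f"Minutes saved per automated call: {minutes_saved_per_call:.1f}")
print(f"Savings per automated call: ${savings_per_call:.3f}")
print(f"Savings per hour saved: ${savings_per_hour:.2f}")
print(f"Share of calls automated: {share_low:.0%} to {share_high:.0%}")
```

The 7 minutes saved per automated call lines up with the previous 7-minute average resolution time, which suggests the hours saved were estimated as automated calls multiplied by the old handling time, although the case study does not state this explicitly.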

Ready to Create Your Own Voice-Powered Chatbot?

Not sure about how to build your own voice-powered chatbot? We feel your pain. With smart, voice-enabled AI becoming a must-have for businesses, the fear of falling behind is real.

That is why we've cracked the code to help you innovate quickly — crafting a clear vision, discovering the true value, and bringing your voice chatbot to life faster than you'd expect. We offer generative AI & GPT solutions, in particular, to build top-notch voice-based chatbots. Contact us, and we will reply within 1 business day!