During last year’s Amazon Prime Day, the various Echo devices were bestsellers, and the same was true for Christmas sales. Amazon now claims to have sold more than 100 million smart speaker devices¹. Google famously announced that its Assistant is now available on more than 1 billion voice-enabled devices². In this blog post, I’d like to take a closer look at the numbers and the use cases out there.
What’s Voice Commerce?
Before we dive deeper into the subject, let’s try to reach a common understanding of what voice commerce actually is. To my knowledge, there is no scientific definition of the term, so what do people usually mean by it? Most would say something like “commerce transactions triggered exclusively by using a voice service, either in the shape of an actual smart speaker device or a mobile app”. This definition, however, takes a rather narrow view, because it focuses entirely on the transaction itself. In the world of “classic” e-commerce, this would be like saying that it only counts as online commerce when customers run through the checkout and actually order something.
In reality, however, we have all heard about the customer journey: it takes customers a few steps to find their product. They might see an ad in their social stream, look for the product online, maybe take a closer look at it in a physical store, and then purchase it via an online marketplace.
Similarly, in the world of voice, customers pass through a few different touchpoints and moments before they finalize their orders. They might ask their voice service about a service or product (“What do I need for a children’s birthday party?”), listen to the result, do a search online, and then buy coloured cups and an adorable piñata in a local store. And let’s not forget after-sales: there are many skills out there which allow you to track your order status.
In other words, a broader definition of voice commerce, which I will also use for the rest of this article, is any scenario in which using a voice service is part of the shopping process.
Besides, if we really followed the narrow definition, the headline of this article would have to be “Amazon Voice Commerce”: apart from very few exceptions, Amazon is currently the only platform that lets users place orders by voice alone.
Hardware vs. App vs. Platform
Let’s make one last distinction: it’s important to understand that there is a difference between the voice platform itself, the app which uses it, and the hardware it is built into.
Platform: This is the cloud service which interprets voice commands, calculates the results in the background, and returns an output by using natural language generation.
App: The application is the intermediary between the hardware and the platform and lives on a device, be it a native app on a mobile phone, an app installed on a branded device (such as the Amazon Echo), or an app installed on 3rd party hardware (like Sonos speakers).
Hardware: This could either be a complex, multi-purpose piece of machinery such as a smartphone or a relatively dumb device, like a speaker with a few microphones and WiFi functionality.
For Amazon, it makes no difference whether your voice is picked up by an Echo device or by the Alexa app on your mobile phone. All devices are tied to the same persona, the same account. Of course, the more devices you own, the greater the chance that one of their microphones picks up your voice. Plus, by placing those devices in multiple locations (your living room, your kids’ room, your office, your car, etc.) you share something else that is important: context. Typically, people’s habits change depending on where they are, and Amazon can use those puzzle pieces to enhance and personalize the experience. After all, according to … 95% of all human-to-human interactions are non-verbal, and it is obviously very helpful for Amazon, Google and all the others to get that extra bit of information beyond the remaining 5%.
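To make the split between hardware, app, and platform a little more tangible, here is a minimal Python sketch. It is purely illustrative: the class names, the account string, and the canned replies are my own assumptions and have nothing to do with how Amazon or Google actually implement their services. It only shows the idea that several devices in different locations can talk to one platform under the same account, each contributing its location as context.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """A physical device with a microphone (hypothetical model)."""
    name: str
    location: str  # e.g. "kitchen"; the context this device contributes


class Platform:
    """Stand-in for the cloud service that interprets a command and replies."""

    def handle(self, account: str, utterance: str, location: str) -> str:
        text = utterance.lower()
        if "order status" in text:
            # The answer depends on the account, not on the device that heard it.
            return f"{account}: your order is on its way."
        return f"{account}: sorry, I did not understand that in the {location}."


@dataclass
class App:
    """Intermediary that lives on a device and forwards requests to the platform."""
    hardware: Hardware
    account: str
    platform: Platform

    def hear(self, utterance: str) -> str:
        return self.platform.handle(self.account, utterance, self.hardware.location)


platform = Platform()
speaker_app = App(Hardware("smart speaker", "kitchen"), "alice", platform)
phone_app = App(Hardware("phone", "car"), "alice", platform)

print(speaker_app.hear("What is my order status?"))
print(phone_app.hear("Play something nice"))
```

Both apps reach the same platform with the same account, so an order question gets the same answer on either device, while the location travels along as an extra piece of context.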
Players and Figures
Clearly, the voice market is a vendor market, meaning that companies like Amazon and Google are making considerable efforts to convince customers of the usefulness of their products. They do this through heavy advertising and by heavily subsidizing their hardware. Which brings us to the installed bases. The first question is: how many people across the globe are actually able to use a voice service? Let’s try to get closer to the answer and take a separate look at each of the vendors.
Amazon dominates the voice space. After a closed beta phase restricted to Prime members, the company brought its first Echo device to the US market in mid-2015. Allegedly, it has been building this platform since 2010 and has been investing considerable resources ever since. Amazon has confirmed that more than 10,000 of its employees now work on voice technology, which is quite a lot, even by Amazon’s standards. The retailer keeps churning out new ideas for devices, such as the Echo Show, which has a built-in screen so people can also see search results. This alleviates some of the pain that consumers experience in contexts where a picture is worth a thousand words. But this is just one example: Amazon is famously experimenting with putting Alexa technology into all sorts of devices, such as clocks and even microwaves. The company is also moving into other contexts and is now the voice technology of choice for the BMW Group, which is building Alexa into its vehicles.
In other words, Amazon has a pretty firm grip on the market; its share is now … across the globe. This dominance also becomes visible in its efforts to foster a 3rd party ecosystem. The skill directory now features over 50,000 externally developed skills. What’s more, Amazon constantly tries to make the process of developing voice skills easier, for example by launching the new Alexa-hosted skills. However, while there are skills for all areas of life, like news, entertainment, sports, and smart home, there are only about 200 skills in the shopping category. And out of those, there is only one, by 1-800-Flowers, which lets you actually order something. With all the others, such as those by Best Buy and Dell, customers can only check for special deals or request tracking information.
As far as shopping functionality is concerned, you can ask Alexa for products (there are some restrictions regarding the categories) and confirm the order by simply saying ‘yes’.
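The “ask for a product, then confirm with yes” flow can be pictured as a tiny two-step dialog. The following Python sketch is an assumption-laden toy, not Alexa’s actual API: the class name, the catalog, the prices, and the replies are all made up for illustration. The catalog lookup stands in for the category restrictions mentioned above.

```python
from typing import List, Optional


class VoiceOrderSession:
    """Toy dialog: propose a product, then wait for a spoken 'yes' to order it."""

    # Hypothetical catalog; anything outside it cannot be ordered by voice.
    CATALOG = {"paper cups": 4.99, "batteries": 7.49}

    def __init__(self) -> None:
        self.pending: Optional[str] = None  # product awaiting confirmation
        self.orders: List[str] = []         # products actually ordered

    def say(self, utterance: str) -> str:
        text = utterance.strip().lower()

        # Step 2: a product was proposed, so this utterance is the confirmation.
        if self.pending is not None:
            product, self.pending = self.pending, None
            if text == "yes":
                self.orders.append(product)
                return f"Okay, I ordered {product}."
            return "Okay, I did not order anything."

        # Step 1: the customer asks for a product.
        if text.startswith("order "):
            product = text[len("order "):]
            if product in self.CATALOG:
                self.pending = product
                price = self.CATALOG[product]
                return f"I found {product} for {price:.2f}. Should I order it?"
            return f"Sorry, I cannot order {product} by voice."

        return "Sorry, I did not understand that."


session = VoiceOrderSession()
print(session.say("Order paper cups"))
print(session.say("Yes"))
```

The point of the sketch is that the order is only created after the explicit confirmation; any other answer discards the proposal.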
In May 2016, Google presented its Google Assistant platform as an app for Android devices, which later became available for iOS devices as well. At the same conference, the company introduced the Google Home smart speaker, which has Google Assistant built in. This initial product was followed by a few others, like the Google Home Max, the Google Home Mini, and the Google Home Hub, which features a screen like the Amazon Echo Show. And as with Amazon, there are also 3rd party smart speakers equipped with Google Assistant technology, such as those from Bang & Olufsen and JBL.
Instead of “skills”, Google calls its applications “actions”, and there is also a directory for those, featuring more than 1 million of them. (I find this very confusing and get mixed up between the actions which Google builds itself and the 3rd party ones. Another source mentions about 2,000 actions. If anybody can shed some light on this, please do so in the comments.) Developers can find tutorials and online services to build actions and add them to the Google Assistant directory.
Apple, the maker of iPhones and iPads, introduced its Siri service back in 2011 with the launch of iOS 5. Now it’s an integral part of the Apple ecosystem and can be used on iPhones, iPads, Macs, and Apple Watches. Siri is widely regarded as lacking in quality and being too restricted; people use it mostly for device settings, setting alarms, and querying calendars. In 2018, Apple added the HomePod, a smart speaker running Siri, to its portfolio.
Officially, Apple supports 3rd party apps for Siri by providing SiriKit, but it seems that this platform hasn’t really taken off. One reason might be that, in contrast to Amazon and Google, Apple’s business model doesn’t involve keeping huge amounts of profile data on its servers; instead, the company has to rely on local data stored on users’ hardware to train its algorithms.
Last but not least, Microsoft is also part of the voice game. The company introduced its Cortana service in 2014 on Windows Phone and made it a component of Windows 10 in 2015. It can do things like setting timers and answering questions using the Bing search engine. Microsoft has not released its own hardware yet, but Cortana is built into the Harman Kardon Invoke smart speaker.
Of course, this was just a quick overview of the current status of voice. Much more could be said about the companies named here, and also about others such as Samsung with its Bixby service or Tencent with … When it comes to the usefulness of voice for commerce, we need to apply a broad definition of what commerce actually means. Only a fraction of apps allow for the actual ordering of products and thus offer a seamless voice-only shopping experience. The vast majority of skills and actions are about product discovery, inspiration, and getting advice.