Google Actions vs. Alexa Skills: A Comparative Study
In the last few years, AI has become more accessible because of the arms race among top tech companies.
This turned AI from a research tool used by only a few into easy-to-use platforms used by millions. Customer-facing AI applications like Siri, Alexa and Google Assistant ensured that everyone with a smart device has access to this technology.
The interface of AI is crucial. By definition, AI should mimic the human interaction model.
This explains the proliferation of bots with natural language abilities.
Bots interact like humans do, so there is no learning curve to use them. However, most bots fail to outperform the websites and apps that reside on our screens. Voice-enabled bots, in contrast, reside in a medium with no competition from traditional websites or apps.
One of the most efficient ways humans interact is through speech. However, bots were never good enough to understand the nuances of human speech until deep learning platforms came into play. This makes voice the perfect medium for AI.
Building Voice Capability for Archie.AI:
Archie is a bot that understands and responds to your data-related questions in plain English. It connects to existing data sources such as Google Analytics and makes data conversational.
We used Alexa and Google Assistant to give Archie the ability to speak. The two are similar both in functionality and in development framework. In this article, I summarize my experience of building voice bots on these two platforms.
Let us start with the way a user discovers voice bots.
Photo: Alexa Skill discovery on mobile. Source: http://www.allthingsdistributed.com
Alexa users need to find and activate your bot using the Alexa App. Without this step, Alexa won’t recognize your bot even if it is published in the Alexa Skills Store. Google Assistant, on the other hand, does not require any activation. Once your bot is published, any Google Assistant user can start using it right away if they know the “invocation” that activates it.
An invocation is a specific phrase that activates your bot.
Alexa uses predefined invocations for its bots, for example “Start Coffee Express” or “Run Coffee Express”. Here, “Coffee Express” is the name of your bot and Start/Run is the predefined phrase that you have to use.
Google Assistant, on the other hand, lets developers customize the invocations.
You provide sample invocations, and Google Assistant trains itself to handle the different ways a user might invoke your bot.
The only requirement is that the invocation must contain the name of your bot. Example: “Could you get me Coffee Express” or “May I talk to Coffee Express”.
Intents classify the various functions of your bot that a user can request. For example: “Get me an Espresso” vs “Get me a Chai tea”.
An intent captures the type of functionality requested by the user and any additional data required to fulfill it. For example, the statement “Get me an Espresso” may not be enough for your bot to complete the request, as it may need further information such as the size or sugar level.
The idea is that once an intent is identified, the bot can ask for missing variables from the user to complete the functionality. This is, in essence, a conversation funnel.
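The funnel described above can be sketched in a few lines of Python. This is only an illustration, not how either platform implements it: the intent name `OrderDrink`, the variable names, and the prompts are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical intent definition: the variables needed to order a drink.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "OrderDrink": ["drink", "size", "sugar_level"],
}

# Hypothetical follow-up questions, one per variable.
PROMPTS: Dict[str, str] = {
    "drink": "What would you like to drink?",
    "size": "What size would you like?",
    "sugar_level": "How much sugar would you like?",
}


def next_prompt(intent: str, filled: Dict[str, str]) -> Optional[str]:
    """Return the question for the first missing variable, or None when complete."""
    for slot in REQUIRED_SLOTS[intent]:
        if not filled.get(slot):
            return PROMPTS[slot]
    return None


# "Get me an Espresso" identifies the intent but fills only one variable.
collected = {"drink": "espresso"}
print(next_prompt("OrderDrink", collected))  # asks for the size next

collected.update({"size": "large", "sugar_level": "none"})
print(next_prompt("OrderDrink", collected))  # None: nothing left to ask
```

Each turn of the conversation narrows the funnel by one variable until the bot has everything it needs to fulfill the request.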
Photo: JSON format for Google Actions.
The conversation frameworks for both platforms are given in the form of JSON.
Photo: Alexa Skill JSON format
Alexa only offers the option of typing the JSON into a form on the Alexa Skill’s developer page. Google lets the developer upload the JSON using the “gactions” tool.
To drive the conversation in a funnel format, we require data from the user. Both Google and Alexa use similar objects to capture this information: Google uses “Entities” while Alexa uses “Slots”. Both are raw objects that the developer customizes by providing examples for each.
Alexa provides a large variety of built-in Slots, for example movies, TV shows, and names of people. These are more reliable than Custom Slots defined by developers, since the built-in Slots draw on Alexa’s existing knowledge base instead of depending only on the examples you provide.
Switching between Intents:
Both platforms allow switching between intents. Google uses “Contexts” to enable this while Alexa uses “Sessions”.
Context is defined as a part of the conversation framework whereas Session is maintained on the server and you can modify it as necessary.
This gives Alexa a slight edge over Google Assistant as the developer has more control. With Alexa, your bot does not need to be modified if you need to add more context or session data.
A webhook is the endpoint on your server that responds with the requested functionality.
Google Assistant lets developers define multiple webhooks, while Alexa sends all requests to a single webhook.
Google Assistant is therefore much simpler to integrate with existing applications, as the endpoints are already present, whereas Alexa requires you to develop additional routing or functionality at the webhook.
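As a rough sketch of the routing a single Alexa webhook needs, the dispatcher below reads the request type and intent name from the request body and forwards to a handler. The `request.type` and `request.intent.name` fields follow Alexa's request JSON; the intent names and handler functions are hypothetical stand-ins for existing application endpoints.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def order_drink(request: dict) -> str:
    # Hypothetical handler; in practice this would call existing application code.
    return "Your drink is on its way."


def cancel_order(request: dict) -> str:
    # Hypothetical handler.
    return "Your order has been cancelled."


# Map intent names (hypothetical) to the functions that fulfill them.
ROUTES: Dict[str, Handler] = {
    "OrderDrink": order_drink,
    "CancelOrder": cancel_order,
}


def handle_alexa_request(body: dict) -> str:
    """Pick a handler for an Alexa request body and return the text to speak."""
    request = body.get("request", {})
    if request.get("type") == "LaunchRequest":
        return "Welcome to Coffee Express."
    if request.get("type") == "IntentRequest":
        handler = ROUTES.get(request.get("intent", {}).get("name"))
        if handler is not None:
            return handler(request)
    return "Sorry, I did not understand that."


sample = {"request": {"type": "IntentRequest", "intent": {"name": "OrderDrink"}}}
print(handle_alexa_request(sample))
```

A lookup table keeps the routing in one place, so adding an intent means adding one entry instead of another branch.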
Both Amazon and Google are promoting their own cloud services for hosting the webhooks. Google provides detailed documentation for setting it up on Google Cloud, and Amazon does the same for AWS Lambda.
However, while the Google SDK works on external web servers, Alexa’s SDK only works with AWS Lambda. Therefore, if you want to use your own web server for Alexa, you will have to create the response objects yourself instead of using the SDK.
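Building the response object by hand is not much work. The sketch below assembles a minimal Alexa response as a plain dictionary, using the `version`, `sessionAttributes`, `outputSpeech`, and `shouldEndSession` fields of Alexa's response JSON. The helper name `build_alexa_response` and the sample session data are made up for illustration, and a production webhook would also need to verify incoming requests.

```python
import json
from typing import Dict, Optional


def build_alexa_response(
    text: str,
    session_attributes: Optional[Dict[str, str]] = None,
    end_session: bool = True,
) -> dict:
    """Assemble a minimal plain-text Alexa response without the SDK."""
    return {
        "version": "1.0",
        # Session data is echoed back by Alexa on the next request in the session.
        "sessionAttributes": session_attributes or {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


# Keep the session open and remember what was ordered so far (hypothetical data).
reply = build_alexa_response(
    "What size would you like?",
    session_attributes={"drink": "espresso"},
    end_session=False,
)
print(json.dumps(reply, indent=2))
```

Because the session attributes travel with the response, the server can add or change session data without touching the conversation definition.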
Interactive Models for Conversations:
Both platforms provide an interactive model for creating conversations. Google’s service is called API.AI while Alexa’s is “Skill Builder”. Both are in beta. I found API.AI simpler for designing conversations. Skill Builder provides almost the same functionality, although it is slightly harder to work with.
Photo: API.AI Dashboard
Photo: Alexa Skill builder Dashboard
Account linking lets the user log into your application before performing any function. Both platforms support OAuth 2.0 with the authorization code and implicit grant flows. The difference is when the user is prompted to log in.
Alexa users must log in either when they activate your bot or when the server responds saying that it needs the user to log in. Since there is no activation on Google Assistant, the login prompt must always come from the server.
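On the Alexa side, the server-driven prompt can be sketched as follows: if the request carries no access token, the webhook answers with a `LinkAccount` card, which the Alexa App shows as a login prompt. The `session.user.accessToken` field and the `LinkAccount` card type come from Alexa's request and response JSON; the function names and spoken text are illustrative only.

```python
from typing import Optional


def get_access_token(body: dict) -> Optional[str]:
    """Read the linked-account token from an Alexa request body, if present."""
    return body.get("session", {}).get("user", {}).get("accessToken")


def link_account_response() -> dict:
    """Ask the user to log in by returning a LinkAccount card."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Please link your account using the Alexa App.",
            },
            "card": {"type": "LinkAccount"},
            "shouldEndSession": True,
        },
    }


def handle(body: dict) -> dict:
    token = get_access_token(body)
    if token is None:
        return link_account_response()
    # With a token, the request would be fulfilled on behalf of the user.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": "You are logged in."},
            "shouldEndSession": True,
        },
    }


print(handle({"session": {"user": {"userId": "example-user"}}})["response"]["card"])
```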
Reviewing and Publishing:
Both platforms use a somewhat manual process where reviewers actually use your bot and send feedback. There are strict guidelines regarding security, branding, privacy and terms of service which you have to adhere to in order to publish your bot. You also need to provide complete instructions and sample user accounts so that the reviewers can easily verify the bot, its contents and its functionality. In both cases, once you submit, you will probably hear back with feedback within two business days.
I did not notice any versioning on Alexa. Google Assistant provides versioning, so you can have multiple versions of the bot active simultaneously and serve the appropriate fulfillment depending on the user’s version.
Both platforms are very comparable. Your “Corpus”, the cognitive part of your bot, should not depend on the platform. It should be centralized so you don’t have to rebuild it for every platform. During development, keep the scaling and centralization of webhooks in mind. With that in place, creating the conversation JSON in the required format is enough to generate a voice bot for either platform.
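One way to picture this centralization is a platform-independent core with a thin adapter per platform. The sketch below is an assumption-laden illustration, not Archie's actual architecture: `answer` is a hypothetical core, the Alexa adapter follows Alexa's request and response JSON, and the API.AI adapter assumes the v1 webhook shape (`result.metadata.intentName`, `result.parameters`, `speech`, `displayText`), which should be checked against the current platform documentation.

```python
from typing import Dict, Tuple


def answer(intent: str, params: Dict[str, str]) -> str:
    """Platform-independent core (the "corpus"); hypothetical logic."""
    if intent == "OrderDrink":
        size = params.get("size", "regular")
        drink = params.get("drink", "coffee")
        return f"One {size} {drink} coming up."
    return "Sorry, I cannot help with that yet."


def from_alexa(body: dict) -> Tuple[str, Dict[str, str]]:
    """Extract the intent name and filled slots from an Alexa IntentRequest."""
    intent = body["request"]["intent"]
    slots = intent.get("slots") or {}
    params = {name: s["value"] for name, s in slots.items() if s.get("value")}
    return intent["name"], params


def to_alexa(text: str) -> dict:
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def from_apiai(body: dict) -> Tuple[str, Dict[str, str]]:
    """Assumed API.AI v1 request shape; verify before relying on it."""
    result = body["result"]
    return result["metadata"]["intentName"], dict(result.get("parameters", {}))


def to_apiai(text: str) -> dict:
    # Assumed API.AI v1 response shape.
    return {"speech": text, "displayText": text}


alexa_body = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "OrderDrink",
            "slots": {"drink": {"name": "drink", "value": "espresso"}, "size": {"name": "size"}},
        },
    }
}
print(to_alexa(answer(*from_alexa(alexa_body))))
```

Only the adapters know about platform JSON, so supporting another voice platform means writing two small functions while the core stays unchanged.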
Alexa has been on the market for quite a while now, and Google Assistant is new in comparison. But since Google I/O in May 2017, Google appears to have prioritized Google Assistant and is catching up fast. One major advantage Google has over Alexa is that Google Assistant is available on almost all Android devices.
I strongly recommend that fellow developers start experimenting with both platforms, as voice bots have the potential to outgrow the native app ecosystem.
Note: If you would like to know more about my work with Assistant and voice bots, check out Archie.AI.