With the Echo Show’s touch screen comes a new type of user interaction, but the new ElementSelected intent doesn’t get much coverage in Amazon’s Display Interface Reference. Today’s post explains how to include and handle touch events in your Alexa skill code.
In Amazon’s Display Interface Reference there’s a topic called “Handle Selection Events by Voice and Touch,” but it’s not very helpful. It says:
Each item in a list can be made selectable by touch. For each selectable element on the screen, the skill developer provides an associated token that they will receive in the callback response when the element is selected. The skill can set a token field on any selectable element, and this token is returned in a Display.ElementSelected request if that element is selected.
An example of a Display.ElementSelected request is provided, but there’s no example of where and how the token is set or how to pull it from the callback response. The Reference goes on to say:
There is no built-in intent for selecting actions or list items. However, you can create an intent for this purpose and include it in the intent schema. This intent should be activated when the skill receives a Display.ElementSelected event in a response.
This statement is a little misleading. You don’t have to add this intent to your schema; it’s a ‘native’ event the Alexa service detects in the same manner as an onLaunch event. Here’s how to add touch selection to an Alexa skill for the Show.
1. Add a ListTemplate to your code with the Display.RenderTemplate directive.
The Display.RenderTemplate cases you’ll include in your code work kind of like Cascading Style Sheets: they tell the server what the screen layout should be and establish certain default settings. There are three options for touch-select list screens: ListTemplate1, ListTemplate2 and ListTemplate3. See Amazon’s Display Interface Reference for more information about them.
The Display.RenderTemplate directive lets you specify a template and populate it. In the case of list templates, you need to provide all the images, titles and text you want the user to see in your on-screen list. That information takes the form of JSON name:value pairs. One of the name:value pairs is “token”:”value”, where “value” is any string value the developer wants to assign. The value you assign for “token” is the reference point you will use to detect which item the user has selected by touch.
The sample includes all the name:value pairs for a single list item, as well as the rest of the Display.RenderTemplate case code for ListTemplate3. You can follow the same format as the list item shown to populate your own list with as many items as you want. In my example, I’ve assigned the value “One” as the token value for my sample list item.
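As a rough sketch of what that directive looks like (the title, image URLs, and text here are placeholders I’ve made up, not Amazon’s sample values), a ListTemplate3 with a single tokenized list item might be built like this:

```javascript
// Sketch of a Display.RenderTemplate directive for ListTemplate3.
// All titles, text, and URLs below are placeholder values.
const renderTemplateDirective = {
  type: 'Display.RenderTemplate',
  template: {
    type: 'ListTemplate3',
    token: 'myListScreen',
    backButton: 'VISIBLE',
    title: 'My Sample List',
    listItems: [
      {
        // This token comes back in the Display.ElementSelected
        // request when the user touches this item.
        token: 'One',
        image: {
          contentDescription: 'Item one image',
          sources: [{ url: 'https://example.com/item-one.png' }]
        },
        textContent: {
          primaryText: { type: 'PlainText', text: 'Item One' },
          secondaryText: { type: 'PlainText', text: 'First selectable item' }
        }
      }
      // ...additional list items follow the same shape,
      // each with its own unique token value.
    ]
  }
};

console.log(renderTemplateDirective.template.listItems[0].token); // "One"
```

Each list item follows that same shape; only the token, image, and text values change.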
2. Add a handler for ElementSelected, and capture the token for the selected list item in that handler.
The code for this handler can be pretty short and sweet, depending on how you’ve designed your code. In my case, I want a different function to run for each item in my list. I’ve assigned a token value from “One” through “Seventeen” to each of the seventeen items in my touchscreen list, and I’ve written seventeen different functions, all named with the same base function name followed by a number from One to Seventeen to match my list item tokens.
The main takeaway here, and the thing that’s not explained very clearly in Amazon’s documentation, is that you need to use “this.event.request.token” to capture the token that corresponds to the list item the user selected by touch.
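To make that concrete, here’s a minimal sketch of an ElementSelected handler in the alexa-sdk style (the per-item function names are stand-ins for the seventeen functions described above, and the simulated request at the bottom is just for illustration):

```javascript
// Stand-ins for the per-item functions; the real skill has one per list item.
const itemFunctions = {
  One: function () { return 'ran function One'; },
  Two: function () { return 'ran function Two'; }
  // ...through Seventeen
};

const handlers = {
  'ElementSelected': function () {
    // this.event.request.token holds the token of the list item
    // the user selected by touch.
    const token = this.event.request.token;
    return itemFunctions[token]();
  }
};

// Simulated invocation: alexa-sdk binds the incoming request to `this`,
// so we mimic that here with a fake Display.ElementSelected request.
const fakeContext = {
  event: { request: { type: 'Display.ElementSelected', token: 'One' } }
};
console.log(handlers['ElementSelected'].call(fakeContext)); // "ran function One"
```

The key line is the read of `this.event.request.token`; everything else is routing.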
You also have to include intent handlers that allow the user to select list items by voice. In my skill, I’ve written handlers that route the request whether the user asks by list item number or by name/title (the “primaryText” value in the list definition section of the Display.RenderTemplate code block, linked above), and my utterances file includes every variation of the names/titles I could think of.
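The voice path can funnel into the same per-token functions as the touch path. As a sketch (the slot names and the number/name mappings here are my assumptions, not taken from the original skill), a voice-selection handler just needs to resolve whatever the user said to the same token string:

```javascript
// Map spoken item numbers and names/titles to the same tokens used
// for touch selection. Mappings shown are illustrative placeholders.
const tokenByNumber = { '1': 'One', '2': 'Two' };              // ...through '17'
const tokenByName = { 'item one': 'One', 'item two': 'Two' };  // ...etc.

// Resolve an intent's slots to a list-item token, trying the
// item-number slot first, then the name/title slot.
function resolveToken(slots) {
  if (slots.ItemNumber && slots.ItemNumber.value) {
    return tokenByNumber[slots.ItemNumber.value];
  }
  if (slots.ItemName && slots.ItemName.value) {
    return tokenByName[slots.ItemName.value.toLowerCase()];
  }
  return undefined;
}

// Simulated slot values from "select item one":
console.log(resolveToken({ ItemNumber: { value: '1' } })); // "One"
```

Once the token is resolved, the handler can dispatch to the same function the ElementSelected handler would have called for that token.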