The Deep Dive series continues with a discussion of Alexa Display Template skill test gotchas. Testing Display Template skills is a little more demanding and time-consuming than it is for voice-only skills, since you must test the visual and touch elements on top of the usual voice interaction testing. A systematic approach will make things easier, but there are definitely some pitfalls to avoid.
Note that the information provided here is accurate as of the publication date (3/6/18), but is subject to change if Amazon updates its Alexa devices or the Alexa software.
Writing out actual test scripts can be very helpful when working with any complex skill. I don’t mean you should write a .bat file or anything like that; this type of test script is a spreadsheet or table that lists every possible feature or function of your skill and provides space for you to check off each feature/function as “passed” or “failed” as it’s tested, and to add any notes. Many devs avoid writing out test scripts because of the extra development time required, but the script can save time in the long run.
First, writing everything out in a script helps to ensure a thorough test. Each line item you add to your script can jog your memory about related features or functions that must also be tested but aren’t obvious at surface level. Second, when a given test line item fails, it’s much faster and easier to debug, because your test script contains a record of everything that does work and can therefore be eliminated as a potential cause of the failure.
Finally, the process of writing a test script will often reveal functionality gaps, inconsistencies, or other problems with your skill that you didn’t notice before. For example, while writing out the test line items for Visual Tarot to verify my instruction message fires when each new screen is loaded, it occurred to me that I only wanted that message to fire the first time the user accesses each screen. This realization led me to revise my code so the messages would be suppressed on subsequent screen loads within the same session.
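The “fire the instructions only on first visit” fix described above boils down to tracking visited screens in the session attributes that Alexa passes back and forth on every turn. Here’s a minimal sketch of that pattern; the function and attribute names (`screen_speech`, `visitedScreens`) are illustrative, not from the Visual Tarot source:

```python
def screen_speech(screen_id, base_text, instructions, session_attributes):
    """Prepend the instruction message only the first time a screen loads
    in this session; visited screens are recorded in session attributes,
    which Alexa round-trips for the life of the session."""
    visited = session_attributes.setdefault("visitedScreens", [])
    if screen_id not in visited:
        visited.append(screen_id)
        return instructions + " " + base_text
    return base_text
```

On the second load of the same screen in a session, only `base_text` comes back; a fresh session starts with empty attributes, so the instructions fire again.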
Be aware that a simulator can only approximate the user experience on an actual Alexa device, and there are some Display Template skill gotchas you won’t be able to catch with a simulator. If at all possible, test on an actual Alexa device.
For example, simulators that run text-based tests will not catch speech interpretation errors. When testing Visual Tarot on a Show I found Alexa often mis-heard the word “cups” as “cops”. If I’d tested only with tools that let me type in the text of the speech rather than speaking it, I wouldn’t have trapped—or even noticed—this error until after the skill was live and users started complaining about it.
Similarly, a simulator cannot duplicate or mimic Alexa touchscreen interactions (as of this writing, at least). You can write a test script (the code type of script) that sends a touch request to the Alexa service, but running that script will only tell you how Alexa responds when a valid touch event occurs—it won’t tell you what type of interaction is required to produce a valid touch event.
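For reference, a valid touch on a Display Template element reaches your skill as a `Display.ElementSelected` request carrying the token you attached to the touched element. A scripted test can exercise the routing below, but, as noted, it can’t tell you whether the element is findable or operable on a real screen. The handler names and token values here are hypothetical:

```python
def handle_request(request):
    """Dispatch an incoming Alexa request envelope by request type."""
    req = request["request"]
    if req["type"] == "Display.ElementSelected":
        # The token is whatever identifier you attached to the touched element.
        return handle_touch(req["token"])
    if req["type"] == "IntentRequest":
        return handle_intent(req["intent"]["name"])
    return "unhandled"

def handle_touch(token):
    return f"touched:{token}"

def handle_intent(name):
    return f"intent:{name}"
```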
If your touch elements are difficult for the user to find or operate, you will only discover that by testing the skill on an actual device. For example, all my Display Template skills run on both the Echo Show and the Echo Spot, but because users are unfamiliar with the on-screen paging controls on the Spot, I’ve gotten negative reviews from users who claim the skills don’t run on that device. It’s a case of user error that doesn’t surface until you actually run the skills on a Spot.
By the way, I ultimately decided to revise most of my Display Template skill descriptions to say they only run on Show, even though it’s not true. I know users who don’t understand the Spot interface will leave more negative reviews that mislead other users into thinking there’s something wrong with the skills.
Test Category Checklist
Obviously, I can’t cover every possible custom skill feature or function here, but the list below hits all the major categories of testing required for a Display Template skill.
1. Skill Launch
Does the skill launch properly for every invocation phrase you’ve mapped in the interaction model?
2. Screen Navigation
Are you able to navigate to every screen in your skill, both by voice and by touch?
3. Image Loading
This one is a pain, but it really is necessary. Verify that every image loads correctly on every screen where the image could possibly appear. For example, in a Flags of the World quiz skill that includes both thumbnails and full-sized images of flags, you will need to verify that both versions of every image load properly. If your skill chooses/displays images at random (e.g., a flag quiz skill that randomizes the questions), you may have to write a special function just for purposes of loading every image on the device screen.
In Visual Tarot I had to check over 160 images, but I’m glad I did, because sure enough: I discovered three that weren’t loading properly, due to typographical errors in filename references or a missing file on the server side.
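Part of an image sweep like this can be automated: before you ever load screens by hand, confirm that every image URL your skill can reference actually resolves. A rough sketch, with the status-fetching function injected so you can swap in a stub for testing or a real HTTP HEAD request (e.g., via `urllib.request`) in practice:

```python
def find_broken_images(image_urls, fetch_status):
    """Return the URLs whose HTTP status is anything other than 200 OK.
    `fetch_status` maps a URL to an HTTP status code; in production it
    might wrap urllib.request with a HEAD request."""
    return [url for url in image_urls if fetch_status(url) != 200]
```

This catches missing files and typo’d filename references, but you still have to eyeball each screen on-device to confirm the images render where they should.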
4. Timeouts and Reprompts
Test your interaction model timeouts on every screen: does the skill reprompt the user who fails to respond to Alexa within 8 seconds (or whatever custom timeout you’ve set)? Alternatively, if your skill design prohibits reprompts or timeouts for some reason, does it fire a user-friendly ‘goodbye’ message and shut down gracefully?
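The reprompt itself lives in a dedicated field of the Alexa response envelope; if it’s missing, the session simply ends when the user goes silent. A minimal sketch of a response that keeps the session open and supplies the reprompt text (the function name is my own):

```python
def build_reprompt_response(speech_text, reprompt_text):
    """Assemble a minimal Alexa response that keeps the session open and
    supplies the speech Alexa plays if the user doesn't answer in time."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            "shouldEndSession": False,
        },
    }
```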
5. Help File
Can users access your skill’s help file from every screen without negatively impacting your skill’s interaction flow? For example, in a quiz skill if the user asks Alexa for help after the quiz has begun, is the user forced to start the quiz over or can they pick up right where they left off? Help request behavior can reveal some unexpected and unpleasant surprises.
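The “pick up where they left off” behavior usually comes down to one rule: the help handler must read from session attributes without clearing or resetting them. A sketch, with hypothetical attribute names:

```python
def handle_help(session_attributes):
    """Answer a help request without disturbing the user's place in the
    skill; session attributes pass through untouched so the next turn
    resumes exactly where the user left off."""
    help_text = "Say 'next' to continue, or 'stop' to quit."
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,  # state preserved as-is
        "response": {
            "outputSpeech": {"type": "PlainText", "text": help_text},
            "shouldEndSession": False,
        },
    }
```

The unpleasant surprises tend to happen when a help handler routes back through the skill’s launch logic, which silently re-initializes those attributes.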
6. Custom Skill Features
Do all of your mapped intents work as they should? Does your skill’s custom functionality fire properly for every feature it includes?
7. AMAZON. Intents
Does your skill include and properly handle all required built-in AMAZON. intents, such as AMAZON.HelpIntent, AMAZON.StopIntent, and AMAZON.CancelIntent?
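A quick self-check you can fold into a test script: compare your skill’s intent handler map against the built-in intents certification requires. A sketch (the handler-map shape is hypothetical; the intent names are the standard built-ins):

```python
# Built-in intents a custom skill is expected to handle.
REQUIRED_BUILTINS = {"AMAZON.HelpIntent", "AMAZON.StopIntent", "AMAZON.CancelIntent"}

def missing_builtins(handler_map):
    """Return the required built-in intents the skill doesn't yet handle."""
    return sorted(REQUIRED_BUILTINS - handler_map.keys())
```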
8. Graceful Shutdown and Exit
Verify that the user can quit the skill from any screen and that any code that must run to persist data across sessions fires properly. Also verify that the exit is graceful, with a user-friendly ‘goodbye’ message and a dump of any session attributes.
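Order matters here: persist first, then dump the session attributes, then end the session. A sketch of that sequence, where `save_to_store` stands in for whatever persistence call your skill uses (a DynamoDB put, for instance):

```python
def handle_stop(session_attributes, save_to_store):
    """Persist what must survive the session, dump the session
    attributes, and return a graceful goodbye response."""
    save_to_store(dict(session_attributes))  # persist a copy before the session dies
    session_attributes.clear()               # dump session attributes
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": "Goodbye!"},
            "shouldEndSession": True,
        },
    }
```

Test this from every screen, not just the home screen; a stop request issued mid-flow is exactly where the persistence code is most likely to be skipped.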
Don’t give in to the temptation to skimp on testing. It’s a safe bet that if there’s any testing base you’ve failed to cover, the users will find it and complain about it in negative reviews.
Other Posts In This Series
Things That Kill Alexa Skills
A checklist of reasons why you might want to rethink your skill, or its design, before you get too deep into the build.