UX Engineer II, Microsoft, 2010
Kinectimals was one of four Day 1 games for Kinect that included speech. The game allowed kids to interact with really cute virtual cats using gesture and their voice. They could pick their cat, name their cat, teach their cat tricks, and compete with their cat.
The Problem
For Kinectimals, the production team needed to understand how to think about speech, which commands to use, and how to teach kids the commands.
What I Did
Kinectimals Tuning
One critical aspect of building a speech recognition system is tuning it to work within the limits of the technology. Speech is inherently non-deterministic: a spoken command isn’t clear-cut like pressing a physical button. Instead, it is compared against multiple patterns (the possible commands), and the one that best matches is considered the “right” one. With speech, it’s also possible that the “right” one isn’t good enough to count as a true match. It’s like matching fingerprints: sometimes it’s a complete match, sometimes a partial is good enough to use in court, and sometimes the partial match is inconclusive.
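As a rough illustration of that idea, here is a minimal Python sketch of best-match selection with a rejection threshold. It is not the Kinectimals or Kinect speech code: the function name, the scores, and the threshold values are all made up for the example.

```python
from typing import Dict, Optional


def recognize(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-matching command, or None if the best match is too weak.

    `scores` maps each possible command to a hypothetical match score
    between 0 and 1. `threshold` is the tuning knob (also hypothetical).
    """
    # Pick the command whose pattern best matches what was heard.
    best = max(scores, key=scores.get)
    # Even the best candidate may not be good enough to count as a match.
    if scores[best] < threshold:
        return None  # rejection: the system acts as if nothing was said
    return best


# Made-up scores for one utterance that was probably "sit".
scores = {"sit": 0.42, "play dead": 0.35, "jump": 0.10}

strict = recognize(scores, threshold=0.60)   # rejects the utterance
lenient = recognize(scores, threshold=0.30)  # accepts the best guess
```

With these invented numbers, the strict threshold rejects the utterance entirely, while the lenient one accepts "sit" even though the match is only partial. A higher threshold means fewer wrong actions but more ignored commands, and a lower one means the reverse.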
Usually, a system is tuned to minimize false recognitions. When someone calls a bank, it’s very frustrating if the system misrecognizes the person and sends them down the wrong path. Everyone on the team assumed that this type of tuning would be appropriate for Kinectimals as well.
I played the game a lot and found myself frustrated when I asked the cat to do something and it just stared at me blankly or, worse, walked away.
It turned out kids were just as frustrated. We learned that, when playing a game, misrecognitions were actually fine. It was the rejection that stung. If you asked the cat to sit, everyone would much prefer that the cat play dead and still look cute rather than ignore you.
What Shipped
Kinectimals mostly focused on the gesture interaction, which was easiest for kids because it was based on mimicking. Kids moved and the cat moved too. This was instantly engaging and had less of a learning curve than the speech commands, which, like the Dashboard, were limited to a special mode. However, despite the limited voice functionality, at least when you spoke, the cat did something cute.