The Television's Natural User Interface
January 11, 2012
Below is a post I had started writing about three years ago, but never got around to finishing and publishing. I updated this post periodically, so there is some more recent news sprinkled throughout. Given the incredible buzz around Smart TV at CES 2012, I thought this post could be enlightening.
TVs are unnatural by design. We have hundreds of channels and half a dozen remotes and boxes, and we still can't find anything worth watching. Menus sit within menus, which are buried within more menus. Content comes in from everywhere - our gaming consoles, media extenders, streaming boxes, DVRs, cable boxes, over the air - and there is no way to sort through it all. The TV is screaming for the simplicity of great design, but there isn't a simple answer. TVs aren't within reach, so direct touch interfaces are out of the question. TVs create lots of background noise, so voice communication can be tricky, and voice processing can introduce lag that sullies an otherwise smooth user experience.
Microsoft has taken another step forward by releasing Xbox 360 / Kinect integration with many video providers. However, talking to or gesturing at our TVs is not yet a strong natural user interface for a number of reasons:
- Most people watching large TVs are not close enough to the TVs for natural vocal conversations. Users will likely feel the need to speak loudly to their TVs, which will introduce vocal strain. One may raise their voice to speak to a friend across a large dinner table, but in this case, the friend would also provide real-time visual and audio feedback, creating a sense of connective proximity. This brings us to our next point.
- At present, our TVs do not yet talk back or otherwise show meaningful visual cues of engagement (see above), creating yet another layer of tension that depletes the natural elements of the exchange. Apple's Siri speaks back, creating a level of intimacy through a feedback loop.
- Coverflow as an interface isn't the answer to everything. Coverflow in Apple products made a lot of sense because we were familiar with the visual analogy of flipping through CDs, audio tapes, pictures, video cassettes and records. However, we have never flipped through pages and pages of channels on a TV. The flipping motion, particularly when performed with one's entire arm waving through the air at a distant screen, is not the "natural" way that one would surf through programming. The interface metaphor doesn't translate especially well (as of now), and for the couch potatoes who loathed getting up to change the channel, waving an arm requires far more movement than pushing a button on a remote.
- TVs are dumb. We have to adapt our behavior to meet their dynamic. For example, most TVs don't have a robust set of metadata about programming that would allow me to, say, discover great new programming based on my previous behavior, my favorite actors or directors, or those of my social circle.
- Voice navigation isn't smart. Saying "Xbox, Bing, Blazing Saddles" is a memorized voice navigation scheme. "Xbox" initiates the Xbox program, "Bing" initiates the search... this is like navigating through a traditional interface with one's voice. This isn't natural.
The Optimal TV Navigation Solution
The TV user experience faces three core tension points: (a) confusing navigation schemes, (b) data, and (c) reliance on analog navigation/difficult discovery.
Fixing Navigation
The natural television interface is likely going to have a remote control, but one that is flexible and intimate. This will be a slim and simple touch screen with voice input (a mic) and a speaker - such as a phone or a tablet. As a matter of fact, it often will be a phone or tablet. A fantastically designed program guide will serve as a base layer that will also deliver cross-device content syncing and enable streaming (to the TV and to a remote device). This platform will not only display information in a beautiful and simple manner, but will also provide on-device and on-TV visual and auditory feedback for user inputs. This in-hand, intimate interplay will solve the proximity challenge that typically hampers set-top boxes. This solution should use a unique form of active noise cancellation - one that removes the TV's own audio from the signal picked up by the in-remote microphone.
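One plausible way to remove the TV's audio from the microphone signal is acoustic echo cancellation with an adaptive filter. The sketch below is only an illustration, not a description of any shipping product. It assumes the system knows what audio the TV is playing (the "reference") and uses a normalized LMS filter to estimate how that audio reaches the microphone, then subtracts the estimate. The function names, the three-tap room response, the tap count and the step size are all hypothetical values chosen for the example.

```python
import math
import random


def nlms_cancel(reference, mic, taps=8, mu=0.1, eps=1e-8):
    """Subtract an adaptive estimate of the TV audio from the mic signal.

    reference: samples the TV is known to be playing
    mic:       samples picked up by the remote's microphone
    Returns the mic signal with the estimated TV audio removed.
    """
    weights = [0.0] * taps   # running estimate of the TV-to-mic echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what is left once the estimated TV audio is removed
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def demo(n=4000, seed=0):
    """Simulate a voice command mixed with TV audio; return (before, after) error."""
    rng = random.Random(seed)
    tv = [rng.uniform(-1.0, 1.0) for _ in range(n)]                    # stand-in TV audio
    voice = [0.3 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in voice
    room = [0.6, 0.3, 0.1]  # hypothetical echo path from TV speaker to microphone
    mic = []
    for i in range(n):
        echo = sum(room[k] * tv[i - k] for k in range(len(room)) if i - k >= 0)
        mic.append(voice[i] + echo)
    cleaned = nlms_cancel(tv, mic)
    half = n // 2  # compare only after the filter has had time to adapt
    before = mean_squared_error(mic[half:], voice[half:])
    after = mean_squared_error(cleaned[half:], voice[half:])
    return before, after
```

Calling `demo()` returns how far the raw and the cleaned microphone signals are from the simulated voice. In this toy setup the cleaned signal should be much closer. A real living room adds reverberation, delay and non-linear speakers, so this is a starting point at best.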
There will also likely be a new crop of more capable dedicated remotes. For example, a clamshell design could put a simple, traditional numerical remote on the top, and a full keyboard for thumb-typing on the inside or bottom.
Possibly most importantly, this navigation scheme must be universal. I believe that we are very close to an era of navigation APIs where the central box interfaces with the other entertainment devices, portals and options to display all media in a single interface. This is about more than the PS3 and Xbox 360 dashboards; it's full integration with the cable box, DVR, Sonos system, etc. Think iTunes for TV, if iTunes didn't suck. This interface would not only pull in all media sources, but would also provide multiple tablet or phone operating systems with an app for controlling the TV.
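To make the idea of a navigation API a little more concrete, here is a minimal sketch of a central guide that queries every connected source through one shared interface. No such standard exists in this form. `MediaSource`, `StaticSource` and `UnifiedGuide` are hypothetical names, and the in-memory title lists stand in for whatever a real cable box, DVR or console would expose.

```python
from typing import Protocol


class MediaSource(Protocol):
    """Hypothetical interface every device or service would implement."""

    name: str

    def search(self, query: str) -> list[str]:
        ...


class StaticSource:
    """Stand-in for a real adapter (cable box, DVR, console, streaming app)."""

    def __init__(self, name: str, titles: list[str]):
        self.name = name
        self.titles = titles

    def search(self, query: str) -> list[str]:
        # Case-insensitive substring match on the title
        needle = query.lower()
        return [t for t in self.titles if needle in t.lower()]


class UnifiedGuide:
    """Central box: asks every source and merges the answers into one list."""

    def __init__(self, sources: list[MediaSource]):
        self.sources = list(sources)

    def search(self, query: str) -> dict[str, list[str]]:
        # Map each title to the sources offering it, so a title shows up once
        results: dict[str, list[str]] = {}
        for source in self.sources:
            for title in source.search(query):
                results.setdefault(title, []).append(source.name)
        return results


guide = UnifiedGuide([
    StaticSource("dvr", ["Blazing Saddles"]),
    StaticSource("streaming", ["Blazing Saddles", "Saddle Up"]),
])
print(guide.search("saddle"))
```

The point of the design is that the tablet or phone app only ever talks to `UnifiedGuide`. Adding a new box means writing one more adapter, not one more remote.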
But that isn't all.
Data
This platform will have to rule search. As television content becomes increasingly tagged with additional metadata, the navigation solution will find the right content for the viewer from a variety of sources. This will be Netflix meets Amazon meets Pandora meets TiVo meets the social graph meets ad targeting. Getting all this video data into a single integrated search experience isn't going to be easy. But that overload of available content data, coupled with individual user data and overlaid with social data, could create an incredible personalized suggestion and discovery engine.
Data will not be limited to the television, but will extend onto the remote devices through rich third-party app integrations, such as GetGlue for social television viewing. These third-party integrations (apps plus devices) will offer enhanced functionality on top of the existing broadcast layer, offering a robust layer of monetization opportunities. The TV and "remote" become the platform and hub.
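As a rough illustration of how content metadata, individual taste and social data could be blended into one ranking, consider the sketch below. It is a toy, not how any real recommendation engine works. The `Program` type, the tag-overlap measure, the 0.7/0.3 weights and the sample catalog are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    title: str
    source: str                      # illustrative label, e.g. "cable" or "dvr"
    tags: set = field(default_factory=set)


def tag_overlap(a, b):
    """Jaccard similarity between two tag sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(program, liked_tags, friends_watched, n_friends,
          w_personal=0.7, w_social=0.3):
    """Blend personal taste and social signal into one number."""
    personal = tag_overlap(program.tags, liked_tags)
    if n_friends:
        social = friends_watched.get(program.title, 0) / n_friends
    else:
        social = 0.0
    return w_personal * personal + w_social * social


def recommend(catalog, liked_tags, friends_watched, n_friends, top=3):
    """Return the titles of the highest-scoring programs across all sources."""
    ranked = sorted(
        catalog,
        key=lambda p: score(p, liked_tags, friends_watched, n_friends),
        reverse=True,
    )
    return [p.title for p in ranked[:top]]


# Sample data, invented for the example
catalog = [
    Program("Blazing Saddles", "streaming", {"comedy", "western", "mel brooks"}),
    Program("Evening News", "cable", {"news"}),
    Program("Young Frankenstein", "dvr", {"comedy", "horror", "mel brooks"}),
]
picks = recommend(
    catalog,
    liked_tags={"comedy", "mel brooks"},
    friends_watched={"Evening News": 1, "Young Frankenstein": 2},
    n_friends=4,
    top=2,
)
print(picks)
```

Here both comedies match the viewer's tags equally well, and the one that more friends have watched is ranked first. The hard part in practice is getting clean metadata from every source, not the arithmetic.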
What Powers This Interplay?
Most early Smart TVs will feature proprietary operating systems with limited apps. Over time, this space will settle into one, two or at most three operating systems. While some early TV manufacturers tried to sell companion tablets and TVs, we will begin to see bundled packages. The Samsung TV and Samsung Tablet bundle is coming to a Best Buy near you. However, with OS and API standardization, most manufacturers will develop a suite of remote-control-like apps for popular mobile operating systems.
TV is the next frontier in the mobile OS war. Microsoft, Google and Apple are going to go head to head. And believe it or not, I think the TV is going to be a far more lucrative channel than mobile. Here come the TV wars.