What is this?

This is an effort to explore using sound and audio as the primary means of human-computer interaction, without relying on a visual display. This is not a voice interface.

The Shortfall of GUI

The graphical user interface (GUI), which dominates almost all user interface (UI) design, is cumbersome on devices with small displays and useless on devices with none. We need input and output (I/O) modalities and interaction techniques tailored to mobile, wearable and embedded devices, which may have very limited room for a display. Using such devices is now part of everyday life, whether we are aware of it or not. We see people staring at their phones while walking and even while driving, which already tells us something is wrong with our UI design. A UI should fit the environment and the user's situation. When a user is engaged in a task such as driving, running, walking, attending a meeting, or fighting on a battlefield, visual distraction is often unacceptable; in cases like driving it is a matter of life and death. This suggests that the UI should move into the background and play a supporting role: user interaction should assist the main task rather than dominate it.

Towards Sound User Interface (SUI) or Audio User Interface (AUI)

Since the GUI is not the best fit in many situations, do we have an alternative? Consider sound and audio. The goal is to provide information while shifting cognitive load away from the visual channel to a different modality. Sound perception is not generally less capable than vision; it simply serves a different function. Humans embody the physiology needed to absorb information in the form of sound. Just as the eye can perceive many variations of light, such as hue, brightness and contrast, the ear can distinguish a vast array of sounds through variations in timbre, loudness, pitch, melody and tempo. The brain readily associates these sounds with events, objects, actions and environments. It is also well known that hearing is a relatively passive process: a person can easily perceive sound while occupied by a visual task. Human-computer interaction has not taken advantage of these abilities. If we try to introduce sound into computer interaction as a primary rather than secondary means, we may find the obstacle is UI design rather than human capability.

The difficulties of SUI/AUI design lie on several fronts: the lack of a commonly recognized UI model, greater abstraction compared to visuals, and sound generation and design. A SUI model can borrow from the GUI concept, which maps real-world objects onto a virtual desktop; with tangible devices as input controls and rich sound feedback as output, it can be designed through careful modeling. Sound is generally more abstract than a visual because the user cannot see it, so users may need a different mindset to get used to it. Sound generation and design are not easy either. First, we need an interactive sound synthesis engine that can produce sound in real time, driven by user actions and the underlying UI model. Second, we need to map interface state to sound design, and there is no consensus on how to do this yet, even though academia offers plenty of research on psychology, data sonification and auditory perception. Despite these difficulties, the effort here is to bring SUI/AUI into practice first: design a model and test it. A minimal sketch of the synthesis idea follows.
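To make the synthesis-engine idea concrete, here is a short sketch in Swift using Apple's AVAudioEngine and AVAudioSourceNode: a sine oscillator whose pitch and loudness are set by UI events. The SoundFeedback class and its event-to-parameter mappings are illustrative assumptions, not MinimalSee's actual engine.

    import AVFoundation

    // Minimal interactive synthesis sketch: a sine oscillator whose
    // pitch and loudness are driven by UI events. The SoundFeedback
    // class and its mappings are illustrative, not MinimalSee's engine.
    final class SoundFeedback {
        private let engine = AVAudioEngine()
        private var frequency: Float = 440    // current pitch in Hz
        private var amplitude: Float = 0.2    // current loudness (0...1)
        private var phase: Float = 0

        init() throws {
            let format = engine.outputNode.inputFormat(forBus: 0)
            let sampleRate = Float(format.sampleRate)
            let source = AVAudioSourceNode { [unowned self] _, _, frameCount, audioBufferList in
                let buffers = UnsafeMutableAudioBufferListPointer(audioBufferList)
                for frame in 0..<Int(frameCount) {
                    // Advance the oscillator and write the same sample
                    // to every (non-interleaved) channel buffer.
                    let sample = sin(self.phase) * self.amplitude
                    self.phase += 2 * Float.pi * self.frequency / sampleRate
                    if self.phase > 2 * Float.pi { self.phase -= 2 * Float.pi }
                    for buffer in buffers {
                        buffer.mData!.assumingMemoryBound(to: Float.self)[frame] = sample
                    }
                }
                return noErr
            }
            engine.attach(source)
            engine.connect(source, to: engine.mainMixerNode, format: format)
            try engine.start()
        }

        // Hypothetical mappings: a one-dimensional UI position becomes
        // pitch; selecting an item raises loudness for emphasis.
        func userMoved(to position: Float) {
            frequency = 220 + position * 660   // sweep 220-880 Hz
        }

        func userSelected() {
            amplitude = 0.5
        }
    }

In a real SUI, userMoved(to:) might be wired to a scroll gesture and userSelected() to a tap, so the interface state stays audible without a glance at the screen; a production engine would also need thread-safe parameter updates.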

Not Voice Recognition

People tend to think of voice recognition as the solution to the issues above. That is true to some extent, but in many cases it is not enough, and in some cases it is wholly inappropriate:

1) Noisy environments, especially crowds, where even the best voice recognition may not work.
2) No privacy, when the topic under discussion is confidential or personal.
3) Socially irritating, when used in public places.
4) Not error-free; even the best voice recognition does not claim to be fully reliable.
5) Language dependency, obviously.
6) Not a modeled interface: unless it is restricted to constrained commands, the outcome is not predictable.

The last point cuts both ways. Voice recognition simulates human conversation, which is nonlinear, unmodeled and unstructured; that suits querying a vast knowledge base. But often the user just wants to do a simple job in a predictable fashion through a modeled, structured UI, and that is where SUI fits, especially when the user's main task is not on the device being interacted with. A sketch of such a predictable mapping follows.
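As a contrast with free-form speech, the following Swift sketch shows what a modeled, predictable SUI vocabulary could look like: a fixed set of interface events, each bound to exactly one earcon (a short, distinctive sound). The event names and parameter values are hypothetical, chosen only for illustration.

    // A deterministic event-to-earcon mapping, as a contrast with
    // free-form speech. Event names and values are hypothetical.
    enum SUIEvent {
        case focusMoved
        case itemSelected
        case actionConfirmed
        case errorOccurred
    }

    struct Earcon {
        let frequency: Float   // Hz
        let duration: Float    // seconds
    }

    // One fixed earcon per event: the user learns this small
    // vocabulary once, and the same action always sounds the same.
    func earcon(for event: SUIEvent) -> Earcon {
        switch event {
        case .focusMoved:      return Earcon(frequency: 440, duration: 0.05)
        case .itemSelected:    return Earcon(frequency: 660, duration: 0.10)
        case .actionConfirmed: return Earcon(frequency: 880, duration: 0.20)
        case .errorOccurred:   return Earcon(frequency: 220, duration: 0.30)
        }
    }

Because the mapping is a total function over a closed event set, the same action always produces the same sound, which is exactly the predictability that open-ended voice interaction lacks.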

Anyway, voice recognition is still very useful in certain constrained situations; it is just not the focus here.

SUI App – MinimalSee

To validate the SUI concept and design, an iOS app, MinimalSee, has been developed. A free version is available from the Apple App Store.