Gesture Recognition Technology
Gesture recognition is a topic in computer science and language
technology with the goal of interpreting human gestures via mathematical
algorithms. Gestures can originate from any bodily motion or state but
commonly originate from the face or hand. Current focuses in the field
include emotion recognition from the face and hand gesture recognition.
Many approaches use cameras and computer vision algorithms to interpret sign language. However, the identification and recognition of posture, gait, proxemics, and human behaviors are also the subject of gesture recognition techniques. Gesture recognition can be
seen as a way for computers to begin to understand human body language,
thus building a richer bridge between machines and humans than primitive
text user interfaces or even GUIs (graphical user interfaces), which
still limit the majority of input to keyboard and mouse.
Introduction to Gesture Recognition Technology
Gesture recognition technology allows users to interface with computers using gestures of the human body, typically hand movements. A camera
reads the movements of the human body and communicates the data to a
computer that uses the gestures as input to control devices or
applications. For example, a person clapping his hands together in front
of a camera can produce the sound of cymbals being crashed together
when the gesture is fed through a computer. One way gesture recognition
is being used is to help the physically impaired to interact with
computers, such as interpreting sign language.
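As a rough illustration of this camera-as-input loop, here is a minimal sketch in Python, assuming the OpenCV package: frames from a webcam are differenced, and a sudden burst of motion in front of the lens triggers an action, standing in for the clapping-to-cymbals example. The thresholds and the printed action are illustrative assumptions, not part of any particular product.

```python
# Minimal sketch, assuming OpenCV (cv2): webcam frames are differenced,
# and a sudden burst of motion triggers an action (a print standing in
# for the cymbal sound). Threshold values are illustrative assumptions.
import cv2

cap = cv2.VideoCapture(0)               # default webcam
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)      # pixel-wise change between frames
    changed = int((diff > 25).sum())    # pixels that moved noticeably
    if changed > 50_000:                # large, sudden change ~ clap-like gesture
        print("gesture detected: play cymbal sound")
    prev = gray
    cv2.imshow("camera", frame)
    if cv2.waitKey(30) & 0xFF == 27:    # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```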
The technology also has the potential to change the way users interact with computers by eliminating input devices such as joysticks, mice, and keyboards, allowing the unencumbered body to give signals to the computer through gestures such as finger pointing. Unlike haptic interfaces, gesture recognition does not require the user to wear any special equipment or attach any devices to the body. The gestures of the body are read by a camera instead of by sensors attached to a device such as a data glove.
In addition to hand and body movement, gesture recognition technology also can be used to read facial expressions, speech (i.e., lip reading), and eye movements. The literature includes ongoing work in the computer vision field on capturing gestures, or more general human pose and movements, by cameras connected to a computer.
Gesture-Only Interfaces
The gestural equivalent of direct-manipulation interfaces is the class of interfaces that use gesture alone. These can range from interfaces that recognize a few symbolic gestures to those that implement fully fledged sign language interpretation. Similarly, interfaces may recognize static hand poses, dynamic hand motion, or a combination of both. In all cases each gesture has an unambiguous semantic meaning associated with it that can be used in the interface. In this section we first briefly review the technology used to capture gesture input, then describe examples from symbolic and sign language recognition. Finally, we summarize the lessons learned from these interfaces and provide some recommendations for designing gesture-only applications.
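The "unambiguous semantic meaning" property can be made concrete with a small sketch: each recognized gesture label is bound to exactly one interface command. The labels and commands below are hypothetical placeholders, not drawn from any system discussed here.

```python
# Sketch of a one-to-one gesture-to-command binding; labels and actions
# are hypothetical placeholders.
from typing import Callable, Dict

GESTURE_COMMANDS: Dict[str, Callable[[], None]] = {
    "thumbs_up":  lambda: print("confirm"),
    "flat_palm":  lambda: print("stop"),
    "point_left": lambda: print("previous item"),
}

def dispatch(gesture_label: str) -> None:
    """Run the single command bound to a recognized gesture, if any."""
    action = GESTURE_COMMANDS.get(gesture_label)
    if action is not None:
        action()
    # Unknown labels are ignored rather than guessed at, preserving the
    # one-gesture-one-meaning property.

dispatch("thumbs_up")  # -> confirm
```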
Tracking Technologies
Gesture-only interfaces with a syntax of many gestures typically require precise hand pose tracking. A common technique is to instrument the hand with a glove equipped with a number of sensors that provide information about hand position, orientation, and flex of the fingers. The first commercially available hand tracker, the DataGlove, is described in Zimmerman, Lanier, Blanchard, Bryson and Harvill (1987), and illustrated in the video by Zacharey, G. (1987). It uses thin fiber optic cables running down the back of each hand, each with a small crack in it. Light is shone down the cable, so when the fingers are bent, light leaks out through the cracks. Measuring light loss gives an accurate reading of hand pose. The DataGlove could measure each joint bend to an accuracy of 5 to 10 degrees (Wise et al. 1990), but not the sideways movement of the fingers (finger abduction). However, the CyberGlove developed by Kramer (Kramer 89) uses strain gauges placed between the fingers to measure abduction as well as more accurate bend sensing (Figure XX). Since the development of the DataGlove and CyberGlove, many other glove-based input devices have appeared, as described by Sturman and Zeltzer (1994).
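The sensing principle behind the DataGlove's cracked fibers can be sketched numerically: the fraction of light surviving the fiber falls as the finger bends, so a calibrated mapping recovers a joint angle. The linear model and the constants below are illustrative assumptions, not the device's published response.

```python
# Toy model of bend sensing by light loss; the calibration constants are
# illustrative assumptions, not measured values.
def joint_angle_deg(light_out: float, light_in: float,
                    flat_ratio: float = 0.95, fist_ratio: float = 0.55) -> float:
    """Map the fraction of light surviving the fiber to a bend angle.

    flat_ratio: transmitted fraction with the finger straight (0 degrees).
    fist_ratio: transmitted fraction at full bend (90 degrees).
    """
    ratio = light_out / light_in
    ratio = max(min(ratio, flat_ratio), fist_ratio)  # clamp to calibrated range
    return 90.0 * (flat_ratio - ratio) / (flat_ratio - fist_ratio)

print(joint_angle_deg(light_out=0.75, light_in=1.0))  # -> 45.0 degrees of bend
```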
The CyberGlove captures the position and movement
of the fingers and wrist. It has up to 22 sensors, including three bend
sensors (including the distal joints) on each finger, four abduction
sensors, plus sensors measuring thumb crossover, palm arch, wrist
flexion and wrist abduction. Once hand pose data has been captured by
the gloves, gestures can be recognized using a number of different
techniques. Neural network approaches or statistical template matching are commonly used to identify static hand poses, often achieving accuracy rates of better than 95%.
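A minimal sketch of the template-matching idea, assuming NumPy: each static pose is represented by a mean vector over the glove's 22 sensor channels, and a new reading is labeled with the nearest template. The pose names and the random stand-in vectors are illustrative assumptions; a real system would average the templates from training samples.

```python
# Nearest-template classification of static hand poses over 22 glove
# sensor channels; templates here are random stand-ins for trained means.
import numpy as np

N_SENSORS = 22  # CyberGlove channels: bend, abduction, thumb, palm, wrist

rng = np.random.default_rng(0)
templates = {
    "fist":      rng.random(N_SENSORS),
    "open_hand": rng.random(N_SENSORS),
    "point":     rng.random(N_SENSORS),
}

def classify_pose(reading: np.ndarray) -> str:
    """Return the pose whose template is closest in Euclidean distance."""
    return min(templates, key=lambda name: np.linalg.norm(reading - templates[name]))

print(classify_pose(rng.random(N_SENSORS)))
```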
Time-dependent neural networks may also be used for dynamic gesture recognition, although a more common approach is to use Hidden Markov Models. With this technique Kobayashi is able to achieve an accuracy of XX%; similar results have been reported by XXXX and XXXX. Hidden Markov Models may also be used to interactively segment glove input into individual gestures for recognition and to perform online learning of new gestures (Lee 1996). In these cases gestures are typically recognized using pre-trained templates; however, gloves can also be used to identify natural or untrained gestures. Wexelblat uses a top-down and bottom-up approach to recognize natural gestural features such as finger curvature and hand orientation, and temporal integration to produce frames describing complete gestures. These frames can then be passed to higher-level functions for further interpretation.
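The Hidden Markov Model approach can be sketched as follows, assuming the third-party hmmlearn package: one model is trained per gesture class on sequences of glove readings, and a new sequence is labeled by whichever model assigns it the highest log-likelihood. The gesture names and the random stand-in training data are illustrative assumptions.

```python
# One HMM per gesture class; a sequence is labeled by the model that
# scores it highest. Training data here is a random stand-in for
# recorded glove sequences. Assumes the hmmlearn package.
import numpy as np
from hmmlearn import hmm

N_SENSORS = 22
rng = np.random.default_rng(0)

def train_gesture_model(sequences):
    """Fit a Gaussian HMM on a list of (frames x sensors) training sequences."""
    model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    X = np.concatenate(sequences)
    model.fit(X, lengths=[len(s) for s in sequences])
    return model

# Stand-in training sets: 5 sequences of 30 frames per gesture class.
models = {
    name: train_gesture_model([rng.random((30, N_SENSORS)) for _ in range(5)])
    for name in ("wave", "beckon")
}

def classify(sequence: np.ndarray) -> str:
    """Return the gesture whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].score(sequence))

print(classify(rng.random((30, N_SENSORS))))
```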
Although instrumented gloves provide very accurate results, they are expensive and encumbering. Computer vision techniques can also be used for gesture recognition, overcoming some of these limitations. A good review of vision-based gesture recognition is provided by Pavlovic et al. In general, vision-based systems are more natural to use than glove interfaces, and are capable of excellent hand and body tracking, but do not provide the same accuracy in pose determination. However, for many applications this may not be important.
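As a rough sketch of the vision-based alternative, assuming OpenCV and NumPy, the function below segments skin-colored pixels in HSV space and takes the largest contour as the hand. The HSV bounds are common illustrative values and far from universally reliable, which reflects the accuracy trade-off noted above.

```python
# Rough hand localization by skin-color segmentation; the HSV bounds
# are illustrative assumptions and sensitive to lighting and skin tone.
import cv2
import numpy as np

def find_hand(frame_bgr: np.ndarray):
    """Return the bounding box (x, y, w, h) of the largest skin region, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 255, 255]))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)   # largest skin-colored blob
    return cv2.boundingRect(hand)
```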