Computational Visual Attention Systems
Abstract
Computational visual attention systems (CVAS) have gained considerable interest in recent years. Similar to the human visual system, CVAS detect regions of interest in images: by "directing attention" to these regions, they restrict further processing to sub-regions of the image. Such guiding mechanisms are urgently needed, since the amount of information available in an image is so large that even the fastest computer cannot carry out an exhaustive search on the data. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly during the last decades and have profited considerably from each other's findings. However, the interdisciplinary nature of the topic holds not only benefits but also difficulties.
This seminar provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems. It covers basic theories and models such as the Feature Integration Theory (FIT) and the Guided Search Model (GSM), as well as VOCUS (Visual Object detection with a CompUtational attention System), a real-time computational visual attention system. Furthermore, it presents a broad range of applications of computational attention systems in fields like computer vision, cognitive systems, and mobile robotics.
Introduction
Perhaps the most prominent outcome of neurophysiological findings on visual attention is that there is no single brain area guiding attention; instead, neural correlates of visual selection appear in nearly all brain areas associated with visual processing. Additionally, new findings indicate that many brain areas share the processing of information from different senses, and there is growing evidence that large parts of the cortex are multisensory. Attentional mechanisms are carried out by a network of anatomical areas. Important areas of this network are the posterior parietal cortex (PP), the superior colliculus (SC), the lateral intraparietal area (LIP), the frontal eye field (FEF), and the pulvinar. There are three major functions concerning attention: orienting of attention, target detection, and alertness.
The first function, the orienting of attention to a salient stimulus, is carried out by the interaction of three areas: the PP, the SC, and the pulvinar. The PP is responsible for disengaging the focus of attention from its present location (inhibition of return), the SC shifts the attention to a new location, and the pulvinar is specialized in reading out the data from the indexed location. This combination of systems is called the posterior attention system. The second attentional function, the detection of a target, is carried out by the anterior attention system; the anterior cingulate gyrus in the frontal part of the brain is thought to be involved in this task. Finally, the alertness to high-priority signals depends on activity in the norepinephrine (NE) system arising in the locus coeruleus. Brain areas involved in guiding eye movements are the FEF and the SC. There is evidence that the top-down biasing signals may derive from a network of areas in the parietal and frontal cortex.
At present, it is known that attention is controlled not by a single brain area but by a network of areas. Several areas have been verified to be involved in attentional processes, but the exact task and behavior of each area, as well as the interplay among them, remain open questions.
Feature Integration Theory:
The Feature Integration Theory (FIT) of Treisman has been one of the most influential theories in the field of visual attention. The theory was first introduced in 1980 and has been steadily modified and adapted to current research findings.
The theory claims that "different features are registered early, automatically and in parallel across the visual field, while objects are identified separately and only at a later stage, which requires focused attention." Information from the resulting feature maps (topographical maps that highlight conspicuities according to the respective feature) is collected in a master map of location. This map specifies where in the display things are, but not what they are. Scanning serially through this map focuses attention on the selected scene regions and provides this data for higher perception tasks.
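The two-stage pipeline described above (parallel feature maps feeding a master map of location) can be sketched in a few lines of Python. The feature channels and the normalization below are illustrative assumptions, not part of Treisman's original formulation; real systems typically use center-surround filters at multiple scales:

```python
import numpy as np

def feature_maps(image):
    """Toy parallel feature stage: one conspicuity map per feature channel."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    maps = {
        "intensity": (r + g + b) / 3.0,
        "red-green": r - g,                 # crude color-opponency channel
        "blue-yellow": b - (r + g) / 2.0,
    }
    # Normalize each map to [0, 1] so no single feature dominates.
    return {k: (m - m.min()) / (np.ptp(m) + 1e-9) for k, m in maps.items()}

def master_map(maps):
    """Collect the feature maps into a single 'master map of location'."""
    return sum(maps.values()) / len(maps)

# A 4x4 toy display with one bright red item among dark items.
img = np.zeros((4, 4, 3))
img[2, 1] = [1.0, 0.1, 0.1]

# Serially scanning the master map starts at its maximum: the salient item.
focus = np.unravel_index(np.argmax(master_map(feature_maps(img))), (4, 4))
# focus is (2, 1), the location of the red item
```

Note that the master map only says *where* the conspicuous location is; identifying *what* is there would require further processing of the focused region, as the theory states.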
Treisman noted that the search for a target is easier the more features differentiate the target from the distracters. If the target has no unique features but differs from the distracters only in how its features are combined, the search is more difficult and often requires focused attention (conjunctive search). This usually results in longer search times. However, if the features of the target are known in advance, conjunction search can sometimes be accomplished rapidly. She proposed that this is done by inhibiting the feature maps that code nontarget features. Additionally, Treisman introduced so-called object files as "temporary episodic representations of objects." An object file "collects the sensory information that has so far been received about the object. This information can be matched to stored descriptions to identify or classify the object."
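Treisman's proposal that a known-target conjunction search can be sped up by inhibiting the feature maps coding nontarget features can be illustrated with a small sketch. The display, feature names, and scoring rule here are invented for illustration only:

```python
# A classic conjunction display: the target (red, vertical) shares each of
# its features with some distracter, so no single feature pops out.
items = [("red", "horizontal"), ("green", "vertical"),
         ("red", "vertical"), ("green", "vertical")]

def activation(item, inhibited):
    """Sum the item's feature activations; inhibited features contribute 0."""
    return sum(0.0 if f in inhibited else 1.0 for f in item)

# Knowing the target in advance, inhibit the maps coding nontarget features.
nontarget_features = {"green", "horizontal"}
scores = [activation(it, nontarget_features) for it in items]
best = items[scores.index(max(scores))]
# Only the conjunction target keeps both activations: best == ("red", "vertical")
```

With the nontarget maps suppressed, the target is the only item with two active features, so attention can go to it directly instead of scanning the display serially.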
Guided Search Model:
The Guided Search Model (GSM) was developed by Jeremy M. Wolfe. The basic goal of the model is to explain and predict the results of visual search experiments. There has also been a computer simulation of the model. Wolfe has denoted successive versions of his model as Guided Search 1.0, 2.0, 3.0, and 4.0. It shares many concepts with the FIT but is more detailed in several aspects that are necessary for computer implementations. An interesting point is that, in addition to bottom-up saliency, the model also considers the influence of top-down information by selecting the feature type that best distinguishes the target from its distracters.
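As a rough illustration of this idea (a sketch, not Wolfe's actual model), the code below combines a bottom-up average of feature channels with top-down weights that favor the channel best separating the target from its distracters. The display values and the weighting scheme are assumptions:

```python
import numpy as np

# Two feature channels over a 1-D "display" of 6 items; item 4 is the target.
color = np.array([0.1, 0.1, 0.1, 0.1, 0.9, 0.1])        # target is uniquely red
orientation = np.array([0.5, 0.4, 0.6, 0.5, 0.5, 0.4])  # uninformative channel

def top_down_weights(channels, target_idx):
    """Weight each channel by how well it separates target from distracters."""
    raw = {}
    for name, ch in channels.items():
        distracters = np.delete(ch, target_idx)
        raw[name] = abs(ch[target_idx] - distracters.mean())
    total = sum(raw.values())
    return {name: v / total for name, v in raw.items()}

channels = {"color": color, "orientation": orientation}
weights = top_down_weights(channels, target_idx=4)

# Total activation = bottom-up saliency + top-down weighted feature values.
bottom_up = sum(channels.values()) / len(channels)
activation = bottom_up + sum(w * channels[name] for name, w in weights.items())
# np.argmax(activation) == 4: attention is guided straight to the target
```

The color channel gets nearly all of the top-down weight because it separates target and distracters best, which is exactly the GSM intuition: knowledge about the target steers the search toward the most diagnostic feature type.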