Nemeth D., Janacsek K. & Fiser J. (2013) Age-dependent and coordinated shift in performance between implicit and explicit skill learning. Frontiers in computational neuroscience 7, 147
It has been reported recently that while general sequence learning across ages conforms to the typical inverted-U shape pattern, with best performance in early adulthood, surprisingly, the basic ability of picking up in an implicit manner triplets that occur with high vs. low probability in the sequence is best before 12 years of age and it significantly weakens afterwards. Based on these findings, it has been hypothesized that the cognitively controlled processes coming online at around 12 are useful for more targeted explicit learning at the cost of becoming relatively less sensitive to raw probabilities of events. To test this hypothesis, we collected data in a sequence learning task using probabilistic sequences in five age groups from 11 to 39 years of age (N = 288), replicating the original implicit learning paradigm in an explicit task setting where subjects were guided to find repeating sequences. We found that in contrast to the implicit results, performance with the high- vs. low-probability triplets was at the same level in all age groups when subjects sought patterns in the sequence explicitly. Importantly, measurements of explicit knowledge about the identity of the sequences revealed a significant increase in ability to explicitly access the true sequences exactly around the age where the earlier study found the significant drop in ability to learn implicitly raw probabilities. These findings support the conjecture that the gradually increasing involvement of more complex internal models optimizes our skill learning abilities by compensating for the performance loss due to down-weighting the raw probabilities of the sensory input, while expanding our ability to acquire more sophisticated skills.
Fiser J., Lengyel M., Savin C., Orbán G. & Berkes P. (2013) How (not) to assess the importance of correlations for the matching of spontaneous and evoked activity. arXiv preprint arXiv:1301.6554
A comment on 'Population rate dynamics and multineuron firing patterns in sensory cortex' by Okun et al., Journal of Neuroscience 32(48):17108-17119, 2012, and our response to the corresponding reply by Okun et al. (arXiv, 2013).
Janacsek K., Fiser J. & Nemeth D. (2012) The best time to acquire new skills: age-related differences in implicit sequence learning across the human lifespan. Developmental science 15 (4), 496-505
Implicit skill learning underlies obtaining not only motor, but also cognitive and social skills through the life of an individual. Yet, the ontogenetic changes in humans’ implicit learning abilities have not yet been characterized, and, thus, their role in acquiring new knowledge efficiently during development is unknown. We investigated such learning across the lifespan, between 4 and 85 years of age with an implicit probabilistic sequence learning task, and we found that the difference in implicitly learning high- vs. low-probability events – measured by raw reaction time (RT) – exhibited a rapid decrement around age of 12. Accuracy and z-transformed data showed partially different developmental curves, suggesting a re-evaluation of analysis methods in developmental research. The decrement in raw RT differences supports an extension of the traditional two-stage lifespan skill acquisition model: in addition to a decline above the age 60 reported in earlier studies, sensitivity to raw probabilities and, therefore, acquiring new skills is significantly more effective until early adolescence than later in life. These results suggest that due to developmental changes in early adolescence, implicit skill learning processes undergo a marked shift in weighting raw probabilities vs. more complex interpretations of events, which, with appropriate timing, prove to be an optimal strategy for human skill learning.
McIlreavy L., Fiser J. & Bex PJ. (2012) Impact of simulated central scotomas on visual search in natural scenes. Optometry and vision science: official publication of the American Academy of Optometry
In performing search tasks, the visual system encodes information across the visual field at a resolution inversely related to eccentricity and deploys saccades to place visually interesting targets upon the fovea, where resolution is highest. The serial process of fixation, punctuated by saccadic eye movements, continues until the desired target has been located. Loss of central vision restricts the ability to resolve the high spatial information of a target, interfering with this visual search process. We investigate oculomotor adaptations to central visual field loss with gaze-contingent artificial scotomas. Methods. Spatial distortions were placed at random locations in 25° square natural scenes. Gaze-contingent artificial central scotomas were updated at the screen rate (75 Hz) based on a 250 Hz eye tracker. Eight subjects searched the natural scene for the spatial distortion and indicated its location using a mouse-controlled cursor. Results. As the central scotoma size increased, the mean search time increased [F(3,28) = 5.27, p = 0.05], and the spatial distribution of gaze points during fixation increased significantly along the x [F(3,28) = 6.33, p = 0.002] and y [F(3,28) = 3.32, p = 0.034] axes. Oculomotor patterns of fixation duration, saccade size, and saccade duration did not change significantly, regardless of scotoma size. In conclusion, there is limited automatic adaptation of the oculomotor system after simulated central vision loss.
White B., Abbott LF. & Fiser J. (2012) Suppression of cortical neural variability is stimulus- and state-dependent. Journal of neurophysiology 108 (9), 2383-2392
Internally generated, spontaneous activity is ubiquitous in the cortex, yet it does not appear to have a significant negative impact on sensory processing. Various studies have found that stimulus onset reduces the variability of cortical responses, but the characteristics of this suppression remained unexplored. By recording multiunit activity from awake and anesthetized rats, we investigated whether and how this noise suppression depends on properties of the stimulus and on the state of the cortex. In agreement with theoretical predictions, we found that the degree of noise suppression in awake rats has a nonmonotonic dependence on the temporal frequency of a flickering visual stimulus with an optimal frequency for noise suppression ~2 Hz. This effect cannot be explained by features of the power spectrum of the spontaneous neural activity. The nonmonotonic frequency dependence of the suppression of variability gradually disappears under increasing levels of anesthesia and shifts to a monotonic pattern of increasing suppression with decreasing frequency. Signal-to-noise ratios show a similar, although inverted, dependence on cortical state and frequency. These results suggest the existence of an active noise suppression mechanism in the awake cortical system that is tuned to support signal propagation and coding.
Berkes P., Orbán G., Lengyel M. & Fiser J. (2011) Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science 331 (6013), 83-87 [Highly Cited Paper]
The brain maintains internal models of its environment to interpret sensory inputs and to prepare actions. Although behavioral studies have demonstrated that these internal models are optimally adapted to the statistics of the environment, the neural underpinning of this adaptation is unknown. Using a Bayesian model of sensory cortical processing, we related stimulus-evoked and spontaneous neural activities to inferences and prior expectations in an internal model and predicted that they should match if the model is statistically optimal. To test this prediction, we analyzed visual cortical activity of awake ferrets during development. Similarity between spontaneous and evoked activities increased with age and was specific to responses evoked by natural scenes. This demonstrates the progressive adaptation of internal models to the statistics of natural stimuli at the neural level.
Roser ME., Fiser J., Aslin RN. & Gazzaniga MS. (2011) Right hemisphere dominance in visual statistical learning. Journal of cognitive neuroscience 23 (5), 1088-1099
Several studies report a right hemisphere (RH) advantage for visuo-spatial integration and a left hemisphere (LH) advantage for inferring conceptual knowledge from patterns of covariation. The present study examined hemispheric asymmetry in the implicit learning of new visual-feature combinations. A split-brain patient and normal control participants viewed multi-shape scenes presented in either the right or left visual fields. Unbeknownst to the participants the scenes were composed from a random combination of fixed pairs of shapes. Subsequent testing found that control participants could discriminate fixed-pair shapes from randomly combined shapes when presented in either visual field. The split-brain patient performed at chance except when both the practice and test displays were presented in the left visual field (RH). These results suggest that the statistical learning of new visual features is dominated by visuospatial processing in the right hemisphere and provide a prediction about how fMRI activation patterns might change during unsupervised statistical learning.
Fiser J., Berkes P., Orbán G. & Lengyel M. (2010) Statistically optimal perception and learning: from behavior to neural representations. Trends in cognitive sciences 14 (3), 119-130 [Highly Cited Paper]
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and re-evaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty.
Fiser J. (2009) Perceptual learning and representational learning in humans and animals. Learning & behavior 37 (2), 141-153
Traditionally, perceptual learning in humans and classical conditioning in animals have been considered as two very different research areas, with separate problems, paradigms, and explanations. However, a number of themes common to these fields of research emerge when they are approached from the more general concept of representational learning. To demonstrate this, I present results of several learning experiments with human adults and infants, exploring how internal representations of complex unknown visual patterns might emerge in the brain. I provide evidence that this learning cannot be captured fully by any simple pairwise associative learning scheme, but rather by a probabilistic inference process called Bayesian model averaging, in which the brain is assumed to formulate the most likely chunking/grouping of its previous experience into independent representational units. Such a generative model attempts to represent the entire world of stimuli with optimal ability to generalize to likely scenes in the future. I review the evidence showing that a similar philosophy and generative scheme of representation has successfully described a wide range of experimental data in the domain of classical conditioning in animals. These convergent findings suggest that statistical theories of representational learning might help to link human perceptual learning and animal classical conditioning results into a coherent framework.
Fiser J. (2009) The other kind of perceptual learning. Learning & Perception 1 (1), 69-87
In the present review we discuss an extension of classical perceptual learning called the observational learning paradigm. We propose that studying the process by which humans develop internal representations of their environment requires modifications of the original perceptual learning paradigm, and that these modifications lead to observational learning. We relate observational learning to other types of learning, mention some recent developments that enabled its emergence, and summarize the main empirical and modeling findings that observational learning studies have obtained. We conclude by suggesting that observational learning studies have the potential to provide a unified framework that merges human statistical learning, chunk learning and rule learning.
Berkes P., Turner RE. & Sahani M. (2009) A structured model of video reproduces primary visual cortical organisation. PLoS Computational Biology 5 (9), e1000495
The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely parallelled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition.
Zito T., Wilbert N., Wiskott L. & Berkes P. (2009) Modular toolkit for data processing (MDP): a Python data processing framework. Frontiers in Neuroinformatics 2:8
Modular toolkit for Data Processing (MDP) is a data processing framework written in Python. From the user's perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures. Computations are performed efficiently in terms of speed and memory requirements. From the scientific developer's perspective, MDP is a modular framework, which can easily be expanded. The implementation of new algorithms is easy and intuitive. The new implemented units are then automatically integrated with the rest of the library. MDP has been written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user's side, the variety of readily available algorithms, and the reusability of the implemented units make it also a useful educational tool.
Orbán G., Fiser J., Aslin RN. & Lengyel M. (2008) Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences 105 (7), 2745-2750
Efficient and versatile processing of any hierarchically structured information requires a learning mechanism that combines lower-level features into higher-level chunks. We investigated this chunking mechanism in humans with a visual pattern-learning paradigm. We developed an ideal learner based on Bayesian model comparison that extracts and stores only those chunks of information that are minimally sufficient to encode a set of visual scenes. Our ideal Bayesian chunk learner not only reproduced the results of a large set of previous empirical findings in the domain of human pattern learning but also made a key prediction that we confirmed experimentally. In accordance with Bayesian learning but contrary to associative learning, human performance was well above chance when pair-wise statistics in the exemplars contained no relevant information. Thus, humans extract chunks from complex visual patterns by generating accurate yet economical representations and not by encoding the full correlational structure of the input.
Fiser J., Scholl BJ. & Aslin RN. (2007) Perceived object trajectories during occlusion constrain visual statistical learning. Psychonomic bulletin & review 14 (1), 173-178
Visual statistical learning of shape sequences was examined in the context of occluded object trajectories. In a learning phase, participants viewed a sequence of moving shapes whose trajectories and speed profiles elicited either a bouncing or a streaming percept: The sequences consisted of a shape moving toward and then passing behind an occluder, after which two different shapes emerged from behind the occluder. At issue was whether statistical learning linked both object transitions equally, or whether the percept of either bouncing or streaming constrained the association between pre- and postocclusion objects. In familiarity judgments following the learning, participants reliably selected the shape pair that conformed to the bouncing or streaming bias that was present during the learning phase. A follow-up experiment demonstrated that differential eye movements could not account for this finding. These results suggest that sequential statistical learning is constrained by the spatiotemporal perceptual biases that bind two shapes moving through occlusion, and that this constraint thus reduces the computational complexity of visual statistical learning.
Aslin RN. & Fiser J. (2005) Methodological challenges for understanding cognitive development in infants. Trends in cognitive sciences 9 (3), 92-98
Studies of cognitive development in human infants have relied almost entirely on descriptive data at the behavioral level - the age at which a particular ability emerges. The underlying mechanisms of cognitive development remain largely unknown, despite attempts to correlate behavioral states with brain states. We argue that research on cognitive development must focus on theories of learning, and that these theories must reveal both the computational principles and the set of constraints that underlie developmental change. We discuss four specific issues in infant learning that gain renewed importance in light of this opinion.
Fiser J. & Aslin RN. (2005) Encoding multielement scenes: statistical learning of visual feature hierarchies. Journal of Experimental Psychology: General 134 (4), 521
The authors investigated how human adults encode and remember parts of multielement scenes composed of recursively embedded visual shape combinations. The authors found that shape combinations that are parts of larger configurations are less well remembered than shape combinations of the same kind that are not embedded. Combined with basic mechanisms of statistical learning, this embeddedness constraint enables the development of complex new features for acquiring internal representations efficiently without being computationally intractable. The resulting representations also encode parts and wholes by chunking the visual input into components according to the statistical coherence of their constituents. These results suggest that a bootstrapping approach of constrained statistical learning offers a unified framework for investigating the formation of different internal representations in pattern and scene perception.
Fiser J., Chiu C. & Weliky M. (2004) Small modulations of ongoing cortical dynamics by sensory input during natural vision. Nature 431, 573-578 (30 Sep 2004)
During vision, it is believed that neural activity in the primary visual cortex is predominantly driven by sensory input from the environment. However, visual cortical neurons respond to repeated presentations of the same stimulus with a high degree of variability. Although this variability has been considered to be noise owing to random spontaneous activity within the cortex, recent studies show that spontaneous activity has a highly coherent spatio-temporal structure. This raises the possibility that the pattern of this spontaneous activity may shape neural responses during natural viewing conditions to a larger extent than previously thought. Here, we examine the relationship between spontaneous activity and the response of primary visual cortical neurons to dynamic natural-scene and random-noise film images in awake, freely viewing ferrets from the time of eye opening to maturity. The correspondence between evoked neural activity and the structure of the input signal was weak in young animals, but systematically improved with age. This improvement was linked to a shift in the dynamics of spontaneous activity. At all ages including the mature animal, correlations in spontaneous neural firing were only slightly modified by visual stimulation, irrespective of the sensory input. These results suggest that in both the developing and mature visual cortex, sensory evoked neural activity represents the modulation and triggering of ongoing circuit dynamics by input signals, rather than directly reflecting the structure of the input signal itself.
Fiser J., Bex PJ. & Makous W. (2003) Contrast conservation in human vision. Vision Research 43 (25), 2637-2648
Visual experience, which is defined by brief saccadic sampling of complex scenes at high contrast, has typically been studied with static gratings at threshold contrast. To investigate how suprathreshold visual processing is related to threshold vision, we tested the temporal integration of contrast in the presence of large, sudden changes in the stimuli such as occur during saccades under natural conditions. We observed completely different effects under threshold and suprathreshold viewing conditions. The threshold contrast of successively presented gratings that were either perpendicularly oriented or of inverted phase showed probability summation, implying no detectable interaction between independent visual detectors. However, at suprathreshold levels we found complete algebraic summation of contrast for stimuli longer than 53 ms. The same results were obtained during sudden changes between random noise patterns and between natural scenes. These results cannot be explained by traditional contrast gain-control mechanisms or the effect of contrast constancy. Rather, at suprathreshold levels, the visual system seems to conserve the contrast information from recently viewed images, perhaps for the efficient assessment of the contrast of the visual scene while the eye saccades from place to place.
Weliky M., Fiser J., Hunt RH. & Wagner DN. (2003) Coding of natural scenes in primary visual cortex. Neuron 37 (4), 703-718
Natural scene coding in ferret visual cortex was investigated using a new technique for multi-site recording of neuronal activity from the cortical surface. Surface recordings accurately reflected radially aligned layer 2/3 activity. At individual sites, evoked activity to natural scenes was weakly correlated with the local image contrast structure falling within the cells’ classical receptive field. However, a population code, derived from activity integrated across cortical sites having retinotopically overlapping receptive fields, correlated strongly with the local image contrast structure. Cell responses demonstrated high lifetime sparseness, population sparseness, and high dispersal values, implying efficient neural coding in terms of information processing. These results indicate that while cells at an individual cortical site do not provide a reliable estimate of the local contrast structure in natural scenes, cell activity integrated across distributed cortical sites is closely related to this structure in the form of a sparse and dispersed code.
Fiser J. & Aslin RN. (2002) Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences 99 (24), 15822-15826
The ability of humans to recognize a nearly unlimited number of unique visual objects must be based on a robust and efficient learning mechanism that extracts complex visual features from the environment. To determine whether statistically optimal representations of scenes are formed during early development, we used a habituation paradigm with 9-month-old infants and found that, by mere observation of multielement scenes, they become sensitive to the underlying statistical structure of those scenes. After exposure to a large number of scenes, infants paid more attention not only to element pairs that cooccurred more often as embedded elements in the scenes than other pairs, but also to pairs that had higher predictability (conditional probability) between the elements of the pair. These findings suggest that, similar to lower-level visual representations, infants learn higher-order visual features based on the statistical coherence of elements within the scenes, thereby allowing them to develop an efficient representation for further associative learning.
Fiser J. & Aslin RN. (2002) Statistical learning of higher-order temporal structure from visual shape sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 28 (3), 458
In 3 experiments, the authors investigated the ability of observers to extract the probabilities of successive shape co-occurrences during passive viewing. Participants became sensitive to several temporal-order statistics, both rapidly and with no overt task or explicit instructions. Sequences of shapes presented during familiarization were distinguished from novel sequences of familiar shapes, as well as from shape sequences that were seen during familiarization but less frequently than other shape sequences, demonstrating at least the extraction of joint probabilities of 2 consecutive shapes. When joint probabilities did not differ, another higher-order statistic (conditional probability) was automatically computed, thereby allowing participants to predict the temporal order of shapes. Results of a single-shape test documented that lower-order statistics were retained during the extraction of higher-order statistics. These results suggest that observers automatically extract multiple statistics of temporal events that are suitable for efficient associative learning of new temporal features.
Fiser J. & Aslin RN. (2001) Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological science 12 (6), 499-504
Three experiments investigated the ability of human observers to extract the joint and conditional probabilities of shape co-occurrences during passive viewing of complex visual scenes. Results indicated that statistical learning of shape conjunctions was both rapid and automatic, as subjects were not instructed to attend to any particular features of the displays. Moreover, in addition to single-shape frequency, subjects acquired in parallel several different higher-order aspects of the statistical structure of the displays, including absolute shape-position relations in an array, shape-pair arrangements independent of position, and conditional probabilities of shape co-occurrences. Unsupervised learning of these higher-order statistics provides support for Barlow’s theory of visual recognition, which posits that detecting "suspicious coincidences" of elements during recognition is a necessary prerequisite for efficient learning of new visual features.
Fiser J., Subramaniam S. & Biederman I. (2001) Size tuning in the absence of spatial frequency tuning in object recognition. Vision Research 41 (15), 1931-1950
How do we attend to objects at a variety of sizes as we view our visual world? Because of an advantage in identification of lowpass over highpass filtered patterns, as well as large over small images, a number of theorists have assumed that size-independent recognition is achieved by spatial frequency (SF) based coarse-to-fine tuning. We found that the advantage of large sizes or low SFs was lost when participants attempted to identify a target object (specified verbally) somewhere in the middle of a sequence of 40 images of objects, each shown for only 72 ms, as long as the target and distractors were the same size or spatial frequency (unfiltered or low or high bandpassed). When targets were of a different size or scale than the distractors, a marked advantage (pop out) was observed for large (unfiltered) and low SF targets against small (unfiltered) and high SF distractors, respectively, and a marked decrement for the complementary conditions. Importantly, this pattern of results for large and small images was unaffected by holding absolute or relative SF content constant over the different sizes and it could not be explained by simple luminance- or contrast-based pattern masking. These results suggest that size/scale tuning in object recognition was accomplished over the first several images (576 ms) in the sequence and that the size tuning was implemented by a mechanism sensitive to spatial extent rather than to variations in spatial frequency.
Fiser J. & Biederman I. (2001) Invariance of long-term visual priming to scale, reflection, translation, and hemisphere. Vision Research 41 (2), 221-234
The representation of shape mediating visual object priming was investigated. In two blocks of trials, subjects named images of common objects presented for 185 ms that were bandpass filtered, either at high (10 cpd) or at low (2 cpd) center frequency with a 1.5 octave bandwidth, and positioned either 5º right or left of fixation. The second presentation of an image of a given object type could be filtered at the same or different band, be shown at the same or translated (and mirror reflected) position, and be the same exemplar as that in the first block or a same-name different-shaped exemplar (e.g. a different kind of chair). Second block reaction times (RTs) and error rates were markedly lower than they were on the first block, which, in the context of prior results, was indicative of strong priming. A change of exemplar in the second block resulted in a significant cost in RTs and error rates, indicating that a portion of the priming was visual and not just verbal or basic-level conceptual. However, a change in the spatial frequency (SF) content of the image had no effect on priming despite the dramatic difference it made in appearance of the objects. This invariance to SF changes was also preserved with centrally presented images in a second experiment. Priming was also invariant to a change in left–right position (and mirror orientation) of the image. The invariance over translation of such a large magnitude suggests that the locus of the representation mediating the priming is beyond an area that would be homologous to posterior TEO in the monkey. We conclude that this representation is insensitive to low level image variations (e.g. SF, precise position or orientation of features) that do not alter the basic part-structure of the object. 
Finally, recognition performance was unaffected by whether low or high bandpassed images were presented either in the left or right visual field, giving no support to the hypothesis of hemispheric differences in processing low and high spatial frequencies.
Atkins JE., Fiser J. & Jacobs RA. (2001) Experience-dependent visual cue integration based on consistencies between visual and haptic percepts. Vision Research 41 (4), 449-461
We study the hypothesis that observers can use haptic percepts as a standard against which the relative reliabilities of visual cues can be judged, and that these reliabilities determine how observers combine depth information provided by these cues. Using a novel visuo-haptic virtual reality environment, subjects viewed and grasped virtual objects. In Experiment 1, subjects were trained under motion relevant conditions, during which haptic and visual motion cues were consistent whereas haptic and visual texture cues were uncorrelated, and texture relevant conditions, during which haptic and texture cues were consistent whereas haptic and motion cues were uncorrelated. Subjects relied more on the motion cue after motion relevant training than after texture relevant training, and more on the texture cue after texture relevant training than after motion relevant training. Experiment 2 studied whether or not subjects could adapt their visual cue combination strategies in a context-dependent manner based on context-dependent consistencies between haptic and visual cues. Subjects successfully learned two cue combination strategies in parallel, and correctly applied each strategy in its appropriate context. Experiment 3, which was similar to Experiment 1 except that it used a more naturalistic experimental task, yielded the same pattern of results as Experiment 1 indicating that the findings do not depend on the precise nature of the experimental task. Overall, the results suggest that observers can involuntarily compare visual and haptic percepts in order to evaluate the relative reliabilities of visual cues, and that these reliabilities determine how cues are combined during three-dimensional visual perception.
Mel BW. & Fiser J. (2000) Minimizing binding errors using learned conjunctive features. Neural Computation 12 (4), 731-762
We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors (Malsburg’s classical binding problem). We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word sizes, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
Biederman I., Subramaniam S., Bar M., Kalocsai P. & Fiser J. (1999) Subordinate-level object classification reexamined. Psychological Research 62 (2-3), 131-153
The classification of a table as round rather than square, a car as a Mazda rather than a Ford, a drill bit as 3/8-inch rather than 1/4-inch, and a face as Tom have all been regarded as a single process termed "subordinate classification". Despite the common label, the considerable heterogeneity of the perceptual processing required to achieve such classifications requires, minimally, a more detailed taxonomy. Perceptual information relevant to subordinate-level shape classifications can be presumed to vary on continua of (a) the type of distinctive information that is present, nonaccidental or metric, (b) the size of the relevant contours or surfaces, and (c) the similarity of the to-be-discriminated features, such as whether a straight contour has to be distinguished from a contour of low curvature versus high curvature. We consider three relatively pure cases. Case 1 subordinates may be distinguished by a representation, a geon structural description (GSD), specifying a nonaccidental characterization of an object’s large parts and the relations among these parts, such as a round table versus a square table. Case 2 subordinates are also distinguished by GSDs, except that the distinctive GSDs are present at a small scale in a complex object so the location and mapping of the GSDs are contingent on an initial basic-level classification, such as when we use a logo to distinguish various makes of cars. Expertise for Cases 1 and 2 can be easily achieved through specification, often verbal, of the GSDs. Case 3 subordinates, which have furnished much of the grist for theorizing with "view-based" template models, require fine metric discriminations. Cases 1 and 2 account for the overwhelming majority of shape-based basic- and subordinate-level object classifications that people can and do make in their everyday lives. These classifications are typically made quickly, accurately, and with only modest costs of viewpoint changes.
Whereas the activation of an array of multiscale, multiorientation filters, presumed to be at the initial stage of all shape processing, may suffice for determining the similarity of the representations mediating recognition among Case 3 subordinate stimuli (and faces), Cases 1 and 2 require that the output of these filters be mapped to classifiers that make explicit the nonaccidental properties, parts, and relations specified by the GSDs.
Dobbins AC., Jeo RM., Fiser J. & Allman JM. (1998) Distance modulation of neural activity in the visual cortex. Science 281 (5376), 552-555
Humans use distance information to scale the size of objects. Earlier studies demonstrated changes in neural response as a function of gaze direction and gaze distance in the dorsal visual cortical pathway to parietal cortex. These findings have been interpreted as evidence of the parietal pathway’s role in spatial representation. Here, distance-dependent changes in neural response were also found to be common in neurons in the ventral pathway leading to inferotemporal cortex of monkeys. This result implies that the information necessary for object and spatial scaling is common to all visual cortical areas.
Fiser J., Biederman I. & Cooper EE. (1996) To what extent can matching algorithms based on direct outputs of spatial filters account for human object recognition? Spatial Vision 10 (3), 237-271
A number of recent successful models of face recognition posit only two layers, an input layer consisting of a lattice of spatial filters and a single subsequent stage by which those descriptor values are mapped directly onto an object representation layer by standard matching methods such as stochastic optimization. Is this approach sufficient for modeling human object recognition? We tested whether a highly efficient version of such a two-layer model would manifest effects similar to those shown by humans when given the task of recognizing images of objects that had been employed in a series of psychophysical experiments. System accuracy was quite high overall, but was qualitatively different from that evidenced by humans in object recognition tasks. The discrepancy between the system’s performance and human performance is likely to be revealed by all models that map filter values directly onto object units. These results suggest that human object recognition (as opposed to face recognition) may be difficult to approximate by models that do not posit hidden units for explicit representation of intermediate entities such as edges, viewpoint invariant classifiers, axes, shocks and/or object parts.
Fiser J. & Biederman I. (1995) Size invariance in visual object priming of gray-scale images. Perception 24 (7), 741-748
The strength of visual priming of briefly presented gray scale pictures of real world objects, measured by naming reaction times and errors, was independent of whether the primed picture of the object was presented in the same or different size than the original picture. These findings replicate Biederman & Cooper’s (1992) results on size invariance in shape recognition, which were obtained with line drawings, and extend them to the domain of gray level images. Entry-level shape identification is based either predominantly on scale-invariant representations incorporating orientation and depth discontinuities which are well captured by line drawings, or both discontinuities and the representation derived from smooth gradual surface changes are scale invariant.