Visual statistical learning in adults and infants
Our research starts from a simple question: what do people learn when they see a novel visual scene? To investigate this question properly, we developed a new learning paradigm, called observational learning, which differs from classical perceptual learning and is suitable for rigorous computational analysis (Fiser, 2009). Using this paradigm, we conducted a series of adult and infant experiments. They showed that, from a very early age, humans possess a fundamental ability to extract the statistical regularities of unfamiliar visual scenes automatically, both in time and in space (Fiser & Aslin 2001, 2002a, 2002b, 2005). We argued that this basic ability is key to the formation of visual representations at every level, from the simplest luminance changes to conscious memory traces, and we showed how such learning interacts with perceptual processes (Fiser, Scholl & Aslin, 2007).
Currently, we are investigating how visual statistical learning interacts with the perceptual constraints that make such learning feasible under natural circumstances, such as those imposed by eye movements, scene clutter and occlusion. We are also exploring the link between statistical learning and more abstract "rule learning", and the effect of sleep on these different types of learning (MacKenzie & Fiser, in prep). Using fMRI and patient studies (Roser, Fiser, Aslin, & Gazzaniga 2010, in press), we are also identifying the brain structures involved in this learning, which will allow us to make predictions about the nature of the process.
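To make "extracting statistical regularities" concrete, the minimal sketch below counts how often pairs of shapes co-occur across a set of scenes. It reports the joint probability of each pair and the conditional probability of one member given the other. These are the kinds of statistics that can separate a consistently paired set of shapes from incidental neighbors. It assumes each scene is represented as a set of shape labels and it ignores spatial layout. The function name `pair_statistics` is hypothetical, and the code is not the analysis used in the cited experiments.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(scenes):
    """Joint and conditional probabilities for every pair of co-occurring shapes.

    scenes: list of collections of shape labels (spatial layout is ignored here).
    Keys of the result are alphabetically ordered pairs (a, b).
    """
    n_scenes = len(scenes)
    single = Counter()  # number of scenes containing each shape
    joint = Counter()   # number of scenes containing each pair of shapes
    for scene in scenes:
        shapes = sorted(set(scene))
        single.update(shapes)
        joint.update(combinations(shapes, 2))
    return {
        (a, b): {
            "p_joint": count / n_scenes,        # P(a and b)
            "p_b_given_a": count / single[a],   # P(b | a)
            "p_a_given_b": count / single[b],   # P(a | b)
        }
        for (a, b), count in joint.items()
    }
```

Consider the scenes `[{"A","B","C"}, {"A","B","D"}, {"C","D"}, {"A","B"}]`. Here B appears with A with conditional probability 1, whereas C appears with A only a third of the time. The difference in conditional probability, not mere co-occurrence, marks A and B as a unit.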
Computational modeling of visual statistical learning
Our computational framework rests on the premise that brain function is best described as probabilistic inference and learning for optimal prediction and control in an uncertain environment (Fiser, Berkes, Orban & Lengyel, 2010). Within this framework, we conducted an extensive computational analysis. It showed that the behavioral results of our visual statistical learning experiments cannot be explained by recursive pair-wise associative learning, but are captured well by an optimal probabilistic model based on Bayesian model selection (Orban, Fiser, Aslin & Lengyel, 2008). This suggests that humans encode their sensory input through an "unconscious inference" process. This process interprets the statistical structure of the input based on previous experience and seeks the simplest description of that input in terms of its possible underlying causes. We confirmed this prediction by showing that humans build up their internal representations in a coarse-to-fine manner, which fits the inference framework well but not the recursive pair-wise learning framework (Fiser, Orban, Aslin & Lengyel, in prep). We also showed how this framework gives a statistically grounded interpretation of contextual effects (Orban, Laredo, Lengyel & Fiser, in prep) and provides a tightly coupled explanation of visual perception and visual learning (Atkins, Fiser & Jacobs, 2001; Fiser, Berkes, Orban & Lengyel, 2010).
Currently, we use our method to reliably estimate the internal model that people develop when exposed to a given visual input, and we use these estimates to address other questions of perception from a novel angle. For example, we showed that the dependence of visual working memory capacity on the number of items in the display is a special case of a more general dependence on the complexity of the input, as specified by prior experience (Lengyel, Orban & Fiser, in prep). We also use the same approach to investigate and predict eye movements based on the acquired internal model of the environment (Cui, Orban, Lengyel & Fiser, in prep).
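The core idea of Bayesian model selection, preferring the simplest model that still explains the data, can be shown with a toy comparison between two models of how two shapes, A and B, appear across scenes. In the first model the shapes appear independently. In the second, a free joint distribution over the four co-occurrence states couples them. The sketch below is a deliberately simplified stand-in, not the latent-chunk model of the cited work. It assumes uniform Beta(1, 1) and Dirichlet(1, 1, 1, 1) priors, and its function names are hypothetical. The marginal likelihood (evidence) automatically penalizes the coupled model for its extra free parameter.

```python
from math import lgamma


def log_evidence_independent(counts):
    """Log marginal likelihood that shapes A and B appear independently.

    counts = (n_ab, n_a_only, n_b_only, n_neither). Each shape has its own
    Bernoulli appearance rate with a uniform Beta(1, 1) prior.
    """
    n_ab, n_a, n_b, n_none = counts
    n = n_ab + n_a + n_b + n_none
    k_a = n_ab + n_a  # scenes containing A
    k_b = n_ab + n_b  # scenes containing B

    def beta_bernoulli(k):
        # log of the integral of theta^k (1 - theta)^(n - k) over [0, 1]
        return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

    return beta_bernoulli(k_a) + beta_bernoulli(k_b)


def log_evidence_joint(counts):
    """Log marginal likelihood of a free joint distribution over the four
    co-occurrence states, under a uniform Dirichlet(1, 1, 1, 1) prior."""
    k = len(counts)
    n = sum(counts)
    return lgamma(k) - lgamma(n + k) + sum(lgamma(c + 1) for c in counts)


def log_bayes_factor(counts):
    """Positive values favour the model in which A and B are coupled."""
    return log_evidence_joint(counts) - log_evidence_independent(counts)
```

Suppose 25 scenes contain both shapes and 25 contain neither. The coupled model then wins by a wide margin. With roughly balanced counts such as `(12, 13, 13, 12)`, the independent model wins slightly, even though the coupled model could fit those counts too. This is the Occam penalty at work.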
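One way to read "complexity of the input as specified by prior experience" is as the number of bits a display costs under the observer's learned model. A display made of familiar chunks is cheap to encode, and a display of the same number of unrelated items is expensive. The sketch below illustrates that reading only. The chunk inventory, its probabilities and the function `description_length` are hypothetical, and the display is assumed to be already parsed into chunks. It is not the model used in the cited study.

```python
from math import log2


def description_length(display_chunks, chunk_probs):
    """Bits needed to encode a display as a list of chunks under a learned model.

    chunk_probs maps each chunk (a tuple of item labels) to its probability
    under a hypothetical internal model; display_chunks is the display
    already parsed into such chunks.
    """
    return sum(-log2(chunk_probs[chunk]) for chunk in display_chunks)


# Hypothetical internal model: three learned pairs plus six unpaired items,
# with probabilities summing to 1.
CHUNK_PROBS = {("A", "B"): 0.2, ("C", "D"): 0.2, ("E", "F"): 0.2}
CHUNK_PROBS.update({(item,): 0.4 / 6 for item in "GHIJKL"})

paired_display = [("A", "B"), ("C", "D"), ("E", "F")]  # six items, three chunks
unpaired_display = [(item,) for item in "GHIJKL"]       # six items, six chunks
```

Both displays contain six items. Under this toy model, however, the paired display costs about 7 bits and the unpaired one about 23 bits. If capacity were limited in bits rather than items, the two displays would load memory very differently.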
Biologically plausible implementation of probabilistic learning and inference in the brain
The Bayesian framework requires continuous reciprocal interaction between groups of elements at different levels of the hierarchical representation encoded in the brain. Although this dynamic, collective probabilistic coding departs from the traditional feed-forward view of how the cortex processes visual information, there are very few proposals for how such coding could be implemented realistically in the brain. We have developed such a proposal based on two basic hypotheses: 1) the cortex represents a generative model of the outside world, and 2) neural activity can be functionally described as samples from the posterior probability distribution of causes given the visual input (Fiser, Berkes, Orban & Lengyel, 2010). In earlier work, we showed that, both in primary visual cortex and in higher areas, visual information is best described as represented by the activity patterns of cell assemblies suited to representing probabilities, rather than by a set of independent feature detectors (Weliky, Fiser, Hunt & Wagner, 2003; Dobbins, Jeo, Fiser & Allman, 1998). We have also shown that the precise developmental pattern and the correlational structure of cell responses in primary visual cortex call into question the notion that ongoing cortical activity is accidental noise unrelated to visual coding (Fiser, Chiu & Weliky 2004).
Based on these findings, we suggested that ongoing activity is the manifestation of internal states of the brain that express prior knowledge of the world relevant for perception, and that sensory input only modulates these states in a probabilistic manner (Fiser, Berkes, Orban & Lengyel, 2010). This view supports Hebb's original notion that internal dynamical states are crucial for integrating cognitive processes beyond simple stimulus-response associations, and it can potentially close the gap between response functions and complex behavior. It also redefines the basic notions of cortical computation and representation. Our computational framework is based on the Generative Model and Sampling hypotheses. It is a viable alternative to earlier proposals in the literature (Probabilistic Population Codes and Receptive Field models) of how the brain computes with probabilities (Fiser, Berkes, Orban & Lengyel, 2010). We have shown that this framework can give a normative account of receptive field development while also explaining trial-to-trial variability (Orban, Fiser & Lengyel, 2007). It also generates a directly testable prediction: with age, the distribution of spontaneous activity should converge to the distribution of evoked activity marginalized over natural visual stimuli. We confirmed this prediction by analyzing multi-electrode recordings from the developing ferret brain at different ages (Berkes, Orban, Lengyel & Fiser, 2011).
Currently, we are extending this framework to other modalities and to other levels of the sensory processing hierarchy. We have confirmed the above prediction not only in the visual but also in the auditory domain, and we are exploring the relation between evoked and spontaneous activity in extrastriate areas as well as in prefrontal cortex (Berkes, David, Fritz, Shamma, Lengyel & Fiser, in prep). We are developing new experiments with behaving animals to confirm that the framework gives a natural explanation of trial-to-trial variability, noise suppression, bistable perception, cross-modal interactions, top-down modulation, imagination and dreaming. We have also demonstrated that the variance of stimulus-evoked activity is not only reduced relative to spontaneous activity, as previously reported, but is reduced in a frequency-dependent manner. This confirms theoretical predictions for dynamical systems whose spontaneous activity operates just outside the chaotic regime (Berkes, White & Fiser 2010).
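The sampling hypothesis can be illustrated with a one-dimensional toy generative model: a latent cause z with a Gaussian prior, observed through Gaussian noise. The sketch below runs a Metropolis sampler whose successive states stand in for "neural activity" over time. The sampler is a generic stand-in chosen for brevity. The cited proposal does not commit to this algorithm, and the model, parameter names and function names are hypothetical. With an input, the states are distributed according to the posterior. With no input, the likelihood term drops out and the states follow the prior, the model's analogue of spontaneous activity.

```python
import math
import random


def sample_activity(x, n_samples, prior_sd=1.0, noise_sd=1.0, step=1.0, seed=0):
    """Metropolis sampler over one latent cause z in a toy generative model.

    Hypothetical model: z ~ N(0, prior_sd**2) and x = z + N(0, noise_sd**2).
    Each retained state stands in for one time step of "neural activity".
    If x is None the likelihood term is dropped, so the chain samples the
    prior (the model's analogue of spontaneous activity).
    """
    rng = random.Random(seed)

    def log_target(z):
        log_p = -0.5 * (z / prior_sd) ** 2  # Gaussian prior, up to a constant
        if x is not None:
            log_p += -0.5 * ((x - z) / noise_sd) ** 2  # Gaussian likelihood
        return log_p

    z = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = z + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(z));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(z):
            z = proposal
        samples.append(z)
    return samples


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)
```

With x = 2 and unit prior and noise variances, the sample mean and variance approach the analytic posterior values of 1 and 0.5. With x = None they approach the prior values of 0 and 1.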
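The prediction rests on a consistency property of a well-adapted generative model. Averaging the posterior over stimuli drawn from the environment recovers the prior: p(z) = ∫ p(z | x) p(x) dx. One way to test it is to binarize multichannel activity into patterns. The pattern distribution of spontaneous activity can then be compared with that of evoked activity pooled over stimuli, using the Kullback-Leibler divergence. The sketch below does this for a handful of channels. It assumes patterns are given as tuples of 0/1 and that pooling trials approximates marginalization, which holds if stimuli are presented equally often. The Laplace smoothing constant `alpha` is an illustrative choice, not necessarily the estimator of the cited study.

```python
import math
from collections import Counter
from itertools import product


def pattern_distribution(patterns, n_channels, alpha=1.0):
    """Smoothed probability of every binary activity pattern over n_channels.

    alpha is an additive (Laplace) smoothing constant, assumed here so that
    unseen patterns do not produce infinite divergences.
    """
    counts = Counter(patterns)
    all_patterns = list(product((0, 1), repeat=n_channels))
    total = len(patterns) + alpha * len(all_patterns)
    return {p: (counts[p] + alpha) / total for p in all_patterns}


def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions over the same patterns."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def spontaneous_vs_evoked(spontaneous, evoked_by_stimulus, n_channels):
    """KL(evoked pooled over stimuli || spontaneous), in bits.

    evoked_by_stimulus maps a stimulus label to its list of patterns;
    pooling them approximates the evoked distribution marginalised over stimuli.
    """
    pooled = [p for patterns in evoked_by_stimulus.values() for p in patterns]
    p_evoked = pattern_distribution(pooled, n_channels)
    p_spont = pattern_distribution(spontaneous, n_channels)
    return kl_divergence(p_evoked, p_spont)
```

Under the hypothesis, this divergence should shrink toward zero with age. Individual stimuli can still evoke distributions that are very different from spontaneous activity, because only their average has to match.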
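A frequency-dependent reduction in variance can be quantified in a simple way. For each frequency bin, measure the power of trial-to-trial fluctuations around the mean response, then take the ratio of evoked to spontaneous power. The sketch below does this with a naive discrete Fourier transform so that it needs no external libraries. It assumes each condition is given as a list of equal-length trials or segments, and all function names are hypothetical. It is not the estimator used in the cited study.

```python
import cmath
import math


def dft_power(signal):
    """Power at each non-negative frequency bin of a real signal (naive DFT)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2 / n)
    return power


def variability_spectrum(trials):
    """Average power spectrum of trial-to-trial fluctuations around the mean trace."""
    n_trials = len(trials)
    length = len(trials[0])
    mean_trace = [sum(trial[t] for trial in trials) / n_trials for t in range(length)]
    spectra = [dft_power([trial[t] - mean_trace[t] for t in range(length)])
               for trial in trials]
    return [sum(s[k] for s in spectra) / n_trials for k in range(len(spectra[0]))]


def variance_ratio_by_frequency(evoked_trials, spontaneous_trials):
    """Evoked / spontaneous variability per frequency bin; below 1 means reduction."""
    ev = variability_spectrum(evoked_trials)
    sp = variability_spectrum(spontaneous_trials)
    return [e / s if s > 0 else float("nan") for e, s in zip(ev, sp)]
```

If the evoked fluctuations were simply the spontaneous ones scaled by 0.5, every bin would show a ratio of 0.25. A frequency-dependent effect of the kind described above would instead show up as a ratio that changes across bins.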
Emergence of visual constancies and invariances
In the classic framework of visual processing, the goal of sensory coding is to retain as much information about the input as possible. In that framework, any constancy or invariance is a sign of information loss, i.e. a failure to achieve the original goal. In the generative probabilistic framework, the goal is to develop an internal world model suited to achieving the organism's goals. Here, discarding information is desirable because it yields the most parsimonious model, and constancies and invariances are needed because they provide the most efficient route to those goals. We have provided evidence that a dominant view of the role of sensory coding, namely encoding the maximal amount of information efficiently through sparsification, is not supported by neural data (Berkes, White & Fiser 2010). We have also shown that human recognition is strongly invariant to size, translation and reflection (Fiser & Biederman 2001, 1995). We demonstrated that, in the case of size, such invariance emerges adaptively based on the immediate context of the visual input (Fiser, Subramaniam & Biederman, 2001). This is similar to what is typically found with low-level attributes such as contrast constancy (Fiser, Bex & Makous, 2003; Fiser & Fine, in prep). We also showed that size invariance is not represented by individual cells that become increasingly size invariant along the ventral pathway of the cortex (Dobbins, Jeo, Fiser & Allman, 1998).
Thus, on the one hand, invariances exist, but they seem to be manifested through a dynamic, adaptive interpretation of the input rather than encoded in the features of individual cells. On the other hand, classical theories cannot justify the existence of invariances and are not supported by physiological data, whereas a probabilistic account seems suitable for explaining both their emergence and their representation. We are focusing on finding a biologically plausible link between the emergence of such invariances and our probabilistic framework by tying together statistical and rule learning with structural representations in the visual cortex.
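"Sparsification" is usually quantified with a sparseness index computed over a population's responses. The sketch below implements one widely used index: the Treves-Rolls activity ratio, rescaled to the range [0, 1] as popularized by Vinje and Gallant. It is offered only as a reference point for what such measures compute. The cited study may have used different or additional measures, and the convention for a silent population is an assumption made here.

```python
def population_sparseness(rates):
    """Normalized Treves-Rolls sparseness of a vector of non-negative responses.

    0 means all units respond equally; 1 means a single unit carries all activity.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two units")
    mean_r = sum(rates) / n
    mean_r2 = sum(r * r for r in rates) / n
    if mean_r2 == 0:
        return 0.0  # silent population: treated as non-sparse by assumption
    activity_ratio = mean_r ** 2 / mean_r2  # Treves-Rolls activity ratio
    return (1 - activity_ratio) / (1 - 1 / n)
```

A uniform response vector scores 0, and a vector with a single active unit scores 1. "Active sparsification" would predict that this index rises with development or with natural stimulation.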
Significant features of visual perception
This project has two goals: 1) to use a novel image processing method for generating synthetic natural images to determine which conjunctions of local visual elements support object detection and identification; and 2) to use these results to investigate the effects of two types of visual loss in patient populations, glaucoma and age-related macular degeneration (AMD), in order to develop effective methods for improving residual vision in this large and growing population. We have found that humans use high-level visual templates for object identification that integrate spatial structure in a manner that depends on the predictability of the image (Galperin, Bex & Fiser 2008). Such templates provide top-down interpretations that allow faster and more robust perception. At the same time, they leave the observer unaware of high levels of image noise, analogous to our unawareness of image blur in the peripheral visual field. We ran visual search tasks using gaze-contingent artificial scotomas at the center of gaze, which simulate the visual field loss of people with AMD. Under these conditions, normally sighted subjects took longer to locate a search target and made more saccades, separated by periods of unstable fixation (McIlreavy, Fiser & Bex 2009). We also used gaze-contingent artificial scotomas in the peripheral visual field, simulating the visual impairment of patients with glaucoma. Here, the amplitude of normally sighted subjects' saccades did not change relative to normal viewing, even though a large fraction of these saccades now landed in regions that carried no information. Both results point to more effective rehabilitative strategies that could be developed for patients with AMD and glaucoma.
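A gaze-contingent artificial scotoma hides part of the display relative to the current gaze position. For a central scotoma (AMD-like), it blanks a disk around fixation. For peripheral loss (glaucoma-like), it blanks everything outside that disk. The minimal sketch below applies either mask to a grayscale image stored as a list of rows. It assumes pixel coordinates for gaze and radius and a uniform gray fill, and the function name `apply_scotoma` is hypothetical. In a real experiment, the mask would be recomputed at every eye-tracker sample and redrawn within the display's refresh cycle.

```python
def apply_scotoma(image, gaze_x, gaze_y, radius, kind="central", fill=0.5):
    """Return a copy of a grayscale image (list of rows) with simulated field loss.

    kind="central":    blank pixels within `radius` of gaze (AMD-like scotoma).
    kind="peripheral": blank pixels beyond `radius` of gaze (glaucoma-like loss).
    `fill` is the gray level used for blanked pixels.
    """
    if kind not in ("central", "peripheral"):
        raise ValueError("kind must be 'central' or 'peripheral'")
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            inside = (x - gaze_x) ** 2 + (y - gaze_y) ** 2 <= radius ** 2
            masked = inside if kind == "central" else not inside
            new_row.append(fill if masked else value)
        out.append(new_row)
    return out
```

The same function covers both simulated conditions. Only the sign of the inside/outside test changes, which keeps the two manipulations matched in every other respect.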
During development, humans and animals learn to make sense of their visual environment based on their momentary sensory input and their internal representation of earlier short- and long-term experiences. Despite decades of behavioral and electrophysiological research, it is still not clear how this perceptual process occurs, what representations the brain uses for it, and how these internal representations are acquired through visual learning. We follow a systematic research program to clarify these issues.