Our research

During development, humans and animals learn to make sense of their visual environment based on their momentary sensory input and their internal representation of earlier short- and long-term experiences. Despite decades of behavioral and electrophysiological research, it is still not clear how this perceptual process occurs, what representations the brain uses for it, and how these internal representations are acquired through visual learning. We follow a systematic research program to clarify these issues.

 

Topics

Visual statistical learning in humans and animals

 

What do people learn when they see a novel visual scene? To investigate this question, we use a statistical learning paradigm, which differs from classical perceptual learning (Fiser, 2009). Using this paradigm, we have conducted a series of adult and infant experiments showing that, from a very early age, humans possess a fundamental ability to automatically extract the statistical regularities of unknown visual scenes in both time and space (Fiser & Aslin 2001, 2002a, 2002b, 2005). We argued that this core ability is crucial for the formation of visual representations in the brain, from the simplest levels of luminance changes up to the level of conscious memory traces, and we showed how such learning interacts with perceptual processes (Fiser, Scholl & Aslin, 2007). Using fMRI and patient studies, we also identified the brain structures involved in this learning (Roser, Fiser, Aslin, & Gazzaniga 2011; Roser, Aslin, McKenzie, Zahra & Fiser 2015; Karuza, Emberson, Roser, Cole, Aslin, & Fiser 2017).

Currently, we are expanding this line of research in four new directions. First, we are studying the link between implicit statistical learning and more abstract "rule learning" (Nemeth, Fiser & Janacsek 2012; Nemeth, Janacsek & Fiser 2013; MacKenzie & Fiser, in prep), as well as between statistical learning and classical perceptual learning (Lengyel & Fiser, 2019). Second, we are investigating statistical learning across haptic and visual modalities (Lengyel, Žalalytė, Pantelides, Ingram, Fiser, Lengyel & Wolpert, 2019) and across auditory and visual modalities (Reguly, Nagy, Markus, & Fiser in prep) to understand the level of abstraction within the emerging internal representation. Third, we are examining statistical learning in chicks (Rosa-Salva, Fiser, Versace, Dolci, Chehaimi, Santolin, & Vallortigara 2018) and honeybees (Avarguès-Weber, Finke, Nagy, Szabó, d’Amaro, Dyer & Fiser, 2020) to identify the similarities and differences in the underlying learning mechanism and relate these differences to humans' superior learning abilities. Fourth, we are exploring the link between statistical learning and active vision with the help of eye movement analyses (Arato, Rothkopf, & Fiser in prep).

 

The Sampling Hypothesis: implementing probabilistic learning and inference in the cortex

 

The probabilistic framework requires a continuous reciprocal interaction between groups of elements at different levels of the hierarchical representation, in sharp contrast with the traditional feed-forward view of how visual information is processed in the cortex. However, there are very few proposals as to how such coding can be implemented realistically in the brain. We have developed such a proposal based on two basic hypotheses: 1) the cortex represents a generative model of the outside world, and 2) neural activity can be functionally described as samples from the posterior probability of causes given the visual input (Fiser, Berkes, Orban & Lengyel, 2010). In earlier work, we have shown that, both at the level of primary visual cortex and in higher areas, the representation of visual information is best described as the activity pattern of cell assemblies suited for representing probabilities rather than as a set of independent feature detectors (Weliky, Fiser, Hunt & Wagner, 2003; Dobbins, Jeo, Fiser & Allman, 1998). We have also shown that the precise developmental pattern and the correlational structure of cell responses in the primary visual cortex call into question the notion that ongoing cortical activity is accidental noise unrelated to visual coding (Fiser, Chiu & Weliky 2004).
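
As a rough illustration of the second hypothesis, the sketch below treats "activity" as repeated samples from the posterior over a single binary cause given a noisy observation. Everything in it is an assumption made for illustration (the one-cause generative model, the Gaussian noise, the parameter values, and the function names); it is not the model used in the cited work.

```python
import math
import random


def gaussian_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_cause_present(y, prior=0.3, mean_absent=0.0, mean_present=2.0, sd=1.0):
    """Exact posterior P(cause present | y) in a toy generative model.

    Hypothetical model: a binary cause is present with probability `prior`,
    and the observation y is Gaussian around a cause-dependent mean.
    """
    joint_present = gaussian_pdf(y, mean_present, sd) * prior
    joint_absent = gaussian_pdf(y, mean_absent, sd) * (1.0 - prior)
    return joint_present / (joint_present + joint_absent)


def sample_posterior(y, n_samples, rng):
    """Draw binary samples (1 = cause present) from the posterior given y."""
    p = posterior_cause_present(y)
    return [1 if rng.random() < p else 0 for _ in range(n_samples)]


rng = random.Random(0)          # fixed seed so the run is repeatable
y_observed = 1.2                # an ambiguous observation between the two means
exact = posterior_cause_present(y_observed)
samples = sample_posterior(y_observed, 10_000, rng)
estimate = sum(samples) / len(samples)
print(f"exact posterior: {exact:.3f}, fraction of samples: {estimate:.3f}")
```

In this toy setting, the fraction of samples signalling "cause present" approaches the exact posterior as more samples are drawn, so the variability of the samples carries the uncertainty of the inference.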

Based on these findings, we proposed that ongoing activity is the manifestation of internal states of the brain that express relevant prior knowledge of the world for perception, and that sensory input only modulates these states in a probabilistic manner (Fiser, Berkes, Orban & Lengyel, 2010). This proposal supports Hebb's original notion of internal dynamical states being crucial for integrating cognitive processes beyond simple stimulus-response associations, and it can potentially close the gap between response functions and complex behavior. Our sampling-based framework also provides a viable alternative to earlier proposals in the literature (Probabilistic Population Codes and Receptive Field models) of how the brain computes with probabilities (Fiser, Berkes, Orban & Lengyel, 2010; Fiser, Savin, Lengyel, Orban & Berkes, 2013). Furthermore, it readily generates predictions that are directly testable physiologically. Using multi-electrode recordings in the developing ferret brain, we have confirmed the first of these predictions, namely that the distribution of spontaneous activity should converge with age to the distribution of evoked activity marginalized over natural visual stimuli (Berkes, Orban, Lengyel & Fiser, 2011). We also found that suppression of cortical neural variability is stimulus- and state-dependent in the awake, but not in the anesthetized, animal, further supporting the special status of spontaneous activity in cortical processing (White, Abbott & Fiser 2012).
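
The first prediction mentioned above can be written compactly. The symbols are illustrative assumptions rather than notation from the original text: x stands for a pattern of neural activity and y for a visual stimulus drawn from the natural-image ensemble. If spontaneous activity reflects the prior of a well-adapted internal model and evoked activity reflects the posterior, then the prior should match the posterior averaged over natural inputs:

```latex
p_{\text{spontaneous}}(x) \;\approx\; \int p_{\text{evoked}}(x \mid y)\, p_{\text{natural}}(y)\, \mathrm{d}y
```

The right-hand side is the evoked distribution marginalized over natural stimuli. The approximation is expected to tighten as the internal model adapts to the statistics of the environment during development.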

Currently, we use our framework to investigate the internal model that people develop when they are exposed to visual input. We found that the dependence of the capacity of visual working memory on the number of items on the display is just a special case of a more general dependency on the complexity of the input as specified by prior experience (Lengyel, Orban & Fiser, in prep). We also found that, contrary to the classic view of coding with fixed receptive fields, humans encode orientation information in a dynamic context by continuously combining sensory information with expectations derived from earlier experiences (Christensen, Bex & Fiser, 2015). Moreover, we have evidence that the orientation and position information of small contour segments are encoded in different ways and combined according to the rules of optimal cue combination (Christensen, Bex & Fiser, 2019).
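
For readers unfamiliar with optimal cue combination, the sketch below shows the standard textbook rule for two independent Gaussian cues: each cue is weighted by its reliability (inverse variance). It is a minimal sketch under stated assumptions, not the analysis used in the cited studies; the function name and the example values are hypothetical, and the circular nature of orientation is ignored.

```python
import math


def combine_cues(mu_a, sigma_a, mu_b, sigma_b):
    """Reliability-weighted combination of two independent Gaussian cues.

    Each cue is an estimate (mu) with an uncertainty (sigma). The combined
    estimate weights each cue by its inverse variance, and its uncertainty
    is never larger than that of the more reliable cue.
    """
    w_a = 1.0 / sigma_a ** 2
    w_b = 1.0 / sigma_b ** 2
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    sigma = math.sqrt(1.0 / (w_a + w_b))
    return mu, sigma


# Hypothetical example: a reliable cue (sd 2) and a less reliable one (sd 4).
mu, sigma = combine_cues(10.0, 2.0, 14.0, 4.0)
print(f"combined estimate: {mu:.2f}, combined sd: {sigma:.2f}")
```

The same arithmetic applies when one of the two "cues" is a prior expectation and the other is the current sensory measurement, which is the sense in which sensory information and expectations from earlier experience can be combined.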

Probabilistic modeling of visual perception and statistical learning

 

We have developed a computational framework of perception and learning based on the premise that cortical functioning can be best described as performing probabilistic inference and learning for optimal prediction and control in an uncertain environment (Fiser, Berkes, Orban & Lengyel, 2010). Based on this framework, we showed that the behavioral results of our visual statistical learning experiments cannot be explained by recursive pair-wise associative learning, whereas they can be captured well by an optimal probabilistic model based on Bayesian model selection (Orban, Fiser, Aslin & Lengyel, 2008). This model suggests that humans code their sensory input through an "unconscious inference" process that interprets the statistical structure of the input based on previous experience and looks for the simplest description of this input in terms of its possible underlying causes. We support this proposal by showing that humans build up their internal representations in a coarse-to-fine manner, which fits the inference framework well but not the recursive pair-wise learning framework (Fiser, Orban, Aslin & Lengyel, in prep). We also showed how this framework gives a probabilistic interpretation to contextual effects in scenes (Orban, Lengyel & Fiser, in prep) and provides a tightly coupled explanation for visual perception and visual learning (Atkins, Fiser & Jacobs, 2001; Fiser, Berkes, Orban & Lengyel, 2010).
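
To give a feel for how Bayesian model selection favors the simplest adequate description, the toy sketch below compares two explanations of scenes in which two shapes, A and B, either appear or not. It is an illustrative assumption throughout (the two candidate models, the uniform priors, the data, and the function names are all hypothetical) and is far simpler than the model in the cited work.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_bernoulli(k, n):
    """Log marginal likelihood of one binary sequence with k ones in n trials,
    integrating over an unknown rate with a uniform prior."""
    return log_beta(k + 1, n - k + 1)


def log_evidence_independent(scenes):
    """Model 1: shapes A and B appear independently, each with its own rate."""
    n = len(scenes)
    n_a = sum(a for a, _ in scenes)
    n_b = sum(b for _, b in scenes)
    return log_evidence_bernoulli(n_a, n) + log_evidence_bernoulli(n_b, n)


def log_evidence_chunk(scenes):
    """Model 2: a single hidden cause (a 'chunk') produces A and B together."""
    if any(a != b for a, b in scenes):
        return float("-inf")  # the chunk model cannot produce A without B
    n = len(scenes)
    k = sum(a for a, _ in scenes)
    return log_evidence_bernoulli(k, n)


# Hypothetical data: each tuple is (A present, B present) in one scene.
scenes = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1), (0, 0), (1, 1)]
log_bayes_factor = log_evidence_chunk(scenes) - log_evidence_independent(scenes)
print(f"log Bayes factor (chunk vs independent): {log_bayes_factor:.2f}")
```

Both models can account for these scenes, but the chunk model does so with one underlying cause instead of two, so its marginal likelihood is higher. This automatic preference for the more parsimonious explanation is what "looking for the simplest description" amounts to in this toy setting.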

 
 

Currently, we are expanding the sampling framework in several directions. First, we show that the predicted convergence of spontaneous and evoked activity distributions holds only in normally reared animals, confirming the role of visual experience in developing internal representations (Savin, Chui, Lengyel & Fiser, in prep), and we investigate whether it holds not only within vision but also cross-modally between vision and gustation (White & Fiser in prep). Second, we show that the new framework gives a normative account of trial-to-trial variability, noise suppression, cross-modal interactions, and a large number of observations about signal and noise correlations reported in the literature (Orban, Berkes, Fiser & Lengyel 2016). Third, extending the framework hierarchically by incorporating a perceptual decision-making task, we show that the framework can naturally capture a number of phenomena related to top-down effects (choice-probability changes, task dependence of noise correlation, changes in the psychophysical kernel) that were earlier attributed to attention (Haefner, Berkes & Fiser 2016). Fourth, we identify human behavioral hallmarks of sampling-based probabilistic computation in the brain (Lengyel, Koblinger, Zoltowski & Fiser in prep).

The effects of sequential perception and learning

 

All perceptual tasks unfold in time, and the sequential nature of these processes has strong consequences. Although short-term effects have been vigorously investigated recently within the framework of sequential decision making, the long-term effects during sequential perception and learning are much less studied. Using a sequential decision-making framework, we showed that long-term effects can influence momentary perception and decisions as strongly as short-term effects, and that these effects cannot be explained by the gradual and continuous evidence integration assumed by classical drift diffusion models (Arato & Fiser in prep). We also found that observers automatically develop a full generative model of their perceptual experience even in the simplest tasks, and modify this model according to new experience. For example, when multiple adjustments of the model can describe the observer's experience, the observer unconsciously applies an optimal cue-combination-like arbitration and chooses the adjustment that changes the parameters of the model that proved to be less reliable in the past (Koblinger, Arato & Fiser in prep).

Emergence of visual constancies and invariances

 

In the classic framework of visual processing, where the goal of sensory coding is to retain as much information about the input as possible, the existence of any constancy or invariance is a sign of information loss, i.e. a failure to achieve the original goal. In the generative probabilistic framework, where the goal is to develop an internal world model that is suitable for achieving the organism's goals, discarding information is desirable because it gives the most parsimonious model, and constancies and invariances are needed because they lead to the most efficient route to the goals. We provide evidence that a dominant view on the role of sensory coding, namely encoding a maximal amount of information efficiently through sparsification, is not supported by neural data (Berkes, White & Fiser 2010). We have also shown that human recognition is strongly invariant to size, translation, and reflection (Fiser & Biederman 1995, 2001). We demonstrated that, in the case of size, such invariance emerges adaptively based on the immediate context of the visual input (Fiser, Subramaniam & Biederman, 2001), similar to what is typically found with low-level attributes, such as contrast constancy (Fiser, Bex & Makous, 2003; Fiser & Fine, in prep). We also showed that such size invariance is not represented by individual cells with more size-invariant characteristics emerging specifically along the ventral pathway of the cortex (Dobbins, Jeo, Fiser & Allman, 1998).

Currently, we are investigating how such invariances could emerge through statistical learning processes (Ledley & Fiser, in prep).

 
 
 

Significant features of visual perception

 

There are two goals of this project: 1) by using a novel image processing method for generating synthetic natural images, to examine which local visual element conjunctions support object detection and identification; 2) using these results, to investigate the effect of two types of visual loss in patient populations, glaucoma and age-related macular degeneration (AMD), in order to develop effective methods to improve the residual vision in this large and growing population. We have found that humans use high-level visual templates for object identification that integrate spatial structure in a manner that depends on the predictability of the image (Christensen, Bex & Fiser, submitted). Such templates provide top-down interpretations that allow faster and more robust perception, and at the same time render the observer unaware of high levels of image noise, analogous to our unawareness of image blur in the peripheral visual field. In visual search tasks using gaze-contingent artificial scotomas in the center of gaze, simulating the visual field loss of people with AMD, we found that normal subjects required longer to locate a search target and displayed an increased number of saccades that were separated by periods of unstable fixation (McIlreavy, Fiser & Bex 2009). With gaze-contingent artificial scotomas in the peripheral visual field, simulating the visual impairment of patients with glaucoma, we also found that normal subjects displayed fixation patterns that did not change in amplitude compared to normal conditions, despite the fact that a large fraction of these saccades now landed in areas of no information (McIlreavy, Fiser & Bex 2012). Both of these results suggest more effective rehabilitative strategies that can be developed for patients with AMD and glaucoma.

Currently, we are expanding the framework to attributes other than orientation, such as position and motion (Christensen, Bex & Fiser, 2019), and investigating how the coding of such low-level attributes in a probabilistic manner can be feasibly implemented by a sampling-based representation (Christensen, Lengyel & Fiser, in prep).