Tuesday, November 13, 2007

Confirmation bias as a tool of perception

I've been trying to figure out where to go next with my study of perception. One concept I'm exploring is the idea that our expectations enhance our ability to recognize patterns.

I recently found a brilliant illustration of this from researcher Matt Davis, who studies how humans process language. Try out the following audio samples. Listen to the first one several times. It's a "vocoded" version of the plain English recording that follows. Can you tell what's being said?

Vocoded version.

Click here to open this WAV file

Give up? Now listen to the plain English version once and then listen to the vocoded version again.

Clear English version.

Click here to open this WAV file

Davis refers to this a-ha effect as "pop-out":

    Perhaps the clearest case of pop-out occurs if you listen to a vocoded sentence before and immediately after you hear the same sentence in clear speech. It is likely that the vocoded sentence will sound a lot clearer when you know the identity of that sentence.

To me, this is a wonderful example of confirmation bias. Once you have an expectation of what to look for in the data, you quickly find it.

How does this relate to perception? I believe that recognizing patterns in real-world data involves not only the data causing simple pattern matching to occur (bottom-up), but also higher-level expectations prompting the lower levels to search for expected patterns (top-down). To help illustrate and explain, consider how you might engineer a specific task of perception: detecting a straight line in a picture. If you're familiar with machine vision, you'll know this is an age-old problem that has been fairly well solved using some good algorithms. Still, it's not trivial. Consider the following illustration of a picture of a building and some of the steps leading up to our thought experiment:

The first three steps we'll take are pretty conventional ones. First, we get our source image. Second, we apply a filter that looks at each pixel to see if it strongly contrasts with its neighbors. The output is a grayscale image in which black pixels mark strong contrasts in the source image. In our third step, we "threshold" the contrast image so each pixel goes either to black or white; no shades of gray.
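
To make those steps concrete, here is a minimal sketch in Python with NumPy (my own choice of tools; the simple four-neighbor contrast filter and the 0.5 cutoff are illustrative assumptions, not part of the original argument). Larger values in this contrast array correspond to the darker pixels in the contrast image described above:

    import numpy as np

    def contrast_image(gray):
        """Step two: mark each pixel by how strongly it differs from its four neighbors.

        `gray` is a 2-D float array in [0, 1].  The result is also in [0, 1];
        values near 1 correspond to the black (strong-contrast) pixels in the
        contrast image described in the post.
        """
        padded = np.pad(gray, 1, mode="edge")
        center = padded[1:-1, 1:-1]
        diffs = [
            np.abs(center - padded[:-2, 1:-1]),  # neighbor above
            np.abs(center - padded[2:, 1:-1]),   # neighbor below
            np.abs(center - padded[1:-1, :-2]),  # neighbor to the left
            np.abs(center - padded[1:-1, 2:]),   # neighbor to the right
        ]
        return np.max(diffs, axis=0)

    def threshold(contrast, cutoff=0.5):
        """Step three: every pixel goes to black (True) or white (False)."""
        return contrast >= cutoff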

Here's where our line detection begins. Say we start by making a list of all clusters of, say, 10 or more black pixels that touch one another. Next, we filter these by seeing which have a large number of pixels roughly fitting a line function. We end up with a bunch of small line segments. Traditionally, we could stop here, but we don't have to. We could pick any of these line segments and extend it out in either direction to see how far it can go and still find black pixels that roughly fit that line function. We might even tolerate a gap of a white pixel or two as we continue extending. And as the segment gets longer, we might try variations of the line function that fit even better, in order to further refine it. But then uncertainty kicks in, and we conservatively stop stretching out once we no longer see black pixels.
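
Continuing the sketch (the least-squares line fit and the two-pixel gap tolerance are just plausible stand-ins for the "line function" and the conservative stopping rule described above):

    import numpy as np

    def fit_line(points):
        """Fit a line through a cluster of (row, col) pixel coordinates.

        Returns a point on the line (the centroid) and a unit direction
        vector; this form behaves well even for near-vertical lines.
        """
        pts = np.asarray(points, dtype=float)
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)  # principal direction of the cluster
        return centroid, vt[0]

    def extend_segment(binary, centroid, direction, start_t, step=1.0, max_gap=2):
        """Walk outward along the fitted line while black pixels keep appearing.

        `binary` is the thresholded image (True = black).  We stop once we see
        more than `max_gap` consecutive white pixels (the conservative stopping
        rule from the paragraph above).
        """
        t, gap, last_good = start_t, 0, start_t
        h, w = binary.shape
        while gap <= max_gap:
            t += step
            r, c = np.round(centroid + t * direction).astype(int)
            if not (0 <= r < h and 0 <= c < w):
                break
            if binary[r, c]:
                gap, last_good = 0, t
            else:
                gap += 1
        return last_good  # how far we could extend on black pixels alone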

Here's where confirmation bias can help. Once we have a bunch of high-certainty line segments to work with, we now have expectations set about where lines form. So maybe we take our line segments back to the grayscale version of the contrast image. To my thinking, those gray pixels that got thresholded to white earlier still contain useful information. In fact, each gray pixel in the hypothesized line provides "evidence" that the line continues onward; that the "hypothesis" is "valid". It doesn't even matter that there may be lots of other gray -- or even black -- pixels just outside the hypothesized line. They don't add to or detract from the hypothesis. Only the "positive confirmation" of gray pixels adds weight to the hypothesis that the line extends further than we could tell from the black pixels in the thresholded version. Naturally, as the line extends out, we may get to a point where most of the pixels are white or light. Then we stop extending our line.
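
And here is the confirmation-bias step itself as a sketch: the same outward walk, but over the grayscale contrast image, where every sufficiently dark pixel on the hypothesized line adds a little weight and only a long run of nearly-white pixels stops us (the cutoff and run length are invented for illustration):

    import numpy as np

    def extend_with_gray_evidence(contrast, centroid, direction, start_t,
                                  step=1.0, light_cutoff=0.1, max_light_run=5):
        """Keep extending the line hypothesis using weak (gray) evidence.

        `contrast` is the grayscale contrast image in [0, 1], where higher
        values mean darker pixels.  Gray pixels that fall on the hypothesized
        line count as positive confirmation; pixels off the line are ignored
        entirely.  We only stop after `max_light_run` pixels in a row that are
        too light to count at all.
        """
        t, light_run, last_good, evidence = start_t, 0, start_t, 0.0
        h, w = contrast.shape
        while light_run <= max_light_run:
            t += step
            r, c = np.round(centroid + t * direction).astype(int)
            if not (0 <= r < h and 0 <= c < w):
                break
            value = contrast[r, c]
            if value >= light_cutoff:
                evidence += value          # even weak (gray) evidence accumulates
                light_run, last_good = 0, t
            else:
                light_run += 1             # mostly white: the line may be ending
        return last_good, evidence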

I love this example. It shows how we can start with the source data "suggesting" certain known patterns (here, lines) and that a higher-level model can then set expectations about bigger patterns that are not immediately visible (longer lines) and use otherwise "weak evidence" (light gray pixels) as additional confirmation that such patterns are indeed present. To me, this is a wonderful illustration of inductive reasoning at work. The dark pixels may give strong, deductive proof of the existence of lines in the source data, but the light pixels that fit the extended line functions give weaker, inductive evidence of the same.

I don't mean to suggest that perception is now solved. This example works because I've predefined a model of an "object"; here, a line. I could extend the example to search for ellipses, rectangles, and so on. But having to predefine these primitive object types seems to miss the point that we are quite capable of discovering these and much more sophisticated models for ourselves. There's no real learning in my example; only refinement. Still, I like that this illustrates how confirmation bias -- something of a dirty phrase in the worlds of science and politics -- probably plays a central role in the nature of perception.

Tuesday, November 6, 2007

What bar code scanners can tell us about perception

It may not be obvious, but a basic bar code scanner does something that machine vision researchers would love to see their own systems do: find objects amidst noisy backgrounds of visual information. What is an "object" to a bar code scanner? To answer that, let's start by explaining what a bar code is.

What is a bar code?

You've probably seen bar codes everywhere. Typically, they are represented as a series of vertical bars with a number or code underneath. There are many standards for bar codes, but we'll limit ourselves to one narrow class, typified by the following example:

This sort of bar code has a start code and a stop code, each typically featuring one very wide bar. One of that wide bar's main purposes is to serve as a reference for bar widths; it is sometimes 4x the unit width of a bar. The remaining bars, and the gaps between them, are each some multiple of that unit width (e.g., 1x, 2x, or 3x). Each sequence of bars and gaps maps to a unique number (or letter or other symbol) that is specified in advance by the standard for that kind of bar code.
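
Abstractly, a standard of this sort boils down to a lookup table from width patterns to symbols. The Python table below is made up purely for illustration (real standards such as Code 39 or Code 128 define their own patterns); I'll reuse it in the decoding sketch further down:

    # Hypothetical encoding: each symbol is a fixed sequence of widths,
    # alternating bar, gap, bar, gap, in multiples of the unit width.
    SYMBOLS = {
        (1, 2, 1, 3): "0",
        (2, 1, 3, 1): "1",
        (3, 1, 1, 2): "2",
        # ... and so on for the rest of the character set
    }
    START_STOP = (4, 1, 1, 1)  # the extra-wide bar anchors the unit width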

A bar code scanner, like the handheld version pictured at right, doesn't actually care that the code is 2D, as you see it. To the scanner, the input is a stream of alternating light and dark signals, typically furnished by a laser beam bouncing off white paper or being absorbed by black ink (or reflecting / not reflecting off an aluminum can, etc.). If you're a programmer or Photoshop guru, you could visualize this as starting with a digital snapshot of a bar code, cropping away all but a single pixel row that cuts across the bar code, and then applying a threshold to convert it into a black-and-white image devoid of color and even shades of gray.
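
In code, that mental cropping-and-thresholding exercise is a one-liner (again just a sketch; the 0.5 cutoff is an arbitrary choice of mine):

    import numpy as np

    def scanline(image, row, cutoff=0.5):
        """Simulate the scanner's view: one pixel row, thresholded to dark/light.

        `image` is a 2-D grayscale array in [0, 1]; the result is a 1-D boolean
        array where True means dark (ink) and False means light (paper).
        """
        return image[row, :] < cutoff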

The size of the bar code doesn't much matter, either. Within a fairly wide range, a bar code scanner will treat any stretch of solid black as the potential start of a bar code, whether it's small or large and whether it's off to the left or the right of the center of the scanner's view.

With this stream of information, the scanner looks for the beginning and end of a black section and uses that first sample as a cue to look for the rest of the start code (or stop code; the bar code could be upside down) following it. If it finds that pattern, it continues looking for the patterns that follow, translating them into the appropriate digits, letters, or symbols, until it reaches the stop code.
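
A toy decoder built on the made-up table from earlier might look like the sketch below: collapse the scanline into run lengths, use the wide start bar to estimate the unit width, then match groups of runs against the table until the stop code appears. (It glosses over real issues such as reading right-to-left and width tolerances.)

    def run_lengths(scan):
        """Collapse the boolean scanline into (is_dark, length) runs."""
        runs, count = [], 1
        for prev, cur in zip(scan, scan[1:]):
            if cur == prev:
                count += 1
            else:
                runs.append((bool(prev), count))
                count = 1
        runs.append((bool(scan[-1]), count))
        return runs

    def decode(scan, symbol_width=4):
        """Try to read one bar code out of a noisy scanline; None if nothing scans.

        Uses the hypothetical SYMBOLS table and START_STOP pattern defined in
        the earlier sketch.
        """
        runs = run_lengths(scan)
        for start in range(len(runs)):
            if not runs[start][0]:
                continue                       # a candidate must begin on a dark run
            unit = runs[start][1] / START_STOP[0]  # the wide bar sets the unit width
            widths = [round(length / unit) for _, length in runs[start:]]
            if tuple(widths[:symbol_width]) != START_STOP:
                continue                       # not actually a start code
            digits, i = [], symbol_width
            while i + symbol_width <= len(widths):
                group = tuple(widths[i:i + symbol_width])
                if group == START_STOP:
                    return "".join(digits)     # reached the stop code: success
                if group not in SYMBOLS:
                    break                      # noise, not a valid symbol
                digits.append(SYMBOLS[group])
                i += symbol_width
        return None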

Now, bar codes are often damaged. And they often appear against a noisy background of information. In fact, the inventors of bar code standards are very aware that a random pattern on a printed page could be misinterpreted as a bar code. They dealt with this by adding in several checks. For instance, one or more of the digits in a bar code are reserved as a "check code", the output of a mathematical function applied to the other data. The scanner applies the same function. If the output doesn't match the check code it read in, the candidate bar code scan is rejected as corrupt. Even the digit representations themselves use only a small subset of all possible bar/gap combinations, in order to reduce the chances that an errant spot or other invalid information could be misconstrued as a valid bar code. In fact, the odds that a bar code scanner could misread a bar code like the one above are so infinitesimally small that engineers and clerks can place nearly 100% confidence in their bar codes. A bar code either does or does not scan. There's no "kinda".
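
The check-code idea is easiest to see with a concrete standard. UPC-A, for example, derives its twelfth digit from the other eleven, so a scanner can redo the arithmetic and throw out any candidate that disagrees (this is just one standard's scheme, not the one every bar code uses):

    def upc_a_check_digit(first_eleven):
        """Compute the UPC-A check digit from the first eleven digits.

        Digits in odd positions (1st, 3rd, ...) are weighted 3, even positions
        are weighted 1, and the check digit brings the total to a multiple of 10.
        """
        total = sum(d * (3 if i % 2 == 0 else 1)
                    for i, d in enumerate(first_eleven))
        return (10 - total % 10) % 10

    def looks_valid(digits):
        """Reject a 12-digit candidate scan whose check digit doesn't match."""
        return len(digits) == 12 and digits[-1] == upc_a_check_digit(digits[:-1])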

Seeing things

Bar codes have been engineered so well that it's possible to leave a scanner turned on 24/7, scanning out over a wide area, seeing all sorts of noise continuously, and be nearly 100% guaranteed that when it thinks it sees a bar code in the environment, it is correct. Some warehouses feature stationary bar code scanners that scan large boxes as they are moved along by fork lifts, for instance.

What does this have to do with machine vision? Isn't it amazing that a bar code scanner can deal with an incredibly noisy environment and still have nearly 100% accuracy when it finds a bar code? This is very much like how you can pick out a human face in a busy picture with nearly 100% accuracy. There are all sorts of things that may ping your face-recognition capacity, but when your focus is brought to bear on them, your skill at filtering out noise and correctly identifying the real faces is incredible, just like the bar code scanner's. What's more, it doesn't matter where in your visual field the face is or how near or far it is, within a reasonable range. Just like the scanner.

Vision researchers are still hard-pressed to provide an accounting of how we perceive the world visually. Machine vision researchers have been doing all sorts of neat things for decades, but we're still barely scratching the surface here, for lack of a comprehensive theory of perception. Yet engineers creating bar codes decades ago actually solved this problem in a narrow case.

A good bar code scanner has an elegant solution to the problems of noise, scale invariance (zoom and offset), and bounds detection (via start and stop codes). Its designers even made it so a single bar code can represent one of billions of unique messages, not just serve as a simple there/not-there marker.

The bigger picture

Of course, I don't want to suggest that bar code scanners hold the key to solving the basic problem of perception. You have probably already guessed that the secret to bar codes is that they follow well-engineered standards that make it almost easy to pick bar codes out of a noisy environment. Vision researchers have likewise made many systems that are quite capable of picking out human faces, as well as a variety of special classes of clearly definable objects.

It's pretty much accepted wisdom in human brain research now that much of what we see in the world is what we are looking to find. A bar code scanner works because it knows what to look for. Obviously, one key difference between your perceptual faculty and a bar code scanner is that the scanner is "born" with all the knowledge it needs, while you have to learn how faces, chairs, and cars "work" for yourself.

Still, for people wondering how to approach the question of perception, bar coding is not a bad analogy to start with.