Machine vision: layer-based models


The following is another entry in the ad hoc journal I've been keeping of my thoughts on machine vision.

It's challenging for MV software to figure out, when looking at a complex scene, how to segment it into distinct objects. The main reason is that there doesn't seem to be anything intrinsic in an image to suggest boundaries among objects.

Perhaps expectations can play into it. I was experimenting with a simple sort of expectation system in which the video camera gazes at a static scene. In time, the output image dissolves into black. Only when an object passes into the field of view does it break out from black. The moving parts stand out. If they stand still for a while, they too fade to black to indicate that they are now part of the static scenery. The mechanism is pretty simple. There's an "ambient" image that is built up over time. Each pixel is constantly being scanned and an expectation for what its color should be is built. A simple comparison of the current scene's image to the ambient image then yields non-zero pixel differences only where a pixel's color has suddenly changed, typically because an object is moving through.
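To make the mechanism concrete, here is a minimal sketch in Python. It is not the code from the experiment. It assumes grayscale frames stored as flat lists of numbers and uses a running average as the "expectation"; the class name `AmbientModel` and the `rate` parameter are hypothetical names chosen for illustration.

```python
class AmbientModel:
    """Per-pixel expectation of a scene, built up gradually over time.

    Assumption: frames are grayscale, given as flat lists of numbers.
    """

    def __init__(self, first_frame, rate=0.1):
        # rate is a hypothetical blending factor: the higher it is,
        # the faster a stationary object fades into the ambient image.
        self.ambient = [float(p) for p in first_frame]
        self.rate = rate

    def step(self, frame):
        # The output is how far each pixel strays from its expected value.
        # Zero means "black": the pixel looks exactly as expected.
        diff = [abs(p - a) for p, a in zip(frame, self.ambient)]
        # Nudge each expectation a little toward the current frame.
        self.ambient = [a + self.rate * (p - a)
                        for p, a in zip(frame, self.ambient)]
        return diff


model = AmbientModel([10, 10, 10], rate=0.5)
static_diff = model.step([10, 10, 10])    # nothing has changed
arrival_diff = model.step([10, 200, 10])  # an object appears at the middle pixel
linger_diff = model.step([10, 200, 10])   # it stays put and starts to fade
```

With a running average like this, the difference for an object that stays put shrinks by a constant factor each frame, which is one way to get the fade-to-black effect described above.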

That's a cool experiment, but not useful for much. Perhaps it could be used to help isolate objects long enough to build simple models of them. Add a little sophistication to the above. Instead of constantly morphing an ambient image over time, the agent pauses a few moments initially to determine that the entire scene is static, then takes a snapshot, perhaps averaged out over two or three frames to cancel out typical noise. Henceforth, so long as the agent knows it's looking at the same scene, it would cancel it out using the snapshot - the "model" - to see if anything new is there. A person might sit down in a chair in the scene for a few minutes, but he'd not disappear from the scene, even though he's stationary.
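The snapshot idea can be sketched along the same lines. Again, this is only an illustration under assumptions: frames are flat lists of grayscale values, and the function names and the `tolerance` and `threshold` defaults are hypothetical, not taken from any actual implementation.

```python
def is_static(frames, tolerance=2.0):
    """True if no pixel changes by more than tolerance between consecutive frames."""
    return all(
        abs(a - b) <= tolerance
        for prev, cur in zip(frames, frames[1:])
        for a, b in zip(prev, cur)
    )


def take_snapshot(frames):
    """Average several frames pixel by pixel to cancel out typical noise."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]


def new_content(frame, snapshot, threshold=10.0):
    """Difference from the fixed snapshot; small differences count as noise."""
    return [
        abs(p - s) if abs(p - s) > threshold else 0.0
        for p, s in zip(frame, snapshot)
    ]


# Pause on a few frames, confirm the scene is static, then freeze the model.
warmup = [[10, 10, 10], [11, 9, 10], [9, 11, 10]]
scene_is_static = is_static(warmup)
snapshot = take_snapshot(warmup)

# Someone sits down at the middle pixel and stays there.
seated = [10, 200, 10]
first_look = new_content(seated, snapshot)
much_later = new_content(seated, snapshot)
```

Because the snapshot is never updated, the seated person produces the same difference on every frame, however long he stays. The cost is that the agent has to know when the scene itself has changed and a new snapshot is needed.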
