Tuesday, November 17, 2009

One of my interests has always been to build a more accurate 3D model using computer vision. In the not-too-distant past, I thought it would be cool to do, but that it wouldn't help me graduate, and I wasn't sure what I would use it for anyway. Now, I still think it probably won't help me graduate, but at least I've come up with a use for it besides the arm/mobile manipulation project.
I tend to see two camps in the affordance papers I'm reading: some take a rich-model approach and learn to recognize a rather small set of objects, while others skip the model and rely purely on observation. In practice, though, most work uses a little of both, or perhaps learns a rich model through observation. At any rate, nobody does exclusively one thing... we use models and we learn. The big question is how to build up the model.
It seems like one component of that model would be 3D structure. I have looked at a few "environment reconstruction from cameras" papers, and they don't look trivial to implement, and their results are a little lacking. While reading today, I remembered a relatively old algorithm that Philip and I partially implemented for a class project... voxel coloring. It's ridiculously slow and consumes a lot of memory (a follow-up paper on real-time voxel coloring improves this somewhat), but the end results are better than those of any stereo vision algorithm I have come across (although I haven't looked in depth at multi-view reconstruction). The big catch is that you have to segment out the object of interest, and you have to know where the cameras are with a rather high degree of accuracy. It's the kind of algorithm that's simple in theory (and in simulation) but extremely difficult to make robust in practice.
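To make that concrete, here is a minimal sketch (in Python, just for illustration) of the photo-consistency test at the heart of voxel coloring. It assumes calibrated cameras with known 3x4 projection matrices and a silhouette mask per image, and it leaves out the occlusion bookkeeping (the layered sweep in visibility order) that makes the real algorithm correct; the function names and threshold are my own placeholders, not from any paper.

import numpy as np

def project(P, X):
    # Project a 3D point X (length-3 array) through a 3x4 camera matrix P.
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def consistent_color(voxel_center, cameras, images, masks, thresh=15.0):
    # Return the mean color if the voxel projects to similar colors in every
    # view that sees it (and lands inside the object silhouette); otherwise
    # return None, meaning the voxel should be carved away.
    samples = []
    for P, img, mask in zip(cameras, images, masks):
        u, v = np.round(project(P, voxel_center)).astype(int)
        if not (0 <= v < img.shape[0] and 0 <= u < img.shape[1]):
            continue
        if not mask[v, u]:  # outside the segmented object: carve it
            return None
        samples.append(img[v, u].astype(float))
    if len(samples) < 2:
        return None
    samples = np.stack(samples)
    # Photo-consistency: low per-channel standard deviation across views.
    if samples.std(axis=0).mean() < thresh:
        return samples.mean(axis=0)
    return None

The full algorithm would run this test voxel by voxel, sweeping the grid in an order that respects visibility and marking image pixels as claimed, which is exactly the part that gets painful in practice.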
Limitations and assumptions notwithstanding, I think this might be a good approach for a robot to build up a 3D model of an object it's interested in. The robot requires mobility (or a big-brother setup of multiple fixed cameras), but I think that's a reasonable requirement for any project that truly wants to learn the affordances of objects. I envision an approach where the robot identifies an object of interest, takes a picture, then moves around it while tracking the object, taking pictures at, say, 10-degree intervals. From that point, voxel coloring (or another reconstruction algorithm) can be applied to build a 3D model.
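For what it's worth, the "move around it at 10-degree intervals" part is easy to write down, at least in the idealized case where the robot can put its camera wherever it wants. Here's a rough sketch that generates look-at camera poses on a circle around the object; the radius, height, and world-up axis are arbitrary choices of mine, not anything from a real platform.

import numpy as np

def ring_of_poses(center, radius=1.0, height=0.5, step_deg=10):
    # Yield world-to-camera (R, t) pairs for viewpoints spaced step_deg apart
    # on a circle around `center`, each looking at the object center.
    for deg in range(0, 360, step_deg):
        a = np.radians(deg)
        eye = center + np.array([radius * np.cos(a), radius * np.sin(a), height])
        forward = center - eye
        forward /= np.linalg.norm(forward)
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
        right /= np.linalg.norm(right)
        down = np.cross(forward, right)
        R = np.stack([right, down, forward])  # rows: camera x, y, z axes in world frame
        t = -R @ eye                          # so that p_cam = R @ p_world + t
        yield R, t

Combining each (R, t) with the camera intrinsics gives the projection matrices the consistency test above needs, at least in simulation; on a real robot the poses would come from tracking, not from this idealized ring.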
I actually had this idea, as a way to build up a good 3D model with only a single monocular camera, before we got the SwissRanger in our lab. But, as I said before, implementing it probably won't help me graduate, so whether I actually do is still undecided.
Thursday, November 12, 2009
Inverse affordances
While thinking about hand-manipulation affordances, an idea came to mind that I'll call inverse affordances for now. The idea is something like this: the user wants to perform such-and-such an action on an object, so what tool or procedure would be useful for doing it? An example case would be unscrewing a bolt: what size wrench (8mm? 12mm?) would be best for that task? Or a screwdriver... what size screwdriver do I need? I can see this being especially useful for telemanipulation, even something like space maintenance and construction. Hey, I'd even like to pull out a cameraphone, point it at whatever bolt I'm trying to unscrew, and have it tell me what size wrench I need. From that standpoint it could be a "mechanic's assistant."
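As a toy illustration of what the cameraphone version might boil down to once perception hands you a measurement, here is a tiny sketch that maps an estimated across-flats width of a bolt head to the nearest standard metric wrench. The size list and tolerance are placeholders I made up, not any real spec.

# Hypothetical "mechanic's assistant" lookup: measured width -> wrench size.
COMMON_METRIC_WRENCHES_MM = [6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 19, 22, 24]

def suggest_wrench(measured_width_mm, tolerance_mm=0.5):
    # Return the closest standard size, or None if the measurement is too
    # far from every size to make a confident suggestion.
    best = min(COMMON_METRIC_WRENCHES_MM,
               key=lambda s: abs(s - measured_width_mm))
    return best if abs(best - measured_width_mm) <= tolerance_mm else None

print(suggest_wrench(12.9))  # -> 13
print(suggest_wrench(9.2))   # -> None (measurement too ambiguous)

The hard part, of course, is the perception that produces the measurement in the first place, not the lookup.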
Now, I'm not going to try to solve this problem from start to finish right now, but it might be interesting to set up a sort of Wizard-of-Oz (WoZ) study, where all of the objects are annotated manually ahead of time and the user then interacts with the system.