One of my interests has always been to build a more accurate 3D model using computer vision. In the not-too-distant past I thought it would be cool to do, but that it wouldn't help me graduate, and I wasn't sure what I would use it for anyway. Now, I still think it probably won't help me graduate, but at least I've come up with a use for it besides the arm/mobile manipulation project.
I tend to see two sides in the papers on affordances I'm reading. Some take a rich-model approach and learn to recognize a rather small set of objects; others use no model at all and rely on observation. In practice it seems like people use a little of both, or perhaps learn a rich model through observation. At any rate, nobody does exclusively one thing... we use models and we learn. The big question is how to build up the model.
It seems like one component of that model would be 3D structure. I have looked at a few "environment reconstruction from cameras" papers, and they don't look trivial to implement, and their results are a little lacking. While reading today, I remembered a relatively old algorithm that Philip and I partly implemented for a class project... voxel coloring. It's ridiculously slow and consumes a lot of memory (a paper called "real-time voxel coloring" improves on this somewhat), but the end results are better than those of any stereo vision algorithm I have come across (although I haven't looked in depth at multi-view reconstruction). The big catches are that you have to segment out the object of interest, and you have to know where the cameras are with a rather high degree of accuracy. It's the kind of algorithm that's simple in theory (and in simulation) but extremely difficult to make robust in practice.
Limitations and assumptions notwithstanding, I think this might be a good approach for a robot to build up a 3D model of an object it's interested in. The robot needs mobility (or big-brother-style multiple cameras), but I think that's a reasonable requirement for any project that truly wants to learn affordances of objects. I envision an approach where the robot identifies an object of interest, takes a picture, then moves around it and takes pictures at, say, 10-degree intervals while tracking the object. From there, voxel coloring (or another reconstruction algorithm) can be applied to build a 3D model.
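To make the pipeline concrete, here is a heavily simplified sketch of the voxel coloring step, assuming calibrated cameras (3x4 projection matrices), pre-segmented images, and a simple color-variance consistency test. All names and thresholds are hypothetical, and this skips the occlusion-ordered sweep that the real algorithm depends on.

```python
import numpy as np

def voxel_coloring(voxel_centers, cameras, images, masks, thresh=30.0):
    """Very simplified voxel coloring sketch (hypothetical API).

    voxel_centers: (N, 3) array of candidate voxel positions.
    cameras: list of 3x4 projection matrices (world -> homogeneous pixel).
    images: list of HxWx3 uint8 images from the corresponding cameras.
    masks: list of HxW boolean segmentation masks of the object of interest.
    Returns (center, color) pairs for voxels judged photo-consistent.
    NOTE: the real algorithm sweeps voxels in an occlusion-compatible order
    and carves as it goes; this sketch ignores occlusion entirely.
    """
    colored = []
    for center in voxel_centers:
        samples = []
        for P, img, mask in zip(cameras, images, masks):
            # Project the voxel center into this camera.
            u, v, w = P @ np.append(center, 1.0)
            if w <= 0:
                continue  # behind the camera
            x, y = int(u / w), int(v / w)
            h, wdt = mask.shape
            if 0 <= x < wdt and 0 <= y < h and mask[y, x]:
                samples.append(img[y, x].astype(float))
        if len(samples) >= 2:
            samples = np.array(samples)
            # Photo-consistency test: low color spread across views.
            if samples.std(axis=0).mean() < thresh:
                colored.append((center, samples.mean(axis=0)))
    return colored
```

The real win (and the real pain) comes from doing this in an occlusion-compatible order with accurate camera poses, which is exactly where the robustness problems show up in practice.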
I actually had this idea before we got the SwissRanger in our lab as a way to build up a good 3D model with only a single monocular camera. But, as I said before, it probably won't help me graduate to implement such code, so whether I actually implement such a thing is still undecided.
Tuesday, November 17, 2009
Thursday, November 12, 2009
Inverse affordances
While thinking about hand manipulation affordances, an idea came to mind. I'll call it inverse affordances for now. The idea is something like this: the user wants to perform such and such an action on an object, so what tool or procedure would be useful for doing that action? An example case would be unscrewing a bolt: what size wrench (8mm? 12mm?) would be best for that task? Or a screwdriver: what size screwdriver do I need? I can see this being especially useful for telemanipulation, even something like space maintenance/construction. Hey, I'd even like to pull out a cameraphone, point it at whatever bolt I'm trying to unscrew, and have it tell me what size wrench I need. From that standpoint it could be a "mechanic's assistant."
Now, I'm not going to try to solve this problem from start to finish right now, but it might be interesting to set up a sort of Wizard-of-Oz study, where all of the objects are annotated manually ahead of time and the user then interacts with the system.
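For a Wizard-of-Oz setup like that, the manual annotations could be as simple as a lookup table from (object, action) to recommended tool. A minimal sketch, with entirely made-up entries:

```python
# Hypothetical hand-built annotations for a Wizard-of-Oz study:
# (object, desired action) -> recommended tool.
ANNOTATIONS = {
    ("M8 hex bolt", "unscrew"): "13mm wrench",
    ("M5 hex bolt", "unscrew"): "8mm wrench",
    ("panel screw", "unscrew"): "#2 Phillips screwdriver",
}

def suggest_tool(obj, action):
    """Inverse affordance lookup: what tool affords this action on this object?"""
    return ANNOTATIONS.get((obj, action), "unknown -- ask the wizard")

print(suggest_tool("M8 hex bolt", "unscrew"))  # -> 13mm wrench
```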
Thursday, October 1, 2009
Affordance Maps
We have come up with an overarching theme for my past, present, and future research: affordance maps. In terms of interacting with the physical world, objects afford certain actions. A common example is that a chair affords sitting. In computer vision, classifying a chair is a difficult problem, because chairs have highly varied form. For any object that affords sitting, saying that the object is a chair is quite likely not completely wrong, even if it isn't the best answer. For example, a table affords sitting, so it could be classified as a chair. Such an answer might get a chuckle out of another person, but only because it's a somewhat unusual answer, and not entirely wrong.
In this sense we can see that most objects afford many actions, and some actions are afforded by several things. A chair affords sitting, standing, pushing, pounding, kicking, etc. Throwing is afforded by almost all objects, at least for some person or machine (most people cannot throw a car, but a construction crane could). Any object will have an affordance ranking or preference pattern. For a cup, this might be drinking, pouring, drumming, trapping insects, in order from highest to lowest preference.
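One straightforward way to represent such a ranking is a map from object to a scored list of actions, which also supports the inverse query (which objects afford a given action?). A hypothetical sketch with invented scores:

```python
# Hypothetical affordance map: object -> list of (action, preference score),
# with scores made up purely for illustration.
AFFORDANCE_MAP = {
    "cup":   [("drinking", 0.9), ("pouring", 0.7), ("drumming", 0.3),
              ("trapping insects", 0.1)],
    "chair": [("sitting", 0.9), ("standing on", 0.5), ("pushing", 0.4),
              ("pounding", 0.2)],
}

def ranked_affordances(obj):
    """Return the object's afforded actions, highest preference first."""
    return sorted(AFFORDANCE_MAP.get(obj, []), key=lambda a: a[1], reverse=True)

def objects_affording(action):
    """Inverse query: which known objects afford this action?"""
    return [obj for obj, acts in AFFORDANCE_MAP.items()
            if any(a == action for a, _ in acts)]
```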
Extending this idea, we can consider the notion of social affordances. That is, given the current state of the world, what social actions are permitted or acceptable? Discovering this kind of thing automatically is certainly daunting, and the only idea I have so far is to classify human social actions and then learn from observing humans interacting with each other and with the robot.
Thursday, August 27, 2009
Head tracking design changes
I spent some time the past couple of days refining the head tracking manipulation interface.
The first change was to mount the Wii remote up on the wall so it's farther away from the operator.
This change was motivated by how easily the operator could move out of the camera's field of view when leaning left and right to change the view.
One side effect of this change is that, because the camera now points down at a steep angle, "leaning closer in" adjusts the declination of the virtual camera instead of the zoom distance.
At first I thought I should modify the trig that calculates the head position, but I decided to test it as-is first.
The result is that I think I like using "lean in" to adjust the declination.
My explanation is that adjusting zoom is rarely needed for this task, while adjusting declination is needed much more often.
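Roughly, the mapping is of this form (a simplified sketch; the gains and conventions are illustrative, not the actual code):

```python
import math

def head_to_virtual_camera(head_x, head_y, head_z,
                           base_azimuth=0.0, base_declination=0.6, base_zoom=1.5):
    """Hypothetical mapping from a tracked head offset (meters, relative to a
    neutral position) to virtual-camera spherical coordinates.

    With the Wii remote mounted high on the wall and angled steeply downward,
    leaning toward the screen mostly changes the measured depth offset, so it
    is mapped to declination here rather than zoom.
    """
    azimuth = base_azimuth + 1.5 * head_x          # lean left/right -> orbit
    declination = base_declination + 1.0 * head_z  # lean in/out -> tilt down/up
    declination = max(0.05, min(math.pi / 2, declination))
    zoom = base_zoom                               # zoom left fixed in this variant
    return azimuth, declination, zoom
```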
Another change I made was to couple the virtual camera azimuth with the base joint rotation of the arm.
This means that the operator can sit still and rotate the arm, and the view keeps the arm in the same orientation by rotating the virtual camera.
The head tracking comes into play by offsetting from the coupled view.
This essentially means that only relatively small head motions are required to reach the most useful viewpoints (top-down and side views).
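The coupling itself is just angle addition, with the head-tracking offset layered on top; a minimal sketch (names hypothetical):

```python
import math

def coupled_view_azimuth(arm_base_angle, head_offset_azimuth):
    """Virtual-camera azimuth coupled to the arm's base joint (illustrative sketch).

    Rotating the arm rotates the view with it, so the arm keeps the same on-screen
    orientation; head tracking only adds a small offset on top of the coupled view.
    """
    azimuth = arm_base_angle + head_offset_azimuth
    # Wrap to [-pi, pi) to keep the angle well-behaved.
    return (azimuth + math.pi) % (2 * math.pi) - math.pi
```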
Speaking of viewpoints reminded me of a configuration I should probably compare against.
Several people have asked whether I have tried displaying a side and top-down view at the same time.
I think it might be fast and usable in clean environments where you can isolate your object of interest, but in cluttered environments it would be impossible to use such a display without an additional "3/4" dynamic view to understand which blob corresponds to which object.
It's probably something worth including for the journal paper I plan on writing.
Labels: design, head tracking, journal, user interface
Wednesday, August 26, 2009
Erroneous artifacts in 3D display
Having recently finished annotating all of the video for the second user study, I noticed a few issues.
None of the issues I saw in the second user study were as severe as the first user study, but they are interesting.
Two people, who in my subjective judgment were complete novices at robot control and at interpreting 3D information on a 2D display, mistook some artifacts in the 3D model for the deposit box, and so repeatedly dropped blocks on the artifacts.
The artifacts were showing because of an imperfect filter that is supposed to remove all parts of the 3D model that are not relevant to the task (that is, everything except the blocks, pipes, and deposit box).
Some of the floor was showing up in the model, and these two subjects seemed to think it looked like the box.
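A filter along these lines (a simplified sketch with made-up thresholds, not the actual code) would crop the point cloud to the task workspace and reject points near the floor; points that slip through are exactly the kind of artifacts that got mistaken for the box:

```python
import numpy as np

def filter_task_points(points, workspace_min, workspace_max,
                       floor_z=0.0, floor_margin=0.02):
    """Keep only 3D points inside the task workspace and above the floor.

    points: (N, 3) array in the world frame.
    workspace_min/max: (3,) bounds of the region containing blocks, pipes, and box.
    floor_z, floor_margin: floor height and a tolerance band to reject floor returns.
    Thresholds and names are illustrative.
    """
    points = np.asarray(points)
    in_box = np.all((points >= workspace_min) & (points <= workspace_max), axis=1)
    above_floor = points[:, 2] > floor_z + floor_margin
    return points[in_box & above_floor]
```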
If I were designing an interface to cater specifically to this task, I could think of ways to support the operator that would really make the deposit box stand out.
I don't think that's really what I'm researching, though, so I'm not going to change the interface design in that way, especially since 30 out of 32 people had no trouble finding the deposit box.
Most likely nobody will want to use a mobile manipulator for this particular task, since it's really mostly a toy world.
Another problem is a remnant from the first user study.
Yes, we're coming back to the alignment issue.
For one target block in one particular layout, the alignment was off by enough that at least half of the people had trouble getting it.
There were a couple other blocks that were slightly off, but most people got them as long as they followed the instructions in the training.
What it amounts to is that the calibration was slightly off for those couple of regions.
It's a little disappointing, but not too much so, since I can filter out the problem blocks to look at what happens with the well-aligned blocks.
Since there are 6 layouts and 3 blocks per layout, only 1 or 2 block samples out of 18 are bad.
I think it's still plenty usable and will give some interesting insights.
Labels: calibration, user interface, user study
Tuesday, January 27, 2009
Novices testing interfaces that only experts will use
One of the sad ironies of my research (at least at this point) is that experts are an expensive, limited resource, while novices are readily available. So, which would you pick? We have a few strategies in mind to get a little more than just novice results, but the results that will be viewed as statistically significant will come from novices.
The sadness for me in this case comes from the view-dependent control I talked about in my previous post. I mentioned a hybrid joint control/end-effector control that is not view-dependent, since the view-dependence seemed to confuse most people. People who had a bit more experience working in a 3D world on a 2D screen seemed to like the view-dependent version more, but that's just anecdotal, and I might just be hoping. I certainly like it better. It's my impression that people would perform better with the view-dependent control after a bit of training time (a couple of hours, not a couple of minutes).
Now, I'm talking about the control type I used for the first user study, where two separate joysticks are used. The single-joystick view-dependent control was just plain confusing with head tracking. With two sticks, one is view-dependent, and the other is always up/down. That's my favorite configuration so far, but I understand the system pretty well, and I'm designing to my preferences.
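Concretely, "view-dependent" means something like rotating the planar stick input by the virtual camera's azimuth before applying it in the world frame, while the second stick stays locked to world up/down. A simplified sketch (gains and conventions are illustrative):

```python
import math

def sticks_to_target_delta(stick1_x, stick1_y, stick2_y, camera_azimuth, gain=0.01):
    """Map two thumbsticks to a world-frame end-effector target increment.

    Stick 1 moves the target in the horizontal plane relative to the current
    view (view-dependent): pushing 'up' on the stick moves the target away from
    the camera. Stick 2 is always world up/down, regardless of the view.
    """
    # Rotate the planar input by the camera azimuth into the world frame.
    dx = gain * (stick1_x * math.cos(camera_azimuth) - stick1_y * math.sin(camera_azimuth))
    dy = gain * (stick1_x * math.sin(camera_azimuth) + stick1_y * math.cos(camera_azimuth))
    dz = gain * stick2_y  # view-independent vertical axis
    return dx, dy, dz
```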
The answers to these questions really can only be found through science and testing. I really need to keep moving!
Labels: control, design, head tracking, user interface, user study
Monday, January 26, 2009
This robot is out of control!
One of the things we robo-manipulation guys have to deal with is how to design the user control for the robot arm. I'm not talking about the whole user interface, but in particular the controls to make the robot move. If there's some autonomy in there, you have a little bit more flexibility, and you can fairly effectively just use a mouse. But when you just want to allow the operator to teleoperate or "remote control" the arm, things are a little trickier.
Now, just driving a wheeled robot around is not so challenging, because people are accustomed to driving cars, and sometimes even from a remote perspective. With a robot arm, you have to make a control that operates in full 3D, and that can even include 3 axes of rotation. If you want to go really low level, then you need some way to control each joint of the arm.
My first attempt was to use the two "thumbsticks" on a modern video game controller. The type of control I'm going for is end-effector control, where you move a virtual target point for the robot's gripper to reach. One thumbstick controls motion in one plane, and the other stick controls motion along the remaining axis. This works OK with some training, but a lot of people still seemed to struggle with it. The next attempt was to reduce the control to one stick and change the way the controls work depending on the view. From a top-down view, the end effector moves in the XY plane (where Z is vertical), and from a side view, the end effector moves in a plane parallel to the Z axis. The tricky part is views that are neither side nor top views. When the view is halfway between those, how should the end effector move? For now I have it somewhat "remember" which mode it's in, and you have to go almost all the way to the other view to switch modes. This ends up being rather confusing even for me after practicing for a while. Another option would be to simply move the end effector in the view plane. It's hard to say whether that would be good.
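The "remember which mode it's in" behavior is essentially hysteresis on the camera declination; a simplified sketch, assuming declination is measured from the straight-down pose, with made-up thresholds:

```python
def update_control_mode(current_mode, declination, to_side=1.2, to_top=0.3):
    """Hysteretic switch between 'top' (XY-plane) and 'side' (vertical-plane) control.

    declination: virtual-camera angle from straight-down, in radians.
    The mode only flips when the view goes almost all the way to the other
    extreme, so small view changes near the middle don't toggle the controls.
    Thresholds are illustrative.
    """
    if current_mode == "top" and declination > to_side:
        return "side"
    if current_mode == "side" and declination < to_top:
        return "top"
    return current_mode
```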
Another idea would be a non-standard end-effector control approach. For instance, left and right on the stick could rotate the base of the arm, and forward and back would extend the arm. Instead of controlling individual joints, however, this mode would still be moving the end effector. There is a difference.
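In other words, the stick would drive the end-effector target in cylindrical coordinates around the arm's base instead of Cartesian axes. A simplified sketch (gains illustrative):

```python
import math

def polar_stick_update(radius, base_angle, stick_x, stick_y,
                       angle_gain=0.02, radius_gain=0.01):
    """Non-standard end-effector control: the stick works in cylindrical coordinates.

    Left/right on the stick rotates the target around the arm's base; forward/back
    changes its radial distance. The target is still an end-effector position (an
    IK solver decides the joint angles), which is what distinguishes this from
    direct joint control.
    """
    base_angle += angle_gain * stick_x
    radius = max(0.0, radius + radius_gain * stick_y)
    x = radius * math.cos(base_angle)
    y = radius * math.sin(base_angle)
    return x, y, radius, base_angle
```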
The other day one of the people on my committee offered his Novint Falcon (3D force feedback controller) for controlling the robot arm. At first I thought this was totally the way to go and would solve all of my problems including world hunger. This would mean that I wouldn't have to use two separate joysticks or different modes... with a 3D controller up is up, left is left, and back is back.
Then I remembered the whole view-dependent thing. The trouble is that the interface has head tracking to adjust the view. So it's really easy to adjust the view. It's possibly good and important, but it also makes one wonder whether the controls should always do the same thing, or depend on the view. There's a paper by Jose Macedo called "The Effect of Automated Compensation for Incongruent Axes on Teleoperator Performance" that talks about this, and they basically say that people do better with the automatic compensation (or as I say, view-dependent) control than without.
I think my situation is a little different from theirs, though. They evaluate 2D control, and it's also static. By static I mean that once the control axes and the display axes are determined, they remain in their particular (mis)alignment for the duration of the experiment. In my case the control is 3D, and the alignment between axes changes dynamically throughout the experiment. So I think it needs to be tested. Perhaps after I get this thesis done.