Wednesday, December 17, 2008

Seeing double

We recently got a Videre STOC stereo camera, and I've been working on integrating it with our robot arm setup. It has been a bit of a challenge, mostly because the API provided by Videre/SRI is in C++ and our interface is in C#. A couple of my labmates mentioned that they had worked with C++ DLLs before, so I looked into it.

Unfortunately, SRI's API is not just function calls; it also has custom classes that need to be instantiated. So I needed to write a C++ wrapper that uses those objects internally to pull out the information I want and exposes everything as plain functions.
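To give a rough idea of what I mean (with made-up names standing in for the real Videre/SRI classes, which look different), the wrapper ends up shaped something like this: the vendor object lives behind the scenes, and the DLL only exports plain functions.

```cpp
// Rough sketch of the flattening idea -- "VendorStereoCamera" is a made-up
// stand-in for the vendor's C++ classes, not their real API.
class VendorStereoCamera
{
public:
    bool Open()       { return true; }  // stub bodies just to keep the sketch self-contained
    bool GrabFrame()  { return true; }
    int  PointCount() { return 0; }
};

// One hidden instance, so callers never have to construct vendor objects themselves.
static VendorStereoCamera* g_camera = 0;

extern "C"
{
    __declspec(dllexport) bool Camera_Open()
    {
        if (!g_camera) g_camera = new VendorStereoCamera();
        return g_camera->Open();
    }

    __declspec(dllexport) bool Camera_GrabFrame()
    {
        return g_camera != 0 && g_camera->GrabFrame();
    }

    __declspec(dllexport) int Camera_PointCount()
    {
        return g_camera ? g_camera->PointCount() : 0;
    }
}
```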

The next hurdle was that I wanted to get some 50,000+ 3D points from the camera, and making a separate GetPoint call for each one would be a massive bottleneck... it would be much better to grab the entire list of points in one call. Several hours later, I discovered that I hadn't created a normal C++ wrapper project at all, but a C++/CLI project, which means I can use managed-code constructs directly inside the C++.
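Here's roughly what the bulk transfer looks like in C++/CLI. All the names below are placeholders rather than the real camera API, but the idea is a single Marshal::Copy into a managed array instead of tens of thousands of GetPoint calls.

```cpp
// C++/CLI sketch with made-up names: copy the whole point cloud into a
// managed array in one shot, instead of a GetPoint call per point.
using namespace System;
using namespace System::Runtime::InteropServices;

// Stand-in for the native side: pretend this returns a pointer to
// 'count' points stored as interleaved x,y,z floats.
static const float* AcquireNativePoints(int& count)
{
    static float buffer[9] = { 0 };  // placeholder data
    count = 3;
    return buffer;
}

public ref class PointCloudGrabber
{
public:
    // Returns a managed float[] (x0,y0,z0, x1,y1,z1, ...) that C# can use directly.
    array<float>^ GrabPoints()
    {
        int count = 0;
        const float* native = AcquireNativePoints(count);

        array<float>^ managed = gcnew array<float>(count * 3);
        // One bulk copy across the native/managed boundary; the memory is
        // duplicated rather than shared, which is fine for now.
        Marshal::Copy(IntPtr(const_cast<float*>(native)), managed, 0, count * 3);
        return managed;
    }
};
```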

After some research, I found that there are basically two main approaches to writing C#/C++ interop code on Windows. The more widely documented way is to write a COM DLL in vanilla C++ and pull it in with P/Invoke, marshaling the data structures by hand. The way I stumbled onto with CLI is to use an extended version of C++ that adds managed structures which can be used directly from C#. Basically it's a .NET-specific DLL, but it really reduces the headache of getting awkward data structures across to C++, at least if you're content with duplicating the contents of memory rather than using the existing C++ memory in place. There are probably better ways to marshal the data than what I'm doing so far, but I'll leave that to the optimization zealots. Here ( http://www.codeproject.com/KB/mcpp/quickcppcli.aspx ) is the resource I used to learn what I needed for the CLI stuff: a very concise quick reference with examples, though I doubt it's complete.
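For the curious, here's a bare-bones sketch of that second approach. Again, the class names are made up, but it shows the shape of it: a managed ref class holding a plain native pointer, which a C# project can reference and use like any other .NET class, with no DllImport declarations or hand-written marshaling.

```cpp
// Minimal C++/CLI sketch (hypothetical names, not the real camera API):
// a managed ref class wrapping a native object.
class NativeStereoDriver              // stand-in for the native vendor object
{
public:
    bool Open() { return true; }      // stub body for the sketch
};

public ref class StereoWrapper
{
public:
    StereoWrapper()  { m_native = new NativeStereoDriver(); }
    ~StereoWrapper() { this->!StereoWrapper(); }              // Dispose()
    !StereoWrapper() { delete m_native; m_native = nullptr; } // finalizer

    bool Connect() { return m_native->Open(); }

private:
    NativeStereoDriver* m_native;     // a plain native pointer held inside the managed class
};

// From C#, after adding a reference to this assembly, usage is just:
//   var camera = new StereoWrapper();
//   bool ok = camera.Connect();
```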


The end result of all of this is that I have a working integration of the new stereo camera into the interface, although there are still a few adjustments left to make.

It looks to me like the objects will be unrecognizable from a viewpoint other than the camera's, but I'm hoping that the position will be accurate. So standard video can be used to identify what the objects are, and the 3D model can be used to determine where they are. Of course, by the time I'm done with this research, the 3D model will be a perfect (down to the subatomic particles) representation of reality ;)