Wednesday, June 22, 2011


OK, so there haven't been any posts lately, and I have a good reason for that: I was spending the time developing an algorithm called Plan B, a provably good path planning algorithm whose plans account for the probability of losing communications between the robots in a group.  The handy part of this algorithm is that it is designed to accept any radio propagation model I cook up, including ones based on Fourier optics.

In addition, I talked with Joe Kider about working on this, and we've come up with a bit of a plan on what we're going to do.  That is the reason for this post.

So, in a vaguely ordered form:
  1. We need to step back from writing code, and do math first.  I spent some time learning about the different ways that the math for light is calculated in normal space, but neither Joe nor I have any good ideas on how some of that translates into Fourier space.  That means that we need to figure out things like reflection, refraction, polarization, etc. in an analytical form in Fourier space.  If we can do that, then just as we can calculate the affine transforms of a triangle in Fourier space, we can do the rest there as well.  Done right, that should reduce the amount of work the CPU/GPU/box of highly caffeinated squirrels needs to do, and increase accuracy (it decreases the work because we won't have to do a forward FFT; the inverse FFT still needs to be done).
  2. I haven't discussed this with Joe yet, but I want to switch away from CUDA and do OpenCL instead.  From what I heard from one of the other guys in GPGPU, OpenCL is finally starting to catch up with CUDA in performance, and to surpass it in some areas.  That is a Good Thing™ as it means that the code we write could run on many different vendors' platforms.  The only downside is that there aren't any really good FFT libraries for OpenCL yet, so we'd need to write one.  The good thing about that is that the FFT is an exceedingly well understood transform, so we can cut our teeth learning OpenCL by writing the FFT library we need, and comparing its output to other, known good implementations, like FFTW.
  3. Much of the code I wrote originally is still usable, but it needs some cleanup.  I've learned a great deal more about CMake and Google Test than I knew when I wrote the original code, so one of the first tasks is to reorganize everything so that it works correctly within CMake's and Google Test's concept of 'correct'.  Some bits are probably going to go away, and other bits are going to stay.  More than likely, we're going to have to create several layers of code, to act as different abstraction layers.  So, one of the first things that will likely happen is that we put together a good FFT project, and ignore the current code until the FFT part works (at least a little bit).
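The idea in item 1 (apply a transform directly in Fourier space instead of transforming the moved geometry all over again) can be illustrated with the simplest affine case, a translation. What follows is only a minimal sketch under loud assumptions: it is one-dimensional, it uses a circular shift on a made-up sample signal, and it uses a slow direct DFT rather than any real FFT library. The names `dft`, `shift_in_fourier_space`, and `signal` are hypothetical and are not taken from the project code. By the shift theorem, moving a signal by m samples multiplies bin k of its spectrum by exp(-2πi·k·m/N).

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform (illustration only, not an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def shift_in_fourier_space(spectrum, m):
    """Apply a circular shift of m samples as a phase ramp on the spectrum."""
    n = len(spectrum)
    return [c * cmath.exp(-2j * cmath.pi * k * m / n)
            for k, c in enumerate(spectrum)]

# Hypothetical sample data; any sequence would do.
signal = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0]
m = 3

# Route 1: shift in normal space, then pay for a forward transform.
shifted_signal = signal[-m:] + signal[:-m]      # y[t] = x[(t - m) mod N]
via_space = dft(shifted_signal)

# Route 2: reuse the known spectrum and apply the shift analytically.
via_fourier = shift_in_fourier_space(dft(signal), m)

# Worst disagreement between the two routes; expected to be rounding noise.
max_error = max(abs(a - b) for a, b in zip(via_space, via_fourier))
print(max_error)
```

Both routes should agree to within rounding error. Route 2 never transforms the shifted signal, which is the kind of saving item 1 is hoping for; whether reflection, refraction, and polarization have similarly cheap analytical forms is exactly the open question.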
And that about wraps up where things are at right now.  More than likely, the wiki is what will get updated, and that will be about it for a while.  Once we have all that really nailed down, THEN we'll move on to making the code good.

Tuesday, April 26, 2011

??? cuFFT what ARE you doing ???

OK, so here is what I've figured out so far.
  • cuFFT doesn't fail all the time; it fails every OTHER time it's called.
  • When it does fail, which part fails depends on the data; that is, if you break up line 466 of FourierOptics/src/Private/ so that pointAccumulator has its real and imaginary parts updated separately, then comment out the update of the imaginary part only, cuFFT doesn't fail, provided you input only one triangle, and that it is in a 'goldilocks' size range.  I have yet to find a range of sizes and positions that makes both the real and imaginary parts succeed.  Regardless, this is wrong; it shouldn't fail just because the data is an unexpected size.
I am going to continue investigating this, but won't really be able to do so for another week; other projects demand my time.  However, I talked with Joe Kider and know that he's interested in working on Fourier optics as well; turns out that it is getting interest in the graphics community.  So I'm going to work with him over the summer to see what can be done. 

Monday, April 25, 2011

Rules to live by

Don't upgrade your drivers before a deadline.  You will be unhappy.

As for the movie, it's still uploading to bitbucket.  And it's after midnight now... yay.  Whenever it finishes uploading, I'll post a link to it.  You can pull the code as it is, but I want to figure out WHY cuFFT is failing on the inverse transform, so I'm going to keep on debugging while the movie slowly (oh, SO slowly) uploads.  If it turns out to be something in my code that wasn't tripping up the old drivers, I'll post that as well.

EDIT It's been 20 minutes, and it's still uploading.  Which is unfortunate, as there is a chance I can use cuda-gdb to figure out what's going on, but to do so, I have to log out and then log back in using console mode, which means interrupting the upload.  At this point, I'm going to go to bed and let the upload take as long as it takes.  The movie will get up there in its own sweet time.

EDIT 2  It finished uploading sometime overnight.  You can download it here.  The movie is cheesy because everything went wrong; I'm going to keep on beating on cuFFT to figure out why it's not doing the right thing, but if I can't figure it out, eventually I'm going to rip it out and use FFTW instead, or maybe look into OpenCL-based implementations.  At the very least, I want to write a wrapper that makes all of the various flavors look the same, so I can have CPU & GPU unit tests that can cross-compare.  With enough time, that should make things work right (I hope).
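
To make the wrapper idea concrete, here is a rough sketch of what "all flavors look the same" could mean. Everything in it is an assumption made for illustration: the two backends are toy pure-Python stand-ins (a direct DFT and a recursive radix-2 FFT) playing the roles that cuFFT and FFTW would play, and the names `DirectDFT`, `Radix2FFT`, and `cross_compare` are hypothetical. None of this is the real cuFFT or FFTW API.

```python
import cmath

class DirectDFT:
    """Reference backend: slow but obviously correct O(N^2) transform."""
    name = "direct-dft"

    def forward(self, x):
        n = len(x)
        return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n)]

    def inverse(self, spectrum):
        n = len(spectrum)
        return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                    for k in range(n)) / n
                for t in range(n)]

class Radix2FFT:
    """Second backend: recursive radix-2 FFT, power-of-two lengths only."""
    name = "radix2-fft"

    def _fft(self, x, sign):
        n = len(x)
        if n == 1:
            return list(x)
        even = self._fft(x[0::2], sign)
        odd = self._fft(x[1::2], sign)
        out = [0j] * n
        for k in range(n // 2):
            twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
            out[k] = even[k] + twiddled
            out[k + n // 2] = even[k] - twiddled
        return out

    def _check(self, x):
        n = len(x)
        if n < 1 or n & (n - 1):
            raise ValueError("length must be a power of two")

    def forward(self, x):
        self._check(x)
        return self._fft(x, -1)

    def inverse(self, spectrum):
        self._check(spectrum)
        return [v / len(spectrum) for v in self._fft(spectrum, +1)]

def cross_compare(backends, signal):
    """Compare every backend against the first one; return worst error per backend."""
    reference = backends[0].forward(signal)
    errors = {}
    for backend in backends[1:]:
        spectrum = backend.forward(signal)
        forward_error = max(abs(a - b) for a, b in zip(spectrum, reference))
        round_trip = backend.inverse(spectrum)
        inverse_error = max(abs(a - b) for a, b in zip(round_trip, signal))
        errors[backend.name] = max(forward_error, inverse_error)
    return errors

# Hypothetical test vector of power-of-two length.
errors = cross_compare([DirectDFT(), Radix2FFT()],
                       [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0])
print(errors)
```

Because every backend exposes the same `forward` and `inverse`, a unit test only needs the list of backends, and a misbehaving inverse transform would show up as a large round-trip error for that one backend.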

Meh, on to other projects

Almost at the deadline!

I'm still working out how to display the output of my code.  I've downloaded and installed the cairo graphics package, and unlike AGG, it works.  Now the trick is to figure out its API between now and midnight! :(

Oh well, at least the poster and paper are up:
The only problem I've got when I look at both of those is realizing how much more I want to get done in order for this code to be complete.  It's never-ending...

Saturday, April 23, 2011


In my last post, I mentioned trying to use AGG to visualize my output.  Unfortunately, it doesn't even compile on my computer, which makes displaying the output impossible at the moment.  The other ways I know of are matplotlib (which requires Python) and MATLAB (which I have no idea how to connect to in order to get it to do what I want).  I have some idea on how to do it with a Mac, but the Mac REALLY DOES NOT LIKE IT when you try to beat it into doing what you want in that way, which means many, many days of work to make that happen.  Right now, I'm fresh out of ideas, so I'm going to work on my poster and video instead.  Maybe I'll think of some nifty visualization of my output later.


I've finally gotten all my CUDA code in place, and I've realized that I've hit a brick wall.  I've written a number of unit tests using googletest, which is a fairly nice unit testing framework, and my code passes all of my tests.  The problem is that it is limited, as all unit testing frameworks are, in that it can really only test for equality well.  This is a big problem for the code I'm writing because after reading through quite a bit of material on the web, I've come to realize that not all cards support IEEE 754 in quite the same way, or as completely, as they should.  That means that even if I had a golden test set to test against, I could fail the test, while still being reasonably accurate.  The only way I may be able to bypass this problem is if I can render my output to a file or to the screen directly, and then eyeball it.  Towards that end, I just downloaded the AGG library, which claims to use only portable C++ to render images to files.  I'm going to briefly look at it, and see if I can get it to do what I want it to do (make pretty pictures) in the amount of time I have left.  However, since this is a brand-new (to me) API, and since I STILL don't have a poster or video up, if I can't get it all working within an hour or less, I'm going to have to abandon it to get my presentation working.
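
One alternative to eyeballing pictures is to compare against the golden values with a tolerance instead of exact equality. For what it's worth, Google Test does provide EXPECT_NEAR (absolute tolerance) and EXPECT_FLOAT_EQ (passes within 4 ULPs), so this may be less of a brick wall than it looks. The sketch below shows the underlying idea only, and rests on assumptions: it treats results as single precision, and the thresholds (4 ULPs, an absolute tolerance of 1e-6) are placeholders that would have to be tuned per card. The function names are hypothetical.

```python
import math
import struct

def float32_ordered_bits(x):
    """Round x to float32 and map its bit pattern to an integer monotonic in x."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; flip them so ordering is preserved.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)

def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(float32_ordered_bits(a) - float32_ordered_bits(b))

def nearly_equal(a, b, max_ulps=4, abs_tol=1e-6):
    """True if a and b agree within abs_tol or within max_ulps float32 steps."""
    if math.isnan(a) or math.isnan(b):
        return False
    if abs(a - b) <= abs_tol:          # handles values close to zero
        return True
    return ulp_distance(a, b) <= max_ulps

def matches_golden(result, golden, **tolerances):
    """Compare a computed sequence against a golden sequence element by element."""
    return len(result) == len(golden) and all(
        nearly_equal(r, g, **tolerances) for r, g in zip(result, golden))

print(nearly_equal(0.1 + 0.2, 0.3))
print(ulp_distance(1.0, 1.0 + 2 ** -23))
```

A golden test set then stops being all-or-nothing: a card with slightly different rounding still passes, while a genuinely wrong result (say, a value that is off by 0.1%) fails.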

Thursday, April 21, 2011

Dual Quaternions

I spent much of today hacking out code so that I can use dual quaternions, only to realize that there is every chance that they are significantly slower than using homogeneous coordinates.  Why?  A dual quaternion is more compact: it needs only an 8 x 1 column vector, while the equivalent rotation and translation (same power as a dual quaternion) needs a 4 x 4 matrix.  The important point, though, is that the 4 x 4 matrix may be hardware accelerated on the GPU, as mat4 is an OpenGL type.  The only other advantage that dual quaternions have is that they are relatively easy to blend (interpolate & extrapolate) over.  However, for my purposes, there won't be any blending being done; you need to rotate & translate by some fixed amount, and that's it.
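
To show that the two representations really do carry the same rigid transform, here is a small sketch that applies one rotation plus translation both ways. It is an illustration under assumptions, not the project's code: the rotation is restricted to the z axis to keep it short, all function names are hypothetical, and it says nothing about speed on a GPU. It only checks that the 8-number and 16-number forms give the same answer.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qconj(q):
    """Quaternion conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def dual_quaternion(theta, t):
    """Rotation by theta about z, then translation t, stored as 8 numbers."""
    real = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))
    dual = tuple(0.5 * c for c in qmul((0.0, *t), real))
    return real, dual

def apply_dual_quaternion(dq, p):
    """Rotate point p by the real part, then add the translation held in the dual part."""
    real, dual = dq
    rotated = qmul(qmul(real, (0.0, *p)), qconj(real))
    translation = tuple(2.0 * c for c in qmul(dual, qconj(real)))
    return tuple(r + s for r, s in zip(rotated[1:], translation[1:]))

def homogeneous_matrix(theta, t):
    """The same transform as a 4 x 4 homogeneous matrix (16 numbers)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def apply_matrix(m, p):
    """Multiply the matrix by the point in homogeneous form (x, y, z, 1)."""
    v = (*p, 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))

theta, t, p = math.pi / 2, (1.0, 2.0, 3.0), (1.0, 0.0, 0.0)
by_dq = apply_dual_quaternion(dual_quaternion(theta, t), p)
by_matrix = apply_matrix(homogeneous_matrix(theta, t), p)
print(by_dq, by_matrix)
```

With no blending involved, the two paths are interchangeable for a fixed rotate-and-translate, so the choice comes down to storage versus whatever the hardware happens to accelerate.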

For the time being, I'm side-stepping the issue.  Originally, I had the ability to embed frames within other frames, forming a mesh of frames (along with embedding objects within a frame).  Although I still allow all the embedding to happen, I don't walk out to find all the embedded parts, which means that if it isn't embedded in the top-level frame, it isn't found.  This isn't ideal, but it will allow me to get to the heart of the problem more quickly.  Once I have some really good results, I'll revisit this problem, and see what can be done about it.  The more I look at it though, the more I think I'm going to have to go with homogeneous coordinates, and just be done with it.