
Wednesday, June 22, 2011

IT'S NOT DEAD YET! IT'S FEELING MUCH BETTER!

OK, so there haven't been any posts lately, and I have a good reason for that: I was spending the time developing an algorithm called Plan B, a provably good path-planning algorithm that plans paths while accounting for the probability of a group of robots losing communications with one another.  The handy part of this algorithm is that it is designed to accept any radio propagation model I cook up, including ones based on Fourier Optics.

In addition, I talked with Joe Kider about working on this, and we've come up with a bit of a plan on what we're going to do.  That is the reason for this post.

So, in a vaguely ordered form:
  1. We need to step back from doing code, and do math first.  I spent some time learning about the different ways that the math for light is calculated in normal space, but neither Joe nor I have any good ideas on how some of that translates into Fourier space.  That means that we need to figure out things like reflection, refraction, polarization, etc. in an analytical form in Fourier space.  If we can do that, then just as we can calculate the affine transforms of a triangle in Fourier space, we can do the rest there as well.  Done right, that should reduce the amount of work the CPU/GPU/box of highly caffeinated squirrels needs to do, and increase accuracy (it reduces the work because we won't have to do a forward FFT, although the inverse FFT still needs to be done).
  2. I haven't discussed this with Joe yet, but I want to switch away from CUDA and do OpenCL instead.  From what I heard from one of the other guys in GPGPU, OpenCL is finally starting to catch up with CUDA in performance, and to surpass it in some areas.  That is a Good Thing™, as it means that the code we write could run on many different vendors' platforms.  The only downside is that there aren't any really good FFT libraries for OpenCL yet, so we'd need to write one.  The good thing about that is that the FFT is an exceedingly well understood transform, so we can cut our teeth learning OpenCL by writing the FFT library we need, and comparing its output to other, known-good implementations like FFTW (see the sketch after this list).
  3. Much of the code I wrote originally is still usable, but it needs some cleanup.  I've learned a great deal more about CMake and Google Test than I knew when I wrote the original code, so one of the first tasks is to reorganize everything so that it works correctly within CMake's and Google Test's concept of 'correct'.  Some bits are probably going to go away, and other bits are going to stay.  More than likely, we're going to have to create several layers of code, to act as different abstraction layers.  So, one of the first things that will likely happen is that we put together a good FFT project, and ignore the current code until the FFT part works (at least a little bit).
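
Since item 2 comes down to validating a home-grown FFT against known-good implementations, here's a minimal sketch of the kind of correctness oracle I have in mind: a brute-force O(n²) DFT in plain C++ that any real FFT (OpenCL, cuFFT, FFTW) should agree with to within floating-point tolerance.  The function name is mine, not part of any existing code.

    #include <complex>
    #include <vector>
    #include <cmath>

    // Brute-force O(n^2) DFT, used only as a correctness oracle for faster FFTs.
    std::vector<std::complex<double> >
    naive_dft(const std::vector<std::complex<double> >& in)
    {
        const std::size_t n = in.size();
        const double pi = std::acos(-1.0);
        std::vector<std::complex<double> > out(n);
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t j = 0; j < n; ++j) {
                const double angle = -2.0 * pi * double(k) * double(j) / double(n);
                sum += in[j] * std::complex<double>(std::cos(angle), std::sin(angle));
            }
            out[k] = sum;
        }
        return out;
    }
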
And that about wraps up where things are at right now.  More than likely, the wiki is what will get updated, and that will be about it for a while.  Once we have all that really nailed down, THEN we'll move on to making the code good.

Tuesday, April 26, 2011

??? CuFFT what ARE you doing ???

OK, so here is what I've figured out so far.
  • CuFFT doesn't fail all the time; it fails every OTHER time it's called.
  • When it does fail, which part fails is data dependent; that is, if you break up line 466 of FourierOptics/src/Private/FourierOptics_TriangleAccumulator.cu so that pointAccumulator has its real and imaginary parts updated separately, then comment out the update of the imaginary part only, CuFFT doesn't fail, provided you input only one triangle and it is in a 'goldilocks' size range (see the sketch below).  I have yet to find a range of sizes and positions that makes both the real and imaginary parts succeed.  Regardless, this is wrong; it shouldn't fail just because the data is an unexpected size.
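
For anyone trying to follow along at home, here is roughly what I mean by splitting the update.  This is a reconstruction for illustration only, not the actual contents of line 466; cuFloatComplex is just a float2, so the real part lives in .x and the imaginary part in .y.

    #include <cuComplex.h>  // cuFloatComplex, cuCaddf, cuCrealf, cuCimagf

    // Illustration only -- not the real FourierOptics_TriangleAccumulator.cu code.
    void accumulate_split(cuFloatComplex* pointAccumulator, int idx,
                          cuFloatComplex contribution)
    {
        // Combined update (the normal path):
        //     pointAccumulator[idx] = cuCaddf(pointAccumulator[idx], contribution);

        // Split update (what I did while debugging):
        pointAccumulator[idx].x += cuCrealf(contribution);    // real part
        // pointAccumulator[idx].y += cuCimagf(contribution); // imaginary part, commented out for the test
    }
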
I am going to continue investigating this, but won't really be able to do so for another week; other projects demand my time.  However, I talked with Joe Kider and know that he's interested in working on Fourier optics as well; turns out that it is getting interest in the graphics community.  So I'm going to work with him over the summer to see what can be done. 

Monday, April 25, 2011

Rules to live by

Don't upgrade your drivers before a deadline.  You will be unhappy.

As for the movie, it's still uploading to bitbucket.  And it's after midnight now... yay.  Whenever it finishes uploading, I'll post a link to it.  You can pull the code as it is, but I want to figure out WHY cufft is failing on the inverse transform, so I'm going to keep on debugging while the movie slowly (oh, SO slowly) uploads.  If it turns out it's something in my code that wasn't tripping up the old drivers, I'll post that as well.

EDIT It's been 20 minutes, and it's still uploading.  Which is unfortunate, as there is a chance I can use cuda-gdb to figure out what's going on, but to do so, I have to log out and then log back in in console mode, which means interrupting the upload.  At this point, I'm going to go to bed and let the upload take as long as it takes.  The movie will get up there in its own sweet time.

EDIT 2  It finished uploading sometime overnight.  You can download it here.  The movie is cheesy because everything went wrong; I'm going to keep on beating on cuFFT to figure out why it's not doing the right thing, but if I can't figure it out, eventually I'm going to rip it out and use FFTW instead, or maybe look into OpenCL-based implementations.  At the very least, I want to write a wrapper that makes all of the various flavors look the same, so I can have CPU & GPU unit tests that can cross-compare.  With enough time, that should make things work right (I hope).
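
To make the cross-comparison idea a little more concrete, this is roughly the shape of wrapper I have in mind; the names are placeholders, and the backends would be cuFFT, FFTW, or whatever OpenCL implementation eventually shows up.

    #include <algorithm>
    #include <complex>
    #include <vector>

    // Placeholder interface so the CPU and GPU FFT backends look identical to
    // the unit tests; a cuFFT-backed and an FFTW-backed class would both
    // implement this.
    class Fft1D {
    public:
        virtual ~Fft1D() {}
        virtual std::vector<std::complex<float> >
        forward(const std::vector<std::complex<float> >& in) = 0;
        virtual std::vector<std::complex<float> >
        inverse(const std::vector<std::complex<float> >& in) = 0;
    };

    // Cross-comparison helper: push the same input through two backends and
    // report the largest element-wise difference.
    inline float max_difference(Fft1D& a, Fft1D& b,
                                const std::vector<std::complex<float> >& in)
    {
        const std::vector<std::complex<float> > ra = a.forward(in);
        const std::vector<std::complex<float> > rb = b.forward(in);
        float worst = 0.0f;
        for (std::size_t i = 0; i < ra.size() && i < rb.size(); ++i)
            worst = std::max(worst, std::abs(ra[i] - rb[i]));
        return worst;
    }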

Meh, onto other projects

Almost at the deadline!

I'm still working out how to display the output of my code.  I've downloaded and installed the cairo graphics package, and unlike AGG, it works.  Now the trick is to figure out its API between now and midnight! :(

Oh well, at least the poster and paper are up:
The only problem I've got when I look at both of those is realizing how much more I want to get done in order for this code to be complete.  It's never-ending...

Saturday, April 23, 2011

AGG

In my last post, I mentioned trying to use AGG to visualize my output.  Unfortunately, it doesn't even compile on my computer.  That makes displaying the output impossible at the moment, unfortunately.  The other ways I know of are matplotlib (which requires python) and matlab (which I have no idea how to connect to in order to get it to do what I want).  I have some idea of how to do it with a mac, but the mac REALLY DOES NOT LIKE IT when you try to beat it into doing what you want in that way, which means many, many days of work to make that happen.  Right now, I'm fresh out of ideas, so I'm going to work on my poster and video instead.  Maybe I'll think of some nifty visualization of my output later.

Testing

I've finally gotten all my CUDA code in place, and I've realized that I've hit a brick wall.  I've written a number of unit tests using googletest, which is a fairly nice unit testing framework, and my code passes all of my tests.  The problem is that it is limited, as all unit testing frameworks are, in that it can really only test for equality well.  This is a big problem for the code I'm writing because after reading through quite a bit of material on the web, I've come to realize that not all cards support IEEE 754 in quite the same way, or as completely, as they should.  That means that even if I had a golden test set to test against, I could fail the test, while still being reasonably accurate.  The only way I may be able to bypass this problem is if I can render my output to a file or to the screen directly, and then eyeball it.  Towards that end, I just downloaded the AGG library, which claims to use only portable C++ to render images to files.  I'm going to briefly look at it, and see if I can get it to do what I want it to do (make pretty pictures) in the amount of time I have left.  However, since this is a brand-new (to me) API, and since I STILL don't have a poster or video up, if I can't get it all working within an hour or less, I'm going to have to abandon it to get my presentation working.
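
One way around the strict-equality problem that I'll probably end up using anyway is googletest's tolerance assertion, EXPECT_NEAR.  A sketch (the two loader functions are made-up stand-ins for however the golden data and GPU results actually get produced):

    #include <gtest/gtest.h>
    #include <vector>

    // Stand-ins for the real loaders -- replace with actual golden data and the
    // actual GPU computation.
    static std::vector<float> load_golden_field()
    {
        const float data[] = { 0.0f, 0.5f, 1.0f };          // pretend golden data
        return std::vector<float>(data, data + 3);
    }
    static std::vector<float> compute_field_on_gpu()
    {
        const float data[] = { 0.0f, 0.50001f, 0.99998f };  // pretend GPU output
        return std::vector<float>(data, data + 3);
    }

    TEST(FourierOptics, MatchesGoldenWithinTolerance) {
        const std::vector<float> expected = load_golden_field();
        const std::vector<float> actual   = compute_field_on_gpu();
        ASSERT_EQ(expected.size(), actual.size());

        // EXPECT_NEAR tolerates the small per-card IEEE 754 differences that a
        // straight equality check would flag as failures.
        for (std::size_t i = 0; i < expected.size(); ++i)
            EXPECT_NEAR(expected[i], actual[i], 1e-4f) << "element " << i;
    }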

Thursday, April 21, 2011

Dual Quaternions

I spent much of today hacking out code so that I can use dual quaternions, only to realize that there is every chance that they are significantly slower than using homogeneous coordinates.  Why?  Because a dual quaternion requires an 8 x 1 column vector to represent, while a rotation and translation (the same expressive power as a dual quaternion) requires a 4 x 4 matrix; the important point is that the 4 x 4 matrix may be hardware accelerated on the GPU, as mat4 is an OpenGL type.  The only other advantage that dual quaternions have is that they are relatively easy to blend (interpolate & extrapolate) over.  However, for my purposes, there won't be any blending being done; you need to rotate & translate by some fixed amount, and that's it.
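
For the curious, the comparison boils down to this sketch (my own structs, not anything from an existing library): eight floats for the dual quaternion versus sixteen for the homogeneous matrix, but the matrix is exactly what the mat4 hardware paths are built for.

    // Sketch only: comparing the two representations of a rigid transform.
    struct DualQuat {
        float real[4];   // rotation quaternion
        float dual[4];   // translation-carrying dual part -- 8 floats total
    };

    struct Mat4 { float m[4][4]; };   // 16 floats, but maps straight onto OpenGL's mat4
    struct Vec4 { float x, y, z, w; };

    // Applying a homogeneous rotation + translation is a plain matrix-vector
    // multiply, which the GPU already accelerates.
    Vec4 transform(const Mat4& M, const Vec4& p)
    {
        Vec4 r;
        r.x = M.m[0][0]*p.x + M.m[0][1]*p.y + M.m[0][2]*p.z + M.m[0][3]*p.w;
        r.y = M.m[1][0]*p.x + M.m[1][1]*p.y + M.m[1][2]*p.z + M.m[1][3]*p.w;
        r.z = M.m[2][0]*p.x + M.m[2][1]*p.y + M.m[2][2]*p.z + M.m[2][3]*p.w;
        r.w = M.m[3][0]*p.x + M.m[3][1]*p.y + M.m[3][2]*p.z + M.m[3][3]*p.w;
        return r;   // for points, w stays 1 under a rigid transform
    }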

For the time being, I'm side-stepping the issue.  Originally, I had the ability to embed frames within other frames, forming a mesh of frames (along with embedding objects within a frame).  Although I still allow all the embedding to happen, I don't walk the mesh to find all the embedded parts, which means that if something isn't embedded in the top-level frame, it isn't found.  This isn't ideal, but it will allow me to get to the heart of the problem more quickly.  Once I have some really good results, I'll revisit this problem and see what can be done about it.  The more I look at it though, the more I think I'm going to have to go with homogeneous coordinates, and just be done with it.

Saturday, April 16, 2011

Computational Holography

I've been going over the math for Fourier optics carefully for several days now, and have come to realize that what I'm actually doing is computational holography. This may appear surprising at first, but if you think about what I'm trying to do, it'll make sense.  When you create a hologram, what you're doing is recording the interference patterns of light reflected off of an object.  The only difference between what I'm doing and what a holographer does is that I'm not interested in reconstructing the scene, I'm interested in the interference patterns to determine how much power is received at some particular point in space.  I've also come to realize that pure computational holography may be inadequate to fully describe a scene; none of the papers or sources I've read take reflection into account.  That said, I do have a plan to handle reflection, provided I can handle a few, relatively simple cases first.  Basically, I'll do what any good ray-tracing program would do, and let surfaces accumulate light that they re-radiate.  However, that will likely be outside of the scope of what I'm trying to do for my class; it's just something that I need to keep in mind for the future.

Here are some papers to take a look at:

Friday, April 8, 2011

Memory Management

This is an open reply to Patrick's comment, which was important enough that it really needs to be addressed sooner rather than later.

The architecture I'm putting together for the Fourier optics library allows a mesh of contexts to form, including cycles of contexts.  Patrick pointed out that if you try automatic memory management (garbage collection, reference counting, etc.) you can get into trouble when you have a cycle.  It can get MUCH worse if you include the fact that there may be multiple threads of control, multiple processes, and (worse) a distributed network of machines that have slow, unreliable connections to one another.  In short, we're talking about 'the cloud'... and all the problems that entails.  This is many, many PhD. dissertations in itself.  It is not a problem I intend to try to solve myself!

Instead, I'm going to require that users of the library be aware of the objects that they are allocating, and know when they should all be live, and when they are dead.  In short, I'm passing the buck to you, the end user.

The bit of good news is that every object in the framework has its own UUID; you can query any object for its UUID, and you will know that if two objects have the same UUID, then you're really talking about the same object.  In addition, since UUIDs are universally unique, it means that distributed garbage collection (garbage collection across a network of machines) becomes a real possibility.  I am not going to implement this, but if you have the urge to do so, go ahead.
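
As a sketch of what that buys you (the struct and function here are illustrative, not the real API): identity checks reduce to comparing 16 bytes, no matter where the two handles came from.

    #include <cstring>

    // Illustrative only -- not the real framework types.
    struct fo_object {
        unsigned char uuid[16];   // every object carries its own UUID
        // ... the rest of the object ...
    };

    // Two handles denote the same object exactly when their UUIDs match, even
    // if the handles were obtained on different machines.
    bool fo_same_object(const fo_object& a, const fo_object& b)
    {
        return std::memcmp(a.uuid, b.uuid, sizeof a.uuid) == 0;
    }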

A note about copying objects

Copying implies different objects.  That means that even if the objects are equivalent, they are two separate chunks of memory.  For this reason, I require that they have separate UUIDs.  This can become extremely confusing if anyone tries to implement proxy objects; after all, the proxy is a stand-in for the real object, so what does it mean to copy the proxy, or to delete it?  While this isn't a big deal if you only have one process, it becomes a much bigger problem if you want something like distributed objects that are shared across the network.  I have no idea how to handle this well.  Again, it's up to you, the end user, to cook up something intelligent.

Let me know if anyone has a good solution that works across multiple platforms; I'm not touching any of that!

Wednesday, April 6, 2011

Updated API

Bit by bit, I'm getting there with the API.  Here's what I've added:
The two pieces above imply that I'm using a hierarchy, where there is some kind of tree-based system; I'm not.  Two contexts, A & B, can actually have each other as children; that is, A can be a child of B, and B can be a child of A.  There are a couple of reasons for permitting this:
  • Cameras/sensors/antennas/robots can all be placed at the origin of their own reference frame.  This means that if there are 3 robots, A, B, and C, A might want to know what C can see from A's point of view, and B might want to do the same thing from B's point of view.  If we calculate C's output across the whole world from within its own point of view, and then translate this into A & B's points of view, we don't need to waste time/energy recalculating the whole mess multiple times in a row.
  • If we have some relatively enclosed, and many-times-repeated, system, it makes more sense to calculate what it radiates/receives once, and then copy & translate it to new positions.  For example, a really weird illumination source, like a candle within a lantern, may take a long time to calculate out, but once the light leaves the source, it is no longer affected by the glass, plastic, lava lamp fluid, etc. of the lantern.  Save all the illumination information, and replace the lantern with a sphere that radiates differently depending on which point of the sphere is visible.  You still have to trace the rays out of the surface of the sphere, but you don't have to trace how the light bounces around inside of the lantern to get out.  The downside to this is that every lantern will be a copy of every other, but depending on what you're trying to model, this might be acceptable.
Also note that despite what you might think, it is extremely easy to break cycles; a breadth-first or depth-first search will handle it without any trouble.
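
Concretely, 'breaking' a cycle during traversal just means remembering what you've already visited; a minimal sketch (Context here is a stand-in for the real type):

    #include <cstddef>
    #include <queue>
    #include <set>
    #include <vector>

    // Stand-in for the real context type: each context knows its children, and
    // children are allowed to point back at their parents (A can be a child of
    // B while B is a child of A).
    struct Context {
        std::vector<Context*> children;
    };

    // Breadth-first walk that touches every reachable context exactly once,
    // no matter how many cycles the mesh contains.
    void visit_all(Context* root)
    {
        std::set<Context*> seen;
        std::queue<Context*> frontier;
        frontier.push(root);
        seen.insert(root);
        while (!frontier.empty()) {
            Context* c = frontier.front();
            frontier.pop();
            // ... per-context work goes here ...
            for (std::size_t i = 0; i < c->children.size(); ++i) {
                Context* child = c->children[i];
                if (seen.insert(child).second)   // .second is false if already seen
                    frontier.push(child);
            }
        }
    }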

I've also updated the illumination API slightly; right now, the only things you can change are the total energy and the frequency of the illumination.  In the future, I'm hoping to add the ability to specify ranges of frequencies, but that will require a great deal of time, and a great deal more thought on how to do it well.  Note that the illumination specifies something akin to a single photon or light packet.  That means that one illumination source will be generating many, many of these photon-like objects at one time.  I'm not sure if that is the best way of doing things yet, but it's a start.

Finally, and perhaps most oddly, I've decided to use the GNU MP Library to pass in all scalar information as rational numbers.  Internally, I use floats or doubles, but by having the interface use rationals, I leave open the possibility of arbitrary precision sometime in the future.  At the moment, my plan is to use the rationals to calculate combined rotations (since I can have an arbitrarily deep ancestor/descendant chain, accuracy can be a problem), and then convert the rotations into floats for use on the GPU.  My plan is that if you have a higher-precision card (one that uses doubles instead of floats), the precision can automatically scale up.  In short, no built-in limits to my API!
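
A quick sketch of what that boundary looks like with GMP's mpq_t rationals (the fo_-prefixed function name is made up):

    #include <gmp.h>

    // Sketch: the public API takes an exact rational; the internals convert to
    // floating point as late as possible.  The function name is hypothetical.
    double fo_rational_to_double(long numerator, unsigned long denominator)
    {
        mpq_t value;
        mpq_init(value);
        mpq_set_si(value, numerator, denominator);  // exact, no rounding yet
        mpq_canonicalize(value);                    // reduce to lowest terms

        // Only here do we drop to double (or float, for today's GPUs).
        double result = mpq_get_d(value);

        mpq_clear(value);
        return result;
    }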

Monday, April 4, 2011

Interest

Well, my project is approximately 12 days old, but when I mentioned what I was working on at an all-day meeting on Friday, I got immediate interest from 2 different groups, along the lines of 'why isn't it done yet?'.  I knew that there was a need for fast, accurate radio propagation software, but I didn't realize HOW MUCH need there was! 

On to more prosaic matters.

I spent a little while finding a good UUID generation library, which I now have.  I'm using the OSSP UUID library for all the reasons I outlined in the wiki (which I updated), and more reasons besides.  I still haven't decided if I should wrap their functions to make it easy to port to other platforms or not, but I think that for the time being I'm not going to bother wrapping them.  I don't have enough time to do so at the moment, and the UUIDs will never be exposed in the public API, so in the worst case, someone is going to foolishly try to extract a UUID, and then get burned when the internals all suddenly change.  I'm fine with that; private APIs should never be linked against, as that breaks modularity in a big way.
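
For reference, generating and printing a UUID with OSSP looks roughly like the following.  This is from memory, so check it against the OSSP documentation before trusting the details.

    #include <uuid.h>    // OSSP uuid
    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        uuid_t* id  = NULL;
        char*   str = NULL;

        uuid_create(&id);                            // allocate a UUID object
        uuid_make(id, UUID_MAKE_V4);                 // generate a random (v4) UUID
        uuid_export(id, UUID_FMT_STR, &str, NULL);   // render it as a string

        std::printf("%s\n", str);

        std::free(str);                              // uuid_export allocated it for us
        uuid_destroy(id);
        return 0;
    }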

I also slightly reorganized my code so that everything that is supposed to be private will live in the Private directory from now on.  I can't think of any way to make it more clear that you shouldn't link to stuff down there than that, but I know someone is going to try anyways.

Finally, I think I'm going to have to go to a client/server architecture; I need to cache a lot of stuff on the backend, and it's the only way to get the necessary performance.  I was originally thinking I could do everything in C, but I need fast, well-written containers.  I can use uthash, but I know the STL is supported absolutely everywhere, which means C++.  I don't like it, but it's practical, and it lets me use Boost and Thrust, if I really need them.  I'm going to keep the public API in C though; it makes getting SWIG working easier, if I decide to add it in the future.  Thoughts?
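
The shape I have in mind is the usual one: a plain C surface over C++ internals, something like this sketch (every name here is a placeholder):

    // fourier_optics.h -- the public face stays plain C, which keeps SWIG and
    // other language bindings simple.
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct fo_context fo_context;   /* opaque to callers */

    fo_context* fo_context_create(void);
    void        fo_context_destroy(fo_context* ctx);

    #ifdef __cplusplus
    }
    #endif

    // fourier_optics.cpp -- behind the opaque pointer the implementation is
    // free to use the STL (and Boost or Thrust if it comes to that).
    #include <map>
    #include <string>

    struct fo_context {
        std::map<std::string, double> cache;   // backend caching lives here
    };

    extern "C" fo_context* fo_context_create(void)        { return new fo_context(); }
    extern "C" void fo_context_destroy(fo_context* ctx)   { delete ctx; }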

Sunday, March 27, 2011

CMake

Is anybody a CMake guru who can let me rapid-fire dumb questions their way?  I've got ideas on how I'm going to make my API work, but now I need to really nail down the build system.  I have a number of external dependencies that I can either require the user to download and install on their own, or I can find some way of grabbing all of them for the user and putting them together into a package.  The ideal method would involve my telling CMake what those external dependencies are and where to find them, and letting it handle the chore of making sure that they are properly installed on whatever system the user has, offering to download and install them if they aren't available, or failing gracefully if it can't.  Note that I really don't want to be responsible for updates to external dependencies; as far as is possible, I want them to 'just work'.  That will let me concentrate on coding, rather than packaging.

Tuesday, March 22, 2011

Current plan

After reading through various papers, and talking to Dr. Jaggard, I've realized that I can't tackle everything there is in the world of optics at one time.  I have to pick and choose my fights.  For that reason, I'm going to try to implement the following capabilities in the following order:
  1. Unpolarized light, in a world of perfect insulators and perfect conductors.  This is probably the simplest physical model of light, because light is just electromagnetic radiation that we can see.  Since it has an electric field, a wave that hits a perfectly conducting surface stops right there; nothing gets transmitted through.  That means that none of the objects in this model will have any transparency whatsoever, except for the air, which I'm going to model as a perfect vacuum.
  2. Polarized light.  The next simplest thing to work on is polarized light; in this case, the most general model is elliptically polarized light, which can be analyzed using Jones Calculus (see the toy sketch after this list).  The handy part about this is that if you squint a little, Jones Calculus starts to look like a rotation matrix that a first-year student has messed up somehow (it doesn't have to be skew symmetric).  Now, the reason why I need to get polarized light working quickly is because my primary interest in Fourier optics is for modeling radio propagation, where my antennae are all dipoles.  Two dipoles that both point in the same direction will be able to transmit to one another, but two dipoles that are 'crossed' will not be able to transmit much energy to each other (this is a simplified model; if you are a radio engineer reading this, don't shoot me! :)  OTOH, if you want to improve my model, feel free to comment here, or better yet, enter an issue at the issues page.)  Note that as long as the world is full of perfect conductors or insulators, the materials in the world will not be able to polarize light; the light is created in a polarized state, and all we need to worry about is whether the receiving antennae are in the right orientation to receive the incoming information.
  3. Hybrid raytracing/Fourier Optics model.  The problem with performing the Fourier transform across all of space is that it is expensive to do, with no guarantee that you'll really use all of the calculation that you so expensively performed.  In fact, once you're past about 10 wavelengths from where your photon packet is, you no longer really need to think about the wave nature of light; the ray cleanly missed the object.  One solution to that would be 'fat' rays; project a ray just as you would for ray tracing, but make each ray a solid rectilinear beam.  Do the FFT on the volume that the beam travels through, instead of all of space.  This will need to be optimized so that when the beams overlap in space, you don't waste time recomputing the FFT over that region, but since I know what the viewing frustum is like, and how big my beams are, I can easily calculate the region of space near the camera where they will overlap (yes, you can come up with edge cases where they don't overlap, but my suspicion is that the time wasted in determining whether you overlap or not will be greater than just doing the FFT and throwing away anything close-by that you don't use).
  4. Imperfect conductors.  At long last, we arrive at a more realistic materials model!  The index of refraction can take on any value, because the permittivity and permeability of objects can vary.  Note that at this point, things get interesting (meaning hard), because both the permittivity and the permeability are rank 2 tensors.  This means that the permittivity and permeability of some small volume depend on where it is (a volume within a chunk of glass is different from a volume in air), and on the orientation of the volume (this affects how the material polarizes light that passes through it).  In addition, the permittivity and permeability affect how quickly light is extinguished within the material; if you think about what THAT means, it means that both must be frequency dependent, otherwise we'd never have colors.  Finally, you also have reflection and refraction.  Thus, modeling this is the most complex problem to date.
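
To make the dipole point in item 2 concrete (this is the toy sketch promised there): the power a linear dipole can pick up goes like the squared magnitude of the projection of the incoming polarization onto the dipole's axis, so parallel dipoles couple fully and crossed dipoles couple essentially not at all.

    #include <cmath>
    #include <complex>
    #include <cstdio>

    // Toy Jones-vector bookkeeping: complex field amplitudes along x and y.
    struct Jones { std::complex<double> ex, ey; };

    // Fraction of incident power a linear dipole oriented at 'angle' radians
    // from the x-axis can pick up: |projection onto the dipole axis|^2.
    double coupled_power(const Jones& in, double angle)
    {
        const std::complex<double> proj =
            in.ex * std::cos(angle) + in.ey * std::sin(angle);
        return std::norm(proj);
    }

    int main()
    {
        const double half_pi = std::acos(0.0);
        Jones tx;                       // unit-power, linearly polarized along x
        tx.ex = std::complex<double>(1.0, 0.0);
        tx.ey = std::complex<double>(0.0, 0.0);

        std::printf("parallel dipoles: %.3f\n", coupled_power(tx, 0.0));      // 1.000
        std::printf("crossed dipoles:  %.3f\n", coupled_power(tx, half_pi));  // ~0.000
        return 0;
    }
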
The hard part with all of the items above is determining how to correctly model light, and how to correctly model the permittivity and permeability of the material.  In addition, my need to perform ray-tracing directly affects where my code ends up in the OpenGL pipeline.  I can do all of point 1 in CUDA, and then emit the answer straight into a fragment buffer, and let it do all the heavy lifting of displaying the output.  However, the fragments won't have any information on how the rays went through space; I need to have intersection information, as well as be able to generate new rays when there is reflection or refraction.  That suggests that I need to tap into the pipeline after the vertices are known, which means after the vertex shader stage (or geometry shader stage, if there is one) is done.  Taking everything together, my current plan is as follows:
  1. Define a world coordinate system from which all orientations are derived.  For the simple reason that I'm most familiar with it, I'm going to define a right-handed coordinate system where 'z' is 'up', 'x' is to the 'right', and 'y' is 'forwards'.  Along with this, I define the angles θ and ϕ, where θ ranges over [0,π] and 0 is parallel to the z-axis, pointing 'up', and where ϕ ranges over [0,2π), and 0 is along the x-axis, pointing right.  ϕ = π/2 points along the y-axis, pointing 'forwards'.
  2. Define every ray of light as a 4-vector (intensity, frequency, θ, ϕ) (see the struct sketch after this list).  The downside of this model is that if a ray has multiple colors, then I will have to fire the same ray through the same medium multiple times.  Unfortunately, I can't think of a better method of handling this.  I will initially treat all rays as having an intensity of 1, a single frequency, and as linearly polarized in the vertical direction.
  3. Model each cubic volume as having two rank 2 tensors, one for the permittivity, and one for the permeability.  Initially, these will be set either to the permittivity and permeability of free space, or to those of a perfect conductor.
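
Spelled out as data structures, items 2 and 3 of that list amount to something like this sketch (the names are mine):

    // Item 2: a ray is a 4-vector of (intensity, frequency, theta, phi).
    struct Ray {
        float intensity;   // start everything at 1
        float frequency;   // a single frequency per ray for now
        float theta;       // [0, pi]; 0 is parallel to +z ('up')
        float phi;         // [0, 2*pi); 0 is along +x, pi/2 along +y ('forwards')
    };

    // Item 3: each cubic volume carries two rank-2 tensors, stored as 3x3
    // matrices; initially free space or a perfect conductor.
    struct VoxelMaterial {
        float permittivity[3][3];
        float permeability[3][3];
    };
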
I still have a lot more to think about, especially the ray-tracing part.  Too much work to do!

Talked to Dr. Jaggard

I talked to Dr. Jaggard yesterday about Fourier optics.  He gave me some suggestions, including looking at Classical Electrodynamics by J.D. Jackson, and the journals Radio Science and IEEE Transactions on Antennas and Propagation.  So this morning I checked out Classical Electrodynamics, and quickly read the chapter on propagation.  Unfortunately, I don't think that that chapter will be of much use; it breaks up the regions of radio propagation into very near, intermediate, and far, and then makes approximations for each of the regions so that you can solve them by hand.  I'm not solving anything by hand, so I don't need those approximations.  So at this point, I need to re-derive the equations.  This isn't particularly difficult as I'm going to be using many different books as guides, but it does increase my workload.

I'm also trying to figure out what translation and rotation in Fourier space really mean.  Ideally, I'd find a general solution for an object, and then do an affine transform on its Fourier transform to get any translated, rotated, or scaled version of it that I want.  I don't know yet whether this works out in general, or whether it is better to just do a pure FFT each time the environment changes.
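
The standard shift and rotation theorems suggest that, at least for rigid motions, it should work (up to sign and 2π conventions):

    translation:  F{ f(x − a) }(k) = e^(−i k·a) · F{ f }(k)
    rotation:     F{ f(R⁻¹x) }(k) = F{ f }(R⁻¹k)   for any rotation matrix R

That is, a shift of the object becomes a pure phase ramp in k-space, and a rotation of the object is just the same rotation of its transform.  Whether the rest of the pipeline (reflection, refraction, real materials) behaves as nicely in Fourier space is the part that is still open.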

I'm also trying to see if it is possible to perform limited Fourier transforms; that is, since I know the wavelengths I'm interested in (which is always some range), is it possible to reduce the amount of calculation necessary, so that instead of integrating over everything, I only integrate over those spatial frequencies that are near the wavelengths of interest?  This may not be possible simply because of resonance; if I have a cavity (like the inside of a hallway) which happens to be a multiple of the wavelength, then there will be bright and dark spots.  If I don't include the higher spatial frequencies, I risk making the hallways look 'blobby' rather than rectilinear, which means that resonance won't occur.

Most importantly, I still don't know if I can take one 3D spatial Fourier transform, and then convolve my cameras/lights whenever and wherever I want.  I think I can, but I can't state that for certain.

Monday, March 21, 2011

Design process

Designing a new API is one of the hardest things I have ever done.  Whatever you come up with, you're stuck with it forever... at least, once you hit 1.0.  Fortunately for me, I'm not anywhere near there yet, so I can make modifications as I see fit.  The downside is that I can see many, many different ways of designing the code, each of which has its own good and bad points.  Here are some of the problems I'm dealing with:
  • Choice of language - When choosing a language to write code in, you have to choose between performance and convenience, availability and modernity, etc.  Modern languages like OCaml and Python are powerful, and have sufficiently strong semantics that once you get your code compiling, you've probably solved most of your bugs already.  Unfortunately, their performance is pretty poor, and there is no guarantee that you'll find compilers for various platforms.  That forces your hand.  In my case, I had to choose between C and C++.  I've decided to use C, specifically C99, because it is widely supported, and because many of the higher level languages have the ability to bind to a C API.
  • Maintenance of state information - Next is how to maintain state information across function calls.  Older libraries maintained state information in a user-inaccessible global state variable or variables.  The problem with this approach is that more and more programs are multi-threaded; if each thread requires its own state information, you need to develop a method of doing so that is efficient, yet invisible to the user.  This becomes MUCH more problematic if a user wants different threads to share a context, or to have multiple contexts, or has multiple GPUs.  For that reason, my API has an opaque context pointer type that points to a context allocated on the heap.  You can have an arbitrary number of contexts, and share them in any manner you wish.
  • Allocating & deallocating memory - malloc() and free() may be the C standards, but they aren't necessarily the most useful calls for an API, especially one that is designed to take advantage of a GPU.  Towards that end, I've designed my own memory allocation and deallocation routines (see the sketch after this list).  They are designed to make debugging simpler, and to ensure that the memory alignments necessary to make use of the GPU are adhered to.
  • Primitives - This is the part of the API that I'm having the most trouble with at the moment.  I have several different options, none of which are particularly good:
    • NURBS + material specification - This is by far the most general method I can think of.  Every object would be specified as a NURB surface, along with some information about the material it is made of (index of refraction, polarization information, transmittance and reflectance information for different optical frequencies, etc.).  With this information, the library performs the full simulation (Fourier transform, convolution, inverse transform) to determine what light does when it travels through the material.  There are a number of problems with this method though:
      • NURBS don't handle sharp corners well; you either have discontinuities, which can't be transformed, or you have very curved surfaces, but no sharp corners.
      • I'd have to figure out an analytic solution to a NURB surface.   While this can be done, I'd really rather not.
    • Library of primitive objects - I could define a small set of primitive objects, and require the end user to develop more complex objects from these primitives.  While this is conceptually simple (I get to choose the primitives, so I'll choose ones where the Fourier transform is easy), it means that the end user needs to build their objects from the library I provide.  The biggest problem with this is the interfaces between objects; regardless of how hard a user tries, there will be gaps between objects.  Those gaps will lead to numerical instabilities.  Worse, some objects require very large numbers of primitives; for example, how would you model a light bulb?  To a first order, it is a sphere within a sphere, but the neck is not a sphere, a cylinder, or anything else.  The only way to model it is via successive approximation, which can easily run out of memory.
    • Voxels + 3D FFT - A completely different approach from the analytical one is to use the discrete Fast Fourier Transform across a 3D voxel array.  The good part about this approach is that regardless of the shape of the primitive, you can always find a discretized version of it to perform the FFT on.  Even better, if you tap into the OpenGL pipeline at the appropriate location, you can pull out a 3D texture, which means that the hardware did all the heavy lifting for you.  You also get all of OpenGL for free; you're tapping into the pipeline after it's been processed.  But on the downside, the smallest object you can handle is one voxel in size; the problem with this is that the wave nature of light is most important for objects that are on the order of the wavelength of light.  Visible wavelengths are on the order of 380nm to 780nm (http://en.wikipedia.org/wiki/Visible_light), which, thanks to the Nyquist theorem, means that our voxels need to be on the order of 190nm on a side.  Assuming a laughably small 1 bit/voxel storage requirement, that means 1 cubic meter will require ~2^64 bytes of memory.  Clearly, a pure voxel approach won't work.
  • Rotation/translation - Beyond the problems outlined above, I don't know if the mathematics remain unchanged when an object is rotated or translated.  Thus, precalculating what a sphere looks like may be pointless simply because I can't move the sphere to a different location in space, and expect it to remain the same mathematically.
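
As a sketch of what the allocation bullet above is getting at (the fo_ names are placeholders, and the block is written so it compiles as either C99 or C++): one choke point for memory means debugging aids and GPU-friendly alignment both live in one place.

    #include <stdlib.h>
    #include <string.h>

    /* Placeholder names, not the real API.  One choke point for allocations so
     * that (a) debugging aids like poison patterns live in one place, and
     * (b) the alignment the GPU wants (16 bytes here, for float4-style loads)
     * is always enforced. */
    #define FO_GPU_ALIGNMENT 16

    void* fo_malloc(size_t size)
    {
        void* p = NULL;
        if (posix_memalign(&p, FO_GPU_ALIGNMENT, size) != 0)  /* POSIX aligned allocation */
            return NULL;
        memset(p, 0xAB, size);   /* poison: uninitialized reads stand out in a debugger */
        return p;
    }

    void fo_free(void* p)
    {
        free(p);                 /* memory from posix_memalign is free()-able */
    }
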
At the moment, I have no idea which of those primitive options will be the best method.  If anyone has any ideas, post in the comments!