Search This Blog

Monday, March 21, 2011

Design process

Designing a new API is one of the hardest things I have ever done.  Whatever you come up with, you're stuck with it forever... at least, once you hit 1.0.  Fortunately for me, I'm not anywhere near there yet, so I can make modifications as I see fit.  The downside is that I can see many, many different ways of designing the code, each of which has its own good and bad points.  Here are some of the problems I'm dealing with:
  • Choice of language - When choosing a language to write code in, you have to choose between performance and convenience, availability and modernity, etc.  Modern languages like OCaml and Python are powerful, and have sufficiently strong semantics that once you get your code compiling, you've probably solved most of your bugs already.  Unfortunately, their performance is pretty poor, and there is no guarantee that you'll find compilers for various platforms.  That forces your hand.  In my case, I had to choose between C and C++.  I've decided to use C, specifically C99, because it is widely supported, and because many of the higher level languages have the ability to bind to a C API.
  • Maintenance of state information - Next is how to maintain state information across function calls.  Older libraries maintained state information in an user-inaccessible global state variable or variables.  The problem with this approach is that more and more programs are multi-threaded; if each thread requires its own state information, you need to develop a method of doing so that is efficient, yet invisible to the user.  This becomes MUCH more problematic if a user wants different threads to share a context, or to have multiple contexts, or has multiple GPUs.  For that reason, my API has an opaque context pointer type, that points to a context that is allocated on the heap.  You can have an arbitrary number of contexts, and share them in any manner you wish.
  • Allocating & deallocating memory - malloc() and free() may be the C standards, but they aren't necessarily the most useful calls for an API, especially one that is designed to take advantage of a GPU.  Towards that end, I've designed my own memory allocation and deallocation routines.  They are designed to make debugging simpler, and to ensure that the necessary memory alignments to make use of the GPU are adhered to.  
  • Primitives - This is the part of the API that I'm having the most trouble with at the moment.  I have several different options, none of which are particularly good:
    • NURBS + material specification - This is by far the most general method I can think of.  Every object would be specified as a NURB surface, along with some information about the material it is made of (index of refraction, polarization information, transmittance and reflectance information for different optical frequencies, etc.).  With this information, the library performs the full simulation (Fourier transform, convolution, inverse transform) to determine what light does when it travels through the material.  There are a number of problems with this method though:
      • NURBS don't handle sharp corners well; you either have discontinuities, which can't be transformed, or you have very curved surfaces, but no sharp corners.
      • I'd have to figure out an analytic solution to a NURB surface.   While this can be done, I'd really rather not.
    • Library of primitive objects - I could define a small set of primitive objects, and require the end user to develop more complex objects from these primitives.  While this is conceptually simple (I get to choose the primitives, so I'll choose ones where the Fourier transform is easy), it means that the end user needs to build their objects from the library I provide.  The biggest problem with this is the interfaces between objects; regardless of how hard a user tries, there will be gaps between objects.  Those gaps will lead to numerical instabilities.  Worse, some objects require very large numbers of primitives; for example, how would you model a light bulb?  To a first order, it is a sphere within a sphere, but the neck is not a sphere, a cylinder, or anything else.  The only way to model it is via successive approximation, which can easily run out of memory.
    • Voxels + 3D FFT - A completely different approach from the analytical one is to use the discrete Fast Fourier Transform across a 3D voxel array.  The good part about this approach is that regardless of shape of the primitive, you can always find a discretized version of it to perform the FFT on.  Even better, if you tap into the OpenGL pipeline at the appropriate location, you can pull out a 3D texture, which means that the hardware did all the heavy lifting for you.  You also get all of OpenGL for free; you're tapping into the pipeline after it's been processed.  But on the downside, the smallest object you can handle is one voxel in size; the problem with this is that the wave nature of light is most important for objects that are on the order of the wavelength of light.  Visible wavelengths are on the order of 380nm to 780nm (, which, thanks to the Nyquist Theorem means that our voxels need to be on the order of 190nm on a side.  Assuming a laughably small 1 bit/voxel storage requirement, that means 1 cubic meter will require ~2^64 bytes of memory.  Clearly, a pure voxel approach won't work.
  • Rotation/translation - Beyond the problems outlined above, I don't know if the mathematics remain unchanged when an object is rotated or translated.  Thus, precalculating what a sphere looks like may be pointless simply because I can't move the sphere to a different location in space, and expect it to remain the same mathematically.
At the moment, I have no idea what will be the best method.  If anyone has any ideas, post in the comments!

No comments:

Post a Comment