*"Light Makes Right"*

January 15, 1988

Volume 1, Number 2

Compiled by

All contents are copyright (c) 1988, all rights reserved by the individual authors

Archive locations: anonymous FTP at ftp://ftp-graphics.stanford.edu/pub/Graphics/RTNews/, wuarchive.wustl.edu:/graphics/graphics/RTNews, and many others.

You may also want to check out the Ray Tracing News issue guide and the ray tracing FAQ.

- Introduction
- Subdivision and CSG, by Erik Jansen
- Bounded Ray Tracing, by Masataka Ohta
- Spline Surface Rendering, and What's Wrong with Octrees, by Eric Haines
- Top Ten Hit Parade of Computer Graphics Books, by Eric Haines
- About Normal Vector Transform, Octrees, by Olin Lathrop and Eric Haines
- Subspaces and Simulated Annealing, by Jim Arvo

Speaking of whom, Andrew Glassner would like contributions to "The Ray Tracing News", hardcopy edition. He hopes to publish another one soon, but says it may be the last if no one sends him any more material. So, if you have an interesting technical memo or other short (less than 5 pages) piece you'd like to share with the rest of us, please write him (see the mailing list).

All for now,

Eric


Here is the abstract:

Solid modelling with faceted primitives

F.W. Jansen

Computer Aided Design and Computer Graphics techniques are valuable tools in industrial design for the design and visualisation of objects. For the internal representation of the geometry of objects, several geometric modelling schemes are used. One approach, Constructive Solid Geometry (CSG), models objects as combinations of primitive solids. The subject of research in this project is a CSG representation where the surfaces of the primitives are approximated with flat surface elements (facets). Techniques to improve the efficiency of the display of models with a large number of these surface elements have been developed.

Two approaches have been taken. The first approach is based on the use of additional data structures to enhance the processing, sorting and search of these surface elements. Further, a method is presented to store intermediate results of the logical computations needed for the processing of CSG representations. These methods are applied to a CSG polygon modelling system.

The second approach aims at the development of algorithms for multi-processor systems and VLSI-based display systems. The central method is a CSG depth-buffer algorithm. A tree traversal method is introduced that combines several techniques to reduce the processing and memory use. The methods have been applied to a CSG halfspace modelling system.

Keywords: computer graphics, geometric modelling, solid modelling, Constructive Solid Geometry (CSG), ray tracing algorithm, depth-buffer algorithm, z-buffer algorithm, list-priority algorithm, depth-priority algorithm, spatial subdivision, CSG classification, CSG coherence.

____

The following subjects are also included: adaptive subdivision, crack removal.

You can send this information to all. I will read the discussion more carefully and will comment on it later.

Erik Jansen


The discussion so far is a very interesting one, and I have several comments.

As I am charged for foreign mail (about $1 for 1K bytes, both incoming and outgoing), it costs considerably to mail everyone on the list separately. So, I would like you to redistribute my transpacific mail to everyone else.

Masataka Ohta

My comment on the flatness criteria with reflections follows:

-----------------------------

Though I don't like subdividing patches into polygons for ray tracing (it's incoherent and, for example, CSG objects are difficult to render), good "flatness criteria" even with reflection, refraction or shadowing can be given using ray bound tracing.

The basic idea is simple. A ray bound is a combination of two bounds: a bound on ray origins and a bound on ray directions. An efficient bound can be formed by using a sphere for bounding ray origins and a circle (on a unit sphere, i.e. using spherical geometry) for ray directions.

To begin with, bound the set of all rays which originate from each pixel. The flatness of a patch for first-generation rays should be computed against this ray bound, which is equivalent to measuring flatness under the perspective transformation, because the rays are bounded by a pixel-sized cone.

As for the second-generation rays, they can be bounded by a ray bound calculated from the first-generation ray bound. Those ray bounds should then be used for the flatness check.
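To make the data structure concrete, here is a minimal Python sketch of a ray bound as described above: a sphere on origins plus a circle on the unit sphere of directions. The class name `RayBound`, its fields, and the `contains` test are illustrative assumptions, not taken from the paper mentioned below; how second-generation bounds are derived is not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class RayBound:
    center: tuple       # center of the sphere bounding ray origins
    radius: float       # radius of that sphere
    axis: tuple         # unit vector at the center of the direction circle
    half_angle: float   # angular radius of the direction circle, in radians

    def contains(self, origin, direction):
        """True if a ray (unit direction) lies inside both bounds."""
        inside_sphere = math.dist(origin, self.center) <= self.radius
        cos_angle = sum(a * d for a, d in zip(self.axis, direction))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        return inside_sphere and math.acos(cos_angle) <= self.half_angle

# A first-generation bound for one pixel: all rays start at the eye
# (radius 0) and fall within a narrow, pixel-sized cone (assumed 0.01 rad).
pixel_bound = RayBound((0.0, 0.0, 0.0), 0.0, (0.0, 0.0, 1.0), 0.01)
```

A ray through the pixel center is inside the bound, while one tilted by 0.02 radians falls outside it.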

For those further interested in ray bound tracing, I will physically mail my paper titled "Bounded ray tracing for perfect and efficient anti-aliasing".


__________

[Note: this first problem is kinda boring if you've never implemented an octree subdivision scheme before. Skip on to problem # 2, which I think is more important].

The first problem is: How do I cleverly choose octree bounds? Mike Kaplan first mentioned this problem to me, but I did not think about it much until I suddenly noticed that all available memory was getting gobbled by certain polygonalized splines. The problem is that there are two parameters which are commonly used to end the further subdivision of an octree cube into its eight component "cubies".

One is a maximum number of primitives per octree cube. To make the octree in the first place we have a bounding cube which contains the environment. If the cube has more than a certain number of primitives in it, then octree subdivision takes place. The octree cubies formed are then each treated in a like fashion, subdividing until all leaf cubies contain no more than this maximum number of primitives. The second parameter is the maximum tree depth, which is the number of levels beyond which we will not subdivide cubes. This parameter generally has precedence over the first parameter, i.e. if the maximum level has been reached but the maximum number of primitives is still exceeded, subdivision will nonetheless halt.

The trick is that you have to pay close attention to both parameters. Originally I set these parameters to some reasonable numbers: 6 primitives and 8 levels being my maximums. What I found is that some objects would have very deep octrees, all the way down to level 8, even though their number of primitives was low. For example, an object with 64 patches would still have some leaf nodes down at level 8 which had 7+ primitives in them. I was pretty surprised by this problem.

My solution for spline surfaces was to keep the maximum number of primitives at 6 and use another parameter to determine the maximum level. I use the formula:

max level = round_up [ ln( primitives / K ) / ln( S ) ]

where K is the maximum number of primitives (i.e. 6) and S is a prediction of how much an octree subdivision will cut down the number of primitives in an octree cube. For example, in an environment consisting of a set of randomly distributed points, one would expect that when the octree cube containing these points was subdivided into eight octree cubies, each octree cubie would have about 1/8th of the points inside it. For a spline surface I reasoned that about four of the octree cubies might have some part of the surface in them, which would give an S=4. (Note that the largest, original octree must have at least four cubies filled by the surface. However, this is not necessarily true for successively smaller cubies.) Another factor which had to be taken into account was that there would also be some overlap: some primitives would appear in two or more cubies. So, as a final reasonable guess I chose S=3.5. This seems to work fairly well in practice, though further testing would be very worthwhile.
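A minimal Python sketch of the formula above. The function name and the clamp to level 0 when the primitives already fit in one cube are illustrative assumptions; K=6 and S=3.5 are the values given in the text.

```python
import math

def max_octree_level(num_primitives, max_per_cube=6, reduction=3.5):
    """round_up[ ln(primitives / K) / ln(S) ], never less than 0."""
    if num_primitives <= max_per_cube:
        return 0  # assumption: no subdivision needed at all
    return math.ceil(math.log(num_primitives / max_per_cube) / math.log(reduction))

level_for_64_patches = max_octree_level(64)      # 2
level_for_5k_patches = max_octree_level(5000)    # 6
```

With these values a 64-patch surface is limited to 2 levels rather than the fixed maximum of 8 that caused the memory problem.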

Coming up with some optimal way to choose a maximum octree depth still seems to be an open question. Further study on how various environments actually fill space would be worthwhile: how many octree nodes really are filled on the average for each subdivision? More pragmatically, how do we determine the best maximum depth for ray-tracing an environment? The problem with not limiting the maximum level is primarily one of memory. If the octree grows without reasonable bounds a simple scene could use all available memory. Also, a large number of unnecessary octree nodes results in additional access time, either through having to search through the octree or through having extraneous objects in the hashing table.

A more intelligent approach might be to do adaptive subdivision: subdivide an octree cube as usual, then see how many fewer primitives there are in each cubie. If some cubie still holds more than some percentage of the parent cube's primitives, the subdivision could be deemed useless and so subdivision would end at this point. If anyone knows a master's candidate looking for a project, this whole question of when it is profitable to subdivide might make a worthwhile topic. Judging from the interest in octrees by ray tracing researchers at last year's roundtable, I think this will become more and more important as time goes on.
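The adaptive test suggested above could look something like the following Python sketch. The function name and the 75% threshold are hypothetical; the text leaves the percentage open.

```python
def subdivision_pays_off(parent_count, cubie_counts, max_fraction=0.75):
    """False if any cubie still holds more than max_fraction of the
    parent cube's primitives, i.e. the split barely reduced the work."""
    if parent_count == 0:
        return False  # nothing to gain from splitting an empty cube
    return all(count <= max_fraction * parent_count for count in cubie_counts)

# A split that spreads 64 primitives around is kept; one that leaves 60 of
# them in a single cubie is rejected. Counts may sum to more than the
# parent's because primitives can overlap several cubies.
spread_out = subdivision_pays_off(64, [20, 18, 16, 19, 0, 0, 0, 0])
lopsided = subdivision_pays_off(64, [60, 6, 0, 0, 0, 0, 0, 0])
```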

_____________

The second problem with octrees: I decided to go with octrees for spline surfaces only because these objects would have fairly localized and even distribution of primitives (i.e. quadrilateral patches). I feel that octree efficiency techniques are probably horrible for ray tracing in general.

For example, imagine you have a football stadium made of, say, 5K primitives. Sitting on a goal line is a shiny polygonalized teapot of 5K quadrilaterals (note that the teapot is teapot sized compared to the stadium). You fill the frame with the teapot for a ray trace, hoping to get some nice reflections of the stadium on its surface.

If you use an octree for this scene, you'll run into an interesting problem. The teapot is, say, a foot long. The stadium is 200 yards long. So, the teapot is going to be only 1/600th the size of the stadium. Each octree subdivision creates 8 cubies which are each half the length of the parent cube. You could well subdivide down to 9 levels (with that 9th level cubie having a length of 1/512th of the stadium length: about 14 inches) of octrees and still have the whole teapot inside one octree cube, still undivided. If you stopped at this 9th level of subdivision, your ray trace would take forever. Why? Because whenever a ray would enter the octree cubie containing the teapot (which most of the rays from your eye would do, along with all those reflection and shadow rays), the cubie would contain a list of the 5K teapot polygons. Each of these polygons would have to be tested against the ray, since there is no additional efficiency structure to help you out. In this case the octree has been a total failure.
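The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative, and the best case is assumed, in which the teapot does not straddle a cubie boundary.

```python
def cubie_length(root_length, level):
    """Edge length of a cubie after `level` halvings of the root cube."""
    return root_length / 2 ** level

def deepest_enclosing_level(root_length, object_length):
    """Deepest level whose cubies are still at least as long as the object."""
    if object_length <= 0:
        raise ValueError("object_length must be positive")
    level = 0
    while cubie_length(root_length, level + 1) >= object_length:
        level += 1
    return level

STADIUM_INCHES = 200 * 36   # 200 yards
TEAPOT_INCHES = 12          # one foot

ninth_level = cubie_length(STADIUM_INCHES, 9)                       # 14.0625 inches
teapot_level = deepest_enclosing_level(STADIUM_INCHES, TEAPOT_INCHES)  # 9
```

Only from level 10 onward (cubies of about 7 inches) does subdivision begin to split the teapot's polygons into separate lists.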

Now, you may be in a position where you know that your environments will be well behaved: you're ray tracing some specific object and the surrounding environment is limited in size. However, the designer who is attempting to create a system which can respond to any user's modeling requests is still confronted by this problem. Further subdivision beyond level nine down to level eighteen may solve the problem in this case. But I can always come up with a worse pathological case. Some realistic examples are an animation zooming in on a texture mapped earth into the Caltech campus: when you're on the campus the sphere which represents the earth would create a huge octree node, and the campus would easily fall within one octree cubie. Or a user simply wants to have a realistic sun, and places a spherical light source 93 million miles away from the scene being rendered. Ridiculous? Well, many times I find that I will place positional light sources quite some distance away from a scene, since I don't really care how far the light is, but just the direction the light is coming from. If a primitive is associated with that light source, the octree suddenly gets huge.

Solutions? Mine is simply to avoid the octree altogether and use Goldsmith's automatic bounding volume generation algorithm (IEEE CG&A, May 1987). However, I hate to give up all that power of the octree so easily. So, my question: has anyone found a good way around this problem? One method might be to do octree subdivision down to a certain level, then consider all leaf cubies that have more than the specified number of primitives in their lists as "problem cubies". For this list of primitives we perform Goldsmith's algorithm to get a nice bounding volume hierarchy. This method reminds me of the SIGGRAPH 87 paper by John Snyder and Alan Barr, "Ray Tracing Complex Models Containing Surface Tessellations". Their paper uses SEADS on the tessellated primitives and hierarchy on these instanced SEADS boxes to get around memory constraints, while my idea is to use the octree for the total environment so that the quick cutoff feature of the octree can be used (i.e. if any primitive in an octree cubie is intersected, then ray trace testing is done, versus having to test the whole environment's hierarchy against the ray). Using bounding volume hierarchy locally then gets rid of the pathological cases for the octree.

However, I tend to think the above just is not worthwhile. It solves the pathological cases, but I think that automatic bounding volume hierarchy (let's call it ABVH) methods will be found to be comparable in speed to octrees in many cases. I think I can justify that assertion, but first I would like to get your opinions about this problem.


Without further ado, here are my top ten book recommendations. Most should be well known to you all, and so are listed mostly as a kernel of core books I consider useful. I look forward to your additions!

_The Elements of Programming Style, 2nd Edition_, Brian W. Kernighan, P.J. Plauger, 168 pages, Bell Telephone Laboratories Inc, 1978.

All programmers should read this book. It is truly an "Elements of Style" for programmers. Examples of bad coding style are taken from other textbooks, corrected, and discussed. Wonderful and pithy.

_Fundamentals of Interactive Computer Graphics_, James D. Foley, A. Van Dam, 664 pages, Addison-Wesley Inc, 1982.

A classic, covering just about everything once over lightly.

_Principles of Interactive Computer Graphics, 2nd Edition_, William M. Newman, R.F. Sproull, 541 pages, McGraw-Hill Inc, 1979.

The other classic. It's older (e.g. ray-tracing did not exist at this point), but gives another perspective on various algorithms.

_Mathematical Elements for Computer Graphics_, David F. Rogers, J.A. Adams, 239 pages, McGraw-Hill Inc, 1976.

An oldie but goodie, its major thrust is a thorough coverage of 2D and 3D transformations, along with some basics on spline curves and surfaces.

_Procedural Elements for Computer Graphics_, David F. Rogers, 433 pages, McGraw-Hill Inc, 1985.

For information on how to actually implement a wide variety of graphics algorithms, from Bresenham's line drawer on up through ray-tracing, this is the best book I know. However, for complicated algorithms I would recommend also reading the original papers.

_Numerical Recipes_, William H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, 818 pages, Cambridge University Press, 1986.

Chock-full of information on numerical algorithms, including code in FORTRAN and PASCAL (no "C", unfortunately). The best part of this book is that they give good advice on what methods are appropriate for different types of problems.

_A First Course in Numerical Analysis, 2nd Edition_, Anthony Ralston, P. Rabinowitz, 556 pages, McGraw-Hill Inc, 1978.

Tom Duff's recommendation says it best: "This book is SO GOOD [<-these words should be printed in italics] that some colleges refuse to use it as a text because of the difficulty of finding exam questions that are not answered in the book". It covers material in depth which _Numerical Recipes_ glosses over.

_C: A Reference Manual_, Samuel P. Harbison, G.L. Steele Jr., 352 pages, Prentice-Hall Inc, 1984.

A comprehensive and comprehensible manual on "C".

_The Mythical Man-Month_, Frederick P. Brooks Jr, 195 pages, Addison-Wesley Inc, 1982.

A classic on the pitfalls of managing software projects, especially large ones. A great book for beginning to learn how to schedule resources and make good predictions of when software really is going to be finished.

_Programming Pearls_, Jon Bentley, 195 pages, Bell Telephone Laboratories Inc, 1986.

Though directed more towards systems and business programmers, there are a lot of clever coding techniques to be learnt from this book. Also, it's just plain fun reading.

As an added bonus, here's one more that I could not resist:

_Patterns in Nature_, Peter S. Stevens, 240 pages, Little, Brown and Co. Inc, 1974.

The thesis is that simple patterns recur again and again in nature and for good reasons. A quick read with wonderful photographs (my favorite is the comparison of a turtle shell with a collection of bubbles forming a similar shape). Quite a few graphics researchers have used this book for inspiration in simulating natural processes.


*previous discussion of problem*

__________

About the normal vector transform:

Eric, you are absolutely right. I also ran into this when some of my squashed objects just didn't look right, about 4 years ago. I would just like to offer a slightly different way of looking at the same thing. I find I have difficulty with mathematical concepts unless I can attach some sort of physical significance to them. (I think of a 3x4 transformation matrix as three basis vectors and a displacement vector instead of an amorphous pile of 12 numbers.)

My first attack at finding a transformed normal was to find two non-parallel surface vectors at the point in question. These could be transformed regularly and the transformed normal would be their cross product. This certainly works, but is computationally slow. It seemed clear that there should exist some 3x3 matrix that was the total transform the normal vector really went thru. To simplify the thought experiment, what if the original normal vector was exactly along the X axis? Well, the unit surface vectors would be the Y and Z axis vectors. When these are sent thru the regular 3x3 transformation matrix, they become the Y and Z basis vectors of that matrix. The final resulting normal vector is therefore the cross product of the Y and Z basis vectors of the regular matrix. This is then what the X basis vector of the normal vector transformation matrix should be. In general, a basis vector in the normal vector transformation matrix is the cross product of the other two basis vectors of the regular transformation matrix. It wasn't until about a year later that I realized that this resulting matrix was the inverse transpose of the regular one.

This derivation results in exactly the same matrix that Eric was talking about, but leaves me with more physical understanding of what it represents.
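The construction above can be tried out with a small Python sketch. The helper names and the sample matrix are illustrative assumptions; a matrix is stored as its three basis vectors (X, Y, Z), and the cross products are taken in cyclic order (Y x Z, Z x X, X x Y).

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(basis, v):
    """Transform v by a matrix given as its X, Y, Z basis vectors."""
    return tuple(v[0] * basis[0][i] + v[1] * basis[1][i] + v[2] * basis[2][i]
                 for i in range(3))

def normal_basis(basis):
    """Each basis vector is the cross product of the other two."""
    x, y, z = basis
    return (cross(y, z), cross(z, x), cross(x, y))

# A sheared, non-uniformly scaled transform (values chosen for illustration).
M = ((2, 0, 0), (1, 1, 0), (0, 0, 3))
normal = (1, 0, 0)
tangents = [(0, 1, 0), (0, 0, 1)]

new_tangents = [apply(M, t) for t in tangents]   # (1, 1, 0) and (0, 0, 3)
new_normal = apply(normal_basis(M), normal)      # (3, -3, 0)
naive_normal = apply(M, normal)                  # (2, 0, 0)
```

Here `new_normal` is perpendicular to both transformed tangents, while `naive_normal` is not. The result is not unit length, which is the magnitude issue raised in the question that follows.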

Now for a question: It has always bothered me that this matrix trashes the vector magnitude. This usually implies re-unitizing the transformed normal vector in practise. Does anyone avoid this step? I don't want to do any more SQRTs than necessary. You can assume that the original normal vector was of unit length, but that the result also needs to be.

__________

About octrees:

1) I don't use Andrew's hashing scheme either. I transform the ray so that my octree always lives in the (0,0,0) to (1,1,1) cube. To find the voxel containing any one point, I first convert the coordinates to 24 bit integers. The octree now sits in the 0 to 2**23 cube. Picking off the most significant address bit for each coordinate yields a 3 bit number. This is used to select one of 8 voxels at the top level. Now pick off the next address bit down and choose the next level of subordinate voxel, etc, until you hit a leaf node. This process is LOGn, and is very quick in practise. Finding a leaf voxel given an integer coordinate seems to consume about 2.5% of the time for most images. I store direct pointers to subordinate voxels directly in the parent voxel data block. In fact, this is the ONLY way I have of finding all but the top voxel.

2) Choosing subdivision criteria: First, the biggest win is to subdivide on the fly. Never subdivide anything until you find there is a demand for it. My current subdivision criteria in order of precedence (#1 overrides #2) are:
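A minimal Python sketch of this bit-picking descent. The `Voxel` class, the scaling of the unit cube to `BITS`-bit integers, and the order in which the x, y and z bits are packed into the child index are all assumptions made for illustration.

```python
BITS = 24  # integer resolution per coordinate (assumed)

class Voxel:
    def __init__(self, objects=None):
        self.children = None          # list of 8 Voxels, or None for a leaf
        self.objects = objects or []  # objects stored in a leaf

def to_fixed(coord):
    """Map a coordinate in [0, 1] to a BITS-bit integer."""
    return min(int(coord * (1 << BITS)), (1 << BITS) - 1)

def find_leaf(root, x, y, z):
    ix, iy, iz = to_fixed(x), to_fixed(y), to_fixed(z)
    node = root
    bit = BITS - 1  # start with the most significant bit
    while node.children is not None and bit >= 0:
        # One bit from each coordinate forms a 3-bit child index.
        index = ((((ix >> bit) & 1) << 2) |
                 (((iy >> bit) & 1) << 1) |
                 ((iz >> bit) & 1))
        node = node.children[index]   # direct pointer stored in the parent
        bit -= 1
    return node

# Build a tiny two-level tree for illustration.
root = Voxel()
root.children = [Voxel(objects=[i]) for i in range(8)]
root.children[5].children = [Voxel(objects=[50 + i]) for i in range(8)]
```

Each step costs a few shifts and masks, so the descent takes time proportional to the depth of the leaf.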

1) Do not subdivide if hit subdivision generation limit. This is the same as what Eric talked about. I think everyone does this.

2) Do not subdivide if voxel is empty.

3) Subdivide if voxel contains more than one object.

4) Do not subdivide if less than N rays passed thru this voxel, but did not hit anything. Currently, N is set to 4.

5) Subdivide if M*K < N. Where M is the number of rays that passed thru this voxel that DID hit something, and K is a parameter you choose. Currently, K is set to 2, but I suspect it should be higher. This step seeks to avoid subdividing a voxel that may be large, but has a good history of producing real intersections anyway. Keep in mind that for every ray that did hit something, there are probably light source rays that did not hit anything. (The shader avoids launching light rays if the surface is facing away from the light source.) This can distort the statistics, and make a voxel appear less "tight" than it really is, hence the need for larger values of K.

6) Subdivide.

Again, the most important point is lazy evaluation of the octree. The above rules are only applied when a ray passes thru a leaf node voxel. Before any rays are cast, my octree is exactly one leaf node containing all the objects.

3) First solution to teapot in stadium: This really cries out for nested objects. Jim Arvo, Dave Kirk, and I submitted a paper last year on "The Ray Tracing Kernel" which discussed applying object oriented programming to designing a ray tracer. Jim just told me he is going to reply about this in detail so I will make this real quick. Basically, objects are only defined implicitly by the results of various standard operations they must be able to perform, like "intersect yourself with this ray". The caller has no information HOW this is done. An object can therefore be an "aggregate" object which really returns the result of intersecting the ray with all its subordinate objects. This allows for easily and elegantly mixing storage techniques (octrees, linear space, 5D structures, etc.) in the same scene. More on this from JIM.

4) Second solution to teapot in stadium: I didn't understand why an octree wouldn't work well here anyway. Suppose the teapot is completely enclosed in a level 8 voxel. That would only "waste" 8x8=64 voxels in getting down to the space you would have chosen for just the teapot alone. Reflection rays actually hitting the rest of the stadium would be very sparse, so go ahead and crank up the max subdivision limit. Am I missing something?

__________________________________

(This is a reply to Olin Lathrop. Summary: "well, maybe the octree is not so bad after all...").

From: Eric Haines

Olin Lathrop writes:

> To simplify the thought experiment, what if the original normal vector was exactly
> along the X axis? Well, the unit surface vectors would be the Y and Z axis
> vectors. When these are sent thru the regular 3x3 transformation matrix,
> they become the Y and Z basis vectors of that matrix. The final resulting
> normal vector is therefore the cross product of the Y and Z basis vectors of the
> regular matrix. This is then what the X basis vector of the normal vector
> transformation matrix should be. In general, a basis vector in the normal
> vector transformation matrix is the cross product of the other two basis
> vectors of the regular transformation matrix. It wasn't until about a year
> later that I realized that this resulting matrix was the inverse transpose
> of the regular one.

The problem is the sign of the basis vector is unclear by this method. I tried this approach, but it fails on mirror matrices. Suppose your transformation matrix is:

    [ -1  0  0  0 ]
    [  0  1  0  0 ]
    [  0  0  1  0 ]
    [  0  0  0  1 ]

This matrix definitely affects the surface normal in X, but your two vectors in Y and Z are unaffected. This problem never occurs in the "real" world because such a matrix is equivalent to twisting an object through 4D space and making it go "through the looking glass". However, it happens in computer graphics a lot: e.g. I model half a car body, then mirror reflect to get the other half. If you have a two sided polygon lying in the YZ plane, with one side red & the other blue, and apply the above transformation, no vertices (and no tangent vectors) have any non-zero X components, and so will not change. But the normal does reverse, and the sides switch colors. My conclusion was that you have to use the transpose of the inverse to avoid this problem, since surface normals fail for this case. (p.s. did you get a copy of Glassner's latest (2nd edition) memo on this problem? He does a good job explaining the math).
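The sign issue can be seen numerically in the following Python sketch, which compares the cross-product matrix with the inverse transpose for the mirror matrix above (upper-left 3x3 only). It relies on the standard identity that the cross-product (cofactor) matrix equals the determinant times the inverse transpose; the helper names are illustrative.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(basis, v):
    """Transform v by a matrix given as its X, Y, Z basis vectors."""
    return tuple(v[0] * basis[0][i] + v[1] * basis[1][i] + v[2] * basis[2][i]
                 for i in range(3))

def cross_product_matrix(basis):
    x, y, z = basis
    return (cross(y, z), cross(z, x), cross(x, y))

def inverse_transpose(basis):
    """Cofactor matrix divided by the determinant."""
    det = dot(basis[0], cross(basis[1], basis[2]))
    return tuple(tuple(c / det for c in row)
                 for row in cross_product_matrix(basis))

MIRROR = ((-1, 0, 0), (0, 1, 0), (0, 0, 1))   # determinant is -1
normal = (1, 0, 0)

by_cross_products = apply(cross_product_matrix(MIRROR), normal)  # (1, 0, 0)
by_inverse_transpose = apply(inverse_transpose(MIRROR), normal)  # (-1, 0, 0)
```

The two results differ exactly by the sign of the determinant, so the cross-product form leaves the normal unreversed whenever the transform includes a mirror.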

> About octrees:
>
> 1) I don't use Andrew's hashing scheme either. I transform the ray so that
> my octree always lives in the (0,0,0) to (1,1,1) cube...

Actually, this is the exact approach I finally took, also. I had rejected the hashing scheme earlier, and forgotten why (and misremembered that it was because of memory costs) - the correct reason for not hashing is that it's faster to just zip through the octree by the above method; no hashing is needed. It's pretty durn fast to find the right voxel, I agree.

Have you experimented with trying to walk up and down the octree, that is, when you are leaving an octree voxel you go up to the parent and see if the address is inside the parent? If not, you go to its parent and check the address, etc, until you find that you can go back down. Should be faster than the straight downwards traversal when the octree is deep: the neighboring voxels of the parent of the voxel you're presently in account for 3 of the 6 directions the ray can go, after all. You have 1/2 a chance of descending the octree if you check the parent, 3/4ths if you go up two parents, etc. (Where did I read of this idea, anyway? Fujimoto? Kaplan? Whatever the case, it's not original with me).

Another idea that should be mentioned is one I first heard from Andrew Glassner: putting quadtree-like structures on the cube faces of the octree cubes. It's additional memory, but knowing which octree cube is the next would be a faster process. Hopefully Andrew will write this up sometime.

The subdivision criteria ideas are great - makes me want to go and try them out! When are you going to write it up and get it published somewhere? Lazy subdivision sounds worthwhile: it definitely takes awhile for the octrees to get set up under my present "do it all at the beginning" approach (not to mention the memory costs). That was something I loved about the Arvo/Kirk paper - without it the 5D scheme would appear to be a serious memory hog.

> 4) Second solution to teapot in stadium: I didn't understand why an octree
> wouldn't work well here anyway. Suppose the teapot is completely enclosed
> in a level 8 voxel. That would only "waste" 8x8=64 voxels in getting down
> to the space you would have chosen for just the teapot alone. Reflection
> rays actually hitting the rest of the stadium would be very sparse, so go
> ahead and crank up the max subdivision limit. Am I missing something?

There are two things which disturbed me about the use of the octree for this problem. One was that if the maximum subdivision level was reached prematurely then the octree falls apart. I mentioned that you could indeed subdivide down another 9 levels and have an 18 level octree that would work. However, the problem with this is knowing when to stop - why not go on to 24 levels? For me it boils down to "when do I subdivide?". I suspect that your additional criteria might solve a lot of the pathological cases, which is why I want to test them out. Also note that there are built in maximum subdivision levels in octree schemes which could be reached and still not be sufficient (though admittedly your 24 levels of depth are probably enough. Of course, I once thought 16 bits was enough for a z-buffer - now I'm not so sure. Say you have a satellite which is 5 feet tall in an image, with the earth in the background. We're now talking 23 levels of subdivision before you get within the realm of subdividing the satellite. With 24 levels of depth being your absolute maximum you've hit the wall, with only one subdivision level helping you out on the satellite itself).
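The satellite figure can be sanity-checked with a few lines of Python. The Earth diameter used here (roughly 7,918 miles) is an assumed value, not from the text, and the root cube is taken to be exactly as wide as the Earth.

```python
import math

def levels_to_reach(scene_size, object_size):
    """Number of halvings before a voxel is no larger than the object."""
    return max(0, math.ceil(math.log2(scene_size / object_size)))

EARTH_DIAMETER_FEET = 7918 * 5280   # assumed: about 7,918 miles
SATELLITE_FEET = 5

satellite_levels = levels_to_reach(EARTH_DIAMETER_FEET, SATELLITE_FEET)  # 23
```

Under these assumptions a 24-level limit leaves a single level for subdividing the satellite itself, as noted above.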

Good point that as far as memory goes it's really just 8x8 more voxels "wasted". One problem is: say I'm 8 feet in each direction from the teapot, with me and the teapot in diagonally opposite corners of a cube which is then made into an octree. The only way to get through the 8 cubes in the containing box is to travel through 4 of them (i.e. if I'm in Xhi, Yhi, Zhi and the teapot is in Xlo, Ylo, Zlo, then I have to intersect my own box and then three other boxes to move me through in each "lo" direction). In this case there are only 3 levels of octree cubes I have to go through before getting to the 1 foot cube voxel which contains the teapot. The drawback of the octree is that I then have to do 3x4=12 box intersections for each ray, all of which are useless. Minor, but now think of reflection rays from the teapot which try to hit the stadium: each could go through up to 8 levels x 4 voxels per level = 32 voxels just to escape the stadium without hitting anything (not including all the voxels needed to be traversed from the teapot to the one foot cube). Seems like a lot of intersection and finding the next octree address and tree traversal for hitting the background. I suspect fewer bounding volumes would be hit using hierarchy, and the tests would be simpler (many of them being just a quick "is the ray origin inside this box?": if so, check inside the box).

I guess it just feels cleaner to have to intersect only bounding volumes which are needed, which is the feel which automatic bounding volume hierarchy has to it. Boxes can be of any size, so that if someone adds a huge earth behind a satellite all that is added is a box that contains both. With hierarchy you can do some simple tricks to cut down on the number of bounding volumes intersected. For example, by recording that the ray fired at the last pixel hit such and so object, you can test this object first for intersection. This quickly gets you a maximum depth that you need not go beyond: if a bounding volume is intersected beyond this distance you don't have to worry about intersecting its contents. This trick seems to gain you about 90% of the speed-up of the octree (i.e. not having to intersect any more voxels once an intersection is found), while also allowing you the speed up of avoiding needless octree voxel intersections. I call this the "ray coherency" speedup - it can be used for all types of rays (and if you hit when the ray is a shadow ray, you can immediately stop testing - this trick will work for the octree, too! Simply save a pointer to the object which blocked a particular shadow ray for a particular light last pixel and try it again for the next shadow ray).
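The shadow-ray version of this trick can be sketched in Python as follows. The scene is a flat list of spheres purely for illustration, and the class and function names are hypothetical; a real tracer would fall back to its hierarchy or octree rather than a linear scan.

```python
import math

def hits_sphere(origin, direction, sphere, max_t):
    """True if the ray hits the sphere at some distance 0 < t < max_t.

    direction must be unit length; sphere is a (center, radius) pair.
    """
    center, radius = sphere
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    return any(1e-9 < t < max_t for t in (-b - root, -b + root))

class ShadowCache:
    def __init__(self):
        self.last_blocker = {}  # light id -> object that blocked the last ray
        self.tests = 0          # intersection tests done, for illustration

    def in_shadow(self, light_id, origin, direction, max_t, spheres):
        cached = self.last_blocker.get(light_id)
        # Try the previous blocker first, then everything else.
        ordered = ([s for s in spheres if s is cached] +
                   [s for s in spheres if s is not cached])
        for sphere in ordered:
            self.tests += 1
            if hits_sphere(origin, direction, sphere, max_t):
                self.last_blocker[light_id] = sphere
                return True     # any hit is enough for a shadow ray
        self.last_blocker.pop(light_id, None)
        return False

spheres = [((10.0, 0.0, 0.0), 1.0), ((20.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 5.0), 1.0)]
cache = ShadowCache()
first = cache.in_shadow("key", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 100.0, spheres)
tests_first = cache.tests
second = cache.in_shadow("key", (0.1, 0.0, 0.0), (0.0, 0.0, 1.0), 100.0, spheres)
tests_second = cache.tests - tests_first
```

In this toy scene the first shadow ray needs three tests to find its blocker, while the neighboring ray needs only one because the cached blocker is tried first.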

I still have doubts about the octree. However, with lazy evaluation I think you get rid of one of my major concerns: subdividing too deep makes for massive octrees which soak up tons of memory. Have you had to deal with this problem, i.e. has the octree ever gotten too big, and do you have some way to free up memory (some "least recently used" kind of thing)?

An interesting comment that I read by John Peterson on USENET news some months ago was:

>> [John Watson @ Ames:]
>> Anyway, I know there have been a few variations of the constant-time
>> algorithms around, and what I need to know is, what is the _best_,
>> i.e. simplest, most effiecent, etc, ... version to implement.
>>
>> Could some of you wonderful people comment on these techniques in general,
>> and maybe give me some pointers on recent research, implementions, etc.
>
> This is an interesting question. Here at Utah, myself and Tom Malley
> implemented three different schemes in the same ray tracer; Whitted/Rubin,
> Kay/Kajiya, and an octree scheme (similar to the Glassner/Kaplan, camp, I
> think). The result? All three schemes were within 10-20% of each other
> speedwise. Now, we haven't tested these times extensively; I'm sure you could
> find wider variances for pathological cases. But on the few generic test
> cases we measured, there just wasn't much difference. (If we get the time,
> we plan on benchmarking the three algorithms more closely).

I suspect that this is probably the case, with the octree working best when the scene depth (i.e. the number of objects which are intersected by each ray, regardless of distance) is high: there the "ray coherency" method outlined above for hierarchy fails, and so cutting off early is a big benefit. Automatic hierarchy probably wins when there are large irregularities in the density of objects in space. (Of course, the SEADS method (equal sized voxels and 3DDDA) is ridiculous for solving "teapot in a stadium" kinds of problems, but it's probably great for machines with lots of memory ray tracing scenes with a localized set of objects.)

By the way, I found Whitted/Rubin vs. Kay/Kajiya to be about the same: Kay had fewer intersections, but the sorting killed any time gained. I find the ray coherency technique mostly does what Kay/Kajiya does: it quickly gets you a maximum intersection depth for cutoff.

Without the memory constraints limiting the effectiveness of the octree I can believe it could well be the way of the future: it is ideal for a hardware solution (so those extra voxel intersection and traversal tests don't bother me if they're real fast), sort of like how the z-buffer is the present winner in hidden surface algorithms because of its simplicity.

So, how's that for a turnabout on my polemical anti-octree position? Nonetheless, I'm not planning to change my hierarchy code in the near future - not until the subdivision and memory management problems are more fully understood.

All for now,

Eric Haines


One way that we've dealt with situations similar to Eric's teapot example is to use a combination of spatial subdivision and bounding volume techniques. For instance, we commonly mix two or three of the following techniques into a "meta" hierarchy for ray tracing a single environment:

1) Linear list

2) Bounding box hierarchy

3) Octrees (including BSP trees)

4) Linear grid subdivision

5) Ray Classification

We commonly refer to these as "subspaces". For us this means some (convex) volume of space, a collection of objects in it, and some technique for intersecting a ray with those objects. This technique is part of an "aggregate object", and all the objects it manages are the "children". Any aggregate object can be the child of any other aggregate object, and appears simply as a bounding volume and intersection technique to its parent. In other words, it behaves just like a primitive object.

Encapsulating a subspace as just another "object" is very convenient. This is something which Dave and Olin and I agreed upon in order to make it possible to "mix and match" our favorite acceleration techniques within the same ray tracer for testing, benchmarking, and development purposes.
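To make the "aggregate object" idea concrete, here is a minimal sketch of how a subspace can present the same interface as a primitive. It is an assumption-laden illustration, not the interface actually used in our ray tracer: the class names `Ray`, `Sphere` and `ListAggregate`, the `bounds()`/`intersect()` methods, and the choice of a linear list as the only acceleration technique are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Ray:
    origin: tuple
    direction: tuple  # assumed to be unit length

def hits_box(ray, lo, hi):
    """Slab test: does the ray touch the axis-aligned box [lo, hi]?"""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(ray.origin, ray.direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

class Sphere:
    """A primitive: it has a bounding volume and an intersection routine."""
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bounds(self):
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi

    def intersect(self, ray):
        oc = [o - c for o, c in zip(ray.origin, self.center)]
        b = sum(d * v for d, v in zip(ray.direction, oc))
        c = sum(v * v for v in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in (-b - root, -b + root):
            if t > 1e-9:
                return t
        return None

class ListAggregate:
    """A linear-list subspace. To its parent it looks like any other object:
    a bounding volume plus an intersection routine."""
    def __init__(self, children):
        self.children = list(children)
        boxes = [child.bounds() for child in self.children]
        self.lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
        self.hi = tuple(max(b[1][i] for b in boxes) for i in range(3))

    def bounds(self):
        return self.lo, self.hi

    def intersect(self, ray):
        if not hits_box(ray, self.lo, self.hi):
            return None
        hits = [t for t in (c.intersect(ray) for c in self.children)
                if t is not None]
        return min(hits) if hits else None

# Aggregates nest freely: a "ride" is a child of the "park".
ride = ListAggregate([Sphere((0.0, 0.0, 5.0), 1.0), Sphere((2.0, 0.0, 5.0), 1.0)])
park = ListAggregate([ride, Sphere((0.0, 0.0, 20.0), 1.0)])
nearest = park.intersect(Ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

A grid, octree, or ray classification aggregate would replace only the body of `intersect()`; the parent never needs to know which technique its child uses.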

As an example of how we've used this to ray trace moderately complex scenes I'll describe the amusement park scene which we animated. This consisted of a number of rides spread throughout a park, each containing quite a bit of detail. We often showed closeups of objects which reflected the rest of the park (a somewhat scaled down version of the teapot reflecting the stadium). There were somewhere around 10,000 primitive objects (not including fractal mountains), which doesn't sound like much anymore, but I think it still represents a fairly challenging scene to ray trace -- particularly for animating.

The organization of the scene suggested three very natural levels of detail. A typical example of this is

I) Entire park ( a collection of rides, trees, and mountains )

II) Triple decker Merry-go-round ( one of the rides )

III) A character riding a horse ( a "detail" of a ride )

Clearly a single linear grid would not do well here because of the scale involved. Very significant collections of primitives would end up clumped into single voxels. Octrees, on the other hand, can deal with this problem but don't enjoy quite the "voxel walking" speed of the linear grid. This suggests a compromise.

What we did initially was to place a coarse linear grid around the whole park, then another linear grid (or octree) around each ride, and frequently a bounding box hierarchy around small clusters of primitives which would fall entirely within a voxel of even the second-level (usually 16x16x16) linear grid.

Later, we began to use ray classification at the top level because, for one thing, it did some optimizations on first-generation rays. The other levels of the hierarchy were kept in place for the most part (simplified a bit) in order to run well on machines with < 16 MB of physical memory. This effectively gave the RC (ray classification) aggregate object a "coarser" world to deal with, and drastically cut down the size of the candidate sets it built. Of course, it also "put blinders" on it by not allowing it to distinguish between objects inside these "black boxes" it was handed. It's obviously a space-time trade-off. Being able to nest the subspaces provides a good deal of flexibility for making trade-offs like this.

A small but sort of interesting additional benefit which falls out of nesting subspaces is that it's possible to take better advantage of "sparse" transformations. Obviously the same trick of transforming the rays into a canonical object space before doing the intersection test (and transforming the normal on the way out) also works for aggregate objects. Though this means doing possibly several transforms of a ray before it even gets to a primitive object, quite often the transforms which are lower in the hierarchy are very simple (e.g. scale and translate). So, there are cases when a "dense" (i.e. expensive) transform gets you into a subspace where most of the objects have "sparse" (i.e. cheap) transforms. [I'll gladly describe how we take advantage of matrix sparsity structures if anybody is interested.] If you end up testing N objects before finding the closest intersection, this means that (occasionally) you can do the job with one dense transform and N sparse ones, instead of N dense transforms. This is particularly appropriate when you build a fairly complex object from many scaled and translated primitives, then rotate the whole mess into some strange final orientation. Unfortunately, even in this case it's not necessarily always a win. Often just pre-concatenating the transforms and tossing the autonomous objects (dense transforms and all) into the parent octree (or whatever) is the better thing to do. The jury is still out on this one.
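A back-of-the-envelope count shows where the trade-off sits. The model below is an assumption made purely for illustration: it counts only the multiplies needed to transform a ray's origin and direction, treats a "dense" transform as a full 3x3 block plus translation and a "sparse" one as a per-axis scale plus translation, and ignores the normal transform and any rays rejected before reaching the children.

```python
# Multiplies to transform one ray (origin and direction).
DENSE_MULTS = 18   # full 3x3 block applied to both vectors: 9 + 9
SPARSE_MULTS = 6   # per-axis scale applied to both vectors: 3 + 3

def flat_cost(n):
    """N objects, each carrying its own pre-concatenated dense transform."""
    return n * DENSE_MULTS

def nested_cost(n):
    """One dense transform into the subspace, then N sparse ones inside it."""
    return DENSE_MULTS + n * SPARSE_MULTS

for n in (1, 2, 10):
    print(n, flat_cost(n), nested_cost(n))
```

Under these assumed counts nesting loses when only one object ends up being tested (24 multiplies against 18), wins narrowly at two (30 against 36), and wins clearly at ten (78 against 180), which fits the observation that it is not always a win.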

Currently, the "high level" decisions about which subspaces to place where are all made manually and specified in the modeling language. This is much harder to do well than we imagined initially. The tradeoffs are very tricky and sometimes counter-intuitive. A general rule of thumb which seems to be of value is to put an "adaptive" subspace (e.g. an octree, RC) at the top level if the scene has tight clusters of geometry, and a linear grid if the geometry is fairly uniform. Judicious placement of bounding box hierarchies within an adaptive hierarchy is a real art. On the one hand, you don't want to hinder the effectiveness of the adaptive subspace by creating large clumps of geometry that it can't partition. On the other hand, a little a priori knowledge about what's important and where bounding boxes will do a good job can often make a big difference in terms of both time and space (the space part goes quintuple for RC).

Now, the obvious question to ask is "How can this be done automatically?" Something akin to Goldsmith and Salmon's automatic bounding volume generation algorithm may be appropriate. Naturally, in this context, we're talking about a heterogeneous mixture of "volumes," not only differing in shape and surface area, but also in "cost," both in terms of space and time. Think of each subspace as being a function which allows you to intersect a ray with a set of objects with a certain expected (i.e. average) cost. This cost is very dependent upon the spatial arrangement and characteristics of the objects in the set, and each type of subspace provides different trade-offs. Producing an optimal (or at least good) organization of subspaces is then a very nasty combinatorial optimization problem.

An idea that I've been toying with for quite some time now is to use "simulated annealing" to find a near-optimal subspace hierarchy, where "optimality" can be phrased in terms of any desired objective function (taking into account, e.g., both space and time). Simulated annealing is a technique for probabilistically exploring the (typically) vast solution space of a combinatorial optimization problem, looking for incremental improvements WITHOUT getting stuck too soon in a local minimum. It's very closely linked to some ideas in thermodynamics, and was originally motivated by nature's ability to find near-optimal solutions to mind-bogglingly complex optimization problems -- like getting all the water molecules in a lake into a near-minimum-energy configuration as the temperature gradually reaches freezing. It's been fairly successful at "solving" NP-hard problems such as the traveling salesman and chip placement (which are practically the same thing).
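For readers who have not seen simulated annealing, here is a minimal sketch applied to a small traveling salesman instance rather than to subspace hierarchies (for which no cost function has been worked out here). The geometric cooling schedule, the segment-reversal move, the parameter values, and the names `tour_length` and `anneal` are all choices made for this example, not a recommended recipe.

```python
import math
import random

def tour_length(tour, points):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal(points, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best_tour, best_length) found by simulated annealing."""
    rng = random.Random(seed)
    current = list(range(len(points)))
    current_cost = tour_length(current, points)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        # "Annealing schedule": temperature falls geometrically to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a small change: reverse a randomly chosen segment.
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        delta = tour_length(candidate, points) - current_cost
        # Always accept improvements; accept a worse tour with probability
        # exp(-delta / temp), which is how the search escapes local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost

# Twelve pseudo-random cities in the unit square.
rng = random.Random(1)
cities = [(rng.random(), rng.random()) for _ in range(12)]
start_cost = tour_length(list(range(len(cities))), cities)
best_tour, best_cost = anneal(cities)
```

For subspace hierarchies the "tour" would become a candidate hierarchy, the move a local restructuring (such as swapping one aggregate type for another), and `tour_length` an estimate of the expected ray intersection cost; all three remain open questions.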

This part about simulated annealing and subspace hierarchies is all very speculative, mind you. It may not be practical at all. It's easy to imagine the "annealing" taking three CPU-years to produce a data structure which takes an hour to ray trace (if it's done as a preprocessing step -- not lazily). There are many details which I haven't discussed here -- largely because I haven't figured them out yet. For example, one needs to get a handle on the distribution of rays which will be intersected with the environment in order to estimate the efficiency of the various subspaces. Assuming a uniform distribution is probably a good first approximation, but there's got to be a better way -- perhaps through incremental improvements as the scene is ray traced and, in particular, between successive frames of an animation.

If this has any chance of working it's going to require an interesting mix of science and "art". The science is in efficiently estimating the effectiveness of a subspace (i.e. predicting the relevant costs) given a collection of objects and a probability density function of rays (probably uniform). The art is in selecting an "annealing schedule" which will let the various combinations of hierarchies percolate and gradually "freeze" into a near-optimal configuration. Doing this incrementally for an animation is a further twist for which I've seen no analogies in the simulated annealing literature.

If you've never heard of simulated annealing and you're interested in reading about it, there's a very short description in "Numerical Recipes." The best paper that I've found, though, is "Optimization by Simulated Annealing," by S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, in the May 13, 1983 issue of Science.

Does this sound at all interesting to anybody? Is anyone else thinking along these or similar lines?


Eric Haines / erich@acm.org