sobota, 28 sierpnia 2010

nlog(n)log(n) kd-tree construction using SAH

Disclaimer:
I don't know if anyone has ever invented this algorithm yet or not. A much more important fact is that I haven't found any intuitive explanation of it in the internet. Here is my attempt. Note that this article is inspired by a real piece of code, executing in my ray tracer.

Introduction

KD tree is a data structure, based on a space subdivision scheme. It is useful for fast search for shapes intersecting with a line in 3D space. Ray tracing is an obvious application of kd trees. The defining principle of a kd tree is that an axis-aligned bounding box of all shapes is split recursively into two child boxes, using an axis aligned plane. Each child box is going to have a list of all shapes that intersect with it. The split plane is positioned according to some kind of a sensible heuristic calculation. One of the best heuristics is the Surface Area Heuristic (SAH for short).
In this post I am not going to waste time repeating wildly accessible general information about kd trees and SAH. It is the fast use of geometry that is interesting in my method. Instead I would like to point the reader to materials that in my opinion explain it clearly enough:

http://en.wikipedia.org/wiki/Kd-tree
http://www.devmaster.net/articles/raytracing_series/part7.php (skip the beginning and go to 'kd trees' section)
http://www.devmaster.net/forums/showthread.php?t=15004

Now I assume that the reader knows how is kd-tree structured, what principles is SAH based on and what is the simple, naive algorithm for building the tree using SAH.

General outline of the method (also the naive method):

1. Find all position of the split planes which are worth consideration.
2. For each split position P:
2.1. Compute the coordinates that define the two boxes: one to the left and one to the right of the split plane.
2.2. Find the number NL of shapes that intersect left box and the NR of shapes that intersect the right box.
2.3. Calculate the SAH as NL*[the area of sides of the left box] + NR*[the area of sides of the right box]. There are usually some additional constants in this formula for adjusting the number of shapes per leaf, but they are not significant for the purpose of this post.
2.4. If the computed SAH value is the best so far, save it, together with the split position.
3. Take the best SAH value and check of it makes sense to split this node. If it is the case, compute the coordinates of the left and right box, create two children of this node and insert the intersecting shapes to them.

The most expensive step of the naive version is 2.2. In the naive version it involves checking if an intersection exists between each shape and the left/right box. This is O(n) step in an O(n) loop, which lifts the time complexity to O(n*n). It is done in each node of a tree, whose expected depth is O(log(n)), so the total expected complexity of the naive version of tree building is O(n*n*log(n)).

Lets assume, without the loss of generality, that the split axis is X (the split axis is the axis orthogonal to the splitting plane). My modification is based on observation that its not necessary to traverse all shapes for all possible split positions on X axis. Instead, I compute intersection (common part) of each shape with the node box and create two lists of all shapes, sorted by one of the extreme X positions of the intersection. First list is sorted by the minimal X values (later called 'Min'), second by the maximum X values (later called 'Max'). Note that for any split position P, all shapes which have Min <= P are going to intersect with the left child box (where 'left' is defined as having Xs smaller than P). Also, all shapes which have Max >= P are going to intersect the right box. The shape counts in the left and right box are simply the shape counts before and after certain elements on the Min and Max sorted lists. I can find those 'certain elements' in log(n) time if I use appropriate implementations of those lists.

Here is a more specific explanation of my modification.
I do additional precomputation in step 0:

0. For each shape compute the Min and Max coordinates of its intersection with box of the node. By intersection I mean common points of the shape and the box. For example an intersection of a box and a plane could be a rectangle. Only the coordinates on the axis orthogonal to the splitting plane are needed, so for each shape there will be just one pair of coordinates.
Create two sorted maps, containing all shapes:
  [Min]->[(shape pointer; an integer variable C, equal to 0)],
  [Max]->[(shape pointer; an integer variable C, equal to 0)].
Notation: In [a]->[b] a is a key, b is value and '->' means 'maps to'. (a;b) is a pair of elements a and b.
In fact, the maps above have to be multimaps, since min or max may be equal for multiple shapes. Any decent programming language should provide an abstraction of a sorted (multi)map, with logarithmic insertion and search operations. If your language provides a map, but does not provide multimap, you can use a map in which values are arrays of the pairs. If you use the C language, you can implement it as a balanced binary tree (e.g. red-black tree) or a skip list.
So far, the contents of the 0 step may be easily implemented by creating empty maps before the loop and inserting the shapes to them as soon as the min and max values are computed in each loop cycle. Then comes the assignment to the integer variables in the map:
  Initialize 'i' to 0.
  For each entry 'e' in the map:
    Set 'e'.value.C to 'i'.
    Increment 'i'.
This way each shape's map entry contains the number of shapes that preceed it in the map. Hence, the maps become:
  [Min]->[(pointer to S; count of shapes before S in this map)],
  [Max]->[(pointer to S; count of shapes before S in this map)].

Now the step 2.2 becomes:

2.2.Please, recall that we are interested in computing NL and NR, i.e. the counts of the shapes that intersect the left/right child box. Also recall that we are currently considering split position P.
Lets introduce integer Nt - the total shape count in the box (in other words: in both child boxes together). The calculation of NL an NR is as follows:
2.2.1 NL.
In the Min map find the first entry 'e' whose key is greater than P.
NL value will be equal to 'e'.value.C.
If there is no such entry (because P is greater than all Min keys), NL is equal to Nt.
2.2.2 NR.
In the Max map find first entry 'e', whose key is greater or equal to P.
NR value will be equal to Nt-('e'.value.C).
If there is no such entry (because P is greater than all Max keys), NR will be 0.

Any decent sorted map abstraction should provide a way of searching for the entries whose keys are most similar to given value, so I am not going to cover such search here. Note, however, that the complexity of 2.2.1 and 2.2.2 in a sorted map is log(n).

The rest of the algorithm stays the same. Actually, I think that this may also be implemented using a sorted array and binary search instead of the multimap, but I didn't want to obfuscate the explanation with too much generality, so I chose to refer to multimap concept consistently.

In the next post I am going to describe one more thing that may not be immediately obvious: how to compute the polygon created by intersecting a triangle with box. I am not going to cover shapes other than triangles since I solely use triangles, so I would not be writing from experience.