8. Managing Massive Amounts of Data – OpenSceneGraph 3 Cookbook

Chapter 8. Managing Massive Amounts of Data

In this chapter, we will cover:

  • Merging geometry data

  • Compressing textures

  • Sharing scene objects

  • Configuring the database pager

  • Designing simple culling strategy

  • Using occlusion query to cull objects

  • Managing scene objects with an octree algorithm

  • Rendering point cloud data with draw instancing

  • Speeding up the scene intersections

Introduction

It is more and more common to handle massive scene data in 3D programs. Terrain visualization is one common usage that converts extremely high-resolution images into grid or triangular models. We have already discussed terrain building with VPB in Chapter 7. However, this is not enough. Many types of unordered data, such as the geographical and geological information, point cloud from scanners, crowded people and vehicle models, and other scientific data, should also be reconstructed and rendered in 3D applications. It is not surprising if these kinds of data include millions or even billions of points and triangles. Also, we must think of some solutions to cull and render them within the limits of the operating system and hardware abilities.

In this chapter, we will introduce some common methods to optimize nodes, geometries, and textures in OSG, and provide several ways to cull scene objects, in order to reduce the number of objects before they are sent to the rendering pipeline. A spatial indexing algorithm called an octree is also introduced here with a very simple example.

We will add some more common functions in the osgCookBook namespace. They are used for generating random values for constructing huge scenes. The randomValue() function will return a float value between min and max. The randomVector() function returns a 3D vector with one component each between the min and max values. The randomMatrix() function will return a new 4x4 matrix, which is made up of a translation and an euler rotation operation. The translation and the angle values are randomly generated between the input min and max parameters.

Namespace osgCookBook {
float randomValue( float min, float max )
{
return (min + (float)rand()/(RAND_MAX+1.0f) * (max - min));
}
osg::Vec3 randomVector( float min, float max )
{
return osg::Vec3( randomValue(min, max),
randomValue(min, max),
randomValue(min, max) );
}
osg::Matrix randomMatrix( float min, float max )
{
osg::Vec3 rot = randomVector(-osg::PI, osg::PI);
osg::Vec3 pos = randomVector(min, max);
return osg::Matrix::rotate(rot[0], osg::X_AXIS, rot[1],
osg::Y_AXIS, rot[2], osg::Z_AXIS)*osg::Matrix::translate(pos);
}
}

Merging geometry data

It is common in a complex program to contain hundreds of thousands of geometry objects. For example, a digital city running on the local machine or the Internet may have 10,000 detailed houses, and each house can have doors, windows, fences, and many other components.

It may be sometimes confusing for developers in this situation—should we try to merge all (or majority) of them in to one osg::Geometry object or use more geometries to represent the different house elements? The answer depends on different situations that we may face. However, less geometry objects always perform better while rendering the same number of triangles, as the shown in the following section.

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/Geometry>
    #include <osg/Group>
    #include <osgDB/ReadFile>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    
  2. We will try to show the importance of merging geometries, so the first thing is to have a createTilescreateTiles() function to create as many geometries as possible and force the scene to be slow:

    osg::Node* createTiles( unsigned int cols, unsigned int
    rows )
    {
    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    for ( unsigned int y=0; y<rows; ++y )
    {
    for ( unsigned int x=0; x<cols; ++x )
    {
    osg::ref_ptr<osg::Geometry> geom = new osg::Geometry;
    ... // Please see the source code for details
    geode->addDrawable( geom.get() );
    }
    }
    return geode.release();
    }
    
  3. Now, let us try rendering these geometries:

    osgViewer::Viewer viewer;
    viewer.setSceneData( createTiles(300, 300) );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  4. Press the S key to see the frame rate. I have tested it on an Intel Dual-Core computer with a GTX 460 graphics card, which achieved an underwhelming 20FPS. Absolutely not a good performance!

  5. Rewrite the createTiles() function and use only one osg::Geometry instance to contain all the quads. Compile and check the frame rate again:

    osg::Node* createTiles( unsigned int cols, unsigned int
    rows )
    {
    unsigned int totalNum = cols * rows, index = 0;
    osg::ref_ptr<osg::Vec3Array> va = new
    osg::Vec3Array(totalNum * 4);
    osg::ref_ptr<osg::Vec3Array> na = new
    osg::Vec3Array(totalNum);
    osg::ref_ptr<osg::Vec4Array> ca = new
    osg::Vec4Array(totalNum);
    osg::ref_ptr<osg::Geometry> geom = new osg::Geometry;
    for ( unsigned int y=0; y<rows; ++y )
    {
    for ( unsigned int x=0; x<cols; ++x )
    {
    ... // Please see the source code for details
    }
    }
    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    geode->addDrawable( geom.get() );
    return geode.release();
    }
    
  6. The rendering result is not changed, but the frame rate can rise to at least 60FPS this time (60 is the refreshed frequency for most kinds of displays unless you disable the vertical sync feature).

How it works...

Every osg::Drawable object by default will generate a display list on the GPU side, which can speed up the rendering process of static elements. These display lists, if too many, will decrease the frame rate, and affect the rendering performance. That happens because there is always a fixed cost for the execution of a display list, no matter how much or how little work that list does. Thus, if possible, we have to merge the osg::Geometry objects in this recipe to reduce such cost.

Of course, merging of geometries may lose some necessary information. For example, if each of the geometries has a different texture applied, it is nearly impossible to combine them into one osg::Geometry object. In that case, some other solutions may have to be considered, such as dividing the scene using quadtree or octrees.

The same problem may occur if you want to use vertex buffer object (VBO) on geometries and don't merge them, as VBO has to create buffer data for each of the geometry's vertices and vertex attribute arrays.

Compressing texture

OpenGL provides a series of texture compression methods and new internal texture formats for faster rendering and lower memory requirements. The compression mainly boosts the speed of downloading textures into the texture memory and reduces the file size on the local disk. Also, it massively reduces the GPU memory consumption, which is one of the most important reasons to use it. This technique is used frequently in modern 3D development because it can produce high-quality results with a much smaller size on both CPU and GPU sides.

OSG has already supported a very simple way to make use of different built-in compression algorithms. In this recipe, we will again create a huge number of quads and apply random textures to them, and also show the difference between using and not using compressed textures.

Getting ready

We will introduce the Process Explorer utility, which is developed by Mark Russinovich. It is an advanced process management tool that will show detailed information of a particular process and list all the DLLs it has loaded. It can also compute total and available virtual and physical memory, as well as the CPU and GPU usage on the fly. Its ability for tracking GPU usage and memory information is extremely useful here because we cannot read current GPU information directly in common ways. This functionality is not workable under non-Windows systems or versions lower than Windows 7.

You can download Process Explorer at:

http://technet.microsoft.com/en-us/sysinternals/bb896653

In order to view the GPU information, you must first start the procexp.exe executable and select System Information from View on the menu bar. Then you can change to the GPU tab to view current usage and available video memory if your operating system supports it, as shown in the following screenshot:

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/Texture2D>
    #include <osg/Geometry>
    #include <osg/Group>
    #include <osgDB/ReadFile>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    
  2. We will use a createRandomImage() function to make numerous random images for texturing, instead of loading limited numbers of existing images from the local disk.

    osg::Image* createRandomImage( int width, int height )
    {
    osg::ref_ptr<osg::Image> image = new osg::Image;
    image->allocateImage( width, height, 1, GL_RGB,
    GL_UNSIGNED_BYTE );
    unsigned char* data = image->data();
    for ( int y=0; y<height; ++y )
    {
    for ( int x=0; x<width; ++x )
    {
    *(data++) = osgCookBook::randomValue(0.0f, 255.0f);
    *(data++) = osgCookBook::randomValue(0.0f, 255.0f);
    *(data++) = osgCookBook::randomValue(0.0f, 255.0f);
    }
    }
    return image.release();
    }
    
  3. First, let us see how much memory is occupied without any compression methods, that is, the default GL_RGB format will be used and passed to the OpenGL pipeline.

    osg::Node* createQuads( unsigned int cols, unsigned int
    rows )
    {
    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    for ( unsigned int y=0; y<rows; ++y )
    {
    for ( unsigned int x=0; x<cols; ++x )
    {
    osg::ref_ptr<osg::Texture2D> texture = new
    osg::Texture2D;
    texture->setImage( createRandomImage(512, 512) );
    osg::Vec3 center((float)x, 0.0f, (float)y);
    osg::ref_ptr<osg::Drawable> quad =
    osg::createTexturedQuadGeometry(
    center, osg::Vec3(0.9f, 0.0f, 0.0f), osg::Vec3(0.0f,
    0.0f, 0.9f) );
    quad->getOrCreateStateSet()->
    setTextureAttributeAndModes( 0, texture.get() );
    geode->addDrawable( quad.get() );
    }
    }
    return geode.release();
    }
    
  4. Now start the viewer:

    osgViewer::Viewer viewer;
    viewer.setSceneData( createQuads(20, 20) );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  5. Open the ProcessExplorer and write down the available memory values on the CPU and GPU sides (~730MB on the GPU side on the author's PC).

  6. Now add two lines in the createQuads() function, after the texture object is just created:

    texture->setInternalFormatMode(
    osg::Texture2D::USE_S3TC_DXT1_COMPRESSION );
    texture->setUnRefImageDataAfterApply( true );
    
  7. Rebuild and re-run the application, and record the values in ProcessExplorer. You will find that both the free system memory and used graphics memory are much larger than the last time (~250MB on the GPU side). This of course saves space for further use of resources.

How it works...

It is really simple to use OpenGL-supported compressed textures in OSG applications. The setInternalFormatMode() method can be used to quickly specify the compression type, and OpenGL will internally do the work. For instance, in this recipe we indicate OSG to change all textures to S3TC DXT1 format:

texture->setInternalFormatMode(
osg::Texture2D::USE_S3TC_DXT1_COMPRESSION ););

Other supported formats include:

Internal format mode

Supported image type

Description

USE_IMAGE_DATA_FORMAT

Any

Let the developer decide the texture format by calling setInternalFormat().

USE_ARB_COMPRESSION

Any uncompressed

Use the ARB_texture_compression specification to compress the texture.

USE_S3TC_DXT1_COMPRESSION

RGB and RGBA

Use S3TC DXT1 compression type.

http://en.wikipedia.org/wiki/S3_Texture_Compression

USE_S3TC_DXT1_COMPRESSION

RGBA only (RGB is handled by DXT1)

Use S3TC DXT3 compression type.

USE_S3TC_DXT5_COMPRESSION

RGBA only (RGB is handled by DXT1)

Use S3TC DXT5 compression type.

USE_PVRTC_2BPP_COMPRESSION

USE_PVRTC_4BPP_COMPRESSION

RGB and RGBA

Use the GLES compression type:

http://www.khronos.o rg/registry/gles/extensions/IMG/IMG_texture_compression_pvrtc.txt

USE_ETC_COMPRESSION

RGB

Use another GLES compression type:

http://en.wikipedia.org/wiki/Ericsson_Texture_Comp ression

USE_ETC_COMPRESSION

RGB

Use another GLES compression type:

http://en.wikipedia.org/wiki/Ericsson_Texture_Comp ression

USE_RGTC1_COMPRESSIONN

USE_RGTC2_COMPRESSION

RGB and RGBA

Use the OpenGL 2.0 compression type:

http://www.opengl.org/registry/specs/EXT/texture_compression_rgtc.txt

This reduces the graphics card storage requirements effectively and accelerates the dynamic loading of textures, as you can see from the Process Explorer utility. However, a more optimal way for using compressed textures is to pre-compress them before rendering. Some utilities, such as NVTT, which is introduced in the last chapter, can produce .DDS images using compressed format and save them to disk files. These types of files can be directly recognized and used by OSG and OpenGL in user applications.

Another useful method here is setUnrefImageDataAfterApply(), as it can be set to true to force OSG to release the osg::Image objects (on the CPU side) after they are compiled into the GPU side, and thus release more system memory for other uses.

Note that as the image objects are deleted from the memory after being applied, they can never be read or written after deletion on the CPU side, and can only be retrieved by calling functions such as glReadPixels() or loading them from disk files again. It is suggested that you only use this feature on static textures.

Sharing scene objects

You may have already learnt some ways for sharing scene nodes, drawables, and state attributes to improve the storage and rendering performance. For example, you can make multiple mid-layer nodes, share the same child while building the scene graph, and you can also use osgDB::SharedStateManager to automatically collect and share texture objects of paged nodes.

In this recipe, we are going to use a share list to manage nodes read from the disk files, and cache the file reading process by comparing the filename with the names stored in this list. If a node in the share list is no longer referred to any other nodes or scene objects (that is, it is only referred by the list), it will be removed from the list to release system memory. This will be done at regular intervals (some seconds) in a 'prune' process.

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/MatrixTransform>
    #include <osgDB/ReadFile>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    #include <fstream>
    #include <iostream>
    
  2. The osgDB::ReadFileCallback will replace the default implementation. Hence, we can make use of this class to check if a new file reading request is already recorded in the sharing list, and use the referenced object stored instead of reading from the file again.

    class ReadAndShareCallback : public osgDB::ReadFileCallback
    {
    public:
    virtual osgDB::ReaderWriter::ReadResult readNode( const
    std::string& filename, const osgDB::Options* options );
    void prune( int second );
    protected:
    osg::Node* getNodeByName( const std::string& filename );
    typedef std::map<std::string, osg::ref_ptr<osg::Node> >
    NodeMap;
    NodeMap _nodeMap;
    OpenThreads::Mutex _shareMutex;
    
  3. The readNode() function must be re-implemented to take control of the file reading and sharing process. First, we check simply to see if the filename exists in the sharing list. If not, we will redirect to the default implementation to read the file from a certain plugin, and add the new name to the list; otherwise, we inform the caller that we have found an existing filename and will directly use the stored one as the return value:

    osgDB::ReaderWriter::ReadResult
    ReadAndShareCallback::readNode( const std::string&
    filename, const osgDB::Options* options )
    {
    OpenThreads::ScopedLock<OpenThreads::Mutex> lock(
    _shareMutex );
    osg::Node* node = getNodeByName( filename );
    if ( !node )
    {
    osgDB::ReaderWriter::ReadResult rr =
    osgDB::Registry::instance()->readNodeImplementation(
    filename, options );
    if ( rr.success() ) _nodeMap[filename] = rr.getNode();
    return rr;
    }
    else
    std::cout << "[SHARING] The name " << filename << " is
    already added to the sharing list." << std::endl;
    return node;
    }
    
  4. The prune() function will traverse the sharing list and check if an element has no more reference besides the sharing list itself. It should be executed every few seconds to optimize the list, but not every frame because it takes some time for each call:

    void ReadAndShareCallback::prune( int second )
    {
    if ( !(second%5) ) // Prune the scene every 5 seconds
    return;
    OpenThreads::ScopedLock<OpenThreads::Mutex> lock(
    _shareMutex );
    for ( NodeMap::iterator itr=_nodeMap.begin();
    itr!=_nodeMap.end(); )
    {
    if ( itr->second.valid() )
    {
    if ( itr->second->referenceCount()<=1 )
    {
    std::cout << "[REMOVING] The name " << itr->first
    << " is removed from the sharing list." <<
    std::endl;
    itr->second = NULL;
    }
    }
    ++itr;
    }
    }
    
  5. The checking and getting of the node pointer from a filename is done in the getNodeByName() method.

    osg::Node* ReadAndShareCallback::getNodeByName( const
    std::string& filename )
    {
    NodeMap::iterator itr = _nodeMap.find(filename);
    if ( itr!=_nodeMap.end() ) return itr->second.get();
    return NULL;
    }
    
  6. We can have a RemoveModelHandler class to implement a picking and removing handler in the scene. Another important task is to call the pruning method in the FRAME event:

    class RemoveModelHandler : public osgCookBook::PickHandler
    {
    public:
    RemoveModelHandler( ReadAndShareCallback* cb ) :
    _callback(cb) {}
    virtual bool handle( const osgGA::GUIEventAdapter& ea,
    osgGA::GUIActionAdapter& aa )
    {
    if ( ea.getEventType()==osgGA::GUIEventAdapter::FRAME )
    {
    if ( _callback.valid() )
    _callback->prune( (int)ea.getTime() );
    }
    return osgCookBook::PickHandler::handle(ea, aa);
    }
    virtual void doUserOperations(
    osgUtil::LineSegmentIntersector::Intersection& result )
    {
    ... // Please see the source code for details
    }
    osg::observer_ptr<ReadAndShareCallback> _callback;
    };
    
  7. The addFileList() function can read model filenames and positions from an ASCII file and add the transformed node to the root. It doesn't handle the case of passing the same filename multiple times in the same file. But the ReadAndShareCallback will do this work instead:

    void addFileList( osg::Group* root, const std::string& file )
    {
    ... // Please see the source code for details
    }
    
  8. In the main entry, we add the sharing callback to the database registry singleton. Also, read the multi-line data from the data.txt.

    osg::ref_ptr<ReadAndShareCallback> sharer = new
    ReadAndShareCallback;
    osgDB::Registry::instance()->setReadFileCallback(
    sharer.get() );
    osg::ref_ptr<osg::Group> root = new osg::Group;
    addFileList( root.get(), "files.txt" );
    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    viewer.addEventHandler( new
    RemoveModelHandler(sharer.get()) );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  9. The application will read multiple lines of filenames and load them. You can use Ctrl + left mouse button to remove models from the current scene graph.

  10. The result and the terminal output are both shown in the following screenshot. Remove the line of setReadFileCallback() and restart the program. Open Process Explorer and see if there are any changes to the system memory. The more files read, the clearer result you will see:

How it works...

The most important step here is to design the file reading callback, which will be used to replace the standard file reading operation. In the readNode() method of ReadAndShareCallback, we first check if the input filename is already saved in the _nodeMap variable, and then directly return the stored node object if there is a matched result; otherwise, the standard method will be called and the returned value will be cached for further reading requests.

The following line defines a scoped read/write lock for multithreaded developing in both readNode() and prune() methods.

OpenThreads::ScopedLock<OpenThreads::Mutex> lock( _shareMutex );

It will work if either readNode() or prune() is called, and will be automatically disabled when the method ends. While the lock is enabled, all other threads will be blocked if they are trying to execute the same two methods. Therefore, it prevents the same reading process from being called by multiple threads, and avoids possible problems and crashes.

There's more...

In fact, OSG has already provided a much simpler way to implement caching of nodes and images. For instance, we can use the setObjectCacheHint() method of osgDB::Options class to indicate that the node or image to be read should be recorded in the OSG internal registry and thus avoid replicated loading of files. An example code segment is given as follows:

osg::ref_ptr<osgDB::Options> options = new osgDB::Options;
options->setObjectCacheHint( osgDB::Options::CACHE_NODES );
osg::Node* model = osgDB::readNodeFile( "cow.osg", options.get() );

You can use the CACHE_IMAGES value instead to cache images while calling the osgDB::readImageFile() function.

Configuring the database pager

It is not the first time we come across OSG's powerful database pager. It is actually an internal osgDB::DatabasePager object, which manages the loading of external files in a separate thread. It synchronizes the loaded files with the scene graph and makes them render properly in the rendering thread. It can also remove nodes that are out of view from the memory to reduce system resource consumption, and reload them when they are visible to the viewer again.

The osg::PagedLOD and osg::ProxyNode nodes, both use the database pager for implementing their underlying functionalities. The paged LOD nodes depend heavily on the pager to load and unload child levels dynamically. In this recipe, we will introduce several practical functions, which may help a lot when handling very large landscapes (possibly made up of hundreds of thousands of paged LODs).

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/Texture>
    #include <osg/Node>
    #include <osgDB/DatabasePager>
    #include <osgDB/ReadFile>
    #include <osgUtil/PrintVisitor>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    
  2. We will get the scene filename to load from the argument parser. Meanwhile, we will set up the maximum texture pool size to 64000 bytes. We will explain the usage of a texture pool later:

    osg::ArgumentParser arguments( &argc, argv );
    osg::ref_ptr<osg::Node> root =
    osgDB::readNodeFiles(arguments);
    osg::Texture::getTextureObjectManager(0)->
    setMaxTexturePoolSize( 64000 );
    
  3. Every viewer object will have a default database pager. We can directly obtain it and alter its parameters. The meanings of the two methods (setDoPreCompile() and setTargetMaximumNumberOfPageLOD()) used here will be discussed later in the How it works... section, too:

    osgViewer::Viewer viewer;
    osgDB::DatabasePager* pager = viewer.getDatabasePager();
    pager->setDoPreCompile( true);
    pager->setTargetMaximumNumberOfPageLOD( 10 );
    
  4. Now start the viewer:

    viewer.setSceneData( root.get() );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  5. We have to pass a filename for the recipe to load and render it. For example, you may work on the gcanyon terrain generated in the last chapter (assuming that the executable is named as cookbook_08_04):

    # cookbook_08_04 output/out.osgb
    
    
  6. With modern devices and full-fledged computers, you may not be able to figure out the performance difference between using this application and using osgviewer directly. In this case, the system memory and IO usage graphs provided in the Process Explorer (in the System Information dialog) may help you understand something. View the same terrain with this recipe's code and the osgviewer. Use your mouse to zoom in and zoom out the scene for more than one time. You may get graphs as shown in the following screenshot:

  7. The first line shows the memory utilization of the two executables with the same paged terrain file. You can see that this recipe (left-side) has a more frequent increasing and decreasing alternation of the memory, which surely occurs when the camera is zooming in and out. This means the paged nodes of the terrain are dynamically loaded and unloaded in a more frequent way. In contrast, the result for osgviewer holds a constant memory usage, which means there are nearly no unloading processes during the navigation of the scene.

  8. The second pair of graphs graphs record the disk I/O requests. It actually indicates how often OSG reads the files from disk, which are the child nodes of the osg::PagedLOD nodes. The implementation of this recipe obviously requires more I/O operations, but the other one has a higher number of I/O requests simultaneously, which may lead to frame dropping and low scene performance.

How it works...

The setMaxTexturePoolSize() method enables OSG's internal texture object pool, which can be used for recycling orphaned texture objects or reusing textures that are out of date. Without a texture pool, we may have to repeatedly free and allocate memories used by textures while paged nodes are loaded and unloaded with a high frequency. This can lead to memory fragmentation problems and thus slow down the system.

The texture pool partly solves this issue. It preserves a piece of system memory (64000 bytes in this recipe, but this can be customized), which can be reused at any time for texture creation requests. When we allocate space for new textures, the pool will be considered first and its available space will be split and used directly instead of asking for new ones. This feature can efficiently avoid or reduce the frame drops due to un-needed memory allocation and freeing.

The setDoPreCompile() is another important method that we should pay attention to, especially when there are too many objects loaded in one frame. However, all these GL objects (buffers, textures, and so on) must be compiled immediately for OpenGL to render them. It may cause terrible frame drops as the system is too busy to handle so many requests and allocations.

However, incremental setDoPreCompile(true) enables an incremental pre-compilation mechanism, that is, the objects will be compiled and rendered in succession during several frames instead of just one frame. So, if we have too many scene objects to compile at the same time (mostly, because of improper LOD scales or spatial index strategies), we had better use this feature to avoid excessive stalls because of hanging up the newly loaded nodes until they are compiled. Of course, more balanced scene graphs, compressed and pre-mipmapped textures, and consistent texture sizes (good for the texture pool mechanism) are always very helpful. The performance of the graphics hardware and operating system can often make a big difference too.

In the last part of this section, we will explain why the previous screenshot gives different memory and I/O monitoring results. It is the unloading and reloading of nodes from disk files that makes a difference! Thus, we can conclude that the cookbook_08_04 executable unloads scene objects (and re-reads them) much more frequently than osgviewer. That is happening because of the use of the setTargetMaximumNumberOfPageLOD() method, which can set a maintaining target for the database pager. When the number of osg::PagedLOD nodes loaded into the scene is greater than the target, the pager will automatically recycle the paged LODs, as they are out-of-date or outside the view frustum. If not, it will just leave these paged nodes in the system memory for caching.

By default, the maximum number of paged nodes is 300. However, we changed it to 10 in this recipe. It means that any invisible paged nodes should be removed as soon as possible in order to fit in a very small target value, and we re-read them when they are in the range of vision again. This leads to drastic changes of memory and I/O as we have seen earlier.

Designing a simple culling strategy

Scene culling is a very important step of the scene rendering operation. In every frame, it checks the visibility of each of the geometries in the current field of vision and reduces the total number of scene objects as much as possible before sending them to the rendering pipeline. A good culling strategy makes the rendering work smooth and does not take much time.

OSG has already provided some efficient culling algorithms that can be used directly. However, sometimes we may have some easier and better solutions for some special cases. In this recipe, we will take a maze game as an example. A maze can be described as a 2D map and an extrusion about the Z axis. Hence, we can cull elements in the maze according to the 2D map. It can be simple but significant if there are massive numbers of small objects to be rendered in the maze.

How to do it...

Let us start.

  1. Include necessary headers.

    #include <osg/Texture2D>
    #include <osg/Geometry>
    #include <osg/ShapeDrawable>
    #include <osg/MatrixTransform>
    #include <osgDB/ReadFile>
    #include <osgGA/FirstPersonManipulator>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    #include <fstream>
    #include <sstream>
    #include <iostream>
    
  2. The maze map is in fact a 2D table with several columns and rows. The index of an element at a certain column and row is defined with a CellIndex. Each table element's value can be either 0 (ground) or 1 (wall), which decides the shape of a 1x1 area in the maze:

    typedef std::pair<int, int> CellIndex;
    typedef std::map<CellIndex, int> CellMap;
    CellMap g_mazeMap;
    
  3. We will use the getOrCreatePlane() function to create a 1x1 quad on the XOY plane. It can be transformed and used to construct a ground tile in the maze, which is walkable:

    osg::Geode* getOrCreatePlane()
    {
    ... // Please see the source code for details
    }
    
  4. Use the getOrCreateBox() function to create a 1x1x1 box. It will be transformed to construct an impassable area surrounded by walls, that is, form one of the maze "blocks". Everything placed in this area is invisible and can be culled before the rendering process:

    osg::Geode* getOrCreateBox()
    {
    ... // Please see the source code for details
    }
    
  5. The next step is to create the maze according to the specified maze map information. We are going to design simple mazes using the following ASCII format:

    1 1 1 1 1 1 0 1
    1 0 0 0 0 0 0 1
    1 0 1 0 1 1 0 1
    1 0 1 0 1 0 0 1
    1 0 1 0 1 1 1 1
    1 1 1 0 0 0 0 1
    0 0 0 0 1 0 1 1
    1 1 1 1 1 1 1 1
    
  6. The previous step generates a maze with 8x8 cells, each of which may be set to 0 or 1. The player (or the viewer) can only walk on the ground cells set to 0, and can only see objects placed on the ground. The createMaze() function will read the map information text from a file:

    osg::Node* createMaze( const std::string& file )
    {
    ...
    }
    
  7. In the function, we first open the map file and fill the g_mazeMap variable with read values. This variable will not only be used to create the maze geometry, but also for checking the visibility of scene objects and helping manipulate the viewer (in first-person mode):

    std::ifstream is( file.c_str() );
    if ( is )
    {
    std::string line;
    int col = 0, row = 0;
    while ( std::getline(is, line) )
    {
    std::stringstream ss(line);
    while ( !ss.eof() )
    {
    int value = 0; ss >> value;
    g_mazeMap[CellIndex(col, row)] = value;
    col++;
    }
    col = 0;
    row++;
    }
    }
    
  8. The second part of the createMaze() function is to generate maze geometries, create all ground or wall tiles, and place them at the correct columns and rows.

    osg::ref_ptr<osg::Group> mazeRoot = new osg::Group;
    for ( CellMap::iterator itr=g_mazeMap.begin();
    itr!=g_mazeMap.end(); ++itr )
    {
    const CellIndex& index = itr->first;
    osg::ref_ptr<osg::MatrixTransform> trans = new
    osg::MatrixTransform;
    trans->setMatrix( osg::Matrix::translate(index.first,
    index.second, 0.0f) );
    mazeRoot->addChild( trans.get() );
    int value = itr->second;
    if ( !value ) // Ground
    trans->addChild( getOrCreatePlane() );
    else // Wall
    trans->addChild( getOrCreateBox() );
    }
    return mazeRoot.release();
    
  9. OSG already provides us a well-designed osgGA::FirstPersonManipulator that is controlled by moving the mouse (look direction) and wheel (move forwards and backwards). However, we have to rewrite it a little to make sure that the viewer can never go into impassable tiles. This is done in the derived MazeManipulator class:

    class MazeManipulator : public osgGA::FirstPersonManipulator
    {
    public:
    virtual bool handle( const osgGA::GUIEventAdapter& ea,
    osgGA::GUIActionAdapter& aa );
    };
    
  10. In the handle() function, we will first record the manipulator's unhandled matrix, do the default manipulating, and then obtain the viewer's position from the new matrix. The position will be converted to index value and passed to the maze map. If it is outside the maze or in an impassable area, roll back the viewer to the last unhandled matrix variable lastMatrix. The method of checking the maze cells is simpler and faster than doing intersections with scene objects directly:

    osg::Matrix lastMatrix = getMatrix();
    bool ok = osgGA::FirstPersonManipulator::handle(ea, aa);
    if ( ea.getEventType()==osgGA::GUIEventAdapter::FRAME ||
    ea.getEventType()==osgGA::GUIEventAdapter::SCROLL )
    {
    osg::Matrix matrix = getMatrix();
    osg::Vec3 pos = matrix.getTrans();
    if ( pos[2]!=0.5f ) // Fix the player height
    {
    pos[2] = 0.5f;
    matrix.setTrans( pos );
    setByMatrix( matrix );
    }
    CellIndex index(int(pos[0] + 0.5f), int(pos[1] + 0.5f));
    CellMap::iterator itr = g_mazeMap.find(index);
    if ( itr==g_mazeMap.end() ) // Outside the maze
    setByMatrix( lastMatrix );
    else if ( itr->second!=0 ) // Don't intersect with walls
    setByMatrix( lastMatrix );
    }
    return ok;
    
  11. Now, in the main entry, we will create the maze from a file named maze.txt (you can find it in the source code directory of this book). However, this is not enough. We will then randomly add a lot of small objects (dumptruck.osg in this recipe, which has over 26000 points) to test the performance of the scene:

    osg::ref_ptr<osg::Group> root = new osg::Group;
    root->getOrCreateStateSet()->setMode( GL_NORMALIZE,
    osg::StateAttribute::ON );
    root->getOrCreateStateSet()->setMode( GL_LIGHTING,
    osg::StateAttribute::OFF );
    root->addChild( createMaze("maze.txt") );
    osg::Node* loadedModel =
    osgDB::readNodeFile("dumptruck.osg" );
    for ( int i=0; i<2000; ++i )
    {
    float x = osgCookBook::randomValue(0.5f, 6.5f);
    float y = osgCookBook::randomValue(0.5f, 6.5f);
    float z = osgCookBook::randomValue(0.0f, 1.0f);
    osg::ref_ptr<osg::MatrixTransform> trans = new
    osg::MatrixTransform;
    trans->setMatrix(osg::Matrix::scale(0.001, 0.001, 0.001) *
    osg::Matrix::translate(x, y, z) );
    trans->addChild( loadedModel );
    osg::ref_ptr<osg::Group> parent = new osg::Group;
    parent->addChild( trans.get() );
    root->addChild( parent.get() );
    }
    
  12. Create the MazeManipulator object and set its home position to the maze entrance. Start the viewer and have a look at the result.

    osg::ref_ptr<MazeManipulator> manipulator = new MazeManipulator;
    manipulator->setHomePosition( osg::Vec3(6.0f, 0.0f, 0.5f),
    osg::Vec3(6.0f, 1.0f, 0.5f), osg::Z_AXIS );
    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    viewer.setCameraManipulator( manipulator.get() );
    return viewer.run();
    
  13. Scroll the mouse wheel and make yourself move forward. We can see a huge number of trucks in front of you, as shown in the following screenshot. In order to display them, it will cost us lots of CPU and GPU resources. Press the S key to see the frame rate, which may be already slower than desired rate:

  14. Now, it is time for us to design our own culling strategy. We will make use of the node's cull callback instead of deriving a new node type and re-implementing the traverse() method. It is non-intrusive and can be easily applied to most built-in types of OSG nodes:

    class MazeCullCallback : public osg::NodeCallback
    {
    public:
    virtual void operator()( osg::Node* node,
    osg::NodeVisitor* nv );
    bool getCellIndex( CellIndex& index, const osg::Vec3& pos );
    };
    
  15. We must re-implement the operator() to cull the node according to the eye position provided by the cull visitor (nv). It is not the first time that we derive the NodeCallback class to implement cull callbacks; but here we will give up traversing the node's children if the getCellIndex() method returns false. It means the eye or the node itself is not in the visible area:

    void MazeCullCallback::operator()( osg::Node* node,
    osg::NodeVisitor* nv )
    {
    osg::Vec3 eye = nv->getEyePoint();
    osg::Vec3 center = node->getBound().center();
    osg::Matrix l2w = osg::computeLocalToWorld( node->
    getParentalNodePaths()[0] );
    eye = eye * l2w; center = center * l2w;
    CellIndex indexEye, indexNode;
    if ( getCellIndex(indexEye, eye) &&
    getCellIndex(indexNode, center) )
    {
    traverse( node, nv );
    }
    }
    
  16. The getCellIndex() method reads the position value and returns whether its corresponding index is 0 or 1 in the maze map. 0 means current position is not in a wall and can be seen by the viewer:

    bool MazeCullCallback::getCellIndex( CellIndex& index,
    const osg::Vec3& pos )
    {
    index.first = int(pos[0] + 0.5f);
    index.second = int(pos[1] + 0.5f);
    CellMap::iterator itr = g_mazeMap.find(index);
    if ( itr!=g_mazeMap.end() && itr->second==0 )
    return true;
    return false;
    }
    
  17. Now, when we are creating dump-trucks in the loop. Add the MazeCullCallback instance to each truck's parent node:

    parent->setCullCallback( new MazeCullCallback );
    
  18. Done! Now, recompile and see the result again, as shown in the following screenshot. You will find that the application runs much smoother than the last time and the cull time is a little longer because of the extra customized culling process:

How it works...

The osg::NodeCallback class, when used as cull callbacks, can determine whether a node should be culled or not by calling the traverse() method at an appreciated time. If the traverse() method is not executed in a certain condition, it means that the node will be ignored and all its children will not be visited by the cull visitor. Thus, we can design our own strategy, as shown in the following code:

if ( getCellIndex(indexEye, eye) && getCellIndex(indexNode,
center) )
{
traverse( node, nv );
}
// else ignore this node and it's subgraph.

In this example, we use the getCellIndex() method to check if a position in the maze is a wall or a ground. We can only accept the node to be rendered later when both the eye and the node center are in the ground area, and cull others to improve the performance.

Of course this algorithm can be modified to work even better. We can treat the maze walls as occluders, and check if all the line segments from the eye to one node intersect with these walls. Just figure out a solution by yourselves if you are interested in it.

Using occlusion query to cull objects

In the last recipe, we mentioned about the occluders that can be used to skip rendering objects behind them. You may already know the osg::OccluderNode class if you have ever read another OSG book published by Packt Publishing, that is,"OpenSceneGraph 3.0: Beginner's Guide", Rui Wang and Xuelei Qian. In that book, we introduced how to add convex planar occluders to the scene and make them work. This node can create highly efficient results for large scenes where only a small part is visible in each frame (others are hidden behind a few occluders).

However, the occlude node class is a software solution and can cost too much time for culling and unexpectedly decrease the rendering efficiency. Hence, is it possible for the user applications to ask the graphics hardware whether a pixel can be drawn or not? For example, any object hidden by other objects that are closer to the eye can be ignored before it is rendered to the buffer. Can the low-level 3D API check and return the query results efficiently enough?

The answer is yes. There are two possible ways for such kind of occlusion culling: occlusion query and early-Z algorithm. They both increase the rendering performance simply by not rendering geometries that are covered by other scene objects. We will introduce the first solution here, just because it can be done via the NV/ARB_occlusion_query OpenGL extension, and is already encapsulated in the osg::OcclusionQueryNode class in the core OSG library.

How to do it...

Let us start.

  1. The definition of g_mazeMap, the creation of the maze, and the maze manipulator's implementation are just the same as what we did in the last example. Of course, this time we will not use customized callbacks again, but will use occlusion-query nodes instead.

  2. In the main entry, the addition of 2000 dump-trucks will be done like this:

    osg::Node* loadedModel =
    osgDB::readNodeFile("dumptruck.osg" );
    for ( int i=0; i<2000; ++i )
    {
    float x = osgCookBook::randomValue(0.5f, 6.5f);
    float y = osgCookBook::randomValue(0.5f, 6.5f);
    float z = osgCookBook::randomValue(0.0f, 1.0f);
    osg::ref_ptr<osg::MatrixTransform> trans = new
    osg::MatrixTransform;
    trans->setMatrix(osg::Matrix::scale(0.001, 0.001, 0.001) *
    osg::Matrix::translate(x, y, z) );
    trans->addChild( loadedModel );
    osg::ref_ptr<osg::OcclusionQueryNode> parent = new
    osg::OcclusionQueryNode;
    parent->setVisibilityThreshold( 10 ); // Ten pixels
    parent->addChild( trans.get() );
    root->addChild( parent.get() );
    }
    
  3. The only change is to replace the type of truck parent nodes from osg::Group to osg::OcclusionQueryNode and set the necessary threshold attributes. Now, let us start the viewer and see if there are any improvements:

How it works...

The OSG implementation of occlusion query is really simple here. You may just create a new osg::OcclusionQueryNode node and use it as the to-be-culled geometry's parent node. The remainder of the scene will be automatically used to check if it can completely cover any of the query nodes. A query node, if proved to be 'invisible' after the checking process, will be culled and not rendered in the current frame. In this recipe, we have 2000 dump trucks in the query list. They will be efficiently culled by the maze geometries when the application is running.

The real OpenGL's occlusion query, which is working under the hood is a little more complex. It requires us to disable writing to depth buffer, and render the query object's bounding boxes to get the computation result. The result is in fact the number of visible pixels of the current query object's bounding box that are visible. If it is greater than a predefined threshold, then we can enable writing the depth and render this object normally; otherwise it can be treated as 'invisible' and ignored. The method setVisibilityThreshold() records the threshold value here.

However, occlusion query is not efficient enough at present. These queries need too many additional draw calls, and the returning of query results has latency, too. It still has a long way to go before becoming the most useful culling strategy of high efficiency 3D applications.

Managing scene objects with an octree algorithm

In the last chapter, we took enough time to discuss the structure of VPB's terrain models and are already familiar with its quad-tree scene graphs. With the help of LODs and paged LODs, we can quickly manage terrain tiles using the quad-tree algorithm and render unlimited size of terrain data. In fact, many other applications also use quad-tree to optimize the scene while working with massive data, such as city buildings, crowds of people, kinds of networks, and so on. A quad-tree's internal node has exactly four children, so it is always good at handling objects placed on the XOY plane.

What should we do if we have to partition a 3-dimensional space? For example, if we have a number of balls randomly placed in the 3D world, how can we manage them using an efficient spatial indexing algorithm? One of the solutions is called octree. It is another tree structure whose internal node (a 3D region) has exactly eight child regions, as shown in the following diagram:

VPB uses a 2D quad-tree to structure the terrain, similarly we can use a 3D octree to structure volume data or a complex scene (for example, the solar system with massive planets and asteroids) as well. In this recipe, we will use LOD nodes to construct such an octree structure for rendering massive numbers of sphere elements. These spheres are located at random positions in the 3D world and can have different sizes.

OSG also implements a KDTree internally, which can speed up the intersection computation. We will introduce it in the following recipe.

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/PolygonMode>
    #include <osg/ShapeDrawable>
    #include <osg/Geometry>
    #include <osg/Geode>
    #include <osg/LOD>
    #include <osgDB/ReadFile>
    #include <osgUtil/PrintVisitor>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    #include <iostream>
    #include <fstream>
    #include <sstream>
    
  2. We will first declare an OctreeBuilder class for constructing a scene graph using the octree algorithm. It uses the setMaxChildNumber() method to determine how many geometries can be contained in one leaf node (default is 16), and the setMaxTreeDepth() method decides the maximum number of levels that the octree can have (default is 32):

    class OctreeBuilder
    {
    public:
    OctreeBuilder() : _maxChildNumber(16), _maxTreeDepth(32),
    _maxLevel(0) {}
    int getMaxLevel() const { return _maxLevel; }
    void setMaxChildNumber( int max ) { _maxChildNumber= max; }
    int getMaxChildNumber() const { return _maxChildNumber; }
    void setMaxTreeDepth( int max ) { _maxTreeDepth = max; }
    int getMaxTreeDepth() const { return _maxTreeDepth; }
    typedef std::pair<std::string, osg::BoundingBox>
    ElementInfo;
    osg::Group* build( int depth, const osg::BoundingBox&
    total, std::vector<ElementInfo>& elements );
    protected:
    osg::LOD* createNewLevel( int level, const osg::Vec3&
    center, float radius );
    osg::Node* createElement( const std::string& id, const
    osg::Vec3& center, float radius );
    osg::Geode* createBoxForDebug( const osg::Vec3& max,
    const osg::Vec3& min );
    int _maxChildNumber;
    int _maxTreeDepth;
    int _maxLevel;
    };
    
  3. The build() method will be recursively called to create each level of the octree structure. User can manually call it with the depth set to 0, and total elements to specify the global boundaries and all the elements of the huge scene:

    osg::Group* OctreeBuilder::build( int depth, const
    osg::BoundingBox& total, std::vector<ElementInfo>& elements )
    {
    ...
    }
    
  4. We have two 3-dimensional arrays for calculating the basic attributes of a region. The s[] array represents all eight cells in any level of an octree. Each value in the array can be 0 or 1 to describe the side (left or right) of the cell on three of the axes (X/Y/Z). The extentSet[] array records the minimum, average, and maximum points in a level's region, which will be used later for calculating the region of its children:

    int s[3]; // axis sides (0 or 1)
    osg::Vec3 extentSet[3] = {
    total._min,
    (total._max + total._min) * 0.5f,
    total._max
    };
    
  5. The elements variable contains all the elements in the scene, and hence we have to find out, which of those really intersect with current region total and save them to a temporary list childData. If elements in current region are few enough to form a leaf node, set isLeafNode to true; otherwise set it to false to go on splitting the space into eight children of the next level:

    std::vector<ElementInfo> childData;
    for ( unsigned int i=0; i<elements.size(); ++i )
    {
    const ElementInfo& obj = elements[i];
    if ( total.contains(obj.second._min) &&
    total.contains(obj.second._max) )
    childData.push_back( obj );
    else if ( total.intersects(obj.second) )
    {
    osg::Vec3 center = (obj.second._max + obj.second._min) * 0.5f;
    if ( total.contains(center) ) childData.push_back( obj );
    }
    }
    bool isLeafNode = false;
    if ( (int)childData.size()<=_maxChildNumber ||
    depth>_maxTreeDepth ) isLeafNode = true;
    osg::ref_ptr<osg::Group> group = new osg::Group;
    if ( !isLeafNode )
    {
    ...
    }
    else
    {
    ...
    }
    
  6. If isLeafNode is false, we will have to set up the region box of eight new child regions of the next level. These child regions are created using osg::Group and added to a parent group node. The build() method will be called recursively with different region parameters to check and build the sub-graphs for them:

    osg::ref_ptr<osg::Group> childNodes[8];
    for ( s[0]=0; s[0]<2; ++s[0] )
    {
    for ( s[1]=0; s[1]<2; ++s[1] )
    {
    for ( s[2]=0; s[2]<2; ++s[2] )
    {
    osg::Vec3 min, max;
    for ( int a=0; a<3; ++a )
    {
    min[a] = (extentSet[s[a] + 0])[a];
    max[a] = (extentSet[s[a] + 1])[a];
    }
    int id = s[0] + (2 * s[1]) + (4 * s[2]);
    childNodes[id] = build( depth+1, osg::BoundingBox
    (min, max), childData );
    }
    }
    }
    for ( unsigned int i=0; i<8; ++i )
    {
    if ( childNodes[i] && childNodes[i]->getNumChildren() )
    group->addChild( childNodes[i] );
    }
    
  7. If the current region is available as a leaf of the octree, we can simply call createElement() to generate the sphere and set the necessary parameters for rendering it. The renderable element will be added to the group node representing the leaf node of the octree:

    for ( unsigned int i=0; i<childData.size(); ++i )
    {
    const ElementInfo& obj = childData[i];
    osg::Vec3 center = (obj.second._max + obj.second._min) * 0.5;
    float radius = (obj.second._max -
    obj.second._min).length() * 0.5f;
    group->addChild( createElement(obj.first, center, radius) );
    }
    
  8. The last step of the build() method is to use an osg::LOD node to finish the construction of the current level. It displays a rough level of details, which only contains a debug box (or may contain nothing) when the viewer's eye is still far away. If the viewer is near enough, it turns to the second child, which may either contain eight child nodes or may be used as a leaf node with a few final spheres (determined by _maxChildNumber):

    osg::Vec3 center = (total._max + total._min) * 0.5;
    float radius = (total._max - total._min).length() * 0.5f;
    osg::LOD* level = createNewLevel( depth, center, radius );
    level->insertChild( 0, createBoxForDebug(total._max,
    total._min) ); // For debug use
    level->insertChild( 1, group.get() );
    return level;
    
  9. The createNewLevel() method is used for creating a customized LOD node:

    osg::LOD* OctreeBuilder::createNewLevel( int level, const
    osg::Vec3& center, float radius )
    {
    osg::ref_ptr<osg::LOD> lod = new osg::LOD;
    lod->setCenterMode( osg::LOD::USER_DEFINED_CENTER );
    lod->setCenter( center );
    lod->setRadius( radius );
    lod->setRange( 0, radius * 5.0f, FLT_MAX );
    lod->setRange( 1, 0.0f, radius * 5.0f );
    if ( _maxLevel<level ) _maxLevel = level;
    return lod.release();
    }
    
  10. The createElement() method creates a renderable sphere and returns it.

    osg::Node* OctreeBuilder::createElement( const std::string&
    id, const osg::Vec3& center, float radius )
    {
    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    geode->addDrawable( new osg::ShapeDrawable(new
    osg::Sphere(center, radius)) );
    geode->setName( id );
    return geode.release();
    }
    
  11. The createBoxForDebug() method will create a wire-frame box, which can represent the region's bounding box. It is drawn only for debug purposes here:

    osg::Geode* OctreeBuilder::createBoxForDebug( const
    osg::Vec3& max, const osg::Vec3& min )
    {
    ... // Please see source code for details
    }
    
  12. We will also implement a scene graph printing visitor that can write out the scene graph structure and leaf spheres' names to disk files. It is enough to only derive it from the osgUtil::PrintVisitor class:

    class PrintNameVisitor : public osgUtil::PrintVisitor
    {
    public:
    PrintNameVisitor( std::ostream& out ) :
    osgUtil::PrintVisitor(out) {}
    void apply( osg::Node& node )
    {
    if ( !node.getName().empty() )
    {
    output() << node.getName() << std::endl;
    enter();
    traverse( node );
    leave();
    }
    else osgUtil::PrintVisitor::apply(node);
    }
    };
    
  13. We are nearly done. Now, in the main entry, we add 5000 spheres with different positions and radii to the globalElements variable. The global bounding box is computed at the same time. After that, we call the build() method to create the top-level of the octree graph:

    osg::BoundingBox globalBound;
    std::vector<OctreeBuilder::ElementInfo> globalElements;
    for ( unsigned int i=0; i<5000; ++i )
    {
    osg::Vec3 pos = osgCookBook::randomVector( -500.0f, 500.0f );
    float radius = osgCookBook::randomValue( 0.5f, 2.0f );
    std::stringstream ss; ss << "Ball-" << i+1;
    osg::Vec3 min = pos - osg::Vec3(radius, radius, radius);
    osg::Vec3 max = pos + osg::Vec3(radius, radius, radius);
    osg::BoundingBox region(min, max);
    globalBound.expandBy( region );
    globalElements.push_back( OctreeBuilder::ElementInfo(ss.str(),
    region) );
    }
    OctreeBuilder octree;
    osg::ref_ptr<osg::Group> root = octree.build( 0, globalBound,
    globalElements );
    
  14. Print the generated scene graph to an ASCII file and start the viewer to render the huge scene:

    std::ofstream out("octree_output.txt");
    PrintNameVisitor printer( out );
    root->accept( printer );
    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  15. When the application is started, you can see only a box in the view field. Zoom in the camera and you can find the box is divided into smaller ones. Zoom in again, and the spheres in the leaf nodes will be shown when you are close enough to these spheres, as shown in the following screenshot:

How it works...

Let's open the outputted file (which is written out after the application ran once) and paste a part of it here:

osg::LOD
osg::Geode
osg::Group
osg::LOD
osg::Geode
osg::Group
Ball-438
...
osg::LOD
osg::Geode
osg::Group
Ball-729
...
osg::LOD
...

The node named Ball-* are random spheres that have to be rendered in the scene. As we can see from the preceding code, ball nodes are stored in group nodes (leaf nodes of the octree), and group nodes are added as the finer level of LOD nodes. The LOD nodes have customized centers and radii, and will decide when the child leaves can be shown according to the distance between the node center and the eye.

Every eight LOD nodes in the same level will be held together in a group node, and used as the finer level of a parent LOD node. This is actually the basic structure of the octree. The rough levels of all LODs are always represented by wire-frame boxes (osg::Geode).

There's more...

You may find that integrating the scene graph with an indexing algorithm like a quad-tree and octree is not a very complex procedure. In this recipe, we only used the osg::LOD node to manage different levels of the tree, but it is recommended to replace them with osg::PagedLOD to provide paging functionality for huge scene rendering, just like the VPB utility has done to terrain database.

You may be interested in some other spatial indexing methods and their introduction links, including:

Try to implement one or more of them along with the scene graph structure. You can use them either for culling objects before rendering, or for speeding up the intersection work between scene objects and a line segment (or some other operators).

Rendering point cloud data with draw instancing

Point cloud data is widely used in scientific fields. It is formed by a huge number of points, including the positions, normals, colors, and other attributes. Point cloud is usually generated using specific scanners (for example, laser, structured light, and so on) and can describe the surface of any complex objects. There are also many solutions for 3D model reconstruction using point cloud as the source. But the first problem we will face is—how to display them?

You might use a geometry to contain all the points and render them in GL_POINTS mode, but the frame rate may drop seriously if the number is too large. Fortunately, we have the draw instancing extension, which may help improve performance. In Chapter 3, we have already provided a recipe for a drawing instance. In this recipe, we will make use of it again and convert the sample point cloud data to texture and handle them in the shaders.

The cloud rendering example in Chapter 6, has a data.txt file, which can be directly used as the data source.

How to do it...

Let us start.

  1. Include necessary headers.

    #include <osg/Point>
    #include <osg/Group>
    #include <osgDB/ReadFile>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    #include <fstream>
    #include <iostream>
    
  2. We have already learnt that gl_InstanceID can be used to indicate a specific instance of the same drawable object. In order to visualize the sample data, we will have to place these instances at different positions for representing every point element in the point cloud. Thus, we will use the defaultTex variable to pass the texture object to the shader and use texels to describe point locations:

    const char* vertCode = {
    "uniform sampler2D defaultTex;\n"
    "uniform int width;\n"
    "uniform int height;\n"
    "varying float brightness;\n"
    "void main()\n"
    "{\n"
    " float row = float(gl_InstanceID) /
    float(width);\n"
    " vec2 uv = vec2(fract(row), floor(row) /
    float(height));\n"
    " vec4 texValue = texture2D(defaultTex, uv);\n"
    // Read and specify the position data from texture
    " vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);\n"
    // Use alpha of the texel as the brightness value
    " brightness = texValue.a;\n"
    " gl_Position = gl_ModelViewProjectionMatrix * pos;\n"
    "}\n"
    };
    
  3. The fragment shader will be used to draw the color of points according to the brightness parameter read from the alpha component of the texture:

    const char* fragCode = {
    "varying float brightness;\n"
    "void main()\n"
    "{\n"
    " gl_FragColor = vec4(brightness, brightness,
    brightness, 1.0);\n"
    "}\n"
    };
    
  4. Now, we will create the instanced geometry in the createInstancedGeometry() function. It uses an osg::Image object to record point cloud data:

    osg::Geometry* createInstancedGeometry( osg::Image* img,
    unsigned int numInstances )
    {
    ...
    }
    
  5. In the function, we first create a geometry with only one point. It should use the numInstances parameter to enable using the OpenGL draw instancing extension, and disable display lists for the purpose of dynamic modification:

    osg::ref_ptr<osg::Geometry> geom = new osg::Geometry;
    geom->setUseDisplayList( false );
    geom->setUseVertexBufferObjects( true );
    geom->setVertexArray( new osg::Vec3Array(1) );
    geom->addPrimitiveSet( new osg::DrawArrays(GL_POINTS, 0, 1,
    numInstances) );
    
  6. Then we will set the image object as the texture of the geometry. We will add necessary uniforms and the program object to the state set in order to make shaders work properly:

    osg::ref_ptr<osg::Texture2D> texture = new osg::Texture2D;
    texture->setImage( img );
    texture->setInternalFormat( GL_RGBA32F_ARB );
    texture->setFilter( osg::Texture2D::MIN_FILTER,
    osg::Texture2D::LINEAR );
    texture->setFilter( osg::Texture2D::MAG_FILTER,
    osg::Texture2D::LINEAR );
    geom->getOrCreateStateSet()->setTextureAttributeAndModes
    ( 0, texture.get() );
    geom->getOrCreateStateSet()->addUniform( new
    osg::Uniform("defaultTex", 0) );
    geom->getOrCreateStateSet()->addUniform( new
    osg::Uniform("width", (int)img->s()) );
    geom->getOrCreateStateSet()->addUniform( new
    osg::Uniform("height", (int)img->t()) );
    osg::ref_ptr<osg::Program> program = new osg::Program;
    program->addShader( new osg::Shader(osg::Shader::VERTEX,
    vertCode) );
    program->addShader( new osg::Shader(osg::Shader::FRAGMENT,
    fragCode) );
    geom->getOrCreateStateSet()->setAttributeAndModes
    ( program.get() );
    return geom.release(); // end of createInstancedGeometry()
    
  7. We will implement one more function that is the readPointData() method. It will be used to load point data from an ASCII file and save them in an image with specified size (w and h):

    osg::Geometry* readPointData( const std::string& file,
    unsigned int w, unsigned int h )
    {
    ...
    }
    
  8. The image must use the GL_RGBA and GL_FLOAT format. We will discuss the reason why we should use this format later in the How it works... section. The position and brightness values read from the text file will be set to the data pointer of the image:

    std::ifstream is( file.c_str() );
    if ( !is ) return NULL;
    osg::ref_ptr<osg::Image> image = new osg::Image;
    image->allocateImage( w, h, 1, GL_RGBA, GL_FLOAT );
    unsigned int density, brightness;
    osg::BoundingBox boundBox;
    float* data = (float*)image->data();
    while ( !is.eof() )
    {
    osg::Vec3 pos;
    is >> pos[0] >> pos[1] >> pos[2] >> density >> brightness;
    boundBox.expandBy( pos );
    *(data++) = pos[0];
    *(data++) = pos[1];
    *(data++) = pos[2];
    *(data++) = brightness / 255.0;
    }
    
  9. We will create the geometry and accept an initial bounding box to make it visible in the scene; otherwise such a geometry object with only one point will be automatically culled while rendering.

    osg::ref_ptr<osg::Geometry> geom = createInstancedGeometry
    ( image.get(), w*h );
    geom->setInitialBound( boundBox );
    geom->getOrCreateStateSet()->setAttributeAndModes
    ( new osg::Point(5.0f) );
    return geom.release();
    
  10. Now in the main entry, we will add the draw instancing geometry to the scene graph and render it in the viewer. A 512x512 sized image is enough here to contain all the points in the sample data.txt file (from the cloud rendering example in Chapter 6) which includes about 64000 lines:

    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    geode->addDrawable( readPointData("data.txt", 512, 512) );
    osg::ref_ptr<osg::Group> root = new osg::Group;
    root->addChild( geode.get() );
    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  11. The result is shown in the following screenshot. It renders smoothly and doesn't need too much system memory on the CPU side. You can make any changes or improvements to the rendered data by altering the shader code. You can also modify the primitive set of the origin geometry object to change the shapes of all points or apply textures on them:

How it works...

The most important step in this recipe is to add the point cloud data to the image object, which should be used in the shader code for efficiently rendering the points. The osg::Image class uses the allocateImage() method to create an empty image object, and provides the data() method for developers to set up image pixels. We assume that the point cloud requires to record only the position and brightness of each element, so it is enough to have four float components (pixel format GL_RGBA, datatype GL_FLOAT) for each pixel. The first three components save the XYZ values of the point, and the last one saves the brightness parameter.

The draw instancing extension is not all that powerful. The size of texture object for keeping point cloud data cannot be infinite. If we have millions or even billions of points to display, the current solution will not be suitable for it. In that case, some spatial index methods can be used as well. For instance, the octree algorithm, which was introduced in this chapter, will be a good idea here. We can first pre-treat all the point data and divide them into different smaller areas. Each area uses draw instancing to render actual point attributes. Then we can use the octree algorithm along with the paged LOD nodes to manage these 'leaf' areas, guaranteeing that there are not too many points shown at the same time.

Speeding up the scene intersections

There is a common demand while developing huge scene viewing applications, such as when we move the mouse onto the terrain or other scene objects, we hope that we can immediately see the current intersection result of the mouse coordinate and the 3D objects. The intersection result is useful for practical use, for example, describing user's point of interest on the earth or in a digital city.

As OSG uses bounding volumes to manage scene graphs, the computation process can be much quicker than intersecting with all scene objects one by one. However, we still have room for speeding up the process. OSG internally provides the KDTree structure for scene intersections with geometries, which can help do the calculations and return results in a faster way.

Another potential problem to be considered here is that the results may not be accurate enough while intersecting with paged scene. That's because paged nodes are loaded due to the current viewer's position and attitude, and the highest level may not be reachable by the intersection visitor at any time. We will discuss a solution for this issue in this recipe.

How to do it...

Let us start.

  1. Include necessary headers:

    #include <osg/Group>
    #include <osgDB/ReadFile>
    #include <osgViewer/ViewerEventHandlers>
    #include <osgViewer/Viewer>
    #include <sstream>
    #include <iostream>
    
  2. The PagedPickHandler class is designed to calculate the intersections of current cursor coordinate and the scene graph when the mouse is moving. It refreshes the result intersection point to an HUD text object and can accept a reading callback for obtaining the highest levels of paged nodes:

    class PagedPickHandler : public osgGA::GUIEventHandler
    {
    public:
    virtual bool handle( const osgGA::GUIEventAdapter& ea,
    osgGA::GUIActionAdapter& aa );
    osg::ref_ptr<osgUtil::IntersectionVisitor::ReadCallback>
    _pagedReader;
    osg::ref_ptr<osgText::Text> _text;
    };
    
  3. In the handle() method, we will do the computations for all user events except the FRAME event:

    if ( ea.getEventType()==osgGA::GUIEventAdapter::FRAME )
    return false;
    osgViewer::View* viewer = dynamic_cast<osgViewer::View*>(&aa);
    if ( viewer && _text.valid() )
    {
    ...
    }
    
  4. The osg::Timer object is used for timekeeping. We will retrieve a time value and then use the intersection visitor to traverse the whole scene. The _pagedReader callback, if valid, will be applied to the setReadCallback() method here for handling paged nodes. We will show you how to quickly implement such a new callback later, but at present, we will leave this variable to NULL.

    osg::Timer_t t1 = osg::Timer::instance()->tick();
    osg::ref_ptr<osgUtil::LineSegmentIntersector> intersector =
    new osgUtil::LineSegmentIntersector
    (osgUtil::Intersector::WINDOW, ea.getX(), ea.getY());
    osgUtil::IntersectionVisitor iv( intersector.get() );
    iv.setReadCallback( _pagedReader.get() );
    viewer->getCamera()->accept( iv );
    
  5. If there is any result that is, the mouse has an intersection point with the scene in world space. Then we will obtain the intersection result and record the time again. The difference between the two time variables is the time spent for scene intersections. We will print all these this information on the screen in real-time:

    if ( intersector->containsIntersections() )
    {
    osgUtil::LineSegmentIntersector::Intersection result =
    *(intersector->getIntersections().begin());
    osg::Vec3 point = result.getWorldIntersectPoint();
    osg::Timer_t t2 = osg::Timer::instance()->tick();
    std::stringstream ss;
    ss << "X = " << point.x() << "; ";
    ss << "Y = " << point.y() << "; ";
    ss << "Z = " << point.z() << "; ";
    ss << "Delta time = " << osg::Timer::instance()->
    delta_m(t1, t2) << "ms" << std::endl;
    _text->setText( ss.str() );
    }
    
  6. In the main entry, we will use the setBuildKdTreesHint() method to enable building KDTree structure for all scene objects. Then we are going to construct the scene graph with the loaded model and an HUD camera displaying the text:

    osg::ArgumentParser arguments( &argc, argv );
    osgDB::Registry::instance()->setBuildKdTreesHint(
    osgDB::Options::BUILD_KDTREES );
    osg::ref_ptr<osg::Node> loadedModel =
    osgDB::readNodeFiles(arguments);
    osgText::Text* text =
    osgCookBook::createText(osg::Vec3(50.0f, 50.0f, 0.0f),
    "", 10.0f);
    osg::ref_ptr<osg::Geode> textGeode = new osg::Geode;
    textGeode->addDrawable( text );
    osg::ref_ptr<osg::Camera> hudCamera =
    osgCookBook::createHUDCamera(0, 800, 0, 600);
    hudCamera->addChild( textGeode.get() );
    osg::ref_ptr<osg::Group> root = new osg::Group;
    root->addChild( hudCamera.get() );
    root->addChild( loadedModel.get() );
    
  7. The PagedPickHandler will be then allocated and added to the viewer for intersection operations:

    osg::ref_ptr<PagedPickHandler> picker =
    new PagedPickHandler;
    picker->_text = text;
    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    viewer.addEventHandler( picker.get() );
    viewer.addEventHandler( new osgViewer::StatsHandler );
    return viewer.run();
    
  8. We hope you kept the terrain models generated in the last chapter. The gcanyon data is a good sample for testing the picking handler here. We can execute this recipe (the executable is named as cookbook_08_09) using the following command:

    # cookbook_08_09 output/out.osgb
    
    
  9. The computation time is about 0.06-0.09 milliseconds on the author's computer. The output generated is shown in the following screenshot:

  10. Now delete the line enabling the KDTree. Rebuild the application and run it again. The time increases to 0.2 0.5 milliseconds.

  11. Now, we start implementing the special reading callback. It is derived from the osgUtil::IntersectionVisitor::ReadCallback class and the only method to be overridden is readNodeFile(). Regardless of file caching and other optimization solutions, we can directly load the filename variable by calling the osgDB::readNodeFile() method. The input filename variable is automatically invoked by the intersector and is actually the filename of the highest level child of each osg::PagedLOD node:

    struct PagedReaderCallback : public
    osgUtil::IntersectionVisitor::ReadCallback
    {
    virtual osg::Node* readNodeFile( const std::string& filename )
    { return osgDB::readNodeFile(filename); }
    };
    
  12. Set up the _pagedReader variable in the main entry:

    picker->_pagedReader = new PagedReaderCallback;
    
  13. Restart the application again. Oh, you will see the computation speed is much slower than before (even 200-400 milliseconds). This happens because we spent a lot of time loading new files in the paged LOD's subgraph. This is the price of obtaining the most precise intersection results.

How it works...

The KDTree is a binary tree for organizing vertices in a k-dimensional space. It is a special case of the famous Binary Space Partitioning (BSP) tree, but very useful for range searching and some other types of multidimensional data searching.

OSG encapsulates the KDTree algorithm totally in the OSG core and uses it to manage vertices in any osg::Geometry objects. By default, the intersection visitor can make use of the scene graph's bounding volume hierarchy to quickly find the leaf nodes that may have intersections with the intersector. But on the geometry level, it must traverse all the primitives (for example all the triangles) to find out intersections, no matter whether only a small portion of the node is intersected. Most calculations are useless here and will cost precious time. In this case, KDTree is extremely important, as it creates another new spatial index structure inside the geometry and thus makes the traversing of primitives much faster.

The following line will enable building KDTree structure for all geometries loaded:

osgDB::Registry::instance()->setBuildKdTreesHint(
osgDB::Options::BUILD_KDTREES );

But we can also enable KDTree on a few specified nodes before they are ready to be loaded, for instance:

osg::ref_ptr<osgDB::Options> options = new osgDB::Options;
options->setBuildKdTreesHint( osgDB::Options::BUILD_KDTREES );
osg::Node* model = osgDB::readNodeFile( "cow.osg", options.get() );