RJudy

RJudy is a Ruby extension module that provides an interface to the Judy arrays library. Judy's home page is at www.sourcejudy.com.

Judy was originally implemented at Hewlett-Packard, and was subsequently open-sourced in June 2002 (with an LGPL license). Doug Baskins, the author of the Judy algorithm, has written a nice summary (see www.sourcejudy.com/downloads/10minutes.htm) of what makes Judy arrays so fast, and why you might want to use them.

Build and Installation Instructions

Before you can build, install and use this extension module you'll need to first obtain and install the Judy header files and library. The source tarball for the latest version of Judy can be found at its SourceForge project page (sourceforge.net/projects/judy). If you get lucky, you may also be able to find precompiled binaries for your platform.

Once you've built and installed Judy itself, you should be ready to build and install this extension module. From the top-level directory, first configure the build by typing:

    ruby install.rb config

Next, start the build process by typing:

    ruby install.rb setup

Finally, install it by typing:

    ruby install.rb install

Note that on Unix systems, you'll probably need to be logged in as "root" for that last step.

Summary

Once you've built and installed RJudy, you can use any of its bundled classes (Judy1, JudyL or JudySL) by simply require-ing the "judy" feature, i.e.

    require 'judy'

    judy1Array  = Judy::Judy1.new
    judyLArray  = Judy::JudyL.new
    judySLArray = Judy::JudySLArray.new

For a listing of the instance methods for each of these classes, refer to their respective documentation pages. For more background information about Judy arrays, refer to the Judy home page.

Test Suite

Test cases for each of the Judy classes are provided in the "tests" subdirectory. These tests use the Test::Unit framework, available for download from testunit.talbott.ws.

To run one of the individual tests, type something like:

    ruby TC_JudyL.rb

To run all of the tests, type:

    ruby TS_All.rb

You shouldn't get any failures; if you do, please contact lyle@users.sourceforge.net and let him know about it!

Credits

Judy was invented by Doug Baskins (doug@sourcejudy.com) and implemented by Hewlett-Packard.
SWIG (www.swig.org) was used to generate much of the wrapper code used for RJudy.

License

RJudy is Copyright (c) 2002 by Lyle Johnson, and is provided under the GNU Lesser General Public License (www.gnu.org/licenses/lgpl.html). See the LICENSE file in the top-level directory of the RJudy source distribution for more details.

Change History

Changes Since Version 0.1

The previous implementations of Judy1, JudyL, JudySL and JudyHash used the calloc() function for memory allocation instead of Ruby's ALLOC(). As a result, the normal Ruby garbage collection heuristics would not come into play and no garbage collection would take place during insertions into a JudyHash instance. This has been corrected; thanks to Mauricio Fernandez for pointing out this flaw.
The previous implementation of JudyHash did not make a duplicate of new string keys inserted into the hash, it just stored a reference to the input string key. This was inconsistent with Ruby's built-in Hash which does make a duplicate of new string keys. This has been corrected; thanks to Mauricio Fernandez for pointing out this flaw.
The previous version of the words.rb example wasn't quite fair because it failed to run the garbage collector between tests. As a result, the tests run first will have an advantage over tests run later, when the garbage collector may get called more frequently. I've rewritten this program to instead use the benchmark library's bmbm method, which should run the garbage collector in between tests and hopefully even up the playing field. Thanks to Mauricio Fernandez for pointing out this flaw in the tests.
Per Doug Baskins' advice, replaced all calls to Judy library functions (such as JudyLGet() and JudyLNext()) with calls to their macro equivalents (e.g. JLG() and JLN()). This should improve overall performance, especially for hash lookups.