JHashTable works together with with JHashCursor and JConstHashCursor to provide the basic JCore hash table facilities. It sacrifices a great deal of safety and generality in favor of performance. Because of this it only partly shields the client from the implementation details and desperately needs another object to encapsulate and protect it and the the outside world from each other. It should only be used as a low-level building block for other objects assembled with either ownership or inheritance; if you're using it as a storage object by itself you're misusing it (maybe you want something like JStringMap instead).
On the other hand, if you are building an object that does a lot of in-memory storage and retrieval on large numbers of individual records you're probably looking in the right place. You need to understand something about hash tables to use it directly, so if you know enough to use it safely you probably know whether a hash table is an appropriate data structure for your application.
Usage
Like any hash table, JHashTable stores its values in an internal array. This means, obviously, that its data items must be scalars (objects with a copy constructor and an assignment operator). If this is not true, you will have to store pointers to the actual objects. Perhaps less obviously, it also means that JHashTable must never be used to store a subclass of the template class which has its own data in addition to the base class data–C++ will most certainly slice it off. You must instantiate the template with the type you actually wish to store, or store pointers to the data instead.
The client needs to provide hash and comparison functions appropriate for the data being stored or always handle those services. The hash function must return values with all 32 bits significant, and failure to provide a good 32 bit hash function can entail a substantial degradation of performance.
Requiring that the hash value always be calculated internally would have been safer but slower, and in particular client code can often can avoid unnecessary and expensive calls to the hash function by repeatedly passing in a cached hash value. It is up to the client to ensure that the same hash function is always used, and failure to do so will destroy the table. This can happen in slightly inobvious ways, such as removing an item from a table which uses one hash function and moving it to a table which uses another, inserting it in a way that does not tell the second table to re-calculate the hash value.
JHashTable's public interface is very sparse, mainly providing access to the operating statistics and the parameters which control the resizing behavior It allows the client to adjust the time-space tradeoff involved in the resizing process.
To store and retrieve items the client will need to go through a cursor and perhaps directly through the protected interface. (The cursor interface is described more fully in the appropriate cursor *.tmpl files.) It is very easy to rip the table apart with either interface, so be careful.
The cursor interface is the primary interface and encapsulates the logic and temporary data necessary to traverse and modify the hash table. The protected interface is mainly of use for setting appropriate parameters in the constructors of derived classes; most of the code will still use the cursor interface. Not only is it more convenient, it is a good deal safer, and JHashTable even uses cursors internally to manipulate itself during resizing.
Design and Implementation
The entire design of JHashTable is driven by the primary goal of a fully dynamic table which transparently and efficiently resizes its internal table as necessary to maintain good performance. This has many consequences. First, hash values are not direct indexes into the internal table (which is unlikely to have 2^32 entries!) but rather full 32-bit values. Two benefits of this scheme is that the most common architectures tend to manipulate 32 bit quantities efficiently and that 32 bit values allow tables as large as can reasonably be expected to fit into memory (at least in the late 1990's!) without wasting too much space. In addition, I happen to have constants for a very fast 32-bit random number generator for dual-hashing.
Full word hash values have the inconvenient effect that they must be taken modulo the table size to obtain an index, but the very desirable property that if hash values are cached with the item it is very easy to re-hash the item in another table of a different size without reevaluating the hash function. This is vital for building dynamic tables efficiently, since the hash function is often quite expensive. The hash value occupies and extra four bytes of data per record, so caching trades space for time. However, this is probably justified. Consider the most common case where the keys are strings. Any reasonable general-purpose hash function (one which does not make use of special knowledge of the particular keys to be stored) will take time proportional to the length of the string, which can quickly become very expensive.
JHashTable limits table sizes to an even power of two. This allows an efficient bit-wise wrap for pointers rather than the slower modulus function (the HashToIndex function) and makes it easy to support dynamic resizing by doubling and halving the table as necessary. Powers of two also tend to interact well with memory managers. Malloc, operator new, and their kin often allocate a block of memory of the next even power-of-two larger than the size requested, so asking for power-of-two sizes allows efficient use of memory. (Of course it doesn't guarantee it, since the size of the data stored may not be an even power of two. If you really care, since the hash values are four bytes just make sure that sizeof(V) == 2^N-4 for some integral N.)
Finally, JHashTable uses double-hashing for space efficiency. Power-of-two tablesizes make it simple to ensure that a stepping interval obtained from a dual hashing function is relatively prime with the table size–just make sure it is odd! This can be done very efficiently by bit-wise anding the value with 1. (I've seen good textbooks go haywire and suggest that the table size be prime number! I believe the rationale for this trick is suspect even for fixed tablesize student code. It is not much easier to calculate, makes it much harder to find arbitrary new table sizes when dynamically resizing, and comes with none of the other advantages.)
Finally, for clarity and maintainability the algorithms for traversing the table are entirely external. The basic operations are in JConstHashCursor and JHashCursor, and they may be used to implement more user-friendly ones in external client code.
Future Development
While JHashTable is still under development, the ultimate goal is extremely high performance when manipulating highly dynamic tables of tens of thousands of entries or more. The basic code is now complete, so the main development goals are finalizing the interface, testing for performance and robustness, and addressing the several pages of optimization notes.
Base code generated by Codemill v0.1.0 lgSize is the base two log of the minimum and initial size of the hash table, not the table size itself. This will help you avoid the temptation to make tables whose sizes are not powers of two.