#include <JHashTable.h>

Inheritance diagram for JHashTable< V >:

Public Member Functions
	JHashTable (const JSize lgSize=kJDefaultLgMinTableSize)

virtual	~JHashTable ()

bool	IsEmpty () const

JSize	GetItemCount () const

JSize	GetLgSize () const

JSize	GetLoadCount () const

JSize	GetTableSize () const

JFloat	GetFillFactor () const

JFloat	GetLoadFactor () const

bool	GetResizeEnabled () const

void	SetResizeEnabled (const bool enabled)

JFloat	GetMaxLoadFactor () const

void	SetMaxLoadFactor (const JFloat newMax)

JFloat	GetMinFillFactor () const

void	SetMinFillFactor (const JFloat newMin)

JSize	GetMinTableSize () const

void	SetMinTableSize (const JSize newMin)

bool	IsOK () const

Protected Member Functions
void	AllowCursors ()

void	DisallowCursors ()

JHashCursor< V > *	GetCursor ()

JConstHashCursor< V > *	GetCursor () const

const JHashRecord< V > &	GetRecord (const JSize index) const

JHashRecordT::State	GetState (const JSize index) const

bool	IsEmpty (const JSize index) const

bool	IsDeleted (const JSize index) const

bool	IsFull (const JSize index) const

JHashValue	GetHashValue (const JSize index) const

const V &	GetValue (const JSize index) const

void	Set (const JSize index, const JHashRecord< V > &record)

void	Set (const JSize index, const JHashValue hash, const V &value)

void	Set (const JSize index, const V &value)

void	SetHash (const JSize index, const V &value)

void	Remove (const JSize index)

void	MarkEmpty (const JSize index)

void	MarkAllEmpty ()

bool	TryResizeTable (const JSize lgSize)

void	ResizeTable (const JSize lgTrialSize)

virtual bool	FitToLimits (const JSize required=0, const bool force=false)

JSize	HashToIndex (JHashValue hash) const

Friends
class	JConstHashCursor< V >

class	JHashCursor< V >

Detailed Description

template<class V>
class JHashTable< V >

Interface for the JConstHashCursor class.

Base code generated by Codemill v0.1.0

Interface for the JHashCursor class.

Constructor & Destructor Documentation

◆ JHashTable()

template<class V >

JHashTable< V >::JHashTable ( const JSize lgSize = kJDefaultLgMinTableSize )

JHashTable works together with JHashCursor and JConstHashCursor to provide the basic JCore hash table facilities. It sacrifices a great deal of safety and generality in favor of performance. Because of this it only partly shields the client from the implementation details and desperately needs another object to encapsulate it and protect it and the outside world from each other. It should only be used as a low-level building block for other objects assembled with either ownership or inheritance; if you're using it as a storage object by itself you're misusing it (maybe you want something like JStringMap instead).

On the other hand, if you are building an object that does a lot of in-memory storage and retrieval on large numbers of individual records you're probably looking in the right place. You need to understand something about hash tables to use it directly, so if you know enough to use it safely you probably know whether a hash table is an appropriate data structure for your application.

Usage

Like any hash table, JHashTable stores its values in an internal array. This means, obviously, that its data items must be scalars (objects with a copy constructor and an assignment operator). If this is not true, you will have to store pointers to the actual objects. Perhaps less obviously, it also means that JHashTable must never be used to store a subclass of the template class which has its own data in addition to the base class data–C++ will most certainly slice it off. You must instantiate the template with the type you actually wish to store, or store pointers to the data instead.
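The slicing hazard can be reproduced without JHashTable at all. The following is a minimal sketch in which a std::vector stands in for the internal array; Base, Derived, StoreByValue, and StoreByPointer are hypothetical names used only for illustration and are not part of JCore.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical types for illustration; they are not part of JCore.
struct Base
{
    int id = 0;
    virtual ~Base() = default;
    virtual int Extra() const { return 0; }
};

struct Derived : Base
{
    int extra = 0;
    int Extra() const override { return extra; }
};

// Storing by value into an array of Base keeps only the Base part.
inline int StoreByValue(const Derived& d)
{
    std::vector<Base> slots(1);
    slots[0] = d;               // slices: Derived::extra is dropped
    return slots[0].Extra();    // calls Base::Extra
}

// Storing a pointer keeps the whole object reachable.
inline int StoreByPointer(const Derived& d)
{
    std::vector<std::unique_ptr<Base>> slots;
    slots.push_back(std::make_unique<Derived>(d));
    return slots[0]->Extra();   // calls Derived::Extra
}
```

With a Derived whose extra member is 42, the by-value version reports 0 while the pointer version reports 42, which is the difference between instantiating the template with the base type and storing pointers.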

The client needs to provide hash and comparison functions appropriate for the data being stored, or else always handle those services itself. The hash function must return values with all 32 bits significant, and failure to provide a good 32-bit hash function can entail a substantial degradation of performance.

Requiring that the hash value always be calculated internally would have been safer but slower; in particular, client code can often avoid unnecessary and expensive calls to the hash function by repeatedly passing in a cached hash value. It is up to the client to ensure that the same hash function is always used, and failure to do so will destroy the table. This can happen in non-obvious ways, such as removing an item from a table which uses one hash function and moving it to a table which uses another, inserting it in a way that does not tell the second table to recalculate the hash value.

JHashTable's public interface is very sparse, mainly providing access to the operating statistics and the parameters which control the resizing behavior. It allows the client to adjust the time-space tradeoff involved in the resizing process.

To store and retrieve items the client will need to go through a cursor and perhaps directly through the protected interface. (The cursor interface is described more fully in the appropriate cursor *.tmpl files.) It is very easy to rip the table apart with either interface, so be careful.

The cursor interface is the primary interface and encapsulates the logic and temporary data necessary to traverse and modify the hash table. The protected interface is mainly of use for setting appropriate parameters in the constructors of derived classes; most of the code will still use the cursor interface. Not only is it more convenient, it is a good deal safer, and JHashTable even uses cursors internally to manipulate itself during resizing.

Design and Implementation

The entire design of JHashTable is driven by the primary goal of a fully dynamic table which transparently and efficiently resizes its internal table as necessary to maintain good performance. This has many consequences. First, hash values are not direct indexes into the internal table (which is unlikely to have 2^32 entries!) but rather full 32-bit values. Two benefits of this scheme are that the most common architectures tend to manipulate 32-bit quantities efficiently and that 32-bit values allow tables as large as can reasonably be expected to fit into memory (at least in the late 1990s!) without wasting too much space. In addition, I happen to have constants for a very fast 32-bit random number generator for dual-hashing.

Full-word hash values have the inconvenient effect that they must be taken modulo the table size to obtain an index, but the very desirable property that if hash values are cached with the item it is very easy to re-hash the item in another table of a different size without reevaluating the hash function. This is vital for building dynamic tables efficiently, since the hash function is often quite expensive. The hash value occupies an extra four bytes of data per record, so caching trades space for time. However, this is probably justified. Consider the most common case where the keys are strings. Any reasonable general-purpose hash function (one which does not make use of special knowledge of the particular keys to be stored) will take time proportional to the length of the string, which can quickly become very expensive.
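As a rough illustration of why caching pays off, the sketch below stores the 32-bit hash next to a string key and derives indexes for two different table sizes from the cached value alone. CachedRecord, HashString, and IndexFor are hypothetical names, and FNV-1a is used only as an example of a hash whose cost grows with key length; none of this is taken from the JCore sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical record layout: the 32-bit hash is stored next to the value.
struct CachedRecord
{
    std::uint32_t hash;
    std::string   key;
};

// FNV-1a, an example hash that touches every byte of the key.
inline std::uint32_t HashString(const std::string& s)
{
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Index for a table of 2^lgSize slots, derived from the cached hash alone.
inline std::size_t IndexFor(const CachedRecord& r, unsigned lgSize)
{
    return r.hash & ((std::size_t{1} << lgSize) - 1);
}
```

Moving a record from a 16-slot table to a 32-slot table then costs one mask operation, IndexFor(r, 5), rather than another pass over the key.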

JHashTable limits table sizes to an exact power of two. This allows an efficient bit-wise wrap for pointers rather than the slower modulus function (the HashToIndex function) and makes it easy to support dynamic resizing by doubling and halving the table as necessary. Powers of two also tend to interact well with memory managers. Malloc, operator new, and their kin often allocate a block of memory of the next exact power of two larger than the size requested, so asking for power-of-two sizes allows efficient use of memory. (Of course it doesn't guarantee it, since the size of the data stored may not be an exact power of two. If you really care, since the hash values are four bytes just make sure that sizeof(V) == 2^N-4 for some integral N.)
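The equivalence between masking and the modulus for power-of-two sizes can be sketched as follows. WrapByMask and WrapByModulus are hypothetical helpers written for this illustration; they are not the actual HashToIndex implementation.

```cpp
#include <cassert>
#include <cstdint>

// Wrap a full 32-bit hash into a table of 2^lgSize slots with a bit mask.
// Assumes lgSize < 32.
inline std::uint32_t WrapByMask(std::uint32_t hash, unsigned lgSize)
{
    const std::uint32_t mask = (std::uint32_t{1} << lgSize) - 1;
    return hash & mask;
}

// The same wrap done with the modulus operator, for comparison.
inline std::uint32_t WrapByModulus(std::uint32_t hash, unsigned lgSize)
{
    return hash % (std::uint32_t{1} << lgSize);
}
```

Both functions return the same index for every hash and every lgSize below 32; the mask version simply keeps the low lgSize bits, so doubling or halving the table only changes how many bits are kept.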

JHashTable also uses double hashing for space efficiency. Power-of-two table sizes make it simple to ensure that a stepping interval obtained from a dual hashing function is relatively prime to the table size: just make sure it is odd! This can be done very efficiently by bit-wise OR-ing the value with 1. (I've seen good textbooks go haywire and suggest that the table size be a prime number! I believe the rationale for this trick is suspect even for fixed-table-size student code. It is not much easier to calculate, makes it much harder to find arbitrary new table sizes when dynamically resizing, and comes with none of the other advantages.)
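To see why an odd stepping interval matters, the sketch below forces a step to be odd by setting its low bit and counts how many distinct slots a probe sequence reaches. OddStep and SlotsVisited are hypothetical helpers for illustration only and do not reflect how the cursors are actually written.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Force a secondary hash value to be odd so that it is relatively prime
// to any power-of-two table size.
inline std::uint32_t OddStep(std::uint32_t secondaryHash)
{
    return secondaryHash | 1u;
}

// Count how many distinct slots a probe sequence visits in tableSize steps.
// tableSize is assumed to be a power of two.
inline std::uint32_t SlotsVisited(std::uint32_t start, std::uint32_t step,
                                  std::uint32_t tableSize)
{
    std::vector<bool> seen(tableSize, false);
    std::uint32_t count = 0;
    std::uint32_t index = start & (tableSize - 1);
    for (std::uint32_t i = 0; i < tableSize; i++)
    {
        if (!seen[index])
        {
            seen[index] = true;
            count++;
        }
        index = (index + step) & (tableSize - 1);
    }
    return count;
}
```

In a 16-slot table, an even step of 6 reaches only 8 slots, while the odd step 7 produced by OddStep(6) reaches all 16, so a probe with an odd step cannot miss a free slot.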

Finally, for clarity and maintainability the algorithms for traversing the table are entirely external. The basic operations are in JConstHashCursor and JHashCursor, and they may be used to implement more user-friendly ones in external client code.

Future Development

While JHashTable is still under development, the ultimate goal is extremely high performance when manipulating highly dynamic tables of tens of thousands of entries or more. The basic code is now complete, so the main development goals are finalizing the interface, testing for performance and robustness, and addressing the several pages of optimization notes.

lgSize is the base-two logarithm of the minimum and initial size of the hash table, not the table size itself. This will help you avoid the temptation to make tables whose sizes are not powers of two.
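The relationship between the lgSize argument and the resulting table size can be sketched as below. TableSizeFromLg and IsPowerOfTwo are hypothetical helpers, not JCore functions.

```cpp
#include <cassert>
#include <cstddef>

// Table size implied by a base-two logarithm; always a power of two.
// Assumes lgSize is smaller than the bit width of std::size_t.
inline std::size_t TableSizeFromLg(unsigned lgSize)
{
    return std::size_t{1} << lgSize;
}

// True when n has exactly one bit set.
inline bool IsPowerOfTwo(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

Passing 10 therefore asks for a table of 1024 slots; there is no way to request 1000 slots through this kind of parameter.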

◆ ~JHashTable()

template<class V >

JHashTable< V >::~JHashTable ( )

virtual

Member Function Documentation

◆ AllowCursors()

template<class V >

void JHashTable< V >::AllowCursors ( )

inline protected

◆ DisallowCursors()

template<class V >

void JHashTable< V >::DisallowCursors ( )

inline protected

◆ FitToLimits()

template<class V >

bool JHashTable< V >::FitToLimits	(	const JSize	required = `0`,
		const bool	force = `false`
	)

protected virtual

FitToLimits encapsulates as much as possible of the resizing behavior of JHashTable in one place. It attempts to be as efficient as possible (that is, perform as few resizes as possible) for a wide range of insert, search, and delete behaviors.

If 'force' is true, a resize is always done (and thus the load factor is reduced to the fill factor), unless resizing is disabled. If 'force' is false (the default), a resize is only done if there are fewer slots available than specified by the 'required' argument, or if the current limits are being violated. After a resize, there will be enough space for at least 'required' more items before the limits will be violated.

Note that the 'required' argument lets you make a single call to guarantee that an insert will succeed before you do it. If you're going to insert multiple items, you can avoid any intermediate resizes by disabling resizing while you insert.

The return value is true if a resize was actually done.

This default implementation should need customization seldom or never, but this function is virtual so the ability is there. If you write your own version please contact the author so that improvements to the algorithm can benefit everyone in future releases. Note that the hash table depends internally on the behaviors specified above, so client-specified versions should obey them carefully.
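The growth half of the contract described above can be modeled roughly as follows. This is only a sketch under stated assumptions, not the actual FitToLimits algorithm: TableStats, LoadLimit, NeedsGrow, and LgSizeFor are hypothetical names, and the model ignores 'force', shrinking, the minimum fill factor, and the resize-enabled flag. It also assumes maxLoadFactor is greater than zero.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical model of the inputs to a resize decision.
struct TableStats
{
    unsigned    lgSize;         // table holds 2^lgSize slots
    std::size_t loadCount;      // slots currently in use
    double      maxLoadFactor;  // upper limit on loadCount / tableSize, > 0
};

// Largest load count the table may hold without violating the limit.
inline std::size_t LoadLimit(const TableStats& t)
{
    const double tableSize = static_cast<double>(std::size_t{1} << t.lgSize);
    return static_cast<std::size_t>(t.maxLoadFactor * tableSize);
}

// True if the limit is already violated or `required` more items do not fit.
inline bool NeedsGrow(const TableStats& t, std::size_t required)
{
    return t.loadCount + required > LoadLimit(t);
}

// Smallest lgSize, not below the current one, with room for `required` more.
inline unsigned LgSizeFor(TableStats t, std::size_t required)
{
    while (NeedsGrow(t, required))
    {
        t.lgSize++;
    }
    return t.lgSize;
}
```

In this model a 16-slot table with a maximum load factor of 0.5 and 3 slots in use can take 5 more items without growing, while a request for 6 more moves it to 32 slots.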

◆ GetCursor() [1/2]

template<class V >

JHashCursor< V > * JHashTable< V >::GetCursor ( )

inline protected

◆ GetCursor() [2/2]

template<class V >

JConstHashCursor< V > * JHashTable< V >::GetCursor ( ) const

inline protected

◆ GetFillFactor()

template<class V >

JFloat JHashTable< V >::GetFillFactor ( ) const

inline

◆ GetHashValue()

template<class V >

◆ IsEmpty() [2/2]

template<class V >

bool JHashTable< V >::IsEmpty ( const JSize index ) const

inline protected

Warning: no bounds checking, and the argument must be an index because it is not wrapped to the tablesize!

◆ IsFull()

template<class V >

bool JHashTable< V >::IsFull ( const JSize index ) const

inline protected

Warning: no bounds checking, and the argument must be an index because it is not wrapped to the tablesize!

◆ IsOK()

template<class V >

bool JHashTable< V >::IsOK ( ) const

Does a complete consistency check on the hash table. It is mainly for testing, but feel free to call it if you suspect bugs in the hash table code.

◆ MarkAllEmpty()

template<class V >

void JHashTable< V >::MarkAllEmpty ( )

protected

DANGER! Extreme memory leak hazard!

Since this unconditionally marks all records as empty, using this with any instantiation which stores references to dynamic memory in the records will leak unless you have the references somewhere else. This is very easy to misuse....
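The hazard can be made visible with a toy stand-in for a table of raw pointers. Everything here (Tracked, Slots, MarkAllEmptyLike, DeleteThenMarkEmpty) is hypothetical and only mimics the behavior described above; it does not use JHashTable itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical payload that counts live instances so leaks are visible.
struct Tracked
{
    static int live;
    Tracked()  { live++; }
    ~Tracked() { live--; }
};
int Tracked::live = 0;

// Stand-in for a table whose slots hold raw pointers.
using Slots = std::vector<Tracked*>;

// Analogue of marking everything empty: slots are cleared, nothing is freed.
inline void MarkAllEmptyLike(Slots& slots)
{
    for (Tracked*& p : slots)
    {
        p = nullptr;
    }
}

// Safe order: release what the slots own, then mark them empty.
inline void DeleteThenMarkEmpty(Slots& slots)
{
    for (Tracked* p : slots)
    {
        delete p;
    }
    MarkAllEmptyLike(slots);
}
```

After MarkAllEmptyLike the objects are still allocated but no longer reachable through the slots, so they leak unless the caller kept the pointers elsewhere. DeleteThenMarkEmpty shows the order a wrapper class would need to follow if it owns the pointed-to data.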

◆ MarkEmpty()

template<class V >

void JHashTable< V >::MarkEmpty ( const JSize index )

protected

DANGER! Extreme memory leak hazard! No dynamic memory will be freed, so make sure you know what you're doing.

◆ Remove()

template<class V >

void JHashTable< V >::Remove ( const JSize index )

protected

◆ ResizeTable()

template<class V >

void JHashTable< V >::ResizeTable ( const JSize lgTrialSize )

protected

Does not respect limits–use FitToLimits for that.

◆ Set() [1/3]

template<class V >

void JHashTable< V >::Set	(	const JSize	index,
		const JHashRecord< V > &	record
	)

protected

Warning: no bounds checking, and the argument must be an index because it is not wrapped to the tablesize! However, the Set...functions do invoke FitToLimits() as needed.

To make errors harder, the ways in which a record may be set are restricted, just as they are for individual records.

The record being set must always be in a kFull state after the set. For this reason, Set() automatically sets the state to kFull. To help you keep this straight, the version which takes a record asserts that the record is full.

◆ Set() [2/3]

template<class V >

void JHashTable< V >::Set	(	const JSize	index,
		const JHashValue	hash,
		const V &	value
	)

protected

◆ Set() [3/3]

template<class V >

void JHashTable< V >::Set	(	const JSize	index,
		const V &	value
	)

protected

◆ SetHash()

template<class V >

void JHashTable< V >::SetHash	(	const JSize	index,
		const V &	value
	)

protected

◆ SetMaxLoadFactor()

template<class V >

void JHashTable< V >::SetMaxLoadFactor ( const JFloat newMax )

newMax is silently coerced into the range [0.1, 1.0].
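Silent coercion into a range amounts to a clamp. A minimal sketch, assuming a plain double in place of JFloat; CoerceMaxLoadFactor is a hypothetical helper and not the library's code:

```cpp
#include <algorithm>
#include <cassert>

// Clamp a requested maximum load factor into [0.1, 1.0] without reporting
// an error, mirroring the documented "silently coerced" behavior.
inline double CoerceMaxLoadFactor(double requested)
{
    return std::max(0.1, std::min(requested, 1.0));
}
```

A caller that passes 5.0 therefore ends up with 1.0 and a caller that passes 0.0 ends up with 0.1, so reading the value back with GetMaxLoadFactor() is the way to learn what was actually applied.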

◆ SetMinFillFactor()

template<class V >

void JHashTable< V >::SetMinFillFactor ( const JFloat newMin )

newMin is silently coerced into the range [0.0, 1.0]. Note that all other limits take precedence over this one.

◆ SetMinTableSize()

template<class V >

void JHashTable< V >::SetMinTableSize ( const JSize newMin )

newMin is rounded up to the next larger power of two, and the result is never smaller than one.
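One plausible reading of this rounding rule is sketched below. RoundUpToPowerOfTwo is a hypothetical helper, and it assumes that a value which is already a power of two is left unchanged; the source does not say whether the library does the same.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to a power of two, never returning less than one.
// Assumes n does not exceed the largest power of two that fits in std::size_t.
inline std::size_t RoundUpToPowerOfTwo(std::size_t n)
{
    std::size_t size = 1;
    while (size < n)
    {
        size <<= 1;
    }
    return size;
}
```

Under that assumption a request for 5 yields 8, a request for 16 stays 16, and a request for 0 yields 1.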

◆ SetResizeEnabled()

template<class V >

void JHashTable< V >::SetResizeEnabled ( const bool enabled )

◆ TryResizeTable()

template<class V >

bool JHashTable< V >::TryResizeTable ( const JSize lgSize )

protected

Does not respect limits–use FitToLimits for that.

Friends And Related Symbol Documentation

◆ JConstHashCursor< V >

template<class V >

friend class JConstHashCursor< V >

friend

◆ JHashCursor< V >

template<class V >

friend class JHashCursor< V >

friend

The documentation for this class was generated from the following files:

libjcore/code/JConstHashCursor.h
libjcore/code/JHashTable.h
libjcore/code/JHashTable.tmpl
