CWB

lexhash.c File Reference

#include "globals.h"
#include "macros.h"
#include "lexhash.h"


Define Documentation

#define DEFAULT_NR_OF_BUCKETS   250000

Defines the default number of buckets in a lexhash.

Referenced by cl_new_lexhash().

#define DEFAULT_PERFORMANCE_LIMIT   10

The default value for the performance limit (avg no of comparisons) before the hash is expanded.

Referenced by cl_lexhash_check_grow().

#define PERFORMANCE_COUNT   1000

The update interval for hash performance estimation.

Referenced by cl_lexhash_check_grow(), cl_lexhash_find_i(), and cl_new_lexhash().


Typedef Documentation

cl_lexhash_cleanup_func

A function pointer type for functions that can be used as the "cleanup" for a deleted cl_lexhash_entry.

See also:
cl_lexhash_set_cleanup_function

Function Documentation

void cl_delete_lexhash ( cl_lexhash  hash)

Deletes a cl_lexhash object.

This deletes all the entries in all the buckets in the lexhash, plus the cl_lexhash itself.

Parameters:
hash  The cl_lexhash to delete.

References _cl_lexhash::buckets, cl_delete_lexhash_entry(), cl_free, _cl_lexhash_entry::next, and _cl_lexhash::table.

Referenced by main().

void cl_delete_lexhash_entry ( cl_lexhash  hash,
cl_lexhash_entry  entry 
)

Deallocates a cl_lexhash_entry object and its key string.

Also, the cleanup function is run on the entry.

Usage: cl_delete_lexhash_entry(lexhash, entry);

This is a non-exported function.

See also:
cl_lexhash_set_cleanup_function
Parameters:
hash   The lexhash this entry belongs to (needed to locate the cleanup function, if any).
entry  The entry to delete.

References cl_free, _cl_lexhash::cleanup_func, and _cl_lexhash_entry::key.

Referenced by cl_delete_lexhash(), and cl_lexhash_del().

cl_lexhash_entry cl_lexhash_add ( cl_lexhash  hash,
char *  token 
)

Adds a token to a cl_lexhash table.

If the string is already in the hash, its frequency count is increased by 1.

Otherwise, a new entry is created with an auto-assigned ID. Note that the string is duplicated, so the original string passed to this function does not need to be kept in memory.

Parameters:
hash   The hash table to add to.
token  The string to add.
Returns:
A pointer to a (new or existing) entry

References cl_lexhash_find_i(), cl_malloc(), cl_strdup(), _cl_lexhash_entry::data, _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::id, _cl_lexhash_entry::_cl_lexhash_entry_data::integer, _cl_lexhash_entry::key, _cl_lexhash_entry::next, _cl_lexhash::next_id, _cl_lexhash_entry::_cl_lexhash_entry_data::numeric, _cl_lexhash_entry::_cl_lexhash_entry_data::pointer, and _cl_lexhash::table.

Referenced by encode_add_wattr_line(), main(), range_close(), range_declare(), range_open(), and sencode_write_region().
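
The add semantics described above (increment the frequency of an existing key, otherwise create an entry with a duplicated key and a fresh ID) can be sketched with a small chained hash table. This is an illustration only: the toy_ types, the fixed bucket count, the placeholder hash function, the starting frequency of 1 and IDs counting from 0 are all assumptions, not the actual lexhash.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types; NOT the real CWB structures. */
typedef struct toy_entry {
  char *key;               /* private copy of the token */
  int freq;                /* how many times the token was added */
  int id;                  /* assigned in order of first insertion */
  struct toy_entry *next;  /* next entry in the same bucket */
} toy_entry;

#define TOY_BUCKETS 7

typedef struct {
  toy_entry *table[TOY_BUCKETS];
  int next_id;
} toy_hash;

/* Placeholder hash (assumption; the real hash_string() may differ). */
unsigned int toy_hash_string(const char *s) {
  unsigned int h = 0;
  while (*s) h = h * 31u + (unsigned char)*s++;
  return h;
}

/* Add a token: bump freq if it is present, otherwise create an entry.
   Returns NULL only if memory allocation fails. */
toy_entry *toy_add(toy_hash *hash, const char *token) {
  unsigned int offset;
  toy_entry *e;
  assert(hash != NULL && token != NULL);
  offset = toy_hash_string(token) % TOY_BUCKETS;
  for (e = hash->table[offset]; e != NULL; e = e->next) {
    if (strcmp(e->key, token) == 0) {
      e->freq++;                 /* existing key: count it again */
      return e;
    }
  }
  e = malloc(sizeof *e);
  if (e == NULL) return NULL;
  e->key = malloc(strlen(token) + 1);
  if (e->key == NULL) { free(e); return NULL; }
  strcpy(e->key, token);         /* duplicate, so the caller may reuse its buffer */
  e->freq = 1;                   /* assumption: a new entry starts at 1 */
  e->id = hash->next_id++;       /* assumption: IDs count up from 0 */
  e->next = hash->table[offset]; /* prepend to the bucket's chain */
  hash->table[offset] = e;
  return e;
}
```

Because the key is copied, a caller can overwrite or free its own buffer immediately after toy_add() returns.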

void cl_lexhash_auto_grow ( cl_lexhash  hash,
int  flag 
)

Turns a cl_lexhash's ability to autogrow on or off.

When this setting is switched on, the lexhash will grow automatically to avoid performance degradation.

Note that this setting is switched on by default.

See also:
cl_lexhash_check_grow
Parameters:
hash  The hash that will be affected.
flag  New value for autogrow setting: boolean where true is on and false is off.

References _cl_lexhash::auto_grow.

int cl_lexhash_check_grow ( cl_lexhash  hash)

Grows a lexhash table, increasing the number of buckets, if necessary.

This function checks whether growing the hash is necessary by updating the performance estimate. If the estimate is above the threshold and auto_grow is enabled, the hash is expanded by increasing the number of buckets such that the average fill rate is 1 (i.e. 1 lexhash_entry per bucket, 1 lexhash index == 1 key-string ... on average). This gives the hash better performance and makes it capable of absorbing more keys.

Note: this function also implements the hashing algorithm and must be consistent with cl_lexhash_find_i().

Usage: expanded = cl_lexhash_check_grow(cl_lexhash hash);

This is a non-exported function.

Parameters:
hash  The lexhash to autogrow.
Returns:
Always 0.

References _cl_lexhash::auto_grow, _cl_lexhash::buckets, cl_debug, cl_free, cl_new_lexhash(), _cl_lexhash::comparisons, DEFAULT_PERFORMANCE_LIMIT, _cl_lexhash::entries, hash_string(), _cl_lexhash_entry::key, _cl_lexhash::last_performance, _cl_lexhash_entry::next, PERFORMANCE_COUNT, and _cl_lexhash::table.

Referenced by cl_lexhash_find_i().
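
As a rough sketch of the two steps described above, the code below first decides whether the average number of comparisons per lookup exceeds a limit, and then relinks every entry into a new table with one bucket per entry. All toy_ names, the strict "greater than" test, the placeholder hash and the exact bucket count are assumptions for illustration; the real function may choose the new size differently.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PERFORMANCE_LIMIT 10  /* same value as DEFAULT_PERFORMANCE_LIMIT */

/* Hypothetical toy types; NOT the real CWB structures. */
typedef struct toy_entry {
  const char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;     /* heap-allocated array of bucket heads */
  unsigned int buckets;  /* number of buckets in table */
  unsigned int entries;  /* number of entries stored */
} toy_hash;

/* Placeholder hash (assumption; the real hash_string() may differ). */
unsigned int toy_hash_string(const char *s) {
  unsigned int h = 0;
  while (*s) h = h * 31u + (unsigned char)*s++;
  return h;
}

/* Performance estimate: average comparisons per lookup vs. the limit. */
int toy_should_grow(unsigned long comparisons, unsigned long lookups,
                    int auto_grow) {
  if (!auto_grow || lookups == 0) return 0;
  return (double)comparisons / (double)lookups > TOY_PERFORMANCE_LIMIT;
}

/* Relink every entry into a table with one bucket per entry.
   Returns 1 on success, 0 if the new table cannot be allocated. */
int toy_grow(toy_hash *hash) {
  unsigned int new_buckets, i;
  toy_entry **new_table;
  assert(hash != NULL);
  new_buckets = hash->entries > 0 ? hash->entries : 1;
  new_table = calloc(new_buckets, sizeof *new_table);
  if (new_table == NULL) return 0;
  for (i = 0; i < hash->buckets; i++) {
    toy_entry *e = hash->table[i];
    while (e != NULL) {
      toy_entry *next = e->next;  /* save before relinking */
      unsigned int offset = toy_hash_string(e->key) % new_buckets;
      e->next = new_table[offset];
      new_table[offset] = e;
      e = next;
    }
  }
  free(hash->table);
  hash->table = new_table;
  hash->buckets = new_buckets;
  return 1;
}
```

The bucket index is recomputed with the same hash function that lookup uses, which is the consistency requirement noted above for cl_lexhash_find_i().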

int cl_lexhash_del ( cl_lexhash  hash,
char *  token 
)

Deletes a string from a hash.

The entry corresponding to the specified string is removed from the lexhash. If the string is not in the lexhash to begin with, no action is taken.

Parameters:
hash   The hash to alter.
token  The string to remove.
Returns:
The frequency of the deleted entry.

References cl_delete_lexhash_entry(), cl_lexhash_find_i(), _cl_lexhash::entries, _cl_lexhash_entry::freq, _cl_lexhash_entry::next, and _cl_lexhash::table.
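
Removing an entry from a bucket means unlinking it from that bucket's singly linked list. The sketch below shows one common way to do this with a pointer to the link being examined. The toy_ names are hypothetical, and returning 0 for a missing string is an assumption, since the documentation above only states the return value for an entry that was deleted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy entry; NOT the real CWB structure. */
typedef struct toy_entry {
  char *key;
  int freq;
  struct toy_entry *next;
} toy_entry;

/* Helper for the example: prepend a heap-allocated entry to a chain. */
toy_entry *toy_push(toy_entry **head, const char *key, int freq) {
  toy_entry *e = malloc(sizeof *e);
  if (e == NULL) return NULL;
  e->key = malloc(strlen(key) + 1);
  if (e->key == NULL) { free(e); return NULL; }
  strcpy(e->key, key);
  e->freq = freq;
  e->next = *head;
  *head = e;
  return e;
}

/* Remove token from one bucket chain and return its frequency.
   Returns 0 if the token is not in the chain (assumption). */
int toy_bucket_del(toy_entry **head, const char *token) {
  toy_entry **link;
  assert(head != NULL && token != NULL);
  /* link always points at the pointer that leads to the current entry,
     so the first entry needs no special case. */
  for (link = head; *link != NULL; link = &(*link)->next) {
    if (strcmp((*link)->key, token) == 0) {
      toy_entry *victim = *link;
      int freq = victim->freq;
      *link = victim->next;  /* unlink */
      free(victim->key);
      free(victim);
      return freq;
    }
  }
  return 0;
}
```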

cl_lexhash_entry cl_lexhash_find ( cl_lexhash  hash,
char *  token 
)

Finds the entry corresponding to a particular string within a cl_lexhash.

Parameters:
hash   The hash to search.
token  The key-string to look for.
Returns:
The entry that is found (or NULL if the string is not in the hash).

References cl_lexhash_find_i().

Referenced by main(), range_close(), range_open(), range_print_registry_line(), and sencode_write_region().

cl_lexhash_entry cl_lexhash_find_i ( cl_lexhash  hash,
char *  token,
unsigned int *  ret_offset 
)

Finds the entry corresponding to a particular string in a cl_lexhash.

This function is the same as cl_lexhash_find(), but *ret_offset is set to the hashtable offset computed for token (i.e. the index of the bucket within the hashtable), unless ret_offset is NULL.

Note that this function hides the hashing algorithm details from the rest of the lexhash implementation.

Usage: entry = cl_lexhash_find_i(cl_lexhash hash, char *token, unsigned int *ret_offset);

This is a non-exported function.

Parameters:
hash        The hash to search.
token       The key-string to look for.
ret_offset  This integer address will be filled with the token's hashtable offset.
Returns:
The entry that is found (or NULL if the string is not in the hash).

References _cl_lexhash::buckets, cl_lexhash_check_grow(), _cl_lexhash::comparisons, hash_string(), _cl_lexhash_entry::key, _cl_lexhash_entry::next, PERFORMANCE_COUNT, _cl_lexhash::performance_counter, and _cl_lexhash::table.

Referenced by cl_lexhash_add(), cl_lexhash_del(), cl_lexhash_find(), cl_lexhash_freq(), and cl_lexhash_id().
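
The lookup pattern described above, where the bucket index is optionally reported back to the caller, might look roughly like this. The toy_ types, the placeholder hash function and the comparison counter are assumptions for illustration and do not reproduce the real implementation or its performance bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; NOT the real CWB structures. */
typedef struct toy_entry {
  const char *key;
  struct toy_entry *next;
} toy_entry;

typedef struct {
  toy_entry **table;          /* array of bucket heads */
  unsigned int buckets;       /* number of buckets */
  unsigned long comparisons;  /* key comparisons made so far */
} toy_hash;

/* Placeholder hash (assumption; the real hash_string() may differ). */
unsigned int toy_hash_string(const char *s) {
  unsigned int h = 0;
  while (*s) h = h * 31u + (unsigned char)*s++;
  return h;
}

/* Find token; if ret_offset is not NULL, store the bucket index there,
   whether or not the token is found. */
toy_entry *toy_find_i(toy_hash *hash, const char *token,
                      unsigned int *ret_offset) {
  unsigned int offset;
  toy_entry *e;
  assert(hash != NULL && hash->buckets > 0 && token != NULL);
  offset = toy_hash_string(token) % hash->buckets;
  if (ret_offset != NULL)
    *ret_offset = offset;
  for (e = hash->table[offset]; e != NULL; e = e->next) {
    hash->comparisons++;  /* feeds a performance estimate */
    if (strcmp(e->key, token) == 0)
      return e;
  }
  return NULL;
}

/* Plain lookup: same search, bucket index not needed. */
toy_entry *toy_find(toy_hash *hash, const char *token) {
  return toy_find_i(hash, token, NULL);
}
```

Reporting the offset even on a miss lets an insertion routine reuse it to place a new entry without hashing the string a second time.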

int cl_lexhash_freq ( cl_lexhash  hash,
char *  token 
)

Gets the frequency of a particular string within a lexhash.

Parameters:
hash   The hash to look in.
token  The string to look for.
Returns:
The frequency of that string, or 0 if the string is not in the hash (which is, of course, actually its frequency).

References cl_lexhash_find_i(), and _cl_lexhash_entry::freq.

Referenced by main(), and range_open().

int cl_lexhash_id ( cl_lexhash  hash,
char *  token 
)

Gets the ID of a particular string within a lexhash.

Note this is the ID integer that identifies THAT PARTICULAR STRING, not the hash value of that string - which only identifies the bucket the string is found in!

Parameters:
hash   The hash to look in.
token  The string to look for.
Returns:
The ID code of that string, or -1 if the string is not in the hash.

References cl_lexhash_find_i(), and _cl_lexhash_entry::id.

Referenced by encode_add_wattr_line(), and range_declare().

void cl_lexhash_set_cleanup_function ( cl_lexhash  hash,
cl_lexhash_cleanup_func  func 
)

Sets the cleanup function for a cl_lexhash.

The cleanup function is called with a cl_lexhash_entry argument; it should delete any objects associated with the entry's data field.

The cleanup function is initially set to NULL, i.e. run no function.

Parameters:
hash  The cl_lexhash to work with.
func  Pointer to the function to use for cleanup.

References _cl_lexhash::cleanup_func, and func.
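
A minimal sketch of the cleanup-callback pattern follows: the hash stores a function pointer, and the entry deletion routine calls it (if set) before freeing the entry and its key. The toy_ types, the payload field standing in for the entry's data field, and the call counter are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; NOT the real CWB structures. */
typedef struct toy_entry {
  char *key;
  void *payload;  /* stands in for the entry's data field */
} toy_entry;

typedef void (*toy_cleanup_func)(toy_entry *);

typedef struct {
  toy_cleanup_func cleanup_func;  /* NULL means: run no function */
} toy_hash;

int toy_cleanup_calls = 0;  /* counts cleanup invocations, for the example */

void toy_set_cleanup_function(toy_hash *hash, toy_cleanup_func func) {
  assert(hash != NULL);
  hash->cleanup_func = func;
}

/* Example cleanup: release whatever the payload points to. */
void toy_free_payload(toy_entry *entry) {
  free(entry->payload);
  entry->payload = NULL;
  toy_cleanup_calls++;
}

/* Delete an entry: run the cleanup function first, then free the
   key and the entry itself. */
void toy_delete_entry(toy_hash *hash, toy_entry *entry) {
  assert(hash != NULL);
  if (entry == NULL) return;
  if (hash->cleanup_func != NULL)
    hash->cleanup_func(entry);
  free(entry->key);
  free(entry);
}
```

Running the cleanup before freeing the entry matters because the callback needs the entry's fields to still be valid.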

int cl_lexhash_size ( cl_lexhash  hash)

Gets the number of different strings stored in a lexhash.

This returns the total number of entries in all the bucket linked-lists in the whole hashtable.

Parameters:
hash  The hash to size up.

References _cl_lexhash::buckets, _cl_lexhash_entry::next, and _cl_lexhash::table.

cl_lexhash cl_new_lexhash ( int  buckets)

Creates a new cl_lexhash object.

Parameters:
buckets  The number of buckets in the newly-created cl_lexhash; set to 0 to use the default number of buckets.
Returns:
The new cl_lexhash.

References _cl_lexhash::auto_grow, _cl_lexhash::buckets, cl_calloc(), cl_malloc(), _cl_lexhash::cleanup_func, _cl_lexhash::comparisons, DEFAULT_NR_OF_BUCKETS, _cl_lexhash::entries, find_prime(), _cl_lexhash::last_performance, _cl_lexhash::next_id, PERFORMANCE_COUNT, _cl_lexhash::performance_counter, and _cl_lexhash::table.

Referenced by cl_lexhash_check_grow(), main(), range_declare(), sencode_write_region(), and wattr_declare().

int find_prime ( int  n)

Returns the smallest prime >= n.

Referenced by cl_new_lexhash(), main(), make_attribute_hash(), and MakeMacroHash().

unsigned int hash_string ( char *  string)

Computes a 32-bit hash value for string.

Referenced by att_hash_lookup(), cl_lexhash_check_grow(), and cl_lexhash_find_i().
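
This page does not describe the algorithm behind hash_string(). Purely to illustrate what a 32-bit string hash looks like and how its value is reduced to a bucket index, the sketch below uses the well-known FNV-1a hash as a stand-in; it is not claimed to be the function CWB uses, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit variant. A stand-in, NOT the CWB hash_string(). */
uint32_t toy_hash_string(const char *s) {
  uint32_t h = 2166136261u;  /* FNV offset basis */
  assert(s != NULL);
  while (*s != '\0') {
    h ^= (unsigned char)*s++;
    h *= 16777619u;          /* FNV prime; wraps modulo 2^32 */
  }
  return h;
}

/* Reduce a hash value to a bucket index in [0, buckets). */
unsigned int toy_bucket_of(const char *s, unsigned int buckets) {
  assert(buckets > 0);
  return (unsigned int)(toy_hash_string(s) % buckets);
}
```

The hash value only selects a bucket; several different strings can share one, which is why each bucket holds a linked list of entries.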

int is_prime ( int  n)

Returns True iff n is a prime.

Referenced by find_prime().
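
The contracts of is_prime() and find_prime() can be met with simple trial division, as in the sketch below. This is one straightforward way to satisfy the descriptions above, written under the assumption that n is small enough for trial division to be fast; the real functions may be implemented differently, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if n is prime, 0 otherwise (trial division). */
int toy_is_prime(int n) {
  int i;
  if (n < 2) return 0;
  /* i <= n / i is the same test as i * i <= n, without overflow. */
  for (i = 2; i <= n / i; i++) {
    if (n % i == 0) return 0;
  }
  return 1;
}

/* Returns the smallest prime >= n. */
int toy_find_prime(int n) {
  while (!toy_is_prime(n)) {
    assert(n < INT_MAX);  /* guard against signed overflow */
    n++;
  }
  return n;
}
```

For example, toy_find_prime(10) yields 11, while toy_find_prime(13) returns 13 unchanged because 13 is already prime.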