ELF: better symbol lookup via DT_GNU_HASH
DT_GNU_HASH
is a better hash table for the ELF used by GNU systems in GNU-compatible software, i.e. in almost every program compiled with gcc or clang for almost any Linux distribution.
The problem with it is that DT_GNU_HASH
is not documented anywhere other than in GNU binutils and glibc source code. You can either read source code to get some intel, or read emails with patches in mail list archives (Re: GNU_HASH section format is a pretty good one). Those are the only places you can try to find the truth on the matter.
Of course, there're some articles on the web where people try to break it down. Like this one.
This article does not aspire to be the ultimate truth either. But I'll try to cover everything about GNU Hash Table and explain all aspects of its work.
Before reading any further please ensure that you understand what Symbol Table and String Table are in the ELF. Also you may want to read my previous article ELF: symbol lookup via DT_HASH to know the standard (90s-ish) way of doing symbol lookup.
DT_GNU_HASH
has nothing in common with standard DT_HASH
, apart from serving the same purpose. It has its own hashing function, its own layout, it adds restrictions for the symbol table and contains an additional bloom filter to stop lookup for missing symbols early.
Hashing function
Let's start with the hashing function. It can be found in bfd_elf_gnu_hash or in dl_new_hash.
#include <stdint.h>
uint32_t gnu_hash(const uint8_t* name) {
uint32_t h = 5381;
for (; *name; name++) {
h = (h << 5) + h + *name;
}
return h;
}
gnu_hash("") == 0x00001505
gnu_hash("printf") == 0x156b2bb8
gnu_hash("exit") == 0x7c967e3f
gnu_hash("syscall") == 0xbac212a0
gnu_hash("flapenguin.me") == 0x8ae9f18e
Layout
Not a valid C code, but gives an idea:
struct gnu_hash_table {
uint32_t nbuckets;
uint32_t symoffset;
uint32_t bloom_size;
uint32_t bloom_shift;
uint64_t bloom[bloom_size]; /* uint32_t for 32-bit binaries */
uint32_t buckets[nbuckets];
uint32_t chain[];
};
Bloom filter
Bloom filter is used to stop the lookup for missing symbols early. bloom_size
, bloom_shift
, and bloom
are parts of the structure, as their names suggest.
Bloom filter behaves slightly differently for various ELFCLASS
binaries (defined by EI_CLASS
field in ELF Identification). Let's define ELFCLASS_BITS
to be 64
for 64-bit binaries (ELFCLASS64
) and 32
for 32-bit binaries (ELFCLASS32
).
Before doing symbol lookup, take bloom[(hash / ELFCLASS_BITS) % bloom_size]
. If bits hash % ELFCLASS_BITS
and (hash >> bloom_shift) % ELFCLASS_BITS
are set then a symbol may or may not be in the hash table, and you should proceed with a regular lookup through buckets and chains. But if at least one bit is not set then a symbol is certainly absent from the hash table.
Buckets and chains
DT_HASH
contains an element per symbol table's element. This leads to a waste of space because STN_UNDEF
and some other symbols are in the hash table but are never looked up. GNU hash table allows to skip first symoffset
symbols at the beginning of the symbol table.
Same as in DT_HASH
, symbols are put in one of nbuckets
buckets depending on their hashes. To be specific, each symbol should be placed into hash % nbuckets
bucket.
Chains in the GNU hash table are nothing like strange linked lists in DT_HASH
, they are contiguous sequences of hashes for symbols with the same index (remember that chains' indexes are shifted by symoffset
relatively to the symbol table). The last bit in chains' element is discarded and instead used for indicating the chain's end. If it is set then the element is the last one in the chain.
bucket
array holds indexes of the first symbols in the chains. Note that those are not indexes for the chain
array. Indexes for it will be bucket[foobar] - symoffset
.
Chains being contiguous sequences imply that symbols within the same bucket must be stored contiguously. Order of buckets in the symbol table does not really matter but usually they're stored in an ascending order.
While looking extraneous, creating such restriction over the symbol table gives great advantage: a hash table now can store almost full hash (without the lowest bit) of a symbol within the same 32 bits. This allows linkers to compare hashes before comparing strings. Also, because DT_GNU_HASH
requires symbol table ordering and DT_HASH
doesn't, you can fit both into a single binary. This way both standard and GNU linkers can look up symbols in it.
Example
Nothing is better than a visual representation of the rules. So, let's create one.
I took the same symbols as in ELF: symbol lookup via DT_HASH and created DT_GNU_HASH
table from them. The example is for 64-bit ELF binaries, for 32-bit you'll need to recalculate bloom word and bits.
nbuckets = 4 (because I decided that there will be four buckets) symoffset = 1 (STN_UNDEF is not a part of the hash table) bloom_size = 2 (because I decided that 16 byte bloom filter is sufficient) bloom_shift = 5 (again, just because I can) ix bucket[ix] name of first symbol in chain -- ---------- ----------------------------- 0 1 cfsetispeed 1 5 uselib 2 8 freelocal 3 13 getspen Note that: - symbol table is sorted by bucket - chain[ix] is the same as hash but with set/cleared lowest bit SYMBOL TABLE | GNU HASH TABLE | name = | hash % bloom bloom bits ix symtab[ix].st_name hash | ix nbuckets chain[ix] word #0 #1 -- ------------------ -------- | -- ------- ---------- ----- --- --- 0 <STN_UNDEF> | 1 cfsetispeed 830acc54 | 0 0 830acc54 1 20 34 2 strsigna 90f1e4b0 | 1 0 90f1e4b0 0 48 37 3 hcreate_ 4c7e3240 | 2 0 4c7e3240 1 0 18 4 endrpcen b6c44714 | 3 0 b6c44715 0 20 56 5 uselib 2124d3e9 | 4 1 2124d3e8 1 41 31 6 getttyen fff51839 | 5 1 fff51838 0 57 1 7 umoun 1081e019 | 6 1 1081e019 0 25 0 8 freelocal e3364372 | 7 2 e3364372 1 50 27 9 listxatt ced3d862 | 8 2 ced3d862 1 34 3 10 isnan 0fabfd7e | 9 2 0fabfd7e 1 62 43 11 isinf 0fabe9de | 10 2 0fabe9de 1 30 14 12 setrlimi 12e23bae | 11 2 12e23baf 0 46 29 13 getspen f07b2a7b | 12 3 f07b2a7a 1 59 19 14 pthread_mutex_lock 4f152227 | 13 3 4f152226 0 39 17 15 getopt_long_onl 57b1584f | 14 3 57b1584f 1 15 2 Bloom filter: bit # 56 48 40 32 24 16 8 0 xx..x.xx ...xxx.x xx...... x.x.xx.x ..x...x. ...x..x. ........ ......xx .x..x... .....x.. ....x.x. .....x.. xx..x.xx ...xxx.x xx...... x.x.xx.x Or as two `uint64_t` values: cb1dc0ad22120003 48040a04cb1dc0ad
Knowing the rules and having a built table, let's try to find some symbols by hand.
Note that when comparing hashes the lowest bit is set on both left and right hand sides.
Existing symbol.
strsigna
:looking for "strsigna" (hash = 0x90f1e4b0) checking word 0 in bloom filter for bits 48 and 37 hash table may contain symbol starting at ix = 1 compare hashes: (chain) 0x830acc55 == 0x90f1e4b1 wrong hash definitely not "strsigna" moving to the next symbol compare hashes: (chain) 0x90f1e4b1 == 0x90f1e4b1 hash matches. compare strings: "strsigna" == "strsigna" found at index 2
Missing symbol.
foobar
:looking for "foobar" (hash = 0xfde460be) checking word 0 in bloom filter for bits 62 and 5 not in bloom filter not found
Missing symbol with hash collision.
vLoun
:looking for "vLoun" (hash = 0x1081e019) checking word 0 in bloom filter for bits 25 and 0 hash table may contain symbol starting at ix = 5 compare hashes: (chain) 0x2124d3e9 == 0x1081e019 wrong hash definitely not "vLoun" moving to the next symbol compare hashes: (chain) 0xfff51839 == 0x1081e019 wrong hash definitely not "vLoun" moving to the next symbol compare hashes: (chain) 0x1081e019 == 0x1081e019 hash matches. compare strings: "umoun" == "vLoun" just hash collision that was last symbol in this bucket not found
Code
Original algorithm is implemented in do_lookup_x in ld.so
source code.
Implementation is a little trickier than DT_HASH
's one, but with the example above it should be self-explanatory.
/* Different architectures have different symbol structure size.
* Those actually should be selected depending on input binary's ELFCLASS,
* but for simplicity I've left them as typedefs and defines.
*/
typedef Elf64_Sym Elf_Sym;
typedef bloom_el_t uint64_t;
#define ELFCLASS_BITS 64
/* 32-bit binary:
typedef Elf32_Sym Elf_Sym;
typedef bloom_el_t uint32_t;
#define ELFCLASS_BITS 32
*/
const Elf_Sym* gnu_lookup(
const char* strtab, /* string table */
const Elf_Sym* symtab, /* symbol table */
const uint32_t* hashtab, /* hash table */
const char* name /* symbol to look up */
) {
const uint32_t namehash = gnu_hash(name);
const uint32_t nbuckets = hashtab[0];
const uint32_t symoffset = hashtab[1];
const uint32_t bloom_size = hashtab[2];
const uint32_t bloom_shift = hashtab[3];
const bloom_el_t* bloom = (void*)&hashtab[4];
const uint32_t* buckets = (void*)&bloom[bloom_size];
const uint32_t* chain = &buckets[nbuckets];
bloom_el_t word = bloom[(namehash / ELFCLASS_BITS) % bloom_size];
bloom_el_t mask = 0
| (bloom_el_t)1 << (namehash % ELFCLASS_BITS)
| (bloom_el_t)1 << ((namehash >> bloom_shift) % ELFCLASS_BITS);
/* If at least one bit is not set, a symbol is surely missing. */
if ((word & mask) != mask) {
return NULL;
}
uint32_t symix = buckets[namehash % nbuckets];
if (symix < symoffset) {
return NULL;
}
/* Loop through the chain. */
while (true) {
const char* symname = strtab + symtab[symix].st_name;
const uint32_t hash = chain[symix - symoffset];
if ((namehash|1) == (hash|1) && strcmp(name, symname) == 0) {
return &symtab[symix];
}
/* Chain ends with an element with the lowest bit set to 1. */
if (hash & 1) {
break;
}
symix++;
}
return NULL;
}
Total number of symbols
Total number of symbols is missing from DT_GNU_HASH
for reasons I do not know. To get a total number of symbols you'll have to find a chain element with the highest index. To do so find a chain that starts at the highest index (max(bucket)
) first and then walk the chain to its end (while ((chain[ix - symoffset] & 1) == 0) ix++;
).
This may be useful if you only have access to dynamic program information, e.g. if your program is a dynamic linker or an ELF loader. However, if you have access to section headers, you can simply calculate the total number of symbols from section's header: just divide the section size by an entry size.
Conclusion
DT_GNU_HASH
is a way better structure that DT_HASH
. It is a shame it is not standardized, because every system that uses ELF uses DT_GNU_HASH
to its advantage. As noted in the original GNU binutils patch, it improves linking time by up to 50%. This is twice as fast loading time! Considering how much symbols are exported and imported by an average C++ shared library and size of those symbols' names (mangling fits all namespaces, types, and template arguments into the final symbol name), obscure DT_GNU_HASH
is the thing that makes your applications start as fast as they can.