...one of the most highly
regarded and expertly designed C++ library projects in the
world.
—
Herb Sutter
and
Andrei
Alexandrescu
,
C++
Coding Standards
Boost.MultiIndex provides eight different index types, which can be classified as
shown in the table below.
Ordered
and
sequenced
indices,
which are the most commonly used, have been explained in the basics section;
the rest of index types can be regarded as variations of the former providing
some added benefits, functionally or in the area of performance.
Key-based indices, of which ordered indices are the usual example, provide
efficient lookup of elements based on some piece of information called the
element key
: there is an extensive suite of
key extraction
utility classes allowing for the specification of such keys. Fast lookup
imposes an internally managed order on these indices that the user is not
allowed to modify; non key-based indices, on the other hand, can be freely
rearranged at the expense of lacking lookup facilities. Sequenced indices,
modeled after the interface of
std::list
, are the customary
example of a non key-based index.
Suppose we have a
std::multiset
of numbers and we want to output
the values above the 75h
percentile
:
typedef
std
::
multiset
<
int
>
int_multiset
;
void
output_above_75th_percentile
(
const
int_multiset
&
s
)
int_multiset
::
const_iterator
it
=
s
.
begin
();
std
::
advance
(
it
,
s
.
size
()*
3
/
4
);
std
::
copy
(
it
,
s
.
end
(),
std
::
ostream_iterator
<
int
>(
std
::
cout
,
"\n"
));
The problem with this code is that getting to the beginning of the desired subsequence
involves a linear traversal of the container. Ranked indices provide the mechanisms to do this
much faster:
typedef
multi_index_container
<
int
,
indexed_by
<
ranked_non_unique
<
identity
<
int
>
>
>
int_multiset
;
void
output_above_75th_percentile
(
const
int_multiset
&
s
)
int_multiset
::
const_iterator
it
=
s
.
nth
(
s
.
size
()*
3
/
4
);
std
::
copy
(
it
,
s
.
end
(),
std
::
ostream_iterator
<
int
>(
std
::
cout
,
"\n"
));
nth(n)
returns an iterator to the element whose
rank
, i.e. its distance
from the beginning of the index, is
n
, and does so efficiently in logarithmic time.
Conversely,
rank(it)
computes in logarithmic time the rank of the element
pointed to by
it
, or
size()
if
it==end()
.
int_multiset
::
iterator
it
=
s
.
insert
(
10
).
first
;
int_multiset
::
size_type
r
=
s
.
rank
(
it
);
Ranked indices provide the same interface as ordered indices plus several rank-related operations.
The cost of this extra functionality is higher memory consumption due to some internal additional
data (one word per element) and somewhat longer execution times in insertion and erasure
—in particular, erasing an element takes time proportional to
log(n)
, where
n
is the number of elements in the index, whereas for ordered indices this time is
constant.
The
reference
describes these indices in complete detail.
The specification of ranked indices is done exactly as with
ordered indices
,
except that
ranked_unique
and
ranked_non_unique
are used instead.
(
ranked_unique
|
ranked_non_unique
)
<[
(tag)
[,
(key extractor)
[,
(comparison predicate)
]]]>
(
ranked_unique
|
ranked_non_unique
)
<[
(key extractor)
[,
(comparison predicate)
]]>
Besides
nth
and
rank
, ranked indices provide member functions
find_rank
,
lower_bound_rank
,
upper_bound_rank
,
equal_range_rank
and
range_rank
that behave as their normal
lookup
and
range retrieval
counterparts (
find
,
lower_bound
etc.) but return ranks rather than iterators.
void
percentile
(
int
n
,
const
int_multiset
&
s
)
std
::
cout
<<
n
<<
" lies in the "
<<
s
.
upper_bound_rank
(
n
)*
100.0
/
s
.
size
()<<
" percentile\n"
;
You might think that
upper_bound_rank(n)
is mere shorthand for
rank(upper_bound(n))
: in reality, though, you should prefer using
*_rank
operations directly as they run faster than the
alternative formulations.
Hashed indices constitute a trade-off with respect to ordered indices: if correctly used,
they provide much faster lookup of elements, at the expense of losing sorting
information.
Let us revisit our
employee_set
example: suppose a field for storing
the Social Security number is added, with the requisite that lookup by this
number should be as fast as possible. Instead of the usual ordered index, a
hashed index can be resorted to:
#include
<
boost
/
multi_index_container
.
hpp
>
#include
<
boost
/
multi_index
/
hashed_index
.
hpp
>
#include
<
boost
/
multi_index
/
ordered_index
.
hpp
>
#include
<
boost
/
multi_index
/
identity
.
hpp
>
#include
<
boost
/
multi_index
/
member
.
hpp
>
struct
employee
int
id
;
std
::
string
name
;
int
ssnumber
;
employee
(
int
id
,
const
std
::
string
&
name
,
int
ssnumber
):
id
(
id
),
name
(
name
),
ssnumber
(
ssnumber
){}
bool
operator
<(
const
employee
&
e
)
const
{
return
id
<
e
.
id
;}
typedef
multi_index_container
<
employee
,
indexed_by
<
ordered_unique
<
identity
<
employee
>
>,
ordered_non_unique
<
member
<
employee
,
std
::
string
,&
employee
::
name
>
>,
hashed_unique
<
member
<
employee
,
int
,&
employee
::
ssnumber
>
>
>
employee_set
Note that the hashed index does not guarantee any particular ordering of the
elements: so, for instance, we cannot efficiently query the employees whose SSN is
greater than a given number. Usually, you must consider these restrictions when
determining whether a hashed index is preferred over an ordered one.
Hashed indices replicate the interface of
std::unordered_set
and
std::unordered_multiset
, with only minor differences where required
by the general constraints of
multi_index_container
s, and provide
additional useful capabilities like in-place updating of elements.
Check the
reference
for a
complete specification of the interface of hashed indices,
and
example 8
and
example 9
for practical applications.
Just like ordered indices, hashed indices have unique and non-unique variants, selected
with the specifiers
hashed_unique
and
hashed_non_unique
,
respectively. In the latter case, elements with equivalent keys are kept together and can
be jointly retrieved by means of the
equal_range
member function.
Hashed indices specifiers have two alternative syntaxes, depending on whether
tags
are provided or not:
(
hashed_unique
|
hashed_non_unique
)
<[
(tag)
[,
(key extractor)
[,
(hash function)
[,
(equality predicate)
]]]]>
(
hashed_unique
|
hashed_non_unique
)
<[
(key extractor)
[,
(hash function)
[,
(equality predicate)
]]]>
The key extractor parameter works in exactly the same way as for
ordered indices
; lookup, insertion,
etc., are based on the key returned by the extractor rather than the whole
element.
The hash function is the very core of the fast lookup capabilities of this type of
indices: a hasher is just a unary function object
returning an
std::size_t
value for any given
key. In general, it is impossible that every key map to a different hash value, for
the space of keys can be greater than the number of permissible hash codes: what
makes for a good hasher is that the probability of a collision (two different
keys with the same hash value) is as close to zero as possible. This is a statistical
property depending on the typical distribution of keys in a given application, so
it is not feasible to have a general-purpose hash function with excellent results
in
every
possible scenario; the default value for this parameter uses
Boost.Hash
, which often provides good
enough results.
The equality predicate is used to determine whether two keys are to be treated
as the same. The default
value
std::equal_to<KeyFromValue::result_type>
is in most
cases exactly what is needed, so very rarely will you have to provide
your own predicate. Note that hashed indices require that two
equivalent keys have the same hash value, which
in practice greatly reduces the freedom in choosing an equality predicate.
The lookup interface of hashed indices consists in member functions
find
,
count
,
contains
and
equal_range
.
Note that
lower_bound
and
upper_bound
are not
provided, as there is no intrinsic ordering of keys in this type of indices.
Just as with ordered indices, these member functions take keys
as their search arguments, rather than entire objects. Remember that
ordered indices lookup operations are further augmented to accept
compatible keys
, which can roughly be regarded as "subkeys".
For hashed indices, a concept of
compatible key
is also
supported, though its usefulness is much more limited: basically,
a compatible key is an object which is entirely equivalent to
a native object of
key_type
value, though maybe with
a different internal representation:
struct
ssn
ssn
(
int
area_no
,
int
group_no
,
int
serial_no
):
area_no
(
area_no
),
group_no
(
group_no
),
serial_no
(
serial_no
)
int
to_int
()
const
return
serial_no
+
10000
*
group_no
+
1000000
*
area_no
;
private
:
int
area_no
;
int
group_no
;
int
serial_no
;
struct
ssn_equal
bool
operator
()(
const
ssn
&
x
,
int
y
)
const
return
x
.
to_int
()==
y
;
bool
operator
()(
int
x
,
const
ssn
&
y
)
const
return
x
==
y
.
to_int
();
struct
ssn_hash
std
::
size_t
operator
()(
const
ssn
&
x
)
const
return
boost
::
hash
<
int
>()(
x
.
to_int
());
std
::
size_t
operator
()(
int
x
)
const
return
boost
::
hash
<
int
>()(
x
);
typedef
employee_set
::
nth_index
<
2
>::
type
employee_set_by_ssn
;
employee_set
es
;
employee_set_by_ssn
&
ssn_index
=
es
.
get
<
2
>();
employee
e
=*(
ssn_index
.
find
(
ssn
(
12
,
1005
,
20678
),
ssn_hash
(),
ssn_equal
()));
In the example, we provided a hash functor
ssn_hash
and an
equality predicate
ssn_equal
allowing for interoperability
between
ssn
objects and the raw
int
s stored as
SSN
s in
employee_set
.
By far, the most useful application of compatible keys in the context
of hashed indices lies in the fact that they allow for seamless usage of
composite keys
.
Hashed indices have
replace
,
modify
and
modify_key
member functions, with the same functionality as in ordered indices.
Due to the internal constraints imposed by the Boost.MultiIndex framework,
hashed indices provide guarantees on iterator validity and
exception safety that are actually stronger than required by the
C++ standard with respect to unordered associative containers:
Iterator validity is preserved in any case during insertion or rehashing:
C++ unordered associative containers can invalidate iterators when a rehash (implicit or explicit)
is performed.
Erasing an element or range of elements via iterators does not throw ever,
as the internal hash function and equality predicate objects are not actually
invoked.
rehash
provides the strong exception safety guarantee
unconditionally. The standard only warrants it for unordered associative containers if the internal hash function and
equality predicate objects do not throw. The somewhat surprising consequence
is that a standard-compliant
std::unordered_set
might erase
elements if an exception is thrown during rehashing!
In general, these stronger guarantees play in favor of the user's convenience,
specially that which refers to iterator stability. A (hopefully minimal)
degradation in performance might result in exchange for these commodities,
though.
Random access indices offer the same kind of functionality as
sequenced indices
, with the extra advantages
that their iterators are random access, and
operator[]
and
at()
are provided for accessing
elements based on their position in the index. Let us rewrite a
container used in a previous
example
,
using random access instead of sequenced indices:
#include
<
boost
/
multi_index_container
.
hpp
>
#include
<
boost
/
multi_index
/
random_access_index
.
hpp
>
#include
<
boost
/
multi_index
/
ordered_index
.
hpp
>
#include
<
boost
/
multi_index
/
identity
.
hpp
>
typedef
multi_index_container
<
std
::
string
,
indexed_by
<
random_access
<>,
ordered_non_unique
<
identity
<
std
::
string
>
>
>
text_container
;
text_container
tc
;
Random access capabilities allow us to efficiently write code
like the following:
void
print_page
(
std
::
size_t
page_num
)
static
const
std
::
size_t
words_per_page
=
50
;
std
::
size_t
pos0
=
std
::
min
(
tc
.
size
(),
page_num
*
words_per_page
);
std
::
size_t
pos1
=
std
::
min
(
tc
.
size
(),
pos0
+
words_per_page
);
std
::
copy
(
tc
.
begin
()+
pos0
,
tc
.
begin
()+
pos1
,
std
::
ostream_iterator
<
std
::
string
>(
std
::
cout
));
void
print_random_word
()
std
::
cout
<<
tc
[
rand
()%
tc
.
size
()];
This added flexibility comes at a price: insertions and deletions at positions
other than the end of the index have linear complexity, whereas these operations
are constant time for sequenced indices. This situation is reminiscent of the
differences in complexity behavior between
std::list
and
std::vector
: in the case of random access indices, however,
insertions and deletions never incur any element copying, so the actual
performance of these operations can be acceptable, despite the theoretical
disadvantage with respect to sequenced indices.
Example 10
and
example 11
in the examples section put
random access indices in practice.
Random access indices are specified with the
random_access
construct,
where the
tag
parameter is, as usual, optional:
random_access
<[
(tag)
]>
All public functions offered by sequenced indices are also provided
by random access indices, so that the latter can act as a drop-in replacement
of the former (save with respect to their complexity bounds, as explained above).
Besides, random access
indices have
operator[]
and
at()
for positional
access to the elements, and member functions
capacity
and
reserve
that control internal reallocation in a similar manner as the homonym
facilities in
std::vector
. Check the
reference
for details.
It is tempting to see random access indices as an analogue of
std::vector
for use in Boost.MultiIndex, but this metaphor can be misleading, as both constructs,
though similar in many respects, show important semantic differences. An
advantage of random access indices is that their iterators, as well as references
to their elements, are
stable
, that is, they remain valid after any insertions
or deletions. On the other hand, random access indices have several disadvantages with
respect to
std::vector
s:
They do not provide
memory contiguity
, a property
of
std::vector
s by which elements are stored adjacent to one
another in a single block of memory.
As usual in Boost.MultiIndex, elements of random access indices are immutable
and can only be modified through member functions
replace
and
modify
.
This precludes the usage of many mutating
algorithms that are nonetheless applicable to
std::vector
s.
The latter shortcoming can be partially remedied by means of the
rearranging interface
these indices provide.
In general, it is more instructive to regard random access indices as
a variation of sequenced indices providing random access semantics, instead
of insisting on the
std::vector
analogy.
By design, index elements are immutable, i.e. iterators only grant
const
access to them, and only through the provided
updating interface (
replace
,
modify
and
modify_key
) can the elements be modified. This restriction
is set up so that the internal invariants of key-based indices are
not broken (for instance, ascending order traversal in ordered
indices), but induces important limitations in non key-based indices:
typedef
multi_index_container
<
int
,
indexed_by
<
random_access
<>,
ordered_unique
<
identity
<
int
>
>
>
container
;
container
c
;
std
::
mt19937
rng
;
std
::
shuffle
(
c
.
begin
(),
c
.
end
(),
rng
);
What is unfortunate about the previous example is that the operation
performed by
std::shuffle
is potentially compatible
with
multi_index_container
invariants, as its result can be
described by a permutation of the elements in the random access index
with no actual modifications to the elements themselves. There are many
more examples of such compatible algorithms in the C++ standard library,
like for instance all sorting and partition functions.
Sequenced and random access indices provide a means to take advantage
of such external algorithms. In order to introduce this facility we need
a preliminary concept: a
view
of an index is defined as
some iterator range [
first
,
last
) over the
elements of the index such that all its elements are contained in the
range exactly once. Continuing with our example, we can apply
std::shuffle
on an ad hoc view obtained from the
container:
std
::
vector
<
boost
::
reference_wrapper
<
const
int
>
>
v
;
BOOST_FOREACH
(
const
int
&
i
,
c
)
v
.
push_back
(
boost
::
cref
(
i
));
std
::
shuffle
(
v
.
begin
(),
v
.
end
(),
rng
);
Elements of
v
are
reference_wrapper
s (from
Boost.Ref
) to the actual elements
in the multi-index container. These objects still do not allow modification
of the referenced entities, but they are swappable,
which is the only requirement
std::shuffle
imposes. Once
we have our desired rearrange stored in the view, we can transfer it to
the container with
c
.
rearrange
(
v
.
begin
());
rearrange
accepts an input iterator signaling the beginning
of the external view (and end iterator is not needed since the length of
the view is the same as that of the index) and internally relocates the
elements of the index so that their traversal order matches the view.
Albeit with some circumventions,
rearrange
allows for the
application of a varied range of algorithms to non key-based indices.
Please note that the view concept is very general, and in no way tied
to the particular implementation example shown above. For instance, indices
of a
multi_index_container
are indeed views with respect to
its non key-based indices:
c
.
rearrange
(
c
.
get
<
1
>().
begin
());
c
.
rearrange
(
c
.
get
<
1
>().
rbegin
());
The only important requirement imposed on views is that they must be
free
, i.e. they are not affected by relocations on the base index:
thus,
rearrange
does not accept the following:
c
.
rearrange
(
c
.
rbegin
());
The view concept is defined in detail in the
reference
.
See
example 11
in the examples section
for a demonstration of use of
rearrange
.
All indices of Boost.MultiIndex provide a member function called
iterator_to
which returns an iterator to a given element of the container:
multi_index_container
<
int
,
indexed_by
<
sequenced
<>
>
c
.
erase
(
c
.
iterator_to
(
c
.
back
()));
int
x
=
c
.
back
();
c
.
erase
(
c
.
iterator_to
(
x
));
iterator_to
provides a way to retrieve an iterator to an element
from a pointer to the element, thus making iterators and pointers interchangeable
for the purposes of element pointing (not so for traversal) in many situations.
This notwithstanding, it is not the aim of
iterator_to
to
promote the usage of pointers as substitutes for real iterators: the latter are
specifically designed for handling the elements of a container,
and not only benefit from the iterator orientation of container interfaces,
but are also capable of exposing many more programming bugs than raw pointers, both
at compile and run time.
iterator_to
is thus meant to be used
in scenarios where access via iterators is not suitable or desireable:
Interoperability with preexisting APIs based on pointers or references.
Publication of pointer-based interfaces (for instance, when
designing a C-compatible library).
The exposure of pointers in place of iterators can act as a
type
erasure
barrier effectively decoupling the user of the code
from the implementation detail of which particular container is being
used. Similar techniques, like the famous Pimpl idiom, are used
in large projects to reduce dependencies and build times.
Self-referencing contexts where an element acts upon its owner
container and no iterator to itself is available.
void
move_to_retirement
(
int
ssnumber
,
employee_set
&
es
,
employee_set
&
archive
)
employee_set_by_ssn
::
node_type
node
=
es
.
get
<
ssn
>().
extract
(
ssnumber
);
if
(!
node
.
empty
()){
archive
.
insert
(
std
::
move
(
node
));
In the example, the internal node is transferred as-is from
es
to
archive
,
which is more efficient than erasing from the source and recreating in destination.
node_type
is a move-only class used to pass nodes around, and its interface follows
that of the
homonym type
for C++ associative containers (set containers version). Boost.MultiIndex provides node extraction
and insertion operations for all index types, including sequenced ones (by contrast,
std::list
does not have such features):
multi_index_container
<
int
,
indexed_by
<
sequenced
<>,
ordered_unique
<
identity
<
int
>
>
>
src
;
multi_index_container
<
int
,
indexed_by
<
sequenced
<>,
ordered_non_unique
<
identity
<
int
>,
std
::
greater
<
int
>
>
>
dst
;
for
(
auto
first
=
src
.
begin
(),
last
=
src
.
end
();
first
!=
last
;){
if
(*
first
%
2
==
0
)
dst
.
insert
(
dst
.
end
(),
src
.
extract
(
first
++));
else
++
first
;
Note that
src
and
dst
are of different types,
yet transfer is possible. Two
multi_index_container
s are
node-compatible (that is, they use the same
node_type
) if
they have the same element and allocator types and their respective indices match
one by one without regard to whether they are unique or non-unique or to
their particular configuration parameters: they are both ordered, or
both sequenced, etc.
Alternatively, direct node transfer between two containers can be done without
keeping intervening
node_type
s thanks to
merge
(key-based
indices) and
splice
(non key-based indices).
void
move_to_retirement_by_age
(
int
max_age
,
employee_set
&
es
,
employee_set
&
archive
)
employee_set_by_age
&
ea
=
es
.
get
<
age
>();
archive
.
merge
(
ea
,
ea
.
upper_bound
(
max_age
),
ea
.
end
());
for
(
auto
first
=
src
.
begin
(),
last
=
src
.
end
();
first
!=
last
;){
if
(*
first
%
2
==
0
)
dst
.
splice
(
dst
.
end
(),
src
,
first
++);
else
++
first
;
There are overloads of
merge
/
splice
for transferring a single element,
a range between two iterators and an entire container: for further details, consult
for instance the reference for
ordered indices
and for
sequenced indices
(the rest of indices provide one interface or the other).
Please note that sequenced and random access indices do also have an operation called
merge
,
but this follows the specification of
std::list::merge
, which has a somewhat
different behavior (source and destination are required to be ordered by the same criterion). This is
a rather confusing naming issue that Boost.MultiIndex simply inherits from the C++ standard.
Ordered and ranked indices are implemented by means of a data structure
known as a
red-black tree
. Nodes of a red-back tree contain pointers
to the parent and the two children nodes, plus a 1-bit field referred to as
the
node color
(hence the name of the structure). Due to alignment
issues, on most architectures the color field occupies one entire word, that is,
4 bytes in 32-bit systems and 8 bytes in 64-bit environments. This waste
of space can be avoided by embedding the color bit inside one of the
node pointers, provided not all the bits of the pointer representation contain
useful information: this is precisely the case in many architectures where
such nodes are aligned to even addresses, which implies that the least
significant bit of the address must always be zero.
Boost.MultiIndex ordered and ranked indices implement this type of node compression
whenever applicable. As compared with common implementations of the STL
container
std::set
, node compression can
result in a reduction of header overload by 25% (from 16 to 12 bytes on
typical 32-bit architectures, and from 32 to 24 bytes on 64-bit systems).
The impact on performance of this optimization has been checked to be negligible
for moderately sized containers, whereas containers with many elements (hundreds
of thousands or more) perform faster with this optimization, most likely due to
L1 and L2 cache effects.
Node compression can be disabled by globally setting the macro
BOOST_MULTI_INDEX_DISABLE_COMPRESSED_ORDERED_INDEX_NODES
.
Distributed under the Boost Software
License, Version 1.0. (See accompanying file