diff options
author | onpopov <onpopov@yandex-team.ru> | 2022-02-10 16:50:38 +0300 |
---|---|---|
committer | Daniil Cherednik <dcherednik@yandex-team.ru> | 2022-02-10 16:50:38 +0300 |
commit | 84a29dd4980d5b39615e453f289bd1a81213296d (patch) | |
tree | 5e320f10d6b5863e0d5ab1a8caa9eefbdaa5195f /library/cpp/containers | |
parent | 1717072c6635948128dad7b015a0ec05acbe913b (diff) | |
download | ydb-84a29dd4980d5b39615e453f289bd1a81213296d.tar.gz |
Restoring authorship annotation for <onpopov@yandex-team.ru>. Commit 1 of 2.
Diffstat (limited to 'library/cpp/containers')
30 files changed, 1149 insertions, 1149 deletions
diff --git a/library/cpp/containers/comptrie/README.md b/library/cpp/containers/comptrie/README.md index 43c298e2c8..13caa33c3d 100644 --- a/library/cpp/containers/comptrie/README.md +++ b/library/cpp/containers/comptrie/README.md @@ -1,6 +1,6 @@ Compact trie ============= - + The comptrie library is a fast and very tightly packed implementation of a prefix tree (Sedgewick's T-trie, that is a ternary tree, see https://www.cs.princeton.edu/~rs/strings/paper.pdf, @@ -9,10 +9,10 @@ various searches. Because it is template-based and performance-oriented, a signi part of the library consists of inline functions, so if you don't need all the features of the library, consider including a more specific header file instead of the top-level comptrie.h file. - + Description of the data structure --------------------------------- - + A prefix tree is an implementation of the map data structure for cases when keys are sequences of characters. The nodes on this tree contain characters and values. The key that corresponds to a @@ -24,11 +24,11 @@ balanced if possible (characters have a defined order that must be preserved). Thus, our tree is ternary: each node has a reference to a subtree of all children and two references to subtrees of siblings. One of these subtrees contains the younger siblings, and the other contains the elder siblings. - + The library implements tree optimization by merging identical subtrees, which means the tree becomes a DAG (Directed Acyclic Graph – an oriented graph without oriented cycles). - + The main class TCompactTrie is defined in comptrie_trie.h and is templatized: - The first parameter of the template is the character type. It should be an integer type, which means that arithmetical operations must be defined for it. @@ -38,32 +38,32 @@ serialize the value type to a continuous memory buffer, deserialize it back, and quickly determine its size using the pointer to the beginning of this memory buffer. Good packers have already been written for most types, and they are available in library/cpp/packers. For more information, please refer to the documentation for these packers. - + The set.h file defines a modification for cases when keys must be stored without values. - + When a tree is built from scratch, the value corresponding to an empty key is assigned to a single-character key '\0'. So in a tree with the 'char' character type, the empty key and the '\0' key are bound together. For a subtree received from a call to FindTails, this restriction no longer exists. - + Creating trees -------------- - + Building a tree from a list of key-value pairs is performed by the TCompactTrieBuilder class described in the comptrie_builder.h file. - + This class allows you to add words to a tree one at a time, merge a complete subtree, and also use an unfinished tree as a map. - + An important optimization is the prefix-grouped mode when you need to add keys in a certain order (for details, see the comments in the header file). The resulting tree is compactly packed while keys are being added, and the memory consumption is approximately the same as for the completed tree. For the default mode, compact stacking is turned on at the very end, and the data consumes quite a lot of memory up until that point. - + Optimizing trees ---------------- - + After a tree is created, there are two optimizing operations that can be applied: - Minimization to a DAG by merging equal subtrees. - Fast memory layout. @@ -71,94 +71,94 @@ The functions that implement these operations are declared in the comptrie_build optimization is implemented by the CompactTrieMinimize function, and the second is implemented by CompactTrieMakeFastLayout. You can perform both at once by calling the CompactTrieMinimizeAndMakeFastLayout function. - + ### Minimization ### - + Minimization to a DAG requires quite a lot of time and a large amount of memory to store an array of subtrees, but it can reduce the size of the tree several times over (an example is a language model that has many low-frequency phrases with repeated last words and frequency counts). However, if you know in advance that there are no duplicate values in the tree, you don't need to waste time on it, since the minimization won't have any effect on the tree. - + ### Fast memory layout ### - + The second optimization function results in fewer cache misses, but it causes the tree to grow in size. Our experience has shown a 5% gain in speed for some tries. The algorithm consumes about three times more memory than the amount required for the source tree. So if the machine has enough memory to assemble a tree, it does not neccessarily mean that it has enough memory to run the algorithm. To learn about the theory behind this algorithm, read the comments before the declaration of the CompactTrieMinimize function. - + Serializing trees ----------------- - + The tree resides in memory as a sequence of nodes. Links to other nodes are always counted relative to the position of the current node. This allows you to save a tree to disk as it is and then re-load it using mmap(). The TCompactTrie class has the TBlob constructor for reading a tree from disk. The TCompactTrieBuilder class has Save/SaveToFile methods for writing a built tree to a stream or a file. - + Accessing trees --------------- - + As a rule, all methods that accept a key as input have two variants: - One takes the key in the format: pointer to the beginning of the key, length. - The other takes a high-level type like TStringBuf. - + You can get a value for a key in the tree – TCompactTrie::Find returns false if there is no key, and TCompactTrie::Get throws an exception. You can use FindPrefix methods to find the longest key prefix in a tree and get the corresponding value for it. You can also use a single FindPhrases request to get values for all the beginnings of a phrase with a given word delimiter. - + An important operation that distinguishes a tree from a simple map is implemented in the FindTails method, which allows you to obtain a subtree consisting of all possible extensions of the given prefix. - + Iterators for trees ------------------- - + First of all, there is a typical map iterator over all key-value pairs called TConstIterator. A tree has three methods that return it: Begin, End, and UpperBound. The latter takes a key as input and returns an iterator to the smallest key that is not smaller than the input key. - + The rest of the iterators are not so widely used, and thus are located in separate files. - + TPrefixIterator is defined in the prefix_iterator.h file. It allows iterations over all the prefixes of this key available in the tree. - + TSearchIterator is defined in the search_iterator.h file. It allows you to enter a key in a tree one character at a time and see where it ends up. The following character can be selected depending on the current result. You can also copy the iterator and proceed on two different paths. You can actually achieve the same result with repeated use of the FindTails method, but the authors of this iterator claim that they obtained a performance gain with it. - + Appendix. Memory implementation details --------------------------------------- - + *If you are not going to modify the library, then you do not need to read further.* - + First, if the character type has a size larger than 1 byte, then all keys that use these characters are converted to byte strings in the big-endian way. This means that character bytes are written in a string from the most significant to the least significant from left to right. Thus it is reduced to the case when the character in use is 'char'. - + The tree resides in memory as a series of consecutive nodes. The nodes can have different sizes, so the only way to identify the boundaries of nodes is by passing the entire tree. - + ### Node structure ### - + The structure of a node, as can be understood from thoughtfully reading the LeapByte function in Comptrie_impl.h, is the following: - The first byte is for service flags. - The second byte is a character (unless it is the ε-link type of node described below, which has from 1 to 7 bytes of offset distance from the beginning of this node to the content node, and nothing else). - + Thus, the size of any node is at least 2 bytes. All other elements of a node are optional. Next there is from 0 to 7 bytes of the packed offset from the beginning of this node to the beginning of the root node of a subtree with the younger @@ -168,19 +168,19 @@ Next comes the packed value in this node. Its size is not limited, but you may recall that the packer allows you to quickly determine this size using a pointer to the beginning of the packed value. Then, if the service flags indicate that the tree has children, there is a root node of the subtree of children. - -The packed offset is restricted to 7 bytes, and this gives us a limit on the largest -possible size of a tree. You need to study the packer code to understand + +The packed offset is restricted to 7 bytes, and this gives us a limit on the largest +possible size of a tree. You need to study the packer code to understand the exact limit. - + All packed offsets are nonnegative, meaning that roots of subtrees with siblings and the node pointed to by the ε-link must be located strictly to the right of the current node in memory. This does not allow placement of finite state machines with oriented cycles in the comptrie. But it does allow you to effectively stack the comptrie from right to left. - + ### Service flags ### - + The byte of service flags contains (as shown by the constants at the beginning of the comptrie_impl.h file): - 1 bit of MT_NEXT, indicating whether this node has children. @@ -191,9 +191,9 @@ the comptrie_impl.h file): offset to a subtree with younger siblings. If one of these subtrees is not present, then the size of the corresponding packed offset is 0, and vice versa. - + ### ε-links ### - + These nodes only occur if we optimized a tree into a DAG and got two nodes with merged subtrees of children. Since the offset to the subtree of children can't be specified and the root of this subtree should lie just after the value, we have @@ -201,13 +201,13 @@ to add a node of the ε-link type, which contains the offset to the root subtree children and nothing more. This applies to all nodes that have equal subtrees of children, except the rightmost node. The size of this offset is set in 3 bits of MT_SIZEMASK flags for a node. - + As the implementation of the IsEpsilonLink function in comptrie_impl.h demonstrates, the ε-link differs from other nodes in that it does not have the MT_NEXT flag or the MT_FINAL flag, so it can always be identified by the flags. Of course, the best programming practice is to call the function itself instead of examining the flags. - + Note that the ε-link flags do not use the MT_SIZEMASK << MT_LEFTSHIFT` bits, which allows us to start using ε-links for some other purpose. diff --git a/library/cpp/containers/comptrie/chunked_helpers_trie.h b/library/cpp/containers/comptrie/chunked_helpers_trie.h index cfa35f5ba2..68c568f9f3 100644 --- a/library/cpp/containers/comptrie/chunked_helpers_trie.h +++ b/library/cpp/containers/comptrie/chunked_helpers_trie.h @@ -30,7 +30,7 @@ private: public: TTrieSetWriter(bool isSorted = sorted) - : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) + : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) { } @@ -118,7 +118,7 @@ private: public: TTrieMapWriter(bool isSorted = sorted) - : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) + : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) #ifndef NDEBUG , IsSorted(isSorted) #endif diff --git a/library/cpp/containers/comptrie/comptrie_builder.cpp b/library/cpp/containers/comptrie/comptrie_builder.cpp index 28a1e41dd2..6bc9b106fd 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.cpp +++ b/library/cpp/containers/comptrie/comptrie_builder.cpp @@ -1 +1 @@ -#include "comptrie_builder.h" +#include "comptrie_builder.h" diff --git a/library/cpp/containers/comptrie/comptrie_builder.h b/library/cpp/containers/comptrie/comptrie_builder.h index cf7d2e39a3..239e1d856d 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.h +++ b/library/cpp/containers/comptrie/comptrie_builder.h @@ -1,39 +1,39 @@ #pragma once -#include "comptrie_packer.h" -#include "minimize.h" -#include "key_selector.h" - -#include <util/stream/file.h> - +#include "comptrie_packer.h" +#include "minimize.h" +#include "key_selector.h" + +#include <util/stream/file.h> + // -------------------------------------------------------------------------------------- // Data Builder // To build the data buffer, we first create an automaton in memory. The automaton -// is created incrementally. It actually helps a lot to have the input data prefix-grouped +// is created incrementally. It actually helps a lot to have the input data prefix-grouped // by key; otherwise, memory consumption becomes a tough issue. // NOTE: building and serializing the automaton may be lengthy, and takes lots of memory. -// PREFIX_GROUPED means that if we, while constructing a trie, add to the builder two keys with the same prefix, -// then all the keys that we add between these two also have the same prefix. -// Actually in this mode the builder can accept even more freely ordered input, -// but for input as above it is guaranteed to work. +// PREFIX_GROUPED means that if we, while constructing a trie, add to the builder two keys with the same prefix, +// then all the keys that we add between these two also have the same prefix. +// Actually in this mode the builder can accept even more freely ordered input, +// but for input as above it is guaranteed to work. enum ECompactTrieBuilderFlags { - CTBF_NONE = 0, - CTBF_PREFIX_GROUPED = 1 << 0, - CTBF_VERBOSE = 1 << 1, - CTBF_UNIQUE = 1 << 2, -}; - + CTBF_NONE = 0, + CTBF_PREFIX_GROUPED = 1 << 0, + CTBF_VERBOSE = 1 << 1, + CTBF_UNIQUE = 1 << 2, +}; + using TCompactTrieBuilderFlags = ECompactTrieBuilderFlags; inline TCompactTrieBuilderFlags operator|(TCompactTrieBuilderFlags first, TCompactTrieBuilderFlags second) { - return static_cast<TCompactTrieBuilderFlags>(static_cast<int>(first) | second); -} - + return static_cast<TCompactTrieBuilderFlags>(static_cast<int>(first) | second); +} + inline TCompactTrieBuilderFlags& operator|=(TCompactTrieBuilderFlags& first, TCompactTrieBuilderFlags second) { - return first = first | second; -} - + return first = first | second; +} + template <typename T> class TArrayWithSizeHolder; @@ -43,20 +43,20 @@ public: typedef T TSymbol; typedef D TData; typedef S TPacker; - typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; - typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; + typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; + typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; explicit TCompactTrieBuilder(TCompactTrieBuilderFlags flags = CTBF_NONE, TPacker packer = TPacker(), IAllocator* alloc = TDefaultAllocator::Instance()); - // All Add.. methods return true if it was a new key, false if the key already existed. - - bool Add(const TSymbol* key, size_t keylen, const TData& value); + // All Add.. methods return true if it was a new key, false if the key already existed. + + bool Add(const TSymbol* key, size_t keylen, const TData& value); bool Add(const TKeyBuf& key, const TData& value) { return Add(key.data(), key.size(), value); } // add already serialized data - bool AddPtr(const TSymbol* key, size_t keylen, const char* data); + bool AddPtr(const TSymbol* key, size_t keylen, const char* data); bool AddPtr(const TKeyBuf& key, const char* data) { return AddPtr(key.data(), key.size(), data); } @@ -103,51 +103,51 @@ protected: THolder<TCompactTrieBuilderImpl> Impl; }; -//---------------------------------------------------------------------------------------------------------------------- -// Minimize the trie. The result is equivalent to the original -// trie, except that it takes less space (and has marginally lower -// performance, because of eventual epsilon links). -// The algorithm is as follows: starting from the largest pieces, we find -// nodes that have identical continuations (Daciuk's right language), -// and repack the trie. Repacking is done in-place, so memory is less -// of an issue; however, it may take considerable time. - -// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. -// Because of non-local structure and epsilon links, it won't work -// as you expect it to, and can destroy the trie in the making. +//---------------------------------------------------------------------------------------------------------------------- +// Minimize the trie. The result is equivalent to the original +// trie, except that it takes less space (and has marginally lower +// performance, because of eventual epsilon links). +// The algorithm is as follows: starting from the largest pieces, we find +// nodes that have identical continuations (Daciuk's right language), +// and repack the trie. Repacking is done in-place, so memory is less +// of an issue; however, it may take considerable time. + +// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. +// Because of non-local structure and epsilon links, it won't work +// as you expect it to, and can destroy the trie in the making. // If you want both minimization and fast layout, do the minimization first. - + template <class TPacker> size_t CompactTrieMinimize(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker(), NCompactTrie::EMinimizeMode mode = NCompactTrie::MM_DEFAULT); template <class TTrieBuilder> size_t CompactTrieMinimize(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); -//---------------------------------------------------------------------------------------------------------------- -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: +//---------------------------------------------------------------------------------------------------------------- +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: // Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341-358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ // Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium // on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12-14, 2000, pages 399-409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -template <class TPacker> +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +template <class TPacker> size_t CompactTrieMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker()); - -template <class TTrieBuilder> + +template <class TTrieBuilder> size_t CompactTrieMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); - + // Composition of minimization and fast layout template <class TPacker> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker()); @@ -155,5 +155,5 @@ size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, template <class TTrieBuilder> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); -// Implementation details moved here. -#include "comptrie_builder.inl" +// Implementation details moved here. +#include "comptrie_builder.inl" diff --git a/library/cpp/containers/comptrie/comptrie_builder.inl b/library/cpp/containers/comptrie/comptrie_builder.inl index f273fa6571..161115edfe 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.inl +++ b/library/cpp/containers/comptrie/comptrie_builder.inl @@ -2,26 +2,26 @@ #include "comptrie_impl.h" #include "comptrie_trie.h" -#include "make_fast_layout.h" -#include "array_with_size.h" - +#include "make_fast_layout.h" +#include "array_with_size.h" + #include <library/cpp/containers/compact_vector/compact_vector.h> - + #include <util/memory/alloc.h> -#include <util/memory/blob.h> -#include <util/memory/pool.h> -#include <util/memory/tempbuf.h> -#include <util/memory/smallobj.h> -#include <util/generic/algorithm.h> -#include <util/generic/buffer.h> +#include <util/memory/blob.h> +#include <util/memory/pool.h> +#include <util/memory/tempbuf.h> +#include <util/memory/smallobj.h> +#include <util/generic/algorithm.h> +#include <util/generic/buffer.h> #include <util/generic/strbuf.h> - -#include <util/system/align.h> + +#include <util/system/align.h> #include <util/stream/buffer.h> - -#define CONSTEXPR_MAX2(a, b) (a) > (b) ? (a) : (b) -#define CONSTEXPR_MAX3(a, b, c) CONSTEXPR_MAX2(CONSTEXPR_MAX2(a, b), c) - + +#define CONSTEXPR_MAX2(a, b) (a) > (b) ? (a) : (b) +#define CONSTEXPR_MAX3(a, b, c) CONSTEXPR_MAX2(CONSTEXPR_MAX2(a, b), c) + // TCompactTrieBuilder::TCompactTrieBuilderImpl template <class T, class D, class S> @@ -33,7 +33,7 @@ protected: class TNode; class TArc; TNode* Root; - TCompactTrieBuilderFlags Flags; + TCompactTrieBuilderFlags Flags; size_t EntryCount; size_t NodeCount; TPacker Packer; @@ -73,11 +73,11 @@ public: void DestroyNode(TNode* node); void NodeReleasePayload(TNode* thiz); - char* AddEntryForData(const TSymbol* key, size_t keylen, size_t dataLen, bool& isNewAddition); - TNode* AddEntryForSomething(const TSymbol* key, size_t keylen, bool& isNewAddition); + char* AddEntryForData(const TSymbol* key, size_t keylen, size_t dataLen, bool& isNewAddition); + TNode* AddEntryForSomething(const TSymbol* key, size_t keylen, bool& isNewAddition); - bool AddEntry(const TSymbol* key, size_t keylen, const TData& value); - bool AddEntryPtr(const TSymbol* key, size_t keylen, const char* value); + bool AddEntry(const TSymbol* key, size_t keylen, const TData& value); + bool AddEntryPtr(const TSymbol* key, size_t keylen, const char* value); bool AddSubtreeInFile(const TSymbol* key, size_t keylen, const TString& fileName); bool AddSubtreeInBuffer(const TSymbol* key, size_t keylen, TArrayWithSizeHolder<char>&& buffer); bool FindEntry(const TSymbol* key, size_t keylen, TData* value) const; @@ -131,7 +131,7 @@ public: class TArcSet: public ISubtree, public TCompactVector<TArc> { public: - typedef typename TCompactVector<TArc>::iterator iterator; + typedef typename TCompactVector<TArc>::iterator iterator; typedef typename TCompactVector<TArc>::const_iterator const_iterator; TArcSet() { @@ -334,7 +334,7 @@ public: }; union { - char ArcsData[CONSTEXPR_MAX3(sizeof(TArcSet), sizeof(TBufferedSubtree), sizeof(TSubtreeInFile))]; + char ArcsData[CONSTEXPR_MAX3(sizeof(TArcSet), sizeof(TBufferedSubtree), sizeof(TSubtreeInFile))]; union { void* Data1; long long int Data2; @@ -426,18 +426,18 @@ TCompactTrieBuilder<T, D, S>::TCompactTrieBuilder(TCompactTrieBuilderFlags flags } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::Add(const TSymbol* key, size_t keylen, const TData& value) { - return Impl->AddEntry(key, keylen, value); +bool TCompactTrieBuilder<T, D, S>::Add(const TSymbol* key, size_t keylen, const TData& value) { + return Impl->AddEntry(key, keylen, value); } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::AddPtr(const TSymbol* key, size_t keylen, const char* value) { - return Impl->AddEntryPtr(key, keylen, value); +bool TCompactTrieBuilder<T, D, S>::AddPtr(const TSymbol* key, size_t keylen, const char* value) { + return Impl->AddEntryPtr(key, keylen, value); } template <class T, class D, class S> bool TCompactTrieBuilder<T, D, S>::AddSubtreeInFile(const TSymbol* key, size_t keylen, const TString& fileName) { - return Impl->AddSubtreeInFile(key, keylen, fileName); + return Impl->AddSubtreeInFile(key, keylen, fileName); } template <class T, class D, class S> @@ -488,7 +488,7 @@ TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TCompactTrieBuilderImpl(T : Pool(1000000, TMemoryPool::TLinearGrow::Instance(), alloc) , PayloadSize(sizeof(void*)) // XXX: find better value , NodeAllocator(new TFixedSizeAllocator(sizeof(TNode) + PayloadSize, alloc)) - , Flags(flags) + , Flags(flags) , EntryCount(0) , NodeCount(1) , Packer(packer) @@ -543,40 +543,40 @@ void TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeReleasePayload(T } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntry( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntry( const TSymbol* key, size_t keylen, const TData& value) { size_t datalen = Packer.MeasureLeaf(value); - bool isNewAddition = false; - char* place = AddEntryForData(key, keylen, datalen, isNewAddition); + bool isNewAddition = false; + char* place = AddEntryForData(key, keylen, datalen, isNewAddition); Packer.PackLeaf(place, value, datalen); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryPtr( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryPtr( const TSymbol* key, size_t keylen, const char* value) { size_t datalen = Packer.SkipLeaf(value); - bool isNewAddition = false; - char* place = AddEntryForData(key, keylen, datalen, isNewAddition); + bool isNewAddition = false; + char* place = AddEntryForData(key, keylen, datalen, isNewAddition); memcpy(place, value, datalen); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInFile( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInFile( const TSymbol* key, size_t keylen, const TString& fileName) { typedef typename TNode::ISubtree ISubtree; typedef typename TNode::TSubtreeInFile TSubtreeInFile; - bool isNewAddition = false; - TNode* node = AddEntryForSomething(key, keylen, isNewAddition); + bool isNewAddition = false; + TNode* node = AddEntryForSomething(key, keylen, isNewAddition); node->Subtree()->Destroy(this); node->Subtree()->~ISubtree(); new (node->Subtree()) TSubtreeInFile(fileName); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> @@ -599,56 +599,56 @@ bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInBuffer( template <class T, class D, class S> typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForSomething( - const TSymbol* key, size_t keylen, bool& isNewAddition) { + const TSymbol* key, size_t keylen, bool& isNewAddition) { using namespace NCompactTrie; EntryCount++; - if (Flags & CTBF_VERBOSE) + if (Flags & CTBF_VERBOSE) ShowProgress(EntryCount); TNode* current = Root; size_t passed; - // Special case of empty key: replace it by 1-byte "\0" key. - size_t ckeylen = keylen ? keylen * sizeof(TSymbol) : 1; - TTempBuf ckeybuf(ckeylen); + // Special case of empty key: replace it by 1-byte "\0" key. + size_t ckeylen = keylen ? keylen * sizeof(TSymbol) : 1; + TTempBuf ckeybuf(ckeylen); if (keylen == 0) { - ckeybuf.Append("\0", 1); + ckeybuf.Append("\0", 1); } else { ConvertSymbolArrayToChar(key, keylen, ckeybuf, ckeylen); - } + } - char* ckey = ckeybuf.Data(); + char* ckey = ckeybuf.Data(); - TNode* next; + TNode* next; while ((ckeylen > 0) && (next = NodeForwardAdd(current, ckey, ckeylen, passed, &NodeCount)) != nullptr) { - current = next; - ckeylen -= passed; - ckey += passed; - } - - if (ckeylen != 0) { - //new leaf - NodeCount++; - TNode* leaf = new (*NodeAllocator) TNode(); - NodeLinkTo(current, TBlob::Copy(ckey, ckeylen), leaf); - current = leaf; + current = next; + ckeylen -= passed; + ckey += passed; } - isNewAddition = (current->PayloadType == DATA_ABSENT); - if ((Flags & CTBF_UNIQUE) && !isNewAddition) - ythrow yexception() << "Duplicate key"; + + if (ckeylen != 0) { + //new leaf + NodeCount++; + TNode* leaf = new (*NodeAllocator) TNode(); + NodeLinkTo(current, TBlob::Copy(ckey, ckeylen), leaf); + current = leaf; + } + isNewAddition = (current->PayloadType == DATA_ABSENT); + if ((Flags & CTBF_UNIQUE) && !isNewAddition) + ythrow yexception() << "Duplicate key"; return current; } template <class T, class D, class S> -char* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForData(const TSymbol* key, size_t keylen, - size_t datalen, bool& isNewAddition) { - TNode* current = AddEntryForSomething(key, keylen, isNewAddition); +char* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForData(const TSymbol* key, size_t keylen, + size_t datalen, bool& isNewAddition) { + TNode* current = AddEntryForSomething(key, keylen, isNewAddition); NodeReleasePayload(current); if (datalen <= PayloadSize) { current->PayloadType = DATA_INSIDE; - } else if (Flags & CTBF_PREFIX_GROUPED) { + } else if (Flags & CTBF_PREFIX_GROUPED) { current->PayloadType = DATA_MALLOCED; current->PayloadAsPtr() = new char[datalen]; } else { @@ -782,9 +782,9 @@ typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeForwardAdd( TNode* thiz, const char* label, size_t len, size_t& passed, size_t* nodeCount) { typename TNode::TArcSet* arcSet = dynamic_cast<typename TNode::TArcSet*>(thiz->Subtree()); - if (!arcSet) - ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; - + if (!arcSet) + ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; + typename TNode::TArcSet::iterator it = arcSet->Find(*label); if (it != arcSet->end()) { @@ -814,10 +814,10 @@ template <class T, class D, class S> void TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeLinkTo(TNode* thiz, const TBlob& label, TNode* node) { typename TNode::TArcSet* arcSet = dynamic_cast<typename TNode::TArcSet*>(thiz->Subtree()); if (!arcSet) - ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; + ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; // Buffer the node at the last arc - if ((Flags & CTBF_PREFIX_GROUPED) && !arcSet->empty()) + if ((Flags & CTBF_PREFIX_GROUPED) && !arcSet->empty()) NodeBufferSubtree(arcSet->back().Node); arcSet->Add(label, node); @@ -1047,19 +1047,19 @@ const typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* // Different -//---------------------------------------------------------------------------------------------------------------------- -// Minimize the trie. The result is equivalent to the original -// trie, except that it takes less space (and has marginally lower -// performance, because of eventual epsilon links). -// The algorithm is as follows: starting from the largest pieces, we find -// nodes that have identical continuations (Daciuk's right language), -// and repack the trie. Repacking is done in-place, so memory is less -// of an issue; however, it may take considerable time. - -// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. -// Because of non-local structure and epsilon links, it won't work -// as you expect it to, and can destroy the trie in the making. - +//---------------------------------------------------------------------------------------------------------------------- +// Minimize the trie. The result is equivalent to the original +// trie, except that it takes less space (and has marginally lower +// performance, because of eventual epsilon links). +// The algorithm is as follows: starting from the largest pieces, we find +// nodes that have identical continuations (Daciuk's right language), +// and repack the trie. Repacking is done in-place, so memory is less +// of an issue; however, it may take considerable time. + +// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. +// Because of non-local structure and epsilon links, it won't work +// as you expect it to, and can destroy the trie in the making. + template <class TPacker> size_t CompactTrieMinimize(IOutputStream& os, const char* data, size_t datalength, bool verbose /*= false*/, const TPacker& packer /*= TPacker()*/, NCompactTrie::EMinimizeMode mode) { using namespace NCompactTrie; @@ -1072,38 +1072,38 @@ size_t CompactTrieMinimize(IOutputStream& os, const TTrieBuilder& builder, bool size_t len = builder.Save(buftmp); return CompactTrieMinimize<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); } - -//---------------------------------------------------------------------------------------------------------------- -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: + +//---------------------------------------------------------------------------------------------------------------- +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: // Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341-358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ // Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium // on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12-14, 2000, pages 399-409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -template <class TPacker> +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +template <class TPacker> size_t CompactTrieMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose /*= false*/, const TPacker& packer /*= TPacker()*/) { - using namespace NCompactTrie; - return CompactTrieMakeFastLayoutImpl(os, data, datalength, verbose, &packer); -} - -template <class TTrieBuilder> + using namespace NCompactTrie; + return CompactTrieMakeFastLayoutImpl(os, data, datalength, verbose, &packer); +} + +template <class TTrieBuilder> size_t CompactTrieMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose /*=false*/) { - TBufferStream buftmp; - size_t len = builder.Save(buftmp); - return CompactTrieMakeFastLayout<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); -} + TBufferStream buftmp; + size_t len = builder.Save(buftmp); + return CompactTrieMakeFastLayout<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); +} template <class TPacker> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose/*=false*/, const TPacker& packer/*= TPacker()*/) { diff --git a/library/cpp/containers/comptrie/comptrie_impl.cpp b/library/cpp/containers/comptrie/comptrie_impl.cpp index a116ab6d1e..0d48a8f5d3 100644 --- a/library/cpp/containers/comptrie/comptrie_impl.cpp +++ b/library/cpp/containers/comptrie/comptrie_impl.cpp @@ -1,7 +1,7 @@ -#include "comptrie_impl.h" +#include "comptrie_impl.h" #include <util/system/rusage.h> -#include <util/stream/output.h> +#include <util/stream/output.h> // Unpack the leaf value. The algorithm can store up to 8 full bytes in leafs. diff --git a/library/cpp/containers/comptrie/comptrie_impl.h b/library/cpp/containers/comptrie/comptrie_impl.h index f41c38311a..09a3554e2d 100644 --- a/library/cpp/containers/comptrie/comptrie_impl.h +++ b/library/cpp/containers/comptrie/comptrie_impl.h @@ -30,17 +30,17 @@ namespace NCompactTrie { return !(flags & (MT_FINAL | MT_NEXT)); } - static inline void TraverseEpsilon(const char*& datapos) { - const char flags = *datapos; - if (!IsEpsilonLink(flags)) { - return; - } - const size_t offsetlength = flags & MT_SIZEMASK; - const size_t offset = UnpackOffset(datapos + 1, offsetlength); + static inline void TraverseEpsilon(const char*& datapos) { + const char flags = *datapos; + if (!IsEpsilonLink(flags)) { + return; + } + const size_t offsetlength = flags & MT_SIZEMASK; + const size_t offset = UnpackOffset(datapos + 1, offsetlength); Y_ASSERT(offset); - datapos += offset; - } - + datapos += offset; + } + static inline size_t LeftOffsetLen(const char flags) { return (flags >> MT_LEFTSHIFT) & MT_SIZEMASK; } @@ -176,12 +176,12 @@ namespace NCompactTrie { return 0; } - // Auxiliary function: consumes one (multibyte) symbol from the input. - // Advances the data pointer to the root of the subtrie beginning after the symbol, - // zeroes it if this subtrie is empty. - // If there is a value associated with the symbol, makes the value pointer point to it, + // Auxiliary function: consumes one (multibyte) symbol from the input. + // Advances the data pointer to the root of the subtrie beginning after the symbol, + // zeroes it if this subtrie is empty. + // If there is a value associated with the symbol, makes the value pointer point to it, // otherwise sets it to nullptr. - // Returns true if the symbol was succesfully found in the trie, false otherwise. + // Returns true if the symbol was succesfully found in the trie, false otherwise. template <typename TSymbol, class TPacker> Y_FORCE_INLINE bool Advance(const char*& datapos, const char* const dataend, const char*& value, TSymbol label, TPacker packer) { diff --git a/library/cpp/containers/comptrie/comptrie_packer.h b/library/cpp/containers/comptrie/comptrie_packer.h index 0341eeeae3..b26335e76e 100644 --- a/library/cpp/containers/comptrie/comptrie_packer.h +++ b/library/cpp/containers/comptrie/comptrie_packer.h @@ -1,21 +1,21 @@ -#pragma once - +#pragma once + #include <library/cpp/packers/packers.h> - -template <class T> -class TCompactTriePacker { -public: - void UnpackLeaf(const char* p, T& t) const { - NPackers::TPacker<T>().UnpackLeaf(p, t); - } - void PackLeaf(char* buffer, const T& data, size_t computedSize) const { - NPackers::TPacker<T>().PackLeaf(buffer, data, computedSize); - } - size_t MeasureLeaf(const T& data) const { - return NPackers::TPacker<T>().MeasureLeaf(data); - } - size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used - { - return NPackers::TPacker<T>().SkipLeaf(p); - } -}; + +template <class T> +class TCompactTriePacker { +public: + void UnpackLeaf(const char* p, T& t) const { + NPackers::TPacker<T>().UnpackLeaf(p, t); + } + void PackLeaf(char* buffer, const T& data, size_t computedSize) const { + NPackers::TPacker<T>().PackLeaf(buffer, data, computedSize); + } + size_t MeasureLeaf(const T& data) const { + return NPackers::TPacker<T>().MeasureLeaf(data); + } + size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used + { + return NPackers::TPacker<T>().SkipLeaf(p); + } +}; diff --git a/library/cpp/containers/comptrie/comptrie_trie.h b/library/cpp/containers/comptrie/comptrie_trie.h index 40ec1e52b3..3875dca513 100644 --- a/library/cpp/containers/comptrie/comptrie_trie.h +++ b/library/cpp/containers/comptrie/comptrie_trie.h @@ -1,33 +1,33 @@ #pragma once #include "comptrie_impl.h" -#include "comptrie_packer.h" -#include "opaque_trie_iterator.h" -#include "leaf_skipper.h" -#include "key_selector.h" - -#include <util/generic/buffer.h> -#include <util/generic/ptr.h> -#include <util/generic/vector.h> -#include <util/generic/yexception.h> -#include <util/memory/blob.h> +#include "comptrie_packer.h" +#include "opaque_trie_iterator.h" +#include "leaf_skipper.h" +#include "key_selector.h" + +#include <util/generic/buffer.h> +#include <util/generic/ptr.h> +#include <util/generic/vector.h> +#include <util/generic/yexception.h> +#include <util/memory/blob.h> #include <util/stream/input.h> #include <utility> - + template <class T, class D, class S> class TCompactTrieBuilder; -namespace NCompactTrie { - template <class TTrie> - class TFirstSymbolIterator; -} - -template <class TTrie> -class TSearchIterator; - -template <class TTrie> -class TPrefixIterator; - +namespace NCompactTrie { + template <class TTrie> + class TFirstSymbolIterator; +} + +template <class TTrie> +class TSearchIterator; + +template <class TTrie> +class TPrefixIterator; + // in case of <char> specialization cannot distinguish between "" and "\0" keys template <class T = char, class D = ui64, class S = TCompactTriePacker<D>> class TCompactTrie { @@ -36,8 +36,8 @@ public: typedef D TData; typedef S TPacker; - typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; - typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; + typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; + typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; typedef std::pair<TKey, TData> TValueType; typedef std::pair<size_t, TData> TPhraseMatch; @@ -47,9 +47,9 @@ public: protected: TBlob DataHolder; - const char* EmptyValue = nullptr; + const char* EmptyValue = nullptr; TPacker Packer; - NCompactTrie::TPackerLeafSkipper<TPacker> Skipper = &Packer; // This should be true for every constructor. + NCompactTrie::TPackerLeafSkipper<TPacker> Skipper = &Packer; // This should be true for every constructor. public: TCompactTrie() = default; @@ -64,12 +64,12 @@ public: : TCompactTrie{data, TPacker{}} { } - // Skipper should be initialized with &Packer, not with &other.Packer, so you have to redefine these. - TCompactTrie(const TCompactTrie& other); + // Skipper should be initialized with &Packer, not with &other.Packer, so you have to redefine these. + TCompactTrie(const TCompactTrie& other); TCompactTrie(TCompactTrie&& other) noexcept; - TCompactTrie& operator=(const TCompactTrie& other); + TCompactTrie& operator=(const TCompactTrie& other); TCompactTrie& operator=(TCompactTrie&& other) noexcept; - + explicit operator bool() const { return !IsEmpty(); } @@ -106,18 +106,18 @@ public: return DataHolder; }; - const NCompactTrie::ILeafSkipper& GetSkipper() const { - return Skipper; - } - + const NCompactTrie::ILeafSkipper& GetSkipper() const { + return Skipper; + } + TPacker GetPacker() const { return Packer; } - bool HasCorrectSkipper() const { - return Skipper.GetPacker() == &Packer; - } - + bool HasCorrectSkipper() const { + return Skipper.GetPacker() == &Packer; + } + void FindPhrases(const TSymbol* key, size_t keylen, TPhraseMatchVector& matches, TSymbol separator = TSymbol(' ')) const; void FindPhrases(const TKeyBuf& key, TPhraseMatchVector& matches, TSymbol separator = TSymbol(' ')) const { return FindPhrases(key.data(), key.size(), matches, separator); @@ -131,11 +131,11 @@ public: inline TCompactTrie<T, D, S> FindTails(const TSymbol* key, size_t keylen) const; TCompactTrie<T, D, S> FindTails(const TKeyBuf& key) const { return FindTails(key.data(), key.size()); - } + } bool FindTails(const TSymbol* key, size_t keylen, TCompactTrie<T, D, S>& res) const; bool FindTails(const TKeyBuf& key, TCompactTrie<T, D, S>& res) const { return FindTails(key.data(), key.size(), res); - } + } // same as FindTails(&key, 1), a bit faster // return false, if no arc with @label exists @@ -143,11 +143,11 @@ public: class TConstIterator { private: - typedef NCompactTrie::TOpaqueTrieIterator TOpaqueTrieIterator; - typedef NCompactTrie::TOpaqueTrie TOpaqueTrie; + typedef NCompactTrie::TOpaqueTrieIterator TOpaqueTrieIterator; + typedef NCompactTrie::TOpaqueTrie TOpaqueTrie; friend class TCompactTrie; TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer); // only usable from Begin() and End() methods - TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer); // only usable from UpperBound() method + TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer); // only usable from UpperBound() method public: TConstIterator() = default; @@ -159,8 +159,8 @@ public: bool operator!=(const TConstIterator& other) const; TConstIterator& operator++(); TConstIterator operator++(int /*unused*/); - TConstIterator& operator--(); - TConstIterator operator--(int /*unused*/); + TConstIterator& operator--(); + TConstIterator operator--(int /*unused*/); TValueType operator*(); TKey GetKey() const; @@ -170,8 +170,8 @@ public: const char* GetValuePtr() const; private: - TPacker Packer; - TCopyPtr<TOpaqueTrieIterator> Impl; + TPacker Packer; + TCopyPtr<TOpaqueTrieIterator> Impl; }; TConstIterator Begin() const; @@ -179,20 +179,20 @@ public: TConstIterator End() const; TConstIterator end() const; - // Returns an iterator pointing to the smallest key in the trie >= the argument. + // Returns an iterator pointing to the smallest key in the trie >= the argument. // TODO: misleading name. Should be called LowerBound for consistency with stl. - // No. It is the STL that has a misleading name. - // LowerBound of X cannot be greater than X. + // No. It is the STL that has a misleading name. + // LowerBound of X cannot be greater than X. TConstIterator UpperBound(const TKeyBuf& key) const; - + void Print(IOutputStream& os); size_t Size() const; - friend class NCompactTrie::TFirstSymbolIterator<TCompactTrie>; - friend class TSearchIterator<TCompactTrie>; - friend class TPrefixIterator<TCompactTrie>; - + friend class NCompactTrie::TFirstSymbolIterator<TCompactTrie>; + friend class TSearchIterator<TCompactTrie>; + friend class TPrefixIterator<TCompactTrie>; + protected: explicit TCompactTrie(const char* emptyValue); TCompactTrie(const TBlob& data, const char* emptyValue, TPacker packer = TPacker()); @@ -231,7 +231,7 @@ TCompactTrie<T, D, S>::TCompactTrie(const TBlob& data, TPacker packer) template <class T, class D, class S> TCompactTrie<T, D, S>::TCompactTrie(const char* d, size_t len, TPacker packer) - : Packer(packer) + : Packer(packer) { Init(d, len, packer); } @@ -251,42 +251,42 @@ TCompactTrie<T, D, S>::TCompactTrie(const TBlob& data, const char* emptyValue, T } template <class T, class D, class S> -TCompactTrie<T, D, S>::TCompactTrie(const TCompactTrie& other) +TCompactTrie<T, D, S>::TCompactTrie(const TCompactTrie& other) : DataHolder(other.DataHolder) , EmptyValue(other.EmptyValue) , Packer(other.Packer) { } - -template <class T, class D, class S> + +template <class T, class D, class S> TCompactTrie<T, D, S>::TCompactTrie(TCompactTrie&& other) noexcept : DataHolder(std::move(other.DataHolder)) , EmptyValue(std::move(other.EmptyValue)) , Packer(std::move(other.Packer)) { } - -template <class T, class D, class S> + +template <class T, class D, class S> TCompactTrie<T, D, S>& TCompactTrie<T, D, S>::operator=(const TCompactTrie& other) { - if (this != &other) { - DataHolder = other.DataHolder; - EmptyValue = other.EmptyValue; - Packer = other.Packer; - } - return *this; -} - -template <class T, class D, class S> + if (this != &other) { + DataHolder = other.DataHolder; + EmptyValue = other.EmptyValue; + Packer = other.Packer; + } + return *this; +} + +template <class T, class D, class S> TCompactTrie<T, D, S>& TCompactTrie<T, D, S>::operator=(TCompactTrie&& other) noexcept { - if (this != &other) { + if (this != &other) { DataHolder = std::move(other.DataHolder); EmptyValue = std::move(other.EmptyValue); Packer = std::move(other.Packer); - } - return *this; -} - -template <class T, class D, class S> + } + return *this; +} + +template <class T, class D, class S> void TCompactTrie<T, D, S>::Init(const char* d, size_t len, TPacker packer) { Init(TBlob::NoCopy(d, len), packer); } @@ -298,7 +298,7 @@ void TCompactTrie<T, D, S>::Init(const TBlob& data, TPacker packer) { DataHolder = data; Packer = packer; - const char* datapos = DataHolder.AsCharPtr(); + const char* datapos = DataHolder.AsCharPtr(); size_t len = DataHolder.Length(); if (!len) return; @@ -337,7 +337,7 @@ bool TCompactTrie<T, D, S>::Find(const TSymbol* key, size_t keylen, TData* value template <class T, class D, class S> void TCompactTrie<T, D, S>::FindPhrases(const TSymbol* key, size_t keylen, TPhraseMatchVector& matches, TSymbol separator) const { - LookupPhrases(DataHolder.AsCharPtr(), DataHolder.Length(), key, keylen, matches, separator); + LookupPhrases(DataHolder.AsCharPtr(), DataHolder.Length(), key, keylen, matches, separator); } template <class T, class D, class S> @@ -397,7 +397,7 @@ inline bool TCompactTrie<T, D, S>::FindTails(TSymbol label, TCompactTrie<T, D, S if (!len) return false; - const char* datastart = DataHolder.AsCharPtr(); + const char* datastart = DataHolder.AsCharPtr(); const char* dataend = datastart + len; const char* datapos = datastart; const char* value = nullptr; @@ -417,8 +417,8 @@ inline bool TCompactTrie<T, D, S>::FindTails(TSymbol label, TCompactTrie<T, D, S template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::Begin() const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, false, Packer); + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, false, Packer); } template <class T, class D, class S> @@ -428,8 +428,8 @@ typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::begin() co template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::End() const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, true, Packer); + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, true, Packer); } template <class T, class D, class S> @@ -447,13 +447,13 @@ size_t TCompactTrie<T, D, S>::Size() const { template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::UpperBound(const TKeyBuf& key) const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, key, Packer); -} - + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, key, Packer); +} + template <class T, class D, class S> void TCompactTrie<T, D, S>::Print(IOutputStream& os) { - typedef typename ::TCompactTrieKeySelector<T>::TKeyBuf TSBuffer; + typedef typename ::TCompactTrieKeySelector<T>::TKeyBuf TSBuffer; for (TConstIterator it = Begin(); it != End(); ++it) { os << TSBuffer((*it).first.data(), (*it).first.size()) << "\t" << (*it).second << Endl; } @@ -478,7 +478,7 @@ template <class T, class D, class S> bool TCompactTrie<T, D, S>::LookupLongestPrefix(const TSymbol* key, size_t keylen, size_t& prefixLen, const char*& valuepos, bool& hasNext) const { using namespace NCompactTrie; - const char* datapos = DataHolder.AsCharPtr(); + const char* datapos = DataHolder.AsCharPtr(); size_t len = DataHolder.Length(); prefixLen = 0; @@ -540,18 +540,18 @@ void TCompactTrie<T, D, S>::LookupPhrases( const T* const keystart = key; const T* const keyend = key + keylen; const char* const dataend = datapos + len; - while (datapos && key != keyend) { + while (datapos && key != keyend) { T label = *(key++); - const char* value = nullptr; - if (!Advance(datapos, dataend, value, label, Packer)) { - return; - } - if (value && (key == keyend || *key == separator)) { - size_t matchlength = (size_t)(key - keystart); - D data; - Packer.UnpackLeaf(value, data); - matches.push_back(TPhraseMatch(matchlength, data)); + const char* value = nullptr; + if (!Advance(datapos, dataend, value, label, Packer)) { + return; } + if (value && (key == keyend || *key == separator)) { + size_t matchlength = (size_t)(key - keystart); + D data; + Packer.UnpackLeaf(value, data); + matches.push_back(TPhraseMatch(matchlength, data)); + } } } @@ -567,30 +567,30 @@ TCompactTrieHolder<T, D, S>::TCompactTrieHolder(IInputStream& is, size_t len) TBase::Init(Storage.Get(), len); } -//---------------------------------------------------------------------------------------------------------------- +//---------------------------------------------------------------------------------------------------------------- // TCompactTrie::TConstIterator template <class T, class D, class S> -TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer) +TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer) : Packer(packer) , Impl(new TOpaqueTrieIterator(trie, emptyValue, atend)) { } template <class T, class D, class S> -TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer) - : Packer(packer) - , Impl(new TOpaqueTrieIterator(trie, emptyValue, true)) -{ - Impl->UpperBound<TSymbol>(key); -} - +TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer) + : Packer(packer) + , Impl(new TOpaqueTrieIterator(trie, emptyValue, true)) +{ + Impl->UpperBound<TSymbol>(key); +} + template <class T, class D, class S> bool TCompactTrie<T, D, S>::TConstIterator::operator==(const TConstIterator& other) const { - if (!Impl) - return !other.Impl; - if (!other.Impl) - return false; + if (!Impl) + return !other.Impl; + if (!other.Impl) + return false; return *Impl == *other.Impl; } @@ -614,46 +614,46 @@ typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::TConstIter template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator& TCompactTrie<T, D, S>::TConstIterator::operator--() { - Impl->Backward(); - return *this; -} - + Impl->Backward(); + return *this; +} + template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::TConstIterator::operator--(int /*unused*/) { - TConstIterator copy(*this); - Impl->Backward(); - return copy; -} - + TConstIterator copy(*this); + Impl->Backward(); + return copy; +} + template <class T, class D, class S> typename TCompactTrie<T, D, S>::TValueType TCompactTrie<T, D, S>::TConstIterator::operator*() { return TValueType(GetKey(), GetValue()); } template <class T, class D, class S> -typename TCompactTrie<T, D, S>::TKey TCompactTrie<T, D, S>::TConstIterator::GetKey() const { - return Impl->GetKey<TSymbol>(); +typename TCompactTrie<T, D, S>::TKey TCompactTrie<T, D, S>::TConstIterator::GetKey() const { + return Impl->GetKey<TSymbol>(); } template <class T, class D, class S> -size_t TCompactTrie<T, D, S>::TConstIterator::GetKeySize() const { - return Impl->MeasureKey<TSymbol>(); +size_t TCompactTrie<T, D, S>::TConstIterator::GetKeySize() const { + return Impl->MeasureKey<TSymbol>(); } template <class T, class D, class S> -const char* TCompactTrie<T, D, S>::TConstIterator::GetValuePtr() const { - return Impl->GetValuePtr(); +const char* TCompactTrie<T, D, S>::TConstIterator::GetValuePtr() const { + return Impl->GetValuePtr(); } template <class T, class D, class S> -typename TCompactTrie<T, D, S>::TData TCompactTrie<T, D, S>::TConstIterator::GetValue() const { +typename TCompactTrie<T, D, S>::TData TCompactTrie<T, D, S>::TConstIterator::GetValue() const { D data; GetValue(data); return data; } template <class T, class D, class S> -void TCompactTrie<T, D, S>::TConstIterator::GetValue(typename TCompactTrie<T, D, S>::TData& data) const { +void TCompactTrie<T, D, S>::TConstIterator::GetValue(typename TCompactTrie<T, D, S>::TData& data) const { const char* ptr = GetValuePtr(); if (ptr) { Packer.UnpackLeaf(ptr, data); diff --git a/library/cpp/containers/comptrie/comptrie_ut.cpp b/library/cpp/containers/comptrie/comptrie_ut.cpp index 74bee09b5d..8ee798bc1e 100644 --- a/library/cpp/containers/comptrie/comptrie_ut.cpp +++ b/library/cpp/containers/comptrie/comptrie_ut.cpp @@ -22,14 +22,14 @@ #include "comptrie.h" #include "set.h" -#include "first_symbol_iterator.h" -#include "search_iterator.h" +#include "first_symbol_iterator.h" +#include "search_iterator.h" #include "pattern_searcher.h" #include <array> #include <iterator> - + class TCompactTrieTest: public TTestBase { private: UNIT_TEST_SUITE(TCompactTrieTest); @@ -40,7 +40,7 @@ private: UNIT_TEST(TestFastTrie8); UNIT_TEST(TestFastTrie16); UNIT_TEST(TestFastTrie32); - + UNIT_TEST(TestMinimizedTrie8); UNIT_TEST(TestMinimizedTrie16); UNIT_TEST(TestMinimizedTrie32); @@ -48,7 +48,7 @@ private: UNIT_TEST(TestFastMinimizedTrie8); UNIT_TEST(TestFastMinimizedTrie16); UNIT_TEST(TestFastMinimizedTrie32); - + UNIT_TEST(TestTrieIterator8); UNIT_TEST(TestTrieIterator16); UNIT_TEST(TestTrieIterator32); @@ -98,12 +98,12 @@ private: UNIT_TEST(TestSearchIterWchar32) UNIT_TEST(TestCopyAndAssignment); - + UNIT_TEST(TestFirstSymbolIterator8); UNIT_TEST(TestFirstSymbolIterator16); UNIT_TEST(TestFirstSymbolIterator32); UNIT_TEST(TestFirstSymbolIteratorChar32); - + UNIT_TEST(TestArrayPacker); UNIT_TEST(TestBuilderFindLongestPrefix); @@ -124,13 +124,13 @@ private: void CheckData(const char* src, size_t len); template <class T> - void CheckUpperBound(const char* src, size_t len); - - template <class T> + void CheckUpperBound(const char* src, size_t len); + + template <class T> void CheckIterator(const char* src, size_t len); template <class T> - void TestTrie(bool minimize, bool useFastLayout); + void TestTrie(bool minimize, bool useFastLayout); template <class T> void TestTrieIterator(bool minimize); @@ -140,8 +140,8 @@ private: void TestFindTailsImpl(const TString& prefix); - void TestUniqueImpl(bool isPrefixGrouped); - + void TestUniqueImpl(bool isPrefixGrouped); + TVector<TUtf16String> GetSampleKeys(size_t nKeys) const; template <class TContainer> TVector<TContainer> GetSampleVectorData(size_t nValues); @@ -155,12 +155,12 @@ private: template <typename TChar> void TestSearchIterImpl(); - template <class TTrie> - void TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers); - - template <typename TSymbol> - void TestFirstSymbolIterator(); - + template <class TTrie> + void TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers); + + template <typename TSymbol> + void TestFirstSymbolIterator(); + template <class T> class TIntPacker; template <class T> @@ -174,18 +174,18 @@ public: void TestTrie16(); void TestTrie32(); - void TestFastTrie8(); - void TestFastTrie16(); - void TestFastTrie32(); - + void TestFastTrie8(); + void TestFastTrie16(); + void TestFastTrie32(); + void TestMinimizedTrie8(); void TestMinimizedTrie16(); void TestMinimizedTrie32(); - void TestFastMinimizedTrie8(); - void TestFastMinimizedTrie16(); - void TestFastMinimizedTrie32(); - + void TestFastMinimizedTrie8(); + void TestFastMinimizedTrie16(); + void TestFastMinimizedTrie32(); + void TestTrieIterator8(); void TestTrieIterator16(); void TestTrieIterator32(); @@ -197,15 +197,15 @@ public: void TestPhraseSearch(); void TestAddGet(); void TestEmpty(); - void TestUninitializedNonEmpty(); + void TestUninitializedNonEmpty(); void TestRandom(); void TestFindTails(); - void TestPrefixGrouped(); - void CrashTestPrefixGrouped(); + void TestPrefixGrouped(); + void CrashTestPrefixGrouped(); void TestMergeFromFile(); void TestMergeFromBuffer(); - void TestUnique(); - void TestAddRetValue(); + void TestUnique(); + void TestAddRetValue(); void TestClear(); void TestIterateEmptyKey(); @@ -227,20 +227,20 @@ public: void TestTrieForListVectorInt64(); void TestTrieForPairWtrokaVectorInt64(); - void TestEmptyValueOutOfOrder(); + void TestEmptyValueOutOfOrder(); void TestFindLongestPrefixWithEmptyValue(); void TestSearchIterChar(); void TestSearchIterWchar(); void TestSearchIterWchar32(); - - void TestCopyAndAssignment(); - - void TestFirstSymbolIterator8(); - void TestFirstSymbolIterator16(); - void TestFirstSymbolIterator32(); + + void TestCopyAndAssignment(); + + void TestFirstSymbolIterator8(); + void TestFirstSymbolIterator16(); + void TestFirstSymbolIterator32(); void TestFirstSymbolIteratorChar32(); - + void TestArrayPacker(); void TestBuilderFindLongestPrefix(); @@ -278,7 +278,7 @@ const char* TCompactTrieTest::SampleData[] = { template <class T> typename TCompactTrie<T>::TKey MakeWideKey(const char* str, size_t len) { - typename TCompactTrie<T>::TKey buffer; + typename TCompactTrie<T>::TKey buffer; for (size_t i = 0; i < len; i++) { unsigned int ch = (str[i] & 0xFF); buffer.push_back((T)(ch | ch << 8 | ch << 16 | ch << 24)); @@ -288,17 +288,17 @@ typename TCompactTrie<T>::TKey MakeWideKey(const char* str, size_t len) { template <class T> typename TCompactTrie<T>::TKey MakeWideKey(const TString& str) { - return MakeWideKey<T>(str.c_str(), str.length()); -} - + return MakeWideKey<T>(str.c_str(), str.length()); +} + template <class T> -typename TCompactTrie<T>::TKey MakeWideKey(const TStringBuf& buf) { - return MakeWideKey<T>(buf.data(), buf.length()); -} - +typename TCompactTrie<T>::TKey MakeWideKey(const TStringBuf& buf) { + return MakeWideKey<T>(buf.data(), buf.length()); +} + template <class T> void TCompactTrieTest::CreateTrie(IOutputStream& out, bool minimize, bool useFastLayout) { - TCompactTrieBuilder<T> builder; + TCompactTrieBuilder<T> builder; for (auto& i : SampleData) { size_t len = strlen(i); @@ -306,57 +306,57 @@ void TCompactTrieTest::CreateTrie(IOutputStream& out, bool minimize, bool useFas builder.Add(MakeWideKey<T>(i, len), len * 2); } - TBufferOutput tmp2; + TBufferOutput tmp2; IOutputStream& currentOutput = useFastLayout ? tmp2 : out; if (minimize) { TBufferOutput buftmp; builder.Save(buftmp); CompactTrieMinimize<TCompactTriePacker<ui64>>(currentOutput, buftmp.Buffer().Data(), buftmp.Buffer().Size(), false); } else { - builder.Save(currentOutput); + builder.Save(currentOutput); } - if (useFastLayout) { + if (useFastLayout) { CompactTrieMakeFastLayout<TCompactTriePacker<T>>(out, tmp2.Buffer().Data(), tmp2.Buffer().Size(), false); - } + } } -// Iterates over all strings of length <= 4 made of letters a-g. +// Iterates over all strings of length <= 4 made of letters a-g. static bool LexicographicStep(TString& s) { - if (s.length() < 4) { - s += "a"; - return true; - } - while (!s.empty() && s.back() == 'g') - s.pop_back(); - if (s.empty()) - return false; - char last = s.back(); - last++; - s.pop_back(); - s.push_back(last); - return true; -} - + if (s.length() < 4) { + s += "a"; + return true; + } + while (!s.empty() && s.back() == 'g') + s.pop_back(); + if (s.empty()) + return false; + char last = s.back(); + last++; + s.pop_back(); + s.push_back(last); + return true; +} + template <class T> -void TCompactTrieTest::CheckUpperBound(const char* data, size_t datalen) { - TCompactTrie<T> trie(data, datalen); +void TCompactTrieTest::CheckUpperBound(const char* data, size_t datalen) { + TCompactTrie<T> trie(data, datalen); typedef typename TCompactTrie<T>::TKey TKey; typedef typename TCompactTrie<T>::TData TData; - + TString key; - do { - const TKey wideKey = MakeWideKey<T>(key); + do { + const TKey wideKey = MakeWideKey<T>(key); typename TCompactTrie<T>::TConstIterator it = trie.UpperBound(wideKey); - UNIT_ASSERT_C(it == trie.End() || it.GetKey() >= wideKey, "key=" + key); - TData data; - const bool found = trie.Find(wideKey, &data); - if (found) - UNIT_ASSERT_C(it.GetKey() == wideKey && it.GetValue() == data, "key=" + key); - if (it != trie.Begin()) - UNIT_ASSERT_C((--it).GetKey() < wideKey, "key=" + key); - } while (LexicographicStep(key)); -} - + UNIT_ASSERT_C(it == trie.End() || it.GetKey() >= wideKey, "key=" + key); + TData data; + const bool found = trie.Find(wideKey, &data); + if (found) + UNIT_ASSERT_C(it.GetKey() == wideKey && it.GetValue() == data, "key=" + key); + if (it != trie.Begin()) + UNIT_ASSERT_C((--it).GetKey() < wideKey, "key=" + key); + } while (LexicographicStep(key)); +} + template <class T> void TCompactTrieTest::CheckData(const char* data, size_t datalen) { TCompactTrie<T> trie(data, datalen); @@ -377,7 +377,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { TString badkey("bb"); badkey += i; - key = MakeWideKey<T>(badkey); + key = MakeWideKey<T>(badkey); UNIT_ASSERT(!trie.Find(key)); value = 123; UNIT_ASSERT(!trie.Find(key, &value)); @@ -388,7 +388,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { badkey = i; badkey += "x"; - key = MakeWideKey<T>(badkey); + key = MakeWideKey<T>(badkey); UNIT_ASSERT(!trie.Find(key)); value = 1234; UNIT_ASSERT(!trie.Find(key, &value)); @@ -401,7 +401,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { } TString testkey("fbbaa"); - typename TCompactTrie<T>::TKey key = MakeWideKey<T>(testkey); + typename TCompactTrie<T>::TKey key = MakeWideKey<T>(testkey); ui64 value = 0; size_t prefixLen = 0; UNIT_ASSERT(trie.FindLongestPrefix(key.data(), testkey.length() - 1, &prefixLen, &value)); @@ -409,7 +409,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { UNIT_ASSERT_EQUAL(6, value); testkey = "fbbax"; - key = MakeWideKey<T>(testkey); + key = MakeWideKey<T>(testkey); UNIT_ASSERT(trie.FindLongestPrefix(key, &prefixLen, &value)); UNIT_ASSERT_EQUAL(prefixLen, 3); UNIT_ASSERT_EQUAL(6, value); @@ -422,7 +422,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { template <class T> void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { typedef typename TCompactTrie<T>::TKey TKey; - typedef typename TCompactTrie<T>::TValueType TValue; + typedef typename TCompactTrie<T>::TValueType TValue; TMap<TKey, ui64> stored; for (auto& i : SampleData) { @@ -439,7 +439,7 @@ void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { while (it != trie.End()) { UNIT_ASSERT_VALUES_EQUAL(it.GetKeySize(), it.GetKey().size()); received.insert(*it); - items.push_back(*it); + items.push_back(*it); entry_count++; it++; } @@ -450,53 +450,53 @@ void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { UNIT_ASSERT(entry_count == stored.size()); UNIT_ASSERT(received == stored); UNIT_ASSERT(received2 == stored); - - std::reverse(items.begin(), items.end()); - typename TCompactTrie<T>::TConstIterator revIt = trie.End(); - typename TCompactTrie<T>::TConstIterator const begin = trie.Begin(); + + std::reverse(items.begin(), items.end()); + typename TCompactTrie<T>::TConstIterator revIt = trie.End(); + typename TCompactTrie<T>::TConstIterator const begin = trie.Begin(); typename TCompactTrie<T>::TConstIterator emptyIt; - size_t pos = 0; - while (revIt != begin) { - revIt--; - UNIT_ASSERT(*revIt == items[pos]); - pos++; - } - // Checking the assignment operator. - revIt = begin; - UNIT_ASSERT(revIt == trie.Begin()); - UNIT_ASSERT(!revIt.IsEmpty()); + size_t pos = 0; + while (revIt != begin) { + revIt--; + UNIT_ASSERT(*revIt == items[pos]); + pos++; + } + // Checking the assignment operator. + revIt = begin; + UNIT_ASSERT(revIt == trie.Begin()); + UNIT_ASSERT(!revIt.IsEmpty()); UNIT_ASSERT(revIt != emptyIt); - UNIT_ASSERT(revIt != trie.End()); - ++revIt; // Call a method that uses Skipper. + UNIT_ASSERT(revIt != trie.End()); + ++revIt; // Call a method that uses Skipper. revIt = emptyIt; UNIT_ASSERT(revIt == emptyIt); - UNIT_ASSERT(revIt.IsEmpty()); - UNIT_ASSERT(revIt != trie.End()); - // Checking the move assignment operator. - revIt = trie.Begin(); - UNIT_ASSERT(revIt == trie.Begin()); - UNIT_ASSERT(!revIt.IsEmpty()); - UNIT_ASSERT(revIt != emptyIt); - UNIT_ASSERT(revIt != trie.End()); - ++revIt; // Call a method that uses Skipper. - revIt = typename TCompactTrie<T>::TConstIterator(); - UNIT_ASSERT(revIt == emptyIt); - UNIT_ASSERT(revIt.IsEmpty()); - UNIT_ASSERT(revIt != trie.End()); + UNIT_ASSERT(revIt.IsEmpty()); + UNIT_ASSERT(revIt != trie.End()); + // Checking the move assignment operator. + revIt = trie.Begin(); + UNIT_ASSERT(revIt == trie.Begin()); + UNIT_ASSERT(!revIt.IsEmpty()); + UNIT_ASSERT(revIt != emptyIt); + UNIT_ASSERT(revIt != trie.End()); + ++revIt; // Call a method that uses Skipper. + revIt = typename TCompactTrie<T>::TConstIterator(); + UNIT_ASSERT(revIt == emptyIt); + UNIT_ASSERT(revIt.IsEmpty()); + UNIT_ASSERT(revIt != trie.End()); } template <class T> -void TCompactTrieTest::TestTrie(bool minimize, bool useFastLayout) { +void TCompactTrieTest::TestTrie(bool minimize, bool useFastLayout) { TBufferOutput bufout; - CreateTrie<T>(bufout, minimize, useFastLayout); + CreateTrie<T>(bufout, minimize, useFastLayout); CheckData<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); - CheckUpperBound<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); + CheckUpperBound<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); } template <class T> void TCompactTrieTest::TestTrieIterator(bool minimize) { TBufferOutput bufout; - CreateTrie<T>(bufout, minimize, false); + CreateTrie<T>(bufout, minimize, false); CheckIterator<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); } @@ -539,7 +539,7 @@ void TCompactTrieTest::TestFastMinimizedTrie16() { void TCompactTrieTest::TestFastMinimizedTrie32() { TestTrie<wchar32>(true, true); } - + void TCompactTrieTest::TestTrieIterator8() { TestTrieIterator<char>(false); } @@ -549,7 +549,7 @@ void TCompactTrieTest::TestTrieIterator16() { void TCompactTrieTest::TestTrieIterator32() { TestTrieIterator<wchar32>(false); } - + void TCompactTrieTest::TestMinimizedTrieIterator8() { TestTrieIterator<char>(true); } @@ -566,7 +566,7 @@ void TCompactTrieTest::TestPhraseSearch() { static const char* const badphrase = "cd ef gh ab"; TBufferOutput bufout; - TCompactTrieBuilder<char> builder; + TCompactTrieBuilder<char> builder; for (size_t i = 0; i < Y_ARRAY_SIZE(phrases); i++) { builder.Add(phrases[i], strlen(phrases[i]), i); } @@ -626,26 +626,26 @@ void TCompactTrieTest::TestEmpty() { UNIT_ASSERT(trieNull.Begin() == trieNull.End()); } -void TCompactTrieTest::TestUninitializedNonEmpty() { - TBufferOutput bufout; - CreateTrie<char>(bufout, false, false); - TCompactTrie<char> trie(bufout.Buffer().Data(), bufout.Buffer().Size()); - typedef TCompactTrie<char>::TKey TKey; - typedef TCompactTrie<char>::TConstIterator TIter; - - TCompactTrie<char> tails = trie.FindTails("abd", 3); // A trie that has empty value and no data. - UNIT_ASSERT(!tails.IsEmpty()); - UNIT_ASSERT(!tails.IsInitialized()); - const TKey wideKey = MakeWideKey<char>("c", 1); - TIter it = tails.UpperBound(wideKey); - UNIT_ASSERT(it == tails.End()); - UNIT_ASSERT(it != tails.Begin()); - --it; - UNIT_ASSERT(it == tails.Begin()); - ++it; - UNIT_ASSERT(it == tails.End()); -} - +void TCompactTrieTest::TestUninitializedNonEmpty() { + TBufferOutput bufout; + CreateTrie<char>(bufout, false, false); + TCompactTrie<char> trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + typedef TCompactTrie<char>::TKey TKey; + typedef TCompactTrie<char>::TConstIterator TIter; + + TCompactTrie<char> tails = trie.FindTails("abd", 3); // A trie that has empty value and no data. + UNIT_ASSERT(!tails.IsEmpty()); + UNIT_ASSERT(!tails.IsInitialized()); + const TKey wideKey = MakeWideKey<char>("c", 1); + TIter it = tails.UpperBound(wideKey); + UNIT_ASSERT(it == tails.End()); + UNIT_ASSERT(it != tails.Begin()); + --it; + UNIT_ASSERT(it == tails.Begin()); + ++it; + UNIT_ASSERT(it == tails.End()); +} + static char RandChar() { return char(RandomNumber<size_t>() % 256); } @@ -688,7 +688,7 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { } TCompactTrie<char, typename T::TData, T> trieMin(buftmp.Buffer().Data(), buftmp.Buffer().Size()); - TCompactTrieBuilder<char, typename T::TData, T> prefixGroupedBuilder(CTBF_PREFIX_GROUPED); + TCompactTrieBuilder<char, typename T::TData, T> prefixGroupedBuilder(CTBF_PREFIX_GROUPED); for (typename TKeys::const_iterator i = keys.begin(), mi = keys.end(); i != mi; ++i) { UNIT_ASSERT(!prefixGroupedBuilder.Find(i->first.c_str(), i->first.size(), &dummy)); @@ -699,7 +699,7 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { UNIT_ASSERT(dummy == i->second); } - prefixGroupedBuilder.Add(i->first.c_str(), i->first.size(), dummy); + prefixGroupedBuilder.Add(i->first.c_str(), i->first.size(), dummy); UNIT_ASSERT(prefixGroupedBuilder.Find(i->first.c_str(), i->first.size(), &dummy)); for (typename TKeys::const_iterator j = keys.begin(), end = keys.end(); j != end; ++j) { @@ -713,11 +713,11 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { } } - TBufferStream prefixGroupedBuffer; - prefixGroupedBuilder.Save(prefixGroupedBuffer); + TBufferStream prefixGroupedBuffer; + prefixGroupedBuilder.Save(prefixGroupedBuffer); - UNIT_ASSERT_VALUES_EQUAL(stream.Buffer().Size(), prefixGroupedBuffer.Buffer().Size()); - UNIT_ASSERT(0 == memcmp(stream.Buffer().Data(), prefixGroupedBuffer.Buffer().Data(), stream.Buffer().Size())); + UNIT_ASSERT_VALUES_EQUAL(stream.Buffer().Size(), prefixGroupedBuffer.Buffer().Size()); + UNIT_ASSERT(0 == memcmp(stream.Buffer().Data(), prefixGroupedBuffer.Buffer().Data(), stream.Buffer().Size())); } void TCompactTrieTest::TestRandom() { @@ -730,7 +730,7 @@ void TCompactTrieTest::TestRandom() { } void TCompactTrieTest::TestFindTailsImpl(const TString& prefix) { - TCompactTrieBuilder<> builder; + TCompactTrieBuilder<> builder; TMap<TString, ui64> input; @@ -775,9 +775,9 @@ void TCompactTrieTest::TestFindTailsImpl(const TString& prefix) { UNIT_ASSERT(input == output); } -void TCompactTrieTest::TestPrefixGrouped() { +void TCompactTrieTest::TestPrefixGrouped() { TBuffer b1b; - TCompactTrieBuilder<char, ui32> b1(CTBF_PREFIX_GROUPED); + TCompactTrieBuilder<char, ui32> b1(CTBF_PREFIX_GROUPED); const char* data[] = { "Kazan", "Moscow", @@ -820,27 +820,27 @@ void TCompactTrieTest::TestPrefixGrouped() { } } -void TCompactTrieTest::CrashTestPrefixGrouped() { - TCompactTrieBuilder<char, ui32> builder(CTBF_PREFIX_GROUPED); - const char* data[] = { - "Fryazino", - "Fryanovo", - "Monino", - "", - "Fryazevo", - }; - bool wasException = false; - try { +void TCompactTrieTest::CrashTestPrefixGrouped() { + TCompactTrieBuilder<char, ui32> builder(CTBF_PREFIX_GROUPED); + const char* data[] = { + "Fryazino", + "Fryanovo", + "Monino", + "", + "Fryazevo", + }; + bool wasException = false; + try { for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - builder.Add(data[i], strlen(data[i]), i + 1); - } - } catch (const yexception& e) { - wasException = true; - UNIT_ASSERT(strstr(e.what(), "Bad input order - expected input strings to be prefix-grouped.")); - } - UNIT_ASSERT_C(wasException, "CrashTestPrefixGrouped"); -} - + builder.Add(data[i], strlen(data[i]), i + 1); + } + } catch (const yexception& e) { + wasException = true; + UNIT_ASSERT(strstr(e.what(), "Bad input order - expected input strings to be prefix-grouped.")); + } + UNIT_ASSERT_C(wasException, "CrashTestPrefixGrouped"); +} + void TCompactTrieTest::TestMergeFromFile() { { TCompactTrieBuilder<> b; @@ -863,7 +863,7 @@ void TCompactTrieTest::TestMergeFromFile() { { TCompactTrieBuilder<> b; UNIT_ASSERT(b.AddSubtreeInFile("com.", GetSystemTempDir() + "/TCompactTrieTest-TestMerge-com")); - UNIT_ASSERT(b.Add("org.kernel", 22)); + UNIT_ASSERT(b.Add("org.kernel", 22)); UNIT_ASSERT(b.AddSubtreeInFile("ru.", GetSystemTempDir() + "/TCompactTrieTest-TestMerge-ru")); TUnbufferedFileOutput out(GetSystemTempDir() + "/TCompactTrieTest-TestMerge-res"); b.Save(out); @@ -929,14 +929,14 @@ void TCompactTrieTest::TestMergeFromBuffer() { unlink((GetSystemTempDir() + "/TCompactTrieTest-TestMergeFromBuffer-res").data()); } -void TCompactTrieTest::TestUnique() { - TestUniqueImpl(false); - TestUniqueImpl(true); -} - -void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { - TCompactTrieBuilder<char, ui32> builder(CTBF_UNIQUE | (isPrefixGrouped ? CTBF_PREFIX_GROUPED : CTBF_NONE)); - const char* data[] = { +void TCompactTrieTest::TestUnique() { + TestUniqueImpl(false); + TestUniqueImpl(true); +} + +void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { + TCompactTrieBuilder<char, ui32> builder(CTBF_UNIQUE | (isPrefixGrouped ? CTBF_PREFIX_GROUPED : CTBF_NONE)); + const char* data[] = { "Kazan", "Moscow", "Monino", @@ -946,23 +946,23 @@ void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { "Fryazevo", "Fry", "Tumen", - }; + }; for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - UNIT_ASSERT_C(builder.Add(data[i], strlen(data[i]), i + 1), i); - } - bool wasException = false; - try { - builder.Add(data[4], strlen(data[4]), 20); - } catch (const yexception& e) { - wasException = true; - UNIT_ASSERT(strstr(e.what(), "Duplicate key")); - } - UNIT_ASSERT_C(wasException, "TestUnique"); -} - -void TCompactTrieTest::TestAddRetValue() { - TCompactTrieBuilder<char, ui32> builder; - const char* data[] = { + UNIT_ASSERT_C(builder.Add(data[i], strlen(data[i]), i + 1), i); + } + bool wasException = false; + try { + builder.Add(data[4], strlen(data[4]), 20); + } catch (const yexception& e) { + wasException = true; + UNIT_ASSERT(strstr(e.what(), "Duplicate key")); + } + UNIT_ASSERT_C(wasException, "TestUnique"); +} + +void TCompactTrieTest::TestAddRetValue() { + TCompactTrieBuilder<char, ui32> builder; + const char* data[] = { "Kazan", "Moscow", "Monino", @@ -972,16 +972,16 @@ void TCompactTrieTest::TestAddRetValue() { "Fryazevo", "Fry", "Tumen", - }; + }; for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - UNIT_ASSERT(builder.Add(data[i], strlen(data[i]), i + 1)); - UNIT_ASSERT(!builder.Add(data[i], strlen(data[i]), i + 2)); - ui32 value; - UNIT_ASSERT(builder.Find(data[i], strlen(data[i]), &value)); - UNIT_ASSERT(value == i + 2); - } -} - + UNIT_ASSERT(builder.Add(data[i], strlen(data[i]), i + 1)); + UNIT_ASSERT(!builder.Add(data[i], strlen(data[i]), i + 2)); + ui32 value; + UNIT_ASSERT(builder.Find(data[i], strlen(data[i]), &value)); + UNIT_ASSERT(value == i + 2); + } +} + void TCompactTrieTest::TestClear() { TCompactTrieBuilder<char, ui32> builder; const char* data[] = { @@ -1253,22 +1253,22 @@ void TCompactTrieTest::TestTrieForPairWtrokaVectorInt64() { TestTrieWithContainers<std::pair<TUtf16String, TVector<i64>>>(keys, data, "pair-str-v-i64"); } -void TCompactTrieTest::TestEmptyValueOutOfOrder() { - TBufferOutput buffer; - using TSymbol = ui32; - { - TCompactTrieBuilder<TSymbol, ui32> builder; - TSymbol key = 1; - builder.Add(&key, 1, 10); - builder.Add(nullptr, 0, 14); - builder.Save(buffer); - } - { - TCompactTrie<TSymbol, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); - UNIT_ASSERT(trie.Find(nullptr, 0)); - } -} - +void TCompactTrieTest::TestEmptyValueOutOfOrder() { + TBufferOutput buffer; + using TSymbol = ui32; + { + TCompactTrieBuilder<TSymbol, ui32> builder; + TSymbol key = 1; + builder.Add(&key, 1, 10); + builder.Add(nullptr, 0, 14); + builder.Save(buffer); + } + { + TCompactTrie<TSymbol, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); + UNIT_ASSERT(trie.Find(nullptr, 0)); + } +} + void TCompactTrieTest::TestFindLongestPrefixWithEmptyValue() { TBufferOutput buffer; { @@ -1340,25 +1340,25 @@ void TCompactTrieTest::TestSearchIterImpl() { TCompactTrie<TChar, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); ui32 value = 0; - auto iter(MakeSearchIterator(trie)); + auto iter(MakeSearchIterator(trie)); MoveIter(iter, TConvertKey<TChar>::Convert(TStringBuf("abc"))); UNIT_ASSERT(!iter.GetValue(&value)); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); MoveIter(iter, TConvertKey<TChar>::Convert(TStringBuf("abbbc"))); UNIT_ASSERT(iter.GetValue(&value)); UNIT_ASSERT_EQUAL(value, 3); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); UNIT_ASSERT(iter.Advance(TConvertKey<TChar>::Convert(TStringBuf("bdfa")))); UNIT_ASSERT(!iter.GetValue(&value)); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); UNIT_ASSERT(iter.Advance(TConvertKey<TChar>::Convert(TStringBuf("bdfaa")))); UNIT_ASSERT(iter.GetValue(&value)); UNIT_ASSERT_EQUAL(value, 4); - UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TChar('z'))); + UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TChar('z'))); UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TConvertKey<TChar>::Convert(TStringBuf("cdf")))); UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TConvertKey<TChar>::Convert(TStringBuf("abca")))); } @@ -1370,69 +1370,69 @@ void TCompactTrieTest::TestSearchIterChar() { void TCompactTrieTest::TestSearchIterWchar() { TestSearchIterImpl<wchar16>(); } - + void TCompactTrieTest::TestSearchIterWchar32() { TestSearchIterImpl<wchar32>(); } -void TCompactTrieTest::TestCopyAndAssignment() { - TBufferOutput bufout; - typedef TCompactTrie<> TTrie; - CreateTrie<char>(bufout, false, false); - TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); - TTrie copy(trie); - UNIT_ASSERT(copy.HasCorrectSkipper()); - TTrie assign; - assign = trie; - UNIT_ASSERT(assign.HasCorrectSkipper()); +void TCompactTrieTest::TestCopyAndAssignment() { + TBufferOutput bufout; + typedef TCompactTrie<> TTrie; + CreateTrie<char>(bufout, false, false); + TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + TTrie copy(trie); + UNIT_ASSERT(copy.HasCorrectSkipper()); + TTrie assign; + assign = trie; + UNIT_ASSERT(assign.HasCorrectSkipper()); TTrie move(std::move(trie)); - UNIT_ASSERT(move.HasCorrectSkipper()); - TTrie moveAssign; - moveAssign = TTrie(bufout.Buffer().Data(), bufout.Buffer().Size()); - UNIT_ASSERT(moveAssign.HasCorrectSkipper()); -} - -template <class TTrie> -void TCompactTrieTest::TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers) { - NCompactTrie::TFirstSymbolIterator<TTrie> it; - it.SetTrie(trie, trie.GetSkipper()); - typename TTrie::TKey answers = MakeWideKey<typename TTrie::TSymbol>(narrowAnswers); - auto answer = answers.begin(); - for (; !it.AtEnd(); it.MakeStep(), ++answer) { - UNIT_ASSERT(answer != answers.end()); - UNIT_ASSERT(it.GetKey() == *answer); - } - UNIT_ASSERT(answer == answers.end()); -} - -template <class TSymbol> -void TCompactTrieTest::TestFirstSymbolIterator() { - TBufferOutput bufout; - typedef TCompactTrie<TSymbol> TTrie; - CreateTrie<TSymbol>(bufout, false, false); - TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + UNIT_ASSERT(move.HasCorrectSkipper()); + TTrie moveAssign; + moveAssign = TTrie(bufout.Buffer().Data(), bufout.Buffer().Size()); + UNIT_ASSERT(moveAssign.HasCorrectSkipper()); +} + +template <class TTrie> +void TCompactTrieTest::TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers) { + NCompactTrie::TFirstSymbolIterator<TTrie> it; + it.SetTrie(trie, trie.GetSkipper()); + typename TTrie::TKey answers = MakeWideKey<typename TTrie::TSymbol>(narrowAnswers); + auto answer = answers.begin(); + for (; !it.AtEnd(); it.MakeStep(), ++answer) { + UNIT_ASSERT(answer != answers.end()); + UNIT_ASSERT(it.GetKey() == *answer); + } + UNIT_ASSERT(answer == answers.end()); +} + +template <class TSymbol> +void TCompactTrieTest::TestFirstSymbolIterator() { + TBufferOutput bufout; + typedef TCompactTrie<TSymbol> TTrie; + CreateTrie<TSymbol>(bufout, false, false); + TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); TStringBuf rootAnswers = "abcdf"; - TestFirstSymbolIteratorForTrie(trie, rootAnswers); + TestFirstSymbolIteratorForTrie(trie, rootAnswers); TStringBuf aAnswers = "abcd"; - TestFirstSymbolIteratorForTrie(trie.FindTails(MakeWideKey<TSymbol>("a", 1)), aAnswers); -} - -void TCompactTrieTest::TestFirstSymbolIterator8() { - TestFirstSymbolIterator<char>(); -} - -void TCompactTrieTest::TestFirstSymbolIterator16() { - TestFirstSymbolIterator<wchar16>(); -} - -void TCompactTrieTest::TestFirstSymbolIterator32() { - TestFirstSymbolIterator<ui32>(); -} - + TestFirstSymbolIteratorForTrie(trie.FindTails(MakeWideKey<TSymbol>("a", 1)), aAnswers); +} + +void TCompactTrieTest::TestFirstSymbolIterator8() { + TestFirstSymbolIterator<char>(); +} + +void TCompactTrieTest::TestFirstSymbolIterator16() { + TestFirstSymbolIterator<wchar16>(); +} + +void TCompactTrieTest::TestFirstSymbolIterator32() { + TestFirstSymbolIterator<ui32>(); +} + void TCompactTrieTest::TestFirstSymbolIteratorChar32() { TestFirstSymbolIterator<wchar32>(); } - + void TCompactTrieTest::TestArrayPacker() { using TDataInt = std::array<int, 2>; diff --git a/library/cpp/containers/comptrie/first_symbol_iterator.h b/library/cpp/containers/comptrie/first_symbol_iterator.h index d06135f06f..6dbd2f9296 100644 --- a/library/cpp/containers/comptrie/first_symbol_iterator.h +++ b/library/cpp/containers/comptrie/first_symbol_iterator.h @@ -1,61 +1,61 @@ -#pragma once - -#include "opaque_trie_iterator.h" -#include <util/generic/ptr.h> - -namespace NCompactTrie { - // Iterates over possible first symbols in a trie. - // Allows one to get the symbol and the subtrie starting from it. +#pragma once + +#include "opaque_trie_iterator.h" +#include <util/generic/ptr.h> + +namespace NCompactTrie { + // Iterates over possible first symbols in a trie. + // Allows one to get the symbol and the subtrie starting from it. template <class TTrie> - class TFirstSymbolIterator { - public: - using TSymbol = typename TTrie::TSymbol; - using TData = typename TTrie::TData; - - void SetTrie(const TTrie& trie, const ILeafSkipper& skipper) { - Trie = trie; - Impl.Reset(new TOpaqueTrieIterator( - TOpaqueTrie(Trie.Data().AsCharPtr(), Trie.Data().Size(), skipper), - nullptr, - false, - sizeof(TSymbol))); - if (Impl->MeasureKey<TSymbol>() == 0) { - MakeStep(); - } - } - - const TTrie& GetTrie() const { - return Trie; - } - - bool AtEnd() const { - return *Impl == TOpaqueTrieIterator(Impl->GetTrie(), nullptr, true, sizeof(TSymbol)); - } - - TSymbol GetKey() const { - return Impl->GetKey<TSymbol>()[0]; - } - - TTrie GetTails() const { - const TNode& node = Impl->GetNode(); - const size_t forwardOffset = node.GetForwardOffset(); - const char* emptyValue = node.IsFinal() ? Trie.Data().AsCharPtr() + node.GetLeafOffset() : nullptr; - if (forwardOffset) { - const char* start = Trie.Data().AsCharPtr() + forwardOffset; - TBlob body = TBlob::NoCopy(start, Trie.Data().Size() - forwardOffset); - return TTrie(body, emptyValue, Trie.GetPacker()); - } else { - return TTrie(emptyValue); - } - } - - void MakeStep() { - Impl->Forward(); - } - - private: - TTrie Trie; - TCopyPtr<TOpaqueTrieIterator> Impl; - }; - + class TFirstSymbolIterator { + public: + using TSymbol = typename TTrie::TSymbol; + using TData = typename TTrie::TData; + + void SetTrie(const TTrie& trie, const ILeafSkipper& skipper) { + Trie = trie; + Impl.Reset(new TOpaqueTrieIterator( + TOpaqueTrie(Trie.Data().AsCharPtr(), Trie.Data().Size(), skipper), + nullptr, + false, + sizeof(TSymbol))); + if (Impl->MeasureKey<TSymbol>() == 0) { + MakeStep(); + } + } + + const TTrie& GetTrie() const { + return Trie; + } + + bool AtEnd() const { + return *Impl == TOpaqueTrieIterator(Impl->GetTrie(), nullptr, true, sizeof(TSymbol)); + } + + TSymbol GetKey() const { + return Impl->GetKey<TSymbol>()[0]; + } + + TTrie GetTails() const { + const TNode& node = Impl->GetNode(); + const size_t forwardOffset = node.GetForwardOffset(); + const char* emptyValue = node.IsFinal() ? Trie.Data().AsCharPtr() + node.GetLeafOffset() : nullptr; + if (forwardOffset) { + const char* start = Trie.Data().AsCharPtr() + forwardOffset; + TBlob body = TBlob::NoCopy(start, Trie.Data().Size() - forwardOffset); + return TTrie(body, emptyValue, Trie.GetPacker()); + } else { + return TTrie(emptyValue); + } + } + + void MakeStep() { + Impl->Forward(); + } + + private: + TTrie Trie; + TCopyPtr<TOpaqueTrieIterator> Impl; + }; + } diff --git a/library/cpp/containers/comptrie/key_selector.h b/library/cpp/containers/comptrie/key_selector.h index 60466cef71..01387947b4 100644 --- a/library/cpp/containers/comptrie/key_selector.h +++ b/library/cpp/containers/comptrie/key_selector.h @@ -1,28 +1,28 @@ -#pragma once - -#include <util/generic/vector.h> +#pragma once + +#include <util/generic/vector.h> #include <util/generic/string.h> -#include <util/generic/strbuf.h> - +#include <util/generic/strbuf.h> + template <class T> -struct TCompactTrieKeySelector { +struct TCompactTrieKeySelector { typedef TVector<T> TKey; typedef TVector<T> TKeyBuf; -}; - +}; + template <class TChar> -struct TCompactTrieCharKeySelector { +struct TCompactTrieCharKeySelector { typedef TBasicString<TChar> TKey; typedef TBasicStringBuf<TChar> TKeyBuf; -}; - +}; + template <> struct TCompactTrieKeySelector<char>: public TCompactTrieCharKeySelector<char> { -}; - +}; + template <> struct TCompactTrieKeySelector<wchar16>: public TCompactTrieCharKeySelector<wchar16> { -}; +}; template <> struct TCompactTrieKeySelector<wchar32>: public TCompactTrieCharKeySelector<wchar32> { diff --git a/library/cpp/containers/comptrie/leaf_skipper.h b/library/cpp/containers/comptrie/leaf_skipper.h index 3959258948..9218735a2b 100644 --- a/library/cpp/containers/comptrie/leaf_skipper.h +++ b/library/cpp/containers/comptrie/leaf_skipper.h @@ -1,6 +1,6 @@ #pragma once -#include <cstddef> +#include <cstddef> namespace NCompactTrie { class ILeafSkipper { @@ -23,34 +23,34 @@ namespace NCompactTrie { size_t SkipLeaf(const char* p) const override { return Packer->SkipLeaf(p); } - - // For test purposes. + + // For test purposes. const TPacker* GetPacker() const { return Packer; } }; - - // The data you need to traverse the trie without unpacking the values. - struct TOpaqueTrie { - const char* Data; - size_t Length; - const ILeafSkipper& SkipFunction; - - TOpaqueTrie(const char* data, size_t dataLength, const ILeafSkipper& skipFunction) - : Data(data) - , Length(dataLength) - , SkipFunction(skipFunction) + + // The data you need to traverse the trie without unpacking the values. + struct TOpaqueTrie { + const char* Data; + size_t Length; + const ILeafSkipper& SkipFunction; + + TOpaqueTrie(const char* data, size_t dataLength, const ILeafSkipper& skipFunction) + : Data(data) + , Length(dataLength) + , SkipFunction(skipFunction) { } - - bool operator==(const TOpaqueTrie& other) const { - return Data == other.Data && + + bool operator==(const TOpaqueTrie& other) const { + return Data == other.Data && Length == other.Length && &SkipFunction == &other.SkipFunction; - } - - bool operator!=(const TOpaqueTrie& other) const { - return !(*this == other); - } - }; + } + + bool operator!=(const TOpaqueTrie& other) const { + return !(*this == other); + } + }; } diff --git a/library/cpp/containers/comptrie/make_fast_layout.cpp b/library/cpp/containers/comptrie/make_fast_layout.cpp index ade78d7899..3dea56b7b9 100644 --- a/library/cpp/containers/comptrie/make_fast_layout.cpp +++ b/library/cpp/containers/comptrie/make_fast_layout.cpp @@ -1,70 +1,70 @@ -#include "make_fast_layout.h" -#include "node.h" -#include "writeable_node.h" -#include "write_trie_backwards.h" -#include "comptrie_impl.h" +#include "make_fast_layout.h" +#include "node.h" +#include "writeable_node.h" +#include "write_trie_backwards.h" +#include "comptrie_impl.h" #include <util/generic/hash.h> -#include <util/generic/utility.h> - -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: -// Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341–358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ -// Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium -// on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12–14, 2000, pages 399–409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -namespace NCompactTrie { +#include <util/generic/utility.h> + +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: +// Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341–358. +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium +// on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12–14, 2000, pages 399–409. +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +namespace NCompactTrie { static size_t FindSupportingPowerOf2(size_t n) { size_t result = 1ull << (8 * sizeof(size_t) - 1); while (result > n) { result >>= 1; } return result; - } - + } + namespace { class TTrieNodeSet { public: TTrieNodeSet() = default; - + explicit TTrieNodeSet(const TOpaqueTrie& trie) : Body(trie.Length / (8 * MinNodeSize) + 1, 0) { } - + bool Has(size_t offset) const { const size_t reducedOffset = ReducedOffset(offset); return OffsetCell(reducedOffset) & OffsetMask(reducedOffset); } - + void Add(size_t offset) { const size_t reducedOffset = ReducedOffset(offset); OffsetCell(reducedOffset) |= OffsetMask(reducedOffset); } - + void Remove(size_t offset) { const size_t reducedOffset = ReducedOffset(offset); OffsetCell(reducedOffset) &= ~OffsetMask(reducedOffset); } - + void Swap(TTrieNodeSet& other) { Body.swap(other.Body); } - + private: static const size_t MinNodeSize = 2; TVector<ui8> Body; - + static size_t ReducedOffset(size_t offset) { return offset / MinNodeSize; } @@ -78,23 +78,23 @@ namespace NCompactTrie { return Body.at(reducedOffset / 8); } }; - + //--------------------------------------------------------------------- - + class TTrieNodeCounts { public: TTrieNodeCounts() = default; - + explicit TTrieNodeCounts(const TOpaqueTrie& trie) : Body(trie.Length / MinNodeSize, 0) , IsTree(false) { } - + size_t Get(size_t offset) const { return IsTree ? 1 : Body.at(offset / MinNodeSize); } - + void Inc(size_t offset) { if (IsTree) { return; @@ -104,7 +104,7 @@ namespace NCompactTrie { ++count; } } - + size_t Dec(size_t offset) { if (IsTree) { return 0; @@ -116,33 +116,33 @@ namespace NCompactTrie { } return count; } - + void Swap(TTrieNodeCounts& other) { Body.swap(other.Body); ::DoSwap(IsTree, other.IsTree); } - + void SetTreeMode() { IsTree = true; Body = TVector<ui8>(); } - + private: static const size_t MinNodeSize = 2; static const ui8 MaxCount = 255; - + TVector<ui8> Body; bool IsTree = false; }; - + //---------------------------------------------------------- - + class TOffsetIndex { public: // In all methods: // Key --- offset from the beginning of the old trie. // Value --- offset from the end of the new trie. - + explicit TOffsetIndex(TTrieNodeCounts& counts) { ParentCounts.Swap(counts); } @@ -150,7 +150,7 @@ namespace NCompactTrie { void Add(size_t key, size_t value) { Data[key] = value; } - + size_t Size() const { return Data.size(); } @@ -172,14 +172,14 @@ namespace NCompactTrie { TTrieNodeCounts ParentCounts; THashMap<size_t, size_t> Data; }; - + //--------------------------------------------------------------------------------------- class TTrieMeasurer { public: TTrieMeasurer(const TOpaqueTrie& trie, bool verbose); void Measure(); - + size_t GetDepth() const { return Depth; } @@ -187,23 +187,23 @@ namespace NCompactTrie { size_t GetNodeCount() const { return NodeCount; } - + size_t GetUnminimizedNodeCount() const { return UnminimizedNodeCount; } - + bool IsTree() const { return NodeCount == UnminimizedNodeCount; } - + TTrieNodeCounts& GetParentCounts() { return ParentCounts; } - + const TOpaqueTrie& GetTrie() const { return Trie; } - + private: const TOpaqueTrie& Trie; size_t Depth; @@ -211,11 +211,11 @@ namespace NCompactTrie { size_t NodeCount; size_t UnminimizedNodeCount; const bool Verbose; - + // returns depth, increments NodeCount. size_t MeasureSubtrie(size_t rootOffset, bool isNewPath); }; - + TTrieMeasurer::TTrieMeasurer(const TOpaqueTrie& trie, bool verbose) : Trie(trie) , Depth(0) @@ -243,7 +243,7 @@ namespace NCompactTrie { Cerr << "Node count: " << NodeCount << Endl; } } - + // A chain of nodes linked by forward links // is considered one node with many left and right children // for depth measuring here and in @@ -276,14 +276,14 @@ namespace NCompactTrie { } else { break; } - } + } return depth; - } + } //-------------------------------------------------------------------------------------- - + using TLevelNodes = TVector<size_t>; - + struct TLevel { size_t Depth; TLevelNodes Nodes; @@ -295,7 +295,7 @@ namespace NCompactTrie { }; //---------------------------------------------------------------------------------------- - + class TVanEmdeBoasReverseNodeEnumerator: public TReverseNodeEnumerator { public: TVanEmdeBoasReverseNodeEnumerator(TTrieMeasurer& measurer, bool verbose) @@ -326,7 +326,7 @@ namespace NCompactTrie { size_t GetLeafLength() const override { return Get().GetLeafLength(); } - + // Returns recalculated offset from the end of the current node. size_t PrepareOffset(size_t absoffset, size_t resultLength) { if (!absoffset) @@ -352,19 +352,19 @@ namespace NCompactTrie { TOpaqueTrie Trie; size_t Depth; size_t MaxIndexSize; - + TVector<TLevel> Trace; TOffsetIndex BackIndex; TVector<TNode> CentralWord; TTrieNodeSet ProcessedNodes; - + const bool Verbose; - + private: bool IsVisited(size_t offset) const { return ProcessedNodes.Has(offset); } - + void AddCascade(size_t step) { Y_ASSERT(!(step & (step - 1))); // Should be a power of 2. while (step > 0) { @@ -381,7 +381,7 @@ namespace NCompactTrie { step /= 2; } } - } + } void FillCentralWord() { CentralWord.clear(); @@ -391,7 +391,7 @@ namespace NCompactTrie { CentralWord.push_back(TNode(Trie.Data, CentralWord.back().GetForwardOffset(), Trie.SkipFunction)); } } - + bool MoveCentralWordStart() { do { if (Fresh) { @@ -420,7 +420,7 @@ namespace NCompactTrie { } while (IsVisited(Trace.back().Nodes.back())); FillCentralWord(); return true; - } + } // Returns the maximal depth it has reached while searching. // This is a method because it needs OffsetIndex to skip processed nodes. @@ -435,7 +435,7 @@ namespace NCompactTrie { if (node.GetLeftOffset() && !IsVisited(node.GetLeftOffset())) { actualDepth = Max(actualDepth, 1 + FindDescendants(node.GetLeftOffset(), depth - 1, result)); - } + } if (node.GetRightOffset() && !IsVisited(node.GetRightOffset())) { actualDepth = Max(actualDepth, 1 + FindDescendants(node.GetRightOffset(), depth - 1, result)); @@ -445,23 +445,23 @@ namespace NCompactTrie { } else { break; } - } + } return actualDepth; - } + } }; - + } // Anonymous namespace. //----------------------------------------------------------------------------------- size_t RawCompactTrieFastLayoutImpl(IOutputStream& os, const TOpaqueTrie& trie, bool verbose) { if (!trie.Data || !trie.Length) { - return 0; - } + return 0; + } TTrieMeasurer mes(trie, verbose); mes.Measure(); TVanEmdeBoasReverseNodeEnumerator enumerator(mes, verbose); return WriteTrieBackwards(os, enumerator, verbose); } -} +} diff --git a/library/cpp/containers/comptrie/make_fast_layout.h b/library/cpp/containers/comptrie/make_fast_layout.h index b8fab5d65b..65d128a767 100644 --- a/library/cpp/containers/comptrie/make_fast_layout.h +++ b/library/cpp/containers/comptrie/make_fast_layout.h @@ -1,7 +1,7 @@ #pragma once -#include "leaf_skipper.h" -#include <cstddef> +#include "leaf_skipper.h" +#include <cstddef> class IOutputStream; diff --git a/library/cpp/containers/comptrie/minimize.cpp b/library/cpp/containers/comptrie/minimize.cpp index 80d0b25217..85260619bc 100644 --- a/library/cpp/containers/comptrie/minimize.cpp +++ b/library/cpp/containers/comptrie/minimize.cpp @@ -1,11 +1,11 @@ -#include "minimize.h" -#include "node.h" -#include "writeable_node.h" -#include "write_trie_backwards.h" -#include "comptrie_impl.h" +#include "minimize.h" +#include "node.h" +#include "writeable_node.h" +#include "write_trie_backwards.h" +#include "comptrie_impl.h" #include <util/generic/hash.h> -#include <util/generic/algorithm.h> +#include <util/generic/algorithm.h> namespace NCompactTrie { // Minimize the trie. The result is equivalent to the original @@ -27,7 +27,7 @@ namespace NCompactTrie { using TSizePair = std::pair<size_t, size_t>; using TSizePairVector = TVector<TSizePair>; using TSizePairVectorVector = TVector<TSizePairVector>; - + class TOffsetMap { protected: TSizePairVectorVector Data; @@ -180,7 +180,7 @@ namespace NCompactTrie { if (Node.GetRightOffset()) nextOffset = Node.GetRightOffset(); break; - + case 2: if (Node.GetLeftOffset()) nextOffset = Node.GetLeftOffset(); @@ -280,7 +280,7 @@ namespace NCompactTrie { static void AddPiece(TPieceIndex& index, size_t offset, size_t len) { index[len].push_back(offset); } - + static TOffsetMap FindEquivalentSubtries(const TOpaqueTrie& trie, bool verbose, size_t minMergeSize) { // Tree nodes, arranged by span length. // When all nodes of a given size are considered, they pop off the queue. @@ -354,6 +354,6 @@ namespace NCompactTrie { return WriteTrieBackwards(os, enumerator, verbose); else return WriteTrieBackwardsNoAlloc(os, enumerator, trie, mode); - } - -} + } + +} diff --git a/library/cpp/containers/comptrie/minimize.h b/library/cpp/containers/comptrie/minimize.h index baaa431d04..6d2ef30920 100644 --- a/library/cpp/containers/comptrie/minimize.h +++ b/library/cpp/containers/comptrie/minimize.h @@ -1,7 +1,7 @@ #pragma once -#include "leaf_skipper.h" -#include <cstddef> +#include "leaf_skipper.h" +#include <cstddef> class IOutputStream; diff --git a/library/cpp/containers/comptrie/node.cpp b/library/cpp/containers/comptrie/node.cpp index 5fd22f15ec..c3e8e807ce 100644 --- a/library/cpp/containers/comptrie/node.cpp +++ b/library/cpp/containers/comptrie/node.cpp @@ -1,9 +1,9 @@ -#include "node.h" -#include "leaf_skipper.h" +#include "node.h" +#include "leaf_skipper.h" #include "comptrie_impl.h" -#include <util/system/yassert.h> -#include <util/generic/yexception.h> +#include <util/system/yassert.h> +#include <util/generic/yexception.h> namespace NCompactTrie { TNode::TNode() @@ -15,7 +15,7 @@ namespace NCompactTrie { for (auto& offset : Offsets) { offset = 0; } - } + } // We believe that epsilon links are found only on the forward-position and that afer jumping an epsilon link you come to an ordinary node. diff --git a/library/cpp/containers/comptrie/node.h b/library/cpp/containers/comptrie/node.h index d6f4317db0..0f14048982 100644 --- a/library/cpp/containers/comptrie/node.h +++ b/library/cpp/containers/comptrie/node.h @@ -1,6 +1,6 @@ -#pragma once +#pragma once -#include <cstddef> +#include <cstddef> namespace NCompactTrie { class ILeafSkipper; @@ -17,18 +17,18 @@ namespace NCompactTrie { direction = static_cast<TDirection>(direction + 1); return direction; } - + inline TDirection& operator--(TDirection& direction) { direction = static_cast<TDirection>(direction - 1); return direction; } - + class TNode { public: TNode(); // Processes epsilon links and sets ForwardOffset to correct value. Assumes an epsilon link doesn't point to an epsilon link. TNode(const char* data, size_t offset, const ILeafSkipper& skipFunction); - + size_t GetOffset() const { return Offset; } @@ -59,7 +59,7 @@ namespace NCompactTrie { char GetLabel() const { return Label; } - + bool IsFinal() const { return GetLeafOffset() != 0; } @@ -73,7 +73,7 @@ namespace NCompactTrie { size_t Offset; size_t LeafLength; size_t CoreLength; - + char Label; }; diff --git a/library/cpp/containers/comptrie/opaque_trie_iterator.cpp b/library/cpp/containers/comptrie/opaque_trie_iterator.cpp index 5fd3914be6..2fef2230cb 100644 --- a/library/cpp/containers/comptrie/opaque_trie_iterator.cpp +++ b/library/cpp/containers/comptrie/opaque_trie_iterator.cpp @@ -1,8 +1,8 @@ -#include "opaque_trie_iterator.h" +#include "opaque_trie_iterator.h" #include "comptrie_impl.h" -#include "node.h" +#include "node.h" -namespace NCompactTrie { +namespace NCompactTrie { TOpaqueTrieIterator::TOpaqueTrieIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, size_t maxKeyLength) : Trie(trie) @@ -69,20 +69,20 @@ namespace NCompactTrie { bool TOpaqueTrieIterator::Backward() { if (AtEmptyValue) - return false; - + return false; + if (!Trie.Length) { if (EmptyValue) { // A trie that has only the empty value; // we are not at the empty value, so move to it. - AtEmptyValue = true; - return true; + AtEmptyValue = true; + return true; } else { // Empty trie. return false; - } - } - + } + } + if (Forks.Empty()) { TFork fork(Trie.Data, 0, Trie.Length, Trie.SkipFunction); fork.LastDirection(); @@ -123,8 +123,8 @@ namespace NCompactTrie { Forks.Clear(); } return true; - } - + } + const char* TOpaqueTrieIterator::GetValuePtr() const { if (!Forks.Empty()) { const TFork& lastFork = Forks.Top(); @@ -133,10 +133,10 @@ namespace NCompactTrie { } Y_ASSERT(AtEmptyValue); return EmptyValue; - } - + } + //------------------------------------------------------------------------- - + TString TForkStack::GetKey() const { if (HasEmptyKey()) { return TString(); @@ -147,7 +147,7 @@ namespace NCompactTrie { result.append(Top().GetLabel()); } return result; - } + } bool TForkStack::HasEmptyKey() const { // Special case: if we get a single zero label, treat it as an empty key @@ -175,14 +175,14 @@ namespace NCompactTrie { , Limit(limit) , CurrentDirection(TDirection(0)) { -#if COMPTRIE_DATA_CHECK +#if COMPTRIE_DATA_CHECK if (Node.GetOffset() >= Limit - 1) ythrow yexception() << "gone beyond the limit, data is corrupted"; -#endif +#endif while (CurrentDirection < D_MAX && !HasDirection(CurrentDirection)) { ++CurrentDirection; } - } + } bool TFork::operator==(const TFork& rhs) const { return (Data == rhs.Data && @@ -206,12 +206,12 @@ namespace NCompactTrie { } while (CurrentDirection > 0 && !HasDirection(CurrentDirection)); return HasDirection(CurrentDirection); } - + void TFork::LastDirection() { CurrentDirection = D_MAX; PrevDirection(); - } - + } + bool TFork::SetDirection(TDirection direction) { if (!HasDirection(direction)) { return false; @@ -219,13 +219,13 @@ namespace NCompactTrie { CurrentDirection = direction; return true; } - + char TFork::GetLabel() const { return Node.GetLabel(); } - + size_t TFork::GetValueOffset() const { return Node.GetLeafOffset(); - } - + } + } diff --git a/library/cpp/containers/comptrie/opaque_trie_iterator.h b/library/cpp/containers/comptrie/opaque_trie_iterator.h index 195da3c191..958c226fea 100644 --- a/library/cpp/containers/comptrie/opaque_trie_iterator.h +++ b/library/cpp/containers/comptrie/opaque_trie_iterator.h @@ -1,14 +1,14 @@ #pragma once #include "comptrie_impl.h" -#include "node.h" -#include "key_selector.h" -#include "leaf_skipper.h" - -#include <util/generic/vector.h> -#include <util/generic/yexception.h> - -namespace NCompactTrie { +#include "node.h" +#include "key_selector.h" +#include "leaf_skipper.h" + +#include <util/generic/vector.h> +#include <util/generic/yexception.h> + +namespace NCompactTrie { class ILeafSkipper; class TFork { // Auxiliary class for a branching point in the iterator @@ -30,7 +30,7 @@ namespace NCompactTrie { bool NextDirection(); bool PrevDirection(); void LastDirection(); - + bool HasDirection(TDirection direction) const { return Node.GetOffsetByDirection(direction); } @@ -48,8 +48,8 @@ namespace NCompactTrie { Y_ASSERT(CurrentDirection != D_FINAL); size_t offset = Node.GetOffsetByDirection(CurrentDirection); return TFork(Data, offset, Limit, skipper); - } - + } + //------------------------------------------------------------------------------------------------ class TForkStack { public: @@ -66,10 +66,10 @@ namespace NCompactTrie { Key.pop_back(); } } - + TFork& Top() { return Forks.back(); - } + } const TFork& Top() const { return Forks.back(); } @@ -77,35 +77,35 @@ namespace NCompactTrie { bool Empty() const { return Forks.empty(); } - + void Clear() { Forks.clear(); Key.clear(); - } - + } + bool operator==(const TForkStack& other) const { return Forks == other.Forks; } bool operator!=(const TForkStack& other) const { return !(*this == other); } - + TString GetKey() const; size_t MeasureKey() const; - + private: TVector<TFork> Forks; TString Key; - + private: bool TopHasLabelInKey() const { return !Empty() && Top().HasLabelInKey(); } bool HasEmptyKey() const; }; - + //------------------------------------------------------------------------------------------------ - + template <class TSymbol> struct TConvertRawKey { typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; @@ -127,37 +127,37 @@ namespace NCompactTrie { } return result; } - + static size_t Size(size_t rawsize) { return rawsize / sizeof(TSymbol); } }; - + template <> struct TConvertRawKey<char> { static TString Get(const TString& rawkey) { return rawkey; } - + static size_t Size(size_t rawsize) { return rawsize; - } + } }; - + //------------------------------------------------------------------------------------------------ class TOpaqueTrieIterator { // Iterator stuff. Stores a stack of visited forks. public: TOpaqueTrieIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, size_t maxKeyLength = size_t(-1)); - + bool operator==(const TOpaqueTrieIterator& rhs) const; bool operator!=(const TOpaqueTrieIterator& rhs) const { return !(*this == rhs); } - + bool Forward(); bool Backward(); - + template <class TSymbol> bool UpperBound(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key); // True if matched exactly. @@ -185,22 +185,22 @@ namespace NCompactTrie { const TOpaqueTrie& GetTrie() const { return Trie; } - + private: TOpaqueTrie Trie; TForkStack Forks; const char* const EmptyValue; bool AtEmptyValue; const size_t MaxKeyLength; - + private: bool HasMaxKeyLength() const; - + template <class TSymbol> int LongestPrefix(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key); // Used in UpperBound. }; - template <class TSymbol> + template <class TSymbol> int TOpaqueTrieIterator::LongestPrefix(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key) { Forks.Clear(); TFork next(Trie.Data, 0, Trie.Length, Trie.SkipFunction); @@ -230,17 +230,17 @@ namespace NCompactTrie { } } else if (!top.SetDirection(D_NEXT)) { top.SetDirection(D_FINAL); - return -1; - } + return -1; + } next = top.NextFork(Trie.SkipFunction); } while (Forks.Top().CurrentDirection != D_NEXT); // Proceed to the next byte. } - } + } // We get here only if the key was empty. Forks.Push(next); return 1; - } - + } + template <class TSymbol> bool TOpaqueTrieIterator::UpperBound(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key) { Forks.Clear(); @@ -249,7 +249,7 @@ namespace NCompactTrie { return true; } else { AtEmptyValue = false; - } + } const int defect = LongestPrefix<TSymbol>(key); if (defect > 0) { // Continue the constructed forks with the smallest key possible. @@ -261,6 +261,6 @@ namespace NCompactTrie { Forward(); } return defect == 0; - } + } -} +} diff --git a/library/cpp/containers/comptrie/prefix_iterator.cpp b/library/cpp/containers/comptrie/prefix_iterator.cpp index 5d4dfa3500..aec8c6ed28 100644 --- a/library/cpp/containers/comptrie/prefix_iterator.cpp +++ b/library/cpp/containers/comptrie/prefix_iterator.cpp @@ -1 +1 @@ -#include "prefix_iterator.h" +#include "prefix_iterator.h" diff --git a/library/cpp/containers/comptrie/prefix_iterator.h b/library/cpp/containers/comptrie/prefix_iterator.h index b369bb4f42..23b9cbf7c6 100644 --- a/library/cpp/containers/comptrie/prefix_iterator.h +++ b/library/cpp/containers/comptrie/prefix_iterator.h @@ -1,88 +1,88 @@ -#pragma once +#pragma once -#include "comptrie_trie.h" - -// Iterates over all prefixes of the given key in the trie. -template <class TTrie> -class TPrefixIterator { -public: - using TSymbol = typename TTrie::TSymbol; - using TPacker = typename TTrie::TPacker; - using TData = typename TTrie::TData; - -private: - const TTrie& Trie; - const TSymbol* key; - size_t keylen; - const TSymbol* keyend; - size_t prefixLen; - const char* valuepos; - const char* datapos; - const char* dataend; - TPacker Packer; - const char* EmptyValue; - bool result; - - bool Next(); - -public: +#include "comptrie_trie.h" + +// Iterates over all prefixes of the given key in the trie. +template <class TTrie> +class TPrefixIterator { +public: + using TSymbol = typename TTrie::TSymbol; + using TPacker = typename TTrie::TPacker; + using TData = typename TTrie::TData; + +private: + const TTrie& Trie; + const TSymbol* key; + size_t keylen; + const TSymbol* keyend; + size_t prefixLen; + const char* valuepos; + const char* datapos; + const char* dataend; + TPacker Packer; + const char* EmptyValue; + bool result; + + bool Next(); + +public: TPrefixIterator(const TTrie& trie, const TSymbol* aKey, size_t aKeylen) - : Trie(trie) + : Trie(trie) , key(aKey) , keylen(aKeylen) , keyend(aKey + aKeylen) - , prefixLen(0) - , valuepos(nullptr) - , datapos(trie.DataHolder.AsCharPtr()) - , dataend(datapos + trie.DataHolder.Length()) - { - result = Next(); - } - + , prefixLen(0) + , valuepos(nullptr) + , datapos(trie.DataHolder.AsCharPtr()) + , dataend(datapos + trie.DataHolder.Length()) + { + result = Next(); + } + operator bool() const { - return result; - } - + return result; + } + TPrefixIterator& operator++() { - result = Next(); - return *this; - } - - size_t GetPrefixLen() const { - return prefixLen; - } - - void GetValue(TData& to) const { - Trie.Packer.UnpackLeaf(valuepos, to); - } -}; - -template <class TTrie> -bool TPrefixIterator<TTrie>::Next() { - using namespace NCompactTrie; - if (!key || datapos == dataend) - return false; - - if ((key == keyend - keylen) && !valuepos && Trie.EmptyValue) { - valuepos = Trie.EmptyValue; - return true; - } - - while (datapos && key != keyend) { - TSymbol label = *(key++); - if (!Advance(datapos, dataend, valuepos, label, Packer)) { - return false; - } - if (valuepos) { // There is a value at the end of this symbol. - prefixLen = keylen - (keyend - key); - return true; - } - } - - return false; -} - -template <class TTrie> -TPrefixIterator<TTrie> MakePrefixIterator(const TTrie& trie, const typename TTrie::TSymbol* key, size_t keylen) { - return TPrefixIterator<TTrie>(trie, key, keylen); -} + result = Next(); + return *this; + } + + size_t GetPrefixLen() const { + return prefixLen; + } + + void GetValue(TData& to) const { + Trie.Packer.UnpackLeaf(valuepos, to); + } +}; + +template <class TTrie> +bool TPrefixIterator<TTrie>::Next() { + using namespace NCompactTrie; + if (!key || datapos == dataend) + return false; + + if ((key == keyend - keylen) && !valuepos && Trie.EmptyValue) { + valuepos = Trie.EmptyValue; + return true; + } + + while (datapos && key != keyend) { + TSymbol label = *(key++); + if (!Advance(datapos, dataend, valuepos, label, Packer)) { + return false; + } + if (valuepos) { // There is a value at the end of this symbol. + prefixLen = keylen - (keyend - key); + return true; + } + } + + return false; +} + +template <class TTrie> +TPrefixIterator<TTrie> MakePrefixIterator(const TTrie& trie, const typename TTrie::TSymbol* key, size_t keylen) { + return TPrefixIterator<TTrie>(trie, key, keylen); +} diff --git a/library/cpp/containers/comptrie/search_iterator.cpp b/library/cpp/containers/comptrie/search_iterator.cpp index eb91523574..cbeac7b2bc 100644 --- a/library/cpp/containers/comptrie/search_iterator.cpp +++ b/library/cpp/containers/comptrie/search_iterator.cpp @@ -1 +1 @@ -#include "search_iterator.h" +#include "search_iterator.h" diff --git a/library/cpp/containers/comptrie/search_iterator.h b/library/cpp/containers/comptrie/search_iterator.h index 247f7e5936..17f6c2bb00 100644 --- a/library/cpp/containers/comptrie/search_iterator.h +++ b/library/cpp/containers/comptrie/search_iterator.h @@ -1,38 +1,38 @@ -#pragma once +#pragma once -#include "comptrie_trie.h" +#include "comptrie_trie.h" #include "first_symbol_iterator.h" - + #include <util/str_stl.h> #include <util/digest/numeric.h> #include <util/digest/multi.h> -// Iterator for incremental searching. -// All Advance() methods shift the iterator using specifed key/char. -// The subsequent Advance() call starts searching from the previous state. -// The Advance() returns 'true' if specified key part exists in the trie and -// returns 'false' for unsuccessful search. In case of 'false' result -// all subsequent calls also will return 'false'. -// If current iterator state is final then GetValue() method returns 'true' and -// associated value. - -template <class TTrie> -class TSearchIterator { -public: - using TData = typename TTrie::TData; - using TSymbol = typename TTrie::TSymbol; - using TKeyBuf = typename TTrie::TKeyBuf; - +// Iterator for incremental searching. +// All Advance() methods shift the iterator using specifed key/char. +// The subsequent Advance() call starts searching from the previous state. +// The Advance() returns 'true' if specified key part exists in the trie and +// returns 'false' for unsuccessful search. In case of 'false' result +// all subsequent calls also will return 'false'. +// If current iterator state is final then GetValue() method returns 'true' and +// associated value. + +template <class TTrie> +class TSearchIterator { +public: + using TData = typename TTrie::TData; + using TSymbol = typename TTrie::TSymbol; + using TKeyBuf = typename TTrie::TKeyBuf; + TSearchIterator() = default; - explicit TSearchIterator(const TTrie& trie) - : Trie(&trie) - , DataPos(trie.DataHolder.AsCharPtr()) - , DataEnd(DataPos + trie.DataHolder.Length()) + explicit TSearchIterator(const TTrie& trie) + : Trie(&trie) + , DataPos(trie.DataHolder.AsCharPtr()) + , DataEnd(DataPos + trie.DataHolder.Length()) , ValuePos(trie.EmptyValue) - { - } - + { + } + explicit TSearchIterator(const TTrie& trie, const TTrie& subTrie) : Trie(&trie) , DataPos(subTrie.Data().AsCharPtr()) @@ -52,33 +52,33 @@ public: return !(*this == other); } - inline bool Advance(TSymbol label) { + inline bool Advance(TSymbol label) { Y_ASSERT(Trie); - if (DataPos == nullptr || DataPos >= DataEnd) { - return false; - } - return NCompactTrie::Advance(DataPos, DataEnd, ValuePos, label, Trie->Packer); - } - inline bool Advance(const TKeyBuf& key) { + if (DataPos == nullptr || DataPos >= DataEnd) { + return false; + } + return NCompactTrie::Advance(DataPos, DataEnd, ValuePos, label, Trie->Packer); + } + inline bool Advance(const TKeyBuf& key) { return Advance(key.data(), key.size()); - } - bool Advance(const TSymbol* key, size_t keylen); - bool GetValue(TData* value = nullptr) const; + } + bool Advance(const TSymbol* key, size_t keylen); + bool GetValue(TData* value = nullptr) const; bool HasValue() const; inline size_t GetHash() const; - -private: + +private: const TTrie* Trie = nullptr; const char* DataPos = nullptr; const char* DataEnd = nullptr; const char* ValuePos = nullptr; -}; - +}; + template <class TTrie> -inline TSearchIterator<TTrie> MakeSearchIterator(const TTrie& trie) { - return TSearchIterator<TTrie>(trie); -} - +inline TSearchIterator<TTrie> MakeSearchIterator(const TTrie& trie) { + return TSearchIterator<TTrie>(trie); +} + template <class TTrie> struct THash<TSearchIterator<TTrie>> { inline size_t operator()(const TSearchIterator<TTrie>& item) { @@ -86,42 +86,42 @@ struct THash<TSearchIterator<TTrie>> { } }; -//---------------------------------------------------------------------------- - -template <class TTrie> -bool TSearchIterator<TTrie>::Advance(const TSymbol* key, size_t keylen) { +//---------------------------------------------------------------------------- + +template <class TTrie> +bool TSearchIterator<TTrie>::Advance(const TSymbol* key, size_t keylen) { Y_ASSERT(Trie); - if (!key || DataPos == nullptr || DataPos >= DataEnd) { - return false; - } - if (!keylen) { - return true; - } - - const TSymbol* keyend = key + keylen; - while (key != keyend && DataPos != nullptr) { - if (!NCompactTrie::Advance(DataPos, DataEnd, ValuePos, *(key++), Trie->Packer)) { - return false; - } - if (key == keyend) { - return true; - } - } - return false; -} - -template <class TTrie> -bool TSearchIterator<TTrie>::GetValue(TData* value) const { + if (!key || DataPos == nullptr || DataPos >= DataEnd) { + return false; + } + if (!keylen) { + return true; + } + + const TSymbol* keyend = key + keylen; + while (key != keyend && DataPos != nullptr) { + if (!NCompactTrie::Advance(DataPos, DataEnd, ValuePos, *(key++), Trie->Packer)) { + return false; + } + if (key == keyend) { + return true; + } + } + return false; +} + +template <class TTrie> +bool TSearchIterator<TTrie>::GetValue(TData* value) const { Y_ASSERT(Trie); - bool result = false; - if (value) { - if (ValuePos) { - result = true; - Trie->Packer.UnpackLeaf(ValuePos, *value); - } - } - return result; -} + bool result = false; + if (value) { + if (ValuePos) { + result = true; + Trie->Packer.UnpackLeaf(ValuePos, *value); + } + } + return result; +} template <class TTrie> bool TSearchIterator<TTrie>::HasValue() const { diff --git a/library/cpp/containers/comptrie/write_trie_backwards.cpp b/library/cpp/containers/comptrie/write_trie_backwards.cpp index fd8c28b0ed..93155490c4 100644 --- a/library/cpp/containers/comptrie/write_trie_backwards.cpp +++ b/library/cpp/containers/comptrie/write_trie_backwards.cpp @@ -1,10 +1,10 @@ -#include "write_trie_backwards.h" - -#include "comptrie_impl.h" +#include "write_trie_backwards.h" + +#include "comptrie_impl.h" #include "leaf_skipper.h" -#include <util/generic/buffer.h> -#include <util/generic/vector.h> +#include <util/generic/buffer.h> +#include <util/generic/vector.h> namespace NCompactTrie { size_t WriteTrieBackwards(IOutputStream& os, TReverseNodeEnumerator& enumerator, bool verbose) { diff --git a/library/cpp/containers/comptrie/write_trie_backwards.h b/library/cpp/containers/comptrie/write_trie_backwards.h index 634e6b811a..b398f9607f 100644 --- a/library/cpp/containers/comptrie/write_trie_backwards.h +++ b/library/cpp/containers/comptrie/write_trie_backwards.h @@ -1,23 +1,23 @@ -#pragma once - +#pragma once + #include "minimize.h" #include <util/generic/vector.h> -#include <util/stream/output.h> -#include <cstddef> - -namespace NCompactTrie { - class TReverseNodeEnumerator { - public: +#include <util/stream/output.h> +#include <cstddef> + +namespace NCompactTrie { + class TReverseNodeEnumerator { + public: virtual ~TReverseNodeEnumerator() = default; - virtual bool Move() = 0; - virtual size_t GetLeafLength() const = 0; - virtual size_t RecreateNode(char* buffer, size_t resultLength) = 0; - }; - + virtual bool Move() = 0; + virtual size_t GetLeafLength() const = 0; + virtual size_t RecreateNode(char* buffer, size_t resultLength) = 0; + }; + struct TOpaqueTrie; size_t WriteTrieBackwards(IOutputStream& os, TReverseNodeEnumerator& enumerator, bool verbose); size_t WriteTrieBackwardsNoAlloc(IOutputStream& os, TReverseNodeEnumerator& enumerator, TOpaqueTrie& trie, EMinimizeMode mode); - -} + +} diff --git a/library/cpp/containers/comptrie/writeable_node.cpp b/library/cpp/containers/comptrie/writeable_node.cpp index 404003dbbd..749e4fc8b9 100644 --- a/library/cpp/containers/comptrie/writeable_node.cpp +++ b/library/cpp/containers/comptrie/writeable_node.cpp @@ -1,6 +1,6 @@ -#include "writeable_node.h" -#include "node.h" -#include "comptrie_impl.h" +#include "writeable_node.h" +#include "node.h" +#include "comptrie_impl.h" namespace NCompactTrie { TWriteableNode::TWriteableNode() @@ -26,7 +26,7 @@ namespace NCompactTrie { , Label(node.GetLabel()) { } - + size_t TWriteableNode::Measure() const { size_t len = 2 + LeafLength; size_t fwdLen = 0; @@ -77,7 +77,7 @@ namespace NCompactTrie { size_t usedLen = 2; usedLen += PackOffset(buffer + usedLen, leftOffset); usedLen += PackOffset(buffer + usedLen, rightOffset); - + if (LeafPos && LeafLength) { memcpy(buffer + usedLen, LeafPos, LeafLength); usedLen += LeafLength; diff --git a/library/cpp/containers/comptrie/writeable_node.h b/library/cpp/containers/comptrie/writeable_node.h index 5454e579ef..a5878176e3 100644 --- a/library/cpp/containers/comptrie/writeable_node.h +++ b/library/cpp/containers/comptrie/writeable_node.h @@ -1,26 +1,26 @@ -#pragma once - -#include <cstddef> - -namespace NCompactTrie { +#pragma once + +#include <cstddef> + +namespace NCompactTrie { class TNode; - + class TWriteableNode { public: const char* LeafPos; size_t LeafLength; - + size_t ForwardOffset; size_t LeftOffset; size_t RightOffset; char Label; - + TWriteableNode(); TWriteableNode(const TNode& node, const char* data); - + // When you call this, the offsets should be relative to the end of the node. Use NPOS to indicate an absent offset. size_t Pack(char* buffer) const; size_t Measure() const; }; - -} + +} diff --git a/library/cpp/containers/comptrie/ya.make b/library/cpp/containers/comptrie/ya.make index 81352da4b2..b24b9787c5 100644 --- a/library/cpp/containers/comptrie/ya.make +++ b/library/cpp/containers/comptrie/ya.make @@ -1,28 +1,28 @@ LIBRARY() -OWNER(velavokr) +OWNER(velavokr) SRCS( array_with_size.h chunked_helpers_trie.h comptrie.h - comptrie_packer.h + comptrie_packer.h comptrie_trie.h - first_symbol_iterator.h - key_selector.h - leaf_skipper.h + first_symbol_iterator.h + key_selector.h + leaf_skipper.h set.h comptrie.cpp - comptrie_builder.cpp + comptrie_builder.cpp comptrie_impl.cpp - make_fast_layout.cpp - minimize.cpp - node.cpp - opaque_trie_iterator.cpp - prefix_iterator.cpp - search_iterator.cpp - write_trie_backwards.cpp - writeable_node.cpp + make_fast_layout.cpp + minimize.cpp + node.cpp + opaque_trie_iterator.cpp + prefix_iterator.cpp + search_iterator.cpp + write_trie_backwards.cpp + writeable_node.cpp ) PEERDIR( |