diff options
author | onpopov <onpopov@yandex-team.ru> | 2022-02-10 16:50:38 +0300 |
---|---|---|
committer | Daniil Cherednik <dcherednik@yandex-team.ru> | 2022-02-10 16:50:38 +0300 |
commit | 8773f7661194d4c0bdb1e3937b2ff7ae01dd13f8 (patch) | |
tree | 5d5cb817648f650d76cf1076100726fd9b8448e8 | |
parent | 84a29dd4980d5b39615e453f289bd1a81213296d (diff) | |
download | ydb-8773f7661194d4c0bdb1e3937b2ff7ae01dd13f8.tar.gz |
Restoring authorship annotation for <onpopov@yandex-team.ru>. Commit 2 of 2.
44 files changed, 1302 insertions, 1302 deletions
diff --git a/library/cpp/containers/comptrie/README.md b/library/cpp/containers/comptrie/README.md index 13caa33c3d..43c298e2c8 100644 --- a/library/cpp/containers/comptrie/README.md +++ b/library/cpp/containers/comptrie/README.md @@ -1,6 +1,6 @@ Compact trie ============= - + The comptrie library is a fast and very tightly packed implementation of a prefix tree (Sedgewick's T-trie, that is a ternary tree, see https://www.cs.princeton.edu/~rs/strings/paper.pdf, @@ -9,10 +9,10 @@ various searches. Because it is template-based and performance-oriented, a signi part of the library consists of inline functions, so if you don't need all the features of the library, consider including a more specific header file instead of the top-level comptrie.h file. - + Description of the data structure --------------------------------- - + A prefix tree is an implementation of the map data structure for cases when keys are sequences of characters. The nodes on this tree contain characters and values. The key that corresponds to a @@ -24,11 +24,11 @@ balanced if possible (characters have a defined order that must be preserved). Thus, our tree is ternary: each node has a reference to a subtree of all children and two references to subtrees of siblings. One of these subtrees contains the younger siblings, and the other contains the elder siblings. - + The library implements tree optimization by merging identical subtrees, which means the tree becomes a DAG (Directed Acyclic Graph – an oriented graph without oriented cycles). - + The main class TCompactTrie is defined in comptrie_trie.h and is templatized: - The first parameter of the template is the character type. It should be an integer type, which means that arithmetical operations must be defined for it. @@ -38,32 +38,32 @@ serialize the value type to a continuous memory buffer, deserialize it back, and quickly determine its size using the pointer to the beginning of this memory buffer. Good packers have already been written for most types, and they are available in library/cpp/packers. For more information, please refer to the documentation for these packers. - + The set.h file defines a modification for cases when keys must be stored without values. - + When a tree is built from scratch, the value corresponding to an empty key is assigned to a single-character key '\0'. So in a tree with the 'char' character type, the empty key and the '\0' key are bound together. For a subtree received from a call to FindTails, this restriction no longer exists. - + Creating trees -------------- - + Building a tree from a list of key-value pairs is performed by the TCompactTrieBuilder class described in the comptrie_builder.h file. - + This class allows you to add words to a tree one at a time, merge a complete subtree, and also use an unfinished tree as a map. - + An important optimization is the prefix-grouped mode when you need to add keys in a certain order (for details, see the comments in the header file). The resulting tree is compactly packed while keys are being added, and the memory consumption is approximately the same as for the completed tree. For the default mode, compact stacking is turned on at the very end, and the data consumes quite a lot of memory up until that point. - + Optimizing trees ---------------- - + After a tree is created, there are two optimizing operations that can be applied: - Minimization to a DAG by merging equal subtrees. - Fast memory layout. @@ -71,94 +71,94 @@ The functions that implement these operations are declared in the comptrie_build optimization is implemented by the CompactTrieMinimize function, and the second is implemented by CompactTrieMakeFastLayout. You can perform both at once by calling the CompactTrieMinimizeAndMakeFastLayout function. - + ### Minimization ### - + Minimization to a DAG requires quite a lot of time and a large amount of memory to store an array of subtrees, but it can reduce the size of the tree several times over (an example is a language model that has many low-frequency phrases with repeated last words and frequency counts). However, if you know in advance that there are no duplicate values in the tree, you don't need to waste time on it, since the minimization won't have any effect on the tree. - + ### Fast memory layout ### - + The second optimization function results in fewer cache misses, but it causes the tree to grow in size. Our experience has shown a 5% gain in speed for some tries. The algorithm consumes about three times more memory than the amount required for the source tree. So if the machine has enough memory to assemble a tree, it does not neccessarily mean that it has enough memory to run the algorithm. To learn about the theory behind this algorithm, read the comments before the declaration of the CompactTrieMinimize function. - + Serializing trees ----------------- - + The tree resides in memory as a sequence of nodes. Links to other nodes are always counted relative to the position of the current node. This allows you to save a tree to disk as it is and then re-load it using mmap(). The TCompactTrie class has the TBlob constructor for reading a tree from disk. The TCompactTrieBuilder class has Save/SaveToFile methods for writing a built tree to a stream or a file. - + Accessing trees --------------- - + As a rule, all methods that accept a key as input have two variants: - One takes the key in the format: pointer to the beginning of the key, length. - The other takes a high-level type like TStringBuf. - + You can get a value for a key in the tree – TCompactTrie::Find returns false if there is no key, and TCompactTrie::Get throws an exception. You can use FindPrefix methods to find the longest key prefix in a tree and get the corresponding value for it. You can also use a single FindPhrases request to get values for all the beginnings of a phrase with a given word delimiter. - + An important operation that distinguishes a tree from a simple map is implemented in the FindTails method, which allows you to obtain a subtree consisting of all possible extensions of the given prefix. - + Iterators for trees ------------------- - + First of all, there is a typical map iterator over all key-value pairs called TConstIterator. A tree has three methods that return it: Begin, End, and UpperBound. The latter takes a key as input and returns an iterator to the smallest key that is not smaller than the input key. - + The rest of the iterators are not so widely used, and thus are located in separate files. - + TPrefixIterator is defined in the prefix_iterator.h file. It allows iterations over all the prefixes of this key available in the tree. - + TSearchIterator is defined in the search_iterator.h file. It allows you to enter a key in a tree one character at a time and see where it ends up. The following character can be selected depending on the current result. You can also copy the iterator and proceed on two different paths. You can actually achieve the same result with repeated use of the FindTails method, but the authors of this iterator claim that they obtained a performance gain with it. - + Appendix. Memory implementation details --------------------------------------- - + *If you are not going to modify the library, then you do not need to read further.* - + First, if the character type has a size larger than 1 byte, then all keys that use these characters are converted to byte strings in the big-endian way. This means that character bytes are written in a string from the most significant to the least significant from left to right. Thus it is reduced to the case when the character in use is 'char'. - + The tree resides in memory as a series of consecutive nodes. The nodes can have different sizes, so the only way to identify the boundaries of nodes is by passing the entire tree. - + ### Node structure ### - + The structure of a node, as can be understood from thoughtfully reading the LeapByte function in Comptrie_impl.h, is the following: - The first byte is for service flags. - The second byte is a character (unless it is the ε-link type of node described below, which has from 1 to 7 bytes of offset distance from the beginning of this node to the content node, and nothing else). - + Thus, the size of any node is at least 2 bytes. All other elements of a node are optional. Next there is from 0 to 7 bytes of the packed offset from the beginning of this node to the beginning of the root node of a subtree with the younger @@ -168,19 +168,19 @@ Next comes the packed value in this node. Its size is not limited, but you may recall that the packer allows you to quickly determine this size using a pointer to the beginning of the packed value. Then, if the service flags indicate that the tree has children, there is a root node of the subtree of children. - -The packed offset is restricted to 7 bytes, and this gives us a limit on the largest -possible size of a tree. You need to study the packer code to understand + +The packed offset is restricted to 7 bytes, and this gives us a limit on the largest +possible size of a tree. You need to study the packer code to understand the exact limit. - + All packed offsets are nonnegative, meaning that roots of subtrees with siblings and the node pointed to by the ε-link must be located strictly to the right of the current node in memory. This does not allow placement of finite state machines with oriented cycles in the comptrie. But it does allow you to effectively stack the comptrie from right to left. - + ### Service flags ### - + The byte of service flags contains (as shown by the constants at the beginning of the comptrie_impl.h file): - 1 bit of MT_NEXT, indicating whether this node has children. @@ -191,9 +191,9 @@ the comptrie_impl.h file): offset to a subtree with younger siblings. If one of these subtrees is not present, then the size of the corresponding packed offset is 0, and vice versa. - + ### ε-links ### - + These nodes only occur if we optimized a tree into a DAG and got two nodes with merged subtrees of children. Since the offset to the subtree of children can't be specified and the root of this subtree should lie just after the value, we have @@ -201,13 +201,13 @@ to add a node of the ε-link type, which contains the offset to the root subtree children and nothing more. This applies to all nodes that have equal subtrees of children, except the rightmost node. The size of this offset is set in 3 bits of MT_SIZEMASK flags for a node. - + As the implementation of the IsEpsilonLink function in comptrie_impl.h demonstrates, the ε-link differs from other nodes in that it does not have the MT_NEXT flag or the MT_FINAL flag, so it can always be identified by the flags. Of course, the best programming practice is to call the function itself instead of examining the flags. - + Note that the ε-link flags do not use the MT_SIZEMASK << MT_LEFTSHIFT` bits, which allows us to start using ε-links for some other purpose. diff --git a/library/cpp/containers/comptrie/chunked_helpers_trie.h b/library/cpp/containers/comptrie/chunked_helpers_trie.h index 68c568f9f3..cfa35f5ba2 100644 --- a/library/cpp/containers/comptrie/chunked_helpers_trie.h +++ b/library/cpp/containers/comptrie/chunked_helpers_trie.h @@ -30,7 +30,7 @@ private: public: TTrieSetWriter(bool isSorted = sorted) - : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) + : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) { } @@ -118,7 +118,7 @@ private: public: TTrieMapWriter(bool isSorted = sorted) - : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) + : Builder(isSorted ? CTBF_PREFIX_GROUPED : CTBF_NONE) #ifndef NDEBUG , IsSorted(isSorted) #endif diff --git a/library/cpp/containers/comptrie/comptrie_builder.cpp b/library/cpp/containers/comptrie/comptrie_builder.cpp index 6bc9b106fd..28a1e41dd2 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.cpp +++ b/library/cpp/containers/comptrie/comptrie_builder.cpp @@ -1 +1 @@ -#include "comptrie_builder.h" +#include "comptrie_builder.h" diff --git a/library/cpp/containers/comptrie/comptrie_builder.h b/library/cpp/containers/comptrie/comptrie_builder.h index 239e1d856d..cf7d2e39a3 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.h +++ b/library/cpp/containers/comptrie/comptrie_builder.h @@ -1,39 +1,39 @@ #pragma once -#include "comptrie_packer.h" -#include "minimize.h" -#include "key_selector.h" - -#include <util/stream/file.h> - +#include "comptrie_packer.h" +#include "minimize.h" +#include "key_selector.h" + +#include <util/stream/file.h> + // -------------------------------------------------------------------------------------- // Data Builder // To build the data buffer, we first create an automaton in memory. The automaton -// is created incrementally. It actually helps a lot to have the input data prefix-grouped +// is created incrementally. It actually helps a lot to have the input data prefix-grouped // by key; otherwise, memory consumption becomes a tough issue. // NOTE: building and serializing the automaton may be lengthy, and takes lots of memory. -// PREFIX_GROUPED means that if we, while constructing a trie, add to the builder two keys with the same prefix, -// then all the keys that we add between these two also have the same prefix. -// Actually in this mode the builder can accept even more freely ordered input, -// but for input as above it is guaranteed to work. +// PREFIX_GROUPED means that if we, while constructing a trie, add to the builder two keys with the same prefix, +// then all the keys that we add between these two also have the same prefix. +// Actually in this mode the builder can accept even more freely ordered input, +// but for input as above it is guaranteed to work. enum ECompactTrieBuilderFlags { - CTBF_NONE = 0, - CTBF_PREFIX_GROUPED = 1 << 0, - CTBF_VERBOSE = 1 << 1, - CTBF_UNIQUE = 1 << 2, -}; - + CTBF_NONE = 0, + CTBF_PREFIX_GROUPED = 1 << 0, + CTBF_VERBOSE = 1 << 1, + CTBF_UNIQUE = 1 << 2, +}; + using TCompactTrieBuilderFlags = ECompactTrieBuilderFlags; inline TCompactTrieBuilderFlags operator|(TCompactTrieBuilderFlags first, TCompactTrieBuilderFlags second) { - return static_cast<TCompactTrieBuilderFlags>(static_cast<int>(first) | second); -} - + return static_cast<TCompactTrieBuilderFlags>(static_cast<int>(first) | second); +} + inline TCompactTrieBuilderFlags& operator|=(TCompactTrieBuilderFlags& first, TCompactTrieBuilderFlags second) { - return first = first | second; -} - + return first = first | second; +} + template <typename T> class TArrayWithSizeHolder; @@ -43,20 +43,20 @@ public: typedef T TSymbol; typedef D TData; typedef S TPacker; - typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; - typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; + typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; + typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; explicit TCompactTrieBuilder(TCompactTrieBuilderFlags flags = CTBF_NONE, TPacker packer = TPacker(), IAllocator* alloc = TDefaultAllocator::Instance()); - // All Add.. methods return true if it was a new key, false if the key already existed. - - bool Add(const TSymbol* key, size_t keylen, const TData& value); + // All Add.. methods return true if it was a new key, false if the key already existed. + + bool Add(const TSymbol* key, size_t keylen, const TData& value); bool Add(const TKeyBuf& key, const TData& value) { return Add(key.data(), key.size(), value); } // add already serialized data - bool AddPtr(const TSymbol* key, size_t keylen, const char* data); + bool AddPtr(const TSymbol* key, size_t keylen, const char* data); bool AddPtr(const TKeyBuf& key, const char* data) { return AddPtr(key.data(), key.size(), data); } @@ -103,51 +103,51 @@ protected: THolder<TCompactTrieBuilderImpl> Impl; }; -//---------------------------------------------------------------------------------------------------------------------- -// Minimize the trie. The result is equivalent to the original -// trie, except that it takes less space (and has marginally lower -// performance, because of eventual epsilon links). -// The algorithm is as follows: starting from the largest pieces, we find -// nodes that have identical continuations (Daciuk's right language), -// and repack the trie. Repacking is done in-place, so memory is less -// of an issue; however, it may take considerable time. - -// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. -// Because of non-local structure and epsilon links, it won't work -// as you expect it to, and can destroy the trie in the making. +//---------------------------------------------------------------------------------------------------------------------- +// Minimize the trie. The result is equivalent to the original +// trie, except that it takes less space (and has marginally lower +// performance, because of eventual epsilon links). +// The algorithm is as follows: starting from the largest pieces, we find +// nodes that have identical continuations (Daciuk's right language), +// and repack the trie. Repacking is done in-place, so memory is less +// of an issue; however, it may take considerable time. + +// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. +// Because of non-local structure and epsilon links, it won't work +// as you expect it to, and can destroy the trie in the making. // If you want both minimization and fast layout, do the minimization first. - + template <class TPacker> size_t CompactTrieMinimize(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker(), NCompactTrie::EMinimizeMode mode = NCompactTrie::MM_DEFAULT); template <class TTrieBuilder> size_t CompactTrieMinimize(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); -//---------------------------------------------------------------------------------------------------------------- -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: +//---------------------------------------------------------------------------------------------------------------- +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: // Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341-358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ // Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium // on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12-14, 2000, pages 399-409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -template <class TPacker> +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +template <class TPacker> size_t CompactTrieMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker()); - -template <class TTrieBuilder> + +template <class TTrieBuilder> size_t CompactTrieMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); - + // Composition of minimization and fast layout template <class TPacker> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose = false, const TPacker& packer = TPacker()); @@ -155,5 +155,5 @@ size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, template <class TTrieBuilder> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose = false); -// Implementation details moved here. -#include "comptrie_builder.inl" +// Implementation details moved here. +#include "comptrie_builder.inl" diff --git a/library/cpp/containers/comptrie/comptrie_builder.inl b/library/cpp/containers/comptrie/comptrie_builder.inl index 161115edfe..f273fa6571 100644 --- a/library/cpp/containers/comptrie/comptrie_builder.inl +++ b/library/cpp/containers/comptrie/comptrie_builder.inl @@ -2,26 +2,26 @@ #include "comptrie_impl.h" #include "comptrie_trie.h" -#include "make_fast_layout.h" -#include "array_with_size.h" - +#include "make_fast_layout.h" +#include "array_with_size.h" + #include <library/cpp/containers/compact_vector/compact_vector.h> - + #include <util/memory/alloc.h> -#include <util/memory/blob.h> -#include <util/memory/pool.h> -#include <util/memory/tempbuf.h> -#include <util/memory/smallobj.h> -#include <util/generic/algorithm.h> -#include <util/generic/buffer.h> +#include <util/memory/blob.h> +#include <util/memory/pool.h> +#include <util/memory/tempbuf.h> +#include <util/memory/smallobj.h> +#include <util/generic/algorithm.h> +#include <util/generic/buffer.h> #include <util/generic/strbuf.h> - -#include <util/system/align.h> + +#include <util/system/align.h> #include <util/stream/buffer.h> - -#define CONSTEXPR_MAX2(a, b) (a) > (b) ? (a) : (b) -#define CONSTEXPR_MAX3(a, b, c) CONSTEXPR_MAX2(CONSTEXPR_MAX2(a, b), c) - + +#define CONSTEXPR_MAX2(a, b) (a) > (b) ? (a) : (b) +#define CONSTEXPR_MAX3(a, b, c) CONSTEXPR_MAX2(CONSTEXPR_MAX2(a, b), c) + // TCompactTrieBuilder::TCompactTrieBuilderImpl template <class T, class D, class S> @@ -33,7 +33,7 @@ protected: class TNode; class TArc; TNode* Root; - TCompactTrieBuilderFlags Flags; + TCompactTrieBuilderFlags Flags; size_t EntryCount; size_t NodeCount; TPacker Packer; @@ -73,11 +73,11 @@ public: void DestroyNode(TNode* node); void NodeReleasePayload(TNode* thiz); - char* AddEntryForData(const TSymbol* key, size_t keylen, size_t dataLen, bool& isNewAddition); - TNode* AddEntryForSomething(const TSymbol* key, size_t keylen, bool& isNewAddition); + char* AddEntryForData(const TSymbol* key, size_t keylen, size_t dataLen, bool& isNewAddition); + TNode* AddEntryForSomething(const TSymbol* key, size_t keylen, bool& isNewAddition); - bool AddEntry(const TSymbol* key, size_t keylen, const TData& value); - bool AddEntryPtr(const TSymbol* key, size_t keylen, const char* value); + bool AddEntry(const TSymbol* key, size_t keylen, const TData& value); + bool AddEntryPtr(const TSymbol* key, size_t keylen, const char* value); bool AddSubtreeInFile(const TSymbol* key, size_t keylen, const TString& fileName); bool AddSubtreeInBuffer(const TSymbol* key, size_t keylen, TArrayWithSizeHolder<char>&& buffer); bool FindEntry(const TSymbol* key, size_t keylen, TData* value) const; @@ -131,7 +131,7 @@ public: class TArcSet: public ISubtree, public TCompactVector<TArc> { public: - typedef typename TCompactVector<TArc>::iterator iterator; + typedef typename TCompactVector<TArc>::iterator iterator; typedef typename TCompactVector<TArc>::const_iterator const_iterator; TArcSet() { @@ -334,7 +334,7 @@ public: }; union { - char ArcsData[CONSTEXPR_MAX3(sizeof(TArcSet), sizeof(TBufferedSubtree), sizeof(TSubtreeInFile))]; + char ArcsData[CONSTEXPR_MAX3(sizeof(TArcSet), sizeof(TBufferedSubtree), sizeof(TSubtreeInFile))]; union { void* Data1; long long int Data2; @@ -426,18 +426,18 @@ TCompactTrieBuilder<T, D, S>::TCompactTrieBuilder(TCompactTrieBuilderFlags flags } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::Add(const TSymbol* key, size_t keylen, const TData& value) { - return Impl->AddEntry(key, keylen, value); +bool TCompactTrieBuilder<T, D, S>::Add(const TSymbol* key, size_t keylen, const TData& value) { + return Impl->AddEntry(key, keylen, value); } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::AddPtr(const TSymbol* key, size_t keylen, const char* value) { - return Impl->AddEntryPtr(key, keylen, value); +bool TCompactTrieBuilder<T, D, S>::AddPtr(const TSymbol* key, size_t keylen, const char* value) { + return Impl->AddEntryPtr(key, keylen, value); } template <class T, class D, class S> bool TCompactTrieBuilder<T, D, S>::AddSubtreeInFile(const TSymbol* key, size_t keylen, const TString& fileName) { - return Impl->AddSubtreeInFile(key, keylen, fileName); + return Impl->AddSubtreeInFile(key, keylen, fileName); } template <class T, class D, class S> @@ -488,7 +488,7 @@ TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TCompactTrieBuilderImpl(T : Pool(1000000, TMemoryPool::TLinearGrow::Instance(), alloc) , PayloadSize(sizeof(void*)) // XXX: find better value , NodeAllocator(new TFixedSizeAllocator(sizeof(TNode) + PayloadSize, alloc)) - , Flags(flags) + , Flags(flags) , EntryCount(0) , NodeCount(1) , Packer(packer) @@ -543,40 +543,40 @@ void TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeReleasePayload(T } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntry( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntry( const TSymbol* key, size_t keylen, const TData& value) { size_t datalen = Packer.MeasureLeaf(value); - bool isNewAddition = false; - char* place = AddEntryForData(key, keylen, datalen, isNewAddition); + bool isNewAddition = false; + char* place = AddEntryForData(key, keylen, datalen, isNewAddition); Packer.PackLeaf(place, value, datalen); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryPtr( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryPtr( const TSymbol* key, size_t keylen, const char* value) { size_t datalen = Packer.SkipLeaf(value); - bool isNewAddition = false; - char* place = AddEntryForData(key, keylen, datalen, isNewAddition); + bool isNewAddition = false; + char* place = AddEntryForData(key, keylen, datalen, isNewAddition); memcpy(place, value, datalen); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> -bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInFile( +bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInFile( const TSymbol* key, size_t keylen, const TString& fileName) { typedef typename TNode::ISubtree ISubtree; typedef typename TNode::TSubtreeInFile TSubtreeInFile; - bool isNewAddition = false; - TNode* node = AddEntryForSomething(key, keylen, isNewAddition); + bool isNewAddition = false; + TNode* node = AddEntryForSomething(key, keylen, isNewAddition); node->Subtree()->Destroy(this); node->Subtree()->~ISubtree(); new (node->Subtree()) TSubtreeInFile(fileName); - return isNewAddition; + return isNewAddition; } template <class T, class D, class S> @@ -599,56 +599,56 @@ bool TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddSubtreeInBuffer( template <class T, class D, class S> typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForSomething( - const TSymbol* key, size_t keylen, bool& isNewAddition) { + const TSymbol* key, size_t keylen, bool& isNewAddition) { using namespace NCompactTrie; EntryCount++; - if (Flags & CTBF_VERBOSE) + if (Flags & CTBF_VERBOSE) ShowProgress(EntryCount); TNode* current = Root; size_t passed; - // Special case of empty key: replace it by 1-byte "\0" key. - size_t ckeylen = keylen ? keylen * sizeof(TSymbol) : 1; - TTempBuf ckeybuf(ckeylen); + // Special case of empty key: replace it by 1-byte "\0" key. + size_t ckeylen = keylen ? keylen * sizeof(TSymbol) : 1; + TTempBuf ckeybuf(ckeylen); if (keylen == 0) { - ckeybuf.Append("\0", 1); + ckeybuf.Append("\0", 1); } else { ConvertSymbolArrayToChar(key, keylen, ckeybuf, ckeylen); - } + } - char* ckey = ckeybuf.Data(); + char* ckey = ckeybuf.Data(); - TNode* next; + TNode* next; while ((ckeylen > 0) && (next = NodeForwardAdd(current, ckey, ckeylen, passed, &NodeCount)) != nullptr) { - current = next; - ckeylen -= passed; - ckey += passed; + current = next; + ckeylen -= passed; + ckey += passed; + } + + if (ckeylen != 0) { + //new leaf + NodeCount++; + TNode* leaf = new (*NodeAllocator) TNode(); + NodeLinkTo(current, TBlob::Copy(ckey, ckeylen), leaf); + current = leaf; } - - if (ckeylen != 0) { - //new leaf - NodeCount++; - TNode* leaf = new (*NodeAllocator) TNode(); - NodeLinkTo(current, TBlob::Copy(ckey, ckeylen), leaf); - current = leaf; - } - isNewAddition = (current->PayloadType == DATA_ABSENT); - if ((Flags & CTBF_UNIQUE) && !isNewAddition) - ythrow yexception() << "Duplicate key"; + isNewAddition = (current->PayloadType == DATA_ABSENT); + if ((Flags & CTBF_UNIQUE) && !isNewAddition) + ythrow yexception() << "Duplicate key"; return current; } template <class T, class D, class S> -char* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForData(const TSymbol* key, size_t keylen, - size_t datalen, bool& isNewAddition) { - TNode* current = AddEntryForSomething(key, keylen, isNewAddition); +char* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::AddEntryForData(const TSymbol* key, size_t keylen, + size_t datalen, bool& isNewAddition) { + TNode* current = AddEntryForSomething(key, keylen, isNewAddition); NodeReleasePayload(current); if (datalen <= PayloadSize) { current->PayloadType = DATA_INSIDE; - } else if (Flags & CTBF_PREFIX_GROUPED) { + } else if (Flags & CTBF_PREFIX_GROUPED) { current->PayloadType = DATA_MALLOCED; current->PayloadAsPtr() = new char[datalen]; } else { @@ -782,9 +782,9 @@ typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeForwardAdd( TNode* thiz, const char* label, size_t len, size_t& passed, size_t* nodeCount) { typename TNode::TArcSet* arcSet = dynamic_cast<typename TNode::TArcSet*>(thiz->Subtree()); - if (!arcSet) - ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; - + if (!arcSet) + ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; + typename TNode::TArcSet::iterator it = arcSet->Find(*label); if (it != arcSet->end()) { @@ -814,10 +814,10 @@ template <class T, class D, class S> void TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::NodeLinkTo(TNode* thiz, const TBlob& label, TNode* node) { typename TNode::TArcSet* arcSet = dynamic_cast<typename TNode::TArcSet*>(thiz->Subtree()); if (!arcSet) - ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; + ythrow yexception() << "Bad input order - expected input strings to be prefix-grouped."; // Buffer the node at the last arc - if ((Flags & CTBF_PREFIX_GROUPED) && !arcSet->empty()) + if ((Flags & CTBF_PREFIX_GROUPED) && !arcSet->empty()) NodeBufferSubtree(arcSet->back().Node); arcSet->Add(label, node); @@ -1047,19 +1047,19 @@ const typename TCompactTrieBuilder<T, D, S>::TCompactTrieBuilderImpl::TNode* // Different -//---------------------------------------------------------------------------------------------------------------------- -// Minimize the trie. The result is equivalent to the original -// trie, except that it takes less space (and has marginally lower -// performance, because of eventual epsilon links). -// The algorithm is as follows: starting from the largest pieces, we find -// nodes that have identical continuations (Daciuk's right language), -// and repack the trie. Repacking is done in-place, so memory is less -// of an issue; however, it may take considerable time. - -// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. -// Because of non-local structure and epsilon links, it won't work -// as you expect it to, and can destroy the trie in the making. - +//---------------------------------------------------------------------------------------------------------------------- +// Minimize the trie. The result is equivalent to the original +// trie, except that it takes less space (and has marginally lower +// performance, because of eventual epsilon links). +// The algorithm is as follows: starting from the largest pieces, we find +// nodes that have identical continuations (Daciuk's right language), +// and repack the trie. Repacking is done in-place, so memory is less +// of an issue; however, it may take considerable time. + +// IMPORTANT: never try to reminimize an already minimized trie or a trie with fast layout. +// Because of non-local structure and epsilon links, it won't work +// as you expect it to, and can destroy the trie in the making. + template <class TPacker> size_t CompactTrieMinimize(IOutputStream& os, const char* data, size_t datalength, bool verbose /*= false*/, const TPacker& packer /*= TPacker()*/, NCompactTrie::EMinimizeMode mode) { using namespace NCompactTrie; @@ -1072,38 +1072,38 @@ size_t CompactTrieMinimize(IOutputStream& os, const TTrieBuilder& builder, bool size_t len = builder.Save(buftmp); return CompactTrieMinimize<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); } - -//---------------------------------------------------------------------------------------------------------------- -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: + +//---------------------------------------------------------------------------------------------------------------- +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: // Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341-358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ // Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium // on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12-14, 2000, pages 399-409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -template <class TPacker> +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +template <class TPacker> size_t CompactTrieMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose /*= false*/, const TPacker& packer /*= TPacker()*/) { - using namespace NCompactTrie; - return CompactTrieMakeFastLayoutImpl(os, data, datalength, verbose, &packer); -} - -template <class TTrieBuilder> + using namespace NCompactTrie; + return CompactTrieMakeFastLayoutImpl(os, data, datalength, verbose, &packer); +} + +template <class TTrieBuilder> size_t CompactTrieMakeFastLayout(IOutputStream& os, const TTrieBuilder& builder, bool verbose /*=false*/) { - TBufferStream buftmp; - size_t len = builder.Save(buftmp); - return CompactTrieMakeFastLayout<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); -} + TBufferStream buftmp; + size_t len = builder.Save(buftmp); + return CompactTrieMakeFastLayout<typename TTrieBuilder::TPacker>(os, buftmp.Buffer().Data(), len, verbose); +} template <class TPacker> size_t CompactTrieMinimizeAndMakeFastLayout(IOutputStream& os, const char* data, size_t datalength, bool verbose/*=false*/, const TPacker& packer/*= TPacker()*/) { diff --git a/library/cpp/containers/comptrie/comptrie_impl.cpp b/library/cpp/containers/comptrie/comptrie_impl.cpp index 0d48a8f5d3..a116ab6d1e 100644 --- a/library/cpp/containers/comptrie/comptrie_impl.cpp +++ b/library/cpp/containers/comptrie/comptrie_impl.cpp @@ -1,7 +1,7 @@ -#include "comptrie_impl.h" +#include "comptrie_impl.h" #include <util/system/rusage.h> -#include <util/stream/output.h> +#include <util/stream/output.h> // Unpack the leaf value. The algorithm can store up to 8 full bytes in leafs. diff --git a/library/cpp/containers/comptrie/comptrie_impl.h b/library/cpp/containers/comptrie/comptrie_impl.h index 09a3554e2d..f41c38311a 100644 --- a/library/cpp/containers/comptrie/comptrie_impl.h +++ b/library/cpp/containers/comptrie/comptrie_impl.h @@ -30,17 +30,17 @@ namespace NCompactTrie { return !(flags & (MT_FINAL | MT_NEXT)); } - static inline void TraverseEpsilon(const char*& datapos) { - const char flags = *datapos; - if (!IsEpsilonLink(flags)) { - return; - } - const size_t offsetlength = flags & MT_SIZEMASK; - const size_t offset = UnpackOffset(datapos + 1, offsetlength); + static inline void TraverseEpsilon(const char*& datapos) { + const char flags = *datapos; + if (!IsEpsilonLink(flags)) { + return; + } + const size_t offsetlength = flags & MT_SIZEMASK; + const size_t offset = UnpackOffset(datapos + 1, offsetlength); Y_ASSERT(offset); - datapos += offset; - } - + datapos += offset; + } + static inline size_t LeftOffsetLen(const char flags) { return (flags >> MT_LEFTSHIFT) & MT_SIZEMASK; } @@ -176,12 +176,12 @@ namespace NCompactTrie { return 0; } - // Auxiliary function: consumes one (multibyte) symbol from the input. - // Advances the data pointer to the root of the subtrie beginning after the symbol, - // zeroes it if this subtrie is empty. - // If there is a value associated with the symbol, makes the value pointer point to it, + // Auxiliary function: consumes one (multibyte) symbol from the input. + // Advances the data pointer to the root of the subtrie beginning after the symbol, + // zeroes it if this subtrie is empty. + // If there is a value associated with the symbol, makes the value pointer point to it, // otherwise sets it to nullptr. - // Returns true if the symbol was succesfully found in the trie, false otherwise. + // Returns true if the symbol was succesfully found in the trie, false otherwise. template <typename TSymbol, class TPacker> Y_FORCE_INLINE bool Advance(const char*& datapos, const char* const dataend, const char*& value, TSymbol label, TPacker packer) { diff --git a/library/cpp/containers/comptrie/comptrie_packer.h b/library/cpp/containers/comptrie/comptrie_packer.h index b26335e76e..0341eeeae3 100644 --- a/library/cpp/containers/comptrie/comptrie_packer.h +++ b/library/cpp/containers/comptrie/comptrie_packer.h @@ -1,21 +1,21 @@ -#pragma once - +#pragma once + #include <library/cpp/packers/packers.h> - -template <class T> -class TCompactTriePacker { -public: - void UnpackLeaf(const char* p, T& t) const { - NPackers::TPacker<T>().UnpackLeaf(p, t); - } - void PackLeaf(char* buffer, const T& data, size_t computedSize) const { - NPackers::TPacker<T>().PackLeaf(buffer, data, computedSize); - } - size_t MeasureLeaf(const T& data) const { - return NPackers::TPacker<T>().MeasureLeaf(data); - } - size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used - { - return NPackers::TPacker<T>().SkipLeaf(p); - } -}; + +template <class T> +class TCompactTriePacker { +public: + void UnpackLeaf(const char* p, T& t) const { + NPackers::TPacker<T>().UnpackLeaf(p, t); + } + void PackLeaf(char* buffer, const T& data, size_t computedSize) const { + NPackers::TPacker<T>().PackLeaf(buffer, data, computedSize); + } + size_t MeasureLeaf(const T& data) const { + return NPackers::TPacker<T>().MeasureLeaf(data); + } + size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used + { + return NPackers::TPacker<T>().SkipLeaf(p); + } +}; diff --git a/library/cpp/containers/comptrie/comptrie_trie.h b/library/cpp/containers/comptrie/comptrie_trie.h index 3875dca513..40ec1e52b3 100644 --- a/library/cpp/containers/comptrie/comptrie_trie.h +++ b/library/cpp/containers/comptrie/comptrie_trie.h @@ -1,33 +1,33 @@ #pragma once #include "comptrie_impl.h" -#include "comptrie_packer.h" -#include "opaque_trie_iterator.h" -#include "leaf_skipper.h" -#include "key_selector.h" - -#include <util/generic/buffer.h> -#include <util/generic/ptr.h> -#include <util/generic/vector.h> -#include <util/generic/yexception.h> -#include <util/memory/blob.h> +#include "comptrie_packer.h" +#include "opaque_trie_iterator.h" +#include "leaf_skipper.h" +#include "key_selector.h" + +#include <util/generic/buffer.h> +#include <util/generic/ptr.h> +#include <util/generic/vector.h> +#include <util/generic/yexception.h> +#include <util/memory/blob.h> #include <util/stream/input.h> #include <utility> - + template <class T, class D, class S> class TCompactTrieBuilder; -namespace NCompactTrie { - template <class TTrie> - class TFirstSymbolIterator; -} - -template <class TTrie> -class TSearchIterator; - -template <class TTrie> -class TPrefixIterator; - +namespace NCompactTrie { + template <class TTrie> + class TFirstSymbolIterator; +} + +template <class TTrie> +class TSearchIterator; + +template <class TTrie> +class TPrefixIterator; + // in case of <char> specialization cannot distinguish between "" and "\0" keys template <class T = char, class D = ui64, class S = TCompactTriePacker<D>> class TCompactTrie { @@ -36,8 +36,8 @@ public: typedef D TData; typedef S TPacker; - typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; - typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; + typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; + typedef typename TCompactTrieKeySelector<TSymbol>::TKeyBuf TKeyBuf; typedef std::pair<TKey, TData> TValueType; typedef std::pair<size_t, TData> TPhraseMatch; @@ -47,9 +47,9 @@ public: protected: TBlob DataHolder; - const char* EmptyValue = nullptr; + const char* EmptyValue = nullptr; TPacker Packer; - NCompactTrie::TPackerLeafSkipper<TPacker> Skipper = &Packer; // This should be true for every constructor. + NCompactTrie::TPackerLeafSkipper<TPacker> Skipper = &Packer; // This should be true for every constructor. public: TCompactTrie() = default; @@ -64,12 +64,12 @@ public: : TCompactTrie{data, TPacker{}} { } - // Skipper should be initialized with &Packer, not with &other.Packer, so you have to redefine these. - TCompactTrie(const TCompactTrie& other); + // Skipper should be initialized with &Packer, not with &other.Packer, so you have to redefine these. + TCompactTrie(const TCompactTrie& other); TCompactTrie(TCompactTrie&& other) noexcept; - TCompactTrie& operator=(const TCompactTrie& other); + TCompactTrie& operator=(const TCompactTrie& other); TCompactTrie& operator=(TCompactTrie&& other) noexcept; - + explicit operator bool() const { return !IsEmpty(); } @@ -106,18 +106,18 @@ public: return DataHolder; }; - const NCompactTrie::ILeafSkipper& GetSkipper() const { - return Skipper; - } - + const NCompactTrie::ILeafSkipper& GetSkipper() const { + return Skipper; + } + TPacker GetPacker() const { return Packer; } - bool HasCorrectSkipper() const { - return Skipper.GetPacker() == &Packer; - } - + bool HasCorrectSkipper() const { + return Skipper.GetPacker() == &Packer; + } + void FindPhrases(const TSymbol* key, size_t keylen, TPhraseMatchVector& matches, TSymbol separator = TSymbol(' ')) const; void FindPhrases(const TKeyBuf& key, TPhraseMatchVector& matches, TSymbol separator = TSymbol(' ')) const { return FindPhrases(key.data(), key.size(), matches, separator); @@ -131,11 +131,11 @@ public: inline TCompactTrie<T, D, S> FindTails(const TSymbol* key, size_t keylen) const; TCompactTrie<T, D, S> FindTails(const TKeyBuf& key) const { return FindTails(key.data(), key.size()); - } + } bool FindTails(const TSymbol* key, size_t keylen, TCompactTrie<T, D, S>& res) const; bool FindTails(const TKeyBuf& key, TCompactTrie<T, D, S>& res) const { return FindTails(key.data(), key.size(), res); - } + } // same as FindTails(&key, 1), a bit faster // return false, if no arc with @label exists @@ -143,11 +143,11 @@ public: class TConstIterator { private: - typedef NCompactTrie::TOpaqueTrieIterator TOpaqueTrieIterator; - typedef NCompactTrie::TOpaqueTrie TOpaqueTrie; + typedef NCompactTrie::TOpaqueTrieIterator TOpaqueTrieIterator; + typedef NCompactTrie::TOpaqueTrie TOpaqueTrie; friend class TCompactTrie; TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer); // only usable from Begin() and End() methods - TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer); // only usable from UpperBound() method + TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer); // only usable from UpperBound() method public: TConstIterator() = default; @@ -159,8 +159,8 @@ public: bool operator!=(const TConstIterator& other) const; TConstIterator& operator++(); TConstIterator operator++(int /*unused*/); - TConstIterator& operator--(); - TConstIterator operator--(int /*unused*/); + TConstIterator& operator--(); + TConstIterator operator--(int /*unused*/); TValueType operator*(); TKey GetKey() const; @@ -170,8 +170,8 @@ public: const char* GetValuePtr() const; private: - TPacker Packer; - TCopyPtr<TOpaqueTrieIterator> Impl; + TPacker Packer; + TCopyPtr<TOpaqueTrieIterator> Impl; }; TConstIterator Begin() const; @@ -179,20 +179,20 @@ public: TConstIterator End() const; TConstIterator end() const; - // Returns an iterator pointing to the smallest key in the trie >= the argument. + // Returns an iterator pointing to the smallest key in the trie >= the argument. // TODO: misleading name. Should be called LowerBound for consistency with stl. - // No. It is the STL that has a misleading name. - // LowerBound of X cannot be greater than X. + // No. It is the STL that has a misleading name. + // LowerBound of X cannot be greater than X. TConstIterator UpperBound(const TKeyBuf& key) const; - + void Print(IOutputStream& os); size_t Size() const; - friend class NCompactTrie::TFirstSymbolIterator<TCompactTrie>; - friend class TSearchIterator<TCompactTrie>; - friend class TPrefixIterator<TCompactTrie>; - + friend class NCompactTrie::TFirstSymbolIterator<TCompactTrie>; + friend class TSearchIterator<TCompactTrie>; + friend class TPrefixIterator<TCompactTrie>; + protected: explicit TCompactTrie(const char* emptyValue); TCompactTrie(const TBlob& data, const char* emptyValue, TPacker packer = TPacker()); @@ -231,7 +231,7 @@ TCompactTrie<T, D, S>::TCompactTrie(const TBlob& data, TPacker packer) template <class T, class D, class S> TCompactTrie<T, D, S>::TCompactTrie(const char* d, size_t len, TPacker packer) - : Packer(packer) + : Packer(packer) { Init(d, len, packer); } @@ -251,42 +251,42 @@ TCompactTrie<T, D, S>::TCompactTrie(const TBlob& data, const char* emptyValue, T } template <class T, class D, class S> -TCompactTrie<T, D, S>::TCompactTrie(const TCompactTrie& other) +TCompactTrie<T, D, S>::TCompactTrie(const TCompactTrie& other) : DataHolder(other.DataHolder) , EmptyValue(other.EmptyValue) , Packer(other.Packer) { } - -template <class T, class D, class S> + +template <class T, class D, class S> TCompactTrie<T, D, S>::TCompactTrie(TCompactTrie&& other) noexcept : DataHolder(std::move(other.DataHolder)) , EmptyValue(std::move(other.EmptyValue)) , Packer(std::move(other.Packer)) { } - -template <class T, class D, class S> + +template <class T, class D, class S> TCompactTrie<T, D, S>& TCompactTrie<T, D, S>::operator=(const TCompactTrie& other) { - if (this != &other) { - DataHolder = other.DataHolder; - EmptyValue = other.EmptyValue; - Packer = other.Packer; - } - return *this; -} - -template <class T, class D, class S> + if (this != &other) { + DataHolder = other.DataHolder; + EmptyValue = other.EmptyValue; + Packer = other.Packer; + } + return *this; +} + +template <class T, class D, class S> TCompactTrie<T, D, S>& TCompactTrie<T, D, S>::operator=(TCompactTrie&& other) noexcept { - if (this != &other) { + if (this != &other) { DataHolder = std::move(other.DataHolder); EmptyValue = std::move(other.EmptyValue); Packer = std::move(other.Packer); - } - return *this; -} - -template <class T, class D, class S> + } + return *this; +} + +template <class T, class D, class S> void TCompactTrie<T, D, S>::Init(const char* d, size_t len, TPacker packer) { Init(TBlob::NoCopy(d, len), packer); } @@ -298,7 +298,7 @@ void TCompactTrie<T, D, S>::Init(const TBlob& data, TPacker packer) { DataHolder = data; Packer = packer; - const char* datapos = DataHolder.AsCharPtr(); + const char* datapos = DataHolder.AsCharPtr(); size_t len = DataHolder.Length(); if (!len) return; @@ -337,7 +337,7 @@ bool TCompactTrie<T, D, S>::Find(const TSymbol* key, size_t keylen, TData* value template <class T, class D, class S> void TCompactTrie<T, D, S>::FindPhrases(const TSymbol* key, size_t keylen, TPhraseMatchVector& matches, TSymbol separator) const { - LookupPhrases(DataHolder.AsCharPtr(), DataHolder.Length(), key, keylen, matches, separator); + LookupPhrases(DataHolder.AsCharPtr(), DataHolder.Length(), key, keylen, matches, separator); } template <class T, class D, class S> @@ -397,7 +397,7 @@ inline bool TCompactTrie<T, D, S>::FindTails(TSymbol label, TCompactTrie<T, D, S if (!len) return false; - const char* datastart = DataHolder.AsCharPtr(); + const char* datastart = DataHolder.AsCharPtr(); const char* dataend = datastart + len; const char* datapos = datastart; const char* value = nullptr; @@ -417,8 +417,8 @@ inline bool TCompactTrie<T, D, S>::FindTails(TSymbol label, TCompactTrie<T, D, S template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::Begin() const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, false, Packer); + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, false, Packer); } template <class T, class D, class S> @@ -428,8 +428,8 @@ typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::begin() co template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::End() const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, true, Packer); + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, true, Packer); } template <class T, class D, class S> @@ -447,13 +447,13 @@ size_t TCompactTrie<T, D, S>::Size() const { template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::UpperBound(const TKeyBuf& key) const { - NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); - return TConstIterator(self, EmptyValue, key, Packer); -} - + NCompactTrie::TOpaqueTrie self(DataHolder.AsCharPtr(), DataHolder.Length(), Skipper); + return TConstIterator(self, EmptyValue, key, Packer); +} + template <class T, class D, class S> void TCompactTrie<T, D, S>::Print(IOutputStream& os) { - typedef typename ::TCompactTrieKeySelector<T>::TKeyBuf TSBuffer; + typedef typename ::TCompactTrieKeySelector<T>::TKeyBuf TSBuffer; for (TConstIterator it = Begin(); it != End(); ++it) { os << TSBuffer((*it).first.data(), (*it).first.size()) << "\t" << (*it).second << Endl; } @@ -478,7 +478,7 @@ template <class T, class D, class S> bool TCompactTrie<T, D, S>::LookupLongestPrefix(const TSymbol* key, size_t keylen, size_t& prefixLen, const char*& valuepos, bool& hasNext) const { using namespace NCompactTrie; - const char* datapos = DataHolder.AsCharPtr(); + const char* datapos = DataHolder.AsCharPtr(); size_t len = DataHolder.Length(); prefixLen = 0; @@ -540,18 +540,18 @@ void TCompactTrie<T, D, S>::LookupPhrases( const T* const keystart = key; const T* const keyend = key + keylen; const char* const dataend = datapos + len; - while (datapos && key != keyend) { + while (datapos && key != keyend) { T label = *(key++); - const char* value = nullptr; - if (!Advance(datapos, dataend, value, label, Packer)) { - return; + const char* value = nullptr; + if (!Advance(datapos, dataend, value, label, Packer)) { + return; + } + if (value && (key == keyend || *key == separator)) { + size_t matchlength = (size_t)(key - keystart); + D data; + Packer.UnpackLeaf(value, data); + matches.push_back(TPhraseMatch(matchlength, data)); } - if (value && (key == keyend || *key == separator)) { - size_t matchlength = (size_t)(key - keystart); - D data; - Packer.UnpackLeaf(value, data); - matches.push_back(TPhraseMatch(matchlength, data)); - } } } @@ -567,30 +567,30 @@ TCompactTrieHolder<T, D, S>::TCompactTrieHolder(IInputStream& is, size_t len) TBase::Init(Storage.Get(), len); } -//---------------------------------------------------------------------------------------------------------------- +//---------------------------------------------------------------------------------------------------------------- // TCompactTrie::TConstIterator template <class T, class D, class S> -TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer) +TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, TPacker packer) : Packer(packer) , Impl(new TOpaqueTrieIterator(trie, emptyValue, atend)) { } template <class T, class D, class S> -TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer) - : Packer(packer) - , Impl(new TOpaqueTrieIterator(trie, emptyValue, true)) -{ - Impl->UpperBound<TSymbol>(key); -} - +TCompactTrie<T, D, S>::TConstIterator::TConstIterator(const TOpaqueTrie& trie, const char* emptyValue, const TKeyBuf& key, TPacker packer) + : Packer(packer) + , Impl(new TOpaqueTrieIterator(trie, emptyValue, true)) +{ + Impl->UpperBound<TSymbol>(key); +} + template <class T, class D, class S> bool TCompactTrie<T, D, S>::TConstIterator::operator==(const TConstIterator& other) const { - if (!Impl) - return !other.Impl; - if (!other.Impl) - return false; + if (!Impl) + return !other.Impl; + if (!other.Impl) + return false; return *Impl == *other.Impl; } @@ -614,46 +614,46 @@ typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::TConstIter template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator& TCompactTrie<T, D, S>::TConstIterator::operator--() { - Impl->Backward(); - return *this; -} - + Impl->Backward(); + return *this; +} + template <class T, class D, class S> typename TCompactTrie<T, D, S>::TConstIterator TCompactTrie<T, D, S>::TConstIterator::operator--(int /*unused*/) { - TConstIterator copy(*this); - Impl->Backward(); - return copy; -} - + TConstIterator copy(*this); + Impl->Backward(); + return copy; +} + template <class T, class D, class S> typename TCompactTrie<T, D, S>::TValueType TCompactTrie<T, D, S>::TConstIterator::operator*() { return TValueType(GetKey(), GetValue()); } template <class T, class D, class S> -typename TCompactTrie<T, D, S>::TKey TCompactTrie<T, D, S>::TConstIterator::GetKey() const { - return Impl->GetKey<TSymbol>(); +typename TCompactTrie<T, D, S>::TKey TCompactTrie<T, D, S>::TConstIterator::GetKey() const { + return Impl->GetKey<TSymbol>(); } template <class T, class D, class S> -size_t TCompactTrie<T, D, S>::TConstIterator::GetKeySize() const { - return Impl->MeasureKey<TSymbol>(); +size_t TCompactTrie<T, D, S>::TConstIterator::GetKeySize() const { + return Impl->MeasureKey<TSymbol>(); } template <class T, class D, class S> -const char* TCompactTrie<T, D, S>::TConstIterator::GetValuePtr() const { - return Impl->GetValuePtr(); +const char* TCompactTrie<T, D, S>::TConstIterator::GetValuePtr() const { + return Impl->GetValuePtr(); } template <class T, class D, class S> -typename TCompactTrie<T, D, S>::TData TCompactTrie<T, D, S>::TConstIterator::GetValue() const { +typename TCompactTrie<T, D, S>::TData TCompactTrie<T, D, S>::TConstIterator::GetValue() const { D data; GetValue(data); return data; } template <class T, class D, class S> -void TCompactTrie<T, D, S>::TConstIterator::GetValue(typename TCompactTrie<T, D, S>::TData& data) const { +void TCompactTrie<T, D, S>::TConstIterator::GetValue(typename TCompactTrie<T, D, S>::TData& data) const { const char* ptr = GetValuePtr(); if (ptr) { Packer.UnpackLeaf(ptr, data); diff --git a/library/cpp/containers/comptrie/comptrie_ut.cpp b/library/cpp/containers/comptrie/comptrie_ut.cpp index 8ee798bc1e..74bee09b5d 100644 --- a/library/cpp/containers/comptrie/comptrie_ut.cpp +++ b/library/cpp/containers/comptrie/comptrie_ut.cpp @@ -22,14 +22,14 @@ #include "comptrie.h" #include "set.h" -#include "first_symbol_iterator.h" -#include "search_iterator.h" +#include "first_symbol_iterator.h" +#include "search_iterator.h" #include "pattern_searcher.h" #include <array> #include <iterator> - + class TCompactTrieTest: public TTestBase { private: UNIT_TEST_SUITE(TCompactTrieTest); @@ -40,7 +40,7 @@ private: UNIT_TEST(TestFastTrie8); UNIT_TEST(TestFastTrie16); UNIT_TEST(TestFastTrie32); - + UNIT_TEST(TestMinimizedTrie8); UNIT_TEST(TestMinimizedTrie16); UNIT_TEST(TestMinimizedTrie32); @@ -48,7 +48,7 @@ private: UNIT_TEST(TestFastMinimizedTrie8); UNIT_TEST(TestFastMinimizedTrie16); UNIT_TEST(TestFastMinimizedTrie32); - + UNIT_TEST(TestTrieIterator8); UNIT_TEST(TestTrieIterator16); UNIT_TEST(TestTrieIterator32); @@ -98,12 +98,12 @@ private: UNIT_TEST(TestSearchIterWchar32) UNIT_TEST(TestCopyAndAssignment); - + UNIT_TEST(TestFirstSymbolIterator8); UNIT_TEST(TestFirstSymbolIterator16); UNIT_TEST(TestFirstSymbolIterator32); UNIT_TEST(TestFirstSymbolIteratorChar32); - + UNIT_TEST(TestArrayPacker); UNIT_TEST(TestBuilderFindLongestPrefix); @@ -124,13 +124,13 @@ private: void CheckData(const char* src, size_t len); template <class T> - void CheckUpperBound(const char* src, size_t len); - - template <class T> + void CheckUpperBound(const char* src, size_t len); + + template <class T> void CheckIterator(const char* src, size_t len); template <class T> - void TestTrie(bool minimize, bool useFastLayout); + void TestTrie(bool minimize, bool useFastLayout); template <class T> void TestTrieIterator(bool minimize); @@ -140,8 +140,8 @@ private: void TestFindTailsImpl(const TString& prefix); - void TestUniqueImpl(bool isPrefixGrouped); - + void TestUniqueImpl(bool isPrefixGrouped); + TVector<TUtf16String> GetSampleKeys(size_t nKeys) const; template <class TContainer> TVector<TContainer> GetSampleVectorData(size_t nValues); @@ -155,12 +155,12 @@ private: template <typename TChar> void TestSearchIterImpl(); - template <class TTrie> - void TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers); - - template <typename TSymbol> - void TestFirstSymbolIterator(); - + template <class TTrie> + void TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers); + + template <typename TSymbol> + void TestFirstSymbolIterator(); + template <class T> class TIntPacker; template <class T> @@ -174,18 +174,18 @@ public: void TestTrie16(); void TestTrie32(); - void TestFastTrie8(); - void TestFastTrie16(); - void TestFastTrie32(); - + void TestFastTrie8(); + void TestFastTrie16(); + void TestFastTrie32(); + void TestMinimizedTrie8(); void TestMinimizedTrie16(); void TestMinimizedTrie32(); - void TestFastMinimizedTrie8(); - void TestFastMinimizedTrie16(); - void TestFastMinimizedTrie32(); - + void TestFastMinimizedTrie8(); + void TestFastMinimizedTrie16(); + void TestFastMinimizedTrie32(); + void TestTrieIterator8(); void TestTrieIterator16(); void TestTrieIterator32(); @@ -197,15 +197,15 @@ public: void TestPhraseSearch(); void TestAddGet(); void TestEmpty(); - void TestUninitializedNonEmpty(); + void TestUninitializedNonEmpty(); void TestRandom(); void TestFindTails(); - void TestPrefixGrouped(); - void CrashTestPrefixGrouped(); + void TestPrefixGrouped(); + void CrashTestPrefixGrouped(); void TestMergeFromFile(); void TestMergeFromBuffer(); - void TestUnique(); - void TestAddRetValue(); + void TestUnique(); + void TestAddRetValue(); void TestClear(); void TestIterateEmptyKey(); @@ -227,20 +227,20 @@ public: void TestTrieForListVectorInt64(); void TestTrieForPairWtrokaVectorInt64(); - void TestEmptyValueOutOfOrder(); + void TestEmptyValueOutOfOrder(); void TestFindLongestPrefixWithEmptyValue(); void TestSearchIterChar(); void TestSearchIterWchar(); void TestSearchIterWchar32(); - - void TestCopyAndAssignment(); - - void TestFirstSymbolIterator8(); - void TestFirstSymbolIterator16(); - void TestFirstSymbolIterator32(); + + void TestCopyAndAssignment(); + + void TestFirstSymbolIterator8(); + void TestFirstSymbolIterator16(); + void TestFirstSymbolIterator32(); void TestFirstSymbolIteratorChar32(); - + void TestArrayPacker(); void TestBuilderFindLongestPrefix(); @@ -278,7 +278,7 @@ const char* TCompactTrieTest::SampleData[] = { template <class T> typename TCompactTrie<T>::TKey MakeWideKey(const char* str, size_t len) { - typename TCompactTrie<T>::TKey buffer; + typename TCompactTrie<T>::TKey buffer; for (size_t i = 0; i < len; i++) { unsigned int ch = (str[i] & 0xFF); buffer.push_back((T)(ch | ch << 8 | ch << 16 | ch << 24)); @@ -288,17 +288,17 @@ typename TCompactTrie<T>::TKey MakeWideKey(const char* str, size_t len) { template <class T> typename TCompactTrie<T>::TKey MakeWideKey(const TString& str) { - return MakeWideKey<T>(str.c_str(), str.length()); -} - + return MakeWideKey<T>(str.c_str(), str.length()); +} + template <class T> -typename TCompactTrie<T>::TKey MakeWideKey(const TStringBuf& buf) { - return MakeWideKey<T>(buf.data(), buf.length()); -} - +typename TCompactTrie<T>::TKey MakeWideKey(const TStringBuf& buf) { + return MakeWideKey<T>(buf.data(), buf.length()); +} + template <class T> void TCompactTrieTest::CreateTrie(IOutputStream& out, bool minimize, bool useFastLayout) { - TCompactTrieBuilder<T> builder; + TCompactTrieBuilder<T> builder; for (auto& i : SampleData) { size_t len = strlen(i); @@ -306,57 +306,57 @@ void TCompactTrieTest::CreateTrie(IOutputStream& out, bool minimize, bool useFas builder.Add(MakeWideKey<T>(i, len), len * 2); } - TBufferOutput tmp2; + TBufferOutput tmp2; IOutputStream& currentOutput = useFastLayout ? tmp2 : out; if (minimize) { TBufferOutput buftmp; builder.Save(buftmp); CompactTrieMinimize<TCompactTriePacker<ui64>>(currentOutput, buftmp.Buffer().Data(), buftmp.Buffer().Size(), false); } else { - builder.Save(currentOutput); + builder.Save(currentOutput); } - if (useFastLayout) { + if (useFastLayout) { CompactTrieMakeFastLayout<TCompactTriePacker<T>>(out, tmp2.Buffer().Data(), tmp2.Buffer().Size(), false); - } + } } -// Iterates over all strings of length <= 4 made of letters a-g. +// Iterates over all strings of length <= 4 made of letters a-g. static bool LexicographicStep(TString& s) { - if (s.length() < 4) { - s += "a"; - return true; - } - while (!s.empty() && s.back() == 'g') - s.pop_back(); - if (s.empty()) - return false; - char last = s.back(); - last++; - s.pop_back(); - s.push_back(last); - return true; -} - + if (s.length() < 4) { + s += "a"; + return true; + } + while (!s.empty() && s.back() == 'g') + s.pop_back(); + if (s.empty()) + return false; + char last = s.back(); + last++; + s.pop_back(); + s.push_back(last); + return true; +} + template <class T> -void TCompactTrieTest::CheckUpperBound(const char* data, size_t datalen) { - TCompactTrie<T> trie(data, datalen); +void TCompactTrieTest::CheckUpperBound(const char* data, size_t datalen) { + TCompactTrie<T> trie(data, datalen); typedef typename TCompactTrie<T>::TKey TKey; typedef typename TCompactTrie<T>::TData TData; - + TString key; - do { - const TKey wideKey = MakeWideKey<T>(key); + do { + const TKey wideKey = MakeWideKey<T>(key); typename TCompactTrie<T>::TConstIterator it = trie.UpperBound(wideKey); - UNIT_ASSERT_C(it == trie.End() || it.GetKey() >= wideKey, "key=" + key); - TData data; - const bool found = trie.Find(wideKey, &data); - if (found) - UNIT_ASSERT_C(it.GetKey() == wideKey && it.GetValue() == data, "key=" + key); - if (it != trie.Begin()) - UNIT_ASSERT_C((--it).GetKey() < wideKey, "key=" + key); - } while (LexicographicStep(key)); -} - + UNIT_ASSERT_C(it == trie.End() || it.GetKey() >= wideKey, "key=" + key); + TData data; + const bool found = trie.Find(wideKey, &data); + if (found) + UNIT_ASSERT_C(it.GetKey() == wideKey && it.GetValue() == data, "key=" + key); + if (it != trie.Begin()) + UNIT_ASSERT_C((--it).GetKey() < wideKey, "key=" + key); + } while (LexicographicStep(key)); +} + template <class T> void TCompactTrieTest::CheckData(const char* data, size_t datalen) { TCompactTrie<T> trie(data, datalen); @@ -377,7 +377,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { TString badkey("bb"); badkey += i; - key = MakeWideKey<T>(badkey); + key = MakeWideKey<T>(badkey); UNIT_ASSERT(!trie.Find(key)); value = 123; UNIT_ASSERT(!trie.Find(key, &value)); @@ -388,7 +388,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { badkey = i; badkey += "x"; - key = MakeWideKey<T>(badkey); + key = MakeWideKey<T>(badkey); UNIT_ASSERT(!trie.Find(key)); value = 1234; UNIT_ASSERT(!trie.Find(key, &value)); @@ -401,7 +401,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { } TString testkey("fbbaa"); - typename TCompactTrie<T>::TKey key = MakeWideKey<T>(testkey); + typename TCompactTrie<T>::TKey key = MakeWideKey<T>(testkey); ui64 value = 0; size_t prefixLen = 0; UNIT_ASSERT(trie.FindLongestPrefix(key.data(), testkey.length() - 1, &prefixLen, &value)); @@ -409,7 +409,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { UNIT_ASSERT_EQUAL(6, value); testkey = "fbbax"; - key = MakeWideKey<T>(testkey); + key = MakeWideKey<T>(testkey); UNIT_ASSERT(trie.FindLongestPrefix(key, &prefixLen, &value)); UNIT_ASSERT_EQUAL(prefixLen, 3); UNIT_ASSERT_EQUAL(6, value); @@ -422,7 +422,7 @@ void TCompactTrieTest::CheckData(const char* data, size_t datalen) { template <class T> void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { typedef typename TCompactTrie<T>::TKey TKey; - typedef typename TCompactTrie<T>::TValueType TValue; + typedef typename TCompactTrie<T>::TValueType TValue; TMap<TKey, ui64> stored; for (auto& i : SampleData) { @@ -439,7 +439,7 @@ void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { while (it != trie.End()) { UNIT_ASSERT_VALUES_EQUAL(it.GetKeySize(), it.GetKey().size()); received.insert(*it); - items.push_back(*it); + items.push_back(*it); entry_count++; it++; } @@ -450,53 +450,53 @@ void TCompactTrieTest::CheckIterator(const char* data, size_t datalen) { UNIT_ASSERT(entry_count == stored.size()); UNIT_ASSERT(received == stored); UNIT_ASSERT(received2 == stored); - - std::reverse(items.begin(), items.end()); - typename TCompactTrie<T>::TConstIterator revIt = trie.End(); - typename TCompactTrie<T>::TConstIterator const begin = trie.Begin(); + + std::reverse(items.begin(), items.end()); + typename TCompactTrie<T>::TConstIterator revIt = trie.End(); + typename TCompactTrie<T>::TConstIterator const begin = trie.Begin(); typename TCompactTrie<T>::TConstIterator emptyIt; - size_t pos = 0; - while (revIt != begin) { - revIt--; - UNIT_ASSERT(*revIt == items[pos]); - pos++; - } - // Checking the assignment operator. - revIt = begin; - UNIT_ASSERT(revIt == trie.Begin()); - UNIT_ASSERT(!revIt.IsEmpty()); + size_t pos = 0; + while (revIt != begin) { + revIt--; + UNIT_ASSERT(*revIt == items[pos]); + pos++; + } + // Checking the assignment operator. + revIt = begin; + UNIT_ASSERT(revIt == trie.Begin()); + UNIT_ASSERT(!revIt.IsEmpty()); UNIT_ASSERT(revIt != emptyIt); - UNIT_ASSERT(revIt != trie.End()); - ++revIt; // Call a method that uses Skipper. + UNIT_ASSERT(revIt != trie.End()); + ++revIt; // Call a method that uses Skipper. revIt = emptyIt; UNIT_ASSERT(revIt == emptyIt); - UNIT_ASSERT(revIt.IsEmpty()); - UNIT_ASSERT(revIt != trie.End()); - // Checking the move assignment operator. - revIt = trie.Begin(); - UNIT_ASSERT(revIt == trie.Begin()); - UNIT_ASSERT(!revIt.IsEmpty()); - UNIT_ASSERT(revIt != emptyIt); - UNIT_ASSERT(revIt != trie.End()); - ++revIt; // Call a method that uses Skipper. - revIt = typename TCompactTrie<T>::TConstIterator(); - UNIT_ASSERT(revIt == emptyIt); - UNIT_ASSERT(revIt.IsEmpty()); - UNIT_ASSERT(revIt != trie.End()); + UNIT_ASSERT(revIt.IsEmpty()); + UNIT_ASSERT(revIt != trie.End()); + // Checking the move assignment operator. + revIt = trie.Begin(); + UNIT_ASSERT(revIt == trie.Begin()); + UNIT_ASSERT(!revIt.IsEmpty()); + UNIT_ASSERT(revIt != emptyIt); + UNIT_ASSERT(revIt != trie.End()); + ++revIt; // Call a method that uses Skipper. + revIt = typename TCompactTrie<T>::TConstIterator(); + UNIT_ASSERT(revIt == emptyIt); + UNIT_ASSERT(revIt.IsEmpty()); + UNIT_ASSERT(revIt != trie.End()); } template <class T> -void TCompactTrieTest::TestTrie(bool minimize, bool useFastLayout) { +void TCompactTrieTest::TestTrie(bool minimize, bool useFastLayout) { TBufferOutput bufout; - CreateTrie<T>(bufout, minimize, useFastLayout); + CreateTrie<T>(bufout, minimize, useFastLayout); CheckData<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); - CheckUpperBound<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); + CheckUpperBound<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); } template <class T> void TCompactTrieTest::TestTrieIterator(bool minimize) { TBufferOutput bufout; - CreateTrie<T>(bufout, minimize, false); + CreateTrie<T>(bufout, minimize, false); CheckIterator<T>(bufout.Buffer().Data(), bufout.Buffer().Size()); } @@ -539,7 +539,7 @@ void TCompactTrieTest::TestFastMinimizedTrie16() { void TCompactTrieTest::TestFastMinimizedTrie32() { TestTrie<wchar32>(true, true); } - + void TCompactTrieTest::TestTrieIterator8() { TestTrieIterator<char>(false); } @@ -549,7 +549,7 @@ void TCompactTrieTest::TestTrieIterator16() { void TCompactTrieTest::TestTrieIterator32() { TestTrieIterator<wchar32>(false); } - + void TCompactTrieTest::TestMinimizedTrieIterator8() { TestTrieIterator<char>(true); } @@ -566,7 +566,7 @@ void TCompactTrieTest::TestPhraseSearch() { static const char* const badphrase = "cd ef gh ab"; TBufferOutput bufout; - TCompactTrieBuilder<char> builder; + TCompactTrieBuilder<char> builder; for (size_t i = 0; i < Y_ARRAY_SIZE(phrases); i++) { builder.Add(phrases[i], strlen(phrases[i]), i); } @@ -626,26 +626,26 @@ void TCompactTrieTest::TestEmpty() { UNIT_ASSERT(trieNull.Begin() == trieNull.End()); } -void TCompactTrieTest::TestUninitializedNonEmpty() { - TBufferOutput bufout; - CreateTrie<char>(bufout, false, false); - TCompactTrie<char> trie(bufout.Buffer().Data(), bufout.Buffer().Size()); - typedef TCompactTrie<char>::TKey TKey; - typedef TCompactTrie<char>::TConstIterator TIter; - - TCompactTrie<char> tails = trie.FindTails("abd", 3); // A trie that has empty value and no data. - UNIT_ASSERT(!tails.IsEmpty()); - UNIT_ASSERT(!tails.IsInitialized()); - const TKey wideKey = MakeWideKey<char>("c", 1); - TIter it = tails.UpperBound(wideKey); - UNIT_ASSERT(it == tails.End()); - UNIT_ASSERT(it != tails.Begin()); - --it; - UNIT_ASSERT(it == tails.Begin()); - ++it; - UNIT_ASSERT(it == tails.End()); -} - +void TCompactTrieTest::TestUninitializedNonEmpty() { + TBufferOutput bufout; + CreateTrie<char>(bufout, false, false); + TCompactTrie<char> trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + typedef TCompactTrie<char>::TKey TKey; + typedef TCompactTrie<char>::TConstIterator TIter; + + TCompactTrie<char> tails = trie.FindTails("abd", 3); // A trie that has empty value and no data. + UNIT_ASSERT(!tails.IsEmpty()); + UNIT_ASSERT(!tails.IsInitialized()); + const TKey wideKey = MakeWideKey<char>("c", 1); + TIter it = tails.UpperBound(wideKey); + UNIT_ASSERT(it == tails.End()); + UNIT_ASSERT(it != tails.Begin()); + --it; + UNIT_ASSERT(it == tails.Begin()); + ++it; + UNIT_ASSERT(it == tails.End()); +} + static char RandChar() { return char(RandomNumber<size_t>() % 256); } @@ -688,7 +688,7 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { } TCompactTrie<char, typename T::TData, T> trieMin(buftmp.Buffer().Data(), buftmp.Buffer().Size()); - TCompactTrieBuilder<char, typename T::TData, T> prefixGroupedBuilder(CTBF_PREFIX_GROUPED); + TCompactTrieBuilder<char, typename T::TData, T> prefixGroupedBuilder(CTBF_PREFIX_GROUPED); for (typename TKeys::const_iterator i = keys.begin(), mi = keys.end(); i != mi; ++i) { UNIT_ASSERT(!prefixGroupedBuilder.Find(i->first.c_str(), i->first.size(), &dummy)); @@ -699,7 +699,7 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { UNIT_ASSERT(dummy == i->second); } - prefixGroupedBuilder.Add(i->first.c_str(), i->first.size(), dummy); + prefixGroupedBuilder.Add(i->first.c_str(), i->first.size(), dummy); UNIT_ASSERT(prefixGroupedBuilder.Find(i->first.c_str(), i->first.size(), &dummy)); for (typename TKeys::const_iterator j = keys.begin(), end = keys.end(); j != end; ++j) { @@ -713,11 +713,11 @@ void TCompactTrieTest::TestRandom(const size_t n, const size_t maxKeySize) { } } - TBufferStream prefixGroupedBuffer; - prefixGroupedBuilder.Save(prefixGroupedBuffer); + TBufferStream prefixGroupedBuffer; + prefixGroupedBuilder.Save(prefixGroupedBuffer); - UNIT_ASSERT_VALUES_EQUAL(stream.Buffer().Size(), prefixGroupedBuffer.Buffer().Size()); - UNIT_ASSERT(0 == memcmp(stream.Buffer().Data(), prefixGroupedBuffer.Buffer().Data(), stream.Buffer().Size())); + UNIT_ASSERT_VALUES_EQUAL(stream.Buffer().Size(), prefixGroupedBuffer.Buffer().Size()); + UNIT_ASSERT(0 == memcmp(stream.Buffer().Data(), prefixGroupedBuffer.Buffer().Data(), stream.Buffer().Size())); } void TCompactTrieTest::TestRandom() { @@ -730,7 +730,7 @@ void TCompactTrieTest::TestRandom() { } void TCompactTrieTest::TestFindTailsImpl(const TString& prefix) { - TCompactTrieBuilder<> builder; + TCompactTrieBuilder<> builder; TMap<TString, ui64> input; @@ -775,9 +775,9 @@ void TCompactTrieTest::TestFindTailsImpl(const TString& prefix) { UNIT_ASSERT(input == output); } -void TCompactTrieTest::TestPrefixGrouped() { +void TCompactTrieTest::TestPrefixGrouped() { TBuffer b1b; - TCompactTrieBuilder<char, ui32> b1(CTBF_PREFIX_GROUPED); + TCompactTrieBuilder<char, ui32> b1(CTBF_PREFIX_GROUPED); const char* data[] = { "Kazan", "Moscow", @@ -820,27 +820,27 @@ void TCompactTrieTest::TestPrefixGrouped() { } } -void TCompactTrieTest::CrashTestPrefixGrouped() { - TCompactTrieBuilder<char, ui32> builder(CTBF_PREFIX_GROUPED); - const char* data[] = { - "Fryazino", - "Fryanovo", - "Monino", - "", - "Fryazevo", - }; - bool wasException = false; - try { +void TCompactTrieTest::CrashTestPrefixGrouped() { + TCompactTrieBuilder<char, ui32> builder(CTBF_PREFIX_GROUPED); + const char* data[] = { + "Fryazino", + "Fryanovo", + "Monino", + "", + "Fryazevo", + }; + bool wasException = false; + try { for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - builder.Add(data[i], strlen(data[i]), i + 1); - } - } catch (const yexception& e) { - wasException = true; - UNIT_ASSERT(strstr(e.what(), "Bad input order - expected input strings to be prefix-grouped.")); - } - UNIT_ASSERT_C(wasException, "CrashTestPrefixGrouped"); -} - + builder.Add(data[i], strlen(data[i]), i + 1); + } + } catch (const yexception& e) { + wasException = true; + UNIT_ASSERT(strstr(e.what(), "Bad input order - expected input strings to be prefix-grouped.")); + } + UNIT_ASSERT_C(wasException, "CrashTestPrefixGrouped"); +} + void TCompactTrieTest::TestMergeFromFile() { { TCompactTrieBuilder<> b; @@ -863,7 +863,7 @@ void TCompactTrieTest::TestMergeFromFile() { { TCompactTrieBuilder<> b; UNIT_ASSERT(b.AddSubtreeInFile("com.", GetSystemTempDir() + "/TCompactTrieTest-TestMerge-com")); - UNIT_ASSERT(b.Add("org.kernel", 22)); + UNIT_ASSERT(b.Add("org.kernel", 22)); UNIT_ASSERT(b.AddSubtreeInFile("ru.", GetSystemTempDir() + "/TCompactTrieTest-TestMerge-ru")); TUnbufferedFileOutput out(GetSystemTempDir() + "/TCompactTrieTest-TestMerge-res"); b.Save(out); @@ -929,14 +929,14 @@ void TCompactTrieTest::TestMergeFromBuffer() { unlink((GetSystemTempDir() + "/TCompactTrieTest-TestMergeFromBuffer-res").data()); } -void TCompactTrieTest::TestUnique() { - TestUniqueImpl(false); - TestUniqueImpl(true); -} - -void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { - TCompactTrieBuilder<char, ui32> builder(CTBF_UNIQUE | (isPrefixGrouped ? CTBF_PREFIX_GROUPED : CTBF_NONE)); - const char* data[] = { +void TCompactTrieTest::TestUnique() { + TestUniqueImpl(false); + TestUniqueImpl(true); +} + +void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { + TCompactTrieBuilder<char, ui32> builder(CTBF_UNIQUE | (isPrefixGrouped ? CTBF_PREFIX_GROUPED : CTBF_NONE)); + const char* data[] = { "Kazan", "Moscow", "Monino", @@ -946,23 +946,23 @@ void TCompactTrieTest::TestUniqueImpl(bool isPrefixGrouped) { "Fryazevo", "Fry", "Tumen", - }; + }; for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - UNIT_ASSERT_C(builder.Add(data[i], strlen(data[i]), i + 1), i); - } - bool wasException = false; - try { - builder.Add(data[4], strlen(data[4]), 20); - } catch (const yexception& e) { - wasException = true; - UNIT_ASSERT(strstr(e.what(), "Duplicate key")); - } - UNIT_ASSERT_C(wasException, "TestUnique"); -} - -void TCompactTrieTest::TestAddRetValue() { - TCompactTrieBuilder<char, ui32> builder; - const char* data[] = { + UNIT_ASSERT_C(builder.Add(data[i], strlen(data[i]), i + 1), i); + } + bool wasException = false; + try { + builder.Add(data[4], strlen(data[4]), 20); + } catch (const yexception& e) { + wasException = true; + UNIT_ASSERT(strstr(e.what(), "Duplicate key")); + } + UNIT_ASSERT_C(wasException, "TestUnique"); +} + +void TCompactTrieTest::TestAddRetValue() { + TCompactTrieBuilder<char, ui32> builder; + const char* data[] = { "Kazan", "Moscow", "Monino", @@ -972,16 +972,16 @@ void TCompactTrieTest::TestAddRetValue() { "Fryazevo", "Fry", "Tumen", - }; + }; for (size_t i = 0; i < Y_ARRAY_SIZE(data); ++i) { - UNIT_ASSERT(builder.Add(data[i], strlen(data[i]), i + 1)); - UNIT_ASSERT(!builder.Add(data[i], strlen(data[i]), i + 2)); - ui32 value; - UNIT_ASSERT(builder.Find(data[i], strlen(data[i]), &value)); - UNIT_ASSERT(value == i + 2); - } -} - + UNIT_ASSERT(builder.Add(data[i], strlen(data[i]), i + 1)); + UNIT_ASSERT(!builder.Add(data[i], strlen(data[i]), i + 2)); + ui32 value; + UNIT_ASSERT(builder.Find(data[i], strlen(data[i]), &value)); + UNIT_ASSERT(value == i + 2); + } +} + void TCompactTrieTest::TestClear() { TCompactTrieBuilder<char, ui32> builder; const char* data[] = { @@ -1253,22 +1253,22 @@ void TCompactTrieTest::TestTrieForPairWtrokaVectorInt64() { TestTrieWithContainers<std::pair<TUtf16String, TVector<i64>>>(keys, data, "pair-str-v-i64"); } -void TCompactTrieTest::TestEmptyValueOutOfOrder() { - TBufferOutput buffer; - using TSymbol = ui32; - { - TCompactTrieBuilder<TSymbol, ui32> builder; - TSymbol key = 1; - builder.Add(&key, 1, 10); - builder.Add(nullptr, 0, 14); - builder.Save(buffer); - } - { - TCompactTrie<TSymbol, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); - UNIT_ASSERT(trie.Find(nullptr, 0)); - } -} - +void TCompactTrieTest::TestEmptyValueOutOfOrder() { + TBufferOutput buffer; + using TSymbol = ui32; + { + TCompactTrieBuilder<TSymbol, ui32> builder; + TSymbol key = 1; + builder.Add(&key, 1, 10); + builder.Add(nullptr, 0, 14); + builder.Save(buffer); + } + { + TCompactTrie<TSymbol, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); + UNIT_ASSERT(trie.Find(nullptr, 0)); + } +} + void TCompactTrieTest::TestFindLongestPrefixWithEmptyValue() { TBufferOutput buffer; { @@ -1340,25 +1340,25 @@ void TCompactTrieTest::TestSearchIterImpl() { TCompactTrie<TChar, ui32> trie(buffer.Buffer().Data(), buffer.Buffer().Size()); ui32 value = 0; - auto iter(MakeSearchIterator(trie)); + auto iter(MakeSearchIterator(trie)); MoveIter(iter, TConvertKey<TChar>::Convert(TStringBuf("abc"))); UNIT_ASSERT(!iter.GetValue(&value)); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); MoveIter(iter, TConvertKey<TChar>::Convert(TStringBuf("abbbc"))); UNIT_ASSERT(iter.GetValue(&value)); UNIT_ASSERT_EQUAL(value, 3); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); UNIT_ASSERT(iter.Advance(TConvertKey<TChar>::Convert(TStringBuf("bdfa")))); UNIT_ASSERT(!iter.GetValue(&value)); - iter = MakeSearchIterator(trie); + iter = MakeSearchIterator(trie); UNIT_ASSERT(iter.Advance(TConvertKey<TChar>::Convert(TStringBuf("bdfaa")))); UNIT_ASSERT(iter.GetValue(&value)); UNIT_ASSERT_EQUAL(value, 4); - UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TChar('z'))); + UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TChar('z'))); UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TConvertKey<TChar>::Convert(TStringBuf("cdf")))); UNIT_ASSERT(!MakeSearchIterator(trie).Advance(TConvertKey<TChar>::Convert(TStringBuf("abca")))); } @@ -1370,69 +1370,69 @@ void TCompactTrieTest::TestSearchIterChar() { void TCompactTrieTest::TestSearchIterWchar() { TestSearchIterImpl<wchar16>(); } - + void TCompactTrieTest::TestSearchIterWchar32() { TestSearchIterImpl<wchar32>(); } -void TCompactTrieTest::TestCopyAndAssignment() { - TBufferOutput bufout; - typedef TCompactTrie<> TTrie; - CreateTrie<char>(bufout, false, false); - TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); - TTrie copy(trie); - UNIT_ASSERT(copy.HasCorrectSkipper()); - TTrie assign; - assign = trie; - UNIT_ASSERT(assign.HasCorrectSkipper()); +void TCompactTrieTest::TestCopyAndAssignment() { + TBufferOutput bufout; + typedef TCompactTrie<> TTrie; + CreateTrie<char>(bufout, false, false); + TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + TTrie copy(trie); + UNIT_ASSERT(copy.HasCorrectSkipper()); + TTrie assign; + assign = trie; + UNIT_ASSERT(assign.HasCorrectSkipper()); TTrie move(std::move(trie)); - UNIT_ASSERT(move.HasCorrectSkipper()); - TTrie moveAssign; - moveAssign = TTrie(bufout.Buffer().Data(), bufout.Buffer().Size()); - UNIT_ASSERT(moveAssign.HasCorrectSkipper()); -} - -template <class TTrie> -void TCompactTrieTest::TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers) { - NCompactTrie::TFirstSymbolIterator<TTrie> it; - it.SetTrie(trie, trie.GetSkipper()); - typename TTrie::TKey answers = MakeWideKey<typename TTrie::TSymbol>(narrowAnswers); - auto answer = answers.begin(); - for (; !it.AtEnd(); it.MakeStep(), ++answer) { - UNIT_ASSERT(answer != answers.end()); - UNIT_ASSERT(it.GetKey() == *answer); - } - UNIT_ASSERT(answer == answers.end()); -} - -template <class TSymbol> -void TCompactTrieTest::TestFirstSymbolIterator() { - TBufferOutput bufout; - typedef TCompactTrie<TSymbol> TTrie; - CreateTrie<TSymbol>(bufout, false, false); - TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); + UNIT_ASSERT(move.HasCorrectSkipper()); + TTrie moveAssign; + moveAssign = TTrie(bufout.Buffer().Data(), bufout.Buffer().Size()); + UNIT_ASSERT(moveAssign.HasCorrectSkipper()); +} + +template <class TTrie> +void TCompactTrieTest::TestFirstSymbolIteratorForTrie(const TTrie& trie, const TStringBuf& narrowAnswers) { + NCompactTrie::TFirstSymbolIterator<TTrie> it; + it.SetTrie(trie, trie.GetSkipper()); + typename TTrie::TKey answers = MakeWideKey<typename TTrie::TSymbol>(narrowAnswers); + auto answer = answers.begin(); + for (; !it.AtEnd(); it.MakeStep(), ++answer) { + UNIT_ASSERT(answer != answers.end()); + UNIT_ASSERT(it.GetKey() == *answer); + } + UNIT_ASSERT(answer == answers.end()); +} + +template <class TSymbol> +void TCompactTrieTest::TestFirstSymbolIterator() { + TBufferOutput bufout; + typedef TCompactTrie<TSymbol> TTrie; + CreateTrie<TSymbol>(bufout, false, false); + TTrie trie(bufout.Buffer().Data(), bufout.Buffer().Size()); TStringBuf rootAnswers = "abcdf"; - TestFirstSymbolIteratorForTrie(trie, rootAnswers); + TestFirstSymbolIteratorForTrie(trie, rootAnswers); TStringBuf aAnswers = "abcd"; - TestFirstSymbolIteratorForTrie(trie.FindTails(MakeWideKey<TSymbol>("a", 1)), aAnswers); -} - -void TCompactTrieTest::TestFirstSymbolIterator8() { - TestFirstSymbolIterator<char>(); -} - -void TCompactTrieTest::TestFirstSymbolIterator16() { - TestFirstSymbolIterator<wchar16>(); -} - -void TCompactTrieTest::TestFirstSymbolIterator32() { - TestFirstSymbolIterator<ui32>(); -} - + TestFirstSymbolIteratorForTrie(trie.FindTails(MakeWideKey<TSymbol>("a", 1)), aAnswers); +} + +void TCompactTrieTest::TestFirstSymbolIterator8() { + TestFirstSymbolIterator<char>(); +} + +void TCompactTrieTest::TestFirstSymbolIterator16() { + TestFirstSymbolIterator<wchar16>(); +} + +void TCompactTrieTest::TestFirstSymbolIterator32() { + TestFirstSymbolIterator<ui32>(); +} + void TCompactTrieTest::TestFirstSymbolIteratorChar32() { TestFirstSymbolIterator<wchar32>(); } - + void TCompactTrieTest::TestArrayPacker() { using TDataInt = std::array<int, 2>; diff --git a/library/cpp/containers/comptrie/first_symbol_iterator.h b/library/cpp/containers/comptrie/first_symbol_iterator.h index 6dbd2f9296..d06135f06f 100644 --- a/library/cpp/containers/comptrie/first_symbol_iterator.h +++ b/library/cpp/containers/comptrie/first_symbol_iterator.h @@ -1,61 +1,61 @@ -#pragma once - -#include "opaque_trie_iterator.h" -#include <util/generic/ptr.h> - -namespace NCompactTrie { - // Iterates over possible first symbols in a trie. - // Allows one to get the symbol and the subtrie starting from it. +#pragma once + +#include "opaque_trie_iterator.h" +#include <util/generic/ptr.h> + +namespace NCompactTrie { + // Iterates over possible first symbols in a trie. + // Allows one to get the symbol and the subtrie starting from it. template <class TTrie> - class TFirstSymbolIterator { - public: - using TSymbol = typename TTrie::TSymbol; - using TData = typename TTrie::TData; - - void SetTrie(const TTrie& trie, const ILeafSkipper& skipper) { - Trie = trie; - Impl.Reset(new TOpaqueTrieIterator( - TOpaqueTrie(Trie.Data().AsCharPtr(), Trie.Data().Size(), skipper), - nullptr, - false, - sizeof(TSymbol))); - if (Impl->MeasureKey<TSymbol>() == 0) { - MakeStep(); - } - } - - const TTrie& GetTrie() const { - return Trie; - } - - bool AtEnd() const { - return *Impl == TOpaqueTrieIterator(Impl->GetTrie(), nullptr, true, sizeof(TSymbol)); - } - - TSymbol GetKey() const { - return Impl->GetKey<TSymbol>()[0]; - } - - TTrie GetTails() const { - const TNode& node = Impl->GetNode(); - const size_t forwardOffset = node.GetForwardOffset(); - const char* emptyValue = node.IsFinal() ? Trie.Data().AsCharPtr() + node.GetLeafOffset() : nullptr; - if (forwardOffset) { - const char* start = Trie.Data().AsCharPtr() + forwardOffset; - TBlob body = TBlob::NoCopy(start, Trie.Data().Size() - forwardOffset); - return TTrie(body, emptyValue, Trie.GetPacker()); - } else { - return TTrie(emptyValue); - } - } - - void MakeStep() { - Impl->Forward(); - } - - private: - TTrie Trie; - TCopyPtr<TOpaqueTrieIterator> Impl; - }; - + class TFirstSymbolIterator { + public: + using TSymbol = typename TTrie::TSymbol; + using TData = typename TTrie::TData; + + void SetTrie(const TTrie& trie, const ILeafSkipper& skipper) { + Trie = trie; + Impl.Reset(new TOpaqueTrieIterator( + TOpaqueTrie(Trie.Data().AsCharPtr(), Trie.Data().Size(), skipper), + nullptr, + false, + sizeof(TSymbol))); + if (Impl->MeasureKey<TSymbol>() == 0) { + MakeStep(); + } + } + + const TTrie& GetTrie() const { + return Trie; + } + + bool AtEnd() const { + return *Impl == TOpaqueTrieIterator(Impl->GetTrie(), nullptr, true, sizeof(TSymbol)); + } + + TSymbol GetKey() const { + return Impl->GetKey<TSymbol>()[0]; + } + + TTrie GetTails() const { + const TNode& node = Impl->GetNode(); + const size_t forwardOffset = node.GetForwardOffset(); + const char* emptyValue = node.IsFinal() ? Trie.Data().AsCharPtr() + node.GetLeafOffset() : nullptr; + if (forwardOffset) { + const char* start = Trie.Data().AsCharPtr() + forwardOffset; + TBlob body = TBlob::NoCopy(start, Trie.Data().Size() - forwardOffset); + return TTrie(body, emptyValue, Trie.GetPacker()); + } else { + return TTrie(emptyValue); + } + } + + void MakeStep() { + Impl->Forward(); + } + + private: + TTrie Trie; + TCopyPtr<TOpaqueTrieIterator> Impl; + }; + } diff --git a/library/cpp/containers/comptrie/key_selector.h b/library/cpp/containers/comptrie/key_selector.h index 01387947b4..60466cef71 100644 --- a/library/cpp/containers/comptrie/key_selector.h +++ b/library/cpp/containers/comptrie/key_selector.h @@ -1,28 +1,28 @@ -#pragma once - -#include <util/generic/vector.h> +#pragma once + +#include <util/generic/vector.h> #include <util/generic/string.h> -#include <util/generic/strbuf.h> - +#include <util/generic/strbuf.h> + template <class T> -struct TCompactTrieKeySelector { +struct TCompactTrieKeySelector { typedef TVector<T> TKey; typedef TVector<T> TKeyBuf; -}; - +}; + template <class TChar> -struct TCompactTrieCharKeySelector { +struct TCompactTrieCharKeySelector { typedef TBasicString<TChar> TKey; typedef TBasicStringBuf<TChar> TKeyBuf; -}; - +}; + template <> struct TCompactTrieKeySelector<char>: public TCompactTrieCharKeySelector<char> { -}; - +}; + template <> struct TCompactTrieKeySelector<wchar16>: public TCompactTrieCharKeySelector<wchar16> { -}; +}; template <> struct TCompactTrieKeySelector<wchar32>: public TCompactTrieCharKeySelector<wchar32> { diff --git a/library/cpp/containers/comptrie/leaf_skipper.h b/library/cpp/containers/comptrie/leaf_skipper.h index 9218735a2b..3959258948 100644 --- a/library/cpp/containers/comptrie/leaf_skipper.h +++ b/library/cpp/containers/comptrie/leaf_skipper.h @@ -1,6 +1,6 @@ #pragma once -#include <cstddef> +#include <cstddef> namespace NCompactTrie { class ILeafSkipper { @@ -23,34 +23,34 @@ namespace NCompactTrie { size_t SkipLeaf(const char* p) const override { return Packer->SkipLeaf(p); } - - // For test purposes. + + // For test purposes. const TPacker* GetPacker() const { return Packer; } }; - - // The data you need to traverse the trie without unpacking the values. - struct TOpaqueTrie { - const char* Data; - size_t Length; - const ILeafSkipper& SkipFunction; - - TOpaqueTrie(const char* data, size_t dataLength, const ILeafSkipper& skipFunction) - : Data(data) - , Length(dataLength) - , SkipFunction(skipFunction) + + // The data you need to traverse the trie without unpacking the values. + struct TOpaqueTrie { + const char* Data; + size_t Length; + const ILeafSkipper& SkipFunction; + + TOpaqueTrie(const char* data, size_t dataLength, const ILeafSkipper& skipFunction) + : Data(data) + , Length(dataLength) + , SkipFunction(skipFunction) { } - - bool operator==(const TOpaqueTrie& other) const { - return Data == other.Data && + + bool operator==(const TOpaqueTrie& other) const { + return Data == other.Data && Length == other.Length && &SkipFunction == &other.SkipFunction; - } - - bool operator!=(const TOpaqueTrie& other) const { - return !(*this == other); - } - }; + } + + bool operator!=(const TOpaqueTrie& other) const { + return !(*this == other); + } + }; } diff --git a/library/cpp/containers/comptrie/make_fast_layout.cpp b/library/cpp/containers/comptrie/make_fast_layout.cpp index 3dea56b7b9..ade78d7899 100644 --- a/library/cpp/containers/comptrie/make_fast_layout.cpp +++ b/library/cpp/containers/comptrie/make_fast_layout.cpp @@ -1,70 +1,70 @@ -#include "make_fast_layout.h" -#include "node.h" -#include "writeable_node.h" -#include "write_trie_backwards.h" -#include "comptrie_impl.h" +#include "make_fast_layout.h" +#include "node.h" +#include "writeable_node.h" +#include "write_trie_backwards.h" +#include "comptrie_impl.h" #include <util/generic/hash.h> -#include <util/generic/utility.h> - -// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. -// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. -// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. -// Calling it the second time on the same trie does nothing. -// -// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms -// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels -// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). -// The original paper (describing the layout in Section 2.1) is: -// Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341–358. -// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ -// Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium -// on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12–14, 2000, pages 399–409. -// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ -// (there is not much difference between these papers, actually). -// -namespace NCompactTrie { +#include <util/generic/utility.h> + +// Lay the trie in memory in such a way that there are less cache misses when jumping from root to leaf. +// The trie becomes about 2% larger, but the access became about 25% faster in our experiments. +// Can be called on minimized and non-minimized tries, in the first case in requires half a trie more memory. +// Calling it the second time on the same trie does nothing. +// +// The algorithm is based on van Emde Boas layout as described in the yandex data school lectures on external memory algoritms +// by Maxim Babenko and Ivan Puzyrevsky. The difference is that when we cut the tree into levels +// two nodes connected by a forward link are put into the same level (because they usually lie near each other in the original tree). +// The original paper (describing the layout in Section 2.1) is: +// Michael A. Bender, Erik D. Demaine, Martin Farach-Colton. Cache-Oblivious B-Trees // SIAM Journal on Computing, volume 35, number 2, 2005, pages 341–358. +// Available on the web: http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/ +// Or: Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees // Proceedings of the 41st Annual Symposium +// on Foundations of Computer Science (FOCS 2000), Redondo Beach, California, November 12–14, 2000, pages 399–409. +// Available on the web: http://erikdemaine.org/papers/FOCS2000b/ +// (there is not much difference between these papers, actually). +// +namespace NCompactTrie { static size_t FindSupportingPowerOf2(size_t n) { size_t result = 1ull << (8 * sizeof(size_t) - 1); while (result > n) { result >>= 1; } return result; - } - + } + namespace { class TTrieNodeSet { public: TTrieNodeSet() = default; - + explicit TTrieNodeSet(const TOpaqueTrie& trie) : Body(trie.Length / (8 * MinNodeSize) + 1, 0) { } - + bool Has(size_t offset) const { const size_t reducedOffset = ReducedOffset(offset); return OffsetCell(reducedOffset) & OffsetMask(reducedOffset); } - + void Add(size_t offset) { const size_t reducedOffset = ReducedOffset(offset); OffsetCell(reducedOffset) |= OffsetMask(reducedOffset); } - + void Remove(size_t offset) { const size_t reducedOffset = ReducedOffset(offset); OffsetCell(reducedOffset) &= ~OffsetMask(reducedOffset); } - + void Swap(TTrieNodeSet& other) { Body.swap(other.Body); } - + private: static const size_t MinNodeSize = 2; TVector<ui8> Body; - + static size_t ReducedOffset(size_t offset) { return offset / MinNodeSize; } @@ -78,23 +78,23 @@ namespace NCompactTrie { return Body.at(reducedOffset / 8); } }; - + //--------------------------------------------------------------------- - + class TTrieNodeCounts { public: TTrieNodeCounts() = default; - + explicit TTrieNodeCounts(const TOpaqueTrie& trie) : Body(trie.Length / MinNodeSize, 0) , IsTree(false) { } - + size_t Get(size_t offset) const { return IsTree ? 1 : Body.at(offset / MinNodeSize); } - + void Inc(size_t offset) { if (IsTree) { return; @@ -104,7 +104,7 @@ namespace NCompactTrie { ++count; } } - + size_t Dec(size_t offset) { if (IsTree) { return 0; @@ -116,33 +116,33 @@ namespace NCompactTrie { } return count; } - + void Swap(TTrieNodeCounts& other) { Body.swap(other.Body); ::DoSwap(IsTree, other.IsTree); } - + void SetTreeMode() { IsTree = true; Body = TVector<ui8>(); } - + private: static const size_t MinNodeSize = 2; static const ui8 MaxCount = 255; - + TVector<ui8> Body; bool IsTree = false; }; - + //---------------------------------------------------------- - + class TOffsetIndex { public: // In all methods: // Key --- offset from the beginning of the old trie. // Value --- offset from the end of the new trie. - + explicit TOffsetIndex(TTrieNodeCounts& counts) { ParentCounts.Swap(counts); } @@ -150,7 +150,7 @@ namespace NCompactTrie { void Add(size_t key, size_t value) { Data[key] = value; } - + size_t Size() const { return Data.size(); } @@ -172,14 +172,14 @@ namespace NCompactTrie { TTrieNodeCounts ParentCounts; THashMap<size_t, size_t> Data; }; - + //--------------------------------------------------------------------------------------- class TTrieMeasurer { public: TTrieMeasurer(const TOpaqueTrie& trie, bool verbose); void Measure(); - + size_t GetDepth() const { return Depth; } @@ -187,23 +187,23 @@ namespace NCompactTrie { size_t GetNodeCount() const { return NodeCount; } - + size_t GetUnminimizedNodeCount() const { return UnminimizedNodeCount; } - + bool IsTree() const { return NodeCount == UnminimizedNodeCount; } - + TTrieNodeCounts& GetParentCounts() { return ParentCounts; } - + const TOpaqueTrie& GetTrie() const { return Trie; } - + private: const TOpaqueTrie& Trie; size_t Depth; @@ -211,11 +211,11 @@ namespace NCompactTrie { size_t NodeCount; size_t UnminimizedNodeCount; const bool Verbose; - + // returns depth, increments NodeCount. size_t MeasureSubtrie(size_t rootOffset, bool isNewPath); }; - + TTrieMeasurer::TTrieMeasurer(const TOpaqueTrie& trie, bool verbose) : Trie(trie) , Depth(0) @@ -243,7 +243,7 @@ namespace NCompactTrie { Cerr << "Node count: " << NodeCount << Endl; } } - + // A chain of nodes linked by forward links // is considered one node with many left and right children // for depth measuring here and in @@ -276,14 +276,14 @@ namespace NCompactTrie { } else { break; } - } + } return depth; - } + } //-------------------------------------------------------------------------------------- - + using TLevelNodes = TVector<size_t>; - + struct TLevel { size_t Depth; TLevelNodes Nodes; @@ -295,7 +295,7 @@ namespace NCompactTrie { }; //---------------------------------------------------------------------------------------- - + class TVanEmdeBoasReverseNodeEnumerator: public TReverseNodeEnumerator { public: TVanEmdeBoasReverseNodeEnumerator(TTrieMeasurer& measurer, bool verbose) @@ -326,7 +326,7 @@ namespace NCompactTrie { size_t GetLeafLength() const override { return Get().GetLeafLength(); } - + // Returns recalculated offset from the end of the current node. size_t PrepareOffset(size_t absoffset, size_t resultLength) { if (!absoffset) @@ -352,19 +352,19 @@ namespace NCompactTrie { TOpaqueTrie Trie; size_t Depth; size_t MaxIndexSize; - + TVector<TLevel> Trace; TOffsetIndex BackIndex; TVector<TNode> CentralWord; TTrieNodeSet ProcessedNodes; - + const bool Verbose; - + private: bool IsVisited(size_t offset) const { return ProcessedNodes.Has(offset); } - + void AddCascade(size_t step) { Y_ASSERT(!(step & (step - 1))); // Should be a power of 2. while (step > 0) { @@ -381,7 +381,7 @@ namespace NCompactTrie { step /= 2; } } - } + } void FillCentralWord() { CentralWord.clear(); @@ -391,7 +391,7 @@ namespace NCompactTrie { CentralWord.push_back(TNode(Trie.Data, CentralWord.back().GetForwardOffset(), Trie.SkipFunction)); } } - + bool MoveCentralWordStart() { do { if (Fresh) { @@ -420,7 +420,7 @@ namespace NCompactTrie { } while (IsVisited(Trace.back().Nodes.back())); FillCentralWord(); return true; - } + } // Returns the maximal depth it has reached while searching. // This is a method because it needs OffsetIndex to skip processed nodes. @@ -435,7 +435,7 @@ namespace NCompactTrie { if (node.GetLeftOffset() && !IsVisited(node.GetLeftOffset())) { actualDepth = Max(actualDepth, 1 + FindDescendants(node.GetLeftOffset(), depth - 1, result)); - } + } if (node.GetRightOffset() && !IsVisited(node.GetRightOffset())) { actualDepth = Max(actualDepth, 1 + FindDescendants(node.GetRightOffset(), depth - 1, result)); @@ -445,23 +445,23 @@ namespace NCompactTrie { } else { break; } - } + } return actualDepth; - } + } }; - + } // Anonymous namespace. //----------------------------------------------------------------------------------- size_t RawCompactTrieFastLayoutImpl(IOutputStream& os, const TOpaqueTrie& trie, bool verbose) { if (!trie.Data || !trie.Length) { - return 0; - } + return 0; + } TTrieMeasurer mes(trie, verbose); mes.Measure(); TVanEmdeBoasReverseNodeEnumerator enumerator(mes, verbose); return WriteTrieBackwards(os, enumerator, verbose); } -} +} diff --git a/library/cpp/containers/comptrie/make_fast_layout.h b/library/cpp/containers/comptrie/make_fast_layout.h index 65d128a767..b8fab5d65b 100644 --- a/library/cpp/containers/comptrie/make_fast_layout.h +++ b/library/cpp/containers/comptrie/make_fast_layout.h @@ -1,7 +1,7 @@ #pragma once -#include "leaf_skipper.h" -#include <cstddef> +#include "leaf_skipper.h" +#include <cstddef> class IOutputStream; diff --git a/library/cpp/containers/comptrie/minimize.cpp b/library/cpp/containers/comptrie/minimize.cpp index 85260619bc..80d0b25217 100644 --- a/library/cpp/containers/comptrie/minimize.cpp +++ b/library/cpp/containers/comptrie/minimize.cpp @@ -1,11 +1,11 @@ -#include "minimize.h" -#include "node.h" -#include "writeable_node.h" -#include "write_trie_backwards.h" -#include "comptrie_impl.h" +#include "minimize.h" +#include "node.h" +#include "writeable_node.h" +#include "write_trie_backwards.h" +#include "comptrie_impl.h" #include <util/generic/hash.h> -#include <util/generic/algorithm.h> +#include <util/generic/algorithm.h> namespace NCompactTrie { // Minimize the trie. The result is equivalent to the original @@ -27,7 +27,7 @@ namespace NCompactTrie { using TSizePair = std::pair<size_t, size_t>; using TSizePairVector = TVector<TSizePair>; using TSizePairVectorVector = TVector<TSizePairVector>; - + class TOffsetMap { protected: TSizePairVectorVector Data; @@ -180,7 +180,7 @@ namespace NCompactTrie { if (Node.GetRightOffset()) nextOffset = Node.GetRightOffset(); break; - + case 2: if (Node.GetLeftOffset()) nextOffset = Node.GetLeftOffset(); @@ -280,7 +280,7 @@ namespace NCompactTrie { static void AddPiece(TPieceIndex& index, size_t offset, size_t len) { index[len].push_back(offset); } - + static TOffsetMap FindEquivalentSubtries(const TOpaqueTrie& trie, bool verbose, size_t minMergeSize) { // Tree nodes, arranged by span length. // When all nodes of a given size are considered, they pop off the queue. @@ -354,6 +354,6 @@ namespace NCompactTrie { return WriteTrieBackwards(os, enumerator, verbose); else return WriteTrieBackwardsNoAlloc(os, enumerator, trie, mode); - } - -} + } + +} diff --git a/library/cpp/containers/comptrie/minimize.h b/library/cpp/containers/comptrie/minimize.h index 6d2ef30920..baaa431d04 100644 --- a/library/cpp/containers/comptrie/minimize.h +++ b/library/cpp/containers/comptrie/minimize.h @@ -1,7 +1,7 @@ #pragma once -#include "leaf_skipper.h" -#include <cstddef> +#include "leaf_skipper.h" +#include <cstddef> class IOutputStream; diff --git a/library/cpp/containers/comptrie/node.cpp b/library/cpp/containers/comptrie/node.cpp index c3e8e807ce..5fd22f15ec 100644 --- a/library/cpp/containers/comptrie/node.cpp +++ b/library/cpp/containers/comptrie/node.cpp @@ -1,9 +1,9 @@ -#include "node.h" -#include "leaf_skipper.h" +#include "node.h" +#include "leaf_skipper.h" #include "comptrie_impl.h" -#include <util/system/yassert.h> -#include <util/generic/yexception.h> +#include <util/system/yassert.h> +#include <util/generic/yexception.h> namespace NCompactTrie { TNode::TNode() @@ -15,7 +15,7 @@ namespace NCompactTrie { for (auto& offset : Offsets) { offset = 0; } - } + } // We believe that epsilon links are found only on the forward-position and that afer jumping an epsilon link you come to an ordinary node. diff --git a/library/cpp/containers/comptrie/node.h b/library/cpp/containers/comptrie/node.h index 0f14048982..d6f4317db0 100644 --- a/library/cpp/containers/comptrie/node.h +++ b/library/cpp/containers/comptrie/node.h @@ -1,6 +1,6 @@ -#pragma once +#pragma once -#include <cstddef> +#include <cstddef> namespace NCompactTrie { class ILeafSkipper; @@ -17,18 +17,18 @@ namespace NCompactTrie { direction = static_cast<TDirection>(direction + 1); return direction; } - + inline TDirection& operator--(TDirection& direction) { direction = static_cast<TDirection>(direction - 1); return direction; } - + class TNode { public: TNode(); // Processes epsilon links and sets ForwardOffset to correct value. Assumes an epsilon link doesn't point to an epsilon link. TNode(const char* data, size_t offset, const ILeafSkipper& skipFunction); - + size_t GetOffset() const { return Offset; } @@ -59,7 +59,7 @@ namespace NCompactTrie { char GetLabel() const { return Label; } - + bool IsFinal() const { return GetLeafOffset() != 0; } @@ -73,7 +73,7 @@ namespace NCompactTrie { size_t Offset; size_t LeafLength; size_t CoreLength; - + char Label; }; diff --git a/library/cpp/containers/comptrie/opaque_trie_iterator.cpp b/library/cpp/containers/comptrie/opaque_trie_iterator.cpp index 2fef2230cb..5fd3914be6 100644 --- a/library/cpp/containers/comptrie/opaque_trie_iterator.cpp +++ b/library/cpp/containers/comptrie/opaque_trie_iterator.cpp @@ -1,8 +1,8 @@ -#include "opaque_trie_iterator.h" +#include "opaque_trie_iterator.h" #include "comptrie_impl.h" -#include "node.h" +#include "node.h" -namespace NCompactTrie { +namespace NCompactTrie { TOpaqueTrieIterator::TOpaqueTrieIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, size_t maxKeyLength) : Trie(trie) @@ -69,20 +69,20 @@ namespace NCompactTrie { bool TOpaqueTrieIterator::Backward() { if (AtEmptyValue) - return false; - + return false; + if (!Trie.Length) { if (EmptyValue) { // A trie that has only the empty value; // we are not at the empty value, so move to it. - AtEmptyValue = true; - return true; + AtEmptyValue = true; + return true; } else { // Empty trie. return false; - } - } - + } + } + if (Forks.Empty()) { TFork fork(Trie.Data, 0, Trie.Length, Trie.SkipFunction); fork.LastDirection(); @@ -123,8 +123,8 @@ namespace NCompactTrie { Forks.Clear(); } return true; - } - + } + const char* TOpaqueTrieIterator::GetValuePtr() const { if (!Forks.Empty()) { const TFork& lastFork = Forks.Top(); @@ -133,10 +133,10 @@ namespace NCompactTrie { } Y_ASSERT(AtEmptyValue); return EmptyValue; - } - + } + //------------------------------------------------------------------------- - + TString TForkStack::GetKey() const { if (HasEmptyKey()) { return TString(); @@ -147,7 +147,7 @@ namespace NCompactTrie { result.append(Top().GetLabel()); } return result; - } + } bool TForkStack::HasEmptyKey() const { // Special case: if we get a single zero label, treat it as an empty key @@ -175,14 +175,14 @@ namespace NCompactTrie { , Limit(limit) , CurrentDirection(TDirection(0)) { -#if COMPTRIE_DATA_CHECK +#if COMPTRIE_DATA_CHECK if (Node.GetOffset() >= Limit - 1) ythrow yexception() << "gone beyond the limit, data is corrupted"; -#endif +#endif while (CurrentDirection < D_MAX && !HasDirection(CurrentDirection)) { ++CurrentDirection; } - } + } bool TFork::operator==(const TFork& rhs) const { return (Data == rhs.Data && @@ -206,12 +206,12 @@ namespace NCompactTrie { } while (CurrentDirection > 0 && !HasDirection(CurrentDirection)); return HasDirection(CurrentDirection); } - + void TFork::LastDirection() { CurrentDirection = D_MAX; PrevDirection(); - } - + } + bool TFork::SetDirection(TDirection direction) { if (!HasDirection(direction)) { return false; @@ -219,13 +219,13 @@ namespace NCompactTrie { CurrentDirection = direction; return true; } - + char TFork::GetLabel() const { return Node.GetLabel(); } - + size_t TFork::GetValueOffset() const { return Node.GetLeafOffset(); - } - + } + } diff --git a/library/cpp/containers/comptrie/opaque_trie_iterator.h b/library/cpp/containers/comptrie/opaque_trie_iterator.h index 958c226fea..195da3c191 100644 --- a/library/cpp/containers/comptrie/opaque_trie_iterator.h +++ b/library/cpp/containers/comptrie/opaque_trie_iterator.h @@ -1,14 +1,14 @@ #pragma once #include "comptrie_impl.h" -#include "node.h" -#include "key_selector.h" -#include "leaf_skipper.h" - -#include <util/generic/vector.h> -#include <util/generic/yexception.h> - -namespace NCompactTrie { +#include "node.h" +#include "key_selector.h" +#include "leaf_skipper.h" + +#include <util/generic/vector.h> +#include <util/generic/yexception.h> + +namespace NCompactTrie { class ILeafSkipper; class TFork { // Auxiliary class for a branching point in the iterator @@ -30,7 +30,7 @@ namespace NCompactTrie { bool NextDirection(); bool PrevDirection(); void LastDirection(); - + bool HasDirection(TDirection direction) const { return Node.GetOffsetByDirection(direction); } @@ -48,8 +48,8 @@ namespace NCompactTrie { Y_ASSERT(CurrentDirection != D_FINAL); size_t offset = Node.GetOffsetByDirection(CurrentDirection); return TFork(Data, offset, Limit, skipper); - } - + } + //------------------------------------------------------------------------------------------------ class TForkStack { public: @@ -66,10 +66,10 @@ namespace NCompactTrie { Key.pop_back(); } } - + TFork& Top() { return Forks.back(); - } + } const TFork& Top() const { return Forks.back(); } @@ -77,35 +77,35 @@ namespace NCompactTrie { bool Empty() const { return Forks.empty(); } - + void Clear() { Forks.clear(); Key.clear(); - } - + } + bool operator==(const TForkStack& other) const { return Forks == other.Forks; } bool operator!=(const TForkStack& other) const { return !(*this == other); } - + TString GetKey() const; size_t MeasureKey() const; - + private: TVector<TFork> Forks; TString Key; - + private: bool TopHasLabelInKey() const { return !Empty() && Top().HasLabelInKey(); } bool HasEmptyKey() const; }; - + //------------------------------------------------------------------------------------------------ - + template <class TSymbol> struct TConvertRawKey { typedef typename TCompactTrieKeySelector<TSymbol>::TKey TKey; @@ -127,37 +127,37 @@ namespace NCompactTrie { } return result; } - + static size_t Size(size_t rawsize) { return rawsize / sizeof(TSymbol); } }; - + template <> struct TConvertRawKey<char> { static TString Get(const TString& rawkey) { return rawkey; } - + static size_t Size(size_t rawsize) { return rawsize; - } + } }; - + //------------------------------------------------------------------------------------------------ class TOpaqueTrieIterator { // Iterator stuff. Stores a stack of visited forks. public: TOpaqueTrieIterator(const TOpaqueTrie& trie, const char* emptyValue, bool atend, size_t maxKeyLength = size_t(-1)); - + bool operator==(const TOpaqueTrieIterator& rhs) const; bool operator!=(const TOpaqueTrieIterator& rhs) const { return !(*this == rhs); } - + bool Forward(); bool Backward(); - + template <class TSymbol> bool UpperBound(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key); // True if matched exactly. @@ -185,22 +185,22 @@ namespace NCompactTrie { const TOpaqueTrie& GetTrie() const { return Trie; } - + private: TOpaqueTrie Trie; TForkStack Forks; const char* const EmptyValue; bool AtEmptyValue; const size_t MaxKeyLength; - + private: bool HasMaxKeyLength() const; - + template <class TSymbol> int LongestPrefix(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key); // Used in UpperBound. }; - template <class TSymbol> + template <class TSymbol> int TOpaqueTrieIterator::LongestPrefix(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key) { Forks.Clear(); TFork next(Trie.Data, 0, Trie.Length, Trie.SkipFunction); @@ -230,17 +230,17 @@ namespace NCompactTrie { } } else if (!top.SetDirection(D_NEXT)) { top.SetDirection(D_FINAL); - return -1; - } + return -1; + } next = top.NextFork(Trie.SkipFunction); } while (Forks.Top().CurrentDirection != D_NEXT); // Proceed to the next byte. } - } + } // We get here only if the key was empty. Forks.Push(next); return 1; - } - + } + template <class TSymbol> bool TOpaqueTrieIterator::UpperBound(const typename TCompactTrieKeySelector<TSymbol>::TKeyBuf& key) { Forks.Clear(); @@ -249,7 +249,7 @@ namespace NCompactTrie { return true; } else { AtEmptyValue = false; - } + } const int defect = LongestPrefix<TSymbol>(key); if (defect > 0) { // Continue the constructed forks with the smallest key possible. @@ -261,6 +261,6 @@ namespace NCompactTrie { Forward(); } return defect == 0; - } + } -} +} diff --git a/library/cpp/containers/comptrie/prefix_iterator.cpp b/library/cpp/containers/comptrie/prefix_iterator.cpp index aec8c6ed28..5d4dfa3500 100644 --- a/library/cpp/containers/comptrie/prefix_iterator.cpp +++ b/library/cpp/containers/comptrie/prefix_iterator.cpp @@ -1 +1 @@ -#include "prefix_iterator.h" +#include "prefix_iterator.h" diff --git a/library/cpp/containers/comptrie/prefix_iterator.h b/library/cpp/containers/comptrie/prefix_iterator.h index 23b9cbf7c6..b369bb4f42 100644 --- a/library/cpp/containers/comptrie/prefix_iterator.h +++ b/library/cpp/containers/comptrie/prefix_iterator.h @@ -1,88 +1,88 @@ -#pragma once +#pragma once -#include "comptrie_trie.h" - -// Iterates over all prefixes of the given key in the trie. -template <class TTrie> -class TPrefixIterator { -public: - using TSymbol = typename TTrie::TSymbol; - using TPacker = typename TTrie::TPacker; - using TData = typename TTrie::TData; - -private: - const TTrie& Trie; - const TSymbol* key; - size_t keylen; - const TSymbol* keyend; - size_t prefixLen; - const char* valuepos; - const char* datapos; - const char* dataend; - TPacker Packer; - const char* EmptyValue; - bool result; - - bool Next(); - -public: +#include "comptrie_trie.h" + +// Iterates over all prefixes of the given key in the trie. +template <class TTrie> +class TPrefixIterator { +public: + using TSymbol = typename TTrie::TSymbol; + using TPacker = typename TTrie::TPacker; + using TData = typename TTrie::TData; + +private: + const TTrie& Trie; + const TSymbol* key; + size_t keylen; + const TSymbol* keyend; + size_t prefixLen; + const char* valuepos; + const char* datapos; + const char* dataend; + TPacker Packer; + const char* EmptyValue; + bool result; + + bool Next(); + +public: TPrefixIterator(const TTrie& trie, const TSymbol* aKey, size_t aKeylen) - : Trie(trie) + : Trie(trie) , key(aKey) , keylen(aKeylen) , keyend(aKey + aKeylen) - , prefixLen(0) - , valuepos(nullptr) - , datapos(trie.DataHolder.AsCharPtr()) - , dataend(datapos + trie.DataHolder.Length()) - { - result = Next(); - } - + , prefixLen(0) + , valuepos(nullptr) + , datapos(trie.DataHolder.AsCharPtr()) + , dataend(datapos + trie.DataHolder.Length()) + { + result = Next(); + } + operator bool() const { - return result; - } - + return result; + } + TPrefixIterator& operator++() { - result = Next(); - return *this; - } - - size_t GetPrefixLen() const { - return prefixLen; - } - - void GetValue(TData& to) const { - Trie.Packer.UnpackLeaf(valuepos, to); - } -}; - -template <class TTrie> -bool TPrefixIterator<TTrie>::Next() { - using namespace NCompactTrie; - if (!key || datapos == dataend) - return false; - - if ((key == keyend - keylen) && !valuepos && Trie.EmptyValue) { - valuepos = Trie.EmptyValue; - return true; - } - - while (datapos && key != keyend) { - TSymbol label = *(key++); - if (!Advance(datapos, dataend, valuepos, label, Packer)) { - return false; - } - if (valuepos) { // There is a value at the end of this symbol. - prefixLen = keylen - (keyend - key); - return true; - } - } - - return false; -} - -template <class TTrie> -TPrefixIterator<TTrie> MakePrefixIterator(const TTrie& trie, const typename TTrie::TSymbol* key, size_t keylen) { - return TPrefixIterator<TTrie>(trie, key, keylen); -} + result = Next(); + return *this; + } + + size_t GetPrefixLen() const { + return prefixLen; + } + + void GetValue(TData& to) const { + Trie.Packer.UnpackLeaf(valuepos, to); + } +}; + +template <class TTrie> +bool TPrefixIterator<TTrie>::Next() { + using namespace NCompactTrie; + if (!key || datapos == dataend) + return false; + + if ((key == keyend - keylen) && !valuepos && Trie.EmptyValue) { + valuepos = Trie.EmptyValue; + return true; + } + + while (datapos && key != keyend) { + TSymbol label = *(key++); + if (!Advance(datapos, dataend, valuepos, label, Packer)) { + return false; + } + if (valuepos) { // There is a value at the end of this symbol. + prefixLen = keylen - (keyend - key); + return true; + } + } + + return false; +} + +template <class TTrie> +TPrefixIterator<TTrie> MakePrefixIterator(const TTrie& trie, const typename TTrie::TSymbol* key, size_t keylen) { + return TPrefixIterator<TTrie>(trie, key, keylen); +} diff --git a/library/cpp/containers/comptrie/search_iterator.cpp b/library/cpp/containers/comptrie/search_iterator.cpp index cbeac7b2bc..eb91523574 100644 --- a/library/cpp/containers/comptrie/search_iterator.cpp +++ b/library/cpp/containers/comptrie/search_iterator.cpp @@ -1 +1 @@ -#include "search_iterator.h" +#include "search_iterator.h" diff --git a/library/cpp/containers/comptrie/search_iterator.h b/library/cpp/containers/comptrie/search_iterator.h index 17f6c2bb00..247f7e5936 100644 --- a/library/cpp/containers/comptrie/search_iterator.h +++ b/library/cpp/containers/comptrie/search_iterator.h @@ -1,38 +1,38 @@ -#pragma once +#pragma once -#include "comptrie_trie.h" +#include "comptrie_trie.h" #include "first_symbol_iterator.h" - + #include <util/str_stl.h> #include <util/digest/numeric.h> #include <util/digest/multi.h> -// Iterator for incremental searching. -// All Advance() methods shift the iterator using specifed key/char. -// The subsequent Advance() call starts searching from the previous state. -// The Advance() returns 'true' if specified key part exists in the trie and -// returns 'false' for unsuccessful search. In case of 'false' result -// all subsequent calls also will return 'false'. -// If current iterator state is final then GetValue() method returns 'true' and -// associated value. - -template <class TTrie> -class TSearchIterator { -public: - using TData = typename TTrie::TData; - using TSymbol = typename TTrie::TSymbol; - using TKeyBuf = typename TTrie::TKeyBuf; - +// Iterator for incremental searching. +// All Advance() methods shift the iterator using specifed key/char. +// The subsequent Advance() call starts searching from the previous state. +// The Advance() returns 'true' if specified key part exists in the trie and +// returns 'false' for unsuccessful search. In case of 'false' result +// all subsequent calls also will return 'false'. +// If current iterator state is final then GetValue() method returns 'true' and +// associated value. + +template <class TTrie> +class TSearchIterator { +public: + using TData = typename TTrie::TData; + using TSymbol = typename TTrie::TSymbol; + using TKeyBuf = typename TTrie::TKeyBuf; + TSearchIterator() = default; - explicit TSearchIterator(const TTrie& trie) - : Trie(&trie) - , DataPos(trie.DataHolder.AsCharPtr()) - , DataEnd(DataPos + trie.DataHolder.Length()) + explicit TSearchIterator(const TTrie& trie) + : Trie(&trie) + , DataPos(trie.DataHolder.AsCharPtr()) + , DataEnd(DataPos + trie.DataHolder.Length()) , ValuePos(trie.EmptyValue) - { - } - + { + } + explicit TSearchIterator(const TTrie& trie, const TTrie& subTrie) : Trie(&trie) , DataPos(subTrie.Data().AsCharPtr()) @@ -52,33 +52,33 @@ public: return !(*this == other); } - inline bool Advance(TSymbol label) { + inline bool Advance(TSymbol label) { Y_ASSERT(Trie); - if (DataPos == nullptr || DataPos >= DataEnd) { - return false; - } - return NCompactTrie::Advance(DataPos, DataEnd, ValuePos, label, Trie->Packer); - } - inline bool Advance(const TKeyBuf& key) { + if (DataPos == nullptr || DataPos >= DataEnd) { + return false; + } + return NCompactTrie::Advance(DataPos, DataEnd, ValuePos, label, Trie->Packer); + } + inline bool Advance(const TKeyBuf& key) { return Advance(key.data(), key.size()); - } - bool Advance(const TSymbol* key, size_t keylen); - bool GetValue(TData* value = nullptr) const; + } + bool Advance(const TSymbol* key, size_t keylen); + bool GetValue(TData* value = nullptr) const; bool HasValue() const; inline size_t GetHash() const; - -private: + +private: const TTrie* Trie = nullptr; const char* DataPos = nullptr; const char* DataEnd = nullptr; const char* ValuePos = nullptr; -}; - +}; + template <class TTrie> -inline TSearchIterator<TTrie> MakeSearchIterator(const TTrie& trie) { - return TSearchIterator<TTrie>(trie); -} - +inline TSearchIterator<TTrie> MakeSearchIterator(const TTrie& trie) { + return TSearchIterator<TTrie>(trie); +} + template <class TTrie> struct THash<TSearchIterator<TTrie>> { inline size_t operator()(const TSearchIterator<TTrie>& item) { @@ -86,42 +86,42 @@ struct THash<TSearchIterator<TTrie>> { } }; -//---------------------------------------------------------------------------- - -template <class TTrie> -bool TSearchIterator<TTrie>::Advance(const TSymbol* key, size_t keylen) { +//---------------------------------------------------------------------------- + +template <class TTrie> +bool TSearchIterator<TTrie>::Advance(const TSymbol* key, size_t keylen) { Y_ASSERT(Trie); - if (!key || DataPos == nullptr || DataPos >= DataEnd) { - return false; - } - if (!keylen) { - return true; - } - - const TSymbol* keyend = key + keylen; - while (key != keyend && DataPos != nullptr) { - if (!NCompactTrie::Advance(DataPos, DataEnd, ValuePos, *(key++), Trie->Packer)) { - return false; - } - if (key == keyend) { - return true; - } - } - return false; -} - -template <class TTrie> -bool TSearchIterator<TTrie>::GetValue(TData* value) const { + if (!key || DataPos == nullptr || DataPos >= DataEnd) { + return false; + } + if (!keylen) { + return true; + } + + const TSymbol* keyend = key + keylen; + while (key != keyend && DataPos != nullptr) { + if (!NCompactTrie::Advance(DataPos, DataEnd, ValuePos, *(key++), Trie->Packer)) { + return false; + } + if (key == keyend) { + return true; + } + } + return false; +} + +template <class TTrie> +bool TSearchIterator<TTrie>::GetValue(TData* value) const { Y_ASSERT(Trie); - bool result = false; - if (value) { - if (ValuePos) { - result = true; - Trie->Packer.UnpackLeaf(ValuePos, *value); - } - } - return result; -} + bool result = false; + if (value) { + if (ValuePos) { + result = true; + Trie->Packer.UnpackLeaf(ValuePos, *value); + } + } + return result; +} template <class TTrie> bool TSearchIterator<TTrie>::HasValue() const { diff --git a/library/cpp/containers/comptrie/write_trie_backwards.cpp b/library/cpp/containers/comptrie/write_trie_backwards.cpp index 93155490c4..fd8c28b0ed 100644 --- a/library/cpp/containers/comptrie/write_trie_backwards.cpp +++ b/library/cpp/containers/comptrie/write_trie_backwards.cpp @@ -1,10 +1,10 @@ -#include "write_trie_backwards.h" - -#include "comptrie_impl.h" +#include "write_trie_backwards.h" + +#include "comptrie_impl.h" #include "leaf_skipper.h" -#include <util/generic/buffer.h> -#include <util/generic/vector.h> +#include <util/generic/buffer.h> +#include <util/generic/vector.h> namespace NCompactTrie { size_t WriteTrieBackwards(IOutputStream& os, TReverseNodeEnumerator& enumerator, bool verbose) { diff --git a/library/cpp/containers/comptrie/write_trie_backwards.h b/library/cpp/containers/comptrie/write_trie_backwards.h index b398f9607f..634e6b811a 100644 --- a/library/cpp/containers/comptrie/write_trie_backwards.h +++ b/library/cpp/containers/comptrie/write_trie_backwards.h @@ -1,23 +1,23 @@ -#pragma once - +#pragma once + #include "minimize.h" #include <util/generic/vector.h> -#include <util/stream/output.h> -#include <cstddef> - -namespace NCompactTrie { - class TReverseNodeEnumerator { - public: +#include <util/stream/output.h> +#include <cstddef> + +namespace NCompactTrie { + class TReverseNodeEnumerator { + public: virtual ~TReverseNodeEnumerator() = default; - virtual bool Move() = 0; - virtual size_t GetLeafLength() const = 0; - virtual size_t RecreateNode(char* buffer, size_t resultLength) = 0; - }; - + virtual bool Move() = 0; + virtual size_t GetLeafLength() const = 0; + virtual size_t RecreateNode(char* buffer, size_t resultLength) = 0; + }; + struct TOpaqueTrie; size_t WriteTrieBackwards(IOutputStream& os, TReverseNodeEnumerator& enumerator, bool verbose); size_t WriteTrieBackwardsNoAlloc(IOutputStream& os, TReverseNodeEnumerator& enumerator, TOpaqueTrie& trie, EMinimizeMode mode); - -} + +} diff --git a/library/cpp/containers/comptrie/writeable_node.cpp b/library/cpp/containers/comptrie/writeable_node.cpp index 749e4fc8b9..404003dbbd 100644 --- a/library/cpp/containers/comptrie/writeable_node.cpp +++ b/library/cpp/containers/comptrie/writeable_node.cpp @@ -1,6 +1,6 @@ -#include "writeable_node.h" -#include "node.h" -#include "comptrie_impl.h" +#include "writeable_node.h" +#include "node.h" +#include "comptrie_impl.h" namespace NCompactTrie { TWriteableNode::TWriteableNode() @@ -26,7 +26,7 @@ namespace NCompactTrie { , Label(node.GetLabel()) { } - + size_t TWriteableNode::Measure() const { size_t len = 2 + LeafLength; size_t fwdLen = 0; @@ -77,7 +77,7 @@ namespace NCompactTrie { size_t usedLen = 2; usedLen += PackOffset(buffer + usedLen, leftOffset); usedLen += PackOffset(buffer + usedLen, rightOffset); - + if (LeafPos && LeafLength) { memcpy(buffer + usedLen, LeafPos, LeafLength); usedLen += LeafLength; diff --git a/library/cpp/containers/comptrie/writeable_node.h b/library/cpp/containers/comptrie/writeable_node.h index a5878176e3..5454e579ef 100644 --- a/library/cpp/containers/comptrie/writeable_node.h +++ b/library/cpp/containers/comptrie/writeable_node.h @@ -1,26 +1,26 @@ -#pragma once - -#include <cstddef> - -namespace NCompactTrie { +#pragma once + +#include <cstddef> + +namespace NCompactTrie { class TNode; - + class TWriteableNode { public: const char* LeafPos; size_t LeafLength; - + size_t ForwardOffset; size_t LeftOffset; size_t RightOffset; char Label; - + TWriteableNode(); TWriteableNode(const TNode& node, const char* data); - + // When you call this, the offsets should be relative to the end of the node. Use NPOS to indicate an absent offset. size_t Pack(char* buffer) const; size_t Measure() const; }; - -} + +} diff --git a/library/cpp/containers/comptrie/ya.make b/library/cpp/containers/comptrie/ya.make index b24b9787c5..81352da4b2 100644 --- a/library/cpp/containers/comptrie/ya.make +++ b/library/cpp/containers/comptrie/ya.make @@ -1,28 +1,28 @@ LIBRARY() -OWNER(velavokr) +OWNER(velavokr) SRCS( array_with_size.h chunked_helpers_trie.h comptrie.h - comptrie_packer.h + comptrie_packer.h comptrie_trie.h - first_symbol_iterator.h - key_selector.h - leaf_skipper.h + first_symbol_iterator.h + key_selector.h + leaf_skipper.h set.h comptrie.cpp - comptrie_builder.cpp + comptrie_builder.cpp comptrie_impl.cpp - make_fast_layout.cpp - minimize.cpp - node.cpp - opaque_trie_iterator.cpp - prefix_iterator.cpp - search_iterator.cpp - write_trie_backwards.cpp - writeable_node.cpp + make_fast_layout.cpp + minimize.cpp + node.cpp + opaque_trie_iterator.cpp + prefix_iterator.cpp + search_iterator.cpp + write_trie_backwards.cpp + writeable_node.cpp ) PEERDIR( diff --git a/library/cpp/lfalloc/lf_allocX64.h b/library/cpp/lfalloc/lf_allocX64.h index 8ae2c58adc..fd2a906d6f 100644 --- a/library/cpp/lfalloc/lf_allocX64.h +++ b/library/cpp/lfalloc/lf_allocX64.h @@ -301,12 +301,12 @@ enum EMMapMode { MM_HUGE // memory for large allocs }; -#ifndef _MSC_VER +#ifndef _MSC_VER inline void VerifyMmapResult(void* result) { if (Y_UNLIKELY(result == MAP_FAILED)) NMalloc::AbortFromCorruptedAllocator("negative size requested? or just out of mem"); } -#endif +#endif #if !defined(_MSC_VER) && !defined(_freebsd_) && defined(_64_) static char* AllocWithMMapLinuxImpl(uintptr_t sz, EMMapMode mode) { diff --git a/library/cpp/packers/packers.h b/library/cpp/packers/packers.h index 3ce454ba28..1bde1b59aa 100644 --- a/library/cpp/packers/packers.h +++ b/library/cpp/packers/packers.h @@ -32,23 +32,23 @@ public: template <typename T> class TAsIsPacker { // this packer is not really a packer... -public: +public: void UnpackLeaf(const char* p, T& t) const { memcpy(&t, p, sizeof(T)); - } - void PackLeaf(char* buffer, const T& data, size_t computedSize) const { + } + void PackLeaf(char* buffer, const T& data, size_t computedSize) const { Y_ASSERT(computedSize == sizeof(data)); memcpy(buffer, &data, sizeof(T)); - } - size_t MeasureLeaf(const T& data) const { + } + size_t MeasureLeaf(const T& data) const { Y_UNUSED(data); return sizeof(T); - } + } size_t SkipLeaf(const char*) const { - return sizeof(T); - } -}; - + return sizeof(T); + } +}; + // Implementation namespace NPackers { @@ -91,9 +91,9 @@ namespace NPackers { return NImpl::TConvertImpl<T, std::is_signed<T>::value>::Convert(data); } - //--------------------------------- - // TIntegralPacker --- for integral types. - + //--------------------------------- + // TIntegralPacker --- for integral types. + template <class T> class TIntegralPacker { // can pack only integral types <= ui64 public: @@ -196,7 +196,7 @@ namespace NPackers { return TIntegralPacker<ui64>().SkipLeaf(p); } - //------------------------------------------- + //------------------------------------------- // TFPPacker --- for float/double namespace NImpl { template <class TFloat, class TUInt> @@ -250,7 +250,7 @@ namespace NPackers { //------------------------------------------- // TStringPacker --- for TString/TUtf16String and TStringBuf. - + template <class TStringType> class TStringPacker { public: @@ -486,92 +486,92 @@ namespace NPackers { return size1 + size2; } - //------------------------------------------------------------------------------------------ + //------------------------------------------------------------------------------------------ // Packer for fixed-size arrays, i.e. for std::array. - // Saves memory by not storing anything about their size. - // SkipLeaf skips every value, so can be slow for big arrays. + // Saves memory by not storing anything about their size. + // SkipLeaf skips every value, so can be slow for big arrays. // Requires std::tuple_size<TValue>, TValue::operator[] and possibly TValue::value_type. - template <class TValue, class TElementPacker = TPacker<typename TValue::value_type>> - class TArrayPacker { - public: - using TElemPacker = TElementPacker; - - enum { + template <class TValue, class TElementPacker = TPacker<typename TValue::value_type>> + class TArrayPacker { + public: + using TElemPacker = TElementPacker; + + enum { Size = std::tuple_size<TValue>::value - }; - - void UnpackLeaf(const char* p, TValue& t) const { - const char* buf = p; - for (size_t i = 0; i < Size; ++i) { - TElemPacker().UnpackLeaf(buf, t[i]); - buf += TElemPacker().SkipLeaf(buf); - } - } - - void PackLeaf(char* buffer, const TValue& data, size_t computedSize) const { - size_t remainingSize = computedSize; - char* pos = buffer; - for (size_t i = 0; i < Size; ++i) { - const size_t elemSize = TElemPacker().MeasureLeaf(data[i]); - TElemPacker().PackLeaf(pos, data[i], Min(elemSize, remainingSize)); - pos += elemSize; - remainingSize -= elemSize; - } - } - - size_t MeasureLeaf(const TValue& data) const { - size_t result = 0; - for (size_t i = 0; i < Size; ++i) { - result += TElemPacker().MeasureLeaf(data[i]); - } - return result; - } - - size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used - { - const char* buf = p; - for (size_t i = 0; i < Size; ++i) { - buf += TElemPacker().SkipLeaf(buf); - } - return buf - p; - } - }; - - //------------------------------------ - // TPacker --- the generic packer. - - template <class T, bool IsIntegral> - class TPackerImpl; + }; + + void UnpackLeaf(const char* p, TValue& t) const { + const char* buf = p; + for (size_t i = 0; i < Size; ++i) { + TElemPacker().UnpackLeaf(buf, t[i]); + buf += TElemPacker().SkipLeaf(buf); + } + } + + void PackLeaf(char* buffer, const TValue& data, size_t computedSize) const { + size_t remainingSize = computedSize; + char* pos = buffer; + for (size_t i = 0; i < Size; ++i) { + const size_t elemSize = TElemPacker().MeasureLeaf(data[i]); + TElemPacker().PackLeaf(pos, data[i], Min(elemSize, remainingSize)); + pos += elemSize; + remainingSize -= elemSize; + } + } + + size_t MeasureLeaf(const TValue& data) const { + size_t result = 0; + for (size_t i = 0; i < Size; ++i) { + result += TElemPacker().MeasureLeaf(data[i]); + } + return result; + } + + size_t SkipLeaf(const char* p) const // this function better be fast because it is very frequently used + { + const char* buf = p; + for (size_t i = 0; i < Size; ++i) { + buf += TElemPacker().SkipLeaf(buf); + } + return buf - p; + } + }; + + //------------------------------------ + // TPacker --- the generic packer. + + template <class T, bool IsIntegral> + class TPackerImpl; template <class T> class TPackerImpl<T, true>: public TIntegralPacker<T> { - }; - // No implementation for non-integral types. + }; + // No implementation for non-integral types. template <class T> class TPacker: public TPackerImpl<T, std::is_integral<T>::value> { - }; + }; template <> class TPacker<float>: public TAsIsPacker<float> { - }; + }; template <> class TPacker<double>: public TAsIsPacker<double> { - }; + }; - template <> + template <> class TPacker<TString>: public TStringPacker<TString> { - }; - - template <> + }; + + template <> class TPacker<TUtf16String>: public TStringPacker<TUtf16String> { }; template <> class TPacker<TStringBuf>: public TStringPacker<TStringBuf> { - }; - + }; + template <> class TPacker<TWtringBuf>: public TStringPacker<TWtringBuf> { }; diff --git a/tools/ya.make b/tools/ya.make index 103f083736..51a6b8b426 100644 --- a/tools/ya.make +++ b/tools/ya.make @@ -5,7 +5,7 @@ RECURSE( archiver/alignment_test archiver/tests base64 - bigram_compiler + bigram_compiler blender bmdump bstr diff --git a/util/datetime/cputimer.cpp b/util/datetime/cputimer.cpp index 846f63ee18..516d372c37 100644 --- a/util/datetime/cputimer.cpp +++ b/util/datetime/cputimer.cpp @@ -19,15 +19,15 @@ TTimer::TTimer(const TStringBuf message) { static const int SMALL_DURATION_CHAR_LENGTH = 9; // strlen("0.123456s") Message_.Reserve(message.length() + SMALL_DURATION_CHAR_LENGTH + 1); // +"\n" - Message_ << message; - // Do not measure the allocations above. + Message_ << message; + // Do not measure the allocations above. Start_ = TInstant::Now(); } TTimer::~TTimer() { - const TDuration duration = TInstant::Now() - Start_; - Message_ << duration << "\n"; - Cerr << Message_.Str(); + const TDuration duration = TInstant::Now() - Start_; + Message_ << duration << "\n"; + Cerr << Message_.Str(); } static ui64 ManuallySetCyclesPerSecond = 0; diff --git a/util/datetime/cputimer.h b/util/datetime/cputimer.h index 1d39d74190..7d38d5bdb3 100644 --- a/util/datetime/cputimer.h +++ b/util/datetime/cputimer.h @@ -4,12 +4,12 @@ #include <util/system/rusage.h> #include <util/generic/string.h> -#include <util/stream/str.h> +#include <util/stream/str.h> class TTimer { private: TInstant Start_; - TStringStream Message_; + TStringStream Message_; public: TTimer(const TStringBuf message = TStringBuf(" took: ")); diff --git a/util/generic/strbase.h b/util/generic/strbase.h index 67e9773f24..ab39fc7537 100644 --- a/util/generic/strbase.h +++ b/util/generic/strbase.h @@ -530,8 +530,8 @@ public: inline size_t find_last_of(const TStringView set, size_t pos = npos) const noexcept { return find_last_of(set.data(), pos, set.length()); - } - + } + inline size_t find_last_of(const TCharType* set, size_t pos, size_t n) const noexcept { return AsStringView().find_last_of(set, pos, n); } diff --git a/util/string/cast.h b/util/string/cast.h index 6fcdb5676f..90e925c194 100644 --- a/util/string/cast.h +++ b/util/string/cast.h @@ -104,13 +104,13 @@ inline TString ToString(char* s) { } /* - * Wrapper for wide strings. - */ + * Wrapper for wide strings. + */ template <class T> inline TUtf16String ToWtring(const T& t) { return TUtf16String::FromAscii(ToString(t)); -} - +} + inline const TUtf16String& ToWtring(const TUtf16String& w) { return w; } @@ -122,7 +122,7 @@ inline const TUtf16String& ToWtring(TUtf16String& w) { struct TFromStringException: public TBadCastException { }; -/* +/* * specialized for: * bool * short diff --git a/util/string/util.cpp b/util/string/util.cpp index faf105b280..b14f20bf75 100644 --- a/util/string/util.cpp +++ b/util/string/util.cpp @@ -51,22 +51,22 @@ Tr::Tr(const char* from, const char* to) { size_t Tr::FindFirstChangePosition(const TString& str) const { for (auto it = str.begin(); it != str.end(); ++it) { - if (ConvertChar(*it) != *it) { - return it - str.begin(); - } - } + if (ConvertChar(*it) != *it) { + return it - str.begin(); + } + } return TString::npos; -} - +} + void Tr::Do(TString& str) const { - const size_t changePosition = FindFirstChangePosition(str); + const size_t changePosition = FindFirstChangePosition(str); if (changePosition == TString::npos) { - return; + return; } for (auto it = str.begin() + changePosition; it != str.end(); ++it) { - *it = ConvertChar(*it); + *it = ConvertChar(*it); } } diff --git a/util/string/util.h b/util/string/util.h index 2ecafeaae3..0d77a5042b 100644 --- a/util/string/util.h +++ b/util/string/util.h @@ -152,32 +152,32 @@ protected: }; // an analogue of tr/$from/$to/ -class Tr { -public: +class Tr { +public: Tr(const char* from, const char* to); - - char ConvertChar(char ch) const { - return Map[(ui8)ch]; - } - + + char ConvertChar(char ch) const { + return Map[(ui8)ch]; + } + void Do(char* s) const { for (; *s; s++) - *s = ConvertChar(*s); + *s = ConvertChar(*s); } void Do(const char* src, char* dst) const { for (; *src; src++) - *dst++ = ConvertChar(*src); + *dst++ = ConvertChar(*src); *dst = 0; } void Do(char* s, size_t l) const { for (size_t i = 0; i < l && s[i]; i++) - s[i] = ConvertChar(s[i]); + s[i] = ConvertChar(s[i]); } void Do(TString& str) const; - -private: - char Map[256]; - + +private: + char Map[256]; + size_t FindFirstChangePosition(const TString& str) const; }; diff --git a/util/system/getpid.cpp b/util/system/getpid.cpp index fc4f2e7427..b9615f0dfa 100644 --- a/util/system/getpid.cpp +++ b/util/system/getpid.cpp @@ -1,6 +1,6 @@ -#include "getpid.h" - -#ifdef _win_ +#include "getpid.h" + +#ifdef _win_ // The include file should be Windows.h for Windows <=7, Processthreadsapi.h for Windows >=8 and Server 2012, // see http://msdn.microsoft.com/en-us/library/windows/desktop/ms683180%28v=vs.85%29.aspx // The way to determine windows version is described in http://msdn.microsoft.com/en-us/library/windows/desktop/aa383745%28v=vs.85%29.aspx @@ -13,11 +13,11 @@ #include <sys/types.h> #include <unistd.h> #endif - + TProcessId GetPID() { -#ifdef _win_ - return GetCurrentProcessId(); -#else - return getpid(); -#endif -} +#ifdef _win_ + return GetCurrentProcessId(); +#else + return getpid(); +#endif +} diff --git a/util/system/getpid.h b/util/system/getpid.h index 906d21f3ed..60e243266f 100644 --- a/util/system/getpid.h +++ b/util/system/getpid.h @@ -1,8 +1,8 @@ -#pragma once - +#pragma once + #include "platform.h" #include "types.h" - + #if defined(_win_) using TProcessId = ui32; // DWORD #else diff --git a/util/system/getpid_ut.cpp b/util/system/getpid_ut.cpp index 149e7f2082..e7122a2971 100644 --- a/util/system/getpid_ut.cpp +++ b/util/system/getpid_ut.cpp @@ -1,19 +1,19 @@ -#include "getpid.h" - +#include "getpid.h" + #include <library/cpp/testing/unittest/registar.h> - -class TGetPidTest: public TTestBase { - UNIT_TEST_SUITE(TGetPidTest); - UNIT_TEST(Test); - UNIT_TEST_SUITE_END(); - -public: - void Test(); -}; - -UNIT_TEST_SUITE_REGISTRATION(TGetPidTest); - -void TGetPidTest::Test() { + +class TGetPidTest: public TTestBase { + UNIT_TEST_SUITE(TGetPidTest); + UNIT_TEST(Test); + UNIT_TEST_SUITE_END(); + +public: + void Test(); +}; + +UNIT_TEST_SUITE_REGISTRATION(TGetPidTest); + +void TGetPidTest::Test() { const TProcessId pid = GetPID(); - UNIT_ASSERT(pid != 0); -} + UNIT_ASSERT(pid != 0); +} diff --git a/util/thread/pool.cpp b/util/thread/pool.cpp index 10b74efb80..05fad02e9b 100644 --- a/util/thread/pool.cpp +++ b/util/thread/pool.cpp @@ -592,7 +592,7 @@ size_t TAdaptiveThreadPool::Size() const noexcept { void TAdaptiveThreadPool::SetMaxIdleTime(TDuration interval) { Y_ENSURE_EX(Impl_.Get(), TThreadPoolException() << TStringBuf("mtp queue not started")); - Impl_->SetMaxIdleTime(interval); + Impl_->SetMaxIdleTime(interval); } TSimpleThreadPool::~TSimpleThreadPool() { @@ -623,7 +623,7 @@ void TSimpleThreadPool::Start(size_t thrnum, size_t maxque) { tmp->Start(thrnum, maxque); if (adaptive) { - adaptive->SetMaxIdleTime(TDuration::Seconds(100)); + adaptive->SetMaxIdleTime(TDuration::Seconds(100)); } Slave_.Swap(tmp); diff --git a/util/thread/pool.h b/util/thread/pool.h index 9f92eec8da..d1ea3a67cb 100644 --- a/util/thread/pool.h +++ b/util/thread/pool.h @@ -10,8 +10,8 @@ #include <util/generic/noncopyable.h> #include <functional> -class TDuration; - +class TDuration; + struct IObjectInQueue { virtual ~IObjectInQueue() = default; |