aboutsummaryrefslogtreecommitdiffstats
path: root/contrib/python/matplotlib-inline/matplotlib_inline/config.py
diff options
context:
space:
mode:
authordeminds <deminds@yandex-team.com>2023-09-29 14:40:48 +0300
committerdeminds <deminds@yandex-team.com>2023-09-29 15:12:14 +0300
commita543cac8b62dfcdad87f08d2080b78eb1e3a8e00 (patch)
tree16fced5d95a218cccccee2bad49f03218b40ca9c /contrib/python/matplotlib-inline/matplotlib_inline/config.py
parent958dddf914ab334e16df1d0f5d12c0eec7b96943 (diff)
downloadydb-a543cac8b62dfcdad87f08d2080b78eb1e3a8e00.tar.gz
KIKIMR-17061: limit allowed symbols for names in scheme
Previously, we used C function isalnum to check if the symbols in every path part of tables (and, I guess, other objects too) in schemes were alphanumeric. It depends on the global C locale of the process YDB is run in. YDB does not set the global locale for itself currently, so it should be "C", if nobody called setlocale yet (this assumption seems to be true for our code as of now). In theory we could have left the code as it was (technically I use the same "C" locale, but explicitly) and promise to ourselves that we will never ever call setlocale (or call std::locale::global) to set global locale to anything different from "C". However, I see a couple of reasons to push this change: 1) It is undefined behaviour to call C function isalnum with signed chars (x86, I believe, has signed chars) with values not representable as unsigned chars or EOF (C++ variant does not suffer from this issue). See: a) this note from cppreference: [https://en.cppreference.com/w/cpp/string/byte/isalnum] Notes Like all other functions from <cctype>, the behavior of std::isalnum is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char: bool my_isalnum(char ch) { return std::isalnum(static_cast<unsigned char>(ch)); } b) and this note from Microsoft which is a bit more specific in terms of what does "representable as unsigned char" mean exactly: [https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isalnum-iswalnum-isalnum-l-iswalnum-l?view=msvc-170] The behavior of isalnum and _isalnum_l is undefined if c isn't EOF or in the range 0 through 0xFF, inclusive. When a debug CRT library is used and c isn't one of these values, the functions raise an assertion. 2) Even if we do control the code of YDB, it is still slightly better to explicitly set the expected locale. The reader can reason about the behaviour of such explicit function faster, without checking if somebody calls setlocale elsewhere. As you will see in the code, I propose to use std::isalnum from C++ instead of the plain isalnum from C with specific hardcoded locale ("C"). One thing I need to mention about the unit tests is that the old implementation of PathPartBrokenAt does pass them successfully. It happens because isalnum uses global C locale, not the global C++ locale (which theoretically might be different from the global C locale, see std::locale::global implementation for details) and I can't create mock C locale, we can choose only from the locales installed in the system. We can reasonable expect that there is some utf-8 locale installed in the system. I hope "C.UTF-8" is installed on every machine on which we would like these unit tests to work correctly. However, there is no such char value (and the old implementation classifies symbols char by char, not as code points) that is classified differently by isalnum in "C" locale and isalnum in the utf-8 locale. I could use KOI8-R (or some other 1 byte encoding that has different isalnum behaviour than the "C" locale) in tests to break the old implementation, but we can't rely on it being installed on every testing machine.
Diffstat (limited to 'contrib/python/matplotlib-inline/matplotlib_inline/config.py')
0 files changed, 0 insertions, 0 deletions