diff options
| author | vitalyisaev <[email protected]> | 2023-11-14 09:58:56 +0300 |
|---|---|---|
| committer | vitalyisaev <[email protected]> | 2023-11-14 10:20:20 +0300 |
| commit | c2b2dfd9827a400a8495e172a56343462e3ceb82 (patch) | |
| tree | cd4e4f597d01bede4c82dffeb2d780d0a9046bd0 /contrib/python/zstandard | |
| parent | d4ae8f119e67808cb0cf776ba6e0cf95296f2df7 (diff) | |
YQ Connector: move tests from yql to ydb (OSS)
Перенос папки с тестами на Коннектор из папки yql в папку ydb (синхронизируется с github).
Diffstat (limited to 'contrib/python/zstandard')
51 files changed, 23016 insertions, 0 deletions
diff --git a/contrib/python/zstandard/py2/.dist-info/METADATA b/contrib/python/zstandard/py2/.dist-info/METADATA new file mode 100644 index 00000000000..1c38fe3512d --- /dev/null +++ b/contrib/python/zstandard/py2/.dist-info/METADATA @@ -0,0 +1,1637 @@ +Metadata-Version: 2.1 +Name: zstandard +Version: 0.14.1 +Summary: Zstandard bindings for Python +Home-page: https://github.com/indygreg/python-zstandard +Author: Gregory Szorc +Author-email: [email protected] +License: BSD +Keywords: zstandard zstd compression +Platform: UNKNOWN +Classifier: Development Status :: 4 - Beta +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: BSD License +Classifier: Programming Language :: C +Classifier: Programming Language :: Python :: 2.7 +Classifier: Programming Language :: Python :: 3.5 +Classifier: Programming Language :: Python :: 3.6 +Classifier: Programming Language :: Python :: 3.7 +Classifier: Programming Language :: Python :: 3.8 + +================ +python-zstandard +================ + +This project provides Python bindings for interfacing with the +`Zstandard <http://www.zstd.net>`_ compression library. A C extension +and CFFI interface are provided. + +The primary goal of the project is to provide a rich interface to the +underlying C API through a Pythonic interface while not sacrificing +performance. This means exposing most of the features and flexibility +of the C API while not sacrificing usability or safety that Python provides. + +The canonical home for this project lives in a Mercurial repository run by +the author. For convenience, that repository is frequently synchronized to +https://github.com/indygreg/python-zstandard. + +| |ci-status| + +Requirements +============ + +This extension is designed to run with Python 2.7, 3.5, 3.6, 3.7, and 3.8 +on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above. +x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS. + +Installing +========== + +This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. +So, to install this package:: + + $ pip install zstandard + +Binary wheels are made available for some platforms. If you need to +install from a source distribution, all you should need is a working C +compiler and the Python development headers/libraries. On many Linux +distributions, you can install a ``python-dev`` or ``python-devel`` +package to provide these dependencies. + +Packages are also uploaded to Anaconda Cloud at +https://anaconda.org/indygreg/zstandard. See that URL for how to install +this package with ``conda``. + +Legacy Format Support +===================== + +To enable legacy zstd format support which is needed to handle files compressed +with zstd < 1.0 you need to provide an installation option:: + + $ pip install zstandard --install-option="--legacy" + +and since pip 7.0 it is possible to have the following line in your +requirements.txt:: + + zstandard --install-option="--legacy" + +Performance +=========== + +zstandard is a highly tunable compression algorithm. In its default settings +(compression level 3), it will be faster at compression and decompression and +will have better compression ratios than zlib on most data sets. When tuned +for speed, it approaches lz4's speed and ratios. When tuned for compression +ratio, it approaches lzma ratios and compression speed, but decompression +speed is much faster. See the official zstandard documentation for more. + +zstandard and this library support multi-threaded compression. There is a +mechanism to compress large inputs using multiple threads. + +The performance of this library is usually very similar to what the zstandard +C API can deliver. Overhead in this library is due to general Python overhead +and can't easily be avoided by *any* zstandard Python binding. This library +exposes multiple APIs for performing compression and decompression so callers +can pick an API suitable for their need. Contrast with the compression +modules in Python's standard library (like ``zlib``), which only offer limited +mechanisms for performing operations. The API flexibility means consumers can +choose to use APIs that facilitate zero copying or minimize Python object +creation and garbage collection overhead. + +This library is capable of single-threaded throughputs well over 1 GB/s. For +exact numbers, measure yourself. The source code repository has a ``bench.py`` +script that can be used to measure things. + +API +=== + +To interface with Zstandard, simply import the ``zstandard`` module:: + + import zstandard + +It is a popular convention to alias the module as a different name for +brevity:: + + import zstandard as zstd + +This module attempts to import and use either the C extension or CFFI +implementation. On Python platforms known to support C extensions (like +CPython), it raises an ImportError if the C extension cannot be imported. +On Python platforms known to not support C extensions (like PyPy), it only +attempts to import the CFFI implementation and raises ImportError if that +can't be done. On other platforms, it first tries to import the C extension +then falls back to CFFI if that fails and raises ImportError if CFFI fails. + +To change the module import behavior, a ``PYTHON_ZSTANDARD_IMPORT_POLICY`` +environment variable can be set. The following values are accepted: + +default + The behavior described above. +cffi_fallback + Always try to import the C extension then fall back to CFFI if that + fails. +cext + Only attempt to import the C extension. +cffi + Only attempt to import the CFFI implementation. + +In addition, the ``zstandard`` module exports a ``backend`` attribute +containing the string name of the backend being used. It will be one +of ``cext`` or ``cffi`` (for *C extension* and *cffi*, respectively). + +The types, functions, and attributes exposed by the ``zstandard`` module +are documented in the sections below. + +.. note:: + + The documentation in this section makes references to various zstd + concepts and functionality. The source repository contains a + ``docs/concepts.rst`` file explaining these in more detail. + +ZstdCompressor +-------------- + +The ``ZstdCompressor`` class provides an interface for performing +compression operations. Each instance is essentially a wrapper around a +``ZSTD_CCtx`` from the C API. + +Each instance is associated with parameters that control compression +behavior. These come from the following named arguments (all optional): + +level + Integer compression level. Valid values are between 1 and 22. +dict_data + Compression dictionary to use. + + Note: When using dictionary data and ``compress()`` is called multiple + times, the ``ZstdCompressionParameters`` derived from an integer + compression ``level`` and the first compressed data's size will be reused + for all subsequent operations. This may not be desirable if source data + size varies significantly. +compression_params + A ``ZstdCompressionParameters`` instance defining compression settings. +write_checksum + Whether a 4 byte checksum should be written with the compressed data. + Defaults to False. If True, the decompressor can verify that decompressed + data matches the original input data. +write_content_size + Whether the size of the uncompressed data will be written into the + header of compressed data. Defaults to True. The data will only be + written if the compressor knows the size of the input data. This is + often not true for streaming compression. +write_dict_id + Whether to write the dictionary ID into the compressed data. + Defaults to True. The dictionary ID is only written if a dictionary + is being used. +threads + Enables and sets the number of threads to use for multi-threaded compression + operations. Defaults to 0, which means to use single-threaded compression. + Negative values will resolve to the number of logical CPUs in the system. + Read below for more info on multi-threaded compression. This argument only + controls thread count for operations that operate on individual pieces of + data. APIs that spawn multiple threads for working on multiple pieces of + data have their own ``threads`` argument. + +``compression_params`` is mutually exclusive with ``level``, ``write_checksum``, +``write_content_size``, ``write_dict_id``, and ``threads``. + +Unless specified otherwise, assume that no two methods of ``ZstdCompressor`` +instances can be called from multiple Python threads simultaneously. In other +words, assume instances are not thread safe unless stated otherwise. + +Utility Methods +^^^^^^^^^^^^^^^ + +``frame_progression()`` returns a 3-tuple containing the number of bytes +ingested, consumed, and produced by the current compression operation. + +``memory_size()`` obtains the memory utilization of the underlying zstd +compression context, in bytes.:: + + cctx = zstd.ZstdCompressor() + memory = cctx.memory_size() + +Simple API +^^^^^^^^^^ + +``compress(data)`` compresses and returns data as a one-shot operation.:: + + cctx = zstd.ZstdCompressor() + compressed = cctx.compress(b'data to compress') + +The ``data`` argument can be any object that implements the *buffer protocol*. + +Stream Reader API +^^^^^^^^^^^^^^^^^ + +``stream_reader(source)`` can be used to obtain an object conforming to the +``io.RawIOBase`` interface for reading compressed output as a stream:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + reader = cctx.stream_reader(fh) + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with compressed chunk. + +Instances can also be used as context managers:: + + with open(path, 'rb') as fh: + with cctx.stream_reader(fh) as reader: + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with compressed chunk. + +When the context manager exits or ``close()`` is called, the stream is closed, +underlying resources are released, and future operations against the compression +stream will fail. + +The ``source`` argument to ``stream_reader()`` can be any object with a +``read(size)`` method or any object implementing the *buffer protocol*. + +``stream_reader()`` accepts a ``size`` argument specifying how large the input +stream is. This is used to adjust compression parameters so they are +tailored to the source size.:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + with cctx.stream_reader(fh, size=os.stat(path).st_size) as reader: + ... + +If the ``source`` is a stream, you can specify how large ``read()`` requests +to that stream should be via the ``read_size`` argument. It defaults to +``zstandard.COMPRESSION_RECOMMENDED_INPUT_SIZE``.:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + # Will perform fh.read(8192) when obtaining data to feed into the + # compressor. + with cctx.stream_reader(fh, read_size=8192) as reader: + ... + +The stream returned by ``stream_reader()`` is neither writable nor seekable +(even if the underlying source is seekable). ``readline()`` and +``readlines()`` are not implemented because they don't make sense for +compressed data. ``tell()`` returns the number of compressed bytes +emitted so far. + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``stream_writer(fh)`` allows you to *stream* data into a compressor. + +Returned instances implement the ``io.RawIOBase`` interface. Only methods +that involve writing will do useful things. + +The argument to ``stream_writer()`` must have a ``write(data)`` method. As +compressed data is available, ``write()`` will be called with the compressed +data as its argument. Many common Python types implement ``write()``, including +open file handles and ``io.BytesIO``. + +The ``write(data)`` method is used to feed data into the compressor. + +The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever +data remains within the compressor's internal state into the output object. This +may result in 0 or more ``write()`` calls to the output object. This method +accepts an optional ``flush_mode`` argument to control the flushing behavior. +Its value can be any of the ``FLUSH_*`` constants. + +Both ``write()`` and ``flush()`` return the number of bytes written to the +object's ``write()``. In many cases, small inputs do not accumulate enough +data to cause a write and ``write()`` will return ``0``. + +Calling ``close()`` will mark the stream as closed and subsequent I/O +operations will raise ``ValueError`` (per the documented behavior of +``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying +stream if such a method exists. + +Typically usage is as follows:: + + cctx = zstd.ZstdCompressor(level=10) + compressor = cctx.stream_writer(fh) + + compressor.write(b'chunk 0\n') + compressor.write(b'chunk 1\n') + compressor.flush() + # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point. + # Receiver is also expecting more data in the zstd *frame*. + + compressor.write(b'chunk 2\n') + compressor.flush(zstd.FLUSH_FRAME) + # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``. + # Receiver is expecting no more data, as the zstd frame is closed. + # Any future calls to ``write()`` at this point will construct a new + # zstd frame. + +Instances can be used as context managers. Exiting the context manager is +the equivalent of calling ``close()``, which is equivalent to calling +``flush(zstd.FLUSH_FRAME)``:: + + cctx = zstd.ZstdCompressor(level=10) + with cctx.stream_writer(fh) as compressor: + compressor.write(b'chunk 0') + compressor.write(b'chunk 1') + ... + +.. important:: + + If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute + a full zstd *frame* and consumers of this data may complain about malformed + input. It is recommended to use instances as a context manager to ensure + *frames* are properly finished. + +If the size of the data being fed to this streaming compressor is known, +you can declare it before compression begins:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh, size=data_len) as compressor: + compressor.write(chunk0) + compressor.write(chunk1) + ... + +Declaring the size of the source data allows compression parameters to +be tuned. And if ``write_content_size`` is used, it also results in the +content size being written into the frame header of the output data. + +The size of chunks being ``write()`` to the destination can be specified:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh, write_size=32768) as compressor: + ... + +To see how much memory is being used by the streaming compressor:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh) as compressor: + ... + byte_size = compressor.memory_size() + +Thte total number of bytes written so far are exposed via ``tell()``:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh) as compressor: + ... + total_written = compressor.tell() + +``stream_writer()`` accepts a ``write_return_read`` boolean argument to control +the return value of ``write()``. When ``False`` (the default), ``write()`` returns +the number of bytes that were ``write()``en to the underlying object. When +``True``, ``write()`` returns the number of bytes read from the input that +were subsequently written to the compressor. ``True`` is the *proper* behavior +for ``write()`` as specified by the ``io.RawIOBase`` interface and will become +the default value in a future release. + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_to_iter(reader)`` provides a mechanism to stream data out of a +compressor as an iterator of data chunks.:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh): + # Do something with emitted data. + +``read_to_iter()`` accepts an object that has a ``read(size)`` method or +conforms to the buffer protocol. + +Uncompressed data is fetched from the source either by calling ``read(size)`` +or by fetching a slice of data from the object directly (in the case where +the buffer protocol is being used). The returned iterator consists of chunks +of compressed data. + +If reading from the source via ``read()``, ``read()`` will be called until +it raises or returns an empty bytes (``b''``). It is perfectly valid for +the source to deliver fewer bytes than were what requested by ``read(size)``. + +Like ``stream_writer()``, ``read_to_iter()`` also accepts a ``size`` argument +declaring the size of the input stream:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh, size=some_int): + pass + +You can also control the size that data is ``read()`` from the source and +the ideal size of output chunks:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh, read_size=16384, write_size=8192): + pass + +Unlike ``stream_writer()``, ``read_to_iter()`` does not give direct control +over the sizes of chunks fed into the compressor. Instead, chunk sizes will +be whatever the object being read from delivers. These will often be of a +uniform size. + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while +compressing it.:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh) + +For example, say you wish to compress a file:: + + cctx = zstd.ZstdCompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + cctx.copy_stream(ifh, ofh) + +It is also possible to declare the size of the source stream:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, size=len_of_input) + +You can also specify how large the chunks that are ``read()`` and ``write()`` +from and to the streams:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) + +The stream copier returns a 2-tuple of bytes read and written:: + + cctx = zstd.ZstdCompressor() + read_count, write_count = cctx.copy_stream(ifh, ofh) + +Compressor API +^^^^^^^^^^^^^^ + +``compressobj()`` returns an object that exposes ``compress(data)`` and +``flush()`` methods. Each returns compressed data or an empty bytes. + +The purpose of ``compressobj()`` is to provide an API-compatible interface +with ``zlib.compressobj``, ``bz2.BZ2Compressor``, etc. This allows callers to +swap in different compressor objects while using the same API. + +``flush()`` accepts an optional argument indicating how to end the stream. +``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream. +Once this type of flush is performed, ``compress()`` and ``flush()`` can +no longer be called. This type of flush **must** be called to end the +compression context. If not called, returned data may be incomplete. + +A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a +zstd block. Flushes of this type can be performed multiple times. The next +call to ``compress()`` will begin a new zstd block. + +Here is how this API should be used:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj() + data = cobj.compress(b'raw input 0') + data = cobj.compress(b'raw input 1') + data = cobj.flush() + +Or to flush blocks:: + + cctx.zstd.ZstdCompressor() + cobj = cctx.compressobj() + data = cobj.compress(b'chunk in first block') + data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK) + data = cobj.compress(b'chunk in second block') + data = cobj.flush() + +For best performance results, keep input chunks under 256KB. This avoids +extra allocations for a large output object. + +It is possible to declare the input size of the data that will be fed into +the compressor:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj(size=6) + data = cobj.compress(b'foobar') + data = cobj.flush() + +Chunker API +^^^^^^^^^^^ + +``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns +an object that can be used to iteratively feed chunks of data into a compressor +and produce output chunks of a uniform size. + +The object returned by ``chunker()`` exposes the following methods: + +``compress(data)`` + Feeds new input data into the compressor. + +``flush()`` + Flushes all data currently in the compressor. + +``finish()`` + Signals the end of input data. No new data can be compressed after this + method is called. + +``compress()``, ``flush()``, and ``finish()`` all return an iterator of +``bytes`` instances holding compressed data. The iterator may be empty. Callers +MUST iterate through all elements of the returned iterator before performing +another operation on the object. + +All chunks emitted by ``compress()`` will have a length of ``chunk_size``. + +``flush()`` and ``finish()`` may return a final chunk smaller than +``chunk_size``. + +Here is how the API should be used:: + + cctx = zstd.ZstdCompressor() + chunker = cctx.chunker(chunk_size=32768) + + with open(path, 'rb') as fh: + while True: + in_chunk = fh.read(32768) + if not in_chunk: + break + + for out_chunk in chunker.compress(in_chunk): + # Do something with output chunk of size 32768. + + for out_chunk in chunker.finish(): + # Do something with output chunks that finalize the zstd frame. + +The ``chunker()`` API is often a better alternative to ``compressobj()``. + +``compressobj()`` will emit output data as it is available. This results in a +*stream* of output chunks of varying sizes. The consistency of the output chunk +size with ``chunker()`` is more appropriate for many usages, such as sending +compressed data to a socket. + +``compressobj()`` may also perform extra memory reallocations in order to +dynamically adjust the sizes of the output chunks. Since ``chunker()`` output +chunks are all the same size (except for flushed or final chunks), there is +less memory allocation overhead. + +Batch Compression API +^^^^^^^^^^^^^^^^^^^^^ + +(Experimental. Not yet supported in CFFI bindings.) + +``multi_compress_to_buffer(data, [threads=0])`` performs compression of multiple +inputs as a single operation. + +Data to be compressed can be passed as a ``BufferWithSegmentsCollection``, a +``BufferWithSegments``, or a list containing byte like objects. Each element of +the container will be compressed individually using the configured parameters +on the ``ZstdCompressor`` instance. + +The ``threads`` argument controls how many threads to use for compression. The +default is ``0`` which means to use a single thread. Negative values use the +number of logical CPUs in the machine. + +The function returns a ``BufferWithSegmentsCollection``. This type represents +N discrete memory allocations, eaching holding 1 or more compressed frames. + +Output data is written to shared memory buffers. This means that unlike +regular Python objects, a reference to *any* object within the collection +keeps the shared buffer and therefore memory backing it alive. This can have +undesirable effects on process memory usage. + +The API and behavior of this function is experimental and will likely change. +Known deficiencies include: + +* If asked to use multiple threads, it will always spawn that many threads, + even if the input is too small to use them. It should automatically lower + the thread count when the extra threads would just add overhead. +* The buffer allocation strategy is fixed. There is room to make it dynamic, + perhaps even to allow one output buffer per input, facilitating a variation + of the API to return a list without the adverse effects of shared memory + buffers. + +ZstdDecompressor +---------------- + +The ``ZstdDecompressor`` class provides an interface for performing +decompression. It is effectively a wrapper around the ``ZSTD_DCtx`` type from +the C API. + +Each instance is associated with parameters that control decompression. These +come from the following named arguments (all optional): + +dict_data + Compression dictionary to use. +max_window_size + Sets an uppet limit on the window size for decompression operations in + kibibytes. This setting can be used to prevent large memory allocations + for inputs using large compression windows. +format + Set the format of data for the decoder. By default, this is + ``zstd.FORMAT_ZSTD1``. It can be set to ``zstd.FORMAT_ZSTD1_MAGICLESS`` to + allow decoding frames without the 4 byte magic header. Not all decompression + APIs support this mode. + +The interface of this class is very similar to ``ZstdCompressor`` (by design). + +Unless specified otherwise, assume that no two methods of ``ZstdDecompressor`` +instances can be called from multiple Python threads simultaneously. In other +words, assume instances are not thread safe unless stated otherwise. + +Utility Methods +^^^^^^^^^^^^^^^ + +``memory_size()`` obtains the size of the underlying zstd decompression context, +in bytes.:: + + dctx = zstd.ZstdDecompressor() + size = dctx.memory_size() + +Simple API +^^^^^^^^^^ + +``decompress(data)`` can be used to decompress an entire compressed zstd +frame in a single operation.:: + + dctx = zstd.ZstdDecompressor() + decompressed = dctx.decompress(data) + +By default, ``decompress(data)`` will only work on data written with the content +size encoded in its header (this is the default behavior of +``ZstdCompressor().compress()`` but may not be true for streaming compression). If +compressed data without an embedded content size is seen, ``zstd.ZstdError`` will +be raised. + +If the compressed data doesn't have its content size embedded within it, +decompression can be attempted by specifying the ``max_output_size`` +argument.:: + + dctx = zstd.ZstdDecompressor() + uncompressed = dctx.decompress(data, max_output_size=1048576) + +Ideally, ``max_output_size`` will be identical to the decompressed output +size. + +If ``max_output_size`` is too small to hold the decompressed data, +``zstd.ZstdError`` will be raised. + +If ``max_output_size`` is larger than the decompressed data, the allocated +output buffer will be resized to only use the space required. + +Please note that an allocation of the requested ``max_output_size`` will be +performed every time the method is called. Setting to a very large value could +result in a lot of work for the memory allocator and may result in +``MemoryError`` being raised if the allocation fails. + +.. important:: + + If the exact size of decompressed data is unknown (not passed in explicitly + and not stored in the zstandard frame), for performance reasons it is + encouraged to use a streaming API. + +Stream Reader API +^^^^^^^^^^^^^^^^^ + +``stream_reader(source)`` can be used to obtain an object conforming to the +``io.RawIOBase`` interface for reading decompressed output as a stream:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + reader = dctx.stream_reader(fh) + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with decompressed chunk. + +The stream can also be used as a context manager:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + with dctx.stream_reader(fh) as reader: + ... + +When used as a context manager, the stream is closed and the underlying +resources are released when the context manager exits. Future operations against +the stream will fail. + +The ``source`` argument to ``stream_reader()`` can be any object with a +``read(size)`` method or any object implementing the *buffer protocol*. + +If the ``source`` is a stream, you can specify how large ``read()`` requests +to that stream should be via the ``read_size`` argument. It defaults to +``zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE``.:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + # Will perform fh.read(8192) when obtaining data for the decompressor. + with dctx.stream_reader(fh, read_size=8192) as reader: + ... + +The stream returned by ``stream_reader()`` is not writable. + +The stream returned by ``stream_reader()`` is *partially* seekable. +Absolute and relative positions (``SEEK_SET`` and ``SEEK_CUR``) forward +of the current position are allowed. Offsets behind the current read +position and offsets relative to the end of stream are not allowed and +will raise ``ValueError`` if attempted. + +``tell()`` returns the number of decompressed bytes read so far. + +Not all I/O methods are implemented. Notably missing is support for +``readline()``, ``readlines()``, and linewise iteration support. This is +because streams operate on binary data - not text data. If you want to +convert decompressed output to text, you can chain an ``io.TextIOWrapper`` +to the stream:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + stream_reader = dctx.stream_reader(fh) + text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8') + + for line in text_stream: + ... + +The ``read_across_frames`` argument to ``stream_reader()`` controls the +behavior of read operations when the end of a zstd *frame* is encountered. +When ``False`` (the default), a read will complete when the end of a +zstd *frame* is encountered. When ``True``, a read can potentially +return data spanning multiple zstd *frames*. + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``stream_writer(fh)`` allows you to *stream* data into a decompressor. + +Returned instances implement the ``io.RawIOBase`` interface. Only methods +that involve writing will do useful things. + +The argument to ``stream_writer()`` is typically an object that also implements +``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many +common Python types conform to this interface, including open file handles +and ``io.BytesIO``. + +Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data +is sent to the decompressor by calling ``write(data)`` and decompressed +output is written to the underlying stream by calling its ``write(data)`` +method.:: + + dctx = zstd.ZstdDecompressor() + decompressor = dctx.stream_writer(fh) + + decompressor.write(compressed_data) + ... + + +Calls to ``write()`` will return the number of bytes written to the output +object. Not all inputs will result in bytes being written, so return values +of ``0`` are possible. + +Like the ``stream_writer()`` compressor, instances can be used as context +managers. However, context managers add no extra special behavior and offer +little to no benefit to being used. + +Calling ``close()`` will mark the stream as closed and subsequent I/O operations +will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``). +``close()`` will also call ``close()`` on the underlying stream if such a +method exists. + +The size of chunks being ``write()`` to the destination can be specified:: + + dctx = zstd.ZstdDecompressor() + with dctx.stream_writer(fh, write_size=16384) as decompressor: + pass + +You can see how much memory is being used by the decompressor:: + + dctx = zstd.ZstdDecompressor() + with dctx.stream_writer(fh) as decompressor: + byte_size = decompressor.memory_size() + +``stream_writer()`` accepts a ``write_return_read`` boolean argument to control +the return value of ``write()``. When ``False`` (the default)``, ``write()`` +returns the number of bytes that were ``write()``en to the underlying stream. +When ``True``, ``write()`` returns the number of bytes read from the input. +``True`` is the *proper* behavior for ``write()`` as specified by the +``io.RawIOBase`` interface and will become the default in a future release. + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a +compressed source as an iterator of data chunks.:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh): + # Do something with original data. + +``read_to_iter()`` accepts an object with a ``read(size)`` method that will +return compressed bytes or an object conforming to the buffer protocol that +can expose its data as a contiguous range of bytes. + +``read_to_iter()`` returns an iterator whose elements are chunks of the +decompressed data. + +The size of requested ``read()`` from the source can be specified:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh, read_size=16384): + pass + +It is also possible to skip leading bytes in the input data:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh, skip_bytes=1): + pass + +.. tip:: + + Skipping leading bytes is useful if the source data contains extra + *header* data. Traditionally, you would need to create a slice or + ``memoryview`` of the data you want to decompress. This would create + overhead. It is more efficient to pass the offset into this API. + +Similarly to ``ZstdCompressor.read_to_iter()``, the consumer of the iterator +controls when data is decompressed. If the iterator isn't consumed, +decompression is put on hold. + +When ``read_to_iter()`` is passed an object conforming to the buffer protocol, +the behavior may seem similar to what occurs when the simple decompression +API is used. However, this API works when the decompressed size is unknown. +Furthermore, if feeding large inputs, the decompressor will work in chunks +instead of performing a single operation. + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while +performing decompression.:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh) + +e.g. to decompress a file to another file:: + + dctx = zstd.ZstdDecompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + dctx.copy_stream(ifh, ofh) + +The size of chunks being ``read()`` and ``write()`` from and to the streams +can be specified:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) + +Decompressor API +^^^^^^^^^^^^^^^^ + +``decompressobj()`` returns an object that exposes a ``decompress(data)`` +method. Compressed data chunks are fed into ``decompress(data)`` and +uncompressed output (or an empty bytes) is returned. Output from subsequent +calls needs to be concatenated to reassemble the full decompressed byte +sequence. + +The purpose of ``decompressobj()`` is to provide an API-compatible interface +with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers +to swap in different decompressor objects while using the same API. + +Each object is single use: once an input frame is decoded, ``decompress()`` +can no longer be called. + +Here is how this API should be used:: + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj() + data = dobj.decompress(compressed_chunk_0) + data = dobj.decompress(compressed_chunk_1) + +By default, calls to ``decompress()`` write output data in chunks of size +``DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE``. These chunks are concatenated +before being returned to the caller. It is possible to define the size of +these temporary chunks by passing ``write_size`` to ``decompressobj()``:: + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj(write_size=1048576) + +.. note:: + + Because calls to ``decompress()`` may need to perform multiple + memory (re)allocations, this streaming decompression API isn't as + efficient as other APIs. + +For compatibility with the standard library APIs, instances expose a +``flush([length=None])`` method. This method no-ops and has no meaningful +side-effects, making it safe to call any time. + +Batch Decompression API +^^^^^^^^^^^^^^^^^^^^^^^ + +(Experimental. Not yet supported in CFFI bindings.) + +``multi_decompress_to_buffer()`` performs decompression of multiple +frames as a single operation and returns a ``BufferWithSegmentsCollection`` +containing decompressed data for all inputs. + +Compressed frames can be passed to the function as a ``BufferWithSegments``, +a ``BufferWithSegmentsCollection``, or as a list containing objects that +conform to the buffer protocol. For best performance, pass a +``BufferWithSegmentsCollection`` or a ``BufferWithSegments``, as +minimal input validation will be done for that type. If calling from +Python (as opposed to C), constructing one of these instances may add +overhead cancelling out the performance overhead of validation for list +inputs.:: + + dctx = zstd.ZstdDecompressor() + results = dctx.multi_decompress_to_buffer([b'...', b'...']) + +The decompressed size of each frame MUST be discoverable. It can either be +embedded within the zstd frame (``write_content_size=True`` argument to +``ZstdCompressor``) or passed in via the ``decompressed_sizes`` argument. + +The ``decompressed_sizes`` argument is an object conforming to the buffer +protocol which holds an array of 64-bit unsigned integers in the machine's +native format defining the decompressed sizes of each frame. If this argument +is passed, it avoids having to scan each frame for its decompressed size. +This frame scanning can add noticeable overhead in some scenarios.:: + + frames = [...] + sizes = struct.pack('=QQQQ', len0, len1, len2, len3) + + dctx = zstd.ZstdDecompressor() + results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes) + +The ``threads`` argument controls the number of threads to use to perform +decompression operations. The default (``0``) or the value ``1`` means to +use a single thread. Negative values use the number of logical CPUs in the +machine. + +.. note:: + + It is possible to pass a ``mmap.mmap()`` instance into this function by + wrapping it with a ``BufferWithSegments`` instance (which will define the + offsets of frames within the memory mapped region). + +This function is logically equivalent to performing ``dctx.decompress()`` +on each input frame and returning the result. + +This function exists to perform decompression on multiple frames as fast +as possible by having as little overhead as possible. Since decompression is +performed as a single operation and since the decompressed output is stored in +a single buffer, extra memory allocations, Python objects, and Python function +calls are avoided. This is ideal for scenarios where callers know up front that +they need to access data for multiple frames, such as when *delta chains* are +being used. + +Currently, the implementation always spawns multiple threads when requested, +even if the amount of work to do is small. In the future, it will be smarter +about avoiding threads and their associated overhead when the amount of +work to do is small. + +Prefix Dictionary Chain Decompression +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``decompress_content_dict_chain(frames)`` performs decompression of a list of +zstd frames produced using chained *prefix* dictionary compression. Such +a list of frames is produced by compressing discrete inputs where each +non-initial input is compressed with a *prefix* dictionary consisting of the +content of the previous input. + +For example, say you have the following inputs:: + + inputs = [b'input 1', b'input 2', b'input 3'] + +The zstd frame chain consists of: + +1. ``b'input 1'`` compressed in standalone/discrete mode +2. ``b'input 2'`` compressed using ``b'input 1'`` as a *prefix* dictionary +3. ``b'input 3'`` compressed using ``b'input 2'`` as a *prefix* dictionary + +Each zstd frame **must** have the content size written. + +The following Python code can be used to produce a *prefix dictionary chain*:: + + def make_chain(inputs): + frames = [] + + # First frame is compressed in standalone/discrete mode. + zctx = zstd.ZstdCompressor() + frames.append(zctx.compress(inputs[0])) + + # Subsequent frames use the previous fulltext as a prefix dictionary + for i, raw in enumerate(inputs[1:]): + dict_data = zstd.ZstdCompressionDict( + inputs[i], dict_type=zstd.DICT_TYPE_RAWCONTENT) + zctx = zstd.ZstdCompressor(dict_data=dict_data) + frames.append(zctx.compress(raw)) + + return frames + +``decompress_content_dict_chain()`` returns the uncompressed data of the last +element in the input chain. + + +.. note:: + + It is possible to implement *prefix dictionary chain* decompression + on top of other APIs. However, this function will likely be faster - + especially for long input chains - as it avoids the overhead of instantiating + and passing around intermediate objects between C and Python. + +Multi-Threaded Compression +-------------------------- + +``ZstdCompressor`` accepts a ``threads`` argument that controls the number +of threads to use for compression. The way this works is that input is split +into segments and each segment is fed into a worker pool for compression. Once +a segment is compressed, it is flushed/appended to the output. + +.. note:: + + These threads are created at the C layer and are not Python threads. So they + work outside the GIL. It is therefore possible to CPU saturate multiple cores + from Python. + +The segment size for multi-threaded compression is chosen from the window size +of the compressor. This is derived from the ``window_log`` attribute of a +``ZstdCompressionParameters`` instance. By default, segment sizes are in the 1+MB +range. + +If multi-threaded compression is requested and the input is smaller than the +configured segment size, only a single compression thread will be used. If the +input is smaller than the segment size multiplied by the thread pool size or +if data cannot be delivered to the compressor fast enough, not all requested +compressor threads may be active simultaneously. + +Compared to non-multi-threaded compression, multi-threaded compression has +higher per-operation overhead. This includes extra memory operations, +thread creation, lock acquisition, etc. + +Due to the nature of multi-threaded compression using *N* compression +*states*, the output from multi-threaded compression will likely be larger +than non-multi-threaded compression. The difference is usually small. But +there is a CPU/wall time versus size trade off that may warrant investigation. + +Output from multi-threaded compression does not require any special handling +on the decompression side. To the decompressor, data generated with single +threaded compressor looks the same as data generated by a multi-threaded +compressor and does not require any special handling or additional resource +requirements. + +Dictionary Creation and Management +---------------------------------- + +Compression dictionaries are represented with the ``ZstdCompressionDict`` type. + +Instances can be constructed from bytes:: + + dict_data = zstd.ZstdCompressionDict(data) + +It is possible to construct a dictionary from *any* data. If the data doesn't +begin with a magic header, it will be treated as a *prefix* dictionary. +*Prefix* dictionaries allow compression operations to reference raw data +within the dictionary. + +It is possible to force the use of *prefix* dictionaries or to require a +dictionary header: + + dict_data = zstd.ZstdCompressionDict(data, + dict_type=zstd.DICT_TYPE_RAWCONTENT) + + dict_data = zstd.ZstdCompressionDict(data, + dict_type=zstd.DICT_TYPE_FULLDICT) + +You can see how many bytes are in the dictionary by calling ``len()``:: + + dict_data = zstd.train_dictionary(size, samples) + dict_size = len(dict_data) # will not be larger than ``size`` + +Once you have a dictionary, you can pass it to the objects performing +compression and decompression:: + + dict_data = zstd.train_dictionary(131072, samples) + + cctx = zstd.ZstdCompressor(dict_data=dict_data) + for source_data in input_data: + compressed = cctx.compress(source_data) + # Do something with compressed data. + + dctx = zstd.ZstdDecompressor(dict_data=dict_data) + for compressed_data in input_data: + buffer = io.BytesIO() + with dctx.stream_writer(buffer) as decompressor: + decompressor.write(compressed_data) + # Do something with raw data in ``buffer``. + +Dictionaries have unique integer IDs. You can retrieve this ID via:: + + dict_id = zstd.dictionary_id(dict_data) + +You can obtain the raw data in the dict (useful for persisting and constructing +a ``ZstdCompressionDict`` later) via ``as_bytes()``:: + + dict_data = zstd.train_dictionary(size, samples) + raw_data = dict_data.as_bytes() + +By default, when a ``ZstdCompressionDict`` is *attached* to a +``ZstdCompressor``, each ``ZstdCompressor`` performs work to prepare the +dictionary for use. This is fine if only 1 compression operation is being +performed or if the ``ZstdCompressor`` is being reused for multiple operations. +But if multiple ``ZstdCompressor`` instances are being used with the dictionary, +this can add overhead. + +It is possible to *precompute* the dictionary so it can readily be consumed +by multiple ``ZstdCompressor`` instances:: + + d = zstd.ZstdCompressionDict(data) + + # Precompute for compression level 3. + d.precompute_compress(level=3) + + # Precompute with specific compression parameters. + params = zstd.ZstdCompressionParameters(...) + d.precompute_compress(compression_params=params) + +.. note:: + + When a dictionary is precomputed, the compression parameters used to + precompute the dictionary overwrite some of the compression parameters + specified to ``ZstdCompressor.__init__``. + +Training Dictionaries +^^^^^^^^^^^^^^^^^^^^^ + +Unless using *prefix* dictionaries, dictionary data is produced by *training* +on existing data:: + + dict_data = zstd.train_dictionary(size, samples) + +This takes a target dictionary size and list of bytes instances and creates and +returns a ``ZstdCompressionDict``. + +The dictionary training mechanism is known as *cover*. More details about it are +available in the paper *Effective Construction of Relative Lempel-Ziv +Dictionaries* (authors: Liao, Petri, Moffat, Wirth). + +The cover algorithm takes parameters ``k` and ``d``. These are the +*segment size* and *dmer size*, respectively. The returned dictionary +instance created by this function has ``k`` and ``d`` attributes +containing the values for these parameters. If a ``ZstdCompressionDict`` +is constructed from raw bytes data (a content-only dictionary), the +``k`` and ``d`` attributes will be ``0``. + +The segment and dmer size parameters to the cover algorithm can either be +specified manually or ``train_dictionary()`` can try multiple values +and pick the best one, where *best* means the smallest compressed data size. +This later mode is called *optimization* mode. + +If none of ``k``, ``d``, ``steps``, ``threads``, ``level``, ``notifications``, +or ``dict_id`` (basically anything from the underlying ``ZDICT_cover_params_t`` +struct) are defined, *optimization* mode is used with default parameter +values. + +If ``steps`` or ``threads`` are defined, then *optimization* mode is engaged +with explicit control over those parameters. Specifying ``threads=0`` or +``threads=1`` can be used to engage *optimization* mode if other parameters +are not defined. + +Otherwise, non-*optimization* mode is used with the parameters specified. + +This function takes the following arguments: + +dict_size + Target size in bytes of the dictionary to generate. +samples + A list of bytes holding samples the dictionary will be trained from. +k + Parameter to cover algorithm defining the segment size. A reasonable range + is [16, 2048+]. +d + Parameter to cover algorithm defining the dmer size. A reasonable range is + [6, 16]. ``d`` must be less than or equal to ``k``. +dict_id + Integer dictionary ID for the produced dictionary. Default is 0, which uses + a random value. +steps + Number of steps through ``k`` values to perform when trying parameter + variations. +threads + Number of threads to use when trying parameter variations. Default is 0, + which means to use a single thread. A negative value can be specified to + use as many threads as there are detected logical CPUs. +level + Integer target compression level when trying parameter variations. +notifications + Controls writing of informational messages to ``stderr``. ``0`` (the + default) means to write nothing. ``1`` writes errors. ``2`` writes + progression info. ``3`` writes more details. And ``4`` writes all info. + +Explicit Compression Parameters +------------------------------- + +Zstandard offers a high-level *compression level* that maps to lower-level +compression parameters. For many consumers, this numeric level is the only +compression setting you'll need to touch. + +But for advanced use cases, it might be desirable to tweak these lower-level +settings. + +The ``ZstdCompressionParameters`` type represents these low-level compression +settings. + +Instances of this type can be constructed from a myriad of keyword arguments +(defined below) for complete low-level control over each adjustable +compression setting. + +From a higher level, one can construct a ``ZstdCompressionParameters`` instance +given a desired compression level and target input and dictionary size +using ``ZstdCompressionParameters.from_level()``. e.g.:: + + # Derive compression settings for compression level 7. + params = zstd.ZstdCompressionParameters.from_level(7) + + # With an input size of 1MB + params = zstd.ZstdCompressionParameters.from_level(7, source_size=1048576) + +Using ``from_level()``, it is also possible to override individual compression +parameters or to define additional settings that aren't automatically derived. +e.g.:: + + params = zstd.ZstdCompressionParameters.from_level(4, window_log=10) + params = zstd.ZstdCompressionParameters.from_level(5, threads=4) + +Or you can define low-level compression settings directly:: + + params = zstd.ZstdCompressionParameters(window_log=12, enable_ldm=True) + +Once a ``ZstdCompressionParameters`` instance is obtained, it can be used to +configure a compressor:: + + cctx = zstd.ZstdCompressor(compression_params=params) + +The named arguments and attributes of ``ZstdCompressionParameters`` are as +follows: + +* format +* compression_level +* window_log +* hash_log +* chain_log +* search_log +* min_match +* target_length +* strategy +* compression_strategy (deprecated: same as ``strategy``) +* write_content_size +* write_checksum +* write_dict_id +* job_size +* overlap_log +* overlap_size_log (deprecated: same as ``overlap_log``) +* force_max_window +* enable_ldm +* ldm_hash_log +* ldm_min_match +* ldm_bucket_size_log +* ldm_hash_rate_log +* ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``) +* threads + +Some of these are very low-level settings. It may help to consult the official +zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants +in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). + +Frame Inspection +---------------- + +Data emitted from zstd compression is encapsulated in a *frame*. This frame +begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing +the frame in more detail. For more info, see +https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md. + +``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes +instance and return a ``FrameParameters`` object describing the frame. + +Depending on which fields are present in the frame and their values, the +length of the frame parameters varies. If insufficient bytes are passed +in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure +frame parameters can be parsed, pass in at least 18 bytes. + +``FrameParameters`` instances have the following attributes: + +content_size + Integer size of original, uncompressed content. This will be ``0`` if the + original content size isn't written to the frame (controlled with the + ``write_content_size`` argument to ``ZstdCompressor``) or if the input + content size was ``0``. + +window_size + Integer size of maximum back-reference distance in compressed data. + +dict_id + Integer of dictionary ID used for compression. ``0`` if no dictionary + ID was used or if the dictionary ID was ``0``. + +has_checksum + Bool indicating whether a 4 byte content checksum is stored at the end + of the frame. + +``zstd.frame_header_size(data)`` returns the size of the zstandard frame +header. + +``zstd.frame_content_size(data)`` returns the content size as parsed from +the frame header. ``-1`` means the content size is unknown. ``0`` means +an empty frame. The content size is usually correct. However, it may not +be accurate. + +Misc Functionality +------------------ + +estimate_decompression_context_size() +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Estimate the memory size requirements for a decompressor instance. + +Constants +--------- + +The following module constants/attributes are exposed: + +ZSTD_VERSION + This module attribute exposes a 3-tuple of the Zstandard version. e.g. + ``(1, 0, 0)`` +MAX_COMPRESSION_LEVEL + Integer max compression level accepted by compression functions +COMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed to compressor functions +COMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for compression output +DECOMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed into decompresor functions +DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for decompression output + +FRAME_HEADER + bytes containing header of the Zstandard frame +MAGIC_NUMBER + Frame header as an integer + +FLUSH_BLOCK + Flushing behavior that denotes to flush a zstd block. A decompressor will + be able to decode all data fed into the compressor so far. +FLUSH_FRAME + Flushing behavior that denotes to end a zstd frame. Any new data fed + to the compressor will start a new frame. + +CONTENTSIZE_UNKNOWN + Value for content size when the content size is unknown. +CONTENTSIZE_ERROR + Value for content size when content size couldn't be determined. + +WINDOWLOG_MIN + Minimum value for compression parameter +WINDOWLOG_MAX + Maximum value for compression parameter +CHAINLOG_MIN + Minimum value for compression parameter +CHAINLOG_MAX + Maximum value for compression parameter +HASHLOG_MIN + Minimum value for compression parameter +HASHLOG_MAX + Maximum value for compression parameter +SEARCHLOG_MIN + Minimum value for compression parameter +SEARCHLOG_MAX + Maximum value for compression parameter +MINMATCH_MIN + Minimum value for compression parameter +MINMATCH_MAX + Maximum value for compression parameter +SEARCHLENGTH_MIN + Minimum value for compression parameter + + Deprecated: use ``MINMATCH_MIN`` +SEARCHLENGTH_MAX + Maximum value for compression parameter + + Deprecated: use ``MINMATCH_MAX`` +TARGETLENGTH_MIN + Minimum value for compression parameter +STRATEGY_FAST + Compression strategy +STRATEGY_DFAST + Compression strategy +STRATEGY_GREEDY + Compression strategy +STRATEGY_LAZY + Compression strategy +STRATEGY_LAZY2 + Compression strategy +STRATEGY_BTLAZY2 + Compression strategy +STRATEGY_BTOPT + Compression strategy +STRATEGY_BTULTRA + Compression strategy +STRATEGY_BTULTRA2 + Compression strategy + +FORMAT_ZSTD1 + Zstandard frame format +FORMAT_ZSTD1_MAGICLESS + Zstandard frame format without magic header + +Performance Considerations +-------------------------- + +The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a +persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` +or ``ZstdDecompressor`` instance for multiple operations is faster than +instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each +operation. The differences are magnified as the size of data decreases. For +example, the difference between *context* reuse and non-reuse for 100,000 +100 byte inputs will be significant (possiby over 10x faster to reuse contexts) +whereas 10 100,000,000 byte inputs will be more similar in speed (because the +time spent doing compression dwarfs time spent creating new *contexts*). + +Buffer Types +------------ + +The API exposes a handful of custom types for interfacing with memory buffers. +The primary goal of these types is to facilitate efficient multi-object +operations. + +The essential idea is to have a single memory allocation provide backing +storage for multiple logical objects. This has 2 main advantages: fewer +allocations and optimal memory access patterns. This avoids having to allocate +a Python object for each logical object and furthermore ensures that access of +data for objects can be sequential (read: fast) in memory. + +BufferWithSegments +^^^^^^^^^^^^^^^^^^ + +The ``BufferWithSegments`` type represents a memory buffer containing N +discrete items of known lengths (segments). It is essentially a fixed size +memory address and an array of 2-tuples of ``(offset, length)`` 64-bit +unsigned native endian integers defining the byte offset and length of each +segment within the buffer. + +Instances behave like containers. + +``len()`` returns the number of segments within the instance. + +``o[index]`` or ``__getitem__`` obtains a ``BufferSegment`` representing an +individual segment within the backing buffer. That returned object references +(not copies) memory. This means that iterating all objects doesn't copy +data within the buffer. + +The ``.size`` attribute contains the total size in bytes of the backing +buffer. + +Instances conform to the buffer protocol. So a reference to the backing bytes +can be obtained via ``memoryview(o)``. A *copy* of the backing bytes can also +be obtained via ``.tobytes()``. + +The ``.segments`` attribute exposes the array of ``(offset, length)`` for +segments within the buffer. It is a ``BufferSegments`` type. + +BufferSegment +^^^^^^^^^^^^^ + +The ``BufferSegment`` type represents a segment within a ``BufferWithSegments``. +It is essentially a reference to N bytes within a ``BufferWithSegments``. + +``len()`` returns the length of the segment in bytes. + +``.offset`` contains the byte offset of this segment within its parent +``BufferWithSegments`` instance. + +The object conforms to the buffer protocol. ``.tobytes()`` can be called to +obtain a ``bytes`` instance with a copy of the backing bytes. + +BufferSegments +^^^^^^^^^^^^^^ + +This type represents an array of ``(offset, length)`` integers defining segments +within a ``BufferWithSegments``. + +The array members are 64-bit unsigned integers using host/native bit order. + +Instances conform to the buffer protocol. + +BufferWithSegmentsCollection +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``BufferWithSegmentsCollection`` type represents a virtual spanning view +of multiple ``BufferWithSegments`` instances. + +Instances are constructed from 1 or more ``BufferWithSegments`` instances. The +resulting object behaves like an ordered sequence whose members are the +segments within each ``BufferWithSegments``. + +``len()`` returns the number of segments within all ``BufferWithSegments`` +instances. + +``o[index]`` and ``__getitem__(index)`` return the ``BufferSegment`` at +that offset as if all ``BufferWithSegments`` instances were a single +entity. + +If the object is composed of 2 ``BufferWithSegments`` instances with the +first having 2 segments and the second have 3 segments, then ``b[0]`` +and ``b[1]`` access segments in the first object and ``b[2]``, ``b[3]``, +and ``b[4]`` access segments from the second. + +Choosing an API +=============== + +There are multiple APIs for performing compression and decompression. This is +because different applications have different needs and the library wants to +facilitate optimal use in as many use cases as possible. + +From a high-level, APIs are divided into *one-shot* and *streaming*: either you +are operating on all data at once or you operate on it piecemeal. + +The *one-shot* APIs are useful for small data, where the input or output +size is known. (The size can come from a buffer length, file size, or +stored in the zstd frame header.) A limitation of the *one-shot* APIs is that +input and output must fit in memory simultaneously. For say a 4 GB input, +this is often not feasible. + +The *one-shot* APIs also perform all work as a single operation. So, if you +feed it large input, it could take a long time for the function to return. + +The streaming APIs do not have the limitations of the simple API. But the +price you pay for this flexibility is that they are more complex than a +single function call. + +The streaming APIs put the caller in control of compression and decompression +behavior by allowing them to directly control either the input or output side +of the operation. + +With the *streaming input*, *compressor*, and *decompressor* APIs, the caller +has full control over the input to the compression or decompression stream. +They can directly choose when new data is operated on. + +With the *streaming ouput* APIs, the caller has full control over the output +of the compression or decompression stream. It can choose when to receive +new data. + +When using the *streaming* APIs that operate on file-like or stream objects, +it is important to consider what happens in that object when I/O is requested. +There is potential for long pauses as data is read or written from the +underlying stream (say from interacting with a filesystem or network). This +could add considerable overhead. + +Thread Safety +============= + +``ZstdCompressor`` and ``ZstdDecompressor`` instances have no guarantees +about thread safety. Do not operate on the same ``ZstdCompressor`` and +``ZstdDecompressor`` instance simultaneously from different threads. It is +fine to have different threads call into a single instance, just not at the +same time. + +Some operations require multiple function calls to complete. e.g. streaming +operations. A single ``ZstdCompressor`` or ``ZstdDecompressor`` cannot be used +for simultaneously active operations. e.g. you must not start a streaming +operation when another streaming operation is already active. + +The C extension releases the GIL during non-trivial calls into the zstd C +API. Non-trivial calls are notably compression and decompression. Trivial +calls are things like parsing frame parameters. Where the GIL is released +is considered an implementation detail and can change in any release. + +APIs that accept bytes-like objects don't enforce that the underlying object +is read-only. However, it is assumed that the passed object is read-only for +the duration of the function call. It is possible to pass a mutable object +(like a ``bytearray``) to e.g. ``ZstdCompressor.compress()``, have the GIL +released, and mutate the object from another thread. Such a race condition +is a bug in the consumer of python-zstandard. Most Python data types are +immutable, so unless you are doing something fancy, you don't need to +worry about this. + +Note on Zstandard's *Experimental* API +====================================== + +Many of the Zstandard APIs used by this module are marked as *experimental* +within the Zstandard project. + +It is unclear how Zstandard's C API will evolve over time, especially with +regards to this *experimental* functionality. We will try to maintain +backwards compatibility at the Python API level. However, we cannot +guarantee this for things not under our control. + +Since a copy of the Zstandard source code is distributed with this +module and since we compile against it, the behavior of a specific +version of this module should be constant for all of time. So if you +pin the version of this module used in your projects (which is a Python +best practice), you should be shielded from unwanted future changes. + +Donate +====== + +A lot of time has been invested into this project by the author. + +If you find this project useful and would like to thank the author for +their work, consider donating some money. Any amount is appreciated. + +.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif + :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard¤cy_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted + :alt: Donate via PayPal + +.. |ci-status| image:: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master + :target: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master + + diff --git a/contrib/python/zstandard/py2/.dist-info/top_level.txt b/contrib/python/zstandard/py2/.dist-info/top_level.txt new file mode 100644 index 00000000000..8ed261e4995 --- /dev/null +++ b/contrib/python/zstandard/py2/.dist-info/top_level.txt @@ -0,0 +1,2 @@ +zstandard +zstd diff --git a/contrib/python/zstandard/py2/LICENSE b/contrib/python/zstandard/py2/LICENSE new file mode 100644 index 00000000000..dcec4760996 --- /dev/null +++ b/contrib/python/zstandard/py2/LICENSE @@ -0,0 +1,27 @@ +Copyright (c) 2016, Gregory Szorc +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors +may be used to endorse or promote products derived from this software without +specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/contrib/python/zstandard/py2/README.rst b/contrib/python/zstandard/py2/README.rst new file mode 100644 index 00000000000..52bcf32503e --- /dev/null +++ b/contrib/python/zstandard/py2/README.rst @@ -0,0 +1,1615 @@ +================ +python-zstandard +================ + +This project provides Python bindings for interfacing with the +`Zstandard <http://www.zstd.net>`_ compression library. A C extension +and CFFI interface are provided. + +The primary goal of the project is to provide a rich interface to the +underlying C API through a Pythonic interface while not sacrificing +performance. This means exposing most of the features and flexibility +of the C API while not sacrificing usability or safety that Python provides. + +The canonical home for this project lives in a Mercurial repository run by +the author. For convenience, that repository is frequently synchronized to +https://github.com/indygreg/python-zstandard. + +| |ci-status| + +Requirements +============ + +This extension is designed to run with Python 2.7, 3.5, 3.6, 3.7, and 3.8 +on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above. +x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS. + +Installing +========== + +This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. +So, to install this package:: + + $ pip install zstandard + +Binary wheels are made available for some platforms. If you need to +install from a source distribution, all you should need is a working C +compiler and the Python development headers/libraries. On many Linux +distributions, you can install a ``python-dev`` or ``python-devel`` +package to provide these dependencies. + +Packages are also uploaded to Anaconda Cloud at +https://anaconda.org/indygreg/zstandard. See that URL for how to install +this package with ``conda``. + +Legacy Format Support +===================== + +To enable legacy zstd format support which is needed to handle files compressed +with zstd < 1.0 you need to provide an installation option:: + + $ pip install zstandard --install-option="--legacy" + +and since pip 7.0 it is possible to have the following line in your +requirements.txt:: + + zstandard --install-option="--legacy" + +Performance +=========== + +zstandard is a highly tunable compression algorithm. In its default settings +(compression level 3), it will be faster at compression and decompression and +will have better compression ratios than zlib on most data sets. When tuned +for speed, it approaches lz4's speed and ratios. When tuned for compression +ratio, it approaches lzma ratios and compression speed, but decompression +speed is much faster. See the official zstandard documentation for more. + +zstandard and this library support multi-threaded compression. There is a +mechanism to compress large inputs using multiple threads. + +The performance of this library is usually very similar to what the zstandard +C API can deliver. Overhead in this library is due to general Python overhead +and can't easily be avoided by *any* zstandard Python binding. This library +exposes multiple APIs for performing compression and decompression so callers +can pick an API suitable for their need. Contrast with the compression +modules in Python's standard library (like ``zlib``), which only offer limited +mechanisms for performing operations. The API flexibility means consumers can +choose to use APIs that facilitate zero copying or minimize Python object +creation and garbage collection overhead. + +This library is capable of single-threaded throughputs well over 1 GB/s. For +exact numbers, measure yourself. The source code repository has a ``bench.py`` +script that can be used to measure things. + +API +=== + +To interface with Zstandard, simply import the ``zstandard`` module:: + + import zstandard + +It is a popular convention to alias the module as a different name for +brevity:: + + import zstandard as zstd + +This module attempts to import and use either the C extension or CFFI +implementation. On Python platforms known to support C extensions (like +CPython), it raises an ImportError if the C extension cannot be imported. +On Python platforms known to not support C extensions (like PyPy), it only +attempts to import the CFFI implementation and raises ImportError if that +can't be done. On other platforms, it first tries to import the C extension +then falls back to CFFI if that fails and raises ImportError if CFFI fails. + +To change the module import behavior, a ``PYTHON_ZSTANDARD_IMPORT_POLICY`` +environment variable can be set. The following values are accepted: + +default + The behavior described above. +cffi_fallback + Always try to import the C extension then fall back to CFFI if that + fails. +cext + Only attempt to import the C extension. +cffi + Only attempt to import the CFFI implementation. + +In addition, the ``zstandard`` module exports a ``backend`` attribute +containing the string name of the backend being used. It will be one +of ``cext`` or ``cffi`` (for *C extension* and *cffi*, respectively). + +The types, functions, and attributes exposed by the ``zstandard`` module +are documented in the sections below. + +.. note:: + + The documentation in this section makes references to various zstd + concepts and functionality. The source repository contains a + ``docs/concepts.rst`` file explaining these in more detail. + +ZstdCompressor +-------------- + +The ``ZstdCompressor`` class provides an interface for performing +compression operations. Each instance is essentially a wrapper around a +``ZSTD_CCtx`` from the C API. + +Each instance is associated with parameters that control compression +behavior. These come from the following named arguments (all optional): + +level + Integer compression level. Valid values are between 1 and 22. +dict_data + Compression dictionary to use. + + Note: When using dictionary data and ``compress()`` is called multiple + times, the ``ZstdCompressionParameters`` derived from an integer + compression ``level`` and the first compressed data's size will be reused + for all subsequent operations. This may not be desirable if source data + size varies significantly. +compression_params + A ``ZstdCompressionParameters`` instance defining compression settings. +write_checksum + Whether a 4 byte checksum should be written with the compressed data. + Defaults to False. If True, the decompressor can verify that decompressed + data matches the original input data. +write_content_size + Whether the size of the uncompressed data will be written into the + header of compressed data. Defaults to True. The data will only be + written if the compressor knows the size of the input data. This is + often not true for streaming compression. +write_dict_id + Whether to write the dictionary ID into the compressed data. + Defaults to True. The dictionary ID is only written if a dictionary + is being used. +threads + Enables and sets the number of threads to use for multi-threaded compression + operations. Defaults to 0, which means to use single-threaded compression. + Negative values will resolve to the number of logical CPUs in the system. + Read below for more info on multi-threaded compression. This argument only + controls thread count for operations that operate on individual pieces of + data. APIs that spawn multiple threads for working on multiple pieces of + data have their own ``threads`` argument. + +``compression_params`` is mutually exclusive with ``level``, ``write_checksum``, +``write_content_size``, ``write_dict_id``, and ``threads``. + +Unless specified otherwise, assume that no two methods of ``ZstdCompressor`` +instances can be called from multiple Python threads simultaneously. In other +words, assume instances are not thread safe unless stated otherwise. + +Utility Methods +^^^^^^^^^^^^^^^ + +``frame_progression()`` returns a 3-tuple containing the number of bytes +ingested, consumed, and produced by the current compression operation. + +``memory_size()`` obtains the memory utilization of the underlying zstd +compression context, in bytes.:: + + cctx = zstd.ZstdCompressor() + memory = cctx.memory_size() + +Simple API +^^^^^^^^^^ + +``compress(data)`` compresses and returns data as a one-shot operation.:: + + cctx = zstd.ZstdCompressor() + compressed = cctx.compress(b'data to compress') + +The ``data`` argument can be any object that implements the *buffer protocol*. + +Stream Reader API +^^^^^^^^^^^^^^^^^ + +``stream_reader(source)`` can be used to obtain an object conforming to the +``io.RawIOBase`` interface for reading compressed output as a stream:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + reader = cctx.stream_reader(fh) + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with compressed chunk. + +Instances can also be used as context managers:: + + with open(path, 'rb') as fh: + with cctx.stream_reader(fh) as reader: + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with compressed chunk. + +When the context manager exits or ``close()`` is called, the stream is closed, +underlying resources are released, and future operations against the compression +stream will fail. + +The ``source`` argument to ``stream_reader()`` can be any object with a +``read(size)`` method or any object implementing the *buffer protocol*. + +``stream_reader()`` accepts a ``size`` argument specifying how large the input +stream is. This is used to adjust compression parameters so they are +tailored to the source size.:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + with cctx.stream_reader(fh, size=os.stat(path).st_size) as reader: + ... + +If the ``source`` is a stream, you can specify how large ``read()`` requests +to that stream should be via the ``read_size`` argument. It defaults to +``zstandard.COMPRESSION_RECOMMENDED_INPUT_SIZE``.:: + + with open(path, 'rb') as fh: + cctx = zstd.ZstdCompressor() + # Will perform fh.read(8192) when obtaining data to feed into the + # compressor. + with cctx.stream_reader(fh, read_size=8192) as reader: + ... + +The stream returned by ``stream_reader()`` is neither writable nor seekable +(even if the underlying source is seekable). ``readline()`` and +``readlines()`` are not implemented because they don't make sense for +compressed data. ``tell()`` returns the number of compressed bytes +emitted so far. + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``stream_writer(fh)`` allows you to *stream* data into a compressor. + +Returned instances implement the ``io.RawIOBase`` interface. Only methods +that involve writing will do useful things. + +The argument to ``stream_writer()`` must have a ``write(data)`` method. As +compressed data is available, ``write()`` will be called with the compressed +data as its argument. Many common Python types implement ``write()``, including +open file handles and ``io.BytesIO``. + +The ``write(data)`` method is used to feed data into the compressor. + +The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever +data remains within the compressor's internal state into the output object. This +may result in 0 or more ``write()`` calls to the output object. This method +accepts an optional ``flush_mode`` argument to control the flushing behavior. +Its value can be any of the ``FLUSH_*`` constants. + +Both ``write()`` and ``flush()`` return the number of bytes written to the +object's ``write()``. In many cases, small inputs do not accumulate enough +data to cause a write and ``write()`` will return ``0``. + +Calling ``close()`` will mark the stream as closed and subsequent I/O +operations will raise ``ValueError`` (per the documented behavior of +``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying +stream if such a method exists. + +Typically usage is as follows:: + + cctx = zstd.ZstdCompressor(level=10) + compressor = cctx.stream_writer(fh) + + compressor.write(b'chunk 0\n') + compressor.write(b'chunk 1\n') + compressor.flush() + # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point. + # Receiver is also expecting more data in the zstd *frame*. + + compressor.write(b'chunk 2\n') + compressor.flush(zstd.FLUSH_FRAME) + # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``. + # Receiver is expecting no more data, as the zstd frame is closed. + # Any future calls to ``write()`` at this point will construct a new + # zstd frame. + +Instances can be used as context managers. Exiting the context manager is +the equivalent of calling ``close()``, which is equivalent to calling +``flush(zstd.FLUSH_FRAME)``:: + + cctx = zstd.ZstdCompressor(level=10) + with cctx.stream_writer(fh) as compressor: + compressor.write(b'chunk 0') + compressor.write(b'chunk 1') + ... + +.. important:: + + If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute + a full zstd *frame* and consumers of this data may complain about malformed + input. It is recommended to use instances as a context manager to ensure + *frames* are properly finished. + +If the size of the data being fed to this streaming compressor is known, +you can declare it before compression begins:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh, size=data_len) as compressor: + compressor.write(chunk0) + compressor.write(chunk1) + ... + +Declaring the size of the source data allows compression parameters to +be tuned. And if ``write_content_size`` is used, it also results in the +content size being written into the frame header of the output data. + +The size of chunks being ``write()`` to the destination can be specified:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh, write_size=32768) as compressor: + ... + +To see how much memory is being used by the streaming compressor:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh) as compressor: + ... + byte_size = compressor.memory_size() + +Thte total number of bytes written so far are exposed via ``tell()``:: + + cctx = zstd.ZstdCompressor() + with cctx.stream_writer(fh) as compressor: + ... + total_written = compressor.tell() + +``stream_writer()`` accepts a ``write_return_read`` boolean argument to control +the return value of ``write()``. When ``False`` (the default), ``write()`` returns +the number of bytes that were ``write()``en to the underlying object. When +``True``, ``write()`` returns the number of bytes read from the input that +were subsequently written to the compressor. ``True`` is the *proper* behavior +for ``write()`` as specified by the ``io.RawIOBase`` interface and will become +the default value in a future release. + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_to_iter(reader)`` provides a mechanism to stream data out of a +compressor as an iterator of data chunks.:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh): + # Do something with emitted data. + +``read_to_iter()`` accepts an object that has a ``read(size)`` method or +conforms to the buffer protocol. + +Uncompressed data is fetched from the source either by calling ``read(size)`` +or by fetching a slice of data from the object directly (in the case where +the buffer protocol is being used). The returned iterator consists of chunks +of compressed data. + +If reading from the source via ``read()``, ``read()`` will be called until +it raises or returns an empty bytes (``b''``). It is perfectly valid for +the source to deliver fewer bytes than were what requested by ``read(size)``. + +Like ``stream_writer()``, ``read_to_iter()`` also accepts a ``size`` argument +declaring the size of the input stream:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh, size=some_int): + pass + +You can also control the size that data is ``read()`` from the source and +the ideal size of output chunks:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_to_iter(fh, read_size=16384, write_size=8192): + pass + +Unlike ``stream_writer()``, ``read_to_iter()`` does not give direct control +over the sizes of chunks fed into the compressor. Instead, chunk sizes will +be whatever the object being read from delivers. These will often be of a +uniform size. + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while +compressing it.:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh) + +For example, say you wish to compress a file:: + + cctx = zstd.ZstdCompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + cctx.copy_stream(ifh, ofh) + +It is also possible to declare the size of the source stream:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, size=len_of_input) + +You can also specify how large the chunks that are ``read()`` and ``write()`` +from and to the streams:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) + +The stream copier returns a 2-tuple of bytes read and written:: + + cctx = zstd.ZstdCompressor() + read_count, write_count = cctx.copy_stream(ifh, ofh) + +Compressor API +^^^^^^^^^^^^^^ + +``compressobj()`` returns an object that exposes ``compress(data)`` and +``flush()`` methods. Each returns compressed data or an empty bytes. + +The purpose of ``compressobj()`` is to provide an API-compatible interface +with ``zlib.compressobj``, ``bz2.BZ2Compressor``, etc. This allows callers to +swap in different compressor objects while using the same API. + +``flush()`` accepts an optional argument indicating how to end the stream. +``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream. +Once this type of flush is performed, ``compress()`` and ``flush()`` can +no longer be called. This type of flush **must** be called to end the +compression context. If not called, returned data may be incomplete. + +A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a +zstd block. Flushes of this type can be performed multiple times. The next +call to ``compress()`` will begin a new zstd block. + +Here is how this API should be used:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj() + data = cobj.compress(b'raw input 0') + data = cobj.compress(b'raw input 1') + data = cobj.flush() + +Or to flush blocks:: + + cctx.zstd.ZstdCompressor() + cobj = cctx.compressobj() + data = cobj.compress(b'chunk in first block') + data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK) + data = cobj.compress(b'chunk in second block') + data = cobj.flush() + +For best performance results, keep input chunks under 256KB. This avoids +extra allocations for a large output object. + +It is possible to declare the input size of the data that will be fed into +the compressor:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj(size=6) + data = cobj.compress(b'foobar') + data = cobj.flush() + +Chunker API +^^^^^^^^^^^ + +``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns +an object that can be used to iteratively feed chunks of data into a compressor +and produce output chunks of a uniform size. + +The object returned by ``chunker()`` exposes the following methods: + +``compress(data)`` + Feeds new input data into the compressor. + +``flush()`` + Flushes all data currently in the compressor. + +``finish()`` + Signals the end of input data. No new data can be compressed after this + method is called. + +``compress()``, ``flush()``, and ``finish()`` all return an iterator of +``bytes`` instances holding compressed data. The iterator may be empty. Callers +MUST iterate through all elements of the returned iterator before performing +another operation on the object. + +All chunks emitted by ``compress()`` will have a length of ``chunk_size``. + +``flush()`` and ``finish()`` may return a final chunk smaller than +``chunk_size``. + +Here is how the API should be used:: + + cctx = zstd.ZstdCompressor() + chunker = cctx.chunker(chunk_size=32768) + + with open(path, 'rb') as fh: + while True: + in_chunk = fh.read(32768) + if not in_chunk: + break + + for out_chunk in chunker.compress(in_chunk): + # Do something with output chunk of size 32768. + + for out_chunk in chunker.finish(): + # Do something with output chunks that finalize the zstd frame. + +The ``chunker()`` API is often a better alternative to ``compressobj()``. + +``compressobj()`` will emit output data as it is available. This results in a +*stream* of output chunks of varying sizes. The consistency of the output chunk +size with ``chunker()`` is more appropriate for many usages, such as sending +compressed data to a socket. + +``compressobj()`` may also perform extra memory reallocations in order to +dynamically adjust the sizes of the output chunks. Since ``chunker()`` output +chunks are all the same size (except for flushed or final chunks), there is +less memory allocation overhead. + +Batch Compression API +^^^^^^^^^^^^^^^^^^^^^ + +(Experimental. Not yet supported in CFFI bindings.) + +``multi_compress_to_buffer(data, [threads=0])`` performs compression of multiple +inputs as a single operation. + +Data to be compressed can be passed as a ``BufferWithSegmentsCollection``, a +``BufferWithSegments``, or a list containing byte like objects. Each element of +the container will be compressed individually using the configured parameters +on the ``ZstdCompressor`` instance. + +The ``threads`` argument controls how many threads to use for compression. The +default is ``0`` which means to use a single thread. Negative values use the +number of logical CPUs in the machine. + +The function returns a ``BufferWithSegmentsCollection``. This type represents +N discrete memory allocations, eaching holding 1 or more compressed frames. + +Output data is written to shared memory buffers. This means that unlike +regular Python objects, a reference to *any* object within the collection +keeps the shared buffer and therefore memory backing it alive. This can have +undesirable effects on process memory usage. + +The API and behavior of this function is experimental and will likely change. +Known deficiencies include: + +* If asked to use multiple threads, it will always spawn that many threads, + even if the input is too small to use them. It should automatically lower + the thread count when the extra threads would just add overhead. +* The buffer allocation strategy is fixed. There is room to make it dynamic, + perhaps even to allow one output buffer per input, facilitating a variation + of the API to return a list without the adverse effects of shared memory + buffers. + +ZstdDecompressor +---------------- + +The ``ZstdDecompressor`` class provides an interface for performing +decompression. It is effectively a wrapper around the ``ZSTD_DCtx`` type from +the C API. + +Each instance is associated with parameters that control decompression. These +come from the following named arguments (all optional): + +dict_data + Compression dictionary to use. +max_window_size + Sets an uppet limit on the window size for decompression operations in + kibibytes. This setting can be used to prevent large memory allocations + for inputs using large compression windows. +format + Set the format of data for the decoder. By default, this is + ``zstd.FORMAT_ZSTD1``. It can be set to ``zstd.FORMAT_ZSTD1_MAGICLESS`` to + allow decoding frames without the 4 byte magic header. Not all decompression + APIs support this mode. + +The interface of this class is very similar to ``ZstdCompressor`` (by design). + +Unless specified otherwise, assume that no two methods of ``ZstdDecompressor`` +instances can be called from multiple Python threads simultaneously. In other +words, assume instances are not thread safe unless stated otherwise. + +Utility Methods +^^^^^^^^^^^^^^^ + +``memory_size()`` obtains the size of the underlying zstd decompression context, +in bytes.:: + + dctx = zstd.ZstdDecompressor() + size = dctx.memory_size() + +Simple API +^^^^^^^^^^ + +``decompress(data)`` can be used to decompress an entire compressed zstd +frame in a single operation.:: + + dctx = zstd.ZstdDecompressor() + decompressed = dctx.decompress(data) + +By default, ``decompress(data)`` will only work on data written with the content +size encoded in its header (this is the default behavior of +``ZstdCompressor().compress()`` but may not be true for streaming compression). If +compressed data without an embedded content size is seen, ``zstd.ZstdError`` will +be raised. + +If the compressed data doesn't have its content size embedded within it, +decompression can be attempted by specifying the ``max_output_size`` +argument.:: + + dctx = zstd.ZstdDecompressor() + uncompressed = dctx.decompress(data, max_output_size=1048576) + +Ideally, ``max_output_size`` will be identical to the decompressed output +size. + +If ``max_output_size`` is too small to hold the decompressed data, +``zstd.ZstdError`` will be raised. + +If ``max_output_size`` is larger than the decompressed data, the allocated +output buffer will be resized to only use the space required. + +Please note that an allocation of the requested ``max_output_size`` will be +performed every time the method is called. Setting to a very large value could +result in a lot of work for the memory allocator and may result in +``MemoryError`` being raised if the allocation fails. + +.. important:: + + If the exact size of decompressed data is unknown (not passed in explicitly + and not stored in the zstandard frame), for performance reasons it is + encouraged to use a streaming API. + +Stream Reader API +^^^^^^^^^^^^^^^^^ + +``stream_reader(source)`` can be used to obtain an object conforming to the +``io.RawIOBase`` interface for reading decompressed output as a stream:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + reader = dctx.stream_reader(fh) + while True: + chunk = reader.read(16384) + if not chunk: + break + + # Do something with decompressed chunk. + +The stream can also be used as a context manager:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + with dctx.stream_reader(fh) as reader: + ... + +When used as a context manager, the stream is closed and the underlying +resources are released when the context manager exits. Future operations against +the stream will fail. + +The ``source`` argument to ``stream_reader()`` can be any object with a +``read(size)`` method or any object implementing the *buffer protocol*. + +If the ``source`` is a stream, you can specify how large ``read()`` requests +to that stream should be via the ``read_size`` argument. It defaults to +``zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE``.:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + # Will perform fh.read(8192) when obtaining data for the decompressor. + with dctx.stream_reader(fh, read_size=8192) as reader: + ... + +The stream returned by ``stream_reader()`` is not writable. + +The stream returned by ``stream_reader()`` is *partially* seekable. +Absolute and relative positions (``SEEK_SET`` and ``SEEK_CUR``) forward +of the current position are allowed. Offsets behind the current read +position and offsets relative to the end of stream are not allowed and +will raise ``ValueError`` if attempted. + +``tell()`` returns the number of decompressed bytes read so far. + +Not all I/O methods are implemented. Notably missing is support for +``readline()``, ``readlines()``, and linewise iteration support. This is +because streams operate on binary data - not text data. If you want to +convert decompressed output to text, you can chain an ``io.TextIOWrapper`` +to the stream:: + + with open(path, 'rb') as fh: + dctx = zstd.ZstdDecompressor() + stream_reader = dctx.stream_reader(fh) + text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8') + + for line in text_stream: + ... + +The ``read_across_frames`` argument to ``stream_reader()`` controls the +behavior of read operations when the end of a zstd *frame* is encountered. +When ``False`` (the default), a read will complete when the end of a +zstd *frame* is encountered. When ``True``, a read can potentially +return data spanning multiple zstd *frames*. + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``stream_writer(fh)`` allows you to *stream* data into a decompressor. + +Returned instances implement the ``io.RawIOBase`` interface. Only methods +that involve writing will do useful things. + +The argument to ``stream_writer()`` is typically an object that also implements +``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many +common Python types conform to this interface, including open file handles +and ``io.BytesIO``. + +Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data +is sent to the decompressor by calling ``write(data)`` and decompressed +output is written to the underlying stream by calling its ``write(data)`` +method.:: + + dctx = zstd.ZstdDecompressor() + decompressor = dctx.stream_writer(fh) + + decompressor.write(compressed_data) + ... + + +Calls to ``write()`` will return the number of bytes written to the output +object. Not all inputs will result in bytes being written, so return values +of ``0`` are possible. + +Like the ``stream_writer()`` compressor, instances can be used as context +managers. However, context managers add no extra special behavior and offer +little to no benefit to being used. + +Calling ``close()`` will mark the stream as closed and subsequent I/O operations +will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``). +``close()`` will also call ``close()`` on the underlying stream if such a +method exists. + +The size of chunks being ``write()`` to the destination can be specified:: + + dctx = zstd.ZstdDecompressor() + with dctx.stream_writer(fh, write_size=16384) as decompressor: + pass + +You can see how much memory is being used by the decompressor:: + + dctx = zstd.ZstdDecompressor() + with dctx.stream_writer(fh) as decompressor: + byte_size = decompressor.memory_size() + +``stream_writer()`` accepts a ``write_return_read`` boolean argument to control +the return value of ``write()``. When ``False`` (the default)``, ``write()`` +returns the number of bytes that were ``write()``en to the underlying stream. +When ``True``, ``write()`` returns the number of bytes read from the input. +``True`` is the *proper* behavior for ``write()`` as specified by the +``io.RawIOBase`` interface and will become the default in a future release. + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a +compressed source as an iterator of data chunks.:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh): + # Do something with original data. + +``read_to_iter()`` accepts an object with a ``read(size)`` method that will +return compressed bytes or an object conforming to the buffer protocol that +can expose its data as a contiguous range of bytes. + +``read_to_iter()`` returns an iterator whose elements are chunks of the +decompressed data. + +The size of requested ``read()`` from the source can be specified:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh, read_size=16384): + pass + +It is also possible to skip leading bytes in the input data:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_to_iter(fh, skip_bytes=1): + pass + +.. tip:: + + Skipping leading bytes is useful if the source data contains extra + *header* data. Traditionally, you would need to create a slice or + ``memoryview`` of the data you want to decompress. This would create + overhead. It is more efficient to pass the offset into this API. + +Similarly to ``ZstdCompressor.read_to_iter()``, the consumer of the iterator +controls when data is decompressed. If the iterator isn't consumed, +decompression is put on hold. + +When ``read_to_iter()`` is passed an object conforming to the buffer protocol, +the behavior may seem similar to what occurs when the simple decompression +API is used. However, this API works when the decompressed size is unknown. +Furthermore, if feeding large inputs, the decompressor will work in chunks +instead of performing a single operation. + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while +performing decompression.:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh) + +e.g. to decompress a file to another file:: + + dctx = zstd.ZstdDecompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + dctx.copy_stream(ifh, ofh) + +The size of chunks being ``read()`` and ``write()`` from and to the streams +can be specified:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) + +Decompressor API +^^^^^^^^^^^^^^^^ + +``decompressobj()`` returns an object that exposes a ``decompress(data)`` +method. Compressed data chunks are fed into ``decompress(data)`` and +uncompressed output (or an empty bytes) is returned. Output from subsequent +calls needs to be concatenated to reassemble the full decompressed byte +sequence. + +The purpose of ``decompressobj()`` is to provide an API-compatible interface +with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers +to swap in different decompressor objects while using the same API. + +Each object is single use: once an input frame is decoded, ``decompress()`` +can no longer be called. + +Here is how this API should be used:: + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj() + data = dobj.decompress(compressed_chunk_0) + data = dobj.decompress(compressed_chunk_1) + +By default, calls to ``decompress()`` write output data in chunks of size +``DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE``. These chunks are concatenated +before being returned to the caller. It is possible to define the size of +these temporary chunks by passing ``write_size`` to ``decompressobj()``:: + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj(write_size=1048576) + +.. note:: + + Because calls to ``decompress()`` may need to perform multiple + memory (re)allocations, this streaming decompression API isn't as + efficient as other APIs. + +For compatibility with the standard library APIs, instances expose a +``flush([length=None])`` method. This method no-ops and has no meaningful +side-effects, making it safe to call any time. + +Batch Decompression API +^^^^^^^^^^^^^^^^^^^^^^^ + +(Experimental. Not yet supported in CFFI bindings.) + +``multi_decompress_to_buffer()`` performs decompression of multiple +frames as a single operation and returns a ``BufferWithSegmentsCollection`` +containing decompressed data for all inputs. + +Compressed frames can be passed to the function as a ``BufferWithSegments``, +a ``BufferWithSegmentsCollection``, or as a list containing objects that +conform to the buffer protocol. For best performance, pass a +``BufferWithSegmentsCollection`` or a ``BufferWithSegments``, as +minimal input validation will be done for that type. If calling from +Python (as opposed to C), constructing one of these instances may add +overhead cancelling out the performance overhead of validation for list +inputs.:: + + dctx = zstd.ZstdDecompressor() + results = dctx.multi_decompress_to_buffer([b'...', b'...']) + +The decompressed size of each frame MUST be discoverable. It can either be +embedded within the zstd frame (``write_content_size=True`` argument to +``ZstdCompressor``) or passed in via the ``decompressed_sizes`` argument. + +The ``decompressed_sizes`` argument is an object conforming to the buffer +protocol which holds an array of 64-bit unsigned integers in the machine's +native format defining the decompressed sizes of each frame. If this argument +is passed, it avoids having to scan each frame for its decompressed size. +This frame scanning can add noticeable overhead in some scenarios.:: + + frames = [...] + sizes = struct.pack('=QQQQ', len0, len1, len2, len3) + + dctx = zstd.ZstdDecompressor() + results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes) + +The ``threads`` argument controls the number of threads to use to perform +decompression operations. The default (``0``) or the value ``1`` means to +use a single thread. Negative values use the number of logical CPUs in the +machine. + +.. note:: + + It is possible to pass a ``mmap.mmap()`` instance into this function by + wrapping it with a ``BufferWithSegments`` instance (which will define the + offsets of frames within the memory mapped region). + +This function is logically equivalent to performing ``dctx.decompress()`` +on each input frame and returning the result. + +This function exists to perform decompression on multiple frames as fast +as possible by having as little overhead as possible. Since decompression is +performed as a single operation and since the decompressed output is stored in +a single buffer, extra memory allocations, Python objects, and Python function +calls are avoided. This is ideal for scenarios where callers know up front that +they need to access data for multiple frames, such as when *delta chains* are +being used. + +Currently, the implementation always spawns multiple threads when requested, +even if the amount of work to do is small. In the future, it will be smarter +about avoiding threads and their associated overhead when the amount of +work to do is small. + +Prefix Dictionary Chain Decompression +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``decompress_content_dict_chain(frames)`` performs decompression of a list of +zstd frames produced using chained *prefix* dictionary compression. Such +a list of frames is produced by compressing discrete inputs where each +non-initial input is compressed with a *prefix* dictionary consisting of the +content of the previous input. + +For example, say you have the following inputs:: + + inputs = [b'input 1', b'input 2', b'input 3'] + +The zstd frame chain consists of: + +1. ``b'input 1'`` compressed in standalone/discrete mode +2. ``b'input 2'`` compressed using ``b'input 1'`` as a *prefix* dictionary +3. ``b'input 3'`` compressed using ``b'input 2'`` as a *prefix* dictionary + +Each zstd frame **must** have the content size written. + +The following Python code can be used to produce a *prefix dictionary chain*:: + + def make_chain(inputs): + frames = [] + + # First frame is compressed in standalone/discrete mode. + zctx = zstd.ZstdCompressor() + frames.append(zctx.compress(inputs[0])) + + # Subsequent frames use the previous fulltext as a prefix dictionary + for i, raw in enumerate(inputs[1:]): + dict_data = zstd.ZstdCompressionDict( + inputs[i], dict_type=zstd.DICT_TYPE_RAWCONTENT) + zctx = zstd.ZstdCompressor(dict_data=dict_data) + frames.append(zctx.compress(raw)) + + return frames + +``decompress_content_dict_chain()`` returns the uncompressed data of the last +element in the input chain. + + +.. note:: + + It is possible to implement *prefix dictionary chain* decompression + on top of other APIs. However, this function will likely be faster - + especially for long input chains - as it avoids the overhead of instantiating + and passing around intermediate objects between C and Python. + +Multi-Threaded Compression +-------------------------- + +``ZstdCompressor`` accepts a ``threads`` argument that controls the number +of threads to use for compression. The way this works is that input is split +into segments and each segment is fed into a worker pool for compression. Once +a segment is compressed, it is flushed/appended to the output. + +.. note:: + + These threads are created at the C layer and are not Python threads. So they + work outside the GIL. It is therefore possible to CPU saturate multiple cores + from Python. + +The segment size for multi-threaded compression is chosen from the window size +of the compressor. This is derived from the ``window_log`` attribute of a +``ZstdCompressionParameters`` instance. By default, segment sizes are in the 1+MB +range. + +If multi-threaded compression is requested and the input is smaller than the +configured segment size, only a single compression thread will be used. If the +input is smaller than the segment size multiplied by the thread pool size or +if data cannot be delivered to the compressor fast enough, not all requested +compressor threads may be active simultaneously. + +Compared to non-multi-threaded compression, multi-threaded compression has +higher per-operation overhead. This includes extra memory operations, +thread creation, lock acquisition, etc. + +Due to the nature of multi-threaded compression using *N* compression +*states*, the output from multi-threaded compression will likely be larger +than non-multi-threaded compression. The difference is usually small. But +there is a CPU/wall time versus size trade off that may warrant investigation. + +Output from multi-threaded compression does not require any special handling +on the decompression side. To the decompressor, data generated with single +threaded compressor looks the same as data generated by a multi-threaded +compressor and does not require any special handling or additional resource +requirements. + +Dictionary Creation and Management +---------------------------------- + +Compression dictionaries are represented with the ``ZstdCompressionDict`` type. + +Instances can be constructed from bytes:: + + dict_data = zstd.ZstdCompressionDict(data) + +It is possible to construct a dictionary from *any* data. If the data doesn't +begin with a magic header, it will be treated as a *prefix* dictionary. +*Prefix* dictionaries allow compression operations to reference raw data +within the dictionary. + +It is possible to force the use of *prefix* dictionaries or to require a +dictionary header: + + dict_data = zstd.ZstdCompressionDict(data, + dict_type=zstd.DICT_TYPE_RAWCONTENT) + + dict_data = zstd.ZstdCompressionDict(data, + dict_type=zstd.DICT_TYPE_FULLDICT) + +You can see how many bytes are in the dictionary by calling ``len()``:: + + dict_data = zstd.train_dictionary(size, samples) + dict_size = len(dict_data) # will not be larger than ``size`` + +Once you have a dictionary, you can pass it to the objects performing +compression and decompression:: + + dict_data = zstd.train_dictionary(131072, samples) + + cctx = zstd.ZstdCompressor(dict_data=dict_data) + for source_data in input_data: + compressed = cctx.compress(source_data) + # Do something with compressed data. + + dctx = zstd.ZstdDecompressor(dict_data=dict_data) + for compressed_data in input_data: + buffer = io.BytesIO() + with dctx.stream_writer(buffer) as decompressor: + decompressor.write(compressed_data) + # Do something with raw data in ``buffer``. + +Dictionaries have unique integer IDs. You can retrieve this ID via:: + + dict_id = zstd.dictionary_id(dict_data) + +You can obtain the raw data in the dict (useful for persisting and constructing +a ``ZstdCompressionDict`` later) via ``as_bytes()``:: + + dict_data = zstd.train_dictionary(size, samples) + raw_data = dict_data.as_bytes() + +By default, when a ``ZstdCompressionDict`` is *attached* to a +``ZstdCompressor``, each ``ZstdCompressor`` performs work to prepare the +dictionary for use. This is fine if only 1 compression operation is being +performed or if the ``ZstdCompressor`` is being reused for multiple operations. +But if multiple ``ZstdCompressor`` instances are being used with the dictionary, +this can add overhead. + +It is possible to *precompute* the dictionary so it can readily be consumed +by multiple ``ZstdCompressor`` instances:: + + d = zstd.ZstdCompressionDict(data) + + # Precompute for compression level 3. + d.precompute_compress(level=3) + + # Precompute with specific compression parameters. + params = zstd.ZstdCompressionParameters(...) + d.precompute_compress(compression_params=params) + +.. note:: + + When a dictionary is precomputed, the compression parameters used to + precompute the dictionary overwrite some of the compression parameters + specified to ``ZstdCompressor.__init__``. + +Training Dictionaries +^^^^^^^^^^^^^^^^^^^^^ + +Unless using *prefix* dictionaries, dictionary data is produced by *training* +on existing data:: + + dict_data = zstd.train_dictionary(size, samples) + +This takes a target dictionary size and list of bytes instances and creates and +returns a ``ZstdCompressionDict``. + +The dictionary training mechanism is known as *cover*. More details about it are +available in the paper *Effective Construction of Relative Lempel-Ziv +Dictionaries* (authors: Liao, Petri, Moffat, Wirth). + +The cover algorithm takes parameters ``k` and ``d``. These are the +*segment size* and *dmer size*, respectively. The returned dictionary +instance created by this function has ``k`` and ``d`` attributes +containing the values for these parameters. If a ``ZstdCompressionDict`` +is constructed from raw bytes data (a content-only dictionary), the +``k`` and ``d`` attributes will be ``0``. + +The segment and dmer size parameters to the cover algorithm can either be +specified manually or ``train_dictionary()`` can try multiple values +and pick the best one, where *best* means the smallest compressed data size. +This later mode is called *optimization* mode. + +If none of ``k``, ``d``, ``steps``, ``threads``, ``level``, ``notifications``, +or ``dict_id`` (basically anything from the underlying ``ZDICT_cover_params_t`` +struct) are defined, *optimization* mode is used with default parameter +values. + +If ``steps`` or ``threads`` are defined, then *optimization* mode is engaged +with explicit control over those parameters. Specifying ``threads=0`` or +``threads=1`` can be used to engage *optimization* mode if other parameters +are not defined. + +Otherwise, non-*optimization* mode is used with the parameters specified. + +This function takes the following arguments: + +dict_size + Target size in bytes of the dictionary to generate. +samples + A list of bytes holding samples the dictionary will be trained from. +k + Parameter to cover algorithm defining the segment size. A reasonable range + is [16, 2048+]. +d + Parameter to cover algorithm defining the dmer size. A reasonable range is + [6, 16]. ``d`` must be less than or equal to ``k``. +dict_id + Integer dictionary ID for the produced dictionary. Default is 0, which uses + a random value. +steps + Number of steps through ``k`` values to perform when trying parameter + variations. +threads + Number of threads to use when trying parameter variations. Default is 0, + which means to use a single thread. A negative value can be specified to + use as many threads as there are detected logical CPUs. +level + Integer target compression level when trying parameter variations. +notifications + Controls writing of informational messages to ``stderr``. ``0`` (the + default) means to write nothing. ``1`` writes errors. ``2`` writes + progression info. ``3`` writes more details. And ``4`` writes all info. + +Explicit Compression Parameters +------------------------------- + +Zstandard offers a high-level *compression level* that maps to lower-level +compression parameters. For many consumers, this numeric level is the only +compression setting you'll need to touch. + +But for advanced use cases, it might be desirable to tweak these lower-level +settings. + +The ``ZstdCompressionParameters`` type represents these low-level compression +settings. + +Instances of this type can be constructed from a myriad of keyword arguments +(defined below) for complete low-level control over each adjustable +compression setting. + +From a higher level, one can construct a ``ZstdCompressionParameters`` instance +given a desired compression level and target input and dictionary size +using ``ZstdCompressionParameters.from_level()``. e.g.:: + + # Derive compression settings for compression level 7. + params = zstd.ZstdCompressionParameters.from_level(7) + + # With an input size of 1MB + params = zstd.ZstdCompressionParameters.from_level(7, source_size=1048576) + +Using ``from_level()``, it is also possible to override individual compression +parameters or to define additional settings that aren't automatically derived. +e.g.:: + + params = zstd.ZstdCompressionParameters.from_level(4, window_log=10) + params = zstd.ZstdCompressionParameters.from_level(5, threads=4) + +Or you can define low-level compression settings directly:: + + params = zstd.ZstdCompressionParameters(window_log=12, enable_ldm=True) + +Once a ``ZstdCompressionParameters`` instance is obtained, it can be used to +configure a compressor:: + + cctx = zstd.ZstdCompressor(compression_params=params) + +The named arguments and attributes of ``ZstdCompressionParameters`` are as +follows: + +* format +* compression_level +* window_log +* hash_log +* chain_log +* search_log +* min_match +* target_length +* strategy +* compression_strategy (deprecated: same as ``strategy``) +* write_content_size +* write_checksum +* write_dict_id +* job_size +* overlap_log +* overlap_size_log (deprecated: same as ``overlap_log``) +* force_max_window +* enable_ldm +* ldm_hash_log +* ldm_min_match +* ldm_bucket_size_log +* ldm_hash_rate_log +* ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``) +* threads + +Some of these are very low-level settings. It may help to consult the official +zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants +in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). + +Frame Inspection +---------------- + +Data emitted from zstd compression is encapsulated in a *frame*. This frame +begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing +the frame in more detail. For more info, see +https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md. + +``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes +instance and return a ``FrameParameters`` object describing the frame. + +Depending on which fields are present in the frame and their values, the +length of the frame parameters varies. If insufficient bytes are passed +in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure +frame parameters can be parsed, pass in at least 18 bytes. + +``FrameParameters`` instances have the following attributes: + +content_size + Integer size of original, uncompressed content. This will be ``0`` if the + original content size isn't written to the frame (controlled with the + ``write_content_size`` argument to ``ZstdCompressor``) or if the input + content size was ``0``. + +window_size + Integer size of maximum back-reference distance in compressed data. + +dict_id + Integer of dictionary ID used for compression. ``0`` if no dictionary + ID was used or if the dictionary ID was ``0``. + +has_checksum + Bool indicating whether a 4 byte content checksum is stored at the end + of the frame. + +``zstd.frame_header_size(data)`` returns the size of the zstandard frame +header. + +``zstd.frame_content_size(data)`` returns the content size as parsed from +the frame header. ``-1`` means the content size is unknown. ``0`` means +an empty frame. The content size is usually correct. However, it may not +be accurate. + +Misc Functionality +------------------ + +estimate_decompression_context_size() +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Estimate the memory size requirements for a decompressor instance. + +Constants +--------- + +The following module constants/attributes are exposed: + +ZSTD_VERSION + This module attribute exposes a 3-tuple of the Zstandard version. e.g. + ``(1, 0, 0)`` +MAX_COMPRESSION_LEVEL + Integer max compression level accepted by compression functions +COMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed to compressor functions +COMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for compression output +DECOMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed into decompresor functions +DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for decompression output + +FRAME_HEADER + bytes containing header of the Zstandard frame +MAGIC_NUMBER + Frame header as an integer + +FLUSH_BLOCK + Flushing behavior that denotes to flush a zstd block. A decompressor will + be able to decode all data fed into the compressor so far. +FLUSH_FRAME + Flushing behavior that denotes to end a zstd frame. Any new data fed + to the compressor will start a new frame. + +CONTENTSIZE_UNKNOWN + Value for content size when the content size is unknown. +CONTENTSIZE_ERROR + Value for content size when content size couldn't be determined. + +WINDOWLOG_MIN + Minimum value for compression parameter +WINDOWLOG_MAX + Maximum value for compression parameter +CHAINLOG_MIN + Minimum value for compression parameter +CHAINLOG_MAX + Maximum value for compression parameter +HASHLOG_MIN + Minimum value for compression parameter +HASHLOG_MAX + Maximum value for compression parameter +SEARCHLOG_MIN + Minimum value for compression parameter +SEARCHLOG_MAX + Maximum value for compression parameter +MINMATCH_MIN + Minimum value for compression parameter +MINMATCH_MAX + Maximum value for compression parameter +SEARCHLENGTH_MIN + Minimum value for compression parameter + + Deprecated: use ``MINMATCH_MIN`` +SEARCHLENGTH_MAX + Maximum value for compression parameter + + Deprecated: use ``MINMATCH_MAX`` +TARGETLENGTH_MIN + Minimum value for compression parameter +STRATEGY_FAST + Compression strategy +STRATEGY_DFAST + Compression strategy +STRATEGY_GREEDY + Compression strategy +STRATEGY_LAZY + Compression strategy +STRATEGY_LAZY2 + Compression strategy +STRATEGY_BTLAZY2 + Compression strategy +STRATEGY_BTOPT + Compression strategy +STRATEGY_BTULTRA + Compression strategy +STRATEGY_BTULTRA2 + Compression strategy + +FORMAT_ZSTD1 + Zstandard frame format +FORMAT_ZSTD1_MAGICLESS + Zstandard frame format without magic header + +Performance Considerations +-------------------------- + +The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a +persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` +or ``ZstdDecompressor`` instance for multiple operations is faster than +instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each +operation. The differences are magnified as the size of data decreases. For +example, the difference between *context* reuse and non-reuse for 100,000 +100 byte inputs will be significant (possiby over 10x faster to reuse contexts) +whereas 10 100,000,000 byte inputs will be more similar in speed (because the +time spent doing compression dwarfs time spent creating new *contexts*). + +Buffer Types +------------ + +The API exposes a handful of custom types for interfacing with memory buffers. +The primary goal of these types is to facilitate efficient multi-object +operations. + +The essential idea is to have a single memory allocation provide backing +storage for multiple logical objects. This has 2 main advantages: fewer +allocations and optimal memory access patterns. This avoids having to allocate +a Python object for each logical object and furthermore ensures that access of +data for objects can be sequential (read: fast) in memory. + +BufferWithSegments +^^^^^^^^^^^^^^^^^^ + +The ``BufferWithSegments`` type represents a memory buffer containing N +discrete items of known lengths (segments). It is essentially a fixed size +memory address and an array of 2-tuples of ``(offset, length)`` 64-bit +unsigned native endian integers defining the byte offset and length of each +segment within the buffer. + +Instances behave like containers. + +``len()`` returns the number of segments within the instance. + +``o[index]`` or ``__getitem__`` obtains a ``BufferSegment`` representing an +individual segment within the backing buffer. That returned object references +(not copies) memory. This means that iterating all objects doesn't copy +data within the buffer. + +The ``.size`` attribute contains the total size in bytes of the backing +buffer. + +Instances conform to the buffer protocol. So a reference to the backing bytes +can be obtained via ``memoryview(o)``. A *copy* of the backing bytes can also +be obtained via ``.tobytes()``. + +The ``.segments`` attribute exposes the array of ``(offset, length)`` for +segments within the buffer. It is a ``BufferSegments`` type. + +BufferSegment +^^^^^^^^^^^^^ + +The ``BufferSegment`` type represents a segment within a ``BufferWithSegments``. +It is essentially a reference to N bytes within a ``BufferWithSegments``. + +``len()`` returns the length of the segment in bytes. + +``.offset`` contains the byte offset of this segment within its parent +``BufferWithSegments`` instance. + +The object conforms to the buffer protocol. ``.tobytes()`` can be called to +obtain a ``bytes`` instance with a copy of the backing bytes. + +BufferSegments +^^^^^^^^^^^^^^ + +This type represents an array of ``(offset, length)`` integers defining segments +within a ``BufferWithSegments``. + +The array members are 64-bit unsigned integers using host/native bit order. + +Instances conform to the buffer protocol. + +BufferWithSegmentsCollection +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``BufferWithSegmentsCollection`` type represents a virtual spanning view +of multiple ``BufferWithSegments`` instances. + +Instances are constructed from 1 or more ``BufferWithSegments`` instances. The +resulting object behaves like an ordered sequence whose members are the +segments within each ``BufferWithSegments``. + +``len()`` returns the number of segments within all ``BufferWithSegments`` +instances. + +``o[index]`` and ``__getitem__(index)`` return the ``BufferSegment`` at +that offset as if all ``BufferWithSegments`` instances were a single +entity. + +If the object is composed of 2 ``BufferWithSegments`` instances with the +first having 2 segments and the second have 3 segments, then ``b[0]`` +and ``b[1]`` access segments in the first object and ``b[2]``, ``b[3]``, +and ``b[4]`` access segments from the second. + +Choosing an API +=============== + +There are multiple APIs for performing compression and decompression. This is +because different applications have different needs and the library wants to +facilitate optimal use in as many use cases as possible. + +From a high-level, APIs are divided into *one-shot* and *streaming*: either you +are operating on all data at once or you operate on it piecemeal. + +The *one-shot* APIs are useful for small data, where the input or output +size is known. (The size can come from a buffer length, file size, or +stored in the zstd frame header.) A limitation of the *one-shot* APIs is that +input and output must fit in memory simultaneously. For say a 4 GB input, +this is often not feasible. + +The *one-shot* APIs also perform all work as a single operation. So, if you +feed it large input, it could take a long time for the function to return. + +The streaming APIs do not have the limitations of the simple API. But the +price you pay for this flexibility is that they are more complex than a +single function call. + +The streaming APIs put the caller in control of compression and decompression +behavior by allowing them to directly control either the input or output side +of the operation. + +With the *streaming input*, *compressor*, and *decompressor* APIs, the caller +has full control over the input to the compression or decompression stream. +They can directly choose when new data is operated on. + +With the *streaming ouput* APIs, the caller has full control over the output +of the compression or decompression stream. It can choose when to receive +new data. + +When using the *streaming* APIs that operate on file-like or stream objects, +it is important to consider what happens in that object when I/O is requested. +There is potential for long pauses as data is read or written from the +underlying stream (say from interacting with a filesystem or network). This +could add considerable overhead. + +Thread Safety +============= + +``ZstdCompressor`` and ``ZstdDecompressor`` instances have no guarantees +about thread safety. Do not operate on the same ``ZstdCompressor`` and +``ZstdDecompressor`` instance simultaneously from different threads. It is +fine to have different threads call into a single instance, just not at the +same time. + +Some operations require multiple function calls to complete. e.g. streaming +operations. A single ``ZstdCompressor`` or ``ZstdDecompressor`` cannot be used +for simultaneously active operations. e.g. you must not start a streaming +operation when another streaming operation is already active. + +The C extension releases the GIL during non-trivial calls into the zstd C +API. Non-trivial calls are notably compression and decompression. Trivial +calls are things like parsing frame parameters. Where the GIL is released +is considered an implementation detail and can change in any release. + +APIs that accept bytes-like objects don't enforce that the underlying object +is read-only. However, it is assumed that the passed object is read-only for +the duration of the function call. It is possible to pass a mutable object +(like a ``bytearray``) to e.g. ``ZstdCompressor.compress()``, have the GIL +released, and mutate the object from another thread. Such a race condition +is a bug in the consumer of python-zstandard. Most Python data types are +immutable, so unless you are doing something fancy, you don't need to +worry about this. + +Note on Zstandard's *Experimental* API +====================================== + +Many of the Zstandard APIs used by this module are marked as *experimental* +within the Zstandard project. + +It is unclear how Zstandard's C API will evolve over time, especially with +regards to this *experimental* functionality. We will try to maintain +backwards compatibility at the Python API level. However, we cannot +guarantee this for things not under our control. + +Since a copy of the Zstandard source code is distributed with this +module and since we compile against it, the behavior of a specific +version of this module should be constant for all of time. So if you +pin the version of this module used in your projects (which is a Python +best practice), you should be shielded from unwanted future changes. + +Donate +====== + +A lot of time has been invested into this project by the author. + +If you find this project useful and would like to thank the author for +their work, consider donating some money. Any amount is appreciated. + +.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif + :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard¤cy_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted + :alt: Donate via PayPal + +.. |ci-status| image:: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master + :target: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master diff --git a/contrib/python/zstandard/py2/c-ext/bufferutil.c b/contrib/python/zstandard/py2/c-ext/bufferutil.c new file mode 100644 index 00000000000..5094a4bb92f --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/bufferutil.c @@ -0,0 +1,792 @@ +/** +* Copyright (c) 2017-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(BufferWithSegments__doc__, +"BufferWithSegments - A memory buffer holding known sub-segments.\n" +"\n" +"This type represents a contiguous chunk of memory containing N discrete\n" +"items within sub-segments of that memory.\n" +"\n" +"Segments within the buffer are stored as an array of\n" +"``(offset, length)`` pairs, where each element is an unsigned 64-bit\n" +"integer using the host/native bit order representation.\n" +"\n" +"The type exists to facilitate operations against N>1 items without the\n" +"overhead of Python object creation and management.\n" +); + +static void BufferWithSegments_dealloc(ZstdBufferWithSegments* self) { + /* Backing memory is either canonically owned by a Py_buffer or by us. */ + if (self->parent.buf) { + PyBuffer_Release(&self->parent); + } + else if (self->useFree) { + free(self->data); + } + else { + PyMem_Free(self->data); + } + + self->data = NULL; + + if (self->useFree) { + free(self->segments); + } + else { + PyMem_Free(self->segments); + } + + self->segments = NULL; + + PyObject_Del(self); +} + +static int BufferWithSegments_init(ZstdBufferWithSegments* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + "segments", + NULL + }; + + Py_buffer segments; + Py_ssize_t segmentCount; + Py_ssize_t i; + + memset(&self->parent, 0, sizeof(self->parent)); + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*y*:BufferWithSegments", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*s*:BufferWithSegments", +#endif + kwlist, &self->parent, &segments)) { + return -1; + } + + if (!PyBuffer_IsContiguous(&self->parent, 'C') || self->parent.ndim > 1) { + PyErr_SetString(PyExc_ValueError, "data buffer should be contiguous and have a single dimension"); + goto except; + } + + if (!PyBuffer_IsContiguous(&segments, 'C') || segments.ndim > 1) { + PyErr_SetString(PyExc_ValueError, "segments buffer should be contiguous and have a single dimension"); + goto except; + } + + if (segments.len % sizeof(BufferSegment)) { + PyErr_Format(PyExc_ValueError, "segments array size is not a multiple of %zu", + sizeof(BufferSegment)); + goto except; + } + + segmentCount = segments.len / sizeof(BufferSegment); + + /* Validate segments data, as blindly trusting it could lead to arbitrary + memory access. */ + for (i = 0; i < segmentCount; i++) { + BufferSegment* segment = &((BufferSegment*)(segments.buf))[i]; + + if (segment->offset + segment->length > (unsigned long long)self->parent.len) { + PyErr_SetString(PyExc_ValueError, "offset within segments array references memory outside buffer"); + goto except; + return -1; + } + } + + /* Make a copy of the segments data. It is cheap to do so and is a guard + against caller changing offsets, which has security implications. */ + self->segments = PyMem_Malloc(segments.len); + if (!self->segments) { + PyErr_NoMemory(); + goto except; + } + + memcpy(self->segments, segments.buf, segments.len); + PyBuffer_Release(&segments); + + self->data = self->parent.buf; + self->dataSize = self->parent.len; + self->segmentCount = segmentCount; + + return 0; + +except: + PyBuffer_Release(&self->parent); + PyBuffer_Release(&segments); + return -1; +} + +/** + * Construct a BufferWithSegments from existing memory and offsets. + * + * Ownership of the backing memory and BufferSegments will be transferred to + * the created object and freed when the BufferWithSegments is destroyed. + */ +ZstdBufferWithSegments* BufferWithSegments_FromMemory(void* data, unsigned long long dataSize, + BufferSegment* segments, Py_ssize_t segmentsSize) { + ZstdBufferWithSegments* result = NULL; + Py_ssize_t i; + + if (NULL == data) { + PyErr_SetString(PyExc_ValueError, "data is NULL"); + return NULL; + } + + if (NULL == segments) { + PyErr_SetString(PyExc_ValueError, "segments is NULL"); + return NULL; + } + + for (i = 0; i < segmentsSize; i++) { + BufferSegment* segment = &segments[i]; + + if (segment->offset + segment->length > dataSize) { + PyErr_SetString(PyExc_ValueError, "offset in segments overflows buffer size"); + return NULL; + } + } + + result = PyObject_New(ZstdBufferWithSegments, &ZstdBufferWithSegmentsType); + if (NULL == result) { + return NULL; + } + + result->useFree = 0; + + memset(&result->parent, 0, sizeof(result->parent)); + result->data = data; + result->dataSize = dataSize; + result->segments = segments; + result->segmentCount = segmentsSize; + + return result; +} + +static Py_ssize_t BufferWithSegments_length(ZstdBufferWithSegments* self) { + return self->segmentCount; +} + +static ZstdBufferSegment* BufferWithSegments_item(ZstdBufferWithSegments* self, Py_ssize_t i) { + ZstdBufferSegment* result = NULL; + + if (i < 0) { + PyErr_SetString(PyExc_IndexError, "offset must be non-negative"); + return NULL; + } + + if (i >= self->segmentCount) { + PyErr_Format(PyExc_IndexError, "offset must be less than %zd", self->segmentCount); + return NULL; + } + + if (self->segments[i].length > PY_SSIZE_T_MAX) { + PyErr_Format(PyExc_ValueError, + "item at offset %zd is too large for this platform", i); + return NULL; + } + + result = (ZstdBufferSegment*)PyObject_CallObject((PyObject*)&ZstdBufferSegmentType, NULL); + if (NULL == result) { + return NULL; + } + + result->parent = (PyObject*)self; + Py_INCREF(self); + + result->data = (char*)self->data + self->segments[i].offset; + result->dataSize = (Py_ssize_t)self->segments[i].length; + result->offset = self->segments[i].offset; + + return result; +} + +#if PY_MAJOR_VERSION >= 3 +static int BufferWithSegments_getbuffer(ZstdBufferWithSegments* self, Py_buffer* view, int flags) { + if (self->dataSize > PY_SSIZE_T_MAX) { + view->obj = NULL; + PyErr_SetString(PyExc_BufferError, "buffer is too large for this platform"); + return -1; + } + + return PyBuffer_FillInfo(view, (PyObject*)self, self->data, (Py_ssize_t)self->dataSize, 1, flags); +} +#else +static Py_ssize_t BufferWithSegments_getreadbuffer(ZstdBufferWithSegments* self, Py_ssize_t segment, void **ptrptr) { + if (segment != 0) { + PyErr_SetString(PyExc_ValueError, "segment number must be 0"); + return -1; + } + + if (self->dataSize > PY_SSIZE_T_MAX) { + PyErr_SetString(PyExc_ValueError, "buffer is too large for this platform"); + return -1; + } + + *ptrptr = self->data; + return (Py_ssize_t)self->dataSize; +} + +static Py_ssize_t BufferWithSegments_getsegcount(ZstdBufferWithSegments* self, Py_ssize_t* len) { + if (len) { + *len = 1; + } + + return 1; +} +#endif + +PyDoc_STRVAR(BufferWithSegments_tobytes__doc__, +"Obtain a bytes instance for this buffer.\n" +); + +static PyObject* BufferWithSegments_tobytes(ZstdBufferWithSegments* self) { + if (self->dataSize > PY_SSIZE_T_MAX) { + PyErr_SetString(PyExc_ValueError, "buffer is too large for this platform"); + return NULL; + } + + return PyBytes_FromStringAndSize(self->data, (Py_ssize_t)self->dataSize); +} + +PyDoc_STRVAR(BufferWithSegments_segments__doc__, +"Obtain a BufferSegments describing segments in this sintance.\n" +); + +static ZstdBufferSegments* BufferWithSegments_segments(ZstdBufferWithSegments* self) { + ZstdBufferSegments* result = (ZstdBufferSegments*)PyObject_CallObject((PyObject*)&ZstdBufferSegmentsType, NULL); + if (NULL == result) { + return NULL; + } + + result->parent = (PyObject*)self; + Py_INCREF(self); + result->segments = self->segments; + result->segmentCount = self->segmentCount; + + return result; +} + +static PySequenceMethods BufferWithSegments_sq = { + (lenfunc)BufferWithSegments_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + (ssizeargfunc)BufferWithSegments_item, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +static PyBufferProcs BufferWithSegments_as_buffer = { +#if PY_MAJOR_VERSION >= 3 + (getbufferproc)BufferWithSegments_getbuffer, /* bf_getbuffer */ + 0 /* bf_releasebuffer */ +#else + (readbufferproc)BufferWithSegments_getreadbuffer, /* bf_getreadbuffer */ + 0, /* bf_getwritebuffer */ + (segcountproc)BufferWithSegments_getsegcount, /* bf_getsegcount */ + 0 /* bf_getcharbuffer */ +#endif +}; + +static PyMethodDef BufferWithSegments_methods[] = { + { "segments", (PyCFunction)BufferWithSegments_segments, + METH_NOARGS, BufferWithSegments_segments__doc__ }, + { "tobytes", (PyCFunction)BufferWithSegments_tobytes, + METH_NOARGS, BufferWithSegments_tobytes__doc__ }, + { NULL, NULL } +}; + +static PyMemberDef BufferWithSegments_members[] = { + { "size", T_ULONGLONG, offsetof(ZstdBufferWithSegments, dataSize), + READONLY, "total size of the buffer in bytes" }, + { NULL } +}; + +PyTypeObject ZstdBufferWithSegmentsType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.BufferWithSegments", /* tp_name */ + sizeof(ZstdBufferWithSegments),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)BufferWithSegments_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &BufferWithSegments_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + &BufferWithSegments_as_buffer, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + BufferWithSegments__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + BufferWithSegments_methods, /* tp_methods */ + BufferWithSegments_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)BufferWithSegments_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +PyDoc_STRVAR(BufferSegments__doc__, +"BufferSegments - Represents segments/offsets within a BufferWithSegments\n" +); + +static void BufferSegments_dealloc(ZstdBufferSegments* self) { + Py_CLEAR(self->parent); + PyObject_Del(self); +} + +#if PY_MAJOR_VERSION >= 3 +static int BufferSegments_getbuffer(ZstdBufferSegments* self, Py_buffer* view, int flags) { + return PyBuffer_FillInfo(view, (PyObject*)self, + (void*)self->segments, self->segmentCount * sizeof(BufferSegment), + 1, flags); +} +#else +static Py_ssize_t BufferSegments_getreadbuffer(ZstdBufferSegments* self, Py_ssize_t segment, void **ptrptr) { + if (segment != 0) { + PyErr_SetString(PyExc_ValueError, "segment number must be 0"); + return -1; + } + + *ptrptr = (void*)self->segments; + return self->segmentCount * sizeof(BufferSegment); +} + +static Py_ssize_t BufferSegments_getsegcount(ZstdBufferSegments* self, Py_ssize_t* len) { + if (len) { + *len = 1; + } + + return 1; +} +#endif + +static PyBufferProcs BufferSegments_as_buffer = { +#if PY_MAJOR_VERSION >= 3 + (getbufferproc)BufferSegments_getbuffer, + 0 +#else + (readbufferproc)BufferSegments_getreadbuffer, + 0, + (segcountproc)BufferSegments_getsegcount, + 0 +#endif +}; + +PyTypeObject ZstdBufferSegmentsType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.BufferSegments", /* tp_name */ + sizeof(ZstdBufferSegments),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)BufferSegments_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + &BufferSegments_as_buffer, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + BufferSegments__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +PyDoc_STRVAR(BufferSegment__doc__, + "BufferSegment - Represents a segment within a BufferWithSegments\n" +); + +static void BufferSegment_dealloc(ZstdBufferSegment* self) { + Py_CLEAR(self->parent); + PyObject_Del(self); +} + +static Py_ssize_t BufferSegment_length(ZstdBufferSegment* self) { + return self->dataSize; +} + +#if PY_MAJOR_VERSION >= 3 +static int BufferSegment_getbuffer(ZstdBufferSegment* self, Py_buffer* view, int flags) { + return PyBuffer_FillInfo(view, (PyObject*)self, + self->data, self->dataSize, 1, flags); +} +#else +static Py_ssize_t BufferSegment_getreadbuffer(ZstdBufferSegment* self, Py_ssize_t segment, void **ptrptr) { + if (segment != 0) { + PyErr_SetString(PyExc_ValueError, "segment number must be 0"); + return -1; + } + + *ptrptr = self->data; + return self->dataSize; +} + +static Py_ssize_t BufferSegment_getsegcount(ZstdBufferSegment* self, Py_ssize_t* len) { + if (len) { + *len = 1; + } + + return 1; +} +#endif + +PyDoc_STRVAR(BufferSegment_tobytes__doc__, +"Obtain a bytes instance for this segment.\n" +); + +static PyObject* BufferSegment_tobytes(ZstdBufferSegment* self) { + return PyBytes_FromStringAndSize(self->data, self->dataSize); +} + +static PySequenceMethods BufferSegment_sq = { + (lenfunc)BufferSegment_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + 0, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +static PyBufferProcs BufferSegment_as_buffer = { +#if PY_MAJOR_VERSION >= 3 + (getbufferproc)BufferSegment_getbuffer, + 0 +#else + (readbufferproc)BufferSegment_getreadbuffer, + 0, + (segcountproc)BufferSegment_getsegcount, + 0 +#endif +}; + +static PyMethodDef BufferSegment_methods[] = { + { "tobytes", (PyCFunction)BufferSegment_tobytes, + METH_NOARGS, BufferSegment_tobytes__doc__ }, + { NULL, NULL } +}; + +static PyMemberDef BufferSegment_members[] = { + { "offset", T_ULONGLONG, offsetof(ZstdBufferSegment, offset), READONLY, + "offset of segment within parent buffer" }, + { NULL } +}; + +PyTypeObject ZstdBufferSegmentType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.BufferSegment", /* tp_name */ + sizeof(ZstdBufferSegment),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)BufferSegment_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &BufferSegment_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + &BufferSegment_as_buffer, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + BufferSegment__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + BufferSegment_methods, /* tp_methods */ + BufferSegment_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +PyDoc_STRVAR(BufferWithSegmentsCollection__doc__, +"Represents a collection of BufferWithSegments.\n" +); + +static void BufferWithSegmentsCollection_dealloc(ZstdBufferWithSegmentsCollection* self) { + Py_ssize_t i; + + if (self->firstElements) { + PyMem_Free(self->firstElements); + self->firstElements = NULL; + } + + if (self->buffers) { + for (i = 0; i < self->bufferCount; i++) { + Py_CLEAR(self->buffers[i]); + } + + PyMem_Free(self->buffers); + self->buffers = NULL; + } + + PyObject_Del(self); +} + +static int BufferWithSegmentsCollection_init(ZstdBufferWithSegmentsCollection* self, PyObject* args) { + Py_ssize_t size; + Py_ssize_t i; + Py_ssize_t offset = 0; + + size = PyTuple_Size(args); + if (-1 == size) { + return -1; + } + + if (0 == size) { + PyErr_SetString(PyExc_ValueError, "must pass at least 1 argument"); + return -1; + } + + for (i = 0; i < size; i++) { + PyObject* item = PyTuple_GET_ITEM(args, i); + if (!PyObject_TypeCheck(item, &ZstdBufferWithSegmentsType)) { + PyErr_SetString(PyExc_TypeError, "arguments must be BufferWithSegments instances"); + return -1; + } + + if (0 == ((ZstdBufferWithSegments*)item)->segmentCount || + 0 == ((ZstdBufferWithSegments*)item)->dataSize) { + PyErr_SetString(PyExc_ValueError, "ZstdBufferWithSegments cannot be empty"); + return -1; + } + } + + self->buffers = PyMem_Malloc(size * sizeof(ZstdBufferWithSegments*)); + if (NULL == self->buffers) { + PyErr_NoMemory(); + return -1; + } + + self->firstElements = PyMem_Malloc(size * sizeof(Py_ssize_t)); + if (NULL == self->firstElements) { + PyMem_Free(self->buffers); + self->buffers = NULL; + PyErr_NoMemory(); + return -1; + } + + self->bufferCount = size; + + for (i = 0; i < size; i++) { + ZstdBufferWithSegments* item = (ZstdBufferWithSegments*)PyTuple_GET_ITEM(args, i); + + self->buffers[i] = item; + Py_INCREF(item); + + if (i > 0) { + self->firstElements[i - 1] = offset; + } + + offset += item->segmentCount; + } + + self->firstElements[size - 1] = offset; + + return 0; +} + +static PyObject* BufferWithSegmentsCollection_size(ZstdBufferWithSegmentsCollection* self) { + Py_ssize_t i; + Py_ssize_t j; + unsigned long long size = 0; + + for (i = 0; i < self->bufferCount; i++) { + for (j = 0; j < self->buffers[i]->segmentCount; j++) { + size += self->buffers[i]->segments[j].length; + } + } + + return PyLong_FromUnsignedLongLong(size); +} + +Py_ssize_t BufferWithSegmentsCollection_length(ZstdBufferWithSegmentsCollection* self) { + return self->firstElements[self->bufferCount - 1]; +} + +static ZstdBufferSegment* BufferWithSegmentsCollection_item(ZstdBufferWithSegmentsCollection* self, Py_ssize_t i) { + Py_ssize_t bufferOffset; + + if (i < 0) { + PyErr_SetString(PyExc_IndexError, "offset must be non-negative"); + return NULL; + } + + if (i >= BufferWithSegmentsCollection_length(self)) { + PyErr_Format(PyExc_IndexError, "offset must be less than %zd", + BufferWithSegmentsCollection_length(self)); + return NULL; + } + + for (bufferOffset = 0; bufferOffset < self->bufferCount; bufferOffset++) { + Py_ssize_t offset = 0; + + if (i < self->firstElements[bufferOffset]) { + if (bufferOffset > 0) { + offset = self->firstElements[bufferOffset - 1]; + } + + return BufferWithSegments_item(self->buffers[bufferOffset], i - offset); + } + } + + PyErr_SetString(ZstdError, "error resolving segment; this should not happen"); + return NULL; +} + +static PySequenceMethods BufferWithSegmentsCollection_sq = { + (lenfunc)BufferWithSegmentsCollection_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + (ssizeargfunc)BufferWithSegmentsCollection_item, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +static PyMethodDef BufferWithSegmentsCollection_methods[] = { + { "size", (PyCFunction)BufferWithSegmentsCollection_size, + METH_NOARGS, PyDoc_STR("total size in bytes of all segments") }, + { NULL, NULL } +}; + +PyTypeObject ZstdBufferWithSegmentsCollectionType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.BufferWithSegmentsCollection", /* tp_name */ + sizeof(ZstdBufferWithSegmentsCollection),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)BufferWithSegmentsCollection_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &BufferWithSegmentsCollection_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + BufferWithSegmentsCollection__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + /* TODO implement iterator for performance. */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + BufferWithSegmentsCollection_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)BufferWithSegmentsCollection_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void bufferutil_module_init(PyObject* mod) { + Py_TYPE(&ZstdBufferWithSegmentsType) = &PyType_Type; + if (PyType_Ready(&ZstdBufferWithSegmentsType) < 0) { + return; + } + + Py_INCREF(&ZstdBufferWithSegmentsType); + PyModule_AddObject(mod, "BufferWithSegments", (PyObject*)&ZstdBufferWithSegmentsType); + + Py_TYPE(&ZstdBufferSegmentsType) = &PyType_Type; + if (PyType_Ready(&ZstdBufferSegmentsType) < 0) { + return; + } + + Py_INCREF(&ZstdBufferSegmentsType); + PyModule_AddObject(mod, "BufferSegments", (PyObject*)&ZstdBufferSegmentsType); + + Py_TYPE(&ZstdBufferSegmentType) = &PyType_Type; + if (PyType_Ready(&ZstdBufferSegmentType) < 0) { + return; + } + + Py_INCREF(&ZstdBufferSegmentType); + PyModule_AddObject(mod, "BufferSegment", (PyObject*)&ZstdBufferSegmentType); + + Py_TYPE(&ZstdBufferWithSegmentsCollectionType) = &PyType_Type; + if (PyType_Ready(&ZstdBufferWithSegmentsCollectionType) < 0) { + return; + } + + Py_INCREF(&ZstdBufferWithSegmentsCollectionType); + PyModule_AddObject(mod, "BufferWithSegmentsCollection", (PyObject*)&ZstdBufferWithSegmentsCollectionType); +} diff --git a/contrib/python/zstandard/py2/c-ext/compressionchunker.c b/contrib/python/zstandard/py2/c-ext/compressionchunker.c new file mode 100644 index 00000000000..6677ebe59f1 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressionchunker.c @@ -0,0 +1,360 @@ +/** +* Copyright (c) 2018-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompressionChunkerIterator__doc__, + "Iterator of output chunks from ZstdCompressionChunker.\n" +); + +static void ZstdCompressionChunkerIterator_dealloc(ZstdCompressionChunkerIterator* self) { + Py_XDECREF(self->chunker); + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionChunkerIterator_iter(PyObject* self) { + Py_INCREF(self); + return self; +} + +static PyObject* ZstdCompressionChunkerIterator_iternext(ZstdCompressionChunkerIterator* self) { + size_t zresult; + PyObject* chunk; + ZstdCompressionChunker* chunker = self->chunker; + ZSTD_EndDirective zFlushMode; + + if (self->mode != compressionchunker_mode_normal && chunker->input.pos != chunker->input.size) { + PyErr_SetString(ZstdError, "input should have been fully consumed before calling flush() or finish()"); + return NULL; + } + + if (chunker->finished) { + return NULL; + } + + /* If we have data left in the input, consume it. */ + while (chunker->input.pos < chunker->input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output, + &chunker->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* Input is fully consumed. */ + if (chunker->input.pos == chunker->input.size) { + chunker->input.src = NULL; + chunker->input.pos = 0; + chunker->input.size = 0; + PyBuffer_Release(&chunker->inBuffer); + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + /* If it produced a full output chunk, emit it. */ + if (chunker->output.pos == chunker->output.size) { + chunk = PyBytes_FromStringAndSize(chunker->output.dst, chunker->output.pos); + if (!chunk) { + return NULL; + } + + chunker->output.pos = 0; + + return chunk; + } + + /* Else continue to compress available input data. */ + } + + /* We also need this here for the special case of an empty input buffer. */ + if (chunker->input.pos == chunker->input.size) { + chunker->input.src = NULL; + chunker->input.pos = 0; + chunker->input.size = 0; + PyBuffer_Release(&chunker->inBuffer); + } + + /* No more input data. A partial chunk may be in chunker->output. + * If we're in normal compression mode, we're done. Otherwise if we're in + * flush or finish mode, we need to emit what data remains. + */ + if (self->mode == compressionchunker_mode_normal) { + /* We don't need to set StopIteration. */ + return NULL; + } + + if (self->mode == compressionchunker_mode_flush) { + zFlushMode = ZSTD_e_flush; + } + else if (self->mode == compressionchunker_mode_finish) { + zFlushMode = ZSTD_e_end; + } + else { + PyErr_SetString(ZstdError, "unhandled compression mode; this should never happen"); + return NULL; + } + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output, + &chunker->input, zFlushMode); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (!zresult && chunker->output.pos == 0) { + return NULL; + } + + chunk = PyBytes_FromStringAndSize(chunker->output.dst, chunker->output.pos); + if (!chunk) { + return NULL; + } + + chunker->output.pos = 0; + + if (!zresult && self->mode == compressionchunker_mode_finish) { + chunker->finished = 1; + } + + return chunk; +} + +PyTypeObject ZstdCompressionChunkerIteratorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionChunkerIterator", /* tp_name */ + sizeof(ZstdCompressionChunkerIterator), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionChunkerIterator_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionChunkerIterator__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + ZstdCompressionChunkerIterator_iter, /* tp_iter */ + (iternextfunc)ZstdCompressionChunkerIterator_iternext, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +PyDoc_STRVAR(ZstdCompressionChunker__doc__, + "Compress chunks iteratively into exact chunk sizes.\n" +); + +static void ZstdCompressionChunker_dealloc(ZstdCompressionChunker* self) { + PyBuffer_Release(&self->inBuffer); + self->input.src = NULL; + + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + Py_XDECREF(self->compressor); + + PyObject_Del(self); +} + +static ZstdCompressionChunkerIterator* ZstdCompressionChunker_compress(ZstdCompressionChunker* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + ZstdCompressionChunkerIterator* result; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot call compress() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, + "cannot perform operation before consuming output from previous operation"); + return NULL; + } + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:compress", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:compress", +#endif + kwlist, &self->inBuffer)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&self->inBuffer, 'C') || self->inBuffer.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + PyBuffer_Release(&self->inBuffer); + return NULL; + } + + result = (ZstdCompressionChunkerIterator*)PyObject_CallObject((PyObject*)&ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + PyBuffer_Release(&self->inBuffer); + return NULL; + } + + self->input.src = self->inBuffer.buf; + self->input.size = self->inBuffer.len; + self->input.pos = 0; + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_normal; + + return result; +} + +static ZstdCompressionChunkerIterator* ZstdCompressionChunker_finish(ZstdCompressionChunker* self) { + ZstdCompressionChunkerIterator* result; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot call finish() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, + "cannot call finish() before consuming output from previous operation"); + return NULL; + } + + result = (ZstdCompressionChunkerIterator*)PyObject_CallObject((PyObject*)&ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + return NULL; + } + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_finish; + + return result; +} + +static ZstdCompressionChunkerIterator* ZstdCompressionChunker_flush(ZstdCompressionChunker* self, PyObject* args, PyObject* kwargs) { + ZstdCompressionChunkerIterator* result; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot call flush() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, + "cannot call flush() before consuming output from previous operation"); + return NULL; + } + + result = (ZstdCompressionChunkerIterator*)PyObject_CallObject((PyObject*)&ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + return NULL; + } + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_flush; + + return result; +} + +static PyMethodDef ZstdCompressionChunker_methods[] = { + { "compress", (PyCFunction)ZstdCompressionChunker_compress, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("compress data") }, + { "finish", (PyCFunction)ZstdCompressionChunker_finish, METH_NOARGS, + PyDoc_STR("finish compression operation") }, + { "flush", (PyCFunction)ZstdCompressionChunker_flush, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("finish compression operation") }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressionChunkerType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionChunkerType", /* tp_name */ + sizeof(ZstdCompressionChunker), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionChunker_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionChunker__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionChunker_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressionchunker_module_init(PyObject* module) { + Py_TYPE(&ZstdCompressionChunkerIteratorType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionChunkerIteratorType) < 0) { + return; + } + + Py_TYPE(&ZstdCompressionChunkerType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionChunkerType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/compressiondict.c b/contrib/python/zstandard/py2/c-ext/compressiondict.c new file mode 100644 index 00000000000..1379861648e --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressiondict.c @@ -0,0 +1,411 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "dict_size", + "samples", + "k", + "d", + "notifications", + "dict_id", + "level", + "steps", + "threads", + NULL + }; + + size_t capacity; + PyObject* samples; + unsigned k = 0; + unsigned d = 0; + unsigned notifications = 0; + unsigned dictID = 0; + int level = 0; + unsigned steps = 0; + int threads = 0; + ZDICT_cover_params_t params; + Py_ssize_t samplesLen; + Py_ssize_t i; + size_t samplesSize = 0; + void* sampleBuffer = NULL; + size_t* sampleSizes = NULL; + void* sampleOffset; + Py_ssize_t sampleSize; + void* dict = NULL; + size_t zresult; + ZstdCompressionDict* result = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|IIIIiIi:train_dictionary", + kwlist, &capacity, &PyList_Type, &samples, + &k, &d, ¬ifications, &dictID, &level, &steps, &threads)) { + return NULL; + } + + if (threads < 0) { + threads = cpu_count(); + } + + memset(¶ms, 0, sizeof(params)); + params.k = k; + params.d = d; + params.steps = steps; + params.nbThreads = threads; + params.zParams.notificationLevel = notifications; + params.zParams.dictID = dictID; + params.zParams.compressionLevel = level; + + /* Figure out total size of input samples. */ + samplesLen = PyList_Size(samples); + for (i = 0; i < samplesLen; i++) { + PyObject* sampleItem = PyList_GET_ITEM(samples, i); + + if (!PyBytes_Check(sampleItem)) { + PyErr_SetString(PyExc_ValueError, "samples must be bytes"); + return NULL; + } + samplesSize += PyBytes_GET_SIZE(sampleItem); + } + + sampleBuffer = PyMem_Malloc(samplesSize); + if (!sampleBuffer) { + PyErr_NoMemory(); + goto finally; + } + + sampleSizes = PyMem_Malloc(samplesLen * sizeof(size_t)); + if (!sampleSizes) { + PyErr_NoMemory(); + goto finally; + } + + sampleOffset = sampleBuffer; + for (i = 0; i < samplesLen; i++) { + PyObject* sampleItem = PyList_GET_ITEM(samples, i); + sampleSize = PyBytes_GET_SIZE(sampleItem); + sampleSizes[i] = sampleSize; + memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize); + sampleOffset = (char*)sampleOffset + sampleSize; + } + + dict = PyMem_Malloc(capacity); + if (!dict) { + PyErr_NoMemory(); + goto finally; + } + + Py_BEGIN_ALLOW_THREADS + /* No parameters uses the default function, which will use default params + and call ZDICT_optimizeTrainFromBuffer_cover under the hood. */ + if (!params.k && !params.d && !params.zParams.compressionLevel + && !params.zParams.notificationLevel && !params.zParams.dictID) { + zresult = ZDICT_trainFromBuffer(dict, capacity, sampleBuffer, + sampleSizes, (unsigned)samplesLen); + } + /* Use optimize mode if user controlled steps or threads explicitly. */ + else if (params.steps || params.nbThreads) { + zresult = ZDICT_optimizeTrainFromBuffer_cover(dict, capacity, + sampleBuffer, sampleSizes, (unsigned)samplesLen, ¶ms); + } + /* Non-optimize mode with explicit control. */ + else { + zresult = ZDICT_trainFromBuffer_cover(dict, capacity, + sampleBuffer, sampleSizes, (unsigned)samplesLen, params); + } + Py_END_ALLOW_THREADS + + if (ZDICT_isError(zresult)) { + PyMem_Free(dict); + PyErr_Format(ZstdError, "cannot train dict: %s", ZDICT_getErrorName(zresult)); + goto finally; + } + + result = PyObject_New(ZstdCompressionDict, &ZstdCompressionDictType); + if (!result) { + PyMem_Free(dict); + goto finally; + } + + result->dictData = dict; + result->dictSize = zresult; + result->dictType = ZSTD_dct_fullDict; + result->d = params.d; + result->k = params.k; + result->cdict = NULL; + result->ddict = NULL; + +finally: + PyMem_Free(sampleBuffer); + PyMem_Free(sampleSizes); + + return result; +} + +int ensure_ddict(ZstdCompressionDict* dict) { + if (dict->ddict) { + return 0; + } + + Py_BEGIN_ALLOW_THREADS + dict->ddict = ZSTD_createDDict_advanced(dict->dictData, dict->dictSize, + ZSTD_dlm_byRef, dict->dictType, ZSTD_defaultCMem); + Py_END_ALLOW_THREADS + if (!dict->ddict) { + PyErr_SetString(ZstdError, "could not create decompression dict"); + return 1; + } + + return 0; +} + +PyDoc_STRVAR(ZstdCompressionDict__doc__, +"ZstdCompressionDict(data) - Represents a computed compression dictionary\n" +"\n" +"This type holds the results of a computed Zstandard compression dictionary.\n" +"Instances are obtained by calling ``train_dictionary()`` or by passing\n" +"bytes obtained from another source into the constructor.\n" +); + +static int ZstdCompressionDict_init(ZstdCompressionDict* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + "dict_type", + NULL + }; + + int result = -1; + Py_buffer source; + unsigned dictType = ZSTD_dct_auto; + + self->dictData = NULL; + self->dictSize = 0; + self->cdict = NULL; + self->ddict = NULL; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|I:ZstdCompressionDict", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*|I:ZstdCompressionDict", +#endif + kwlist, &source, &dictType)) { + return -1; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + if (dictType != ZSTD_dct_auto && dictType != ZSTD_dct_rawContent + && dictType != ZSTD_dct_fullDict) { + PyErr_Format(PyExc_ValueError, + "invalid dictionary load mode: %d; must use DICT_TYPE_* constants", + dictType); + goto finally; + } + + self->dictType = dictType; + + self->dictData = PyMem_Malloc(source.len); + if (!self->dictData) { + PyErr_NoMemory(); + goto finally; + } + + memcpy(self->dictData, source.buf, source.len); + self->dictSize = source.len; + + result = 0; + +finally: + PyBuffer_Release(&source); + return result; +} + +static void ZstdCompressionDict_dealloc(ZstdCompressionDict* self) { + if (self->cdict) { + ZSTD_freeCDict(self->cdict); + self->cdict = NULL; + } + + if (self->ddict) { + ZSTD_freeDDict(self->ddict); + self->ddict = NULL; + } + + if (self->dictData) { + PyMem_Free(self->dictData); + self->dictData = NULL; + } + + PyObject_Del(self); +} + +PyDoc_STRVAR(ZstdCompressionDict_precompute_compress__doc__, +"Precompute a dictionary so it can be used by multiple compressors.\n" +); + +static PyObject* ZstdCompressionDict_precompute_compress(ZstdCompressionDict* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "level", + "compression_params", + NULL + }; + + int level = 0; + ZstdCompressionParametersObject* compressionParams = NULL; + ZSTD_compressionParameters cParams; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!:precompute_compress", kwlist, + &level, &ZstdCompressionParametersType, &compressionParams)) { + return NULL; + } + + if (level && compressionParams) { + PyErr_SetString(PyExc_ValueError, + "must only specify one of level or compression_params"); + return NULL; + } + + if (!level && !compressionParams) { + PyErr_SetString(PyExc_ValueError, + "must specify one of level or compression_params"); + return NULL; + } + + if (self->cdict) { + zresult = ZSTD_freeCDict(self->cdict); + self->cdict = NULL; + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to free CDict: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + } + + if (level) { + cParams = ZSTD_getCParams(level, 0, self->dictSize); + } + else { + if (to_cparams(compressionParams, &cParams)) { + return NULL; + } + } + + assert(!self->cdict); + self->cdict = ZSTD_createCDict_advanced(self->dictData, self->dictSize, + ZSTD_dlm_byRef, self->dictType, cParams, ZSTD_defaultCMem); + + if (!self->cdict) { + PyErr_SetString(ZstdError, "unable to precompute dictionary"); + return NULL; + } + + Py_RETURN_NONE; +} + +static PyObject* ZstdCompressionDict_dict_id(ZstdCompressionDict* self) { + unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize); + + return PyLong_FromLong(dictID); +} + +static PyObject* ZstdCompressionDict_as_bytes(ZstdCompressionDict* self) { + return PyBytes_FromStringAndSize(self->dictData, self->dictSize); +} + +static PyMethodDef ZstdCompressionDict_methods[] = { + { "dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS, + PyDoc_STR("dict_id() -- obtain the numeric dictionary ID") }, + { "as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS, + PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary data") }, + { "precompute_compress", (PyCFunction)ZstdCompressionDict_precompute_compress, + METH_VARARGS | METH_KEYWORDS, ZstdCompressionDict_precompute_compress__doc__ }, + { NULL, NULL } +}; + +static PyMemberDef ZstdCompressionDict_members[] = { + { "k", T_UINT, offsetof(ZstdCompressionDict, k), READONLY, + "segment size" }, + { "d", T_UINT, offsetof(ZstdCompressionDict, d), READONLY, + "dmer size" }, + { NULL } +}; + +static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict* self) { + return self->dictSize; +} + +static PySequenceMethods ZstdCompressionDict_sq = { + (lenfunc)ZstdCompressionDict_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + 0, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +PyTypeObject ZstdCompressionDictType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionDict", /* tp_name */ + sizeof(ZstdCompressionDict), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionDict_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &ZstdCompressionDict_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionDict__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionDict_methods, /* tp_methods */ + ZstdCompressionDict_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)ZstdCompressionDict_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressiondict_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressionDictType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionDictType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdCompressionDictType); + PyModule_AddObject(mod, "ZstdCompressionDict", + (PyObject*)&ZstdCompressionDictType); +} diff --git a/contrib/python/zstandard/py2/c-ext/compressionparams.c b/contrib/python/zstandard/py2/c-ext/compressionparams.c new file mode 100644 index 00000000000..e5e8c55fea2 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressionparams.c @@ -0,0 +1,572 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value) { + size_t zresult = ZSTD_CCtxParams_setParameter(params, param, value); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to set compression context parameter: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + return 0; +} + +#define TRY_SET_PARAMETER(params, param, value) if (set_parameter(params, param, value)) return -1; + +#define TRY_COPY_PARAMETER(source, dest, param) { \ + int result; \ + size_t zresult = ZSTD_CCtxParams_getParameter(source, param, &result); \ + if (ZSTD_isError(zresult)) { \ + return 1; \ + } \ + zresult = ZSTD_CCtxParams_setParameter(dest, param, result); \ + if (ZSTD_isError(zresult)) { \ + return 1; \ + } \ +} + +int set_parameters(ZSTD_CCtx_params* params, ZstdCompressionParametersObject* obj) { + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_nbWorkers); + + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_format); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_compressionLevel); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_windowLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_hashLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_chainLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_searchLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_minMatch); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_targetLength); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_strategy); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_contentSizeFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_checksumFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_dictIDFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_jobSize); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_overlapLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_forceMaxWindow); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_enableLongDistanceMatching); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmMinMatch); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmBucketSizeLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashRateLog); + + return 0; +} + +int reset_params(ZstdCompressionParametersObject* params) { + if (params->params) { + ZSTD_CCtxParams_reset(params->params); + } + else { + params->params = ZSTD_createCCtxParams(); + if (!params->params) { + PyErr_NoMemory(); + return 1; + } + } + + return set_parameters(params->params, params); +} + +#define TRY_GET_PARAMETER(params, param, value) { \ + size_t zresult = ZSTD_CCtxParams_getParameter(params, param, value); \ + if (ZSTD_isError(zresult)) { \ + PyErr_Format(ZstdError, "unable to retrieve parameter: %s", ZSTD_getErrorName(zresult)); \ + return 1; \ + } \ +} + +int to_cparams(ZstdCompressionParametersObject* params, ZSTD_compressionParameters* cparams) { + int value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_windowLog, &value); + cparams->windowLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_chainLog, &value); + cparams->chainLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_hashLog, &value); + cparams->hashLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_searchLog, &value); + cparams->searchLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_minMatch, &value); + cparams->minMatch = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_targetLength, &value); + cparams->targetLength = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_strategy, &value); + cparams->strategy = value; + + return 0; +} + +static int ZstdCompressionParameters_init(ZstdCompressionParametersObject* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "format", + "compression_level", + "window_log", + "hash_log", + "chain_log", + "search_log", + "min_match", + "target_length", + "compression_strategy", + "strategy", + "write_content_size", + "write_checksum", + "write_dict_id", + "job_size", + "overlap_log", + "overlap_size_log", + "force_max_window", + "enable_ldm", + "ldm_hash_log", + "ldm_min_match", + "ldm_bucket_size_log", + "ldm_hash_rate_log", + "ldm_hash_every_log", + "threads", + NULL + }; + + int format = 0; + int compressionLevel = 0; + int windowLog = 0; + int hashLog = 0; + int chainLog = 0; + int searchLog = 0; + int minMatch = 0; + int targetLength = 0; + int compressionStrategy = -1; + int strategy = -1; + int contentSizeFlag = 1; + int checksumFlag = 0; + int dictIDFlag = 0; + int jobSize = 0; + int overlapLog = -1; + int overlapSizeLog = -1; + int forceMaxWindow = 0; + int enableLDM = 0; + int ldmHashLog = 0; + int ldmMinMatch = 0; + int ldmBucketSizeLog = 0; + int ldmHashRateLog = -1; + int ldmHashEveryLog = -1; + int threads = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, + "|iiiiiiiiiiiiiiiiiiiiiiii:CompressionParameters", + kwlist, &format, &compressionLevel, &windowLog, &hashLog, &chainLog, + &searchLog, &minMatch, &targetLength, &compressionStrategy, &strategy, + &contentSizeFlag, &checksumFlag, &dictIDFlag, &jobSize, &overlapLog, + &overlapSizeLog, &forceMaxWindow, &enableLDM, &ldmHashLog, &ldmMinMatch, + &ldmBucketSizeLog, &ldmHashRateLog, &ldmHashEveryLog, &threads)) { + return -1; + } + + if (reset_params(self)) { + return -1; + } + + if (threads < 0) { + threads = cpu_count(); + } + + /* We need to set ZSTD_c_nbWorkers before ZSTD_c_jobSize and ZSTD_c_overlapLog + * because setting ZSTD_c_nbWorkers resets the other parameters. */ + TRY_SET_PARAMETER(self->params, ZSTD_c_nbWorkers, threads); + + TRY_SET_PARAMETER(self->params, ZSTD_c_format, format); + TRY_SET_PARAMETER(self->params, ZSTD_c_compressionLevel, compressionLevel); + TRY_SET_PARAMETER(self->params, ZSTD_c_windowLog, windowLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_hashLog, hashLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_chainLog, chainLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_searchLog, searchLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_minMatch, minMatch); + TRY_SET_PARAMETER(self->params, ZSTD_c_targetLength, targetLength); + + if (compressionStrategy != -1 && strategy != -1) { + PyErr_SetString(PyExc_ValueError, "cannot specify both compression_strategy and strategy"); + return -1; + } + + if (compressionStrategy != -1) { + strategy = compressionStrategy; + } + else if (strategy == -1) { + strategy = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_strategy, strategy); + TRY_SET_PARAMETER(self->params, ZSTD_c_contentSizeFlag, contentSizeFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_checksumFlag, checksumFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_dictIDFlag, dictIDFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_jobSize, jobSize); + + if (overlapLog != -1 && overlapSizeLog != -1) { + PyErr_SetString(PyExc_ValueError, "cannot specify both overlap_log and overlap_size_log"); + return -1; + } + + if (overlapSizeLog != -1) { + overlapLog = overlapSizeLog; + } + else if (overlapLog == -1) { + overlapLog = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_overlapLog, overlapLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_forceMaxWindow, forceMaxWindow); + TRY_SET_PARAMETER(self->params, ZSTD_c_enableLongDistanceMatching, enableLDM); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashLog, ldmHashLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmMinMatch, ldmMinMatch); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmBucketSizeLog, ldmBucketSizeLog); + + if (ldmHashRateLog != -1 && ldmHashEveryLog != -1) { + PyErr_SetString(PyExc_ValueError, "cannot specify both ldm_hash_rate_log and ldm_hash_everyLog"); + return -1; + } + + if (ldmHashEveryLog != -1) { + ldmHashRateLog = ldmHashEveryLog; + } + else if (ldmHashRateLog == -1) { + ldmHashRateLog = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashRateLog, ldmHashRateLog); + + return 0; +} + +PyDoc_STRVAR(ZstdCompressionParameters_from_level__doc__, +"Create a CompressionParameters from a compression level and target sizes\n" +); + +ZstdCompressionParametersObject* CompressionParameters_from_level(PyObject* undef, PyObject* args, PyObject* kwargs) { + int managedKwargs = 0; + int level; + PyObject* sourceSize = NULL; + PyObject* dictSize = NULL; + unsigned PY_LONG_LONG iSourceSize = 0; + Py_ssize_t iDictSize = 0; + PyObject* val; + ZSTD_compressionParameters params; + ZstdCompressionParametersObject* result = NULL; + int res; + + if (!PyArg_ParseTuple(args, "i:from_level", + &level)) { + return NULL; + } + + if (!kwargs) { + kwargs = PyDict_New(); + if (!kwargs) { + return NULL; + } + managedKwargs = 1; + } + + sourceSize = PyDict_GetItemString(kwargs, "source_size"); + if (sourceSize) { +#if PY_MAJOR_VERSION >= 3 + iSourceSize = PyLong_AsUnsignedLongLong(sourceSize); + if (iSourceSize == (unsigned PY_LONG_LONG)(-1)) { + goto cleanup; + } +#else + iSourceSize = PyInt_AsUnsignedLongLongMask(sourceSize); +#endif + + PyDict_DelItemString(kwargs, "source_size"); + } + + dictSize = PyDict_GetItemString(kwargs, "dict_size"); + if (dictSize) { +#if PY_MAJOR_VERSION >= 3 + iDictSize = PyLong_AsSsize_t(dictSize); +#else + iDictSize = PyInt_AsSsize_t(dictSize); +#endif + if (iDictSize == -1) { + goto cleanup; + } + + PyDict_DelItemString(kwargs, "dict_size"); + } + + + params = ZSTD_getCParams(level, iSourceSize, iDictSize); + + /* Values derived from the input level and sizes are passed along to the + constructor. But only if a value doesn't already exist. */ + val = PyDict_GetItemString(kwargs, "window_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.windowLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "window_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "chain_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.chainLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "chain_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "hash_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.hashLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "hash_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "search_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.searchLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "search_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "min_match"); + if (!val) { + val = PyLong_FromUnsignedLong(params.minMatch); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "min_match", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "target_length"); + if (!val) { + val = PyLong_FromUnsignedLong(params.targetLength); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "target_length", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "compression_strategy"); + if (!val) { + val = PyLong_FromUnsignedLong(params.strategy); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "compression_strategy", val); + Py_DECREF(val); + } + + result = PyObject_New(ZstdCompressionParametersObject, &ZstdCompressionParametersType); + if (!result) { + goto cleanup; + } + + result->params = NULL; + + val = PyTuple_New(0); + if (!val) { + Py_CLEAR(result); + goto cleanup; + } + + res = ZstdCompressionParameters_init(result, val, kwargs); + Py_DECREF(val); + + if (res) { + Py_CLEAR(result); + goto cleanup; + } + +cleanup: + if (managedKwargs) { + Py_DECREF(kwargs); + } + + return result; +} + +PyDoc_STRVAR(ZstdCompressionParameters_estimated_compression_context_size__doc__, +"Estimate the size in bytes of a compression context for compression parameters\n" +); + +PyObject* ZstdCompressionParameters_estimated_compression_context_size(ZstdCompressionParametersObject* self) { + return PyLong_FromSize_t(ZSTD_estimateCCtxSize_usingCCtxParams(self->params)); +} + +PyDoc_STRVAR(ZstdCompressionParameters__doc__, +"ZstdCompressionParameters: low-level control over zstd compression"); + +static void ZstdCompressionParameters_dealloc(ZstdCompressionParametersObject* self) { + if (self->params) { + ZSTD_freeCCtxParams(self->params); + self->params = NULL; + } + + PyObject_Del(self); +} + +#define PARAM_GETTER(name, param) PyObject* ZstdCompressionParameters_get_##name(PyObject* self, void* unused) { \ + int result; \ + size_t zresult; \ + ZstdCompressionParametersObject* p = (ZstdCompressionParametersObject*)(self); \ + zresult = ZSTD_CCtxParams_getParameter(p->params, param, &result); \ + if (ZSTD_isError(zresult)) { \ + PyErr_Format(ZstdError, "unable to get compression parameter: %s", \ + ZSTD_getErrorName(zresult)); \ + return NULL; \ + } \ + return PyLong_FromLong(result); \ +} + +PARAM_GETTER(format, ZSTD_c_format) +PARAM_GETTER(compression_level, ZSTD_c_compressionLevel) +PARAM_GETTER(window_log, ZSTD_c_windowLog) +PARAM_GETTER(hash_log, ZSTD_c_hashLog) +PARAM_GETTER(chain_log, ZSTD_c_chainLog) +PARAM_GETTER(search_log, ZSTD_c_searchLog) +PARAM_GETTER(min_match, ZSTD_c_minMatch) +PARAM_GETTER(target_length, ZSTD_c_targetLength) +PARAM_GETTER(compression_strategy, ZSTD_c_strategy) +PARAM_GETTER(write_content_size, ZSTD_c_contentSizeFlag) +PARAM_GETTER(write_checksum, ZSTD_c_checksumFlag) +PARAM_GETTER(write_dict_id, ZSTD_c_dictIDFlag) +PARAM_GETTER(job_size, ZSTD_c_jobSize) +PARAM_GETTER(overlap_log, ZSTD_c_overlapLog) +PARAM_GETTER(force_max_window, ZSTD_c_forceMaxWindow) +PARAM_GETTER(enable_ldm, ZSTD_c_enableLongDistanceMatching) +PARAM_GETTER(ldm_hash_log, ZSTD_c_ldmHashLog) +PARAM_GETTER(ldm_min_match, ZSTD_c_ldmMinMatch) +PARAM_GETTER(ldm_bucket_size_log, ZSTD_c_ldmBucketSizeLog) +PARAM_GETTER(ldm_hash_rate_log, ZSTD_c_ldmHashRateLog) +PARAM_GETTER(threads, ZSTD_c_nbWorkers) + +static PyMethodDef ZstdCompressionParameters_methods[] = { + { + "from_level", + (PyCFunction)CompressionParameters_from_level, + METH_VARARGS | METH_KEYWORDS | METH_STATIC, + ZstdCompressionParameters_from_level__doc__ + }, + { + "estimated_compression_context_size", + (PyCFunction)ZstdCompressionParameters_estimated_compression_context_size, + METH_NOARGS, + ZstdCompressionParameters_estimated_compression_context_size__doc__ + }, + { NULL, NULL } +}; + +#define GET_SET_ENTRY(name) { #name, ZstdCompressionParameters_get_##name, NULL, NULL, NULL } + +static PyGetSetDef ZstdCompressionParameters_getset[] = { + GET_SET_ENTRY(format), + GET_SET_ENTRY(compression_level), + GET_SET_ENTRY(window_log), + GET_SET_ENTRY(hash_log), + GET_SET_ENTRY(chain_log), + GET_SET_ENTRY(search_log), + GET_SET_ENTRY(min_match), + GET_SET_ENTRY(target_length), + GET_SET_ENTRY(compression_strategy), + GET_SET_ENTRY(write_content_size), + GET_SET_ENTRY(write_checksum), + GET_SET_ENTRY(write_dict_id), + GET_SET_ENTRY(threads), + GET_SET_ENTRY(job_size), + GET_SET_ENTRY(overlap_log), + /* TODO remove this deprecated attribute */ + { "overlap_size_log", ZstdCompressionParameters_get_overlap_log, NULL, NULL, NULL }, + GET_SET_ENTRY(force_max_window), + GET_SET_ENTRY(enable_ldm), + GET_SET_ENTRY(ldm_hash_log), + GET_SET_ENTRY(ldm_min_match), + GET_SET_ENTRY(ldm_bucket_size_log), + GET_SET_ENTRY(ldm_hash_rate_log), + /* TODO remove this deprecated attribute */ + { "ldm_hash_every_log", ZstdCompressionParameters_get_ldm_hash_rate_log, NULL, NULL, NULL }, + { NULL } +}; + +PyTypeObject ZstdCompressionParametersType = { + PyVarObject_HEAD_INIT(NULL, 0) + "ZstdCompressionParameters", /* tp_name */ + sizeof(ZstdCompressionParametersObject), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionParameters_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionParameters__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionParameters_methods, /* tp_methods */ + 0, /* tp_members */ + ZstdCompressionParameters_getset, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)ZstdCompressionParameters_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressionparams_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressionParametersType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionParametersType) < 0) { + return; + } + + Py_INCREF(&ZstdCompressionParametersType); + PyModule_AddObject(mod, "ZstdCompressionParameters", + (PyObject*)&ZstdCompressionParametersType); + + /* TODO remove deprecated alias. */ + Py_INCREF(&ZstdCompressionParametersType); + PyModule_AddObject(mod, "CompressionParameters", + (PyObject*)&ZstdCompressionParametersType); +} diff --git a/contrib/python/zstandard/py2/c-ext/compressionreader.c b/contrib/python/zstandard/py2/c-ext/compressionreader.c new file mode 100644 index 00000000000..47bd3d77053 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressionreader.c @@ -0,0 +1,818 @@ +/** +* Copyright (c) 2017-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +static void set_unsupported_operation(void) { + PyObject* iomod; + PyObject* exc; + + iomod = PyImport_ImportModule("io"); + if (NULL == iomod) { + return; + } + + exc = PyObject_GetAttrString(iomod, "UnsupportedOperation"); + if (NULL == exc) { + Py_DECREF(iomod); + return; + } + + PyErr_SetNone(exc); + Py_DECREF(exc); + Py_DECREF(iomod); +} + +static void reader_dealloc(ZstdCompressionReader* self) { + Py_XDECREF(self->compressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + PyObject_Del(self); +} + +static ZstdCompressionReader* reader_enter(ZstdCompressionReader* self) { + if (self->entered) { + PyErr_SetString(PyExc_ValueError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return self; +} + +static PyObject* reader_exit(ZstdCompressionReader* self, PyObject* args) { + PyObject* exc_type; + PyObject* exc_value; + PyObject* exc_tb; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) { + return NULL; + } + + self->entered = 0; + self->closed = 1; + + /* Release resources associated with source. */ + Py_CLEAR(self->reader); + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + Py_CLEAR(self->compressor); + + Py_RETURN_FALSE; +} + +static PyObject* reader_readable(ZstdCompressionReader* self) { + Py_RETURN_TRUE; +} + +static PyObject* reader_writable(ZstdCompressionReader* self) { + Py_RETURN_FALSE; +} + +static PyObject* reader_seekable(ZstdCompressionReader* self) { + Py_RETURN_FALSE; +} + +static PyObject* reader_readline(PyObject* self, PyObject* args) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_readlines(PyObject* self, PyObject* args) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_write(PyObject* self, PyObject* args) { + PyErr_SetString(PyExc_OSError, "stream is not writable"); + return NULL; +} + +static PyObject* reader_writelines(PyObject* self, PyObject* args) { + PyErr_SetString(PyExc_OSError, "stream is not writable"); + return NULL; +} + +static PyObject* reader_isatty(PyObject* self) { + Py_RETURN_FALSE; +} + +static PyObject* reader_flush(PyObject* self) { + Py_RETURN_NONE; +} + +static PyObject* reader_close(ZstdCompressionReader* self) { + self->closed = 1; + Py_RETURN_NONE; +} + +static PyObject* reader_tell(ZstdCompressionReader* self) { + /* TODO should this raise OSError since stream isn't seekable? */ + return PyLong_FromUnsignedLongLong(self->bytesCompressed); +} + +int read_compressor_input(ZstdCompressionReader* self) { + if (self->finishedInput) { + return 0; + } + + if (self->input.pos != self->input.size) { + return 0; + } + + if (self->reader) { + Py_buffer buffer; + + assert(self->readResult == NULL); + + self->readResult = PyObject_CallMethod(self->reader, "read", + "k", self->readSize); + + if (NULL == self->readResult) { + return -1; + } + + memset(&buffer, 0, sizeof(buffer)); + + if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) { + return -1; + } + + /* EOF */ + if (0 == buffer.len) { + self->finishedInput = 1; + Py_CLEAR(self->readResult); + } + else { + self->input.src = buffer.buf; + self->input.size = buffer.len; + self->input.pos = 0; + } + + PyBuffer_Release(&buffer); + } + else { + assert(self->buffer.buf); + + self->input.src = self->buffer.buf; + self->input.size = self->buffer.len; + self->input.pos = 0; + } + + return 1; +} + +int compress_input(ZstdCompressionReader* self, ZSTD_outBuffer* output) { + size_t oldPos; + size_t zresult; + + /* If we have data left over, consume it. */ + if (self->input.pos < self->input.size) { + oldPos = output->pos; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, + output, &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + self->bytesCompressed += output->pos - oldPos; + + /* Input exhausted. Clear out state tracking. */ + if (self->input.pos == self->input.size) { + memset(&self->input, 0, sizeof(self->input)); + Py_CLEAR(self->readResult); + + if (self->buffer.buf) { + self->finishedInput = 1; + } + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return -1; + } + } + + if (output->pos && output->pos == output->size) { + return 1; + } + else { + return 0; + } +} + +static PyObject* reader_read(ZstdCompressionReader* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + Py_ssize_t size = -1; + PyObject* result = NULL; + char* resultBuffer; + Py_ssize_t resultSize; + size_t zresult; + size_t oldPos; + int readResult, compressResult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1"); + return NULL; + } + + if (size == -1) { + return PyObject_CallMethod((PyObject*)self, "readall", NULL); + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + self->output.dst = resultBuffer; + self->output.size = resultSize; + self->output.pos = 0; + +readinput: + + compressResult = compress_input(self, &self->output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult) { + /* There is room in the output. We fall through to below, which will + * either get more input for us or will attempt to end the stream. + */ + } + else if (1 == compressResult) { + memset(&self->output, 0, sizeof(self->output)); + return result; + } + else { + assert(0); + } + + readResult = read_compressor_input(self); + + if (-1 == readResult) { + return NULL; + } + else if (0 == readResult) { } + else if (1 == readResult) { } + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* Else EOF */ + oldPos = self->output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + Py_XDECREF(result); + return NULL; + } + + assert(self->output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + if (safe_pybytes_resize(&result, self->output.pos)) { + Py_XDECREF(result); + return NULL; + } + + memset(&self->output, 0, sizeof(self->output)); + + return result; +} + +static PyObject* reader_read1(ZstdCompressionReader* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + Py_ssize_t size = -1; + PyObject* result = NULL; + char* resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + int compressResult; + size_t oldPos; + size_t zresult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n:read1", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1"); + return NULL; + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + if (size == -1) { + size = ZSTD_CStreamOutSize(); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + + /* read1() is supposed to use at most 1 read() from the underlying stream. + However, we can't satisfy this requirement with compression because + not every input will generate output. We /could/ flush the compressor, + but this may not be desirable. We allow multiple read() from the + underlying stream. But unlike read(), we return as soon as output data + is available. + */ + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult || 1 == compressResult) { } + else { + assert(0); + } + + if (output.pos) { + goto finally; + } + + while (!self->finishedInput) { + int readResult = read_compressor_input(self); + + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult || 1 == readResult) { } + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult || 1 == compressResult) { } + else { + assert(0); + } + + if (output.pos) { + goto finally; + } + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input, + ZSTD_e_end); + + self->bytesCompressed += output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + Py_XDECREF(result); + return NULL; + } + + if (zresult == 0) { + self->finishedOutput = 1; + } + +finally: + if (result) { + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + } + + return result; +} + +static PyObject* reader_readall(PyObject* self) { + PyObject* chunks = NULL; + PyObject* empty = NULL; + PyObject* result = NULL; + + /* Our strategy is to collect chunks into a list then join all the + * chunks at the end. We could potentially use e.g. an io.BytesIO. But + * this feels simple enough to implement and avoids potentially expensive + * reallocations of large buffers. + */ + chunks = PyList_New(0); + if (NULL == chunks) { + return NULL; + } + + while (1) { + PyObject* chunk = PyObject_CallMethod(self, "read", "i", 1048576); + if (NULL == chunk) { + Py_DECREF(chunks); + return NULL; + } + + if (!PyBytes_Size(chunk)) { + Py_DECREF(chunk); + break; + } + + if (PyList_Append(chunks, chunk)) { + Py_DECREF(chunk); + Py_DECREF(chunks); + return NULL; + } + + Py_DECREF(chunk); + } + + empty = PyBytes_FromStringAndSize("", 0); + if (NULL == empty) { + Py_DECREF(chunks); + return NULL; + } + + result = PyObject_CallMethod(empty, "join", "O", chunks); + + Py_DECREF(empty); + Py_DECREF(chunks); + + return result; +} + +static PyObject* reader_readinto(ZstdCompressionReader* self, PyObject* args) { + Py_buffer dest; + ZSTD_outBuffer output; + int readResult, compressResult; + PyObject* result = NULL; + size_t zresult; + size_t oldPos; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "destination buffer should be contiguous and have at most one dimension"); + goto finally; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + while (!self->finishedInput) { + readResult = read_compressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) {} + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input, + ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + assert(output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject* reader_readinto1(ZstdCompressionReader* self, PyObject* args) { + Py_buffer dest; + PyObject* result = NULL; + ZSTD_outBuffer output; + int compressResult; + size_t oldPos; + size_t zresult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "destination buffer should be contiguous and have at most one dimension"); + goto finally; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult || 1 == compressResult) { } + else { + assert(0); + } + + if (output.pos) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + + while (!self->finishedInput) { + int readResult = read_compressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) { } + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + /* If we produced output and we're not done with input, emit + * that output now, as we've hit restrictions of read1(). + */ + if (output.pos && !self->finishedInput) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + + /* Otherwise we either have no output or we've exhausted the + * input. Either we try to get more input or we fall through + * to EOF below */ + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input, + ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + assert(output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject* reader_iter(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_iternext(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyMethodDef reader_methods[] = { + { "__enter__", (PyCFunction)reader_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context") }, + { "__exit__", (PyCFunction)reader_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context") }, + { "close", (PyCFunction)reader_close, METH_NOARGS, + PyDoc_STR("Close the stream so it cannot perform any more operations") }, + { "flush", (PyCFunction)reader_flush, METH_NOARGS, PyDoc_STR("no-ops") }, + { "isatty", (PyCFunction)reader_isatty, METH_NOARGS, PyDoc_STR("Returns False") }, + { "readable", (PyCFunction)reader_readable, METH_NOARGS, + PyDoc_STR("Returns True") }, + { "read", (PyCFunction)reader_read, METH_VARARGS | METH_KEYWORDS, PyDoc_STR("read compressed data") }, + { "read1", (PyCFunction)reader_read1, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readall", (PyCFunction)reader_readall, METH_NOARGS, PyDoc_STR("Not implemented") }, + { "readinto", (PyCFunction)reader_readinto, METH_VARARGS, NULL }, + { "readinto1", (PyCFunction)reader_readinto1, METH_VARARGS, NULL }, + { "readline", (PyCFunction)reader_readline, METH_VARARGS, PyDoc_STR("Not implemented") }, + { "readlines", (PyCFunction)reader_readlines, METH_VARARGS, PyDoc_STR("Not implemented") }, + { "seekable", (PyCFunction)reader_seekable, METH_NOARGS, + PyDoc_STR("Returns False") }, + { "tell", (PyCFunction)reader_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed") }, + { "writable", (PyCFunction)reader_writable, METH_NOARGS, + PyDoc_STR("Returns False") }, + { "write", reader_write, METH_VARARGS, PyDoc_STR("Raises OSError") }, + { "writelines", reader_writelines, METH_VARARGS, PyDoc_STR("Not implemented") }, + { NULL, NULL } +}; + +static PyMemberDef reader_members[] = { + { "closed", T_BOOL, offsetof(ZstdCompressionReader, closed), + READONLY, "whether stream is closed" }, + { NULL } +}; + +PyTypeObject ZstdCompressionReaderType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionReader", /* tp_name */ + sizeof(ZstdCompressionReader), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)reader_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + reader_iter, /* tp_iter */ + reader_iternext, /* tp_iternext */ + reader_methods, /* tp_methods */ + reader_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressionreader_module_init(PyObject* mod) { + /* TODO make reader a sub-class of io.RawIOBase */ + + Py_TYPE(&ZstdCompressionReaderType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionReaderType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/compressionwriter.c b/contrib/python/zstandard/py2/c-ext/compressionwriter.c new file mode 100644 index 00000000000..fe8e55a0fe0 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressionwriter.c @@ -0,0 +1,372 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompresssionWriter__doc__, +"""A context manager used for writing compressed output to a writer.\n" +); + +static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter* self) { + Py_XDECREF(self->compressor); + Py_XDECREF(self->writer); + + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject*)self; +} + +static PyObject* ZstdCompressionWriter_exit(ZstdCompressionWriter* self, PyObject* args) { + PyObject* exc_type; + PyObject* exc_value; + PyObject* exc_tb; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) { + return NULL; + } + + self->entered = 0; + + if (exc_type == Py_None && exc_value == Py_None && exc_tb == Py_None) { + PyObject* result = PyObject_CallMethod((PyObject*)self, "close", NULL); + + if (NULL == result) { + return NULL; + } + } + + Py_RETURN_FALSE; +} + +static PyObject* ZstdCompressionWriter_memory_size(ZstdCompressionWriter* self) { + return PyLong_FromSize_t(ZSTD_sizeof_CCtx(self->compressor->cctx)); +} + +static PyObject* ZstdCompressionWriter_write(ZstdCompressionWriter* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + PyObject* result = NULL; + Py_buffer source; + size_t zresult; + ZSTD_inBuffer input; + PyObject* res; + Py_ssize_t totalWrite = 0; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:write", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:write", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->output.pos = 0; + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + goto finally; + } + + /* Copy data from output buffer to writer. */ + if (self->output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + self->output.dst, self->output.pos); + Py_XDECREF(res); + totalWrite += self->output.pos; + self->bytesCompressed += self->output.pos; + } + self->output.pos = 0; + } + + if (self->writeReturnRead) { + result = PyLong_FromSize_t(input.pos); + } + else { + result = PyLong_FromSsize_t(totalWrite); + } + +finally: + PyBuffer_Release(&source); + return result; +} + +static PyObject* ZstdCompressionWriter_flush(ZstdCompressionWriter* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "flush_mode", + NULL + }; + + size_t zresult; + ZSTD_inBuffer input; + PyObject* res; + Py_ssize_t totalWrite = 0; + unsigned flush_mode = 0; + ZSTD_EndDirective flush; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|I:flush", + kwlist, &flush_mode)) { + return NULL; + } + + switch (flush_mode) { + case 0: + flush = ZSTD_e_flush; + break; + case 1: + flush = ZSTD_e_end; + break; + default: + PyErr_Format(PyExc_ValueError, "unknown flush_mode: %d", flush_mode); + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->output.pos = 0; + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, &input, flush); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + /* Copy data from output buffer to writer. */ + if (self->output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + self->output.dst, self->output.pos); + Py_XDECREF(res); + totalWrite += self->output.pos; + self->bytesCompressed += self->output.pos; + } + + self->output.pos = 0; + + if (!zresult) { + break; + } + } + + return PyLong_FromSsize_t(totalWrite); +} + +static PyObject* ZstdCompressionWriter_close(ZstdCompressionWriter* self) { + PyObject* result; + + if (self->closed) { + Py_RETURN_NONE; + } + + result = PyObject_CallMethod((PyObject*)self, "flush", "I", 1); + self->closed = 1; + + if (NULL == result) { + return NULL; + } + + /* Call close on underlying stream as well. */ + if (PyObject_HasAttrString(self->writer, "close")) { + return PyObject_CallMethod(self->writer, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject* ZstdCompressionWriter_fileno(ZstdCompressionWriter* self) { + if (PyObject_HasAttrString(self->writer, "fileno")) { + return PyObject_CallMethod(self->writer, "fileno", NULL); + } + else { + PyErr_SetString(PyExc_OSError, "fileno not available on underlying writer"); + return NULL; + } +} + +static PyObject* ZstdCompressionWriter_tell(ZstdCompressionWriter* self) { + return PyLong_FromUnsignedLongLong(self->bytesCompressed); +} + +static PyObject* ZstdCompressionWriter_writelines(PyObject* self, PyObject* args) { + PyErr_SetNone(PyExc_NotImplementedError); + return NULL; +} + +static PyObject* ZstdCompressionWriter_false(PyObject* self, PyObject* args) { + Py_RETURN_FALSE; +} + +static PyObject* ZstdCompressionWriter_true(PyObject* self, PyObject* args) { + Py_RETURN_TRUE; +} + +static PyObject* ZstdCompressionWriter_unsupported(PyObject* self, PyObject* args, PyObject* kwargs) { + PyObject* iomod; + PyObject* exc; + + iomod = PyImport_ImportModule("io"); + if (NULL == iomod) { + return NULL; + } + + exc = PyObject_GetAttrString(iomod, "UnsupportedOperation"); + if (NULL == exc) { + Py_DECREF(iomod); + return NULL; + } + + PyErr_SetNone(exc); + Py_DECREF(exc); + Py_DECREF(iomod); + + return NULL; +} + +static PyMethodDef ZstdCompressionWriter_methods[] = { + { "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context.") }, + { "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context.") }, + { "close", (PyCFunction)ZstdCompressionWriter_close, METH_NOARGS, NULL }, + { "fileno", (PyCFunction)ZstdCompressionWriter_fileno, METH_NOARGS, NULL }, + { "isatty", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL }, + { "readable", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL }, + { "readline", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readlines", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "seek", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "seekable", ZstdCompressionWriter_false, METH_NOARGS, NULL }, + { "truncate", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "writable", ZstdCompressionWriter_true, METH_NOARGS, NULL }, + { "writelines", ZstdCompressionWriter_writelines, METH_VARARGS, NULL }, + { "read", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readall", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readinto", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS, + PyDoc_STR("Obtain the memory size of the underlying compressor") }, + { "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("Compress data") }, + { "flush", (PyCFunction)ZstdCompressionWriter_flush, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("Flush data and finish a zstd frame") }, + { "tell", (PyCFunction)ZstdCompressionWriter_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed") }, + { NULL, NULL } +}; + +static PyMemberDef ZstdCompressionWriter_members[] = { + { "closed", T_BOOL, offsetof(ZstdCompressionWriter, closed), READONLY, NULL }, + { NULL } +}; + +PyTypeObject ZstdCompressionWriterType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionWriter", /* tp_name */ + sizeof(ZstdCompressionWriter), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionWriter_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompresssionWriter__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionWriter_methods, /* tp_methods */ + ZstdCompressionWriter_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressionwriter_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressionWriterType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionWriterType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/compressobj.c b/contrib/python/zstandard/py2/c-ext/compressobj.c new file mode 100644 index 00000000000..0c33dffc7af --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressobj.c @@ -0,0 +1,256 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompressionObj__doc__, +"Perform compression using a standard library compatible API.\n" +); + +static void ZstdCompressionObj_dealloc(ZstdCompressionObj* self) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + Py_XDECREF(self->compressor); + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionObj_compress(ZstdCompressionObj* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + Py_buffer source; + ZSTD_inBuffer input; + size_t zresult; + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot call compress() after compressor finished"); + return NULL; + } + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:compress", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:compress", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + Py_CLEAR(result); + goto finally; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + + if (safe_pybytes_resize(&result, resultSize + self->output.pos)) { + Py_CLEAR(result); + goto finally; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + self->output.dst, self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + if (!result) { + goto finally; + } + } + + self->output.pos = 0; + } + } + + if (NULL == result) { + result = PyBytes_FromString(""); + } + +finally: + PyBuffer_Release(&source); + + return result; +} + +static PyObject* ZstdCompressionObj_flush(ZstdCompressionObj* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "flush_mode", + NULL + }; + + int flushMode = compressorobj_flush_finish; + size_t zresult; + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + ZSTD_inBuffer input; + ZSTD_EndDirective zFlushMode; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|i:flush", kwlist, &flushMode)) { + return NULL; + } + + if (flushMode != compressorobj_flush_finish && flushMode != compressorobj_flush_block) { + PyErr_SetString(PyExc_ValueError, "flush mode not recognized"); + return NULL; + } + + if (self->finished) { + PyErr_SetString(ZstdError, "compressor object already finished"); + return NULL; + } + + switch (flushMode) { + case compressorobj_flush_block: + zFlushMode = ZSTD_e_flush; + break; + + case compressorobj_flush_finish: + zFlushMode = ZSTD_e_end; + self->finished = 1; + break; + + default: + PyErr_SetString(ZstdError, "unhandled flush mode"); + return NULL; + } + + assert(self->output.pos == 0); + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &input, zFlushMode); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + + if (safe_pybytes_resize(&result, resultSize + self->output.pos)) { + Py_XDECREF(result); + return NULL; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + self->output.dst, self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + if (!result) { + return NULL; + } + } + + self->output.pos = 0; + } + + if (!zresult) { + break; + } + } + + if (result) { + return result; + } + else { + return PyBytes_FromString(""); + } +} + +static PyMethodDef ZstdCompressionObj_methods[] = { + { "compress", (PyCFunction)ZstdCompressionObj_compress, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("compress data") }, + { "flush", (PyCFunction)ZstdCompressionObj_flush, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("finish compression operation") }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressionObjType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionObj", /* tp_name */ + sizeof(ZstdCompressionObj), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionObj_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionObj__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionObj_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressobj_module_init(PyObject* module) { + Py_TYPE(&ZstdCompressionObjType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionObjType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/compressor.c b/contrib/python/zstandard/py2/c-ext/compressor.c new file mode 100644 index 00000000000..fb729c17c4e --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressor.c @@ -0,0 +1,1676 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" +#include "pool.h" + +extern PyObject* ZstdError; + +int setup_cctx(ZstdCompressor* compressor) { + size_t zresult; + + assert(compressor); + assert(compressor->cctx); + assert(compressor->params); + + zresult = ZSTD_CCtx_setParametersUsingCCtxParams(compressor->cctx, compressor->params); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not set compression parameters: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + if (compressor->dict) { + if (compressor->dict->cdict) { + zresult = ZSTD_CCtx_refCDict(compressor->cctx, compressor->dict->cdict); + } + else { + zresult = ZSTD_CCtx_loadDictionary_advanced(compressor->cctx, + compressor->dict->dictData, compressor->dict->dictSize, + ZSTD_dlm_byRef, compressor->dict->dictType); + } + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not load compression dictionary: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + return 0; +} + +static PyObject* frame_progression(ZSTD_CCtx* cctx) { + PyObject* result = NULL; + PyObject* value; + ZSTD_frameProgression progression; + + result = PyTuple_New(3); + if (!result) { + return NULL; + } + + progression = ZSTD_getFrameProgression(cctx); + + value = PyLong_FromUnsignedLongLong(progression.ingested); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 0, value); + + value = PyLong_FromUnsignedLongLong(progression.consumed); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 1, value); + + value = PyLong_FromUnsignedLongLong(progression.produced); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 2, value); + + return result; +} + +PyDoc_STRVAR(ZstdCompressor__doc__, +"ZstdCompressor(level=None, dict_data=None, compression_params=None)\n" +"\n" +"Create an object used to perform Zstandard compression.\n" +"\n" +"An instance can compress data various ways. Instances can be used multiple\n" +"times. Each compression operation will use the compression parameters\n" +"defined at construction time.\n" +"\n" +"Compression can be configured via the following names arguments:\n" +"\n" +"level\n" +" Integer compression level.\n" +"dict_data\n" +" A ``ZstdCompressionDict`` to be used to compress with dictionary data.\n" +"compression_params\n" +" A ``CompressionParameters`` instance defining low-level compression" +" parameters. If defined, this will overwrite the ``level`` argument.\n" +"write_checksum\n" +" If True, a 4 byte content checksum will be written with the compressed\n" +" data, allowing the decompressor to perform content verification.\n" +"write_content_size\n" +" If True (the default), the decompressed content size will be included in\n" +" the header of the compressed data. This data will only be written if the\n" +" compressor knows the size of the input data.\n" +"write_dict_id\n" +" Determines whether the dictionary ID will be written into the compressed\n" +" data. Defaults to True. Only adds content to the compressed data if\n" +" a dictionary is being used.\n" +"threads\n" +" Number of threads to use to compress data concurrently. When set,\n" +" compression operations are performed on multiple threads. The default\n" +" value (0) disables multi-threaded compression. A value of ``-1`` means to\n" +" set the number of threads to the number of detected logical CPUs.\n" +); + +static int ZstdCompressor_init(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "level", + "dict_data", + "compression_params", + "write_checksum", + "write_content_size", + "write_dict_id", + "threads", + NULL + }; + + int level = 3; + ZstdCompressionDict* dict = NULL; + ZstdCompressionParametersObject* params = NULL; + PyObject* writeChecksum = NULL; + PyObject* writeContentSize = NULL; + PyObject* writeDictID = NULL; + int threads = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOOi:ZstdCompressor", + kwlist, &level, &ZstdCompressionDictType, &dict, + &ZstdCompressionParametersType, ¶ms, + &writeChecksum, &writeContentSize, &writeDictID, &threads)) { + return -1; + } + + if (level > ZSTD_maxCLevel()) { + PyErr_Format(PyExc_ValueError, "level must be less than %d", + ZSTD_maxCLevel() + 1); + return -1; + } + + if (threads < 0) { + threads = cpu_count(); + } + + /* We create a ZSTD_CCtx for reuse among multiple operations to reduce the + overhead of each compression operation. */ + self->cctx = ZSTD_createCCtx(); + if (!self->cctx) { + PyErr_NoMemory(); + return -1; + } + + /* TODO stuff the original parameters away somewhere so we can reset later. This + will allow us to do things like automatically adjust cparams based on input + size (assuming zstd isn't doing that internally). */ + + self->params = ZSTD_createCCtxParams(); + if (!self->params) { + PyErr_NoMemory(); + return -1; + } + + if (params && writeChecksum) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and write_checksum"); + return -1; + } + + if (params && writeContentSize) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and write_content_size"); + return -1; + } + + if (params && writeDictID) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and write_dict_id"); + return -1; + } + + if (params && threads) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and threads"); + return -1; + } + + if (params) { + if (set_parameters(self->params, params)) { + return -1; + } + } + else { + if (set_parameter(self->params, ZSTD_c_compressionLevel, level)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_contentSizeFlag, + writeContentSize ? PyObject_IsTrue(writeContentSize) : 1)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_checksumFlag, + writeChecksum ? PyObject_IsTrue(writeChecksum) : 0)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_dictIDFlag, + writeDictID ? PyObject_IsTrue(writeDictID) : 1)) { + return -1; + } + + if (threads) { + if (set_parameter(self->params, ZSTD_c_nbWorkers, threads)) { + return -1; + } + } + } + + if (dict) { + self->dict = dict; + Py_INCREF(dict); + } + + if (setup_cctx(self)) { + return -1; + } + + return 0; +} + +static void ZstdCompressor_dealloc(ZstdCompressor* self) { + if (self->cctx) { + ZSTD_freeCCtx(self->cctx); + self->cctx = NULL; + } + + if (self->params) { + ZSTD_freeCCtxParams(self->params); + self->params = NULL; + } + + Py_XDECREF(self->dict); + PyObject_Del(self); +} + +PyDoc_STRVAR(ZstdCompressor_memory_size__doc__, +"memory_size()\n" +"\n" +"Obtain the memory usage of this compressor, in bytes.\n" +); + +static PyObject* ZstdCompressor_memory_size(ZstdCompressor* self) { + if (self->cctx) { + return PyLong_FromSize_t(ZSTD_sizeof_CCtx(self->cctx)); + } + else { + PyErr_SetString(ZstdError, "no compressor context found; this should never happen"); + return NULL; + } +} + +PyDoc_STRVAR(ZstdCompressor_frame_progression__doc__, +"frame_progression()\n" +"\n" +"Return information on how much work the compressor has done.\n" +"\n" +"Returns a 3-tuple of (ingested, consumed, produced).\n" +); + +static PyObject* ZstdCompressor_frame_progression(ZstdCompressor* self) { + return frame_progression(self->cctx); +} + +PyDoc_STRVAR(ZstdCompressor_copy_stream__doc__, +"copy_stream(ifh, ofh[, size=0, read_size=default, write_size=default])\n" +"compress data between streams\n" +"\n" +"Data will be read from ``ifh``, compressed, and written to ``ofh``.\n" +"``ifh`` must have a ``read(size)`` method. ``ofh`` must have a ``write(data)``\n" +"method.\n" +"\n" +"An optional ``size`` argument specifies the size of the source stream.\n" +"If defined, compression parameters will be tuned based on the size.\n" +"\n" +"Optional arguments ``read_size`` and ``write_size`` define the chunk sizes\n" +"of ``read()`` and ``write()`` operations, respectively. By default, they use\n" +"the default compression stream input and output sizes, respectively.\n" +); + +static PyObject* ZstdCompressor_copy_stream(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "ifh", + "ofh", + "size", + "read_size", + "write_size", + NULL + }; + + PyObject* source; + PyObject* dest; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char* readBuffer; + Py_ssize_t readSize; + PyObject* readResult = NULL; + PyObject* res = NULL; + size_t zresult; + PyObject* writeResult; + PyObject* totalReadPy; + PyObject* totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|Kkk:copy_stream", kwlist, + &source, &dest, &sourceSize, &inSize, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + /* Prevent free on uninitialized memory in finally. */ + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + /* Try to read from source stream. */ + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to compressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->cctx, &output, &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + res = NULL; + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + + Py_CLEAR(readResult); + } + + /* We've finished reading. Now flush the compressor stream. */ + assert(input.pos == input.size); + + while (1) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->cctx, &output, &input, ZSTD_e_end); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + totalWrite += output.pos; + Py_XDECREF(writeResult); + output.pos = 0; + } + + if (!zresult) { + break; + } + } + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DECREF(totalReadPy); + Py_DECREF(totalWritePy); + +finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + Py_XDECREF(readResult); + + return res; +} + +PyDoc_STRVAR(ZstdCompressor_stream_reader__doc__, +"stream_reader(source, [size=0])\n" +"\n" +"Obtain an object that behaves like an I/O stream.\n" +"\n" +"The source object can be any object with a ``read(size)`` method\n" +"or an object that conforms to the buffer protocol.\n" +); + +static ZstdCompressionReader* ZstdCompressor_stream_reader(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "source", + "size", + "read_size", + NULL + }; + + PyObject* source; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t readSize = ZSTD_CStreamInSize(); + ZstdCompressionReader* result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|Kk:stream_reader", kwlist, + &source, &sourceSize, &readSize)) { + return NULL; + } + + result = (ZstdCompressionReader*)PyObject_CallObject((PyObject*)&ZstdCompressionReaderType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + if (PyObject_HasAttrString(source, "read")) { + result->reader = source; + Py_INCREF(source); + result->readSize = readSize; + } + else if (1 == PyObject_CheckBuffer(source)) { + if (0 != PyObject_GetBuffer(source, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + assert(result->buffer.len >= 0); + + sourceSize = result->buffer.len; + } + else { + PyErr_SetString(PyExc_TypeError, + "must pass an object with a read() method or that conforms to the buffer protocol"); + goto except; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source source: %s", + ZSTD_getErrorName(zresult)); + goto except; + } + + result->compressor = self; + Py_INCREF(self); + + return result; + +except: + Py_CLEAR(result); + + return NULL; +} + +PyDoc_STRVAR(ZstdCompressor_compress__doc__, +"compress(data)\n" +"\n" +"Compress data in a single operation.\n" +"\n" +"This is the simplest mechanism to perform compression: simply pass in a\n" +"value and get a compressed value back. It is almost the most prone to abuse.\n" +"The input and output values must fit in memory, so passing in very large\n" +"values can result in excessive memory usage. For this reason, one of the\n" +"streaming based APIs is preferred for larger values.\n" +); + +static PyObject* ZstdCompressor_compress(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + Py_buffer source; + size_t destSize; + PyObject* output = NULL; + size_t zresult; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|O:compress", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*|O:compress", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + destSize = ZSTD_compressBound(source.len); + output = PyBytes_FromStringAndSize(NULL, destSize); + if (!output) { + goto finally; + } + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, source.len); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + Py_CLEAR(output); + goto finally; + } + + inBuffer.src = source.buf; + inBuffer.size = source.len; + inBuffer.pos = 0; + + outBuffer.dst = PyBytes_AsString(output); + outBuffer.size = destSize; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + /* By avoiding ZSTD_compress(), we don't necessarily write out content + size. This means the argument to ZstdCompressor to control frame + parameters is honored. */ + zresult = ZSTD_compressStream2(self->cctx, &outBuffer, &inBuffer, ZSTD_e_end); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "cannot compress: %s", ZSTD_getErrorName(zresult)); + Py_CLEAR(output); + goto finally; + } + else if (zresult) { + PyErr_SetString(ZstdError, "unexpected partial frame flush"); + Py_CLEAR(output); + goto finally; + } + + Py_SIZE(output) = outBuffer.pos; + +finally: + PyBuffer_Release(&source); + return output; +} + +PyDoc_STRVAR(ZstdCompressionObj__doc__, +"compressobj()\n" +"\n" +"Return an object exposing ``compress(data)`` and ``flush()`` methods.\n" +"\n" +"The returned object exposes an API similar to ``zlib.compressobj`` and\n" +"``bz2.BZ2Compressor`` so that callers can swap in the zstd compressor\n" +"without changing how compression is performed.\n" +); + +static ZstdCompressionObj* ZstdCompressor_compressobj(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + unsigned long long inSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressionObj* result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|K:compressobj", kwlist, &inSize)) { + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, inSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result = (ZstdCompressionObj*)PyObject_CallObject((PyObject*)&ZstdCompressionObjType, NULL); + if (!result) { + return NULL; + } + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + Py_DECREF(result); + return NULL; + } + result->output.size = outSize; + result->compressor = self; + Py_INCREF(result->compressor); + + return result; +} + +PyDoc_STRVAR(ZstdCompressor_read_to_iter__doc__, +"read_to_iter(reader, [size=0, read_size=default, write_size=default])\n" +"Read uncompressed data from a reader and return an iterator\n" +"\n" +"Returns an iterator of compressed data produced from reading from ``reader``.\n" +"\n" +"Uncompressed data will be obtained from ``reader`` by calling the\n" +"``read(size)`` method of it. The source data will be streamed into a\n" +"compressor. As compressed data is available, it will be exposed to the\n" +"iterator.\n" +"\n" +"Data is read from the source in chunks of ``read_size``. Compressed chunks\n" +"are at most ``write_size`` bytes. Both values default to the zstd input and\n" +"and output defaults, respectively.\n" +"\n" +"The caller is partially in control of how fast data is fed into the\n" +"compressor by how it consumes the returned iterator. The compressor will\n" +"not consume from the reader unless the caller consumes from the iterator.\n" +); + +static ZstdCompressorIterator* ZstdCompressor_read_to_iter(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "reader", + "size", + "read_size", + "write_size", + NULL + }; + + PyObject* reader; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressorIterator* result; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|Kkk:read_to_iter", kwlist, + &reader, &sourceSize, &inSize, &outSize)) { + return NULL; + } + + result = (ZstdCompressorIterator*)PyObject_CallObject((PyObject*)&ZstdCompressorIteratorType, NULL); + if (!result) { + return NULL; + } + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + if (0 != PyObject_GetBuffer(reader, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + sourceSize = result->buffer.len; + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms to buffer protocol"); + goto except; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result->compressor = self; + Py_INCREF(result->compressor); + + result->inSize = inSize; + result->outSize = outSize; + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + goto except; + } + result->output.size = outSize; + + goto finally; + +except: + Py_CLEAR(result); + +finally: + return result; +} + +PyDoc_STRVAR(ZstdCompressor_stream_writer___doc__, +"Create a context manager to write compressed data to an object.\n" +"\n" +"The passed object must have a ``write()`` method.\n" +"\n" +"The caller feeds input data to the object by calling ``compress(data)``.\n" +"Compressed data is written to the argument given to this function.\n" +"\n" +"The function takes an optional ``size`` argument indicating the total size\n" +"of the eventual input. If specified, the size will influence compression\n" +"parameter tuning and could result in the size being written into the\n" +"header of the compressed data.\n" +"\n" +"An optional ``write_size`` argument is also accepted. It defines the maximum\n" +"byte size of chunks fed to ``write()``. By default, it uses the zstd default\n" +"for a compressor output stream.\n" +); + +static ZstdCompressionWriter* ZstdCompressor_stream_writer(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "writer", + "size", + "write_size", + "write_return_read", + NULL + }; + + PyObject* writer; + ZstdCompressionWriter* result; + size_t zresult; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t outSize = ZSTD_CStreamOutSize(); + PyObject* writeReturnRead = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|KkO:stream_writer", kwlist, + &writer, &sourceSize, &outSize, &writeReturnRead)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result = (ZstdCompressionWriter*)PyObject_CallObject((PyObject*)&ZstdCompressionWriterType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + Py_DECREF(result); + return (ZstdCompressionWriter*)PyErr_NoMemory(); + } + + result->output.pos = 0; + result->output.size = outSize; + + result->compressor = self; + Py_INCREF(result->compressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->outSize = outSize; + result->bytesCompressed = 0; + result->writeReturnRead = writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 0; + + return result; +} + +PyDoc_STRVAR(ZstdCompressor_chunker__doc__, +"Create an object for iterative compressing to same-sized chunks.\n" +); + +static ZstdCompressionChunker* ZstdCompressor_chunker(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + "chunk_size", + NULL + }; + + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t chunkSize = ZSTD_CStreamOutSize(); + ZstdCompressionChunker* chunker; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|Kk:chunker", kwlist, + &sourceSize, &chunkSize)) { + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + chunker = (ZstdCompressionChunker*)PyObject_CallObject((PyObject*)&ZstdCompressionChunkerType, NULL); + if (!chunker) { + return NULL; + } + + chunker->output.dst = PyMem_Malloc(chunkSize); + if (!chunker->output.dst) { + PyErr_NoMemory(); + Py_DECREF(chunker); + return NULL; + } + chunker->output.size = chunkSize; + chunker->output.pos = 0; + + chunker->compressor = self; + Py_INCREF(chunker->compressor); + + chunker->chunkSize = chunkSize; + + return chunker; +} + +typedef struct { + void* sourceData; + size_t sourceSize; +} DataSource; + +typedef struct { + DataSource* sources; + Py_ssize_t sourcesSize; + unsigned long long totalSourceSize; +} DataSources; + +typedef struct { + void* dest; + Py_ssize_t destSize; + BufferSegment* segments; + Py_ssize_t segmentsSize; +} DestBuffer; + +typedef enum { + WorkerError_none = 0, + WorkerError_zstd = 1, + WorkerError_no_memory = 2, + WorkerError_nospace = 3, +} WorkerError; + +/** + * Holds state for an individual worker performing multi_compress_to_buffer work. + */ +typedef struct { + /* Used for compression. */ + ZSTD_CCtx* cctx; + + /* What to compress. */ + DataSource* sources; + Py_ssize_t sourcesSize; + Py_ssize_t startOffset; + Py_ssize_t endOffset; + unsigned long long totalSourceSize; + + /* Result storage. */ + DestBuffer* destBuffers; + Py_ssize_t destCount; + + /* Error tracking. */ + WorkerError error; + size_t zresult; + Py_ssize_t errorOffset; +} WorkerState; + +static void compress_worker(WorkerState* state) { + Py_ssize_t inputOffset = state->startOffset; + Py_ssize_t remainingItems = state->endOffset - state->startOffset + 1; + Py_ssize_t currentBufferStartOffset = state->startOffset; + size_t zresult; + void* newDest; + size_t allocationSize; + size_t boundSize; + Py_ssize_t destOffset = 0; + DataSource* sources = state->sources; + DestBuffer* destBuffer; + + assert(!state->destBuffers); + assert(0 == state->destCount); + + /* + * The total size of the compressed data is unknown until we actually + * compress data. That means we can't pre-allocate the exact size we need. + * + * There is a cost to every allocation and reallocation. So, it is in our + * interest to minimize the number of allocations. + * + * There is also a cost to too few allocations. If allocations are too + * large they may fail. If buffers are shared and all inputs become + * irrelevant at different lifetimes, then a reference to one segment + * in the buffer will keep the entire buffer alive. This leads to excessive + * memory usage. + * + * Our current strategy is to assume a compression ratio of 16:1 and + * allocate buffers of that size, rounded up to the nearest power of 2 + * (because computers like round numbers). That ratio is greater than what + * most inputs achieve. This is by design: we don't want to over-allocate. + * But we don't want to under-allocate and lead to too many buffers either. + */ + + state->destCount = 1; + + state->destBuffers = calloc(1, sizeof(DestBuffer)); + if (NULL == state->destBuffers) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* + * Rather than track bounds and grow the segments buffer, allocate space + * to hold remaining items then truncate when we're done with it. + */ + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + + assert(state->totalSourceSize <= SIZE_MAX); + allocationSize = roundpow2((size_t)state->totalSourceSize >> 4); + + /* If the maximum size of the output is larger than that, round up. */ + boundSize = ZSTD_compressBound(sources[inputOffset].sourceSize); + + if (boundSize > allocationSize) { + allocationSize = roundpow2(boundSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->destSize = allocationSize; + + for (inputOffset = state->startOffset; inputOffset <= state->endOffset; inputOffset++) { + void* source = sources[inputOffset].sourceData; + size_t sourceSize = sources[inputOffset].sourceSize; + size_t destAvailable; + void* dest; + ZSTD_outBuffer opOutBuffer; + ZSTD_inBuffer opInBuffer; + + destAvailable = destBuffer->destSize - destOffset; + boundSize = ZSTD_compressBound(sourceSize); + + /* + * Not enough space in current buffer to hold largest compressed output. + * So allocate and switch to a new output buffer. + */ + if (boundSize > destAvailable) { + /* + * The downsizing of the existing buffer is optional. It should be cheap + * (unlike growing). So we just do it. + */ + if (destAvailable) { + newDest = realloc(destBuffer->dest, destOffset); + if (NULL == newDest) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->dest = newDest; + destBuffer->destSize = destOffset; + } + + /* Truncate segments buffer. */ + newDest = realloc(destBuffer->segments, + (inputOffset - currentBufferStartOffset + 1) * sizeof(BufferSegment)); + if (NULL == newDest) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->segments = newDest; + destBuffer->segmentsSize = inputOffset - currentBufferStartOffset; + + /* Grow space for new struct. */ + /* TODO consider over-allocating so we don't do this every time. */ + newDest = realloc(state->destBuffers, (state->destCount + 1) * sizeof(DestBuffer)); + if (NULL == newDest) { + state->error = WorkerError_no_memory; + return; + } + + state->destBuffers = newDest; + state->destCount++; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* Don't take any chances with non-NULL pointers. */ + memset(destBuffer, 0, sizeof(DestBuffer)); + + /** + * We could dynamically update allocation size based on work done so far. + * For now, keep is simple. + */ + assert(state->totalSourceSize <= SIZE_MAX); + allocationSize = roundpow2((size_t)state->totalSourceSize >> 4); + + if (boundSize > allocationSize) { + allocationSize = roundpow2(boundSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->destSize = allocationSize; + destAvailable = allocationSize; + destOffset = 0; + + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + currentBufferStartOffset = inputOffset; + } + + dest = (char*)destBuffer->dest + destOffset; + + opInBuffer.src = source; + opInBuffer.size = sourceSize; + opInBuffer.pos = 0; + + opOutBuffer.dst = dest; + opOutBuffer.size = destAvailable; + opOutBuffer.pos = 0; + + zresult = ZSTD_CCtx_setPledgedSrcSize(state->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + state->error = WorkerError_zstd; + state->zresult = zresult; + state->errorOffset = inputOffset; + break; + } + + zresult = ZSTD_compressStream2(state->cctx, &opOutBuffer, &opInBuffer, ZSTD_e_end); + if (ZSTD_isError(zresult)) { + state->error = WorkerError_zstd; + state->zresult = zresult; + state->errorOffset = inputOffset; + break; + } + else if (zresult) { + state->error = WorkerError_nospace; + state->errorOffset = inputOffset; + break; + } + + destBuffer->segments[inputOffset - currentBufferStartOffset].offset = destOffset; + destBuffer->segments[inputOffset - currentBufferStartOffset].length = opOutBuffer.pos; + + destOffset += opOutBuffer.pos; + remainingItems--; + } + + if (destBuffer->destSize > destOffset) { + newDest = realloc(destBuffer->dest, destOffset); + if (NULL == newDest) { + state->error = WorkerError_no_memory; + return; + } + + destBuffer->dest = newDest; + destBuffer->destSize = destOffset; + } +} + +ZstdBufferWithSegmentsCollection* compress_from_datasources(ZstdCompressor* compressor, + DataSources* sources, Py_ssize_t threadCount) { + unsigned long long bytesPerWorker; + POOL_ctx* pool = NULL; + WorkerState* workerStates = NULL; + Py_ssize_t i; + unsigned long long workerBytes = 0; + Py_ssize_t workerStartOffset = 0; + Py_ssize_t currentThread = 0; + int errored = 0; + Py_ssize_t segmentsCount = 0; + Py_ssize_t segmentIndex; + PyObject* segmentsArg = NULL; + ZstdBufferWithSegments* buffer; + ZstdBufferWithSegmentsCollection* result = NULL; + + assert(sources->sourcesSize > 0); + assert(sources->totalSourceSize > 0); + assert(threadCount >= 1); + + /* More threads than inputs makes no sense. */ + threadCount = sources->sourcesSize < threadCount ? sources->sourcesSize + : threadCount; + + /* TODO lower thread count when input size is too small and threads would add + overhead. */ + + workerStates = PyMem_Malloc(threadCount * sizeof(WorkerState)); + if (NULL == workerStates) { + PyErr_NoMemory(); + goto finally; + } + + memset(workerStates, 0, threadCount * sizeof(WorkerState)); + + if (threadCount > 1) { + pool = POOL_create(threadCount, 1); + if (NULL == pool) { + PyErr_SetString(ZstdError, "could not initialize zstd thread pool"); + goto finally; + } + } + + bytesPerWorker = sources->totalSourceSize / threadCount; + + for (i = 0; i < threadCount; i++) { + size_t zresult; + + workerStates[i].cctx = ZSTD_createCCtx(); + if (!workerStates[i].cctx) { + PyErr_NoMemory(); + goto finally; + } + + zresult = ZSTD_CCtx_setParametersUsingCCtxParams(workerStates[i].cctx, + compressor->params); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not set compression parameters: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (compressor->dict) { + if (compressor->dict->cdict) { + zresult = ZSTD_CCtx_refCDict(workerStates[i].cctx, compressor->dict->cdict); + } + else { + zresult = ZSTD_CCtx_loadDictionary_advanced( + workerStates[i].cctx, + compressor->dict->dictData, + compressor->dict->dictSize, + ZSTD_dlm_byRef, + compressor->dict->dictType); + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not load compression dictionary: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + } + + workerStates[i].sources = sources->sources; + workerStates[i].sourcesSize = sources->sourcesSize; + } + + Py_BEGIN_ALLOW_THREADS + for (i = 0; i < sources->sourcesSize; i++) { + workerBytes += sources->sources[i].sourceSize; + + /* + * The last worker/thread needs to handle all remaining work. Don't + * trigger it prematurely. Defer to the block outside of the loop + * to run the last worker/thread. But do still process this loop + * so workerBytes is correct. + */ + if (currentThread == threadCount - 1) { + continue; + } + + if (workerBytes >= bytesPerWorker) { + assert(currentThread < threadCount); + workerStates[currentThread].totalSourceSize = workerBytes; + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = i; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)compress_worker, &workerStates[currentThread]); + } + else { + compress_worker(&workerStates[currentThread]); + } + + currentThread++; + workerStartOffset = i + 1; + workerBytes = 0; + } + } + + if (workerBytes) { + assert(currentThread < threadCount); + workerStates[currentThread].totalSourceSize = workerBytes; + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = sources->sourcesSize - 1; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)compress_worker, &workerStates[currentThread]); + } + else { + compress_worker(&workerStates[currentThread]); + } + } + + if (threadCount > 1) { + POOL_free(pool); + pool = NULL; + } + + Py_END_ALLOW_THREADS + + for (i = 0; i < threadCount; i++) { + switch (workerStates[i].error) { + case WorkerError_no_memory: + PyErr_NoMemory(); + errored = 1; + break; + + case WorkerError_zstd: + PyErr_Format(ZstdError, "error compressing item %zd: %s", + workerStates[i].errorOffset, ZSTD_getErrorName(workerStates[i].zresult)); + errored = 1; + break; + + case WorkerError_nospace: + PyErr_Format(ZstdError, "error compressing item %zd: not enough space in output", + workerStates[i].errorOffset); + errored = 1; + break; + + default: + ; + } + + if (errored) { + break; + } + + } + + if (errored) { + goto finally; + } + + segmentsCount = 0; + for (i = 0; i < threadCount; i++) { + WorkerState* state = &workerStates[i]; + segmentsCount += state->destCount; + } + + segmentsArg = PyTuple_New(segmentsCount); + if (NULL == segmentsArg) { + goto finally; + } + + segmentIndex = 0; + + for (i = 0; i < threadCount; i++) { + Py_ssize_t j; + WorkerState* state = &workerStates[i]; + + for (j = 0; j < state->destCount; j++) { + DestBuffer* destBuffer = &state->destBuffers[j]; + buffer = BufferWithSegments_FromMemory(destBuffer->dest, destBuffer->destSize, + destBuffer->segments, destBuffer->segmentsSize); + + if (NULL == buffer) { + goto finally; + } + + /* Tell instance to use free() instsead of PyMem_Free(). */ + buffer->useFree = 1; + + /* + * BufferWithSegments_FromMemory takes ownership of the backing memory. + * Unset it here so it doesn't get freed below. + */ + destBuffer->dest = NULL; + destBuffer->segments = NULL; + + PyTuple_SET_ITEM(segmentsArg, segmentIndex++, (PyObject*)buffer); + } + } + + result = (ZstdBufferWithSegmentsCollection*)PyObject_CallObject( + (PyObject*)&ZstdBufferWithSegmentsCollectionType, segmentsArg); + +finally: + Py_CLEAR(segmentsArg); + + if (pool) { + POOL_free(pool); + } + + if (workerStates) { + Py_ssize_t j; + + for (i = 0; i < threadCount; i++) { + WorkerState state = workerStates[i]; + + if (state.cctx) { + ZSTD_freeCCtx(state.cctx); + } + + /* malloc() is used in worker thread. */ + + for (j = 0; j < state.destCount; j++) { + if (state.destBuffers) { + free(state.destBuffers[j].dest); + free(state.destBuffers[j].segments); + } + } + + + free(state.destBuffers); + } + + PyMem_Free(workerStates); + } + + return result; +} + +PyDoc_STRVAR(ZstdCompressor_multi_compress_to_buffer__doc__, +"Compress multiple pieces of data as a single operation\n" +"\n" +"Receives a ``BufferWithSegmentsCollection``, a ``BufferWithSegments``, or\n" +"a list of bytes like objects holding data to compress.\n" +"\n" +"Returns a ``BufferWithSegmentsCollection`` holding compressed data.\n" +"\n" +"This function is optimized to perform multiple compression operations as\n" +"as possible with as little overhead as possbile.\n" +); + +static ZstdBufferWithSegmentsCollection* ZstdCompressor_multi_compress_to_buffer(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + "threads", + NULL + }; + + PyObject* data; + int threads = 0; + Py_buffer* dataBuffers = NULL; + DataSources sources; + Py_ssize_t i; + Py_ssize_t sourceCount = 0; + ZstdBufferWithSegmentsCollection* result = NULL; + + memset(&sources, 0, sizeof(sources)); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|i:multi_compress_to_buffer", kwlist, + &data, &threads)) { + return NULL; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (threads < 2) { + threads = 1; + } + + if (PyObject_TypeCheck(data, &ZstdBufferWithSegmentsType)) { + ZstdBufferWithSegments* buffer = (ZstdBufferWithSegments*)data; + + sources.sources = PyMem_Malloc(buffer->segmentCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < buffer->segmentCount; i++) { + if (buffer->segments[i].length > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "buffer segment %zd is too large for this platform", i); + goto finally; + } + + sources.sources[i].sourceData = (char*)buffer->data + buffer->segments[i].offset; + sources.sources[i].sourceSize = (size_t)buffer->segments[i].length; + sources.totalSourceSize += buffer->segments[i].length; + } + + sources.sourcesSize = buffer->segmentCount; + } + else if (PyObject_TypeCheck(data, &ZstdBufferWithSegmentsCollectionType)) { + Py_ssize_t j; + Py_ssize_t offset = 0; + ZstdBufferWithSegments* buffer; + ZstdBufferWithSegmentsCollection* collection = (ZstdBufferWithSegmentsCollection*)data; + + sourceCount = BufferWithSegmentsCollection_length(collection); + + sources.sources = PyMem_Malloc(sourceCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < collection->bufferCount; i++) { + buffer = collection->buffers[i]; + + for (j = 0; j < buffer->segmentCount; j++) { + if (buffer->segments[j].length > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "buffer segment %zd in buffer %zd is too large for this platform", + j, i); + goto finally; + } + + sources.sources[offset].sourceData = (char*)buffer->data + buffer->segments[j].offset; + sources.sources[offset].sourceSize = (size_t)buffer->segments[j].length; + sources.totalSourceSize += buffer->segments[j].length; + + offset++; + } + } + + sources.sourcesSize = sourceCount; + } + else if (PyList_Check(data)) { + sourceCount = PyList_GET_SIZE(data); + + sources.sources = PyMem_Malloc(sourceCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + dataBuffers = PyMem_Malloc(sourceCount * sizeof(Py_buffer)); + if (NULL == dataBuffers) { + PyErr_NoMemory(); + goto finally; + } + + memset(dataBuffers, 0, sourceCount * sizeof(Py_buffer)); + + for (i = 0; i < sourceCount; i++) { + if (0 != PyObject_GetBuffer(PyList_GET_ITEM(data, i), + &dataBuffers[i], PyBUF_CONTIG_RO)) { + PyErr_Clear(); + PyErr_Format(PyExc_TypeError, "item %zd not a bytes like object", i); + goto finally; + } + + sources.sources[i].sourceData = dataBuffers[i].buf; + sources.sources[i].sourceSize = dataBuffers[i].len; + sources.totalSourceSize += dataBuffers[i].len; + } + + sources.sourcesSize = sourceCount; + } + else { + PyErr_SetString(PyExc_TypeError, "argument must be list of BufferWithSegments"); + goto finally; + } + + if (0 == sources.sourcesSize) { + PyErr_SetString(PyExc_ValueError, "no source elements found"); + goto finally; + } + + if (0 == sources.totalSourceSize) { + PyErr_SetString(PyExc_ValueError, "source elements are empty"); + goto finally; + } + + if (sources.totalSourceSize > SIZE_MAX) { + PyErr_SetString(PyExc_ValueError, "sources are too large for this platform"); + goto finally; + } + + result = compress_from_datasources(self, &sources, threads); + +finally: + PyMem_Free(sources.sources); + + if (dataBuffers) { + for (i = 0; i < sourceCount; i++) { + PyBuffer_Release(&dataBuffers[i]); + } + + PyMem_Free(dataBuffers); + } + + return result; +} + +static PyMethodDef ZstdCompressor_methods[] = { + { "chunker", (PyCFunction)ZstdCompressor_chunker, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_chunker__doc__ }, + { "compress", (PyCFunction)ZstdCompressor_compress, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_compress__doc__ }, + { "compressobj", (PyCFunction)ZstdCompressor_compressobj, + METH_VARARGS | METH_KEYWORDS, ZstdCompressionObj__doc__ }, + { "copy_stream", (PyCFunction)ZstdCompressor_copy_stream, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_copy_stream__doc__ }, + { "stream_reader", (PyCFunction)ZstdCompressor_stream_reader, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_stream_reader__doc__ }, + { "stream_writer", (PyCFunction)ZstdCompressor_stream_writer, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_stream_writer___doc__ }, + { "read_to_iter", (PyCFunction)ZstdCompressor_read_to_iter, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_to_iter__doc__ }, + /* TODO Remove deprecated API */ + { "read_from", (PyCFunction)ZstdCompressor_read_to_iter, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_to_iter__doc__ }, + /* TODO remove deprecated API */ + { "write_to", (PyCFunction)ZstdCompressor_stream_writer, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_stream_writer___doc__ }, + { "multi_compress_to_buffer", (PyCFunction)ZstdCompressor_multi_compress_to_buffer, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_multi_compress_to_buffer__doc__ }, + { "memory_size", (PyCFunction)ZstdCompressor_memory_size, + METH_NOARGS, ZstdCompressor_memory_size__doc__ }, + { "frame_progression", (PyCFunction)ZstdCompressor_frame_progression, + METH_NOARGS, ZstdCompressor_frame_progression__doc__ }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressor", /* tp_name */ + sizeof(ZstdCompressor), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressor_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressor__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressor_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)ZstdCompressor_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressor_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressorType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressorType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdCompressorType); + PyModule_AddObject(mod, "ZstdCompressor", + (PyObject*)&ZstdCompressorType); +} diff --git a/contrib/python/zstandard/py2/c-ext/compressoriterator.c b/contrib/python/zstandard/py2/c-ext/compressoriterator.c new file mode 100644 index 00000000000..24e31c5a4be --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/compressoriterator.c @@ -0,0 +1,235 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompressorIterator__doc__, +"Represents an iterator of compressed data.\n" +); + +static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator* self) { + Py_XDECREF(self->readResult); + Py_XDECREF(self->compressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + if (self->output.dst) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdCompressorIterator_iter(PyObject* self) { + Py_INCREF(self); + return self; +} + +static PyObject* ZstdCompressorIterator_iternext(ZstdCompressorIterator* self) { + size_t zresult; + PyObject* readResult = NULL; + PyObject* chunk; + char* readBuffer; + Py_ssize_t readSize = 0; + Py_ssize_t bufferRemaining; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + +feedcompressor: + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* Release the Python object holding the input buffer. */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_DECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + /* If it produced output data, emit it. */ + if (self->output.pos) { + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + } + + /* We should never have output data sitting around after a previous call. */ + assert(self->output.pos == 0); + + /* The code above should have either emitted a chunk and returned or consumed + the entire input buffer. So the state of the input buffer is not + relevant. */ + if (!self->finishedInput) { + if (self->reader) { + readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer.buf); + + /* Only support contiguous C arrays. */ + assert(self->buffer.strides == NULL && self->buffer.suboffsets == NULL); + assert(self->buffer.itemsize == 1); + + readBuffer = (char*)self->buffer.buf + self->bufferOffset; + bufferRemaining = self->buffer.len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (0 == readSize) { + Py_XDECREF(readResult); + self->finishedInput = 1; + } + else { + self->readResult = readResult; + } + } + + /* EOF */ + if (0 == readSize) { + self->input.src = NULL; + self->input.size = 0; + self->input.pos = 0; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_end); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + + /* New data from reader. Feed into compressor. */ + self->input.src = readBuffer; + self->input.size = readSize; + self->input.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* The input buffer currently points to memory managed by Python + (readBuffer). This object was allocated by this function. If it wasn't + fully consumed, we need to release it in a subsequent function call. + If it is fully consumed, do that now. + */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_XDECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->input.pos <= self->input.size); + + /* If we didn't write anything, start the process over. */ + if (0 == self->output.pos) { + goto feedcompressor; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; +} + +PyTypeObject ZstdCompressorIteratorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressorIterator", /* tp_name */ + sizeof(ZstdCompressorIterator), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressorIterator_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressorIterator__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + ZstdCompressorIterator_iter, /* tp_iter */ + (iternextfunc)ZstdCompressorIterator_iternext, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressoriterator_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressorIteratorType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressorIteratorType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/constants.c b/contrib/python/zstandard/py2/c-ext/constants.c new file mode 100644 index 00000000000..bafdf1e469a --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/constants.c @@ -0,0 +1,109 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +static char frame_header[] = { + '\x28', + '\xb5', + '\x2f', + '\xfd', +}; + +void constants_module_init(PyObject* mod) { + PyObject* version; + PyObject* zstdVersion; + PyObject* frameHeader; + +#if PY_MAJOR_VERSION >= 3 + version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION); +#else + version = PyString_FromString(PYTHON_ZSTANDARD_VERSION); +#endif + PyModule_AddObject(mod, "__version__", version); + + ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL); + PyModule_AddObject(mod, "ZstdError", ZstdError); + + PyModule_AddIntConstant(mod, "FLUSH_BLOCK", 0); + PyModule_AddIntConstant(mod, "FLUSH_FRAME", 1); + + PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_FINISH", compressorobj_flush_finish); + PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_BLOCK", compressorobj_flush_block); + + /* For now, the version is a simple tuple instead of a dedicated type. */ + zstdVersion = PyTuple_New(3); + PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR)); + PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR)); + PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE)); + PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion); + + frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header)); + if (frameHeader) { + PyModule_AddObject(mod, "FRAME_HEADER", frameHeader); + } + else { + PyErr_Format(PyExc_ValueError, "could not create frame header object"); + } + + PyModule_AddObject(mod, "CONTENTSIZE_UNKNOWN", + PyLong_FromUnsignedLongLong(ZSTD_CONTENTSIZE_UNKNOWN)); + PyModule_AddObject(mod, "CONTENTSIZE_ERROR", + PyLong_FromUnsignedLongLong(ZSTD_CONTENTSIZE_ERROR)); + + PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_CStreamInSize()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_CStreamOutSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_DStreamInSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_DStreamOutSize()); + + PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER); + PyModule_AddIntConstant(mod, "BLOCKSIZELOG_MAX", ZSTD_BLOCKSIZELOG_MAX); + PyModule_AddIntConstant(mod, "BLOCKSIZE_MAX", ZSTD_BLOCKSIZE_MAX); + PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN); + PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX); + PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN); + PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX); + PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN); + PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX); + PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN); + PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX); + PyModule_AddIntConstant(mod, "MINMATCH_MIN", ZSTD_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "MINMATCH_MAX", ZSTD_MINMATCH_MAX); + /* TODO SEARCHLENGTH_* is deprecated. */ + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_MINMATCH_MAX); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX); + PyModule_AddIntConstant(mod, "LDM_MINMATCH_MIN", ZSTD_LDM_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "LDM_MINMATCH_MAX", ZSTD_LDM_MINMATCH_MAX); + PyModule_AddIntConstant(mod, "LDM_BUCKETSIZELOG_MAX", ZSTD_LDM_BUCKETSIZELOG_MAX); + + PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast); + PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast); + PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt); + PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA", ZSTD_btultra); + PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA2", ZSTD_btultra2); + + PyModule_AddIntConstant(mod, "DICT_TYPE_AUTO", ZSTD_dct_auto); + PyModule_AddIntConstant(mod, "DICT_TYPE_RAWCONTENT", ZSTD_dct_rawContent); + PyModule_AddIntConstant(mod, "DICT_TYPE_FULLDICT", ZSTD_dct_fullDict); + + PyModule_AddIntConstant(mod, "FORMAT_ZSTD1", ZSTD_f_zstd1); + PyModule_AddIntConstant(mod, "FORMAT_ZSTD1_MAGICLESS", ZSTD_f_zstd1_magicless); +} diff --git a/contrib/python/zstandard/py2/c-ext/decompressionreader.c b/contrib/python/zstandard/py2/c-ext/decompressionreader.c new file mode 100644 index 00000000000..792852f2abe --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/decompressionreader.c @@ -0,0 +1,781 @@ +/** +* Copyright (c) 2017-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +static void set_unsupported_operation(void) { + PyObject* iomod; + PyObject* exc; + + iomod = PyImport_ImportModule("io"); + if (NULL == iomod) { + return; + } + + exc = PyObject_GetAttrString(iomod, "UnsupportedOperation"); + if (NULL == exc) { + Py_DECREF(iomod); + return; + } + + PyErr_SetNone(exc); + Py_DECREF(exc); + Py_DECREF(iomod); +} + +static void reader_dealloc(ZstdDecompressionReader* self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + } + + PyObject_Del(self); +} + +static ZstdDecompressionReader* reader_enter(ZstdDecompressionReader* self) { + if (self->entered) { + PyErr_SetString(PyExc_ValueError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return self; +} + +static PyObject* reader_exit(ZstdDecompressionReader* self, PyObject* args) { + PyObject* exc_type; + PyObject* exc_value; + PyObject* exc_tb; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) { + return NULL; + } + + self->entered = 0; + self->closed = 1; + + /* Release resources. */ + Py_CLEAR(self->reader); + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + Py_CLEAR(self->decompressor); + + Py_RETURN_FALSE; +} + +static PyObject* reader_readable(PyObject* self) { + Py_RETURN_TRUE; +} + +static PyObject* reader_writable(PyObject* self) { + Py_RETURN_FALSE; +} + +static PyObject* reader_seekable(PyObject* self) { + Py_RETURN_TRUE; +} + +static PyObject* reader_close(ZstdDecompressionReader* self) { + self->closed = 1; + Py_RETURN_NONE; +} + +static PyObject* reader_flush(PyObject* self) { + Py_RETURN_NONE; +} + +static PyObject* reader_isatty(PyObject* self) { + Py_RETURN_FALSE; +} + +/** + * Read available input. + * + * Returns 0 if no data was added to input. + * Returns 1 if new input data is available. + * Returns -1 on error and sets a Python exception as a side-effect. + */ +int read_decompressor_input(ZstdDecompressionReader* self) { + if (self->finishedInput) { + return 0; + } + + if (self->input.pos != self->input.size) { + return 0; + } + + if (self->reader) { + Py_buffer buffer; + + assert(self->readResult == NULL); + self->readResult = PyObject_CallMethod(self->reader, "read", + "k", self->readSize); + if (NULL == self->readResult) { + return -1; + } + + memset(&buffer, 0, sizeof(buffer)); + + if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) { + return -1; + } + + /* EOF */ + if (0 == buffer.len) { + self->finishedInput = 1; + Py_CLEAR(self->readResult); + } + else { + self->input.src = buffer.buf; + self->input.size = buffer.len; + self->input.pos = 0; + } + + PyBuffer_Release(&buffer); + } + else { + assert(self->buffer.buf); + /* + * We should only get here once since expectation is we always + * exhaust input buffer before reading again. + */ + assert(self->input.src == NULL); + + self->input.src = self->buffer.buf; + self->input.size = self->buffer.len; + self->input.pos = 0; + } + + return 1; +} + +/** + * Decompresses available input into an output buffer. + * + * Returns 0 if we need more input. + * Returns 1 if output buffer should be emitted. + * Returns -1 on error and sets a Python exception. + */ +int decompress_input(ZstdDecompressionReader* self, ZSTD_outBuffer* output) { + size_t zresult; + + if (self->input.pos >= self->input.size) { + return 0; + } + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->decompressor->dctx, output, &self->input); + Py_END_ALLOW_THREADS + + /* Input exhausted. Clear our state tracking. */ + if (self->input.pos == self->input.size) { + memset(&self->input, 0, sizeof(self->input)); + Py_CLEAR(self->readResult); + + if (self->buffer.buf) { + self->finishedInput = 1; + } + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompress error: %s", ZSTD_getErrorName(zresult)); + return -1; + } + + /* We fulfilled the full read request. Signal to emit. */ + if (output->pos && output->pos == output->size) { + return 1; + } + /* We're at the end of a frame and we aren't allowed to return data + spanning frames. */ + else if (output->pos && zresult == 0 && !self->readAcrossFrames) { + return 1; + } + + /* There is more room in the output. Signal to collect more data. */ + return 0; +} + +static PyObject* reader_read(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + Py_ssize_t size = -1; + PyObject* result = NULL; + char* resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + int decompressResult, readResult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1"); + return NULL; + } + + if (size == -1) { + return PyObject_CallMethod((PyObject*)self, "readall", NULL); + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + +readinput: + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == decompressResult) { } + else if (1 == decompressResult) { + self->bytesDecompressed += output.pos; + + if (output.pos != output.size) { + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + } + return result; + } + else { + assert(0); + } + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult) {} + else if (1 == readResult) {} + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* EOF */ + self->bytesDecompressed += output.pos; + + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + + return result; +} + +static PyObject* reader_read1(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + Py_ssize_t size = -1; + PyObject* result = NULL; + char* resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1"); + return NULL; + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + if (size == -1) { + size = ZSTD_DStreamOutSize(); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + + /* read1() is supposed to use at most 1 read() from the underlying stream. + * However, we can't satisfy this requirement with decompression due to the + * nature of how decompression works. Our strategy is to read + decompress + * until we get any output, at which point we return. This satisfies the + * intent of the read1() API to limit read operations. + */ + while (!self->finishedInput) { + int readResult, decompressResult; + + readResult = read_decompressor_input(self); + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult || 1 == readResult) { } + else { + assert(0); + } + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == decompressResult || 1 == decompressResult) { } + else { + assert(0); + } + + if (output.pos) { + break; + } + } + + self->bytesDecompressed += output.pos; + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + + return result; +} + +static PyObject* reader_readinto(ZstdDecompressionReader* self, PyObject* args) { + Py_buffer dest; + ZSTD_outBuffer output; + int decompressResult, readResult; + PyObject* result = NULL; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "destination buffer should be contiguous and have at most one dimension"); + goto finally; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + +readinput: + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + goto finally; + } + else if (0 == decompressResult) { } + else if (1 == decompressResult) { + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult) {} + else if (1 == readResult) {} + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* EOF */ + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject* reader_readinto1(ZstdDecompressionReader* self, PyObject* args) { + Py_buffer dest; + ZSTD_outBuffer output; + PyObject* result = NULL; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "destination buffer should be contiguous and have at most one dimension"); + goto finally; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + while (!self->finishedInput && !self->finishedOutput) { + int decompressResult, readResult; + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) {} + else { + assert(0); + } + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + goto finally; + } + else if (0 == decompressResult || 1 == decompressResult) {} + else { + assert(0); + } + + if (output.pos) { + break; + } + } + + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject* reader_readall(PyObject* self) { + PyObject* chunks = NULL; + PyObject* empty = NULL; + PyObject* result = NULL; + + /* Our strategy is to collect chunks into a list then join all the + * chunks at the end. We could potentially use e.g. an io.BytesIO. But + * this feels simple enough to implement and avoids potentially expensive + * reallocations of large buffers. + */ + chunks = PyList_New(0); + if (NULL == chunks) { + return NULL; + } + + while (1) { + PyObject* chunk = PyObject_CallMethod(self, "read", "i", 1048576); + if (NULL == chunk) { + Py_DECREF(chunks); + return NULL; + } + + if (!PyBytes_Size(chunk)) { + Py_DECREF(chunk); + break; + } + + if (PyList_Append(chunks, chunk)) { + Py_DECREF(chunk); + Py_DECREF(chunks); + return NULL; + } + + Py_DECREF(chunk); + } + + empty = PyBytes_FromStringAndSize("", 0); + if (NULL == empty) { + Py_DECREF(chunks); + return NULL; + } + + result = PyObject_CallMethod(empty, "join", "O", chunks); + + Py_DECREF(empty); + Py_DECREF(chunks); + + return result; +} + +static PyObject* reader_readline(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_readlines(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_seek(ZstdDecompressionReader* self, PyObject* args) { + Py_ssize_t pos; + int whence = 0; + unsigned long long readAmount = 0; + size_t defaultOutSize = ZSTD_DStreamOutSize(); + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTuple(args, "n|i:seek", &pos, &whence)) { + return NULL; + } + + if (whence == SEEK_SET) { + if (pos < 0) { + PyErr_SetString(PyExc_ValueError, + "cannot seek to negative position with SEEK_SET"); + return NULL; + } + + if ((unsigned long long)pos < self->bytesDecompressed) { + PyErr_SetString(PyExc_ValueError, + "cannot seek zstd decompression stream backwards"); + return NULL; + } + + readAmount = pos - self->bytesDecompressed; + } + else if (whence == SEEK_CUR) { + if (pos < 0) { + PyErr_SetString(PyExc_ValueError, + "cannot seek zstd decompression stream backwards"); + return NULL; + } + + readAmount = pos; + } + else if (whence == SEEK_END) { + /* We /could/ support this with pos==0. But let's not do that until someone + needs it. */ + PyErr_SetString(PyExc_ValueError, + "zstd decompression streams cannot be seeked with SEEK_END"); + return NULL; + } + + /* It is a bit inefficient to do this via the Python API. But since there + is a bit of state tracking involved to read from this type, it is the + easiest to implement. */ + while (readAmount) { + Py_ssize_t readSize; + PyObject* readResult = PyObject_CallMethod((PyObject*)self, "read", "K", + readAmount < defaultOutSize ? readAmount : defaultOutSize); + + if (!readResult) { + return NULL; + } + + readSize = PyBytes_GET_SIZE(readResult); + + Py_CLEAR(readResult); + + /* Empty read means EOF. */ + if (!readSize) { + break; + } + + readAmount -= readSize; + } + + return PyLong_FromUnsignedLongLong(self->bytesDecompressed); +} + +static PyObject* reader_tell(ZstdDecompressionReader* self) { + /* TODO should this raise OSError since stream isn't seekable? */ + return PyLong_FromUnsignedLongLong(self->bytesDecompressed); +} + +static PyObject* reader_write(PyObject* self, PyObject* args) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_writelines(PyObject* self, PyObject* args) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_iter(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyObject* reader_iternext(PyObject* self) { + set_unsupported_operation(); + return NULL; +} + +static PyMethodDef reader_methods[] = { + { "__enter__", (PyCFunction)reader_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context") }, + { "__exit__", (PyCFunction)reader_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context") }, + { "close", (PyCFunction)reader_close, METH_NOARGS, + PyDoc_STR("Close the stream so it cannot perform any more operations") }, + { "flush", (PyCFunction)reader_flush, METH_NOARGS, PyDoc_STR("no-ops") }, + { "isatty", (PyCFunction)reader_isatty, METH_NOARGS, PyDoc_STR("Returns False") }, + { "readable", (PyCFunction)reader_readable, METH_NOARGS, + PyDoc_STR("Returns True") }, + { "read", (PyCFunction)reader_read, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("read compressed data") }, + { "read1", (PyCFunction)reader_read1, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("read compressed data") }, + { "readinto", (PyCFunction)reader_readinto, METH_VARARGS, NULL }, + { "readinto1", (PyCFunction)reader_readinto1, METH_VARARGS, NULL }, + { "readall", (PyCFunction)reader_readall, METH_NOARGS, PyDoc_STR("Not implemented") }, + { "readline", (PyCFunction)reader_readline, METH_NOARGS, PyDoc_STR("Not implemented") }, + { "readlines", (PyCFunction)reader_readlines, METH_NOARGS, PyDoc_STR("Not implemented") }, + { "seek", (PyCFunction)reader_seek, METH_VARARGS, PyDoc_STR("Seek the stream") }, + { "seekable", (PyCFunction)reader_seekable, METH_NOARGS, + PyDoc_STR("Returns True") }, + { "tell", (PyCFunction)reader_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed") }, + { "writable", (PyCFunction)reader_writable, METH_NOARGS, + PyDoc_STR("Returns False") }, + { "write", (PyCFunction)reader_write, METH_VARARGS, PyDoc_STR("unsupported operation") }, + { "writelines", (PyCFunction)reader_writelines, METH_VARARGS, PyDoc_STR("unsupported operation") }, + { NULL, NULL } +}; + +static PyMemberDef reader_members[] = { + { "closed", T_BOOL, offsetof(ZstdDecompressionReader, closed), + READONLY, "whether stream is closed" }, + { NULL } +}; + +PyTypeObject ZstdDecompressionReaderType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressionReader", /* tp_name */ + sizeof(ZstdDecompressionReader), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)reader_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + reader_iter, /* tp_iter */ + reader_iternext, /* tp_iternext */ + reader_methods, /* tp_methods */ + reader_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + + +void decompressionreader_module_init(PyObject* mod) { + /* TODO make reader a sub-class of io.RawIOBase */ + + Py_TYPE(&ZstdDecompressionReaderType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressionReaderType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/decompressionwriter.c b/contrib/python/zstandard/py2/c-ext/decompressionwriter.c new file mode 100644 index 00000000000..7df750ab06e --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/decompressionwriter.c @@ -0,0 +1,295 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdDecompressionWriter__doc, +"""A context manager used for writing decompressed output.\n" +); + +static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter* self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->writer); + + PyObject_Del(self); +} + +static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject*)self; +} + +static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) { + self->entered = 0; + + if (NULL == PyObject_CallMethod((PyObject*)self, "close", NULL)) { + return NULL; + } + + Py_RETURN_FALSE; +} + +static PyObject* ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter* self) { + return PyLong_FromSize_t(ZSTD_sizeof_DCtx(self->decompressor->dctx)); +} + +static PyObject* ZstdDecompressionWriter_write(ZstdDecompressionWriter* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + PyObject* result = NULL; + Py_buffer source; + size_t zresult = 0; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject* res; + Py_ssize_t totalWrite = 0; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:write", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:write", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + output.dst = PyMem_Malloc(self->outSize); + if (!output.dst) { + PyErr_NoMemory(); + goto finally; + } + output.size = self->outSize; + output.pos = 0; + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->decompressor->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyMem_Free(output.dst); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(res); + totalWrite += output.pos; + output.pos = 0; + } + } + + PyMem_Free(output.dst); + + if (self->writeReturnRead) { + result = PyLong_FromSize_t(input.pos); + } + else { + result = PyLong_FromSsize_t(totalWrite); + } + +finally: + PyBuffer_Release(&source); + return result; +} + +static PyObject* ZstdDecompressionWriter_close(ZstdDecompressionWriter* self) { + PyObject* result; + + if (self->closed) { + Py_RETURN_NONE; + } + + result = PyObject_CallMethod((PyObject*)self, "flush", NULL); + self->closed = 1; + + if (NULL == result) { + return NULL; + } + + /* Call close on underlying stream as well. */ + if (PyObject_HasAttrString(self->writer, "close")) { + return PyObject_CallMethod(self->writer, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject* ZstdDecompressionWriter_fileno(ZstdDecompressionWriter* self) { + if (PyObject_HasAttrString(self->writer, "fileno")) { + return PyObject_CallMethod(self->writer, "fileno", NULL); + } + else { + PyErr_SetString(PyExc_OSError, "fileno not available on underlying writer"); + return NULL; + } +} + +static PyObject* ZstdDecompressionWriter_flush(ZstdDecompressionWriter* self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (PyObject_HasAttrString(self->writer, "flush")) { + return PyObject_CallMethod(self->writer, "flush", NULL); + } + else { + Py_RETURN_NONE; + } +} + +static PyObject* ZstdDecompressionWriter_false(PyObject* self, PyObject* args) { + Py_RETURN_FALSE; +} + +static PyObject* ZstdDecompressionWriter_true(PyObject* self, PyObject* args) { + Py_RETURN_TRUE; +} + +static PyObject* ZstdDecompressionWriter_unsupported(PyObject* self, PyObject* args, PyObject* kwargs) { + PyObject* iomod; + PyObject* exc; + + iomod = PyImport_ImportModule("io"); + if (NULL == iomod) { + return NULL; + } + + exc = PyObject_GetAttrString(iomod, "UnsupportedOperation"); + if (NULL == exc) { + Py_DECREF(iomod); + return NULL; + } + + PyErr_SetNone(exc); + Py_DECREF(exc); + Py_DECREF(iomod); + + return NULL; +} + +static PyMethodDef ZstdDecompressionWriter_methods[] = { + { "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a decompression context.") }, + { "__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a decompression context.") }, + { "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS, + PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") }, + { "close", (PyCFunction)ZstdDecompressionWriter_close, METH_NOARGS, NULL }, + { "fileno", (PyCFunction)ZstdDecompressionWriter_fileno, METH_NOARGS, NULL }, + { "flush", (PyCFunction)ZstdDecompressionWriter_flush, METH_NOARGS, NULL }, + { "isatty", ZstdDecompressionWriter_false, METH_NOARGS, NULL }, + { "readable", ZstdDecompressionWriter_false, METH_NOARGS, NULL }, + { "readline", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readlines", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "seek", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "seekable", ZstdDecompressionWriter_false, METH_NOARGS, NULL }, + { "tell", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "truncate", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "writable", ZstdDecompressionWriter_true, METH_NOARGS, NULL }, + { "writelines" , (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "read", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readall", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "readinto", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL }, + { "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("Compress data") }, + { NULL, NULL } +}; + +static PyMemberDef ZstdDecompressionWriter_members[] = { + { "closed", T_BOOL, offsetof(ZstdDecompressionWriter, closed), READONLY, NULL }, + { NULL } +}; + +PyTypeObject ZstdDecompressionWriterType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressionWriter", /* tp_name */ + sizeof(ZstdDecompressionWriter),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdDecompressionWriter_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdDecompressionWriter__doc, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdDecompressionWriter_methods,/* tp_methods */ + ZstdDecompressionWriter_members,/* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressionwriter_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressionWriterType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressionWriterType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/decompressobj.c b/contrib/python/zstandard/py2/c-ext/decompressobj.c new file mode 100644 index 00000000000..2a55e61f18b --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/decompressobj.c @@ -0,0 +1,202 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(DecompressionObj__doc__, +"Perform decompression using a standard library compatible API.\n" +); + +static void DecompressionObj_dealloc(ZstdDecompressionObj* self) { + Py_XDECREF(self->decompressor); + + PyObject_Del(self); +} + +static PyObject* DecompressionObj_decompress(ZstdDecompressionObj* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + Py_buffer source; + size_t zresult; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + + output.dst = NULL; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times"); + return NULL; + } + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:decompress", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:decompress", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + /* Special case of empty input. Output will always be empty. */ + if (source.len == 0) { + result = PyBytes_FromString(""); + goto finally; + } + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + output.dst = PyMem_Malloc(self->outSize); + if (!output.dst) { + PyErr_NoMemory(); + goto except; + } + output.size = self->outSize; + output.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->decompressor->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + goto except; + } + + if (0 == zresult) { + self->finished = 1; + } + + if (output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + if (-1 == safe_pybytes_resize(&result, resultSize + output.pos)) { + Py_XDECREF(result); + goto except; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + output.dst, output.pos); + } + else { + result = PyBytes_FromStringAndSize(output.dst, output.pos); + if (!result) { + goto except; + } + } + } + + if (zresult == 0 || (input.pos == input.size && output.pos == 0)) { + break; + } + + output.pos = 0; + } + + if (!result) { + result = PyBytes_FromString(""); + } + + goto finally; + +except: + Py_CLEAR(result); + +finally: + PyMem_Free(output.dst); + PyBuffer_Release(&source); + + return result; +} + +static PyObject* DecompressionObj_flush(ZstdDecompressionObj* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "length", + NULL + }; + + PyObject* length = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:flush", kwlist, &length)) { + return NULL; + } + + Py_RETURN_NONE; +} + +static PyMethodDef DecompressionObj_methods[] = { + { "decompress", (PyCFunction)DecompressionObj_decompress, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("decompress data") }, + { "flush", (PyCFunction)DecompressionObj_flush, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("no-op") }, + { NULL, NULL } +}; + +PyTypeObject ZstdDecompressionObjType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressionObj", /* tp_name */ + sizeof(ZstdDecompressionObj), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)DecompressionObj_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + DecompressionObj__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + DecompressionObj_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressobj_module_init(PyObject* module) { + Py_TYPE(&ZstdDecompressionObjType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressionObjType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/decompressor.c b/contrib/python/zstandard/py2/c-ext/decompressor.c new file mode 100644 index 00000000000..b1ef959ec58 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/decompressor.c @@ -0,0 +1,1828 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" +#include "pool.h" + +extern PyObject* ZstdError; + +/** + * Ensure the ZSTD_DCtx on a decompressor is initiated and ready for a new operation. + */ +int ensure_dctx(ZstdDecompressor* decompressor, int loadDict) { + size_t zresult; + + ZSTD_DCtx_reset(decompressor->dctx, ZSTD_reset_session_only); + + if (decompressor->maxWindowSize) { + zresult = ZSTD_DCtx_setMaxWindowSize(decompressor->dctx, decompressor->maxWindowSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to set max window size: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + zresult = ZSTD_DCtx_setFormat(decompressor->dctx, decompressor->format); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to set decoding format: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + if (loadDict && decompressor->dict) { + if (ensure_ddict(decompressor->dict)) { + return 1; + } + + zresult = ZSTD_DCtx_refDDict(decompressor->dctx, decompressor->dict->ddict); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to reference prepared dictionary: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + return 0; +} + +PyDoc_STRVAR(Decompressor__doc__, +"ZstdDecompressor(dict_data=None)\n" +"\n" +"Create an object used to perform Zstandard decompression.\n" +"\n" +"An instance can perform multiple decompression operations." +); + +static int Decompressor_init(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "dict_data", + "max_window_size", + "format", + NULL + }; + + ZstdCompressionDict* dict = NULL; + Py_ssize_t maxWindowSize = 0; + ZSTD_format_e format = ZSTD_f_zstd1; + + self->dctx = NULL; + self->dict = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!nI:ZstdDecompressor", kwlist, + &ZstdCompressionDictType, &dict, &maxWindowSize, &format)) { + return -1; + } + + self->dctx = ZSTD_createDCtx(); + if (!self->dctx) { + PyErr_NoMemory(); + goto except; + } + + self->maxWindowSize = maxWindowSize; + self->format = format; + + if (dict) { + self->dict = dict; + Py_INCREF(dict); + } + + if (ensure_dctx(self, 1)) { + goto except; + } + + return 0; + +except: + Py_CLEAR(self->dict); + + if (self->dctx) { + ZSTD_freeDCtx(self->dctx); + self->dctx = NULL; + } + + return -1; +} + +static void Decompressor_dealloc(ZstdDecompressor* self) { + Py_CLEAR(self->dict); + + if (self->dctx) { + ZSTD_freeDCtx(self->dctx); + self->dctx = NULL; + } + + PyObject_Del(self); +} + +PyDoc_STRVAR(Decompressor_memory_size__doc__, +"memory_size() -- Size of decompression context, in bytes\n" +); + +static PyObject* Decompressor_memory_size(ZstdDecompressor* self) { + if (self->dctx) { + return PyLong_FromSize_t(ZSTD_sizeof_DCtx(self->dctx)); + } + else { + PyErr_SetString(ZstdError, "no decompressor context found; this should never happen"); + return NULL; + } +} + +PyDoc_STRVAR(Decompressor_copy_stream__doc__, + "copy_stream(ifh, ofh[, read_size=default, write_size=default]) -- decompress data between streams\n" + "\n" + "Compressed data will be read from ``ifh``, decompressed, and written to\n" + "``ofh``. ``ifh`` must have a ``read(size)`` method. ``ofh`` must have a\n" + "``write(data)`` method.\n" + "\n" + "The optional ``read_size`` and ``write_size`` arguments control the chunk\n" + "size of data that is ``read()`` and ``write()`` between streams. They default\n" + "to the default input and output sizes of zstd decompressor streams.\n" +); + +static PyObject* Decompressor_copy_stream(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "ifh", + "ofh", + "read_size", + "write_size", + NULL + }; + + PyObject* source; + PyObject* dest; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char* readBuffer; + Py_ssize_t readSize; + PyObject* readResult = NULL; + PyObject* res = NULL; + size_t zresult = 0; + PyObject* writeResult; + PyObject* totalReadPy; + PyObject* totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk:copy_stream", kwlist, + &source, &dest, &inSize, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); + return NULL; + } + + /* Prevent free on uninitialized memory in finally. */ + output.dst = NULL; + + if (ensure_dctx(self, 1)) { + res = NULL; + goto finally; + } + + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + /* Read source stream until EOF */ + while (1) { + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to decompressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + + Py_CLEAR(readResult); + } + + /* Source stream is exhausted. Finish up. */ + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DECREF(totalReadPy); + Py_DECREF(totalWritePy); + +finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + Py_XDECREF(readResult); + + return res; +} + +PyDoc_STRVAR(Decompressor_decompress__doc__, +"decompress(data[, max_output_size=None]) -- Decompress data in its entirety\n" +"\n" +"This method will decompress the entirety of the argument and return the\n" +"result.\n" +"\n" +"The input bytes are expected to contain a full Zstandard frame (something\n" +"compressed with ``ZstdCompressor.compress()`` or similar). If the input does\n" +"not contain a full frame, an exception will be raised.\n" +"\n" +"If the frame header of the compressed data does not contain the content size\n" +"``max_output_size`` must be specified or ``ZstdError`` will be raised. An\n" +"allocation of size ``max_output_size`` will be performed and an attempt will\n" +"be made to perform decompression into that buffer. If the buffer is too\n" +"small or cannot be allocated, ``ZstdError`` will be raised. The buffer will\n" +"be resized if it is too large.\n" +"\n" +"Uncompressed data could be much larger than compressed data. As a result,\n" +"calling this function could result in a very large memory allocation being\n" +"performed to hold the uncompressed data. Therefore it is **highly**\n" +"recommended to use a streaming decompression method instead of this one.\n" +); + +PyObject* Decompressor_decompress(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + "max_output_size", + NULL + }; + + Py_buffer source; + Py_ssize_t maxOutputSize = 0; + unsigned long long decompressedSize; + size_t destCapacity; + PyObject* result = NULL; + size_t zresult; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|n:decompress", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*|n:decompress", +#endif + kwlist, &source, &maxOutputSize)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + if (ensure_dctx(self, 1)) { + goto finally; + } + + decompressedSize = ZSTD_getFrameContentSize(source.buf, source.len); + + if (ZSTD_CONTENTSIZE_ERROR == decompressedSize) { + PyErr_SetString(ZstdError, "error determining content size from frame header"); + goto finally; + } + /* Special case of empty frame. */ + else if (0 == decompressedSize) { + result = PyBytes_FromStringAndSize("", 0); + goto finally; + } + /* Missing content size in frame header. */ + if (ZSTD_CONTENTSIZE_UNKNOWN == decompressedSize) { + if (0 == maxOutputSize) { + PyErr_SetString(ZstdError, "could not determine content size in frame header"); + goto finally; + } + + result = PyBytes_FromStringAndSize(NULL, maxOutputSize); + destCapacity = maxOutputSize; + decompressedSize = 0; + } + /* Size is recorded in frame header. */ + else { + assert(SIZE_MAX >= PY_SSIZE_T_MAX); + if (decompressedSize > PY_SSIZE_T_MAX) { + PyErr_SetString(ZstdError, "frame is too large to decompress on this platform"); + goto finally; + } + + result = PyBytes_FromStringAndSize(NULL, (Py_ssize_t)decompressedSize); + destCapacity = (size_t)decompressedSize; + } + + if (!result) { + goto finally; + } + + outBuffer.dst = PyBytes_AsString(result); + outBuffer.size = destCapacity; + outBuffer.pos = 0; + + inBuffer.src = source.buf; + inBuffer.size = source.len; + inBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult)); + Py_CLEAR(result); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, "decompression error: did not decompress full frame"); + Py_CLEAR(result); + goto finally; + } + else if (decompressedSize && outBuffer.pos != decompressedSize) { + PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu", + zresult, decompressedSize); + Py_CLEAR(result); + goto finally; + } + else if (outBuffer.pos < destCapacity) { + if (safe_pybytes_resize(&result, outBuffer.pos)) { + Py_CLEAR(result); + goto finally; + } + } + +finally: + PyBuffer_Release(&source); + return result; +} + +PyDoc_STRVAR(Decompressor_decompressobj__doc__, +"decompressobj([write_size=default])\n" +"\n" +"Incrementally feed data into a decompressor.\n" +"\n" +"The returned object exposes a ``decompress(data)`` method. This makes it\n" +"compatible with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor`` so that\n" +"callers can swap in the zstd decompressor while using the same API.\n" +); + +static ZstdDecompressionObj* Decompressor_decompressobj(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "write_size", + NULL + }; + + ZstdDecompressionObj* result = NULL; + size_t outSize = ZSTD_DStreamOutSize(); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|k:decompressobj", kwlist, &outSize)) { + return NULL; + } + + if (!outSize) { + PyErr_SetString(PyExc_ValueError, "write_size must be positive"); + return NULL; + } + + result = (ZstdDecompressionObj*)PyObject_CallObject((PyObject*)&ZstdDecompressionObjType, NULL); + if (!result) { + return NULL; + } + + if (ensure_dctx(self, 1)) { + Py_DECREF(result); + return NULL; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + result->outSize = outSize; + + return result; +} + +PyDoc_STRVAR(Decompressor_read_to_iter__doc__, +"read_to_iter(reader[, read_size=default, write_size=default, skip_bytes=0])\n" +"Read compressed data and return an iterator\n" +"\n" +"Returns an iterator of decompressed data chunks produced from reading from\n" +"the ``reader``.\n" +"\n" +"Compressed data will be obtained from ``reader`` by calling the\n" +"``read(size)`` method of it. The source data will be streamed into a\n" +"decompressor. As decompressed data is available, it will be exposed to the\n" +"returned iterator.\n" +"\n" +"Data is ``read()`` in chunks of size ``read_size`` and exposed to the\n" +"iterator in chunks of size ``write_size``. The default values are the input\n" +"and output sizes for a zstd streaming decompressor.\n" +"\n" +"There is also support for skipping the first ``skip_bytes`` of data from\n" +"the source.\n" +); + +static ZstdDecompressorIterator* Decompressor_read_to_iter(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "reader", + "read_size", + "write_size", + "skip_bytes", + NULL + }; + + PyObject* reader; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZstdDecompressorIterator* result; + size_t skipBytes = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk:read_to_iter", kwlist, + &reader, &inSize, &outSize, &skipBytes)) { + return NULL; + } + + if (skipBytes >= inSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes must be smaller than read_size"); + return NULL; + } + + result = (ZstdDecompressorIterator*)PyObject_CallObject((PyObject*)&ZstdDecompressorIteratorType, NULL); + if (!result) { + return NULL; + } + + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + /* Object claims it is a buffer. Try to get a handle to it. */ + if (0 != PyObject_GetBuffer(reader, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms to buffer protocol"); + goto except; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->inSize = inSize; + result->outSize = outSize; + result->skipBytes = skipBytes; + + if (ensure_dctx(self, 1)) { + goto except; + } + + result->input.src = PyMem_Malloc(inSize); + if (!result->input.src) { + PyErr_NoMemory(); + goto except; + } + + goto finally; + +except: + Py_CLEAR(result); + +finally: + + return result; +} + +PyDoc_STRVAR(Decompressor_stream_reader__doc__, +"stream_reader(source, [read_size=default, [read_across_frames=False]])\n" +"\n" +"Obtain an object that behaves like an I/O stream that can be used for\n" +"reading decompressed output from an object.\n" +"\n" +"The source object can be any object with a ``read(size)`` method or that\n" +"conforms to the buffer protocol.\n" +"\n" +"``read_across_frames`` controls the behavior of ``read()`` when the end\n" +"of a zstd frame is reached. When ``True``, ``read()`` can potentially\n" +"return data belonging to multiple zstd frames. When ``False``, ``read()``\n" +"will return when the end of a frame is reached.\n" +); + +static ZstdDecompressionReader* Decompressor_stream_reader(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "source", + "read_size", + "read_across_frames", + NULL + }; + + PyObject* source; + size_t readSize = ZSTD_DStreamInSize(); + PyObject* readAcrossFrames = NULL; + ZstdDecompressionReader* result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kO:stream_reader", kwlist, + &source, &readSize, &readAcrossFrames)) { + return NULL; + } + + if (ensure_dctx(self, 1)) { + return NULL; + } + + result = (ZstdDecompressionReader*)PyObject_CallObject((PyObject*)&ZstdDecompressionReaderType, NULL); + if (NULL == result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + if (PyObject_HasAttrString(source, "read")) { + result->reader = source; + Py_INCREF(source); + result->readSize = readSize; + } + else if (1 == PyObject_CheckBuffer(source)) { + if (0 != PyObject_GetBuffer(source, &result->buffer, PyBUF_CONTIG_RO)) { + Py_CLEAR(result); + return NULL; + } + } + else { + PyErr_SetString(PyExc_TypeError, + "must pass an object with a read() method or that conforms to the buffer protocol"); + Py_CLEAR(result); + return NULL; + } + + result->decompressor = self; + Py_INCREF(self); + result->readAcrossFrames = readAcrossFrames ? PyObject_IsTrue(readAcrossFrames) : 0; + + return result; +} + +PyDoc_STRVAR(Decompressor_stream_writer__doc__, +"Create a context manager to write decompressed data to an object.\n" +"\n" +"The passed object must have a ``write()`` method.\n" +"\n" +"The caller feeds intput data to the object by calling ``write(data)``.\n" +"Decompressed data is written to the argument given as it is decompressed.\n" +"\n" +"An optional ``write_size`` argument defines the size of chunks to\n" +"``write()`` to the writer. It defaults to the default output size for a zstd\n" +"streaming decompressor.\n" +); + +static ZstdDecompressionWriter* Decompressor_stream_writer(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "writer", + "write_size", + "write_return_read", + NULL + }; + + PyObject* writer; + size_t outSize = ZSTD_DStreamOutSize(); + PyObject* writeReturnRead = NULL; + ZstdDecompressionWriter* result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kO:stream_writer", kwlist, + &writer, &outSize, &writeReturnRead)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); + return NULL; + } + + if (ensure_dctx(self, 1)) { + return NULL; + } + + result = (ZstdDecompressionWriter*)PyObject_CallObject((PyObject*)&ZstdDecompressionWriterType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->outSize = outSize; + result->writeReturnRead = writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 0; + + return result; +} + +PyDoc_STRVAR(Decompressor_decompress_content_dict_chain__doc__, +"Decompress a series of chunks using the content dictionary chaining technique\n" +); + +static PyObject* Decompressor_decompress_content_dict_chain(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "frames", + NULL + }; + + PyObject* chunks; + Py_ssize_t chunksLen; + Py_ssize_t chunkIndex; + char parity = 0; + PyObject* chunk; + char* chunkData; + Py_ssize_t chunkSize; + size_t zresult; + ZSTD_frameHeader frameHeader; + void* buffer1 = NULL; + size_t buffer1Size = 0; + size_t buffer1ContentSize = 0; + void* buffer2 = NULL; + size_t buffer2Size = 0; + size_t buffer2ContentSize = 0; + void* destBuffer = NULL; + PyObject* result = NULL; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O!:decompress_content_dict_chain", + kwlist, &PyList_Type, &chunks)) { + return NULL; + } + + chunksLen = PyList_Size(chunks); + if (!chunksLen) { + PyErr_SetString(PyExc_ValueError, "empty input chain"); + return NULL; + } + + /* The first chunk should not be using a dictionary. We handle it specially. */ + chunk = PyList_GetItem(chunks, 0); + if (!PyBytes_Check(chunk)) { + PyErr_SetString(PyExc_ValueError, "chunk 0 must be bytes"); + return NULL; + } + + /* We require that all chunks be zstd frames and that they have content size set. */ + PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); + zresult = ZSTD_getFrameHeader(&frameHeader, (void*)chunkData, chunkSize); + if (ZSTD_isError(zresult)) { + PyErr_SetString(PyExc_ValueError, "chunk 0 is not a valid zstd frame"); + return NULL; + } + else if (zresult) { + PyErr_SetString(PyExc_ValueError, "chunk 0 is too small to contain a zstd frame"); + return NULL; + } + + if (ZSTD_CONTENTSIZE_UNKNOWN == frameHeader.frameContentSize) { + PyErr_SetString(PyExc_ValueError, "chunk 0 missing content size in frame"); + return NULL; + } + + assert(ZSTD_CONTENTSIZE_ERROR != frameHeader.frameContentSize); + + /* We check against PY_SSIZE_T_MAX here because we ultimately cast the + * result to a Python object and it's length can be no greater than + * Py_ssize_t. In theory, we could have an intermediate frame that is + * larger. But a) why would this API be used for frames that large b) + * it isn't worth the complexity to support. */ + assert(SIZE_MAX >= PY_SSIZE_T_MAX); + if (frameHeader.frameContentSize > PY_SSIZE_T_MAX) { + PyErr_SetString(PyExc_ValueError, + "chunk 0 is too large to decompress on this platform"); + return NULL; + } + + if (ensure_dctx(self, 0)) { + goto finally; + } + + buffer1Size = (size_t)frameHeader.frameContentSize; + buffer1 = PyMem_Malloc(buffer1Size); + if (!buffer1) { + goto finally; + } + + outBuffer.dst = buffer1; + outBuffer.size = buffer1Size; + outBuffer.pos = 0; + + inBuffer.src = chunkData; + inBuffer.size = chunkSize; + inBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk 0: %s", ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, "chunk 0 did not decompress full frame"); + goto finally; + } + + buffer1ContentSize = outBuffer.pos; + + /* Special case of a simple chain. */ + if (1 == chunksLen) { + result = PyBytes_FromStringAndSize(buffer1, buffer1Size); + goto finally; + } + + /* This should ideally look at next chunk. But this is slightly simpler. */ + buffer2Size = (size_t)frameHeader.frameContentSize; + buffer2 = PyMem_Malloc(buffer2Size); + if (!buffer2) { + goto finally; + } + + /* For each subsequent chunk, use the previous fulltext as a content dictionary. + Our strategy is to have 2 buffers. One holds the previous fulltext (to be + used as a content dictionary) and the other holds the new fulltext. The + buffers grow when needed but never decrease in size. This limits the + memory allocator overhead. + */ + for (chunkIndex = 1; chunkIndex < chunksLen; chunkIndex++) { + chunk = PyList_GetItem(chunks, chunkIndex); + if (!PyBytes_Check(chunk)) { + PyErr_Format(PyExc_ValueError, "chunk %zd must be bytes", chunkIndex); + goto finally; + } + + PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); + zresult = ZSTD_getFrameHeader(&frameHeader, (void*)chunkData, chunkSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(PyExc_ValueError, "chunk %zd is not a valid zstd frame", chunkIndex); + goto finally; + } + else if (zresult) { + PyErr_Format(PyExc_ValueError, "chunk %zd is too small to contain a zstd frame", chunkIndex); + goto finally; + } + + if (ZSTD_CONTENTSIZE_UNKNOWN == frameHeader.frameContentSize) { + PyErr_Format(PyExc_ValueError, "chunk %zd missing content size in frame", chunkIndex); + goto finally; + } + + assert(ZSTD_CONTENTSIZE_ERROR != frameHeader.frameContentSize); + + if (frameHeader.frameContentSize > PY_SSIZE_T_MAX) { + PyErr_Format(PyExc_ValueError, + "chunk %zd is too large to decompress on this platform", chunkIndex); + goto finally; + } + + inBuffer.src = chunkData; + inBuffer.size = chunkSize; + inBuffer.pos = 0; + + parity = chunkIndex % 2; + + /* This could definitely be abstracted to reduce code duplication. */ + if (parity) { + /* Resize destination buffer to hold larger content. */ + if (buffer2Size < frameHeader.frameContentSize) { + buffer2Size = (size_t)frameHeader.frameContentSize; + destBuffer = PyMem_Realloc(buffer2, buffer2Size); + if (!destBuffer) { + goto finally; + } + buffer2 = destBuffer; + } + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_DCtx_refPrefix_advanced(self->dctx, + buffer1, buffer1ContentSize, ZSTD_dct_rawContent); + Py_END_ALLOW_THREADS + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "failed to load prefix dictionary at chunk %zd", chunkIndex); + goto finally; + } + + outBuffer.dst = buffer2; + outBuffer.size = buffer2Size; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", + chunkIndex, ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, "chunk %zd did not decompress full frame", + chunkIndex); + goto finally; + } + + buffer2ContentSize = outBuffer.pos; + } + else { + if (buffer1Size < frameHeader.frameContentSize) { + buffer1Size = (size_t)frameHeader.frameContentSize; + destBuffer = PyMem_Realloc(buffer1, buffer1Size); + if (!destBuffer) { + goto finally; + } + buffer1 = destBuffer; + } + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_DCtx_refPrefix_advanced(self->dctx, + buffer2, buffer2ContentSize, ZSTD_dct_rawContent); + Py_END_ALLOW_THREADS + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "failed to load prefix dictionary at chunk %zd", chunkIndex); + goto finally; + } + + outBuffer.dst = buffer1; + outBuffer.size = buffer1Size; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", + chunkIndex, ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, "chunk %zd did not decompress full frame", + chunkIndex); + goto finally; + } + + buffer1ContentSize = outBuffer.pos; + } + } + + result = PyBytes_FromStringAndSize(parity ? buffer2 : buffer1, + parity ? buffer2ContentSize : buffer1ContentSize); + +finally: + if (buffer2) { + PyMem_Free(buffer2); + } + if (buffer1) { + PyMem_Free(buffer1); + } + + return result; +} + +typedef struct { + void* sourceData; + size_t sourceSize; + size_t destSize; +} FramePointer; + +typedef struct { + FramePointer* frames; + Py_ssize_t framesSize; + unsigned long long compressedSize; +} FrameSources; + +typedef struct { + void* dest; + Py_ssize_t destSize; + BufferSegment* segments; + Py_ssize_t segmentsSize; +} DestBuffer; + +typedef enum { + WorkerError_none = 0, + WorkerError_zstd = 1, + WorkerError_memory = 2, + WorkerError_sizeMismatch = 3, + WorkerError_unknownSize = 4, +} WorkerError; + +typedef struct { + /* Source records and length */ + FramePointer* framePointers; + /* Which records to process. */ + Py_ssize_t startOffset; + Py_ssize_t endOffset; + unsigned long long totalSourceSize; + + /* Compression state and settings. */ + ZSTD_DCtx* dctx; + int requireOutputSizes; + + /* Output storage. */ + DestBuffer* destBuffers; + Py_ssize_t destCount; + + /* Item that error occurred on. */ + Py_ssize_t errorOffset; + /* If an error occurred. */ + WorkerError error; + /* result from zstd decompression operation */ + size_t zresult; +} WorkerState; + +static void decompress_worker(WorkerState* state) { + size_t allocationSize; + DestBuffer* destBuffer; + Py_ssize_t frameIndex; + Py_ssize_t localOffset = 0; + Py_ssize_t currentBufferStartIndex = state->startOffset; + Py_ssize_t remainingItems = state->endOffset - state->startOffset + 1; + void* tmpBuf; + Py_ssize_t destOffset = 0; + FramePointer* framePointers = state->framePointers; + size_t zresult; + unsigned long long totalOutputSize = 0; + + assert(NULL == state->destBuffers); + assert(0 == state->destCount); + assert(state->endOffset - state->startOffset >= 0); + + /* We could get here due to the way work is allocated. Ideally we wouldn't + get here. But that would require a bit of a refactor in the caller. */ + if (state->totalSourceSize > SIZE_MAX) { + state->error = WorkerError_memory; + state->errorOffset = 0; + return; + } + + /* + * We need to allocate a buffer to hold decompressed data. How we do this + * depends on what we know about the output. The following scenarios are + * possible: + * + * 1. All structs defining frames declare the output size. + * 2. The decompressed size is embedded within the zstd frame. + * 3. The decompressed size is not stored anywhere. + * + * For now, we only support #1 and #2. + */ + + /* Resolve ouput segments. */ + for (frameIndex = state->startOffset; frameIndex <= state->endOffset; frameIndex++) { + FramePointer* fp = &framePointers[frameIndex]; + unsigned long long decompressedSize; + + if (0 == fp->destSize) { + decompressedSize = ZSTD_getFrameContentSize(fp->sourceData, fp->sourceSize); + + if (ZSTD_CONTENTSIZE_ERROR == decompressedSize) { + state->error = WorkerError_unknownSize; + state->errorOffset = frameIndex; + return; + } + else if (ZSTD_CONTENTSIZE_UNKNOWN == decompressedSize) { + if (state->requireOutputSizes) { + state->error = WorkerError_unknownSize; + state->errorOffset = frameIndex; + return; + } + + /* This will fail the assert for .destSize > 0 below. */ + decompressedSize = 0; + } + + if (decompressedSize > SIZE_MAX) { + state->error = WorkerError_memory; + state->errorOffset = frameIndex; + return; + } + + fp->destSize = (size_t)decompressedSize; + } + + totalOutputSize += fp->destSize; + } + + state->destBuffers = calloc(1, sizeof(DestBuffer)); + if (NULL == state->destBuffers) { + state->error = WorkerError_memory; + return; + } + + state->destCount = 1; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + assert(framePointers[state->startOffset].destSize > 0); /* For now. */ + + allocationSize = roundpow2((size_t)state->totalSourceSize); + + if (framePointers[state->startOffset].destSize > allocationSize) { + allocationSize = roundpow2(framePointers[state->startOffset].destSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = WorkerError_memory; + return; + } + + destBuffer->destSize = allocationSize; + + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + /* Caller will free state->dest as part of cleanup. */ + state->error = WorkerError_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + + for (frameIndex = state->startOffset; frameIndex <= state->endOffset; frameIndex++) { + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + const void* source = framePointers[frameIndex].sourceData; + const size_t sourceSize = framePointers[frameIndex].sourceSize; + void* dest; + const size_t decompressedSize = framePointers[frameIndex].destSize; + size_t destAvailable = destBuffer->destSize - destOffset; + + assert(decompressedSize > 0); /* For now. */ + + /* + * Not enough space in current buffer. Finish current before and allocate and + * switch to a new one. + */ + if (decompressedSize > destAvailable) { + /* + * Shrinking the destination buffer is optional. But it should be cheap, + * so we just do it. + */ + if (destAvailable) { + tmpBuf = realloc(destBuffer->dest, destOffset); + if (NULL == tmpBuf) { + state->error = WorkerError_memory; + return; + } + + destBuffer->dest = tmpBuf; + destBuffer->destSize = destOffset; + } + + /* Truncate segments buffer. */ + tmpBuf = realloc(destBuffer->segments, + (frameIndex - currentBufferStartIndex) * sizeof(BufferSegment)); + if (NULL == tmpBuf) { + state->error = WorkerError_memory; + return; + } + + destBuffer->segments = tmpBuf; + destBuffer->segmentsSize = frameIndex - currentBufferStartIndex; + + /* Grow space for new DestBuffer. */ + tmpBuf = realloc(state->destBuffers, (state->destCount + 1) * sizeof(DestBuffer)); + if (NULL == tmpBuf) { + state->error = WorkerError_memory; + return; + } + + state->destBuffers = tmpBuf; + state->destCount++; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* Don't take any chances will non-NULL pointers. */ + memset(destBuffer, 0, sizeof(DestBuffer)); + + allocationSize = roundpow2((size_t)state->totalSourceSize); + + if (decompressedSize > allocationSize) { + allocationSize = roundpow2(decompressedSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = WorkerError_memory; + return; + } + + destBuffer->destSize = allocationSize; + destAvailable = allocationSize; + destOffset = 0; + localOffset = 0; + + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = WorkerError_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + currentBufferStartIndex = frameIndex; + } + + dest = (char*)destBuffer->dest + destOffset; + + outBuffer.dst = dest; + outBuffer.size = decompressedSize; + outBuffer.pos = 0; + + inBuffer.src = source; + inBuffer.size = sourceSize; + inBuffer.pos = 0; + + zresult = ZSTD_decompressStream(state->dctx, &outBuffer, &inBuffer); + if (ZSTD_isError(zresult)) { + state->error = WorkerError_zstd; + state->zresult = zresult; + state->errorOffset = frameIndex; + return; + } + else if (zresult || outBuffer.pos != decompressedSize) { + state->error = WorkerError_sizeMismatch; + state->zresult = outBuffer.pos; + state->errorOffset = frameIndex; + return; + } + + destBuffer->segments[localOffset].offset = destOffset; + destBuffer->segments[localOffset].length = outBuffer.pos; + destOffset += outBuffer.pos; + localOffset++; + remainingItems--; + } + + if (destBuffer->destSize > destOffset) { + tmpBuf = realloc(destBuffer->dest, destOffset); + if (NULL == tmpBuf) { + state->error = WorkerError_memory; + return; + } + + destBuffer->dest = tmpBuf; + destBuffer->destSize = destOffset; + } +} + +ZstdBufferWithSegmentsCollection* decompress_from_framesources(ZstdDecompressor* decompressor, FrameSources* frames, + Py_ssize_t threadCount) { + Py_ssize_t i = 0; + int errored = 0; + Py_ssize_t segmentsCount; + ZstdBufferWithSegments* bws = NULL; + PyObject* resultArg = NULL; + Py_ssize_t resultIndex; + ZstdBufferWithSegmentsCollection* result = NULL; + FramePointer* framePointers = frames->frames; + unsigned long long workerBytes = 0; + Py_ssize_t currentThread = 0; + Py_ssize_t workerStartOffset = 0; + POOL_ctx* pool = NULL; + WorkerState* workerStates = NULL; + unsigned long long bytesPerWorker; + + /* Caller should normalize 0 and negative values to 1 or larger. */ + assert(threadCount >= 1); + + /* More threads than inputs makes no sense under any conditions. */ + threadCount = frames->framesSize < threadCount ? frames->framesSize + : threadCount; + + /* TODO lower thread count if input size is too small and threads would just + add overhead. */ + + if (decompressor->dict) { + if (ensure_ddict(decompressor->dict)) { + return NULL; + } + } + + /* If threadCount==1, we don't start a thread pool. But we do leverage the + same API for dispatching work. */ + workerStates = PyMem_Malloc(threadCount * sizeof(WorkerState)); + if (NULL == workerStates) { + PyErr_NoMemory(); + goto finally; + } + + memset(workerStates, 0, threadCount * sizeof(WorkerState)); + + if (threadCount > 1) { + pool = POOL_create(threadCount, 1); + if (NULL == pool) { + PyErr_SetString(ZstdError, "could not initialize zstd thread pool"); + goto finally; + } + } + + bytesPerWorker = frames->compressedSize / threadCount; + + if (bytesPerWorker > SIZE_MAX) { + PyErr_SetString(ZstdError, "too much data per worker for this platform"); + goto finally; + } + + for (i = 0; i < threadCount; i++) { + size_t zresult; + + workerStates[i].dctx = ZSTD_createDCtx(); + if (NULL == workerStates[i].dctx) { + PyErr_NoMemory(); + goto finally; + } + + ZSTD_copyDCtx(workerStates[i].dctx, decompressor->dctx); + + if (decompressor->dict) { + zresult = ZSTD_DCtx_refDDict(workerStates[i].dctx, decompressor->dict->ddict); + if (zresult) { + PyErr_Format(ZstdError, "unable to reference prepared dictionary: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + } + + workerStates[i].framePointers = framePointers; + workerStates[i].requireOutputSizes = 1; + } + + Py_BEGIN_ALLOW_THREADS + /* There are many ways to split work among workers. + + For now, we take a simple approach of splitting work so each worker + gets roughly the same number of input bytes. This will result in more + starvation than running N>threadCount jobs. But it avoids complications + around state tracking, which could involve extra locking. + */ + for (i = 0; i < frames->framesSize; i++) { + workerBytes += frames->frames[i].sourceSize; + + /* + * The last worker/thread needs to handle all remaining work. Don't + * trigger it prematurely. Defer to the block outside of the loop. + * (But still process this loop so workerBytes is correct. + */ + if (currentThread == threadCount - 1) { + continue; + } + + if (workerBytes >= bytesPerWorker) { + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = i; + workerStates[currentThread].totalSourceSize = workerBytes; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)decompress_worker, &workerStates[currentThread]); + } + else { + decompress_worker(&workerStates[currentThread]); + } + currentThread++; + workerStartOffset = i + 1; + workerBytes = 0; + } + } + + if (workerBytes) { + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = frames->framesSize - 1; + workerStates[currentThread].totalSourceSize = workerBytes; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)decompress_worker, &workerStates[currentThread]); + } + else { + decompress_worker(&workerStates[currentThread]); + } + } + + if (threadCount > 1) { + POOL_free(pool); + pool = NULL; + } + Py_END_ALLOW_THREADS + + for (i = 0; i < threadCount; i++) { + switch (workerStates[i].error) { + case WorkerError_none: + break; + + case WorkerError_zstd: + PyErr_Format(ZstdError, "error decompressing item %zd: %s", + workerStates[i].errorOffset, ZSTD_getErrorName(workerStates[i].zresult)); + errored = 1; + break; + + case WorkerError_memory: + PyErr_NoMemory(); + errored = 1; + break; + + case WorkerError_sizeMismatch: + PyErr_Format(ZstdError, "error decompressing item %zd: decompressed %zu bytes; expected %zu", + workerStates[i].errorOffset, workerStates[i].zresult, + framePointers[workerStates[i].errorOffset].destSize); + errored = 1; + break; + + case WorkerError_unknownSize: + PyErr_Format(PyExc_ValueError, "could not determine decompressed size of item %zd", + workerStates[i].errorOffset); + errored = 1; + break; + + default: + PyErr_Format(ZstdError, "unhandled error type: %d; this is a bug", + workerStates[i].error); + errored = 1; + break; + } + + if (errored) { + break; + } + } + + if (errored) { + goto finally; + } + + segmentsCount = 0; + for (i = 0; i < threadCount; i++) { + segmentsCount += workerStates[i].destCount; + } + + resultArg = PyTuple_New(segmentsCount); + if (NULL == resultArg) { + goto finally; + } + + resultIndex = 0; + + for (i = 0; i < threadCount; i++) { + Py_ssize_t bufferIndex; + WorkerState* state = &workerStates[i]; + + for (bufferIndex = 0; bufferIndex < state->destCount; bufferIndex++) { + DestBuffer* destBuffer = &state->destBuffers[bufferIndex]; + + bws = BufferWithSegments_FromMemory(destBuffer->dest, destBuffer->destSize, + destBuffer->segments, destBuffer->segmentsSize); + if (NULL == bws) { + goto finally; + } + + /* + * Memory for buffer and segments was allocated using malloc() in worker + * and the memory is transferred to the BufferWithSegments instance. So + * tell instance to use free() and NULL the reference in the state struct + * so it isn't freed below. + */ + bws->useFree = 1; + destBuffer->dest = NULL; + destBuffer->segments = NULL; + + PyTuple_SET_ITEM(resultArg, resultIndex++, (PyObject*)bws); + } + } + + result = (ZstdBufferWithSegmentsCollection*)PyObject_CallObject( + (PyObject*)&ZstdBufferWithSegmentsCollectionType, resultArg); + +finally: + Py_CLEAR(resultArg); + + if (workerStates) { + for (i = 0; i < threadCount; i++) { + Py_ssize_t bufferIndex; + WorkerState* state = &workerStates[i]; + + if (state->dctx) { + ZSTD_freeDCtx(state->dctx); + } + + for (bufferIndex = 0; bufferIndex < state->destCount; bufferIndex++) { + if (state->destBuffers) { + /* + * Will be NULL if memory transfered to a BufferWithSegments. + * Otherwise it is left over after an error occurred. + */ + free(state->destBuffers[bufferIndex].dest); + free(state->destBuffers[bufferIndex].segments); + } + } + + free(state->destBuffers); + } + + PyMem_Free(workerStates); + } + + POOL_free(pool); + + return result; +} + +PyDoc_STRVAR(Decompressor_multi_decompress_to_buffer__doc__, +"Decompress multiple frames to output buffers\n" +"\n" +"Receives a ``BufferWithSegments``, a ``BufferWithSegmentsCollection`` or a\n" +"list of bytes-like objects. Each item in the passed collection should be a\n" +"compressed zstd frame.\n" +"\n" +"Unless ``decompressed_sizes`` is specified, the content size *must* be\n" +"written into the zstd frame header. If ``decompressed_sizes`` is specified,\n" +"it is an object conforming to the buffer protocol that represents an array\n" +"of 64-bit unsigned integers in the machine's native format. Specifying\n" +"``decompressed_sizes`` avoids a pre-scan of each frame to determine its\n" +"output size.\n" +"\n" +"Returns a ``BufferWithSegmentsCollection`` containing the decompressed\n" +"data. All decompressed data is allocated in a single memory buffer. The\n" +"``BufferWithSegments`` instance tracks which objects are at which offsets\n" +"and their respective lengths.\n" +"\n" +"The ``threads`` argument controls how many threads to use for operations.\n" +"Negative values will use the same number of threads as logical CPUs on the\n" +"machine.\n" +); + +static ZstdBufferWithSegmentsCollection* Decompressor_multi_decompress_to_buffer(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "frames", + "decompressed_sizes", + "threads", + NULL + }; + + PyObject* frames; + Py_buffer frameSizes; + int threads = 0; + Py_ssize_t frameCount; + Py_buffer* frameBuffers = NULL; + FramePointer* framePointers = NULL; + unsigned long long* frameSizesP = NULL; + unsigned long long totalInputSize = 0; + FrameSources frameSources; + ZstdBufferWithSegmentsCollection* result = NULL; + Py_ssize_t i; + + memset(&frameSizes, 0, sizeof(frameSizes)); + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|y*i:multi_decompress_to_buffer", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|s*i:multi_decompress_to_buffer", +#endif + kwlist, &frames, &frameSizes, &threads)) { + return NULL; + } + + if (frameSizes.buf) { + if (!PyBuffer_IsContiguous(&frameSizes, 'C') || frameSizes.ndim > 1) { + PyErr_SetString(PyExc_ValueError, "decompressed_sizes buffer should be contiguous and have a single dimension"); + goto finally; + } + + frameSizesP = (unsigned long long*)frameSizes.buf; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (threads < 2) { + threads = 1; + } + + if (PyObject_TypeCheck(frames, &ZstdBufferWithSegmentsType)) { + ZstdBufferWithSegments* buffer = (ZstdBufferWithSegments*)frames; + frameCount = buffer->segmentCount; + + if (frameSizes.buf && frameSizes.len != frameCount * (Py_ssize_t)sizeof(unsigned long long)) { + PyErr_Format(PyExc_ValueError, "decompressed_sizes size mismatch; expected %zd, got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (!framePointers) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < frameCount; i++) { + void* sourceData; + unsigned long long sourceSize; + unsigned long long decompressedSize = 0; + + if (buffer->segments[i].offset + buffer->segments[i].length > buffer->dataSize) { + PyErr_Format(PyExc_ValueError, "item %zd has offset outside memory area", i); + goto finally; + } + + sourceData = (char*)buffer->data + buffer->segments[i].offset; + sourceSize = buffer->segments[i].length; + totalInputSize += sourceSize; + + if (frameSizesP) { + decompressedSize = frameSizesP[i]; + } + + if (sourceSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "item %zd is too large for this platform", i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd is too large for this platform", i); + goto finally; + } + + framePointers[i].sourceData = sourceData; + framePointers[i].sourceSize = (size_t)sourceSize; + framePointers[i].destSize = (size_t)decompressedSize; + } + } + else if (PyObject_TypeCheck(frames, &ZstdBufferWithSegmentsCollectionType)) { + Py_ssize_t offset = 0; + ZstdBufferWithSegments* buffer; + ZstdBufferWithSegmentsCollection* collection = (ZstdBufferWithSegmentsCollection*)frames; + + frameCount = BufferWithSegmentsCollection_length(collection); + + if (frameSizes.buf && frameSizes.len != frameCount) { + PyErr_Format(PyExc_ValueError, + "decompressed_sizes size mismatch; expected %zd; got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (NULL == framePointers) { + PyErr_NoMemory(); + goto finally; + } + + /* Iterate the data structure directly because it is faster. */ + for (i = 0; i < collection->bufferCount; i++) { + Py_ssize_t segmentIndex; + buffer = collection->buffers[i]; + + for (segmentIndex = 0; segmentIndex < buffer->segmentCount; segmentIndex++) { + unsigned long long decompressedSize = frameSizesP ? frameSizesP[offset] : 0; + + if (buffer->segments[segmentIndex].offset + buffer->segments[segmentIndex].length > buffer->dataSize) { + PyErr_Format(PyExc_ValueError, "item %zd has offset outside memory area", + offset); + goto finally; + } + + if (buffer->segments[segmentIndex].length > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "item %zd in buffer %zd is too large for this platform", + segmentIndex, i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd in buffer %zd is too large for this platform", + segmentIndex, i); + goto finally; + } + + totalInputSize += buffer->segments[segmentIndex].length; + + framePointers[offset].sourceData = (char*)buffer->data + buffer->segments[segmentIndex].offset; + framePointers[offset].sourceSize = (size_t)buffer->segments[segmentIndex].length; + framePointers[offset].destSize = (size_t)decompressedSize; + + offset++; + } + } + } + else if (PyList_Check(frames)) { + frameCount = PyList_GET_SIZE(frames); + + if (frameSizes.buf && frameSizes.len != frameCount * (Py_ssize_t)sizeof(unsigned long long)) { + PyErr_Format(PyExc_ValueError, "decompressed_sizes size mismatch; expected %zd, got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (!framePointers) { + PyErr_NoMemory(); + goto finally; + } + + frameBuffers = PyMem_Malloc(frameCount * sizeof(Py_buffer)); + if (NULL == frameBuffers) { + PyErr_NoMemory(); + goto finally; + } + + memset(frameBuffers, 0, frameCount * sizeof(Py_buffer)); + + /* Do a pass to assemble info about our input buffers and output sizes. */ + for (i = 0; i < frameCount; i++) { + unsigned long long decompressedSize = frameSizesP ? frameSizesP[i] : 0; + + if (0 != PyObject_GetBuffer(PyList_GET_ITEM(frames, i), + &frameBuffers[i], PyBUF_CONTIG_RO)) { + PyErr_Clear(); + PyErr_Format(PyExc_TypeError, "item %zd not a bytes like object", i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd is too large for this platform", i); + goto finally; + } + + totalInputSize += frameBuffers[i].len; + + framePointers[i].sourceData = frameBuffers[i].buf; + framePointers[i].sourceSize = frameBuffers[i].len; + framePointers[i].destSize = (size_t)decompressedSize; + } + } + else { + PyErr_SetString(PyExc_TypeError, "argument must be list or BufferWithSegments"); + goto finally; + } + + /* We now have an array with info about our inputs and outputs. Feed it into + our generic decompression function. */ + frameSources.frames = framePointers; + frameSources.framesSize = frameCount; + frameSources.compressedSize = totalInputSize; + + result = decompress_from_framesources(self, &frameSources, threads); + +finally: + if (frameSizes.buf) { + PyBuffer_Release(&frameSizes); + } + PyMem_Free(framePointers); + + if (frameBuffers) { + for (i = 0; i < frameCount; i++) { + PyBuffer_Release(&frameBuffers[i]); + } + + PyMem_Free(frameBuffers); + } + + return result; +} + +static PyMethodDef Decompressor_methods[] = { + { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS, + Decompressor_copy_stream__doc__ }, + { "decompress", (PyCFunction)Decompressor_decompress, METH_VARARGS | METH_KEYWORDS, + Decompressor_decompress__doc__ }, + { "decompressobj", (PyCFunction)Decompressor_decompressobj, METH_VARARGS | METH_KEYWORDS, + Decompressor_decompressobj__doc__ }, + { "read_to_iter", (PyCFunction)Decompressor_read_to_iter, METH_VARARGS | METH_KEYWORDS, + Decompressor_read_to_iter__doc__ }, + /* TODO Remove deprecated API */ + { "read_from", (PyCFunction)Decompressor_read_to_iter, METH_VARARGS | METH_KEYWORDS, + Decompressor_read_to_iter__doc__ }, + { "stream_reader", (PyCFunction)Decompressor_stream_reader, + METH_VARARGS | METH_KEYWORDS, Decompressor_stream_reader__doc__ }, + { "stream_writer", (PyCFunction)Decompressor_stream_writer, METH_VARARGS | METH_KEYWORDS, + Decompressor_stream_writer__doc__ }, + /* TODO remove deprecated API */ + { "write_to", (PyCFunction)Decompressor_stream_writer, METH_VARARGS | METH_KEYWORDS, + Decompressor_stream_writer__doc__ }, + { "decompress_content_dict_chain", (PyCFunction)Decompressor_decompress_content_dict_chain, + METH_VARARGS | METH_KEYWORDS, Decompressor_decompress_content_dict_chain__doc__ }, + { "multi_decompress_to_buffer", (PyCFunction)Decompressor_multi_decompress_to_buffer, + METH_VARARGS | METH_KEYWORDS, Decompressor_multi_decompress_to_buffer__doc__ }, + { "memory_size", (PyCFunction)Decompressor_memory_size, METH_NOARGS, + Decompressor_memory_size__doc__ }, + { NULL, NULL } +}; + +PyTypeObject ZstdDecompressorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressor", /* tp_name */ + sizeof(ZstdDecompressor), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)Decompressor_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + Decompressor__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + Decompressor_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)Decompressor_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressor_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressorType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressorType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdDecompressorType); + PyModule_AddObject(mod, "ZstdDecompressor", + (PyObject*)&ZstdDecompressorType); +} diff --git a/contrib/python/zstandard/py2/c-ext/decompressoriterator.c b/contrib/python/zstandard/py2/c-ext/decompressoriterator.c new file mode 100644 index 00000000000..54b56581de1 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/decompressoriterator.c @@ -0,0 +1,249 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdDecompressorIterator__doc__, +"Represents an iterator of decompressed data.\n" +); + +static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator* self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + if (self->input.src) { + PyMem_Free((void*)self->input.src); + self->input.src = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdDecompressorIterator_iter(PyObject* self) { + Py_INCREF(self); + return self; +} + +static DecompressorIteratorResult read_decompressor_iterator(ZstdDecompressorIterator* self) { + size_t zresult; + PyObject* chunk; + DecompressorIteratorResult result; + size_t oldInputPos = self->input.pos; + + result.chunk = NULL; + + chunk = PyBytes_FromStringAndSize(NULL, self->outSize); + if (!chunk) { + result.errored = 1; + return result; + } + + self->output.dst = PyBytes_AsString(chunk); + self->output.size = self->outSize; + self->output.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->decompressor->dctx, &self->output, &self->input); + Py_END_ALLOW_THREADS + + /* We're done with the pointer. Nullify to prevent anyone from getting a + handle on a Python object. */ + self->output.dst = NULL; + + if (ZSTD_isError(zresult)) { + Py_DECREF(chunk); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + result.errored = 1; + return result; + } + + self->readCount += self->input.pos - oldInputPos; + + /* Frame is fully decoded. Input exhausted and output sitting in buffer. */ + if (0 == zresult) { + self->finishedInput = 1; + self->finishedOutput = 1; + } + + /* If it produced output data, return it. */ + if (self->output.pos) { + if (self->output.pos < self->outSize) { + if (safe_pybytes_resize(&chunk, self->output.pos)) { + Py_XDECREF(chunk); + result.errored = 1; + return result; + } + } + } + else { + Py_DECREF(chunk); + chunk = NULL; + } + + result.errored = 0; + result.chunk = chunk; + + return result; +} + +static PyObject* ZstdDecompressorIterator_iternext(ZstdDecompressorIterator* self) { + PyObject* readResult = NULL; + char* readBuffer; + Py_ssize_t readSize; + Py_ssize_t bufferRemaining; + DecompressorIteratorResult result; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + result = read_decompressor_iterator(self); + if (result.chunk || result.errored) { + return result.chunk; + } + + /* Else fall through to get more data from input. */ + } + +read_from_source: + + if (!self->finishedInput) { + if (self->reader) { + readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer.buf); + + /* Only support contiguous C arrays for now */ + assert(self->buffer.strides == NULL && self->buffer.suboffsets == NULL); + assert(self->buffer.itemsize == 1); + + /* TODO avoid memcpy() below */ + readBuffer = (char *)self->buffer.buf + self->bufferOffset; + bufferRemaining = self->buffer.len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (readSize) { + if (!self->readCount && self->skipBytes) { + assert(self->skipBytes < self->inSize); + if ((Py_ssize_t)self->skipBytes >= readSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes larger than first input chunk; " + "this scenario is currently unsupported"); + Py_XDECREF(readResult); + return NULL; + } + + readBuffer = readBuffer + self->skipBytes; + readSize -= self->skipBytes; + } + + /* Copy input into previously allocated buffer because it can live longer + than a single function call and we don't want to keep a ref to a Python + object around. This could be changed... */ + memcpy((void*)self->input.src, readBuffer, readSize); + self->input.size = readSize; + self->input.pos = 0; + } + /* No bytes on first read must mean an empty input stream. */ + else if (!self->readCount) { + self->finishedInput = 1; + self->finishedOutput = 1; + Py_XDECREF(readResult); + PyErr_SetString(PyExc_StopIteration, "empty input"); + return NULL; + } + else { + self->finishedInput = 1; + } + + /* We've copied the data managed by memory. Discard the Python object. */ + Py_XDECREF(readResult); + } + + result = read_decompressor_iterator(self); + if (result.errored || result.chunk) { + return result.chunk; + } + + /* No new output data. Try again unless we know there is no more data. */ + if (!self->finishedInput) { + goto read_from_source; + } + + PyErr_SetString(PyExc_StopIteration, "input exhausted"); + return NULL; +} + +PyTypeObject ZstdDecompressorIteratorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressorIterator", /* tp_name */ + sizeof(ZstdDecompressorIterator), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdDecompressorIterator_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdDecompressorIterator__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + ZstdDecompressorIterator_iter, /* tp_iter */ + (iternextfunc)ZstdDecompressorIterator_iternext, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressoriterator_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressorIteratorType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressorIteratorType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py2/c-ext/frameparams.c b/contrib/python/zstandard/py2/c-ext/frameparams.c new file mode 100644 index 00000000000..35ca3ca5900 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/frameparams.c @@ -0,0 +1,138 @@ +/** +* Copyright (c) 2017-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(FrameParameters__doc__, + "FrameParameters: information about a zstd frame"); + +FrameParametersObject* get_frame_parameters(PyObject* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + NULL + }; + + Py_buffer source; + ZSTD_frameHeader header; + FrameParametersObject* result = NULL; + size_t zresult; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:get_frame_parameters", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:get_frame_parameters", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + zresult = ZSTD_getFrameHeader(&header, source.buf, source.len); + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "cannot get frame parameters: %s", ZSTD_getErrorName(zresult)); + goto finally; + } + + if (zresult) { + PyErr_Format(ZstdError, "not enough data for frame parameters; need %zu bytes", zresult); + goto finally; + } + + result = PyObject_New(FrameParametersObject, &FrameParametersType); + if (!result) { + goto finally; + } + + result->frameContentSize = header.frameContentSize; + result->windowSize = header.windowSize; + result->dictID = header.dictID; + result->checksumFlag = header.checksumFlag ? 1 : 0; + +finally: + PyBuffer_Release(&source); + return result; +} + +static void FrameParameters_dealloc(PyObject* self) { + PyObject_Del(self); +} + +static PyMemberDef FrameParameters_members[] = { + { "content_size", T_ULONGLONG, + offsetof(FrameParametersObject, frameContentSize), READONLY, + "frame content size" }, + { "window_size", T_ULONGLONG, + offsetof(FrameParametersObject, windowSize), READONLY, + "window size" }, + { "dict_id", T_UINT, + offsetof(FrameParametersObject, dictID), READONLY, + "dictionary ID" }, + { "has_checksum", T_BOOL, + offsetof(FrameParametersObject, checksumFlag), READONLY, + "checksum flag" }, + { NULL } +}; + +PyTypeObject FrameParametersType = { + PyVarObject_HEAD_INIT(NULL, 0) + "FrameParameters", /* tp_name */ + sizeof(FrameParametersObject), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)FrameParameters_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + FrameParameters__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + 0, /* tp_methods */ + FrameParameters_members, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + 0, /* tp_new */ +}; + +void frameparams_module_init(PyObject* mod) { + Py_TYPE(&FrameParametersType) = &PyType_Type; + if (PyType_Ready(&FrameParametersType) < 0) { + return; + } + + Py_INCREF(&FrameParametersType); + PyModule_AddObject(mod, "FrameParameters", (PyObject*)&FrameParametersType); +} diff --git a/contrib/python/zstandard/py2/c-ext/python-zstandard.h b/contrib/python/zstandard/py2/c-ext/python-zstandard.h new file mode 100644 index 00000000000..bd1cb4dcad8 --- /dev/null +++ b/contrib/python/zstandard/py2/c-ext/python-zstandard.h @@ -0,0 +1,359 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include "structmember.h" + +#define ZSTD_STATIC_LINKING_ONLY +#define ZDICT_STATIC_LINKING_ONLY +#include <zstd.h> +#include <zdict.h> + +/* Remember to change the string in zstandard/__init__ as well */ +#define PYTHON_ZSTANDARD_VERSION "0.14.1" + +typedef enum { + compressorobj_flush_finish, + compressorobj_flush_block, +} CompressorObj_Flush; + +/* + Represents a ZstdCompressionParameters type. + + This type holds all the low-level compression parameters that can be set. +*/ +typedef struct { + PyObject_HEAD + ZSTD_CCtx_params* params; +} ZstdCompressionParametersObject; + +extern PyTypeObject ZstdCompressionParametersType; + +/* + Represents a FrameParameters type. + + This type is basically a wrapper around ZSTD_frameParams. +*/ +typedef struct { + PyObject_HEAD + unsigned long long frameContentSize; + unsigned long long windowSize; + unsigned dictID; + char checksumFlag; +} FrameParametersObject; + +extern PyTypeObject FrameParametersType; + +/* + Represents a ZstdCompressionDict type. + + Instances hold data used for a zstd compression dictionary. +*/ +typedef struct { + PyObject_HEAD + + /* Pointer to dictionary data. Owned by self. */ + void* dictData; + /* Size of dictionary data. */ + size_t dictSize; + ZSTD_dictContentType_e dictType; + /* k parameter for cover dictionaries. Only populated by train_cover_dict(). */ + unsigned k; + /* d parameter for cover dictionaries. Only populated by train_cover_dict(). */ + unsigned d; + /* Digested dictionary, suitable for reuse. */ + ZSTD_CDict* cdict; + ZSTD_DDict* ddict; +} ZstdCompressionDict; + +extern PyTypeObject ZstdCompressionDictType; + +/* + Represents a ZstdCompressor type. +*/ +typedef struct { + PyObject_HEAD + + /* Number of threads to use for operations. */ + unsigned int threads; + /* Pointer to compression dictionary to use. NULL if not using dictionary + compression. */ + ZstdCompressionDict* dict; + /* Compression context to use. Populated during object construction. */ + ZSTD_CCtx* cctx; + /* Compression parameters in use. */ + ZSTD_CCtx_params* params; +} ZstdCompressor; + +extern PyTypeObject ZstdCompressorType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + ZSTD_outBuffer output; + int finished; +} ZstdCompressionObj; + +extern PyTypeObject ZstdCompressionObjType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + PyObject* writer; + ZSTD_outBuffer output; + size_t outSize; + int entered; + int closed; + int writeReturnRead; + unsigned long long bytesCompressed; +} ZstdCompressionWriter; + +extern PyTypeObject ZstdCompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + PyObject* reader; + Py_buffer buffer; + Py_ssize_t bufferOffset; + size_t inSize; + size_t outSize; + + ZSTD_inBuffer input; + ZSTD_outBuffer output; + int finishedOutput; + int finishedInput; + PyObject* readResult; +} ZstdCompressorIterator; + +extern PyTypeObject ZstdCompressorIteratorType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + PyObject* reader; + Py_buffer buffer; + size_t readSize; + + int entered; + int closed; + unsigned long long bytesCompressed; + + ZSTD_inBuffer input; + ZSTD_outBuffer output; + int finishedInput; + int finishedOutput; + PyObject* readResult; +} ZstdCompressionReader; + +extern PyTypeObject ZstdCompressionReaderType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_buffer inBuffer; + int finished; + size_t chunkSize; +} ZstdCompressionChunker; + +extern PyTypeObject ZstdCompressionChunkerType; + +typedef enum { + compressionchunker_mode_normal, + compressionchunker_mode_flush, + compressionchunker_mode_finish, +} CompressionChunkerMode; + +typedef struct { + PyObject_HEAD + + ZstdCompressionChunker* chunker; + CompressionChunkerMode mode; +} ZstdCompressionChunkerIterator; + +extern PyTypeObject ZstdCompressionChunkerIteratorType; + +typedef struct { + PyObject_HEAD + + ZSTD_DCtx* dctx; + ZstdCompressionDict* dict; + size_t maxWindowSize; + ZSTD_format_e format; +} ZstdDecompressor; + +extern PyTypeObject ZstdDecompressorType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + size_t outSize; + int finished; +} ZstdDecompressionObj; + +extern PyTypeObject ZstdDecompressionObjType; + +typedef struct { + PyObject_HEAD + + /* Parent decompressor to which this object is associated. */ + ZstdDecompressor* decompressor; + /* Object to read() from (if reading from a stream). */ + PyObject* reader; + /* Size for read() operations on reader. */ + size_t readSize; + /* Whether a read() can return data spanning multiple zstd frames. */ + int readAcrossFrames; + /* Buffer to read from (if reading from a buffer). */ + Py_buffer buffer; + + /* Whether the context manager is active. */ + int entered; + /* Whether we've closed the stream. */ + int closed; + + /* Number of bytes decompressed and returned to user. */ + unsigned long long bytesDecompressed; + + /* Tracks data going into decompressor. */ + ZSTD_inBuffer input; + + /* Holds output from read() operation on reader. */ + PyObject* readResult; + + /* Whether all input has been sent to the decompressor. */ + int finishedInput; + /* Whether all output has been flushed from the decompressor. */ + int finishedOutput; +} ZstdDecompressionReader; + +extern PyTypeObject ZstdDecompressionReaderType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + PyObject* writer; + size_t outSize; + int entered; + int closed; + int writeReturnRead; +} ZstdDecompressionWriter; + +extern PyTypeObject ZstdDecompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + PyObject* reader; + Py_buffer buffer; + Py_ssize_t bufferOffset; + size_t inSize; + size_t outSize; + size_t skipBytes; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t readCount; + int finishedInput; + int finishedOutput; +} ZstdDecompressorIterator; + +extern PyTypeObject ZstdDecompressorIteratorType; + +typedef struct { + int errored; + PyObject* chunk; +} DecompressorIteratorResult; + +typedef struct { + /* The public API is that these are 64-bit unsigned integers. So these can't + * be size_t, even though values larger than SIZE_MAX or PY_SSIZE_T_MAX may + * be nonsensical for this platform. */ + unsigned long long offset; + unsigned long long length; +} BufferSegment; + +typedef struct { + PyObject_HEAD + + PyObject* parent; + BufferSegment* segments; + Py_ssize_t segmentCount; +} ZstdBufferSegments; + +extern PyTypeObject ZstdBufferSegmentsType; + +typedef struct { + PyObject_HEAD + + PyObject* parent; + void* data; + Py_ssize_t dataSize; + unsigned long long offset; +} ZstdBufferSegment; + +extern PyTypeObject ZstdBufferSegmentType; + +typedef struct { + PyObject_HEAD + + Py_buffer parent; + void* data; + unsigned long long dataSize; + BufferSegment* segments; + Py_ssize_t segmentCount; + int useFree; +} ZstdBufferWithSegments; + +extern PyTypeObject ZstdBufferWithSegmentsType; + +/** + * An ordered collection of BufferWithSegments exposed as a squashed collection. + * + * This type provides a virtual view spanning multiple BufferWithSegments + * instances. It allows multiple instances to be "chained" together and + * exposed as a single collection. e.g. if there are 2 buffers holding + * 10 segments each, then o[14] will access the 5th segment in the 2nd buffer. + */ +typedef struct { + PyObject_HEAD + + /* An array of buffers that should be exposed through this instance. */ + ZstdBufferWithSegments** buffers; + /* Number of elements in buffers array. */ + Py_ssize_t bufferCount; + /* Array of first offset in each buffer instance. 0th entry corresponds + to number of elements in the 0th buffer. 1st entry corresponds to the + sum of elements in 0th and 1st buffers. */ + Py_ssize_t* firstElements; +} ZstdBufferWithSegmentsCollection; + +extern PyTypeObject ZstdBufferWithSegmentsCollectionType; + +int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value); +int set_parameters(ZSTD_CCtx_params* params, ZstdCompressionParametersObject* obj); +int to_cparams(ZstdCompressionParametersObject* params, ZSTD_compressionParameters* cparams); +FrameParametersObject* get_frame_parameters(PyObject* self, PyObject* args, PyObject* kwargs); +int ensure_ddict(ZstdCompressionDict* dict); +int ensure_dctx(ZstdDecompressor* decompressor, int loadDict); +ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs); +ZstdBufferWithSegments* BufferWithSegments_FromMemory(void* data, unsigned long long dataSize, BufferSegment* segments, Py_ssize_t segmentsSize); +Py_ssize_t BufferWithSegmentsCollection_length(ZstdBufferWithSegmentsCollection*); +int cpu_count(void); +size_t roundpow2(size_t); +int safe_pybytes_resize(PyObject** obj, Py_ssize_t size); diff --git a/contrib/python/zstandard/py2/ya.make b/contrib/python/zstandard/py2/ya.make new file mode 100644 index 00000000000..2efc543fe85 --- /dev/null +++ b/contrib/python/zstandard/py2/ya.make @@ -0,0 +1,58 @@ +# Generated by devtools/yamaker (pypi). + +PY2_LIBRARY() + +VERSION(0.14.1) + +LICENSE(BSD-3-Clause) + +PEERDIR( + contrib/libs/zstd +) + +ADDINCL( + contrib/libs/zstd/include + contrib/libs/zstd/lib/common + contrib/python/zstandard/py2/c-ext +) + +NO_COMPILER_WARNINGS() + +NO_LINT() + +SRCS( + c-ext/bufferutil.c + c-ext/compressionchunker.c + c-ext/compressiondict.c + c-ext/compressionparams.c + c-ext/compressionreader.c + c-ext/compressionwriter.c + c-ext/compressobj.c + c-ext/compressor.c + c-ext/compressoriterator.c + c-ext/constants.c + c-ext/decompressionreader.c + c-ext/decompressionwriter.c + c-ext/decompressobj.c + c-ext/decompressor.c + c-ext/decompressoriterator.c + c-ext/frameparams.c + zstd.c +) + +PY_REGISTER( + zstd +) + +PY_SRCS( + TOP_LEVEL + zstandard/__init__.py +) + +RESOURCE_FILES( + PREFIX contrib/python/zstandard/py2/ + .dist-info/METADATA + .dist-info/top_level.txt +) + +END() diff --git a/contrib/python/zstandard/py2/zstandard/__init__.py b/contrib/python/zstandard/py2/zstandard/__init__.py new file mode 100644 index 00000000000..5b0a9318761 --- /dev/null +++ b/contrib/python/zstandard/py2/zstandard/__init__.py @@ -0,0 +1,75 @@ +# Copyright (c) 2017-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +"""Python interface to the Zstandard (zstd) compression library.""" + +from __future__ import absolute_import, unicode_literals + +# This module serves 2 roles: +# +# 1) Export the C or CFFI "backend" through a central module. +# 2) Implement additional functionality built on top of C or CFFI backend. + +import os +import platform + +# Some Python implementations don't support C extensions. That's why we have +# a CFFI implementation in the first place. The code here import one of our +# "backends" then re-exports the symbols from this module. For convenience, +# we support falling back to the CFFI backend if the C extension can't be +# imported. But for performance reasons, we only do this on unknown Python +# implementation. Notably, for CPython we require the C extension by default. +# Because someone will inevitably want special behavior, the behavior is +# configurable via an environment variable. A potentially better way to handle +# this is to import a special ``__importpolicy__`` module or something +# defining a variable and `setup.py` could write the file with whatever +# policy was specified at build time. Until someone needs it, we go with +# the hacky but simple environment variable approach. +_module_policy = os.environ.get("PYTHON_ZSTANDARD_IMPORT_POLICY", "default") + +if _module_policy == "default": + if platform.python_implementation() in ("CPython",): + from zstd import * + + backend = "cext" + elif platform.python_implementation() in ("PyPy",): + from .cffi import * + + backend = "cffi" + else: + try: + from zstd import * + + backend = "cext" + except ImportError: + from .cffi import * + + backend = "cffi" +elif _module_policy == "cffi_fallback": + try: + from zstd import * + + backend = "cext" + except ImportError: + from .cffi import * + + backend = "cffi" +elif _module_policy == "cext": + from zstd import * + + backend = "cext" +elif _module_policy == "cffi": + from .cffi import * + + backend = "cffi" +else: + raise ImportError( + "unknown module import policy: %s; use default, cffi_fallback, " + "cext, or cffi" % _module_policy + ) + +# Keep this in sync with python-zstandard.h. +__version__ = "0.14.1" diff --git a/contrib/python/zstandard/py2/zstd.c b/contrib/python/zstandard/py2/zstd.c new file mode 100644 index 00000000000..3ab69a31139 --- /dev/null +++ b/contrib/python/zstandard/py2/zstd.c @@ -0,0 +1,344 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +/* A Python C extension for Zstandard. */ + +#if defined(_WIN32) +#define WIN32_LEAN_AND_MEAN +#include <Windows.h> +#elif defined(__APPLE__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__DragonFly__) +#include <sys/types.h> +#include <sys/sysctl.h> +#endif + +#include "python-zstandard.h" + +PyObject *ZstdError; + +PyDoc_STRVAR(estimate_decompression_context_size__doc__, +"estimate_decompression_context_size()\n" +"\n" +"Estimate the amount of memory allocated to a decompression context.\n" +); + +static PyObject* estimate_decompression_context_size(PyObject* self) { + return PyLong_FromSize_t(ZSTD_estimateDCtxSize()); +} + +PyDoc_STRVAR(frame_content_size__doc__, +"frame_content_size(data)\n" +"\n" +"Obtain the decompressed size of a frame." +); + +static PyObject* frame_content_size(PyObject* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "source", + NULL + }; + + Py_buffer source; + PyObject* result = NULL; + unsigned long long size; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:frame_content_size", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:frame_content_size", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + size = ZSTD_getFrameContentSize(source.buf, source.len); + + if (size == ZSTD_CONTENTSIZE_ERROR) { + PyErr_SetString(ZstdError, "error when determining content size"); + } + else if (size == ZSTD_CONTENTSIZE_UNKNOWN) { + result = PyLong_FromLong(-1); + } + else { + result = PyLong_FromUnsignedLongLong(size); + } + +finally: + PyBuffer_Release(&source); + + return result; +} + +PyDoc_STRVAR(frame_header_size__doc__, +"frame_header_size(data)\n" +"\n" +"Obtain the size of a frame header.\n" +); + +static PyObject* frame_header_size(PyObject* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "source", + NULL + }; + + Py_buffer source; + PyObject* result = NULL; + size_t zresult; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:frame_header_size", +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*:frame_header_size", +#endif + kwlist, &source)) { + return NULL; + } + + if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) { + PyErr_SetString(PyExc_ValueError, + "data buffer should be contiguous and have at most one dimension"); + goto finally; + } + + zresult = ZSTD_frameHeaderSize(source.buf, source.len); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not determine frame header size: %s", + ZSTD_getErrorName(zresult)); + } + else { + result = PyLong_FromSize_t(zresult); + } + +finally: + + PyBuffer_Release(&source); + + return result; +} + +PyDoc_STRVAR(get_frame_parameters__doc__, +"get_frame_parameters(data)\n" +"\n" +"Obtains a ``FrameParameters`` instance by parsing data.\n"); + +PyDoc_STRVAR(train_dictionary__doc__, +"train_dictionary(dict_size, samples, k=None, d=None, steps=None,\n" +" threads=None,notifications=0, dict_id=0, level=0)\n" +"\n" +"Train a dictionary from sample data using the COVER algorithm.\n" +"\n" +"A compression dictionary of size ``dict_size`` will be created from the\n" +"iterable of ``samples``. The raw dictionary bytes will be returned.\n" +"\n" +"The COVER algorithm has 2 parameters: ``k`` and ``d``. These control the\n" +"*segment size* and *dmer size*. A reasonable range for ``k`` is\n" +"``[16, 2048+]``. A reasonable range for ``d`` is ``[6, 16]``.\n" +"``d`` must be less than or equal to ``k``.\n" +"\n" +"``steps`` can be specified to control the number of steps through potential\n" +"values of ``k`` and ``d`` to try. ``k`` and ``d`` will only be varied if\n" +"those arguments are not defined. i.e. if ``d`` is ``8``, then only ``k``\n" +"will be varied in this mode.\n" +"\n" +"``threads`` can specify how many threads to use to test various ``k`` and\n" +"``d`` values. ``-1`` will use as many threads as available CPUs. By default,\n" +"a single thread is used.\n" +"\n" +"When ``k`` and ``d`` are not defined, default values are used and the\n" +"algorithm will perform multiple iterations - or steps - to try to find\n" +"ideal parameters. If both ``k`` and ``d`` are specified, then those values\n" +"will be used. ``steps`` or ``threads`` triggers optimization mode to test\n" +"multiple ``k`` and ``d`` variations.\n" +); + +static char zstd_doc[] = "Interface to zstandard"; + +static PyMethodDef zstd_methods[] = { + { "estimate_decompression_context_size", (PyCFunction)estimate_decompression_context_size, + METH_NOARGS, estimate_decompression_context_size__doc__ }, + { "frame_content_size", (PyCFunction)frame_content_size, + METH_VARARGS | METH_KEYWORDS, frame_content_size__doc__ }, + { "frame_header_size", (PyCFunction)frame_header_size, + METH_VARARGS | METH_KEYWORDS, frame_header_size__doc__ }, + { "get_frame_parameters", (PyCFunction)get_frame_parameters, + METH_VARARGS | METH_KEYWORDS, get_frame_parameters__doc__ }, + { "train_dictionary", (PyCFunction)train_dictionary, + METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ }, + { NULL, NULL } +}; + +void bufferutil_module_init(PyObject* mod); +void compressobj_module_init(PyObject* mod); +void compressor_module_init(PyObject* mod); +void compressionparams_module_init(PyObject* mod); +void constants_module_init(PyObject* mod); +void compressionchunker_module_init(PyObject* mod); +void compressiondict_module_init(PyObject* mod); +void compressionreader_module_init(PyObject* mod); +void compressionwriter_module_init(PyObject* mod); +void compressoriterator_module_init(PyObject* mod); +void decompressor_module_init(PyObject* mod); +void decompressobj_module_init(PyObject* mod); +void decompressionreader_module_init(PyObject *mod); +void decompressionwriter_module_init(PyObject* mod); +void decompressoriterator_module_init(PyObject* mod); +void frameparams_module_init(PyObject* mod); + +void zstd_module_init(PyObject* m) { + /* python-zstandard relies on unstable zstd C API features. This means + that changes in zstd may break expectations in python-zstandard. + + python-zstandard is distributed with a copy of the zstd sources. + python-zstandard is only guaranteed to work with the bundled version + of zstd. + + However, downstream redistributors or packagers may unbundle zstd + from python-zstandard. This can result in a mismatch between zstd + versions and API semantics. This essentially "voids the warranty" + of python-zstandard and may cause undefined behavior. + + We detect this mismatch here and refuse to load the module if this + scenario is detected. + */ + if (ZSTD_VERSION_NUMBER != 10505 || ZSTD_versionNumber() != 10505) { + PyErr_SetString(PyExc_ImportError, "zstd C API mismatch; Python bindings not compiled against expected zstd version"); + return; + } + + bufferutil_module_init(m); + compressionparams_module_init(m); + compressiondict_module_init(m); + compressobj_module_init(m); + compressor_module_init(m); + compressionchunker_module_init(m); + compressionreader_module_init(m); + compressionwriter_module_init(m); + compressoriterator_module_init(m); + constants_module_init(m); + decompressor_module_init(m); + decompressobj_module_init(m); + decompressionreader_module_init(m); + decompressionwriter_module_init(m); + decompressoriterator_module_init(m); + frameparams_module_init(m); +} + +#if defined(__GNUC__) && (__GNUC__ >= 4) +# define PYTHON_ZSTD_VISIBILITY __attribute__ ((visibility ("default"))) +#else +# define PYTHON_ZSTD_VISIBILITY +#endif + +#if PY_MAJOR_VERSION >= 3 +static struct PyModuleDef zstd_module = { + PyModuleDef_HEAD_INIT, + "zstd", + zstd_doc, + -1, + zstd_methods +}; + +PYTHON_ZSTD_VISIBILITY PyMODINIT_FUNC PyInit_zstd(void) { + PyObject *m = PyModule_Create(&zstd_module); + if (m) { + zstd_module_init(m); + if (PyErr_Occurred()) { + Py_DECREF(m); + m = NULL; + } + } + return m; +} +#else +PYTHON_ZSTD_VISIBILITY PyMODINIT_FUNC initzstd(void) { + PyObject *m = Py_InitModule3("zstd", zstd_methods, zstd_doc); + if (m) { + zstd_module_init(m); + } +} +#endif + +/* Attempt to resolve the number of CPUs in the system. */ +int cpu_count() { + int count = 0; + +#if defined(_WIN32) + SYSTEM_INFO si; + si.dwNumberOfProcessors = 0; + GetSystemInfo(&si); + count = si.dwNumberOfProcessors; +#elif defined(__APPLE__) + int num; + size_t size = sizeof(int); + + if (0 == sysctlbyname("hw.logicalcpu", &num, &size, NULL, 0)) { + count = num; + } +#elif defined(__linux__) + count = sysconf(_SC_NPROCESSORS_ONLN); +#elif defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__DragonFly__) + int mib[2]; + size_t len = sizeof(count); + mib[0] = CTL_HW; + mib[1] = HW_NCPU; + if (0 != sysctl(mib, 2, &count, &len, NULL, 0)) { + count = 0; + } +#elif defined(__hpux) + count = mpctl(MPC_GETNUMSPUS, NULL, NULL); +#endif + + return count; +} + +size_t roundpow2(size_t i) { + i--; + i |= i >> 1; + i |= i >> 2; + i |= i >> 4; + i |= i >> 8; + i |= i >> 16; + i++; + + return i; +} + +/* Safer version of _PyBytes_Resize(). + * + * _PyBytes_Resize() only works if the refcount is 1. In some scenarios, + * we can get an object with a refcount > 1, even if it was just created + * with PyBytes_FromStringAndSize()! That's because (at least) CPython + * pre-allocates PyBytes instances of size 1 for every possible byte value. + * + * If non-0 is returned, obj may or may not be NULL. + */ +int safe_pybytes_resize(PyObject** obj, Py_ssize_t size) { + PyObject* tmp; + + if ((*obj)->ob_refcnt == 1) { + return _PyBytes_Resize(obj, size); + } + + tmp = PyBytes_FromStringAndSize(NULL, size); + if (!tmp) { + return -1; + } + + memcpy(PyBytes_AS_STRING(tmp), PyBytes_AS_STRING(*obj), + PyBytes_GET_SIZE(*obj)); + + Py_DECREF(*obj); + *obj = tmp; + + return 0; +}
\ No newline at end of file diff --git a/contrib/python/zstandard/py3/.dist-info/METADATA b/contrib/python/zstandard/py3/.dist-info/METADATA new file mode 100644 index 00000000000..6eaf2530266 --- /dev/null +++ b/contrib/python/zstandard/py3/.dist-info/METADATA @@ -0,0 +1,61 @@ +Metadata-Version: 2.1 +Name: zstandard +Version: 0.21.0 +Summary: Zstandard bindings for Python +Home-page: https://github.com/indygreg/python-zstandard +Author: Gregory Szorc +Author-email: [email protected] +License: BSD +Keywords: zstandard,zstd,compression +Classifier: Development Status :: 5 - Production/Stable +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: BSD License +Classifier: Programming Language :: C +Classifier: Programming Language :: Python :: 3.7 +Classifier: Programming Language :: Python :: 3.8 +Classifier: Programming Language :: Python :: 3.9 +Classifier: Programming Language :: Python :: 3.10 +Classifier: Programming Language :: Python :: 3.11 +Requires-Python: >=3.7 +License-File: LICENSE +Requires-Dist: cffi (>=1.11) ; platform_python_implementation == "PyPy" +Provides-Extra: cffi +Requires-Dist: cffi (>=1.11) ; extra == 'cffi' + +================ +python-zstandard +================ + +| |ci-test| |ci-wheel| |ci-typing| |ci-sdist| |ci-anaconda| |ci-sphinx| + +This project provides Python bindings for interfacing with the +`Zstandard <http://www.zstd.net>`_ compression library. A C extension +and CFFI interface are provided. + +The primary goal of the project is to provide a rich interface to the +underlying C API through a Pythonic interface while not sacrificing +performance. This means exposing most of the features and flexibility +of the C API while not sacrificing usability or safety that Python provides. + +The canonical home for this project is +https://github.com/indygreg/python-zstandard. + +For usage documentation, see https://python-zstandard.readthedocs.org/. + +.. |ci-test| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/test.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/test.yml + +.. |ci-wheel| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/wheel.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/wheel.yml + +.. |ci-typing| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/typing.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/typing.yml + +.. |ci-sdist| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/sdist.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/sdist.yml + +.. |ci-anaconda| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/anaconda.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/anaconda.yml + +.. |ci-sphinx| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/sphinx.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/sphinx.yml diff --git a/contrib/python/zstandard/py3/.dist-info/top_level.txt b/contrib/python/zstandard/py3/.dist-info/top_level.txt new file mode 100644 index 00000000000..864700d2b3e --- /dev/null +++ b/contrib/python/zstandard/py3/.dist-info/top_level.txt @@ -0,0 +1 @@ +zstandard diff --git a/contrib/python/zstandard/py3/LICENSE b/contrib/python/zstandard/py3/LICENSE new file mode 100644 index 00000000000..dcec4760996 --- /dev/null +++ b/contrib/python/zstandard/py3/LICENSE @@ -0,0 +1,27 @@ +Copyright (c) 2016, Gregory Szorc +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors +may be used to endorse or promote products derived from this software without +specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/contrib/python/zstandard/py3/README.rst b/contrib/python/zstandard/py3/README.rst new file mode 100644 index 00000000000..bec82d2c3eb --- /dev/null +++ b/contrib/python/zstandard/py3/README.rst @@ -0,0 +1,37 @@ +================ +python-zstandard +================ + +| |ci-test| |ci-wheel| |ci-typing| |ci-sdist| |ci-anaconda| |ci-sphinx| + +This project provides Python bindings for interfacing with the +`Zstandard <http://www.zstd.net>`_ compression library. A C extension +and CFFI interface are provided. + +The primary goal of the project is to provide a rich interface to the +underlying C API through a Pythonic interface while not sacrificing +performance. This means exposing most of the features and flexibility +of the C API while not sacrificing usability or safety that Python provides. + +The canonical home for this project is +https://github.com/indygreg/python-zstandard. + +For usage documentation, see https://python-zstandard.readthedocs.org/. + +.. |ci-test| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/test.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/test.yml + +.. |ci-wheel| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/wheel.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/wheel.yml + +.. |ci-typing| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/typing.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/typing.yml + +.. |ci-sdist| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/sdist.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/sdist.yml + +.. |ci-anaconda| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/anaconda.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/anaconda.yml + +.. |ci-sphinx| image:: https://github.com/indygreg/python-zstandard/workflows/.github/workflows/sphinx.yml/badge.svg + :target: https://github.com/indygreg/python-zstandard/blob/main/.github/workflows/sphinx.yml diff --git a/contrib/python/zstandard/py3/c-ext/backend_c.c b/contrib/python/zstandard/py3/c-ext/backend_c.c new file mode 100644 index 00000000000..bf61f9c8488 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/backend_c.c @@ -0,0 +1,353 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +/* A Python C extension for Zstandard. */ + +#if defined(_WIN32) +#define WIN32_LEAN_AND_MEAN +#include <Windows.h> +#elif defined(__APPLE__) || defined(__OpenBSD__) || defined(__FreeBSD__) || \ + defined(__NetBSD__) || defined(__DragonFly__) +#include <sys/types.h> + +#include <sys/sysctl.h> + +#endif + +#include "python-zstandard.h" + +#include "bufferutil.c" +#include "compressionchunker.c" +#include "compressiondict.c" +#include "compressionparams.c" +#include "compressionreader.c" +#include "compressionwriter.c" +#include "compressobj.c" +#include "compressor.c" +#include "compressoriterator.c" +#include "constants.c" +#include "decompressionreader.c" +#include "decompressionwriter.c" +#include "decompressobj.c" +#include "decompressor.c" +#include "decompressoriterator.c" +#include "frameparams.c" + +PyObject *ZstdError; + +static PyObject *estimate_decompression_context_size(PyObject *self) { + return PyLong_FromSize_t(ZSTD_estimateDCtxSize()); +} + +static PyObject *frame_content_size(PyObject *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"source", NULL}; + + Py_buffer source; + PyObject *result = NULL; + unsigned long long size; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:frame_content_size", + kwlist, &source)) { + return NULL; + } + + size = ZSTD_getFrameContentSize(source.buf, source.len); + + if (size == ZSTD_CONTENTSIZE_ERROR) { + PyErr_SetString(ZstdError, "error when determining content size"); + } + else if (size == ZSTD_CONTENTSIZE_UNKNOWN) { + result = PyLong_FromLong(-1); + } + else { + result = PyLong_FromUnsignedLongLong(size); + } + + PyBuffer_Release(&source); + + return result; +} + +static PyObject *frame_header_size(PyObject *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"source", NULL}; + + Py_buffer source; + PyObject *result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:frame_header_size", + kwlist, &source)) { + return NULL; + } + + zresult = ZSTD_frameHeaderSize(source.buf, source.len); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not determine frame header size: %s", + ZSTD_getErrorName(zresult)); + } + else { + result = PyLong_FromSize_t(zresult); + } + + PyBuffer_Release(&source); + + return result; +} + +static char zstd_doc[] = "Interface to zstandard"; + +static PyMethodDef zstd_methods[] = { + {"estimate_decompression_context_size", + (PyCFunction)estimate_decompression_context_size, METH_NOARGS, NULL}, + {"frame_content_size", (PyCFunction)frame_content_size, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"frame_header_size", (PyCFunction)frame_header_size, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"get_frame_parameters", (PyCFunction)get_frame_parameters, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"train_dictionary", (PyCFunction)train_dictionary, + METH_VARARGS | METH_KEYWORDS, NULL}, + {NULL, NULL}}; + +void bufferutil_module_init(PyObject *mod); +void compressobj_module_init(PyObject *mod); +void compressor_module_init(PyObject *mod); +void compressionparams_module_init(PyObject *mod); +void constants_module_init(PyObject *mod); +void compressionchunker_module_init(PyObject *mod); +void compressiondict_module_init(PyObject *mod); +void compressionreader_module_init(PyObject *mod); +void compressionwriter_module_init(PyObject *mod); +void compressoriterator_module_init(PyObject *mod); +void decompressor_module_init(PyObject *mod); +void decompressobj_module_init(PyObject *mod); +void decompressionreader_module_init(PyObject *mod); +void decompressionwriter_module_init(PyObject *mod); +void decompressoriterator_module_init(PyObject *mod); +void frameparams_module_init(PyObject *mod); + +void zstd_module_init(PyObject *m) { + /* python-zstandard relies on unstable zstd C API features. This means + that changes in zstd may break expectations in python-zstandard. + + python-zstandard is distributed with a copy of the zstd sources. + python-zstandard is only guaranteed to work with the bundled version + of zstd. + + However, downstream redistributors or packagers may unbundle zstd + from python-zstandard. This can result in a mismatch between zstd + versions and API semantics. This essentially "voids the warranty" + of python-zstandard and may cause undefined behavior. + + We detect this mismatch here and refuse to load the module if this + scenario is detected. + */ + PyObject *features = NULL; + PyObject *feature = NULL; + unsigned zstd_ver_no = ZSTD_versionNumber(); + unsigned our_hardcoded_version = 10505; + if (ZSTD_VERSION_NUMBER != our_hardcoded_version || + zstd_ver_no != our_hardcoded_version) { + PyErr_Format( + PyExc_ImportError, + "zstd C API versions mismatch; Python bindings were not " + "compiled/linked against expected zstd version (%u returned by the " + "lib, %u hardcoded in zstd headers, %u hardcoded in the cext)", + zstd_ver_no, ZSTD_VERSION_NUMBER, our_hardcoded_version); + return; + } + + features = PySet_New(NULL); + if (NULL == features) { + PyErr_SetString(PyExc_ImportError, "could not create empty set"); + return; + } + + feature = PyUnicode_FromString("buffer_types"); + if (NULL == feature) { + PyErr_SetString(PyExc_ImportError, "could not create feature string"); + return; + } + + if (PySet_Add(features, feature) == -1) { + return; + } + + Py_DECREF(feature); + +#ifdef HAVE_ZSTD_POOL_APIS + feature = PyUnicode_FromString("multi_compress_to_buffer"); + if (NULL == feature) { + PyErr_SetString(PyExc_ImportError, "could not create feature string"); + return; + } + + if (PySet_Add(features, feature) == -1) { + return; + } + + Py_DECREF(feature); +#endif + +#ifdef HAVE_ZSTD_POOL_APIS + feature = PyUnicode_FromString("multi_decompress_to_buffer"); + if (NULL == feature) { + PyErr_SetString(PyExc_ImportError, "could not create feature string"); + return; + } + + if (PySet_Add(features, feature) == -1) { + return; + } + + Py_DECREF(feature); +#endif + + if (PyObject_SetAttrString(m, "backend_features", features) == -1) { + return; + } + + Py_DECREF(features); + + bufferutil_module_init(m); + compressionparams_module_init(m); + compressiondict_module_init(m); + compressobj_module_init(m); + compressor_module_init(m); + compressionchunker_module_init(m); + compressionreader_module_init(m); + compressionwriter_module_init(m); + compressoriterator_module_init(m); + constants_module_init(m); + decompressor_module_init(m); + decompressobj_module_init(m); + decompressionreader_module_init(m); + decompressionwriter_module_init(m); + decompressoriterator_module_init(m); + frameparams_module_init(m); +} + +#if defined(__GNUC__) && (__GNUC__ >= 4) +#define PYTHON_ZSTD_VISIBILITY __attribute__((visibility("default"))) +#else +#define PYTHON_ZSTD_VISIBILITY +#endif + +static struct PyModuleDef zstd_module = {PyModuleDef_HEAD_INIT, "backend_c", + zstd_doc, -1, zstd_methods}; + +PYTHON_ZSTD_VISIBILITY PyMODINIT_FUNC PyInit_backend_c(void) { + PyObject *m = PyModule_Create(&zstd_module); + if (m) { + zstd_module_init(m); + if (PyErr_Occurred()) { + Py_DECREF(m); + m = NULL; + } + } + return m; +} + +/* Attempt to resolve the number of CPUs in the system. */ +int cpu_count() { + int count = 0; + +#if defined(_WIN32) + SYSTEM_INFO si; + si.dwNumberOfProcessors = 0; + GetSystemInfo(&si); + count = si.dwNumberOfProcessors; +#elif defined(__APPLE__) + int num; + size_t size = sizeof(int); + + if (0 == sysctlbyname("hw.logicalcpu", &num, &size, NULL, 0)) { + count = num; + } +#elif defined(__linux__) + count = sysconf(_SC_NPROCESSORS_ONLN); +#elif defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__NetBSD__) || \ + defined(__DragonFly__) + int mib[2]; + size_t len = sizeof(count); + mib[0] = CTL_HW; + mib[1] = HW_NCPU; + if (0 != sysctl(mib, 2, &count, &len, NULL, 0)) { + count = 0; + } +#elif defined(__hpux) + count = mpctl(MPC_GETNUMSPUS, NULL, NULL); +#endif + + return count; +} + +size_t roundpow2(size_t i) { + i--; + i |= i >> 1; + i |= i >> 2; + i |= i >> 4; + i |= i >> 8; + i |= i >> 16; + i++; + + return i; +} + +/* Safer version of _PyBytes_Resize(). + * + * _PyBytes_Resize() only works if the refcount is 1. In some scenarios, + * we can get an object with a refcount > 1, even if it was just created + * with PyBytes_FromStringAndSize()! That's because (at least) CPython + * pre-allocates PyBytes instances of size 1 for every possible byte value. + * + * If non-0 is returned, obj may or may not be NULL. + */ +int safe_pybytes_resize(PyObject **obj, Py_ssize_t size) { + PyObject *tmp; + + if ((*obj)->ob_refcnt == 1) { + return _PyBytes_Resize(obj, size); + } + + tmp = PyBytes_FromStringAndSize(NULL, size); + if (!tmp) { + return -1; + } + + memcpy(PyBytes_AS_STRING(tmp), PyBytes_AS_STRING(*obj), + PyBytes_GET_SIZE(*obj)); + + Py_DECREF(*obj); + *obj = tmp; + + return 0; +} + +// Set/raise an `io.UnsupportedOperation` exception. +void set_io_unsupported_operation(void) { + PyObject *iomod; + PyObject *exc; + + iomod = PyImport_ImportModule("io"); + if (NULL == iomod) { + return; + } + + exc = PyObject_GetAttrString(iomod, "UnsupportedOperation"); + if (NULL == exc) { + Py_DECREF(iomod); + return; + } + + PyErr_SetNone(exc); + Py_DECREF(exc); + Py_DECREF(iomod); +} diff --git a/contrib/python/zstandard/py3/c-ext/bufferutil.c b/contrib/python/zstandard/py3/c-ext/bufferutil.c new file mode 100644 index 00000000000..3d77e331e75 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/bufferutil.c @@ -0,0 +1,578 @@ +/** + * Copyright (c) 2017-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void BufferWithSegments_dealloc(ZstdBufferWithSegments *self) { + /* Backing memory is either canonically owned by a Py_buffer or by us. */ + if (self->parent.buf) { + PyBuffer_Release(&self->parent); + } + else if (self->useFree) { + free(self->data); + } + else { + PyMem_Free(self->data); + } + + self->data = NULL; + + if (self->useFree) { + free(self->segments); + } + else { + PyMem_Free(self->segments); + } + + self->segments = NULL; + + PyObject_Del(self); +} + +static int BufferWithSegments_init(ZstdBufferWithSegments *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", "segments", NULL}; + + Py_buffer segments; + Py_ssize_t segmentCount; + Py_ssize_t i; + + memset(&self->parent, 0, sizeof(self->parent)); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*y*:BufferWithSegments", + kwlist, &self->parent, &segments)) { + return -1; + } + + if (segments.len % sizeof(BufferSegment)) { + PyErr_Format(PyExc_ValueError, + "segments array size is not a multiple of %zu", + sizeof(BufferSegment)); + goto except; + } + + segmentCount = segments.len / sizeof(BufferSegment); + + /* Validate segments data, as blindly trusting it could lead to arbitrary + memory access. */ + for (i = 0; i < segmentCount; i++) { + BufferSegment *segment = &((BufferSegment *)(segments.buf))[i]; + + if (segment->offset + segment->length > + (unsigned long long)self->parent.len) { + PyErr_SetString(PyExc_ValueError, + "offset within segments array references memory " + "outside buffer"); + goto except; + return -1; + } + } + + /* Make a copy of the segments data. It is cheap to do so and is a guard + against caller changing offsets, which has security implications. */ + self->segments = PyMem_Malloc(segments.len); + if (!self->segments) { + PyErr_NoMemory(); + goto except; + } + + memcpy(self->segments, segments.buf, segments.len); + PyBuffer_Release(&segments); + + self->data = self->parent.buf; + self->dataSize = self->parent.len; + self->segmentCount = segmentCount; + + return 0; + +except: + PyBuffer_Release(&self->parent); + PyBuffer_Release(&segments); + return -1; +} + +/** + * Construct a BufferWithSegments from existing memory and offsets. + * + * Ownership of the backing memory and BufferSegments will be transferred to + * the created object and freed when the BufferWithSegments is destroyed. + */ +ZstdBufferWithSegments * +BufferWithSegments_FromMemory(void *data, unsigned long long dataSize, + BufferSegment *segments, + Py_ssize_t segmentsSize) { + ZstdBufferWithSegments *result = NULL; + Py_ssize_t i; + + if (NULL == data) { + PyErr_SetString(PyExc_ValueError, "data is NULL"); + return NULL; + } + + if (NULL == segments) { + PyErr_SetString(PyExc_ValueError, "segments is NULL"); + return NULL; + } + + for (i = 0; i < segmentsSize; i++) { + BufferSegment *segment = &segments[i]; + + if (segment->offset + segment->length > dataSize) { + PyErr_SetString(PyExc_ValueError, + "offset in segments overflows buffer size"); + return NULL; + } + } + + result = PyObject_New(ZstdBufferWithSegments, ZstdBufferWithSegmentsType); + if (NULL == result) { + return NULL; + } + + result->useFree = 0; + + memset(&result->parent, 0, sizeof(result->parent)); + result->data = data; + result->dataSize = dataSize; + result->segments = segments; + result->segmentCount = segmentsSize; + + return result; +} + +static Py_ssize_t BufferWithSegments_length(ZstdBufferWithSegments *self) { + return self->segmentCount; +} + +static ZstdBufferSegment *BufferWithSegments_item(ZstdBufferWithSegments *self, + Py_ssize_t i) { + ZstdBufferSegment *result = NULL; + + if (i < 0) { + PyErr_SetString(PyExc_IndexError, "offset must be non-negative"); + return NULL; + } + + if (i >= self->segmentCount) { + PyErr_Format(PyExc_IndexError, "offset must be less than %zd", + self->segmentCount); + return NULL; + } + + if (self->segments[i].length > PY_SSIZE_T_MAX) { + PyErr_Format(PyExc_ValueError, + "item at offset %zd is too large for this platform", i); + return NULL; + } + + result = (ZstdBufferSegment *)PyObject_CallObject( + (PyObject *)ZstdBufferSegmentType, NULL); + if (NULL == result) { + return NULL; + } + + result->parent = (PyObject *)self; + Py_INCREF(self); + + result->data = (char *)self->data + self->segments[i].offset; + result->dataSize = (Py_ssize_t)self->segments[i].length; + result->offset = self->segments[i].offset; + + return result; +} + +static int BufferWithSegments_getbuffer(ZstdBufferWithSegments *self, + Py_buffer *view, int flags) { + if (self->dataSize > PY_SSIZE_T_MAX) { + view->obj = NULL; + PyErr_SetString(PyExc_BufferError, + "buffer is too large for this platform"); + return -1; + } + + return PyBuffer_FillInfo(view, (PyObject *)self, self->data, + (Py_ssize_t)self->dataSize, 1, flags); +} + +static PyObject *BufferWithSegments_tobytes(ZstdBufferWithSegments *self) { + if (self->dataSize > PY_SSIZE_T_MAX) { + PyErr_SetString(PyExc_ValueError, + "buffer is too large for this platform"); + return NULL; + } + + return PyBytes_FromStringAndSize(self->data, (Py_ssize_t)self->dataSize); +} + +static ZstdBufferSegments * +BufferWithSegments_segments(ZstdBufferWithSegments *self) { + ZstdBufferSegments *result = (ZstdBufferSegments *)PyObject_CallObject( + (PyObject *)ZstdBufferSegmentsType, NULL); + if (NULL == result) { + return NULL; + } + + result->parent = (PyObject *)self; + Py_INCREF(self); + result->segments = self->segments; + result->segmentCount = self->segmentCount; + + return result; +} + +#if PY_VERSION_HEX < 0x03090000 +static PyBufferProcs BufferWithSegments_as_buffer = { + (getbufferproc)BufferWithSegments_getbuffer, /* bf_getbuffer */ + 0 /* bf_releasebuffer */ +}; +#endif + +static PyMethodDef BufferWithSegments_methods[] = { + {"segments", (PyCFunction)BufferWithSegments_segments, METH_NOARGS, NULL}, + {"tobytes", (PyCFunction)BufferWithSegments_tobytes, METH_NOARGS, NULL}, + {NULL, NULL}}; + +static PyMemberDef BufferWithSegments_members[] = { + {"size", T_ULONGLONG, offsetof(ZstdBufferWithSegments, dataSize), READONLY, + "total size of the buffer in bytes"}, + {NULL}}; + +PyType_Slot ZstdBufferWithSegmentsSlots[] = { + {Py_tp_dealloc, BufferWithSegments_dealloc}, + {Py_sq_length, BufferWithSegments_length}, + {Py_sq_item, BufferWithSegments_item}, +#if PY_VERSION_HEX >= 0x03090000 + {Py_bf_getbuffer, BufferWithSegments_getbuffer}, +#endif + {Py_tp_methods, BufferWithSegments_methods}, + {Py_tp_members, BufferWithSegments_members}, + {Py_tp_init, BufferWithSegments_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdBufferWithSegmentsSpec = { + "zstd.BufferWithSegments", + sizeof(ZstdBufferWithSegments), + 0, + Py_TPFLAGS_DEFAULT, + ZstdBufferWithSegmentsSlots, +}; + +PyTypeObject *ZstdBufferWithSegmentsType; + +static void BufferSegments_dealloc(ZstdBufferSegments *self) { + Py_CLEAR(self->parent); + PyObject_Del(self); +} + +static int BufferSegments_getbuffer(ZstdBufferSegments *self, Py_buffer *view, + int flags) { + return PyBuffer_FillInfo(view, (PyObject *)self, (void *)self->segments, + self->segmentCount * sizeof(BufferSegment), 1, + flags); +} + +PyType_Slot ZstdBufferSegmentsSlots[] = { + {Py_tp_dealloc, BufferSegments_dealloc}, +#if PY_VERSION_HEX >= 0x03090000 + {Py_bf_getbuffer, BufferSegments_getbuffer}, +#endif + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdBufferSegmentsSpec = { + "zstd.BufferSegments", + sizeof(ZstdBufferSegments), + 0, + Py_TPFLAGS_DEFAULT, + ZstdBufferSegmentsSlots, +}; + +#if PY_VERSION_HEX < 0x03090000 +static PyBufferProcs BufferSegments_as_buffer = { + (getbufferproc)BufferSegments_getbuffer, 0}; +#endif + +PyTypeObject *ZstdBufferSegmentsType; + +static void BufferSegment_dealloc(ZstdBufferSegment *self) { + Py_CLEAR(self->parent); + PyObject_Del(self); +} + +static Py_ssize_t BufferSegment_length(ZstdBufferSegment *self) { + return self->dataSize; +} + +static int BufferSegment_getbuffer(ZstdBufferSegment *self, Py_buffer *view, + int flags) { + return PyBuffer_FillInfo(view, (PyObject *)self, self->data, self->dataSize, + 1, flags); +} + +static PyObject *BufferSegment_tobytes(ZstdBufferSegment *self) { + return PyBytes_FromStringAndSize(self->data, self->dataSize); +} + +#if PY_VERSION_HEX < 0x03090000 +static PyBufferProcs BufferSegment_as_buffer = { + (getbufferproc)BufferSegment_getbuffer, 0}; +#endif + +static PyMethodDef BufferSegment_methods[] = { + {"tobytes", (PyCFunction)BufferSegment_tobytes, METH_NOARGS, NULL}, + {NULL, NULL}}; + +static PyMemberDef BufferSegment_members[] = { + {"offset", T_ULONGLONG, offsetof(ZstdBufferSegment, offset), READONLY, + "offset of segment within parent buffer"}, + {NULL}}; + +PyType_Slot ZstdBufferSegmentSlots[] = { + {Py_tp_dealloc, BufferSegment_dealloc}, + {Py_sq_length, BufferSegment_length}, +#if PY_VERSION_HEX >= 0x03090000 + {Py_bf_getbuffer, BufferSegment_getbuffer}, +#endif + {Py_tp_methods, BufferSegment_methods}, + {Py_tp_members, BufferSegment_members}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdBufferSegmentSpec = { + "zstd.BufferSegment", + sizeof(ZstdBufferSegment), + 0, + Py_TPFLAGS_DEFAULT, + ZstdBufferSegmentSlots, +}; + +PyTypeObject *ZstdBufferSegmentType; + +static void +BufferWithSegmentsCollection_dealloc(ZstdBufferWithSegmentsCollection *self) { + Py_ssize_t i; + + if (self->firstElements) { + PyMem_Free(self->firstElements); + self->firstElements = NULL; + } + + if (self->buffers) { + for (i = 0; i < self->bufferCount; i++) { + Py_CLEAR(self->buffers[i]); + } + + PyMem_Free(self->buffers); + self->buffers = NULL; + } + + PyObject_Del(self); +} + +static int +BufferWithSegmentsCollection_init(ZstdBufferWithSegmentsCollection *self, + PyObject *args) { + Py_ssize_t size; + Py_ssize_t i; + Py_ssize_t offset = 0; + + size = PyTuple_Size(args); + if (-1 == size) { + return -1; + } + + if (0 == size) { + PyErr_SetString(PyExc_ValueError, "must pass at least 1 argument"); + return -1; + } + + for (i = 0; i < size; i++) { + PyObject *item = PyTuple_GET_ITEM(args, i); + if (!PyObject_TypeCheck(item, ZstdBufferWithSegmentsType)) { + PyErr_SetString(PyExc_TypeError, + "arguments must be BufferWithSegments instances"); + return -1; + } + + if (0 == ((ZstdBufferWithSegments *)item)->segmentCount || + 0 == ((ZstdBufferWithSegments *)item)->dataSize) { + PyErr_SetString(PyExc_ValueError, + "ZstdBufferWithSegments cannot be empty"); + return -1; + } + } + + self->buffers = PyMem_Malloc(size * sizeof(ZstdBufferWithSegments *)); + if (NULL == self->buffers) { + PyErr_NoMemory(); + return -1; + } + + self->firstElements = PyMem_Malloc(size * sizeof(Py_ssize_t)); + if (NULL == self->firstElements) { + PyMem_Free(self->buffers); + self->buffers = NULL; + PyErr_NoMemory(); + return -1; + } + + self->bufferCount = size; + + for (i = 0; i < size; i++) { + ZstdBufferWithSegments *item = + (ZstdBufferWithSegments *)PyTuple_GET_ITEM(args, i); + + self->buffers[i] = item; + Py_INCREF(item); + + if (i > 0) { + self->firstElements[i - 1] = offset; + } + + offset += item->segmentCount; + } + + self->firstElements[size - 1] = offset; + + return 0; +} + +static PyObject * +BufferWithSegmentsCollection_size(ZstdBufferWithSegmentsCollection *self) { + Py_ssize_t i; + Py_ssize_t j; + unsigned long long size = 0; + + for (i = 0; i < self->bufferCount; i++) { + for (j = 0; j < self->buffers[i]->segmentCount; j++) { + size += self->buffers[i]->segments[j].length; + } + } + + return PyLong_FromUnsignedLongLong(size); +} + +Py_ssize_t +BufferWithSegmentsCollection_length(ZstdBufferWithSegmentsCollection *self) { + return self->firstElements[self->bufferCount - 1]; +} + +static ZstdBufferSegment * +BufferWithSegmentsCollection_item(ZstdBufferWithSegmentsCollection *self, + Py_ssize_t i) { + Py_ssize_t bufferOffset; + + if (i < 0) { + PyErr_SetString(PyExc_IndexError, "offset must be non-negative"); + return NULL; + } + + if (i >= BufferWithSegmentsCollection_length(self)) { + PyErr_Format(PyExc_IndexError, "offset must be less than %zd", + BufferWithSegmentsCollection_length(self)); + return NULL; + } + + for (bufferOffset = 0; bufferOffset < self->bufferCount; bufferOffset++) { + Py_ssize_t offset = 0; + + if (i < self->firstElements[bufferOffset]) { + if (bufferOffset > 0) { + offset = self->firstElements[bufferOffset - 1]; + } + + return BufferWithSegments_item(self->buffers[bufferOffset], + i - offset); + } + } + + PyErr_SetString(ZstdError, + "error resolving segment; this should not happen"); + return NULL; +} + +static PyMethodDef BufferWithSegmentsCollection_methods[] = { + {"size", (PyCFunction)BufferWithSegmentsCollection_size, METH_NOARGS, + PyDoc_STR("total size in bytes of all segments")}, + {NULL, NULL}}; + +PyType_Slot ZstdBufferWithSegmentsCollectionSlots[] = { + {Py_tp_dealloc, BufferWithSegmentsCollection_dealloc}, + {Py_sq_length, BufferWithSegmentsCollection_length}, + {Py_sq_item, BufferWithSegmentsCollection_item}, + {Py_tp_methods, BufferWithSegmentsCollection_methods}, + {Py_tp_init, BufferWithSegmentsCollection_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdBufferWithSegmentsCollectionSpec = { + "zstd.BufferWithSegmentsCollection", + sizeof(ZstdBufferWithSegmentsCollection), + 0, + Py_TPFLAGS_DEFAULT, + ZstdBufferWithSegmentsCollectionSlots, +}; + +PyTypeObject *ZstdBufferWithSegmentsCollectionType; + +void bufferutil_module_init(PyObject *mod) { + ZstdBufferWithSegmentsType = + (PyTypeObject *)PyType_FromSpec(&ZstdBufferWithSegmentsSpec); +#if PY_VERSION_HEX < 0x03090000 + ZstdBufferWithSegmentsType->tp_as_buffer = &BufferWithSegments_as_buffer; +#endif + if (PyType_Ready(ZstdBufferWithSegmentsType) < 0) { + return; + } + + Py_INCREF(ZstdBufferWithSegmentsType); + PyModule_AddObject(mod, "BufferWithSegments", + (PyObject *)ZstdBufferWithSegmentsType); + + ZstdBufferSegmentsType = + (PyTypeObject *)PyType_FromSpec(&ZstdBufferSegmentsSpec); +#if PY_VERSION_HEX < 0x03090000 + ZstdBufferSegmentsType->tp_as_buffer = &BufferSegments_as_buffer; +#endif + if (PyType_Ready(ZstdBufferSegmentsType) < 0) { + return; + } + + Py_INCREF(ZstdBufferSegmentsType); + PyModule_AddObject(mod, "BufferSegments", + (PyObject *)ZstdBufferSegmentsType); + + ZstdBufferSegmentType = + (PyTypeObject *)PyType_FromSpec(&ZstdBufferSegmentSpec); +#if PY_VERSION_HEX < 0x03090000 + ZstdBufferSegmentType->tp_as_buffer = &BufferSegment_as_buffer; +#endif + if (PyType_Ready(ZstdBufferSegmentType) < 0) { + return; + } + + Py_INCREF(ZstdBufferSegmentType); + PyModule_AddObject(mod, "BufferSegment", (PyObject *)ZstdBufferSegmentType); + + ZstdBufferWithSegmentsCollectionType = + (PyTypeObject *)PyType_FromSpec(&ZstdBufferWithSegmentsCollectionSpec); + if (PyType_Ready(ZstdBufferWithSegmentsCollectionType) < 0) { + return; + } + + Py_INCREF(ZstdBufferWithSegmentsCollectionType); + PyModule_AddObject(mod, "BufferWithSegmentsCollection", + (PyObject *)ZstdBufferWithSegmentsCollectionType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressionchunker.c b/contrib/python/zstandard/py3/c-ext/compressionchunker.c new file mode 100644 index 00000000000..bdf6f5d8ee5 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressionchunker.c @@ -0,0 +1,310 @@ +/** + * Copyright (c) 2018-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void +ZstdCompressionChunkerIterator_dealloc(ZstdCompressionChunkerIterator *self) { + Py_XDECREF(self->chunker); + + PyObject_Del(self); +} + +static PyObject *ZstdCompressionChunkerIterator_iter(PyObject *self) { + Py_INCREF(self); + return self; +} + +static PyObject * +ZstdCompressionChunkerIterator_iternext(ZstdCompressionChunkerIterator *self) { + size_t zresult; + PyObject *chunk; + ZstdCompressionChunker *chunker = self->chunker; + ZSTD_EndDirective zFlushMode; + + if (self->mode != compressionchunker_mode_normal && + chunker->input.pos != chunker->input.size) { + PyErr_SetString(ZstdError, "input should have been fully consumed " + "before calling flush() or finish()"); + return NULL; + } + + if (chunker->finished) { + return NULL; + } + + /* If we have data left in the input, consume it. */ + while (chunker->input.pos < chunker->input.size) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output, + &chunker->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* Input is fully consumed. */ + if (chunker->input.pos == chunker->input.size) { + chunker->input.src = NULL; + chunker->input.pos = 0; + chunker->input.size = 0; + PyBuffer_Release(&chunker->inBuffer); + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + /* If it produced a full output chunk, emit it. */ + if (chunker->output.pos == chunker->output.size) { + chunk = PyBytes_FromStringAndSize(chunker->output.dst, + chunker->output.pos); + if (!chunk) { + return NULL; + } + + chunker->output.pos = 0; + + return chunk; + } + + /* Else continue to compress available input data. */ + } + + /* We also need this here for the special case of an empty input buffer. */ + if (chunker->input.pos == chunker->input.size) { + chunker->input.src = NULL; + chunker->input.pos = 0; + chunker->input.size = 0; + PyBuffer_Release(&chunker->inBuffer); + } + + /* No more input data. A partial chunk may be in chunker->output. + * If we're in normal compression mode, we're done. Otherwise if we're in + * flush or finish mode, we need to emit what data remains. + */ + if (self->mode == compressionchunker_mode_normal) { + /* We don't need to set StopIteration. */ + return NULL; + } + + if (self->mode == compressionchunker_mode_flush) { + zFlushMode = ZSTD_e_flush; + } + else if (self->mode == compressionchunker_mode_finish) { + zFlushMode = ZSTD_e_end; + } + else { + PyErr_SetString(ZstdError, + "unhandled compression mode; this should never happen"); + return NULL; + } + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output, + &chunker->input, zFlushMode); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (!zresult && chunker->output.pos == 0) { + return NULL; + } + + chunk = PyBytes_FromStringAndSize(chunker->output.dst, chunker->output.pos); + if (!chunk) { + return NULL; + } + + chunker->output.pos = 0; + + if (!zresult && self->mode == compressionchunker_mode_finish) { + chunker->finished = 1; + } + + return chunk; +} + +PyType_Slot ZstdCompressionChunkerIteratorSlots[] = { + {Py_tp_dealloc, ZstdCompressionChunkerIterator_dealloc}, + {Py_tp_iter, ZstdCompressionChunkerIterator_iter}, + {Py_tp_iternext, ZstdCompressionChunkerIterator_iternext}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionChunkerIteratorSpec = { + "zstd.ZstdCompressionChunkerIterator", + sizeof(ZstdCompressionChunkerIterator), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionChunkerIteratorSlots, +}; + +PyTypeObject *ZstdCompressionChunkerIteratorType; + +static void ZstdCompressionChunker_dealloc(ZstdCompressionChunker *self) { + PyBuffer_Release(&self->inBuffer); + self->input.src = NULL; + + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + Py_XDECREF(self->compressor); + + PyObject_Del(self); +} + +static ZstdCompressionChunkerIterator * +ZstdCompressionChunker_compress(ZstdCompressionChunker *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + ZstdCompressionChunkerIterator *result; + + if (self->finished) { + PyErr_SetString(ZstdError, + "cannot call compress() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, "cannot perform operation before consuming " + "output from previous operation"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:compress", kwlist, + &self->inBuffer)) { + return NULL; + } + + result = (ZstdCompressionChunkerIterator *)PyObject_CallObject( + (PyObject *)ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + PyBuffer_Release(&self->inBuffer); + return NULL; + } + + self->input.src = self->inBuffer.buf; + self->input.size = self->inBuffer.len; + self->input.pos = 0; + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_normal; + + return result; +} + +static ZstdCompressionChunkerIterator * +ZstdCompressionChunker_finish(ZstdCompressionChunker *self) { + ZstdCompressionChunkerIterator *result; + + if (self->finished) { + PyErr_SetString(ZstdError, + "cannot call finish() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, "cannot call finish() before consuming " + "output from previous operation"); + return NULL; + } + + result = (ZstdCompressionChunkerIterator *)PyObject_CallObject( + (PyObject *)ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + return NULL; + } + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_finish; + + return result; +} + +static ZstdCompressionChunkerIterator * +ZstdCompressionChunker_flush(ZstdCompressionChunker *self, PyObject *args, + PyObject *kwargs) { + ZstdCompressionChunkerIterator *result; + + if (self->finished) { + PyErr_SetString(ZstdError, + "cannot call flush() after compression finished"); + return NULL; + } + + if (self->inBuffer.obj) { + PyErr_SetString(ZstdError, "cannot call flush() before consuming " + "output from previous operation"); + return NULL; + } + + result = (ZstdCompressionChunkerIterator *)PyObject_CallObject( + (PyObject *)ZstdCompressionChunkerIteratorType, NULL); + if (!result) { + return NULL; + } + + result->chunker = self; + Py_INCREF(result->chunker); + + result->mode = compressionchunker_mode_flush; + + return result; +} + +static PyMethodDef ZstdCompressionChunker_methods[] = { + {"compress", (PyCFunction)ZstdCompressionChunker_compress, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("compress data")}, + {"finish", (PyCFunction)ZstdCompressionChunker_finish, METH_NOARGS, + PyDoc_STR("finish compression operation")}, + {"flush", (PyCFunction)ZstdCompressionChunker_flush, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("finish compression operation")}, + {NULL, NULL}}; + +PyType_Slot ZstdCompressionChunkerSlots[] = { + {Py_tp_dealloc, ZstdCompressionChunker_dealloc}, + {Py_tp_methods, ZstdCompressionChunker_methods}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionChunkerSpec = { + "zstd.ZstdCompressionChunkerType", + sizeof(ZstdCompressionChunker), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionChunkerSlots, +}; + +PyTypeObject *ZstdCompressionChunkerType; + +void compressionchunker_module_init(PyObject *module) { + ZstdCompressionChunkerIteratorType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionChunkerIteratorSpec); + if (PyType_Ready(ZstdCompressionChunkerIteratorType) < 0) { + return; + } + + ZstdCompressionChunkerType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionChunkerSpec); + if (PyType_Ready(ZstdCompressionChunkerType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py3/c-ext/compressiondict.c b/contrib/python/zstandard/py3/c-ext/compressiondict.c new file mode 100644 index 00000000000..437601202b8 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressiondict.c @@ -0,0 +1,348 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +ZstdCompressionDict *train_dictionary(PyObject *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = { + "dict_size", "samples", "k", "d", + "f", "split_point", "accel", "notifications", + "dict_id", "level", "steps", "threads", + NULL}; + + size_t capacity; + PyObject *samples; + unsigned k = 0; + unsigned d = 0; + unsigned f = 0; + double splitPoint = 0.0; + unsigned accel = 0; + unsigned notifications = 0; + unsigned dictID = 0; + int level = 0; + unsigned steps = 0; + int threads = 0; + ZDICT_fastCover_params_t params; + Py_ssize_t samplesLen; + Py_ssize_t i; + size_t samplesSize = 0; + void *sampleBuffer = NULL; + size_t *sampleSizes = NULL; + void *sampleOffset; + Py_ssize_t sampleSize; + void *dict = NULL; + size_t zresult; + ZstdCompressionDict *result = NULL; + + if (!PyArg_ParseTupleAndKeywords( + args, kwargs, "nO!|IIIdIIIiIi:train_dictionary", kwlist, &capacity, + &PyList_Type, &samples, &k, &d, &f, &splitPoint, &accel, + ¬ifications, &dictID, &level, &steps, &threads)) { + return NULL; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (!steps && !threads) { + /* Defaults from ZDICT_trainFromBuffer() */ + d = d ? d : 8; + steps = steps ? steps : 4; + level = level ? level : 3; + } + + memset(¶ms, 0, sizeof(params)); + params.k = k; + params.d = d; + params.f = f; + params.steps = steps; + params.nbThreads = threads; + params.splitPoint = splitPoint; + params.accel = accel; + + params.zParams.compressionLevel = level; + params.zParams.dictID = dictID; + params.zParams.notificationLevel = notifications; + + /* Figure out total size of input samples. */ + samplesLen = PyList_Size(samples); + for (i = 0; i < samplesLen; i++) { + PyObject *sampleItem = PyList_GET_ITEM(samples, i); + + if (!PyBytes_Check(sampleItem)) { + PyErr_SetString(PyExc_ValueError, "samples must be bytes"); + return NULL; + } + samplesSize += PyBytes_GET_SIZE(sampleItem); + } + + sampleBuffer = PyMem_Malloc(samplesSize); + if (!sampleBuffer) { + PyErr_NoMemory(); + goto finally; + } + + sampleSizes = PyMem_Malloc(samplesLen * sizeof(size_t)); + if (!sampleSizes) { + PyErr_NoMemory(); + goto finally; + } + + sampleOffset = sampleBuffer; + for (i = 0; i < samplesLen; i++) { + PyObject *sampleItem = PyList_GET_ITEM(samples, i); + sampleSize = PyBytes_GET_SIZE(sampleItem); + sampleSizes[i] = sampleSize; + memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize); + sampleOffset = (char *)sampleOffset + sampleSize; + } + + dict = PyMem_Malloc(capacity); + if (!dict) { + PyErr_NoMemory(); + goto finally; + } + + Py_BEGIN_ALLOW_THREADS zresult = ZDICT_optimizeTrainFromBuffer_fastCover( + dict, capacity, sampleBuffer, sampleSizes, (unsigned)samplesLen, + ¶ms); + Py_END_ALLOW_THREADS + + if (ZDICT_isError(zresult)) { + PyMem_Free(dict); + PyErr_Format(ZstdError, "cannot train dict: %s", + ZDICT_getErrorName(zresult)); + goto finally; + } + + result = PyObject_New(ZstdCompressionDict, ZstdCompressionDictType); + if (!result) { + PyMem_Free(dict); + goto finally; + } + + result->dictData = dict; + result->dictSize = zresult; + result->dictType = ZSTD_dct_fullDict; + result->d = params.d; + result->k = params.k; + result->cdict = NULL; + result->ddict = NULL; + +finally: + PyMem_Free(sampleBuffer); + PyMem_Free(sampleSizes); + + return result; +} + +int ensure_ddict(ZstdCompressionDict *dict) { + if (dict->ddict) { + return 0; + } + + Py_BEGIN_ALLOW_THREADS dict->ddict = ZSTD_createDDict_advanced( + dict->dictData, dict->dictSize, ZSTD_dlm_byRef, dict->dictType, + ZSTD_defaultCMem); + Py_END_ALLOW_THREADS if (!dict->ddict) { + PyErr_SetString(ZstdError, "could not create decompression dict"); + return 1; + } + + return 0; +} + +static int ZstdCompressionDict_init(ZstdCompressionDict *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", "dict_type", NULL}; + + int result = -1; + Py_buffer source; + unsigned dictType = ZSTD_dct_auto; + + self->dictData = NULL; + self->dictSize = 0; + self->cdict = NULL; + self->ddict = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|I:ZstdCompressionDict", + kwlist, &source, &dictType)) { + return -1; + } + + if (dictType != ZSTD_dct_auto && dictType != ZSTD_dct_rawContent && + dictType != ZSTD_dct_fullDict) { + PyErr_Format( + PyExc_ValueError, + "invalid dictionary load mode: %d; must use DICT_TYPE_* constants", + dictType); + goto finally; + } + + self->dictType = dictType; + + self->dictData = PyMem_Malloc(source.len); + if (!self->dictData) { + PyErr_NoMemory(); + goto finally; + } + + memcpy(self->dictData, source.buf, source.len); + self->dictSize = source.len; + + result = 0; + +finally: + PyBuffer_Release(&source); + return result; +} + +static void ZstdCompressionDict_dealloc(ZstdCompressionDict *self) { + if (self->cdict) { + ZSTD_freeCDict(self->cdict); + self->cdict = NULL; + } + + if (self->ddict) { + ZSTD_freeDDict(self->ddict); + self->ddict = NULL; + } + + if (self->dictData) { + PyMem_Free(self->dictData); + self->dictData = NULL; + } + + PyObject_Del(self); +} + +static PyObject * +ZstdCompressionDict_precompute_compress(ZstdCompressionDict *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"level", "compression_params", NULL}; + + int level = 0; + ZstdCompressionParametersObject *compressionParams = NULL; + ZSTD_compressionParameters cParams; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords( + args, kwargs, "|iO!:precompute_compress", kwlist, &level, + ZstdCompressionParametersType, &compressionParams)) { + return NULL; + } + + if (level && compressionParams) { + PyErr_SetString(PyExc_ValueError, + "must only specify one of level or compression_params"); + return NULL; + } + + if (!level && !compressionParams) { + PyErr_SetString(PyExc_ValueError, + "must specify one of level or compression_params"); + return NULL; + } + + if (self->cdict) { + zresult = ZSTD_freeCDict(self->cdict); + self->cdict = NULL; + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to free CDict: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + } + + if (level) { + cParams = ZSTD_getCParams(level, 0, self->dictSize); + } + else { + if (to_cparams(compressionParams, &cParams)) { + return NULL; + } + } + + assert(!self->cdict); + self->cdict = ZSTD_createCDict_advanced(self->dictData, self->dictSize, + ZSTD_dlm_byRef, self->dictType, + cParams, ZSTD_defaultCMem); + + if (!self->cdict) { + PyErr_SetString(ZstdError, "unable to precompute dictionary"); + return NULL; + } + + Py_RETURN_NONE; +} + +static PyObject *ZstdCompressionDict_dict_id(ZstdCompressionDict *self) { + unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize); + + return PyLong_FromLong(dictID); +} + +static PyObject *ZstdCompressionDict_as_bytes(ZstdCompressionDict *self) { + return PyBytes_FromStringAndSize(self->dictData, self->dictSize); +} + +static PyMethodDef ZstdCompressionDict_methods[] = { + {"dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS, + PyDoc_STR("dict_id() -- obtain the numeric dictionary ID")}, + {"as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS, + PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary " + "data")}, + {"precompute_compress", + (PyCFunction)ZstdCompressionDict_precompute_compress, + METH_VARARGS | METH_KEYWORDS, NULL}, + {NULL, NULL}}; + +static PyMemberDef ZstdCompressionDict_members[] = { + {"k", T_UINT, offsetof(ZstdCompressionDict, k), READONLY, "segment size"}, + {"d", T_UINT, offsetof(ZstdCompressionDict, d), READONLY, "dmer size"}, + {NULL}}; + +static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict *self) { + return self->dictSize; +} + +PyType_Slot ZstdCompressionDictSlots[] = { + {Py_tp_dealloc, ZstdCompressionDict_dealloc}, + {Py_sq_length, ZstdCompressionDict_length}, + {Py_tp_methods, ZstdCompressionDict_methods}, + {Py_tp_members, ZstdCompressionDict_members}, + {Py_tp_init, ZstdCompressionDict_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionDictSpec = { + "zstd.ZstdCompressionDict", + sizeof(ZstdCompressionDict), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionDictSlots, +}; + +PyTypeObject *ZstdCompressionDictType; + +void compressiondict_module_init(PyObject *mod) { + ZstdCompressionDictType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionDictSpec); + if (PyType_Ready(ZstdCompressionDictType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdCompressionDictType); + PyModule_AddObject(mod, "ZstdCompressionDict", + (PyObject *)ZstdCompressionDictType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressionparams.c b/contrib/python/zstandard/py3/c-ext/compressionparams.c new file mode 100644 index 00000000000..0cc7cbbe6e4 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressionparams.c @@ -0,0 +1,501 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +int set_parameter(ZSTD_CCtx_params *params, ZSTD_cParameter param, int value) { + size_t zresult = ZSTD_CCtxParams_setParameter(params, param, value); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "unable to set compression context parameter: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + return 0; +} + +#define TRY_SET_PARAMETER(params, param, value) \ + if (set_parameter(params, param, value)) \ + return -1; + +#define TRY_COPY_PARAMETER(source, dest, param) \ + { \ + int result; \ + size_t zresult = ZSTD_CCtxParams_getParameter(source, param, &result); \ + if (ZSTD_isError(zresult)) { \ + return 1; \ + } \ + zresult = ZSTD_CCtxParams_setParameter(dest, param, result); \ + if (ZSTD_isError(zresult)) { \ + return 1; \ + } \ + } + +int set_parameters(ZSTD_CCtx_params *params, + ZstdCompressionParametersObject *obj) { + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_nbWorkers); + + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_format); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_compressionLevel); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_windowLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_hashLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_chainLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_searchLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_minMatch); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_targetLength); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_strategy); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_contentSizeFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_checksumFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_dictIDFlag); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_jobSize); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_overlapLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_forceMaxWindow); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_enableLongDistanceMatching); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmMinMatch); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmBucketSizeLog); + TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashRateLog); + + return 0; +} + +int reset_params(ZstdCompressionParametersObject *params) { + if (params->params) { + ZSTD_CCtxParams_reset(params->params); + } + else { + params->params = ZSTD_createCCtxParams(); + if (!params->params) { + PyErr_NoMemory(); + return 1; + } + } + + return set_parameters(params->params, params); +} + +#define TRY_GET_PARAMETER(params, param, value) \ + { \ + size_t zresult = ZSTD_CCtxParams_getParameter(params, param, value); \ + if (ZSTD_isError(zresult)) { \ + PyErr_Format(ZstdError, "unable to retrieve parameter: %s", \ + ZSTD_getErrorName(zresult)); \ + return 1; \ + } \ + } + +int to_cparams(ZstdCompressionParametersObject *params, + ZSTD_compressionParameters *cparams) { + int value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_windowLog, &value); + cparams->windowLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_chainLog, &value); + cparams->chainLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_hashLog, &value); + cparams->hashLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_searchLog, &value); + cparams->searchLog = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_minMatch, &value); + cparams->minMatch = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_targetLength, &value); + cparams->targetLength = value; + + TRY_GET_PARAMETER(params->params, ZSTD_c_strategy, &value); + cparams->strategy = value; + + return 0; +} + +static int ZstdCompressionParameters_init(ZstdCompressionParametersObject *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"format", + "compression_level", + "window_log", + "hash_log", + "chain_log", + "search_log", + "min_match", + "target_length", + "strategy", + "write_content_size", + "write_checksum", + "write_dict_id", + "job_size", + "overlap_log", + "force_max_window", + "enable_ldm", + "ldm_hash_log", + "ldm_min_match", + "ldm_bucket_size_log", + "ldm_hash_rate_log", + "threads", + NULL}; + + int format = 0; + int compressionLevel = 0; + int windowLog = 0; + int hashLog = 0; + int chainLog = 0; + int searchLog = 0; + int minMatch = 0; + int targetLength = 0; + int strategy = -1; + int contentSizeFlag = 1; + int checksumFlag = 0; + int dictIDFlag = 0; + int jobSize = 0; + int overlapLog = -1; + int forceMaxWindow = 0; + int enableLDM = 0; + int ldmHashLog = 0; + int ldmMinMatch = 0; + int ldmBucketSizeLog = 0; + int ldmHashRateLog = -1; + int threads = 0; + + if (!PyArg_ParseTupleAndKeywords( + args, kwargs, "|iiiiiiiiiiiiiiiiiiiii:ZstdCompressionParameters", + kwlist, &format, &compressionLevel, &windowLog, &hashLog, &chainLog, + &searchLog, &minMatch, &targetLength, &strategy, &contentSizeFlag, + &checksumFlag, &dictIDFlag, &jobSize, &overlapLog, &forceMaxWindow, + &enableLDM, &ldmHashLog, &ldmMinMatch, &ldmBucketSizeLog, + &ldmHashRateLog, &threads)) { + return -1; + } + + if (reset_params(self)) { + return -1; + } + + if (threads < 0) { + threads = cpu_count(); + } + + /* We need to set ZSTD_c_nbWorkers before ZSTD_c_jobSize and + * ZSTD_c_overlapLog because setting ZSTD_c_nbWorkers resets the other + * parameters. */ + TRY_SET_PARAMETER(self->params, ZSTD_c_nbWorkers, threads); + + TRY_SET_PARAMETER(self->params, ZSTD_c_format, format); + TRY_SET_PARAMETER(self->params, ZSTD_c_compressionLevel, compressionLevel); + TRY_SET_PARAMETER(self->params, ZSTD_c_windowLog, windowLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_hashLog, hashLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_chainLog, chainLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_searchLog, searchLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_minMatch, minMatch); + TRY_SET_PARAMETER(self->params, ZSTD_c_targetLength, targetLength); + + if (strategy == -1) { + strategy = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_strategy, strategy); + TRY_SET_PARAMETER(self->params, ZSTD_c_contentSizeFlag, contentSizeFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_checksumFlag, checksumFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_dictIDFlag, dictIDFlag); + TRY_SET_PARAMETER(self->params, ZSTD_c_jobSize, jobSize); + + if (overlapLog == -1) { + overlapLog = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_overlapLog, overlapLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_forceMaxWindow, forceMaxWindow); + TRY_SET_PARAMETER(self->params, ZSTD_c_enableLongDistanceMatching, + enableLDM); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashLog, ldmHashLog); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmMinMatch, ldmMinMatch); + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmBucketSizeLog, ldmBucketSizeLog); + + if (ldmHashRateLog == -1) { + ldmHashRateLog = 0; + } + + TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashRateLog, ldmHashRateLog); + + return 0; +} + +ZstdCompressionParametersObject * +CompressionParameters_from_level(PyObject *undef, PyObject *args, + PyObject *kwargs) { + int managedKwargs = 0; + int level; + PyObject *sourceSize = NULL; + PyObject *dictSize = NULL; + unsigned PY_LONG_LONG iSourceSize = 0; + Py_ssize_t iDictSize = 0; + PyObject *val; + ZSTD_compressionParameters params; + ZstdCompressionParametersObject *result = NULL; + int res; + + if (!PyArg_ParseTuple(args, "i:from_level", &level)) { + return NULL; + } + + if (!kwargs) { + kwargs = PyDict_New(); + if (!kwargs) { + return NULL; + } + managedKwargs = 1; + } + + sourceSize = PyDict_GetItemString(kwargs, "source_size"); + if (sourceSize) { + iSourceSize = PyLong_AsUnsignedLongLong(sourceSize); + if (iSourceSize == (unsigned PY_LONG_LONG)(-1)) { + goto cleanup; + } + + PyDict_DelItemString(kwargs, "source_size"); + } + + dictSize = PyDict_GetItemString(kwargs, "dict_size"); + if (dictSize) { + iDictSize = PyLong_AsSsize_t(dictSize); + if (iDictSize == -1) { + goto cleanup; + } + + PyDict_DelItemString(kwargs, "dict_size"); + } + + params = ZSTD_getCParams(level, iSourceSize, iDictSize); + + /* Values derived from the input level and sizes are passed along to the + constructor. But only if a value doesn't already exist. */ + val = PyDict_GetItemString(kwargs, "window_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.windowLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "window_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "chain_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.chainLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "chain_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "hash_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.hashLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "hash_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "search_log"); + if (!val) { + val = PyLong_FromUnsignedLong(params.searchLog); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "search_log", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "min_match"); + if (!val) { + val = PyLong_FromUnsignedLong(params.minMatch); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "min_match", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "target_length"); + if (!val) { + val = PyLong_FromUnsignedLong(params.targetLength); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "target_length", val); + Py_DECREF(val); + } + + val = PyDict_GetItemString(kwargs, "strategy"); + if (!val) { + val = PyLong_FromUnsignedLong(params.strategy); + if (!val) { + goto cleanup; + } + PyDict_SetItemString(kwargs, "strategy", val); + Py_DECREF(val); + } + + result = PyObject_New(ZstdCompressionParametersObject, + ZstdCompressionParametersType); + if (!result) { + goto cleanup; + } + + result->params = NULL; + + val = PyTuple_New(0); + if (!val) { + Py_CLEAR(result); + goto cleanup; + } + + res = ZstdCompressionParameters_init(result, val, kwargs); + Py_DECREF(val); + + if (res) { + Py_CLEAR(result); + goto cleanup; + } + +cleanup: + if (managedKwargs) { + Py_DECREF(kwargs); + } + + return result; +} + +PyObject *ZstdCompressionParameters_estimated_compression_context_size( + ZstdCompressionParametersObject *self) { + return PyLong_FromSize_t( + ZSTD_estimateCCtxSize_usingCCtxParams(self->params)); +} + +static void +ZstdCompressionParameters_dealloc(ZstdCompressionParametersObject *self) { + if (self->params) { + ZSTD_freeCCtxParams(self->params); + self->params = NULL; + } + + PyObject_Del(self); +} + +#define PARAM_GETTER(name, param) \ + PyObject *ZstdCompressionParameters_get_##name(PyObject *self, \ + void *unused) { \ + int result; \ + size_t zresult; \ + ZstdCompressionParametersObject *p = \ + (ZstdCompressionParametersObject *)(self); \ + zresult = ZSTD_CCtxParams_getParameter(p->params, param, &result); \ + if (ZSTD_isError(zresult)) { \ + PyErr_Format(ZstdError, "unable to get compression parameter: %s", \ + ZSTD_getErrorName(zresult)); \ + return NULL; \ + } \ + return PyLong_FromLong(result); \ + } + +PARAM_GETTER(format, ZSTD_c_format) +PARAM_GETTER(compression_level, ZSTD_c_compressionLevel) +PARAM_GETTER(window_log, ZSTD_c_windowLog) +PARAM_GETTER(hash_log, ZSTD_c_hashLog) +PARAM_GETTER(chain_log, ZSTD_c_chainLog) +PARAM_GETTER(search_log, ZSTD_c_searchLog) +PARAM_GETTER(min_match, ZSTD_c_minMatch) +PARAM_GETTER(target_length, ZSTD_c_targetLength) +PARAM_GETTER(strategy, ZSTD_c_strategy) +PARAM_GETTER(write_content_size, ZSTD_c_contentSizeFlag) +PARAM_GETTER(write_checksum, ZSTD_c_checksumFlag) +PARAM_GETTER(write_dict_id, ZSTD_c_dictIDFlag) +PARAM_GETTER(job_size, ZSTD_c_jobSize) +PARAM_GETTER(overlap_log, ZSTD_c_overlapLog) +PARAM_GETTER(force_max_window, ZSTD_c_forceMaxWindow) +PARAM_GETTER(enable_ldm, ZSTD_c_enableLongDistanceMatching) +PARAM_GETTER(ldm_hash_log, ZSTD_c_ldmHashLog) +PARAM_GETTER(ldm_min_match, ZSTD_c_ldmMinMatch) +PARAM_GETTER(ldm_bucket_size_log, ZSTD_c_ldmBucketSizeLog) +PARAM_GETTER(ldm_hash_rate_log, ZSTD_c_ldmHashRateLog) +PARAM_GETTER(threads, ZSTD_c_nbWorkers) + +static PyMethodDef ZstdCompressionParameters_methods[] = { + {"from_level", (PyCFunction)CompressionParameters_from_level, + METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, + {"estimated_compression_context_size", + (PyCFunction)ZstdCompressionParameters_estimated_compression_context_size, + METH_NOARGS, NULL}, + {NULL, NULL}}; + +#define GET_SET_ENTRY(name) \ + { #name, ZstdCompressionParameters_get_##name, NULL, NULL, NULL } + +static PyGetSetDef ZstdCompressionParameters_getset[] = { + GET_SET_ENTRY(format), + GET_SET_ENTRY(compression_level), + GET_SET_ENTRY(window_log), + GET_SET_ENTRY(hash_log), + GET_SET_ENTRY(chain_log), + GET_SET_ENTRY(search_log), + GET_SET_ENTRY(min_match), + GET_SET_ENTRY(target_length), + GET_SET_ENTRY(strategy), + GET_SET_ENTRY(write_content_size), + GET_SET_ENTRY(write_checksum), + GET_SET_ENTRY(write_dict_id), + GET_SET_ENTRY(threads), + GET_SET_ENTRY(job_size), + GET_SET_ENTRY(overlap_log), + GET_SET_ENTRY(force_max_window), + GET_SET_ENTRY(enable_ldm), + GET_SET_ENTRY(ldm_hash_log), + GET_SET_ENTRY(ldm_min_match), + GET_SET_ENTRY(ldm_bucket_size_log), + GET_SET_ENTRY(ldm_hash_rate_log), + {NULL}}; + +PyType_Slot ZstdCompressionParametersSlots[] = { + {Py_tp_dealloc, ZstdCompressionParameters_dealloc}, + {Py_tp_methods, ZstdCompressionParameters_methods}, + {Py_tp_getset, ZstdCompressionParameters_getset}, + {Py_tp_init, ZstdCompressionParameters_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionParametersSpec = { + "zstd.ZstdCompressionParameters", + sizeof(ZstdCompressionParametersObject), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionParametersSlots, +}; + +PyTypeObject *ZstdCompressionParametersType; + +void compressionparams_module_init(PyObject *mod) { + ZstdCompressionParametersType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionParametersSpec); + if (PyType_Ready(ZstdCompressionParametersType) < 0) { + return; + } + + Py_INCREF(ZstdCompressionParametersType); + PyModule_AddObject(mod, "ZstdCompressionParameters", + (PyObject *)ZstdCompressionParametersType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressionreader.c b/contrib/python/zstandard/py3/c-ext/compressionreader.c new file mode 100644 index 00000000000..110ce9be866 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressionreader.c @@ -0,0 +1,812 @@ +/** + * Copyright (c) 2017-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void compressionreader_dealloc(ZstdCompressionReader *self) { + Py_XDECREF(self->compressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + PyObject_Del(self); +} + +static ZstdCompressionReader * +compressionreader_enter(ZstdCompressionReader *self) { + if (self->entered) { + PyErr_SetString(PyExc_ValueError, "cannot __enter__ multiple times"); + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return self; +} + +static PyObject *compressionreader_exit(ZstdCompressionReader *self, + PyObject *args) { + PyObject *exc_type; + PyObject *exc_value; + PyObject *exc_tb; + PyObject *result; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, + &exc_tb)) { + return NULL; + } + + self->entered = 0; + + result = PyObject_CallMethod((PyObject *)self, "close", NULL); + if (NULL == result) { + return NULL; + } + + /* Release resources associated with source. */ + Py_CLEAR(self->reader); + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + Py_CLEAR(self->compressor); + + Py_RETURN_FALSE; +} + +static PyObject *compressionreader_readable(ZstdCompressionReader *self) { + Py_RETURN_TRUE; +} + +static PyObject *compressionreader_writable(ZstdCompressionReader *self) { + Py_RETURN_FALSE; +} + +static PyObject *compressionreader_seekable(ZstdCompressionReader *self) { + Py_RETURN_FALSE; +} + +static PyObject *compressionreader_readline(PyObject *self, PyObject *args) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *compressionreader_readlines(PyObject *self, PyObject *args) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *compressionreader_write(PyObject *self, PyObject *args) { + PyErr_SetString(PyExc_OSError, "stream is not writable"); + return NULL; +} + +static PyObject *compressionreader_writelines(PyObject *self, PyObject *args) { + PyErr_SetString(PyExc_OSError, "stream is not writable"); + return NULL; +} + +static PyObject *compressionreader_isatty(PyObject *self) { + Py_RETURN_FALSE; +} + +static PyObject *compressionreader_flush(PyObject *self) { + Py_RETURN_NONE; +} + +static PyObject *compressionreader_close(ZstdCompressionReader *self) { + if (self->closed) { + Py_RETURN_NONE; + } + + self->closed = 1; + + if (self->closefd && self->reader != NULL && + PyObject_HasAttrString(self->reader, "close")) { + return PyObject_CallMethod(self->reader, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject *compressionreader_tell(ZstdCompressionReader *self) { + /* TODO should this raise OSError since stream isn't seekable? */ + return PyLong_FromUnsignedLongLong(self->bytesCompressed); +} + +int read_compressor_input(ZstdCompressionReader *self) { + if (self->finishedInput) { + return 0; + } + + if (self->input.pos != self->input.size) { + return 0; + } + + if (self->reader) { + Py_buffer buffer; + + assert(self->readResult == NULL); + + self->readResult = + PyObject_CallMethod(self->reader, "read", "k", self->readSize); + + if (NULL == self->readResult) { + return -1; + } + + memset(&buffer, 0, sizeof(buffer)); + + if (0 != + PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) { + return -1; + } + + /* EOF */ + if (0 == buffer.len) { + self->finishedInput = 1; + Py_CLEAR(self->readResult); + } + else { + self->input.src = buffer.buf; + self->input.size = buffer.len; + self->input.pos = 0; + } + + PyBuffer_Release(&buffer); + } + else { + assert(self->buffer.buf); + + self->input.src = self->buffer.buf; + self->input.size = self->buffer.len; + self->input.pos = 0; + } + + return 1; +} + +int compress_input(ZstdCompressionReader *self, ZSTD_outBuffer *output) { + size_t oldPos; + size_t zresult; + + /* If we have data left over, consume it. */ + if (self->input.pos < self->input.size) { + oldPos = output->pos; + + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, output, &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + self->bytesCompressed += output->pos - oldPos; + + /* Input exhausted. Clear out state tracking. */ + if (self->input.pos == self->input.size) { + memset(&self->input, 0, sizeof(self->input)); + Py_CLEAR(self->readResult); + + if (self->buffer.buf) { + self->finishedInput = 1; + } + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return -1; + } + } + + if (output->pos && output->pos == output->size) { + return 1; + } + else { + return 0; + } +} + +static PyObject *compressionreader_read(ZstdCompressionReader *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"size", NULL}; + + Py_ssize_t size = -1; + PyObject *result = NULL; + char *resultBuffer; + Py_ssize_t resultSize; + size_t zresult; + size_t oldPos; + int readResult, compressResult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, + "cannot read negative amounts less than -1"); + return NULL; + } + + if (size == -1) { + return PyObject_CallMethod((PyObject *)self, "readall", NULL); + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + self->output.dst = resultBuffer; + self->output.size = resultSize; + self->output.pos = 0; + +readinput: + + compressResult = compress_input(self, &self->output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult) { + /* There is room in the output. We fall through to below, which will + * either get more input for us or will attempt to end the stream. + */ + } + else if (1 == compressResult) { + memset(&self->output, 0, sizeof(self->output)); + return result; + } + else { + assert(0); + } + + readResult = read_compressor_input(self); + + if (-1 == readResult) { + return NULL; + } + else if (0 == readResult) { + } + else if (1 == readResult) { + } + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* Else EOF */ + oldPos = self->output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + Py_XDECREF(result); + return NULL; + } + + assert(self->output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + if (safe_pybytes_resize(&result, self->output.pos)) { + Py_XDECREF(result); + return NULL; + } + + memset(&self->output, 0, sizeof(self->output)); + + return result; +} + +static PyObject *compressionreader_read1(ZstdCompressionReader *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"size", NULL}; + + Py_ssize_t size = -1; + PyObject *result = NULL; + char *resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + int compressResult; + size_t oldPos; + size_t zresult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n:read1", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, + "cannot read negative amounts less than -1"); + return NULL; + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + if (size == -1) { + size = ZSTD_CStreamOutSize(); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + + /* read1() is supposed to use at most 1 read() from the underlying stream. + However, we can't satisfy this requirement with compression because + not every input will generate output. We /could/ flush the compressor, + but this may not be desirable. We allow multiple read() from the + underlying stream. But unlike read(), we return as soon as output data + is available. + */ + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult || 1 == compressResult) { + } + else { + assert(0); + } + + if (output.pos) { + goto finally; + } + + while (!self->finishedInput) { + int readResult = read_compressor_input(self); + + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult || 1 == readResult) { + } + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == compressResult || 1 == compressResult) { + } + else { + assert(0); + } + + if (output.pos) { + goto finally; + } + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, + &self->input, ZSTD_e_end); + + self->bytesCompressed += output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + Py_XDECREF(result); + return NULL; + } + + if (zresult == 0) { + self->finishedOutput = 1; + } + +finally: + if (result) { + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + } + + return result; +} + +static PyObject *compressionreader_readall(PyObject *self) { + PyObject *chunks = NULL; + PyObject *empty = NULL; + PyObject *result = NULL; + + /* Our strategy is to collect chunks into a list then join all the + * chunks at the end. We could potentially use e.g. an io.BytesIO. But + * this feels simple enough to implement and avoids potentially expensive + * reallocations of large buffers. + */ + chunks = PyList_New(0); + if (NULL == chunks) { + return NULL; + } + + while (1) { + PyObject *chunk = PyObject_CallMethod(self, "read", "i", 1048576); + if (NULL == chunk) { + Py_DECREF(chunks); + return NULL; + } + + if (!PyBytes_Size(chunk)) { + Py_DECREF(chunk); + break; + } + + if (PyList_Append(chunks, chunk)) { + Py_DECREF(chunk); + Py_DECREF(chunks); + return NULL; + } + + Py_DECREF(chunk); + } + + empty = PyBytes_FromStringAndSize("", 0); + if (NULL == empty) { + Py_DECREF(chunks); + return NULL; + } + + result = PyObject_CallMethod(empty, "join", "O", chunks); + + Py_DECREF(empty); + Py_DECREF(chunks); + + return result; +} + +static PyObject *compressionreader_readinto(ZstdCompressionReader *self, + PyObject *args) { + Py_buffer dest; + ZSTD_outBuffer output; + int readResult, compressResult; + PyObject *result = NULL; + size_t zresult; + size_t oldPos; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) { + return NULL; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { + } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + while (!self->finishedInput) { + readResult = read_compressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) { + } + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { + } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, + &self->input, ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + assert(output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject *compressionreader_readinto1(ZstdCompressionReader *self, + PyObject *args) { + Py_buffer dest; + PyObject *result = NULL; + ZSTD_outBuffer output; + int compressResult; + size_t oldPos; + size_t zresult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) { + return NULL; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult || 1 == compressResult) { + } + else { + assert(0); + } + + if (output.pos) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + + while (!self->finishedInput) { + int readResult = read_compressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) { + } + else { + assert(0); + } + + compressResult = compress_input(self, &output); + + if (-1 == compressResult) { + goto finally; + } + else if (0 == compressResult) { + } + else if (1 == compressResult) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + /* If we produced output and we're not done with input, emit + * that output now, as we've hit restrictions of read1(). + */ + if (output.pos && !self->finishedInput) { + result = PyLong_FromSize_t(output.pos); + goto finally; + } + + /* Otherwise we either have no output or we've exhausted the + * input. Either we try to get more input or we fall through + * to EOF below */ + } + + /* EOF */ + oldPos = output.pos; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &output, + &self->input, ZSTD_e_end); + + self->bytesCompressed += self->output.pos - oldPos; + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + assert(output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject *compressionreader_iter(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *compressionreader_iternext(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyMethodDef compressionreader_methods[] = { + {"__enter__", (PyCFunction)compressionreader_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context")}, + {"__exit__", (PyCFunction)compressionreader_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context")}, + {"close", (PyCFunction)compressionreader_close, METH_NOARGS, + PyDoc_STR("Close the stream so it cannot perform any more operations")}, + {"flush", (PyCFunction)compressionreader_flush, METH_NOARGS, + PyDoc_STR("no-ops")}, + {"isatty", (PyCFunction)compressionreader_isatty, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"readable", (PyCFunction)compressionreader_readable, METH_NOARGS, + PyDoc_STR("Returns True")}, + {"read", (PyCFunction)compressionreader_read, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("read compressed data")}, + {"read1", (PyCFunction)compressionreader_read1, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readall", (PyCFunction)compressionreader_readall, METH_NOARGS, + PyDoc_STR("Not implemented")}, + {"readinto", (PyCFunction)compressionreader_readinto, METH_VARARGS, NULL}, + {"readinto1", (PyCFunction)compressionreader_readinto1, METH_VARARGS, NULL}, + {"readline", (PyCFunction)compressionreader_readline, METH_VARARGS, + PyDoc_STR("Not implemented")}, + {"readlines", (PyCFunction)compressionreader_readlines, METH_VARARGS, + PyDoc_STR("Not implemented")}, + {"seekable", (PyCFunction)compressionreader_seekable, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"tell", (PyCFunction)compressionreader_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed")}, + {"writable", (PyCFunction)compressionreader_writable, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"write", compressionreader_write, METH_VARARGS, + PyDoc_STR("Raises OSError")}, + {"writelines", compressionreader_writelines, METH_VARARGS, + PyDoc_STR("Not implemented")}, + {NULL, NULL}}; + +static PyMemberDef compressionreader_members[] = { + {"closed", T_BOOL, offsetof(ZstdCompressionReader, closed), READONLY, + "whether stream is closed"}, + {NULL}}; + +PyType_Slot ZstdCompressionReaderSlots[] = { + {Py_tp_dealloc, compressionreader_dealloc}, + {Py_tp_iter, compressionreader_iter}, + {Py_tp_iternext, compressionreader_iternext}, + {Py_tp_methods, compressionreader_methods}, + {Py_tp_members, compressionreader_members}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionReaderSpec = { + "zstd.ZstdCompressionReader", + sizeof(ZstdCompressionReader), + 0, + Py_TPFLAGS_DEFAULT, + ZstdCompressionReaderSlots, +}; + +PyTypeObject *ZstdCompressionReaderType; + +void compressionreader_module_init(PyObject *mod) { + /* TODO make reader a sub-class of io.RawIOBase */ + + ZstdCompressionReaderType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionReaderSpec); + if (PyType_Ready(ZstdCompressionReaderType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdCompressionReaderType); + PyModule_AddObject(mod, "ZstdCompressionReader", + (PyObject *)ZstdCompressionReaderType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressionwriter.c b/contrib/python/zstandard/py3/c-ext/compressionwriter.c new file mode 100644 index 00000000000..1def1724b9b --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressionwriter.c @@ -0,0 +1,352 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter *self) { + Py_XDECREF(self->compressor); + Py_XDECREF(self->writer); + + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + PyObject_Del(self); +} + +static PyObject *ZstdCompressionWriter_enter(ZstdCompressionWriter *self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject *)self; +} + +static PyObject *ZstdCompressionWriter_exit(ZstdCompressionWriter *self, + PyObject *args) { + PyObject *exc_type; + PyObject *exc_value; + PyObject *exc_tb; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, + &exc_tb)) { + return NULL; + } + + self->entered = 0; + + PyObject *result = PyObject_CallMethod((PyObject *)self, "close", NULL); + + if (NULL == result) { + return NULL; + } + + Py_RETURN_FALSE; +} + +static PyObject * +ZstdCompressionWriter_memory_size(ZstdCompressionWriter *self) { + return PyLong_FromSize_t(ZSTD_sizeof_CCtx(self->compressor->cctx)); +} + +static PyObject *ZstdCompressionWriter_write(ZstdCompressionWriter *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + PyObject *result = NULL; + Py_buffer source; + size_t zresult; + ZSTD_inBuffer input; + PyObject *res; + Py_ssize_t totalWrite = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:write", kwlist, + &source)) { + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->output.pos = 0; + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, &self->output, &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + /* Copy data from output buffer to writer. */ + if (self->output.pos) { + res = PyObject_CallMethod(self->writer, "write", "y#", + self->output.dst, self->output.pos); + if (NULL == res) { + goto finally; + } + Py_XDECREF(res); + totalWrite += self->output.pos; + self->bytesCompressed += self->output.pos; + } + self->output.pos = 0; + } + + if (self->writeReturnRead) { + result = PyLong_FromSize_t(input.pos); + } + else { + result = PyLong_FromSsize_t(totalWrite); + } + +finally: + PyBuffer_Release(&source); + return result; +} + +static PyObject *ZstdCompressionWriter_flush(ZstdCompressionWriter *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"flush_mode", NULL}; + + size_t zresult; + ZSTD_inBuffer input; + PyObject *res; + Py_ssize_t totalWrite = 0; + unsigned flush_mode = 0; + ZSTD_EndDirective flush; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|I:flush", kwlist, + &flush_mode)) { + return NULL; + } + + switch (flush_mode) { + case 0: + flush = ZSTD_e_flush; + break; + case 1: + flush = ZSTD_e_end; + break; + default: + PyErr_Format(PyExc_ValueError, "unknown flush_mode: %d", flush_mode); + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->output.pos = 0; + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, &self->output, &input, flush); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + /* Copy data from output buffer to writer. */ + if (self->output.pos) { + res = PyObject_CallMethod(self->writer, "write", "y#", + self->output.dst, self->output.pos); + if (NULL == res) { + return NULL; + } + Py_XDECREF(res); + totalWrite += self->output.pos; + self->bytesCompressed += self->output.pos; + } + + self->output.pos = 0; + + if (!zresult) { + break; + } + } + + if (!self->closing && PyObject_HasAttrString(self->writer, "flush")) { + res = PyObject_CallMethod(self->writer, "flush", NULL); + if (NULL == res) { + return NULL; + } + Py_XDECREF(res); + } + + return PyLong_FromSsize_t(totalWrite); +} + +static PyObject *ZstdCompressionWriter_close(ZstdCompressionWriter *self) { + PyObject *result; + + if (self->closed) { + Py_RETURN_NONE; + } + + self->closing = 1; + result = PyObject_CallMethod((PyObject *)self, "flush", "I", 1); + self->closing = 0; + self->closed = 1; + + if (NULL == result) { + return NULL; + } + + /* Call close on underlying stream as well. */ + if (self->closefd && PyObject_HasAttrString(self->writer, "close")) { + return PyObject_CallMethod(self->writer, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject *ZstdCompressionWriter_fileno(ZstdCompressionWriter *self) { + if (PyObject_HasAttrString(self->writer, "fileno")) { + return PyObject_CallMethod(self->writer, "fileno", NULL); + } + else { + PyErr_SetString(PyExc_OSError, + "fileno not available on underlying writer"); + return NULL; + } +} + +static PyObject *ZstdCompressionWriter_tell(ZstdCompressionWriter *self) { + return PyLong_FromUnsignedLongLong(self->bytesCompressed); +} + +static PyObject *ZstdCompressionWriter_writelines(PyObject *self, + PyObject *args) { + PyErr_SetNone(PyExc_NotImplementedError); + return NULL; +} + +static PyObject *ZstdCompressionWriter_iter(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *ZstdCompressionWriter_iternext(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *ZstdCompressionWriter_false(PyObject *self, PyObject *args) { + Py_RETURN_FALSE; +} + +static PyObject *ZstdCompressionWriter_true(PyObject *self, PyObject *args) { + Py_RETURN_TRUE; +} + +static PyObject *ZstdCompressionWriter_unsupported(PyObject *self, + PyObject *args, + PyObject *kwargs) { + set_io_unsupported_operation(); + return NULL; +} + +static PyMethodDef ZstdCompressionWriter_methods[] = { + {"__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context.")}, + {"__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context.")}, + {"close", (PyCFunction)ZstdCompressionWriter_close, METH_NOARGS, NULL}, + {"fileno", (PyCFunction)ZstdCompressionWriter_fileno, METH_NOARGS, NULL}, + {"isatty", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL}, + {"readable", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL}, + {"readline", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readlines", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"seek", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"seekable", ZstdCompressionWriter_false, METH_NOARGS, NULL}, + {"truncate", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"writable", ZstdCompressionWriter_true, METH_NOARGS, NULL}, + {"writelines", ZstdCompressionWriter_writelines, METH_VARARGS, NULL}, + {"read", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readall", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readinto", (PyCFunction)ZstdCompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS, + PyDoc_STR("Obtain the memory size of the underlying compressor")}, + {"write", (PyCFunction)ZstdCompressionWriter_write, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("Compress data")}, + {"flush", (PyCFunction)ZstdCompressionWriter_flush, + METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("Flush data and finish a zstd frame")}, + {"tell", (PyCFunction)ZstdCompressionWriter_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed")}, + {NULL, NULL}}; + +static PyMemberDef ZstdCompressionWriter_members[] = { + {"closed", T_BOOL, offsetof(ZstdCompressionWriter, closed), READONLY, NULL}, + {NULL}}; + +PyType_Slot ZstdCompressionWriterSlots[] = { + {Py_tp_dealloc, ZstdCompressionWriter_dealloc}, + {Py_tp_iter, ZstdCompressionWriter_iter}, + {Py_tp_iternext, ZstdCompressionWriter_iternext}, + {Py_tp_methods, ZstdCompressionWriter_methods}, + {Py_tp_members, ZstdCompressionWriter_members}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionWriterSpec = { + "zstd.ZstdCompressionWriter", + sizeof(ZstdCompressionWriter), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionWriterSlots, +}; + +PyTypeObject *ZstdCompressionWriterType; + +void compressionwriter_module_init(PyObject *mod) { + ZstdCompressionWriterType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionWriterSpec); + if (PyType_Ready(ZstdCompressionWriterType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdCompressionWriterType); + PyModule_AddObject(mod, "ZstdCompressionWriter", + (PyObject *)ZstdCompressionWriterType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressobj.c b/contrib/python/zstandard/py3/c-ext/compressobj.c new file mode 100644 index 00000000000..5e224475b45 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressobj.c @@ -0,0 +1,220 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void ZstdCompressionObj_dealloc(ZstdCompressionObj *self) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + Py_XDECREF(self->compressor); + + PyObject_Del(self); +} + +static PyObject *ZstdCompressionObj_compress(ZstdCompressionObj *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + Py_buffer source; + ZSTD_inBuffer input; + size_t zresult; + PyObject *result = NULL; + Py_ssize_t resultSize = 0; + + if (self->finished) { + PyErr_SetString(ZstdError, + "cannot call compress() after compressor finished"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:compress", kwlist, + &source)) { + return NULL; + } + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, &self->output, &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + Py_CLEAR(result); + goto finally; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + + if (safe_pybytes_resize(&result, + resultSize + self->output.pos)) { + Py_CLEAR(result); + goto finally; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, self->output.dst, + self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, + self->output.pos); + if (!result) { + goto finally; + } + } + + self->output.pos = 0; + } + } + + if (NULL == result) { + result = PyBytes_FromString(""); + } + +finally: + PyBuffer_Release(&source); + + return result; +} + +static PyObject *ZstdCompressionObj_flush(ZstdCompressionObj *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"flush_mode", NULL}; + + int flushMode = compressorobj_flush_finish; + size_t zresult; + PyObject *result = NULL; + Py_ssize_t resultSize = 0; + ZSTD_inBuffer input; + ZSTD_EndDirective zFlushMode; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|i:flush", kwlist, + &flushMode)) { + return NULL; + } + + if (flushMode != compressorobj_flush_finish && + flushMode != compressorobj_flush_block) { + PyErr_SetString(PyExc_ValueError, "flush mode not recognized"); + return NULL; + } + + if (self->finished) { + PyErr_SetString(ZstdError, "compressor object already finished"); + return NULL; + } + + switch (flushMode) { + case compressorobj_flush_block: + zFlushMode = ZSTD_e_flush; + break; + + case compressorobj_flush_finish: + zFlushMode = ZSTD_e_end; + self->finished = 1; + break; + + default: + PyErr_SetString(ZstdError, "unhandled flush mode"); + return NULL; + } + + assert(self->output.pos == 0); + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, &self->output, &input, zFlushMode); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + + if (safe_pybytes_resize(&result, + resultSize + self->output.pos)) { + Py_XDECREF(result); + return NULL; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, self->output.dst, + self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, + self->output.pos); + if (!result) { + return NULL; + } + } + + self->output.pos = 0; + } + + if (!zresult) { + break; + } + } + + if (result) { + return result; + } + else { + return PyBytes_FromString(""); + } +} + +static PyMethodDef ZstdCompressionObj_methods[] = { + {"compress", (PyCFunction)ZstdCompressionObj_compress, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("compress data")}, + {"flush", (PyCFunction)ZstdCompressionObj_flush, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("finish compression operation")}, + {NULL, NULL}}; + +PyType_Slot ZstdCompressionObjSlots[] = { + {Py_tp_dealloc, ZstdCompressionObj_dealloc}, + {Py_tp_methods, ZstdCompressionObj_methods}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressionObjSpec = { + "zstd.ZstdCompressionObj", + sizeof(ZstdCompressionObj), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressionObjSlots, +}; + +PyTypeObject *ZstdCompressionObjType; + +void compressobj_module_init(PyObject *module) { + ZstdCompressionObjType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressionObjSpec); + if (PyType_Ready(ZstdCompressionObjType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py3/c-ext/compressor.c b/contrib/python/zstandard/py3/c-ext/compressor.c new file mode 100644 index 00000000000..bbef9fd79eb --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressor.c @@ -0,0 +1,1557 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +int setup_cctx(ZstdCompressor *compressor) { + size_t zresult; + + assert(compressor); + assert(compressor->cctx); + assert(compressor->params); + + zresult = ZSTD_CCtx_setParametersUsingCCtxParams(compressor->cctx, + compressor->params); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not set compression parameters: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + if (compressor->dict) { + if (compressor->dict->cdict) { + zresult = + ZSTD_CCtx_refCDict(compressor->cctx, compressor->dict->cdict); + } + else { + zresult = ZSTD_CCtx_loadDictionary_advanced( + compressor->cctx, compressor->dict->dictData, + compressor->dict->dictSize, ZSTD_dlm_byRef, + compressor->dict->dictType); + } + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not load compression dictionary: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + return 0; +} + +static PyObject *frame_progression(ZSTD_CCtx *cctx) { + PyObject *result = NULL; + PyObject *value; + ZSTD_frameProgression progression; + + result = PyTuple_New(3); + if (!result) { + return NULL; + } + + progression = ZSTD_getFrameProgression(cctx); + + value = PyLong_FromUnsignedLongLong(progression.ingested); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 0, value); + + value = PyLong_FromUnsignedLongLong(progression.consumed); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 1, value); + + value = PyLong_FromUnsignedLongLong(progression.produced); + if (!value) { + Py_DECREF(result); + return NULL; + } + + PyTuple_SET_ITEM(result, 2, value); + + return result; +} + +static int ZstdCompressor_init(ZstdCompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"level", + "dict_data", + "compression_params", + "write_checksum", + "write_content_size", + "write_dict_id", + "threads", + NULL}; + + int level = 3; + PyObject *dict = NULL; + PyObject *params = NULL; + PyObject *writeChecksum = NULL; + PyObject *writeContentSize = NULL; + PyObject *writeDictID = NULL; + int threads = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iOOOOOi:ZstdCompressor", + kwlist, &level, &dict, ¶ms, + &writeChecksum, &writeContentSize, + &writeDictID, &threads)) { + return -1; + } + + if (level > ZSTD_maxCLevel()) { + PyErr_Format(PyExc_ValueError, "level must be less than %d", + ZSTD_maxCLevel() + 1); + return -1; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (dict) { + if (dict == Py_None) { + dict = NULL; + } + else if (!PyObject_IsInstance(dict, + (PyObject *)ZstdCompressionDictType)) { + PyErr_Format(PyExc_TypeError, + "dict_data must be zstd.ZstdCompressionDict"); + return -1; + } + } + + if (params) { + if (params == Py_None) { + params = NULL; + } + else if (!PyObject_IsInstance( + params, (PyObject *)ZstdCompressionParametersType)) { + PyErr_Format( + PyExc_TypeError, + "compression_params must be zstd.ZstdCompressionParameters"); + return -1; + } + } + + if (writeChecksum == Py_None) { + writeChecksum = NULL; + } + if (writeContentSize == Py_None) { + writeContentSize = NULL; + } + if (writeDictID == Py_None) { + writeDictID = NULL; + } + + /* We create a ZSTD_CCtx for reuse among multiple operations to reduce the + overhead of each compression operation. */ + self->cctx = ZSTD_createCCtx(); + if (!self->cctx) { + PyErr_NoMemory(); + return -1; + } + + /* TODO stuff the original parameters away somewhere so we can reset later. + This will allow us to do things like automatically adjust cparams based + on input size (assuming zstd isn't doing that internally). */ + + self->params = ZSTD_createCCtxParams(); + if (!self->params) { + PyErr_NoMemory(); + return -1; + } + + if (params && writeChecksum) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and write_checksum"); + return -1; + } + + if (params && writeContentSize) { + PyErr_SetString( + PyExc_ValueError, + "cannot define compression_params and write_content_size"); + return -1; + } + + if (params && writeDictID) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and write_dict_id"); + return -1; + } + + if (params && threads) { + PyErr_SetString(PyExc_ValueError, + "cannot define compression_params and threads"); + return -1; + } + + if (params) { + if (set_parameters(self->params, + (ZstdCompressionParametersObject *)params)) { + return -1; + } + } + else { + if (set_parameter(self->params, ZSTD_c_compressionLevel, level)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_contentSizeFlag, + writeContentSize ? PyObject_IsTrue(writeContentSize) + : 1)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_checksumFlag, + writeChecksum ? PyObject_IsTrue(writeChecksum) : 0)) { + return -1; + } + + if (set_parameter(self->params, ZSTD_c_dictIDFlag, + writeDictID ? PyObject_IsTrue(writeDictID) : 1)) { + return -1; + } + + if (threads) { + if (set_parameter(self->params, ZSTD_c_nbWorkers, threads)) { + return -1; + } + } + } + + if (dict) { + self->dict = (ZstdCompressionDict *)dict; + Py_INCREF(dict); + } + + if (setup_cctx(self)) { + return -1; + } + + return 0; +} + +static void ZstdCompressor_dealloc(ZstdCompressor *self) { + if (self->cctx) { + ZSTD_freeCCtx(self->cctx); + self->cctx = NULL; + } + + if (self->params) { + ZSTD_freeCCtxParams(self->params); + self->params = NULL; + } + + Py_XDECREF(self->dict); + PyObject_Del(self); +} + +static PyObject *ZstdCompressor_memory_size(ZstdCompressor *self) { + if (self->cctx) { + return PyLong_FromSize_t(ZSTD_sizeof_CCtx(self->cctx)); + } + else { + PyErr_SetString( + ZstdError, "no compressor context found; this should never happen"); + return NULL; + } +} + +static PyObject *ZstdCompressor_frame_progression(ZstdCompressor *self) { + return frame_progression(self->cctx); +} + +static PyObject *ZstdCompressor_copy_stream(ZstdCompressor *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"ifh", "ofh", "size", + "read_size", "write_size", NULL}; + + PyObject *source; + PyObject *dest; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char *readBuffer; + Py_ssize_t readSize; + PyObject *readResult = NULL; + PyObject *res = NULL; + size_t zresult; + PyObject *writeResult; + PyObject *totalReadPy; + PyObject *totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|Kkk:copy_stream", kwlist, + &source, &dest, &sourceSize, &inSize, + &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, + "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, + "second argument must have a write() method"); + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + /* Prevent free on uninitialized memory in finally. */ + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + input.src = NULL; + input.size = 0; + input.pos = 0; + + while (1) { + /* Try to read from source stream. */ + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to compressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->cctx, &output, &input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + res = NULL; + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (output.pos) { + writeResult = PyObject_CallMethod(dest, "write", "y#", + output.dst, output.pos); + if (NULL == writeResult) { + res = NULL; + goto finally; + } + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + + Py_CLEAR(readResult); + } + + /* We've finished reading. Now flush the compressor stream. */ + assert(input.pos == input.size); + + while (1) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_compressStream2(self->cctx, &output, &input, ZSTD_e_end); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { + writeResult = PyObject_CallMethod(dest, "write", "y#", output.dst, + output.pos); + if (NULL == writeResult) { + res = NULL; + goto finally; + } + totalWrite += output.pos; + Py_XDECREF(writeResult); + output.pos = 0; + } + + if (!zresult) { + break; + } + } + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DECREF(totalReadPy); + Py_DECREF(totalWritePy); + +finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + Py_XDECREF(readResult); + + return res; +} + +static ZstdCompressionReader *ZstdCompressor_stream_reader(ZstdCompressor *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"source", "size", "read_size", "closefd", NULL}; + + PyObject *source; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t readSize = ZSTD_CStreamInSize(); + PyObject *closefd = NULL; + ZstdCompressionReader *result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|KkO:stream_reader", + kwlist, &source, &sourceSize, &readSize, + &closefd)) { + return NULL; + } + + result = (ZstdCompressionReader *)PyObject_CallObject( + (PyObject *)ZstdCompressionReaderType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + if (PyObject_HasAttrString(source, "read")) { + result->reader = source; + Py_INCREF(source); + result->readSize = readSize; + } + else if (1 == PyObject_CheckBuffer(source)) { + if (0 != PyObject_GetBuffer(source, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + assert(result->buffer.len >= 0); + + sourceSize = result->buffer.len; + } + else { + PyErr_SetString(PyExc_TypeError, + "must pass an object with a read() method or that " + "conforms to the buffer protocol"); + goto except; + } + + result->closefd = closefd ? PyObject_IsTrue(closefd) : 1; + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source source: %s", + ZSTD_getErrorName(zresult)); + goto except; + } + + result->compressor = self; + Py_INCREF(self); + + return result; + +except: + Py_CLEAR(result); + + return NULL; +} + +static PyObject *ZstdCompressor_compress(ZstdCompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + Py_buffer source; + size_t destSize; + PyObject *output = NULL; + size_t zresult; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|O:compress", kwlist, + &source)) { + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + destSize = ZSTD_compressBound(source.len); + output = PyBytes_FromStringAndSize(NULL, destSize); + if (!output) { + goto finally; + } + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, source.len); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + Py_CLEAR(output); + goto finally; + } + + inBuffer.src = source.buf; + inBuffer.size = source.len; + inBuffer.pos = 0; + + outBuffer.dst = PyBytes_AsString(output); + outBuffer.size = destSize; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS + /* By avoiding ZSTD_compress(), we don't necessarily write out content + size. This means the argument to ZstdCompressor to control frame + parameters is honored. */ + zresult = + ZSTD_compressStream2(self->cctx, &outBuffer, &inBuffer, ZSTD_e_end); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "cannot compress: %s", + ZSTD_getErrorName(zresult)); + Py_CLEAR(output); + goto finally; + } + else if (zresult) { + PyErr_SetString(ZstdError, "unexpected partial frame flush"); + Py_CLEAR(output); + goto finally; + } + + Py_SET_SIZE(output, outBuffer.pos); + +finally: + PyBuffer_Release(&source); + return output; +} + +static ZstdCompressionObj *ZstdCompressor_compressobj(ZstdCompressor *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"size", NULL}; + + unsigned long long inSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressionObj *result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|K:compressobj", kwlist, + &inSize)) { + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, inSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result = (ZstdCompressionObj *)PyObject_CallObject( + (PyObject *)ZstdCompressionObjType, NULL); + if (!result) { + return NULL; + } + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + Py_DECREF(result); + return NULL; + } + result->output.size = outSize; + result->compressor = self; + Py_INCREF(result->compressor); + + return result; +} + +static ZstdCompressorIterator *ZstdCompressor_read_to_iter(ZstdCompressor *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"reader", "size", "read_size", "write_size", NULL}; + + PyObject *reader; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressorIterator *result; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|Kkk:read_to_iter", kwlist, + &reader, &sourceSize, &inSize, &outSize)) { + return NULL; + } + + result = (ZstdCompressorIterator *)PyObject_CallObject( + (PyObject *)ZstdCompressorIteratorType, NULL); + if (!result) { + return NULL; + } + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + if (0 != PyObject_GetBuffer(reader, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + sourceSize = result->buffer.len; + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms " + "to buffer protocol"); + goto except; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result->compressor = self; + Py_INCREF(result->compressor); + + result->inSize = inSize; + result->outSize = outSize; + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + goto except; + } + result->output.size = outSize; + + goto finally; + +except: + Py_CLEAR(result); + +finally: + return result; +} + +static ZstdCompressionWriter *ZstdCompressor_stream_writer(ZstdCompressor *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = { + "writer", "size", "write_size", "write_return_read", "closefd", NULL}; + + PyObject *writer; + ZstdCompressionWriter *result; + size_t zresult; + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t outSize = ZSTD_CStreamOutSize(); + PyObject *writeReturnRead = NULL; + PyObject *closefd = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|KkOO:stream_writer", + kwlist, &writer, &sourceSize, &outSize, + &writeReturnRead, &closefd)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a write() method"); + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + result = (ZstdCompressionWriter *)PyObject_CallObject( + (PyObject *)ZstdCompressionWriterType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closing = 0; + result->closed = 0; + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + Py_DECREF(result); + return (ZstdCompressionWriter *)PyErr_NoMemory(); + } + + result->output.pos = 0; + result->output.size = outSize; + + result->compressor = self; + Py_INCREF(result->compressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->outSize = outSize; + result->bytesCompressed = 0; + result->writeReturnRead = + writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 1; + result->closefd = closefd ? PyObject_IsTrue(closefd) : 1; + + return result; +} + +static ZstdCompressionChunker * +ZstdCompressor_chunker(ZstdCompressor *self, PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"size", "chunk_size", NULL}; + + unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN; + size_t chunkSize = ZSTD_CStreamOutSize(); + ZstdCompressionChunker *chunker; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|Kk:chunker", kwlist, + &sourceSize, &chunkSize)) { + return NULL; + } + + ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only); + + zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error setting source size: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + chunker = (ZstdCompressionChunker *)PyObject_CallObject( + (PyObject *)ZstdCompressionChunkerType, NULL); + if (!chunker) { + return NULL; + } + + chunker->output.dst = PyMem_Malloc(chunkSize); + if (!chunker->output.dst) { + PyErr_NoMemory(); + Py_DECREF(chunker); + return NULL; + } + chunker->output.size = chunkSize; + chunker->output.pos = 0; + + chunker->compressor = self; + Py_INCREF(chunker->compressor); + + chunker->chunkSize = chunkSize; + + return chunker; +} + +typedef struct { + void *sourceData; + size_t sourceSize; +} DataSource; + +typedef struct { + DataSource *sources; + Py_ssize_t sourcesSize; + unsigned long long totalSourceSize; +} DataSources; + +typedef struct { + void *dest; + Py_ssize_t destSize; + BufferSegment *segments; + Py_ssize_t segmentsSize; +} CompressorDestBuffer; + +typedef enum { + CompressorWorkerError_none = 0, + CompressorWorkerError_zstd = 1, + CompressorWorkerError_no_memory = 2, + CompressorWorkerError_nospace = 3, +} CompressorWorkerError; + +/** + * Holds state for an individual worker performing multi_compress_to_buffer + * work. + */ +typedef struct { + /* Used for compression. */ + ZSTD_CCtx *cctx; + + /* What to compress. */ + DataSource *sources; + Py_ssize_t sourcesSize; + Py_ssize_t startOffset; + Py_ssize_t endOffset; + unsigned long long totalSourceSize; + + /* Result storage. */ + CompressorDestBuffer *destBuffers; + Py_ssize_t destCount; + + /* Error tracking. */ + CompressorWorkerError error; + size_t zresult; + Py_ssize_t errorOffset; +} CompressorWorkerState; + +#ifdef HAVE_ZSTD_POOL_APIS +static void compress_worker(CompressorWorkerState *state) { + Py_ssize_t inputOffset = state->startOffset; + Py_ssize_t remainingItems = state->endOffset - state->startOffset + 1; + Py_ssize_t currentBufferStartOffset = state->startOffset; + size_t zresult; + void *newDest; + size_t allocationSize; + size_t boundSize; + Py_ssize_t destOffset = 0; + DataSource *sources = state->sources; + CompressorDestBuffer *destBuffer; + + assert(!state->destBuffers); + assert(0 == state->destCount); + + /* + * The total size of the compressed data is unknown until we actually + * compress data. That means we can't pre-allocate the exact size we need. + * + * There is a cost to every allocation and reallocation. So, it is in our + * interest to minimize the number of allocations. + * + * There is also a cost to too few allocations. If allocations are too + * large they may fail. If buffers are shared and all inputs become + * irrelevant at different lifetimes, then a reference to one segment + * in the buffer will keep the entire buffer alive. This leads to excessive + * memory usage. + * + * Our current strategy is to assume a compression ratio of 16:1 and + * allocate buffers of that size, rounded up to the nearest power of 2 + * (because computers like round numbers). That ratio is greater than what + * most inputs achieve. This is by design: we don't want to over-allocate. + * But we don't want to under-allocate and lead to too many buffers either. + */ + + state->destCount = 1; + + state->destBuffers = calloc(1, sizeof(CompressorDestBuffer)); + if (NULL == state->destBuffers) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* + * Rather than track bounds and grow the segments buffer, allocate space + * to hold remaining items then truncate when we're done with it. + */ + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + + assert(state->totalSourceSize <= SIZE_MAX); + allocationSize = roundpow2((size_t)state->totalSourceSize >> 4); + + /* If the maximum size of the output is larger than that, round up. */ + boundSize = ZSTD_compressBound(sources[inputOffset].sourceSize); + + if (boundSize > allocationSize) { + allocationSize = roundpow2(boundSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->destSize = allocationSize; + + for (inputOffset = state->startOffset; inputOffset <= state->endOffset; + inputOffset++) { + void *source = sources[inputOffset].sourceData; + size_t sourceSize = sources[inputOffset].sourceSize; + size_t destAvailable; + void *dest; + ZSTD_outBuffer opOutBuffer; + ZSTD_inBuffer opInBuffer; + + destAvailable = destBuffer->destSize - destOffset; + boundSize = ZSTD_compressBound(sourceSize); + + /* + * Not enough space in current buffer to hold largest compressed output. + * So allocate and switch to a new output buffer. + */ + if (boundSize > destAvailable) { + /* + * The downsizing of the existing buffer is optional. It should be + * cheap (unlike growing). So we just do it. + */ + if (destAvailable) { + newDest = realloc(destBuffer->dest, destOffset); + if (NULL == newDest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->dest = newDest; + destBuffer->destSize = destOffset; + } + + /* Truncate segments buffer. */ + newDest = realloc(destBuffer->segments, + (inputOffset - currentBufferStartOffset + 1) * + sizeof(BufferSegment)); + if (NULL == newDest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->segments = newDest; + destBuffer->segmentsSize = inputOffset - currentBufferStartOffset; + + /* Grow space for new struct. */ + /* TODO consider over-allocating so we don't do this every time. */ + newDest = + realloc(state->destBuffers, + (state->destCount + 1) * sizeof(CompressorDestBuffer)); + if (NULL == newDest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + state->destBuffers = newDest; + state->destCount++; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* Don't take any chances with non-NULL pointers. */ + memset(destBuffer, 0, sizeof(CompressorDestBuffer)); + + /** + * We could dynamically update allocation size based on work done so + * far. For now, keep is simple. + */ + assert(state->totalSourceSize <= SIZE_MAX); + allocationSize = roundpow2((size_t)state->totalSourceSize >> 4); + + if (boundSize > allocationSize) { + allocationSize = roundpow2(boundSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->destSize = allocationSize; + destAvailable = allocationSize; + destOffset = 0; + + destBuffer->segments = + calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + currentBufferStartOffset = inputOffset; + } + + dest = (char *)destBuffer->dest + destOffset; + + opInBuffer.src = source; + opInBuffer.size = sourceSize; + opInBuffer.pos = 0; + + opOutBuffer.dst = dest; + opOutBuffer.size = destAvailable; + opOutBuffer.pos = 0; + + zresult = ZSTD_CCtx_setPledgedSrcSize(state->cctx, sourceSize); + if (ZSTD_isError(zresult)) { + state->error = CompressorWorkerError_zstd; + state->zresult = zresult; + state->errorOffset = inputOffset; + break; + } + + zresult = ZSTD_compressStream2(state->cctx, &opOutBuffer, &opInBuffer, + ZSTD_e_end); + if (ZSTD_isError(zresult)) { + state->error = CompressorWorkerError_zstd; + state->zresult = zresult; + state->errorOffset = inputOffset; + break; + } + else if (zresult) { + state->error = CompressorWorkerError_nospace; + state->errorOffset = inputOffset; + break; + } + + destBuffer->segments[inputOffset - currentBufferStartOffset].offset = + destOffset; + destBuffer->segments[inputOffset - currentBufferStartOffset].length = + opOutBuffer.pos; + + destOffset += opOutBuffer.pos; + remainingItems--; + } + + if (destBuffer->destSize > destOffset) { + newDest = realloc(destBuffer->dest, destOffset); + if (NULL == newDest) { + state->error = CompressorWorkerError_no_memory; + return; + } + + destBuffer->dest = newDest; + destBuffer->destSize = destOffset; + } +} +#endif + +/* We can only use the pool.h APIs if we provide the full library, + as these are private APIs. */ +#ifdef HAVE_ZSTD_POOL_APIS + +ZstdBufferWithSegmentsCollection * +compress_from_datasources(ZstdCompressor *compressor, DataSources *sources, + Py_ssize_t threadCount) { + unsigned long long bytesPerWorker; + POOL_ctx *pool = NULL; + CompressorWorkerState *workerStates = NULL; + Py_ssize_t i; + unsigned long long workerBytes = 0; + Py_ssize_t workerStartOffset = 0; + Py_ssize_t currentThread = 0; + int errored = 0; + Py_ssize_t segmentsCount = 0; + Py_ssize_t segmentIndex; + PyObject *segmentsArg = NULL; + ZstdBufferWithSegments *buffer; + ZstdBufferWithSegmentsCollection *result = NULL; + + assert(sources->sourcesSize > 0); + assert(sources->totalSourceSize > 0); + assert(threadCount >= 1); + + /* More threads than inputs makes no sense. */ + threadCount = + sources->sourcesSize < threadCount ? sources->sourcesSize : threadCount; + + /* TODO lower thread count when input size is too small and threads would + add overhead. */ + + workerStates = PyMem_Malloc(threadCount * sizeof(CompressorWorkerState)); + if (NULL == workerStates) { + PyErr_NoMemory(); + goto finally; + } + + memset(workerStates, 0, threadCount * sizeof(CompressorWorkerState)); + + if (threadCount > 1) { + pool = POOL_create(threadCount, 1); + if (NULL == pool) { + PyErr_SetString(ZstdError, "could not initialize zstd thread pool"); + goto finally; + } + } + + bytesPerWorker = sources->totalSourceSize / threadCount; + + for (i = 0; i < threadCount; i++) { + size_t zresult; + + workerStates[i].cctx = ZSTD_createCCtx(); + if (!workerStates[i].cctx) { + PyErr_NoMemory(); + goto finally; + } + + zresult = ZSTD_CCtx_setParametersUsingCCtxParams(workerStates[i].cctx, + compressor->params); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not set compression parameters: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (compressor->dict) { + if (compressor->dict->cdict) { + zresult = ZSTD_CCtx_refCDict(workerStates[i].cctx, + compressor->dict->cdict); + } + else { + zresult = ZSTD_CCtx_loadDictionary_advanced( + workerStates[i].cctx, compressor->dict->dictData, + compressor->dict->dictSize, ZSTD_dlm_byRef, + compressor->dict->dictType); + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "could not load compression dictionary: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + } + + workerStates[i].sources = sources->sources; + workerStates[i].sourcesSize = sources->sourcesSize; + } + + Py_BEGIN_ALLOW_THREADS for (i = 0; i < sources->sourcesSize; i++) { + workerBytes += sources->sources[i].sourceSize; + + /* + * The last worker/thread needs to handle all remaining work. Don't + * trigger it prematurely. Defer to the block outside of the loop + * to run the last worker/thread. But do still process this loop + * so workerBytes is correct. + */ + if (currentThread == threadCount - 1) { + continue; + } + + if (workerBytes >= bytesPerWorker) { + assert(currentThread < threadCount); + workerStates[currentThread].totalSourceSize = workerBytes; + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = i; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)compress_worker, + &workerStates[currentThread]); + } + else { + compress_worker(&workerStates[currentThread]); + } + + currentThread++; + workerStartOffset = i + 1; + workerBytes = 0; + } + } + + if (workerBytes) { + assert(currentThread < threadCount); + workerStates[currentThread].totalSourceSize = workerBytes; + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = sources->sourcesSize - 1; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)compress_worker, + &workerStates[currentThread]); + } + else { + compress_worker(&workerStates[currentThread]); + } + } + + if (threadCount > 1) { + POOL_free(pool); + pool = NULL; + } + + Py_END_ALLOW_THREADS + + for (i = 0; i < threadCount; i++) { + switch (workerStates[i].error) { + case CompressorWorkerError_no_memory: + PyErr_NoMemory(); + errored = 1; + break; + + case CompressorWorkerError_zstd: + PyErr_Format(ZstdError, "error compressing item %zd: %s", + workerStates[i].errorOffset, + ZSTD_getErrorName(workerStates[i].zresult)); + errored = 1; + break; + + case CompressorWorkerError_nospace: + PyErr_Format( + ZstdError, + "error compressing item %zd: not enough space in output", + workerStates[i].errorOffset); + errored = 1; + break; + + default:; + } + + if (errored) { + break; + } + } + + if (errored) { + goto finally; + } + + segmentsCount = 0; + for (i = 0; i < threadCount; i++) { + CompressorWorkerState *state = &workerStates[i]; + segmentsCount += state->destCount; + } + + segmentsArg = PyTuple_New(segmentsCount); + if (NULL == segmentsArg) { + goto finally; + } + + segmentIndex = 0; + + for (i = 0; i < threadCount; i++) { + Py_ssize_t j; + CompressorWorkerState *state = &workerStates[i]; + + for (j = 0; j < state->destCount; j++) { + CompressorDestBuffer *destBuffer = &state->destBuffers[j]; + buffer = BufferWithSegments_FromMemory( + destBuffer->dest, destBuffer->destSize, destBuffer->segments, + destBuffer->segmentsSize); + + if (NULL == buffer) { + goto finally; + } + + /* Tell instance to use free() instsead of PyMem_Free(). */ + buffer->useFree = 1; + + /* + * BufferWithSegments_FromMemory takes ownership of the backing + * memory. Unset it here so it doesn't get freed below. + */ + destBuffer->dest = NULL; + destBuffer->segments = NULL; + + PyTuple_SET_ITEM(segmentsArg, segmentIndex++, (PyObject *)buffer); + } + } + + result = (ZstdBufferWithSegmentsCollection *)PyObject_CallObject( + (PyObject *)ZstdBufferWithSegmentsCollectionType, segmentsArg); + +finally: + Py_CLEAR(segmentsArg); + + if (pool) { + POOL_free(pool); + } + + if (workerStates) { + Py_ssize_t j; + + for (i = 0; i < threadCount; i++) { + CompressorWorkerState state = workerStates[i]; + + if (state.cctx) { + ZSTD_freeCCtx(state.cctx); + } + + /* malloc() is used in worker thread. */ + + for (j = 0; j < state.destCount; j++) { + if (state.destBuffers) { + free(state.destBuffers[j].dest); + free(state.destBuffers[j].segments); + } + } + + free(state.destBuffers); + } + + PyMem_Free(workerStates); + } + + return result; +} +#endif + +#ifdef HAVE_ZSTD_POOL_APIS +static ZstdBufferWithSegmentsCollection * +ZstdCompressor_multi_compress_to_buffer(ZstdCompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", "threads", NULL}; + + PyObject *data; + int threads = 0; + Py_buffer *dataBuffers = NULL; + DataSources sources; + Py_ssize_t i; + Py_ssize_t sourceCount = 0; + ZstdBufferWithSegmentsCollection *result = NULL; + + memset(&sources, 0, sizeof(sources)); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, + "O|i:multi_compress_to_buffer", kwlist, + &data, &threads)) { + return NULL; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (threads < 2) { + threads = 1; + } + + if (PyObject_TypeCheck(data, ZstdBufferWithSegmentsType)) { + ZstdBufferWithSegments *buffer = (ZstdBufferWithSegments *)data; + + sources.sources = + PyMem_Malloc(buffer->segmentCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < buffer->segmentCount; i++) { + if (buffer->segments[i].length > SIZE_MAX) { + PyErr_Format( + PyExc_ValueError, + "buffer segment %zd is too large for this platform", i); + goto finally; + } + + sources.sources[i].sourceData = + (char *)buffer->data + buffer->segments[i].offset; + sources.sources[i].sourceSize = (size_t)buffer->segments[i].length; + sources.totalSourceSize += buffer->segments[i].length; + } + + sources.sourcesSize = buffer->segmentCount; + } + else if (PyObject_TypeCheck(data, ZstdBufferWithSegmentsCollectionType)) { + Py_ssize_t j; + Py_ssize_t offset = 0; + ZstdBufferWithSegments *buffer; + ZstdBufferWithSegmentsCollection *collection = + (ZstdBufferWithSegmentsCollection *)data; + + sourceCount = BufferWithSegmentsCollection_length(collection); + + sources.sources = PyMem_Malloc(sourceCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < collection->bufferCount; i++) { + buffer = collection->buffers[i]; + + for (j = 0; j < buffer->segmentCount; j++) { + if (buffer->segments[j].length > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "buffer segment %zd in buffer %zd is too " + "large for this platform", + j, i); + goto finally; + } + + sources.sources[offset].sourceData = + (char *)buffer->data + buffer->segments[j].offset; + sources.sources[offset].sourceSize = + (size_t)buffer->segments[j].length; + sources.totalSourceSize += buffer->segments[j].length; + + offset++; + } + } + + sources.sourcesSize = sourceCount; + } + else if (PyList_Check(data)) { + sourceCount = PyList_GET_SIZE(data); + + sources.sources = PyMem_Malloc(sourceCount * sizeof(DataSource)); + if (NULL == sources.sources) { + PyErr_NoMemory(); + goto finally; + } + + dataBuffers = PyMem_Malloc(sourceCount * sizeof(Py_buffer)); + if (NULL == dataBuffers) { + PyErr_NoMemory(); + goto finally; + } + + memset(dataBuffers, 0, sourceCount * sizeof(Py_buffer)); + + for (i = 0; i < sourceCount; i++) { + if (0 != PyObject_GetBuffer(PyList_GET_ITEM(data, i), + &dataBuffers[i], PyBUF_CONTIG_RO)) { + PyErr_Clear(); + PyErr_Format(PyExc_TypeError, + "item %zd not a bytes like object", i); + goto finally; + } + + sources.sources[i].sourceData = dataBuffers[i].buf; + sources.sources[i].sourceSize = dataBuffers[i].len; + sources.totalSourceSize += dataBuffers[i].len; + } + + sources.sourcesSize = sourceCount; + } + else { + PyErr_SetString(PyExc_TypeError, + "argument must be list of BufferWithSegments"); + goto finally; + } + + if (0 == sources.sourcesSize) { + PyErr_SetString(PyExc_ValueError, "no source elements found"); + goto finally; + } + + if (0 == sources.totalSourceSize) { + PyErr_SetString(PyExc_ValueError, "source elements are empty"); + goto finally; + } + + if (sources.totalSourceSize > SIZE_MAX) { + PyErr_SetString(PyExc_ValueError, + "sources are too large for this platform"); + goto finally; + } + + result = compress_from_datasources(self, &sources, threads); + +finally: + PyMem_Free(sources.sources); + + if (dataBuffers) { + for (i = 0; i < sourceCount; i++) { + PyBuffer_Release(&dataBuffers[i]); + } + + PyMem_Free(dataBuffers); + } + + return result; +} +#endif + +static PyMethodDef ZstdCompressor_methods[] = { + {"chunker", (PyCFunction)ZstdCompressor_chunker, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"compress", (PyCFunction)ZstdCompressor_compress, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"compressobj", (PyCFunction)ZstdCompressor_compressobj, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"copy_stream", (PyCFunction)ZstdCompressor_copy_stream, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"stream_reader", (PyCFunction)ZstdCompressor_stream_reader, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"stream_writer", (PyCFunction)ZstdCompressor_stream_writer, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"read_to_iter", (PyCFunction)ZstdCompressor_read_to_iter, + METH_VARARGS | METH_KEYWORDS, NULL}, +#ifdef HAVE_ZSTD_POOL_APIS + {"multi_compress_to_buffer", + (PyCFunction)ZstdCompressor_multi_compress_to_buffer, + METH_VARARGS | METH_KEYWORDS, NULL}, +#endif + {"memory_size", (PyCFunction)ZstdCompressor_memory_size, METH_NOARGS, NULL}, + {"frame_progression", (PyCFunction)ZstdCompressor_frame_progression, + METH_NOARGS, NULL}, + {NULL, NULL}}; + +PyType_Slot ZstdCompressorSlots[] = { + {Py_tp_dealloc, ZstdCompressor_dealloc}, + {Py_tp_methods, ZstdCompressor_methods}, + {Py_tp_init, ZstdCompressor_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressorSpec = { + "zstd.ZstdCompressor", + sizeof(ZstdCompressor), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressorSlots, +}; + +PyTypeObject *ZstdCompressorType; + +void compressor_module_init(PyObject *mod) { + ZstdCompressorType = (PyTypeObject *)PyType_FromSpec(&ZstdCompressorSpec); + if (PyType_Ready(ZstdCompressorType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdCompressorType); + PyModule_AddObject(mod, "ZstdCompressor", (PyObject *)ZstdCompressorType); +} diff --git a/contrib/python/zstandard/py3/c-ext/compressoriterator.c b/contrib/python/zstandard/py3/c-ext/compressoriterator.c new file mode 100644 index 00000000000..9e4936caa8b --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/compressoriterator.c @@ -0,0 +1,212 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject *ZstdError; + +static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator *self) { + Py_XDECREF(self->readResult); + Py_XDECREF(self->compressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + if (self->output.dst) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + } + + PyObject_Del(self); +} + +static PyObject *ZstdCompressorIterator_iter(PyObject *self) { + Py_INCREF(self); + return self; +} + +static PyObject *ZstdCompressorIterator_iternext(ZstdCompressorIterator *self) { + size_t zresult; + PyObject *readResult = NULL; + PyObject *chunk; + char *readBuffer; + Py_ssize_t readSize = 0; + Py_ssize_t bufferRemaining; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + +feedcompressor: + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* Release the Python object holding the input buffer. */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_DECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + /* If it produced output data, emit it. */ + if (self->output.pos) { + chunk = + PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + } + + /* We should never have output data sitting around after a previous call. */ + assert(self->output.pos == 0); + + /* The code above should have either emitted a chunk and returned or + consumed the entire input buffer. So the state of the input buffer is not + relevant. */ + if (!self->finishedInput) { + if (self->reader) { + readResult = + PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer.buf); + + /* Only support contiguous C arrays. */ + assert(self->buffer.strides == NULL && + self->buffer.suboffsets == NULL); + assert(self->buffer.itemsize == 1); + + readBuffer = (char *)self->buffer.buf + self->bufferOffset; + bufferRemaining = self->buffer.len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (0 == readSize) { + Py_XDECREF(readResult); + self->finishedInput = 1; + } + else { + self->readResult = readResult; + } + } + + /* EOF */ + if (0 == readSize) { + self->input.src = NULL; + self->input.size = 0; + self->input.pos = 0; + + zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, + &self->input, ZSTD_e_end); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + + /* New data from reader. Feed into compressor. */ + self->input.src = readBuffer; + self->input.size = readSize; + self->input.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_compressStream2( + self->compressor->cctx, &self->output, &self->input, ZSTD_e_continue); + Py_END_ALLOW_THREADS + + /* The input buffer currently points to memory managed by Python + (readBuffer). This object was allocated by this function. If it wasn't + fully consumed, we need to release it in a subsequent function call. + If it is fully consumed, do that now. + */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_XDECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->input.pos <= self->input.size); + + /* If we didn't write anything, start the process over. */ + if (0 == self->output.pos) { + goto feedcompressor; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; +} + +PyType_Slot ZstdCompressorIteratorSlots[] = { + {Py_tp_dealloc, ZstdCompressorIterator_dealloc}, + {Py_tp_iter, ZstdCompressorIterator_iter}, + {Py_tp_iternext, ZstdCompressorIterator_iternext}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdCompressorIteratorSpec = { + "zstd.ZstdCompressorIterator", + sizeof(ZstdCompressorIterator), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdCompressorIteratorSlots, +}; + +PyTypeObject *ZstdCompressorIteratorType; + +void compressoriterator_module_init(PyObject *mod) { + ZstdCompressorIteratorType = + (PyTypeObject *)PyType_FromSpec(&ZstdCompressorIteratorSpec); + if (PyType_Ready(ZstdCompressorIteratorType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py3/c-ext/constants.c b/contrib/python/zstandard/py3/c-ext/constants.c new file mode 100644 index 00000000000..e209ecf9a84 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/constants.c @@ -0,0 +1,109 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static char frame_header[] = { + '\x28', + '\xb5', + '\x2f', + '\xfd', +}; + +void constants_module_init(PyObject *mod) { + PyObject *version; + PyObject *zstdVersion; + PyObject *frameHeader; + + version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION); + PyModule_AddObject(mod, "__version__", version); + + ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL); + PyModule_AddObject(mod, "ZstdError", ZstdError); + + PyModule_AddIntConstant(mod, "FLUSH_BLOCK", 0); + PyModule_AddIntConstant(mod, "FLUSH_FRAME", 1); + + PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_FINISH", + compressorobj_flush_finish); + PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_BLOCK", + compressorobj_flush_block); + + /* For now, the version is a simple tuple instead of a dedicated type. */ + zstdVersion = PyTuple_New(3); + PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR)); + PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR)); + PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE)); + PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion); + + frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header)); + if (frameHeader) { + PyModule_AddObject(mod, "FRAME_HEADER", frameHeader); + } + else { + PyErr_Format(PyExc_ValueError, "could not create frame header object"); + } + + PyModule_AddObject(mod, "CONTENTSIZE_UNKNOWN", + PyLong_FromUnsignedLongLong(ZSTD_CONTENTSIZE_UNKNOWN)); + PyModule_AddObject(mod, "CONTENTSIZE_ERROR", + PyLong_FromUnsignedLongLong(ZSTD_CONTENTSIZE_ERROR)); + + PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_CStreamInSize()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_CStreamOutSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_DStreamInSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_DStreamOutSize()); + + PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER); + PyModule_AddIntConstant(mod, "BLOCKSIZELOG_MAX", ZSTD_BLOCKSIZELOG_MAX); + PyModule_AddIntConstant(mod, "BLOCKSIZE_MAX", ZSTD_BLOCKSIZE_MAX); + PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN); + PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX); + PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN); + PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX); + PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN); + PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX); + PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN); + PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX); + PyModule_AddIntConstant(mod, "MINMATCH_MIN", ZSTD_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "MINMATCH_MAX", ZSTD_MINMATCH_MAX); + /* TODO SEARCHLENGTH_* is deprecated. */ + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_MINMATCH_MAX); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX); + PyModule_AddIntConstant(mod, "LDM_MINMATCH_MIN", ZSTD_LDM_MINMATCH_MIN); + PyModule_AddIntConstant(mod, "LDM_MINMATCH_MAX", ZSTD_LDM_MINMATCH_MAX); + PyModule_AddIntConstant(mod, "LDM_BUCKETSIZELOG_MAX", + ZSTD_LDM_BUCKETSIZELOG_MAX); + + PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast); + PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast); + PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt); + PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA", ZSTD_btultra); + PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA2", ZSTD_btultra2); + + PyModule_AddIntConstant(mod, "DICT_TYPE_AUTO", ZSTD_dct_auto); + PyModule_AddIntConstant(mod, "DICT_TYPE_RAWCONTENT", ZSTD_dct_rawContent); + PyModule_AddIntConstant(mod, "DICT_TYPE_FULLDICT", ZSTD_dct_fullDict); + + PyModule_AddIntConstant(mod, "FORMAT_ZSTD1", ZSTD_f_zstd1); + PyModule_AddIntConstant(mod, "FORMAT_ZSTD1_MAGICLESS", + ZSTD_f_zstd1_magicless); +} diff --git a/contrib/python/zstandard/py3/c-ext/decompressionreader.c b/contrib/python/zstandard/py3/c-ext/decompressionreader.c new file mode 100644 index 00000000000..0e0935356f1 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/decompressionreader.c @@ -0,0 +1,780 @@ +/** + * Copyright (c) 2017-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void decompressionreader_dealloc(ZstdDecompressionReader *self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + } + + Py_CLEAR(self->readResult); + + PyObject_Del(self); +} + +static ZstdDecompressionReader * +decompressionreader_enter(ZstdDecompressionReader *self) { + if (self->entered) { + PyErr_SetString(PyExc_ValueError, "cannot __enter__ multiple times"); + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return self; +} + +static PyObject *decompressionreader_exit(ZstdDecompressionReader *self, + PyObject *args) { + PyObject *exc_type; + PyObject *exc_value; + PyObject *exc_tb; + + if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, + &exc_tb)) { + return NULL; + } + + self->entered = 0; + + if (NULL == PyObject_CallMethod((PyObject *)self, "close", NULL)) { + return NULL; + } + + /* Release resources. */ + Py_CLEAR(self->reader); + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + Py_CLEAR(self->decompressor); + + Py_RETURN_FALSE; +} + +static PyObject *decompressionreader_readable(PyObject *self) { + Py_RETURN_TRUE; +} + +static PyObject *decompressionreader_writable(PyObject *self) { + Py_RETURN_FALSE; +} + +static PyObject *decompressionreader_seekable(PyObject *self) { + Py_RETURN_FALSE; +} + +static PyObject *decompressionreader_close(ZstdDecompressionReader *self) { + if (self->closed) { + Py_RETURN_NONE; + } + + self->closed = 1; + + if (self->closefd && self->reader != NULL && + PyObject_HasAttrString(self->reader, "close")) { + return PyObject_CallMethod(self->reader, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject *decompressionreader_flush(PyObject *self) { + Py_RETURN_NONE; +} + +static PyObject *decompressionreader_isatty(PyObject *self) { + Py_RETURN_FALSE; +} + +/** + * Read available input. + * + * Returns 0 if no data was added to input. + * Returns 1 if new input data is available. + * Returns -1 on error and sets a Python exception as a side-effect. + */ +int read_decompressor_input(ZstdDecompressionReader *self) { + if (self->finishedInput) { + return 0; + } + + if (self->input.pos != self->input.size) { + return 0; + } + + if (self->reader) { + Py_buffer buffer; + + assert(self->readResult == NULL); + self->readResult = + PyObject_CallMethod(self->reader, "read", "k", self->readSize); + if (NULL == self->readResult) { + return -1; + } + + memset(&buffer, 0, sizeof(buffer)); + + if (0 != + PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) { + return -1; + } + + /* EOF */ + if (0 == buffer.len) { + self->finishedInput = 1; + Py_CLEAR(self->readResult); + } + else { + self->input.src = buffer.buf; + self->input.size = buffer.len; + self->input.pos = 0; + } + + PyBuffer_Release(&buffer); + } + else { + assert(self->buffer.buf); + /* + * We should only get here once since expectation is we always + * exhaust input buffer before reading again. + */ + assert(self->input.src == NULL); + + self->input.src = self->buffer.buf; + self->input.size = self->buffer.len; + self->input.pos = 0; + } + + return 1; +} + +/** + * Decompresses available input into an output buffer. + * + * Returns 0 if we need more input. + * Returns 1 if output buffer should be emitted. + * Returns -1 on error and sets a Python exception. + */ +int decompress_input(ZstdDecompressionReader *self, ZSTD_outBuffer *output) { + size_t zresult; + + if (self->input.pos >= self->input.size) { + return 0; + } + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->decompressor->dctx, output, &self->input); + Py_END_ALLOW_THREADS + + /* Input exhausted. Clear our state tracking. */ + if (self->input.pos == self->input.size) { + memset(&self->input, 0, sizeof(self->input)); + Py_CLEAR(self->readResult); + + if (self->buffer.buf) { + self->finishedInput = 1; + } + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + return -1; + } + + /* We fulfilled the full read request. Signal to emit. */ + if (output->pos && output->pos == output->size) { + return 1; + } + /* We're at the end of a frame and we aren't allowed to return data + spanning frames. */ + else if (output->pos && zresult == 0 && !self->readAcrossFrames) { + return 1; + } + + /* There is more room in the output. Signal to collect more data. */ + return 0; +} + +static PyObject *decompressionreader_read(ZstdDecompressionReader *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"size", NULL}; + + Py_ssize_t size = -1; + PyObject *result = NULL; + char *resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + int decompressResult, readResult; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, + "cannot read negative amounts less than -1"); + return NULL; + } + + if (size == -1) { + return PyObject_CallMethod((PyObject *)self, "readall", NULL); + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + +readinput: + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == decompressResult) { + } + else if (1 == decompressResult) { + self->bytesDecompressed += output.pos; + + if (output.pos != output.size) { + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + } + return result; + } + else { + assert(0); + } + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult) { + } + else if (1 == readResult) { + } + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* EOF */ + self->bytesDecompressed += output.pos; + + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + + return result; +} + +static PyObject *decompressionreader_read1(ZstdDecompressionReader *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"size", NULL}; + + Py_ssize_t size = -1; + PyObject *result = NULL; + char *resultBuffer; + Py_ssize_t resultSize; + ZSTD_outBuffer output; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) { + return NULL; + } + + if (size < -1) { + PyErr_SetString(PyExc_ValueError, + "cannot read negative amounts less than -1"); + return NULL; + } + + if (self->finishedOutput || size == 0) { + return PyBytes_FromStringAndSize("", 0); + } + + if (size == -1) { + size = ZSTD_DStreamOutSize(); + } + + result = PyBytes_FromStringAndSize(NULL, size); + if (NULL == result) { + return NULL; + } + + PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize); + + output.dst = resultBuffer; + output.size = resultSize; + output.pos = 0; + + /* read1() is supposed to use at most 1 read() from the underlying stream. + * However, we can't satisfy this requirement with decompression due to the + * nature of how decompression works. Our strategy is to read + decompress + * until we get any output, at which point we return. This satisfies the + * intent of the read1() API to limit read operations. + */ + while (!self->finishedInput) { + int readResult, decompressResult; + + readResult = read_decompressor_input(self); + if (-1 == readResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == readResult || 1 == readResult) { + } + else { + assert(0); + } + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + Py_XDECREF(result); + return NULL; + } + else if (0 == decompressResult || 1 == decompressResult) { + } + else { + assert(0); + } + + if (output.pos) { + break; + } + } + + self->bytesDecompressed += output.pos; + if (safe_pybytes_resize(&result, output.pos)) { + Py_XDECREF(result); + return NULL; + } + + return result; +} + +static PyObject *decompressionreader_readinto(ZstdDecompressionReader *self, + PyObject *args) { + Py_buffer dest; + ZSTD_outBuffer output; + int decompressResult, readResult; + PyObject *result = NULL; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) { + return NULL; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + +readinput: + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + goto finally; + } + else if (0 == decompressResult) { + } + else if (1 == decompressResult) { + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + goto finally; + } + else { + assert(0); + } + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult) { + } + else if (1 == readResult) { + } + else { + assert(0); + } + + if (self->input.size) { + goto readinput; + } + + /* EOF */ + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject *decompressionreader_readinto1(ZstdDecompressionReader *self, + PyObject *args) { + Py_buffer dest; + ZSTD_outBuffer output; + PyObject *result = NULL; + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->finishedOutput) { + return PyLong_FromLong(0); + } + + if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) { + return NULL; + } + + output.dst = dest.buf; + output.size = dest.len; + output.pos = 0; + + while (!self->finishedInput && !self->finishedOutput) { + int decompressResult, readResult; + + readResult = read_decompressor_input(self); + + if (-1 == readResult) { + goto finally; + } + else if (0 == readResult || 1 == readResult) { + } + else { + assert(0); + } + + decompressResult = decompress_input(self, &output); + + if (-1 == decompressResult) { + goto finally; + } + else if (0 == decompressResult || 1 == decompressResult) { + } + else { + assert(0); + } + + if (output.pos) { + break; + } + } + + self->bytesDecompressed += output.pos; + result = PyLong_FromSize_t(output.pos); + +finally: + PyBuffer_Release(&dest); + + return result; +} + +static PyObject *decompressionreader_readall(PyObject *self) { + PyObject *chunks = NULL; + PyObject *empty = NULL; + PyObject *result = NULL; + + /* Our strategy is to collect chunks into a list then join all the + * chunks at the end. We could potentially use e.g. an io.BytesIO. But + * this feels simple enough to implement and avoids potentially expensive + * reallocations of large buffers. + */ + chunks = PyList_New(0); + if (NULL == chunks) { + return NULL; + } + + while (1) { + PyObject *chunk = PyObject_CallMethod(self, "read", "i", 1048576); + if (NULL == chunk) { + Py_DECREF(chunks); + return NULL; + } + + if (!PyBytes_Size(chunk)) { + Py_DECREF(chunk); + break; + } + + if (PyList_Append(chunks, chunk)) { + Py_DECREF(chunk); + Py_DECREF(chunks); + return NULL; + } + + Py_DECREF(chunk); + } + + empty = PyBytes_FromStringAndSize("", 0); + if (NULL == empty) { + Py_DECREF(chunks); + return NULL; + } + + result = PyObject_CallMethod(empty, "join", "O", chunks); + + Py_DECREF(empty); + Py_DECREF(chunks); + + return result; +} + +static PyObject *decompressionreader_readline(PyObject *self, PyObject *args, + PyObject *kwargs) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *decompressionreader_readlines(PyObject *self, PyObject *args, + PyObject *kwargs) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *decompressionreader_seek(ZstdDecompressionReader *self, + PyObject *args) { + Py_ssize_t pos; + int whence = 0; + unsigned long long readAmount = 0; + size_t defaultOutSize = ZSTD_DStreamOutSize(); + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!PyArg_ParseTuple(args, "n|i:seek", &pos, &whence)) { + return NULL; + } + + if (whence == SEEK_SET) { + if (pos < 0) { + PyErr_SetString(PyExc_OSError, + "cannot seek to negative position with SEEK_SET"); + return NULL; + } + + if ((unsigned long long)pos < self->bytesDecompressed) { + PyErr_SetString(PyExc_OSError, + "cannot seek zstd decompression stream backwards"); + return NULL; + } + + readAmount = pos - self->bytesDecompressed; + } + else if (whence == SEEK_CUR) { + if (pos < 0) { + PyErr_SetString(PyExc_OSError, + "cannot seek zstd decompression stream backwards"); + return NULL; + } + + readAmount = pos; + } + else if (whence == SEEK_END) { + /* We /could/ support this with pos==0. But let's not do that until + someone needs it. */ + PyErr_SetString( + PyExc_OSError, + "zstd decompression streams cannot be seeked with SEEK_END"); + return NULL; + } + + /* It is a bit inefficient to do this via the Python API. But since there + is a bit of state tracking involved to read from this type, it is the + easiest to implement. */ + while (readAmount) { + Py_ssize_t readSize; + PyObject *readResult = PyObject_CallMethod( + (PyObject *)self, "read", "K", + readAmount < defaultOutSize ? readAmount : defaultOutSize); + + if (!readResult) { + return NULL; + } + + readSize = PyBytes_GET_SIZE(readResult); + + Py_CLEAR(readResult); + + /* Empty read means EOF. */ + if (!readSize) { + break; + } + + readAmount -= readSize; + } + + return PyLong_FromUnsignedLongLong(self->bytesDecompressed); +} + +static PyObject *decompressionreader_tell(ZstdDecompressionReader *self) { + /* TODO should this raise OSError since stream isn't seekable? */ + return PyLong_FromUnsignedLongLong(self->bytesDecompressed); +} + +static PyObject *decompressionreader_write(PyObject *self, PyObject *args) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *decompressionreader_writelines(PyObject *self, + PyObject *args) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *decompressionreader_iter(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *decompressionreader_iternext(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyMethodDef decompressionreader_methods[] = { + {"__enter__", (PyCFunction)decompressionreader_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context")}, + {"__exit__", (PyCFunction)decompressionreader_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context")}, + {"close", (PyCFunction)decompressionreader_close, METH_NOARGS, + PyDoc_STR("Close the stream so it cannot perform any more operations")}, + {"flush", (PyCFunction)decompressionreader_flush, METH_NOARGS, + PyDoc_STR("no-ops")}, + {"isatty", (PyCFunction)decompressionreader_isatty, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"readable", (PyCFunction)decompressionreader_readable, METH_NOARGS, + PyDoc_STR("Returns True")}, + {"read", (PyCFunction)decompressionreader_read, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("read compressed data")}, + {"read1", (PyCFunction)decompressionreader_read1, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("read compressed data")}, + {"readinto", (PyCFunction)decompressionreader_readinto, METH_VARARGS, NULL}, + {"readinto1", (PyCFunction)decompressionreader_readinto1, METH_VARARGS, + NULL}, + {"readall", (PyCFunction)decompressionreader_readall, METH_NOARGS, + PyDoc_STR("Not implemented")}, + {"readline", (PyCFunction)decompressionreader_readline, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("Not implemented")}, + {"readlines", (PyCFunction)decompressionreader_readlines, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("Not implemented")}, + {"seek", (PyCFunction)decompressionreader_seek, METH_VARARGS, + PyDoc_STR("Seek the stream")}, + {"seekable", (PyCFunction)decompressionreader_seekable, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"tell", (PyCFunction)decompressionreader_tell, METH_NOARGS, + PyDoc_STR("Returns current number of bytes compressed")}, + {"writable", (PyCFunction)decompressionreader_writable, METH_NOARGS, + PyDoc_STR("Returns False")}, + {"write", (PyCFunction)decompressionreader_write, METH_VARARGS, + PyDoc_STR("unsupported operation")}, + {"writelines", (PyCFunction)decompressionreader_writelines, METH_VARARGS, + PyDoc_STR("unsupported operation")}, + {NULL, NULL}}; + +static PyMemberDef decompressionreader_members[] = { + {"closed", T_BOOL, offsetof(ZstdDecompressionReader, closed), READONLY, + "whether stream is closed"}, + {NULL}}; + +PyType_Slot ZstdDecompressionReaderSlots[] = { + {Py_tp_dealloc, decompressionreader_dealloc}, + {Py_tp_iter, decompressionreader_iter}, + {Py_tp_iternext, decompressionreader_iternext}, + {Py_tp_methods, decompressionreader_methods}, + {Py_tp_members, decompressionreader_members}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdDecompressionReaderSpec = { + "zstd.ZstdDecompressionReader", + sizeof(ZstdDecompressionReader), + 0, + Py_TPFLAGS_DEFAULT, + ZstdDecompressionReaderSlots, +}; + +PyTypeObject *ZstdDecompressionReaderType; + +void decompressionreader_module_init(PyObject *mod) { + /* TODO make reader a sub-class of io.RawIOBase */ + + ZstdDecompressionReaderType = + (PyTypeObject *)PyType_FromSpec(&ZstdDecompressionReaderSpec); + if (PyType_Ready(ZstdDecompressionReaderType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdDecompressionReaderType); + PyModule_AddObject(mod, "ZstdDecompressionReader", + (PyObject *)ZstdDecompressionReaderType); +} diff --git a/contrib/python/zstandard/py3/c-ext/decompressionwriter.c b/contrib/python/zstandard/py3/c-ext/decompressionwriter.c new file mode 100644 index 00000000000..09fd1338056 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/decompressionwriter.c @@ -0,0 +1,273 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter *self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->writer); + + PyObject_Del(self); +} + +static PyObject *ZstdDecompressionWriter_enter(ZstdDecompressionWriter *self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject *)self; +} + +static PyObject *ZstdDecompressionWriter_exit(ZstdDecompressionWriter *self, + PyObject *args) { + self->entered = 0; + + if (NULL == PyObject_CallMethod((PyObject *)self, "close", NULL)) { + return NULL; + } + + Py_RETURN_FALSE; +} + +static PyObject * +ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter *self) { + return PyLong_FromSize_t(ZSTD_sizeof_DCtx(self->decompressor->dctx)); +} + +static PyObject *ZstdDecompressionWriter_write(ZstdDecompressionWriter *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + PyObject *result = NULL; + Py_buffer source; + size_t zresult = 0; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject *res; + Py_ssize_t totalWrite = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:write", kwlist, + &source)) { + return NULL; + } + + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + output.dst = PyMem_Malloc(self->outSize); + if (!output.dst) { + PyErr_NoMemory(); + goto finally; + } + output.size = self->outSize; + output.pos = 0; + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + while (input.pos < (size_t)source.len) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->decompressor->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyMem_Free(output.dst); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (output.pos) { + res = PyObject_CallMethod(self->writer, "write", "y#", output.dst, + output.pos); + if (NULL == res) { + goto finally; + } + Py_XDECREF(res); + totalWrite += output.pos; + output.pos = 0; + } + } + + PyMem_Free(output.dst); + + if (self->writeReturnRead) { + result = PyLong_FromSize_t(input.pos); + } + else { + result = PyLong_FromSsize_t(totalWrite); + } + +finally: + PyBuffer_Release(&source); + return result; +} + +static PyObject *ZstdDecompressionWriter_close(ZstdDecompressionWriter *self) { + PyObject *result; + + if (self->closed) { + Py_RETURN_NONE; + } + + self->closing = 1; + result = PyObject_CallMethod((PyObject *)self, "flush", NULL); + self->closing = 0; + self->closed = 1; + + if (NULL == result) { + return NULL; + } + + /* Call close on underlying stream as well. */ + if (self->closefd && PyObject_HasAttrString(self->writer, "close")) { + return PyObject_CallMethod(self->writer, "close", NULL); + } + + Py_RETURN_NONE; +} + +static PyObject *ZstdDecompressionWriter_fileno(ZstdDecompressionWriter *self) { + if (PyObject_HasAttrString(self->writer, "fileno")) { + return PyObject_CallMethod(self->writer, "fileno", NULL); + } + else { + PyErr_SetString(PyExc_OSError, + "fileno not available on underlying writer"); + return NULL; + } +} + +static PyObject *ZstdDecompressionWriter_flush(ZstdDecompressionWriter *self) { + if (self->closed) { + PyErr_SetString(PyExc_ValueError, "stream is closed"); + return NULL; + } + + if (!self->closing && PyObject_HasAttrString(self->writer, "flush")) { + return PyObject_CallMethod(self->writer, "flush", NULL); + } + else { + Py_RETURN_NONE; + } +} + +static PyObject *ZstdDecompressionWriter_iter(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *ZstdDecompressionWriter_iternext(PyObject *self) { + set_io_unsupported_operation(); + return NULL; +} + +static PyObject *ZstdDecompressionWriter_false(PyObject *self, PyObject *args) { + Py_RETURN_FALSE; +} + +static PyObject *ZstdDecompressionWriter_true(PyObject *self, PyObject *args) { + Py_RETURN_TRUE; +} + +static PyObject *ZstdDecompressionWriter_unsupported(PyObject *self, + PyObject *args, + PyObject *kwargs) { + set_io_unsupported_operation(); + return NULL; +} + +static PyMethodDef ZstdDecompressionWriter_methods[] = { + {"__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a decompression context.")}, + {"__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a decompression context.")}, + {"memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, + METH_NOARGS, + PyDoc_STR( + "Obtain the memory size in bytes of the underlying decompressor.")}, + {"close", (PyCFunction)ZstdDecompressionWriter_close, METH_NOARGS, NULL}, + {"fileno", (PyCFunction)ZstdDecompressionWriter_fileno, METH_NOARGS, NULL}, + {"flush", (PyCFunction)ZstdDecompressionWriter_flush, METH_NOARGS, NULL}, + {"isatty", ZstdDecompressionWriter_false, METH_NOARGS, NULL}, + {"readable", ZstdDecompressionWriter_false, METH_NOARGS, NULL}, + {"readline", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readlines", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"seek", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"seekable", ZstdDecompressionWriter_false, METH_NOARGS, NULL}, + {"tell", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"truncate", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"writable", ZstdDecompressionWriter_true, METH_NOARGS, NULL}, + {"writelines", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"read", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readall", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"readinto", (PyCFunction)ZstdDecompressionWriter_unsupported, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"write", (PyCFunction)ZstdDecompressionWriter_write, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("Compress data")}, + {NULL, NULL}}; + +static PyMemberDef ZstdDecompressionWriter_members[] = { + {"closed", T_BOOL, offsetof(ZstdDecompressionWriter, closed), READONLY, + NULL}, + {NULL}}; + +PyType_Slot ZstdDecompressionWriterSlots[] = { + {Py_tp_dealloc, ZstdDecompressionWriter_dealloc}, + {Py_tp_iter, ZstdDecompressionWriter_iter}, + {Py_tp_iternext, ZstdDecompressionWriter_iternext}, + {Py_tp_methods, ZstdDecompressionWriter_methods}, + {Py_tp_members, ZstdDecompressionWriter_members}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdDecompressionWriterSpec = { + "zstd.ZstdDecompressionWriter", + sizeof(ZstdDecompressionWriter), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdDecompressionWriterSlots, +}; + +PyTypeObject *ZstdDecompressionWriterType; + +void decompressionwriter_module_init(PyObject *mod) { + ZstdDecompressionWriterType = + (PyTypeObject *)PyType_FromSpec(&ZstdDecompressionWriterSpec); + if (PyType_Ready(ZstdDecompressionWriterType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdDecompressionWriterType); + PyModule_AddObject(mod, "ZstdDecompressionWriter", + (PyObject *)ZstdDecompressionWriterType); +} diff --git a/contrib/python/zstandard/py3/c-ext/decompressobj.c b/contrib/python/zstandard/py3/c-ext/decompressobj.c new file mode 100644 index 00000000000..5c3a124fc9a --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/decompressobj.c @@ -0,0 +1,201 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +static void DecompressionObj_dealloc(ZstdDecompressionObj *self) { + Py_XDECREF(self->decompressor); + Py_CLEAR(self->unused_data); + + PyObject_Del(self); +} + +static PyObject *DecompressionObj_decompress(ZstdDecompressionObj *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + Py_buffer source; + size_t zresult; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject *result = NULL; + Py_ssize_t resultSize = 0; + + output.dst = NULL; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times"); + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:decompress", kwlist, + &source)) { + return NULL; + } + + /* Special case of empty input. Output will always be empty. */ + if (source.len == 0) { + result = PyBytes_FromString(""); + goto finally; + } + + input.src = source.buf; + input.size = source.len; + input.pos = 0; + + output.dst = PyMem_Malloc(self->outSize); + if (!output.dst) { + PyErr_NoMemory(); + goto except; + } + output.size = self->outSize; + output.pos = 0; + + while (1) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->decompressor->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + goto except; + } + + if (output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + if (-1 == + safe_pybytes_resize(&result, resultSize + output.pos)) { + Py_XDECREF(result); + goto except; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, output.dst, + output.pos); + } + else { + result = PyBytes_FromStringAndSize(output.dst, output.pos); + if (!result) { + goto except; + } + } + } + + if (0 == zresult) { + self->finished = 1; + + /* We should only get here at most once. */ + assert(!self->unused_data); + self->unused_data = PyBytes_FromStringAndSize((char *)(input.src) + input.pos, input.size - input.pos); + + break; + } + else if (input.pos == input.size && output.pos == 0) { + break; + } + else { + output.pos = 0; + } + } + + if (!result) { + result = PyBytes_FromString(""); + } + + goto finally; + +except: + Py_CLEAR(result); + +finally: + PyMem_Free(output.dst); + PyBuffer_Release(&source); + + return result; +} + +static PyObject *DecompressionObj_flush(ZstdDecompressionObj *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"length", NULL}; + + PyObject *length = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:flush", kwlist, + &length)) { + return NULL; + } + + return PyBytes_FromString(""); +} + +static PyObject *DecompressionObj_unused_data(PyObject *self, void *unused) { + ZstdDecompressionObj *slf = (ZstdDecompressionObj *)(self); + + if (slf->unused_data) { + Py_INCREF(slf->unused_data); + return slf->unused_data; + } else { + return PyBytes_FromString(""); + } +} + +static PyObject *DecompressionObj_unconsumed_tail(PyObject *self, void *unused) { + return PyBytes_FromString(""); +} + +static PyObject *DecompressionObj_eof(PyObject *self, void *unused) { + ZstdDecompressionObj *slf = (ZstdDecompressionObj*)(self); + + if (slf->finished) { + Py_RETURN_TRUE; + } else { + Py_RETURN_FALSE; + } +} + +static PyMethodDef DecompressionObj_methods[] = { + {"decompress", (PyCFunction)DecompressionObj_decompress, + METH_VARARGS | METH_KEYWORDS, PyDoc_STR("decompress data")}, + {"flush", (PyCFunction)DecompressionObj_flush, METH_VARARGS | METH_KEYWORDS, + PyDoc_STR("no-op")}, + {NULL, NULL}}; + +static PyGetSetDef DecompressionObj_getset[] = { + {"unused_data", DecompressionObj_unused_data, NULL, NULL, NULL }, + {"unconsumed_tail", DecompressionObj_unconsumed_tail, NULL, NULL, NULL }, + {"eof", DecompressionObj_eof, NULL, NULL, NULL }, + {NULL}}; + +PyType_Slot ZstdDecompressionObjSlots[] = { + {Py_tp_dealloc, DecompressionObj_dealloc}, + {Py_tp_methods, DecompressionObj_methods}, + {Py_tp_getset, DecompressionObj_getset}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdDecompressionObjSpec = { + "zstd.ZstdDecompressionObj", + sizeof(ZstdDecompressionObj), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdDecompressionObjSlots, +}; + +PyTypeObject *ZstdDecompressionObjType; + +void decompressobj_module_init(PyObject *module) { + ZstdDecompressionObjType = + (PyTypeObject *)PyType_FromSpec(&ZstdDecompressionObjSpec); + if (PyType_Ready(ZstdDecompressionObjType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py3/c-ext/decompressor.c b/contrib/python/zstandard/py3/c-ext/decompressor.c new file mode 100644 index 00000000000..3e61fe9ee6f --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/decompressor.c @@ -0,0 +1,1762 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +/** + * Ensure the ZSTD_DCtx on a decompressor is initiated and ready for a new + * operation. + */ +int ensure_dctx(ZstdDecompressor *decompressor, int loadDict) { + size_t zresult; + + ZSTD_DCtx_reset(decompressor->dctx, ZSTD_reset_session_only); + + if (decompressor->maxWindowSize) { + zresult = ZSTD_DCtx_setMaxWindowSize(decompressor->dctx, + decompressor->maxWindowSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to set max window size: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + zresult = ZSTD_DCtx_setParameter(decompressor->dctx, ZSTD_d_format, + decompressor->format); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "unable to set decoding format: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + + if (loadDict && decompressor->dict) { + if (ensure_ddict(decompressor->dict)) { + return 1; + } + + zresult = + ZSTD_DCtx_refDDict(decompressor->dctx, decompressor->dict->ddict); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "unable to reference prepared dictionary: %s", + ZSTD_getErrorName(zresult)); + return 1; + } + } + + return 0; +} + +static int Decompressor_init(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"dict_data", "max_window_size", "format", NULL}; + + PyObject *dict = NULL; + Py_ssize_t maxWindowSize = 0; + ZSTD_format_e format = ZSTD_f_zstd1; + + self->dctx = NULL; + self->dict = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|OnI:ZstdDecompressor", + kwlist, &dict, &maxWindowSize, &format)) { + return -1; + } + + if (dict) { + if (dict == Py_None) { + dict = NULL; + } + else if (!PyObject_IsInstance(dict, + (PyObject *)ZstdCompressionDictType)) { + PyErr_Format(PyExc_TypeError, + "dict_data must be zstd.ZstdCompressionDict"); + return -1; + } + } + + self->dctx = ZSTD_createDCtx(); + if (!self->dctx) { + PyErr_NoMemory(); + goto except; + } + + self->maxWindowSize = maxWindowSize; + self->format = format; + + if (dict) { + self->dict = (ZstdCompressionDict *)dict; + Py_INCREF(dict); + } + + if (ensure_dctx(self, 1)) { + goto except; + } + + return 0; + +except: + Py_CLEAR(self->dict); + + if (self->dctx) { + ZSTD_freeDCtx(self->dctx); + self->dctx = NULL; + } + + return -1; +} + +static void Decompressor_dealloc(ZstdDecompressor *self) { + Py_CLEAR(self->dict); + + if (self->dctx) { + ZSTD_freeDCtx(self->dctx); + self->dctx = NULL; + } + + PyObject_Del(self); +} + +static PyObject *Decompressor_memory_size(ZstdDecompressor *self) { + if (self->dctx) { + return PyLong_FromSize_t(ZSTD_sizeof_DCtx(self->dctx)); + } + else { + PyErr_SetString( + ZstdError, + "no decompressor context found; this should never happen"); + return NULL; + } +} + +static PyObject *Decompressor_copy_stream(ZstdDecompressor *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"ifh", "ofh", "read_size", "write_size", NULL}; + + PyObject *source; + PyObject *dest; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char *readBuffer; + Py_ssize_t readSize; + PyObject *readResult = NULL; + PyObject *res = NULL; + size_t zresult = 0; + PyObject *writeResult; + PyObject *totalReadPy; + PyObject *totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk:copy_stream", kwlist, + &source, &dest, &inSize, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, + "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, + "second argument must have a write() method"); + return NULL; + } + + /* Prevent free on uninitialized memory in finally. */ + output.dst = NULL; + + if (ensure_dctx(self, 1)) { + res = NULL; + goto finally; + } + + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + /* Read source stream until EOF */ + while (1) { + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to decompressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->dctx, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { + writeResult = PyObject_CallMethod(dest, "write", "y#", + output.dst, output.pos); + if (NULL == writeResult) { + res = NULL; + goto finally; + } + + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + + Py_CLEAR(readResult); + } + + /* Source stream is exhausted. Finish up. */ + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DECREF(totalReadPy); + Py_DECREF(totalWritePy); + +finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + Py_XDECREF(readResult); + + return res; +} + +PyObject *Decompressor_decompress(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = { + "data", + "max_output_size", + "read_across_frames", + "allow_extra_data", + NULL + }; + + Py_buffer source; + Py_ssize_t maxOutputSize = 0; + + unsigned long long decompressedSize; + PyObject *readAcrossFrames = NULL; + PyObject *allowExtraData = NULL; + size_t destCapacity; + PyObject *result = NULL; + size_t zresult; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|nOO:decompress", kwlist, + &source, &maxOutputSize, &readAcrossFrames, + &allowExtraData)) { + return NULL; + } + + if (readAcrossFrames ? PyObject_IsTrue(readAcrossFrames) : 0) { + PyErr_SetString(ZstdError, + "ZstdDecompressor.read_across_frames=True is not yet implemented" + ); + goto finally; + } + + if (ensure_dctx(self, 1)) { + goto finally; + } + + decompressedSize = ZSTD_getFrameContentSize(source.buf, source.len); + + if (ZSTD_CONTENTSIZE_ERROR == decompressedSize) { + PyErr_SetString(ZstdError, + "error determining content size from frame header"); + goto finally; + } + /* Special case of empty frame. */ + else if (0 == decompressedSize) { + result = PyBytes_FromStringAndSize("", 0); + goto finally; + } + /* Missing content size in frame header. */ + if (ZSTD_CONTENTSIZE_UNKNOWN == decompressedSize) { + if (0 == maxOutputSize) { + PyErr_SetString(ZstdError, + "could not determine content size in frame header"); + goto finally; + } + + result = PyBytes_FromStringAndSize(NULL, maxOutputSize); + destCapacity = maxOutputSize; + decompressedSize = 0; + } + /* Size is recorded in frame header. */ + else { + assert(SIZE_MAX >= PY_SSIZE_T_MAX); + if (decompressedSize > PY_SSIZE_T_MAX) { + PyErr_SetString( + ZstdError, "frame is too large to decompress on this platform"); + goto finally; + } + + result = PyBytes_FromStringAndSize(NULL, (Py_ssize_t)decompressedSize); + destCapacity = (size_t)decompressedSize; + } + + if (!result) { + goto finally; + } + + outBuffer.dst = PyBytes_AsString(result); + outBuffer.size = destCapacity; + outBuffer.pos = 0; + + inBuffer.src = source.buf; + inBuffer.size = source.len; + inBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "decompression error: %s", + ZSTD_getErrorName(zresult)); + Py_CLEAR(result); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, + "decompression error: did not decompress full frame"); + Py_CLEAR(result); + goto finally; + } + else if (decompressedSize && outBuffer.pos != decompressedSize) { + PyErr_Format( + ZstdError, + "decompression error: decompressed %zu bytes; expected %llu", + zresult, decompressedSize); + Py_CLEAR(result); + goto finally; + } + else if (outBuffer.pos < destCapacity) { + if (safe_pybytes_resize(&result, outBuffer.pos)) { + Py_CLEAR(result); + goto finally; + } + } + else if ((allowExtraData ? PyObject_IsTrue(allowExtraData) : 1) == 0 + && inBuffer.pos < inBuffer.size) { + PyErr_Format( + ZstdError, + "compressed input contains %zu bytes of unused data, which is disallowed", + inBuffer.size - inBuffer.pos + ); + Py_CLEAR(result); + goto finally; + } + +finally: + PyBuffer_Release(&source); + return result; +} + +static ZstdDecompressionObj *Decompressor_decompressobj(ZstdDecompressor *self, + PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"write_size", NULL}; + + ZstdDecompressionObj *result = NULL; + size_t outSize = ZSTD_DStreamOutSize(); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|k:decompressobj", kwlist, + &outSize)) { + return NULL; + } + + if (!outSize) { + PyErr_SetString(PyExc_ValueError, "write_size must be positive"); + return NULL; + } + + result = (ZstdDecompressionObj *)PyObject_CallObject( + (PyObject *)ZstdDecompressionObjType, NULL); + if (!result) { + return NULL; + } + + if (ensure_dctx(self, 1)) { + Py_DECREF(result); + return NULL; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + result->outSize = outSize; + + return result; +} + +static ZstdDecompressorIterator * +Decompressor_read_to_iter(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"reader", "read_size", "write_size", "skip_bytes", + NULL}; + + PyObject *reader; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZstdDecompressorIterator *result; + size_t skipBytes = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk:read_to_iter", kwlist, + &reader, &inSize, &outSize, &skipBytes)) { + return NULL; + } + + if (skipBytes >= inSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes must be smaller than read_size"); + return NULL; + } + + result = (ZstdDecompressorIterator *)PyObject_CallObject( + (PyObject *)ZstdDecompressorIteratorType, NULL); + if (!result) { + return NULL; + } + + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + /* Object claims it is a buffer. Try to get a handle to it. */ + if (0 != PyObject_GetBuffer(reader, &result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms " + "to buffer protocol"); + goto except; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->inSize = inSize; + result->outSize = outSize; + result->skipBytes = skipBytes; + + if (ensure_dctx(self, 1)) { + goto except; + } + + result->input.src = PyMem_Malloc(inSize); + if (!result->input.src) { + PyErr_NoMemory(); + goto except; + } + + goto finally; + +except: + Py_CLEAR(result); + +finally: + + return result; +} + +static ZstdDecompressionReader * +Decompressor_stream_reader(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"source", "read_size", "read_across_frames", + "closefd", NULL}; + + PyObject *source; + size_t readSize = ZSTD_DStreamInSize(); + PyObject *readAcrossFrames = NULL; + PyObject *closefd = NULL; + ZstdDecompressionReader *result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kOO:stream_reader", + kwlist, &source, &readSize, + &readAcrossFrames, &closefd)) { + return NULL; + } + + if (ensure_dctx(self, 1)) { + return NULL; + } + + result = (ZstdDecompressionReader *)PyObject_CallObject( + (PyObject *)ZstdDecompressionReaderType, NULL); + if (NULL == result) { + return NULL; + } + + result->entered = 0; + result->closed = 0; + + if (PyObject_HasAttrString(source, "read")) { + result->reader = source; + Py_INCREF(source); + result->readSize = readSize; + } + else if (1 == PyObject_CheckBuffer(source)) { + if (0 != PyObject_GetBuffer(source, &result->buffer, PyBUF_CONTIG_RO)) { + Py_CLEAR(result); + return NULL; + } + } + else { + PyErr_SetString(PyExc_TypeError, + "must pass an object with a read() method or that " + "conforms to the buffer protocol"); + Py_CLEAR(result); + return NULL; + } + + result->decompressor = self; + Py_INCREF(self); + result->readAcrossFrames = + readAcrossFrames ? PyObject_IsTrue(readAcrossFrames) : 0; + result->closefd = closefd ? PyObject_IsTrue(closefd) : 1; + + return result; +} + +static ZstdDecompressionWriter * +Decompressor_stream_writer(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"writer", "write_size", "write_return_read", + "closefd", NULL}; + + PyObject *writer; + size_t outSize = ZSTD_DStreamOutSize(); + PyObject *writeReturnRead = NULL; + PyObject *closefd = NULL; + ZstdDecompressionWriter *result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kOO:stream_writer", + kwlist, &writer, &outSize, + &writeReturnRead, &closefd)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a write() method"); + return NULL; + } + + if (ensure_dctx(self, 1)) { + return NULL; + } + + result = (ZstdDecompressionWriter *)PyObject_CallObject( + (PyObject *)ZstdDecompressionWriterType, NULL); + if (!result) { + return NULL; + } + + result->entered = 0; + result->closing = 0; + result->closed = 0; + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->outSize = outSize; + result->writeReturnRead = + writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 1; + result->closefd = closefd ? PyObject_IsTrue(closefd) : 1; + + return result; +} + +static PyObject * +Decompressor_decompress_content_dict_chain(ZstdDecompressor *self, + PyObject *args, PyObject *kwargs) { + static char *kwlist[] = {"frames", NULL}; + + PyObject *chunks; + Py_ssize_t chunksLen; + Py_ssize_t chunkIndex; + char parity = 0; + PyObject *chunk; + char *chunkData; + Py_ssize_t chunkSize; + size_t zresult; + ZSTD_frameHeader frameHeader; + void *buffer1 = NULL; + size_t buffer1Size = 0; + size_t buffer1ContentSize = 0; + void *buffer2 = NULL; + size_t buffer2Size = 0; + size_t buffer2ContentSize = 0; + void *destBuffer = NULL; + PyObject *result = NULL; + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, + "O!:decompress_content_dict_chain", kwlist, + &PyList_Type, &chunks)) { + return NULL; + } + + chunksLen = PyList_Size(chunks); + if (!chunksLen) { + PyErr_SetString(PyExc_ValueError, "empty input chain"); + return NULL; + } + + /* The first chunk should not be using a dictionary. We handle it specially. + */ + chunk = PyList_GetItem(chunks, 0); + if (!PyBytes_Check(chunk)) { + PyErr_SetString(PyExc_ValueError, "chunk 0 must be bytes"); + return NULL; + } + + /* We require that all chunks be zstd frames and that they have content size + * set. */ + PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); + zresult = ZSTD_getFrameHeader(&frameHeader, (void *)chunkData, chunkSize); + if (ZSTD_isError(zresult)) { + PyErr_SetString(PyExc_ValueError, "chunk 0 is not a valid zstd frame"); + return NULL; + } + else if (zresult) { + PyErr_SetString(PyExc_ValueError, + "chunk 0 is too small to contain a zstd frame"); + return NULL; + } + + if (ZSTD_CONTENTSIZE_UNKNOWN == frameHeader.frameContentSize) { + PyErr_SetString(PyExc_ValueError, + "chunk 0 missing content size in frame"); + return NULL; + } + + assert(ZSTD_CONTENTSIZE_ERROR != frameHeader.frameContentSize); + + /* We check against PY_SSIZE_T_MAX here because we ultimately cast the + * result to a Python object and it's length can be no greater than + * Py_ssize_t. In theory, we could have an intermediate frame that is + * larger. But a) why would this API be used for frames that large b) + * it isn't worth the complexity to support. */ + assert(SIZE_MAX >= PY_SSIZE_T_MAX); + if (frameHeader.frameContentSize > PY_SSIZE_T_MAX) { + PyErr_SetString(PyExc_ValueError, + "chunk 0 is too large to decompress on this platform"); + return NULL; + } + + if (ensure_dctx(self, 0)) { + goto finally; + } + + buffer1Size = (size_t)frameHeader.frameContentSize; + buffer1 = PyMem_Malloc(buffer1Size); + if (!buffer1) { + goto finally; + } + + outBuffer.dst = buffer1; + outBuffer.size = buffer1Size; + outBuffer.pos = 0; + + inBuffer.src = chunkData; + inBuffer.size = chunkSize; + inBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk 0: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, "chunk 0 did not decompress full frame"); + goto finally; + } + + buffer1ContentSize = outBuffer.pos; + + /* Special case of a simple chain. */ + if (1 == chunksLen) { + result = PyBytes_FromStringAndSize(buffer1, buffer1Size); + goto finally; + } + + /* This should ideally look at next chunk. But this is slightly simpler. */ + buffer2Size = (size_t)frameHeader.frameContentSize; + buffer2 = PyMem_Malloc(buffer2Size); + if (!buffer2) { + goto finally; + } + + /* For each subsequent chunk, use the previous fulltext as a content + dictionary. Our strategy is to have 2 buffers. One holds the previous + fulltext (to be used as a content dictionary) and the other holds the new + fulltext. The buffers grow when needed but never decrease in size. This + limits the memory allocator overhead. + */ + for (chunkIndex = 1; chunkIndex < chunksLen; chunkIndex++) { + chunk = PyList_GetItem(chunks, chunkIndex); + if (!PyBytes_Check(chunk)) { + PyErr_Format(PyExc_ValueError, "chunk %zd must be bytes", + chunkIndex); + goto finally; + } + + PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); + zresult = + ZSTD_getFrameHeader(&frameHeader, (void *)chunkData, chunkSize); + if (ZSTD_isError(zresult)) { + PyErr_Format(PyExc_ValueError, + "chunk %zd is not a valid zstd frame", chunkIndex); + goto finally; + } + else if (zresult) { + PyErr_Format(PyExc_ValueError, + "chunk %zd is too small to contain a zstd frame", + chunkIndex); + goto finally; + } + + if (ZSTD_CONTENTSIZE_UNKNOWN == frameHeader.frameContentSize) { + PyErr_Format(PyExc_ValueError, + "chunk %zd missing content size in frame", chunkIndex); + goto finally; + } + + assert(ZSTD_CONTENTSIZE_ERROR != frameHeader.frameContentSize); + + if (frameHeader.frameContentSize > PY_SSIZE_T_MAX) { + PyErr_Format( + PyExc_ValueError, + "chunk %zd is too large to decompress on this platform", + chunkIndex); + goto finally; + } + + inBuffer.src = chunkData; + inBuffer.size = chunkSize; + inBuffer.pos = 0; + + parity = chunkIndex % 2; + + /* This could definitely be abstracted to reduce code duplication. */ + if (parity) { + /* Resize destination buffer to hold larger content. */ + if (buffer2Size < frameHeader.frameContentSize) { + buffer2Size = (size_t)frameHeader.frameContentSize; + destBuffer = PyMem_Realloc(buffer2, buffer2Size); + if (!destBuffer) { + goto finally; + } + buffer2 = destBuffer; + } + + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_DCtx_refPrefix_advanced( + self->dctx, buffer1, buffer1ContentSize, ZSTD_dct_rawContent); + Py_END_ALLOW_THREADS if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "failed to load prefix dictionary at chunk %zd", + chunkIndex); + goto finally; + } + + outBuffer.dst = buffer2; + outBuffer.size = buffer2Size; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", + chunkIndex, ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, + "chunk %zd did not decompress full frame", + chunkIndex); + goto finally; + } + + buffer2ContentSize = outBuffer.pos; + } + else { + if (buffer1Size < frameHeader.frameContentSize) { + buffer1Size = (size_t)frameHeader.frameContentSize; + destBuffer = PyMem_Realloc(buffer1, buffer1Size); + if (!destBuffer) { + goto finally; + } + buffer1 = destBuffer; + } + + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_DCtx_refPrefix_advanced( + self->dctx, buffer2, buffer2ContentSize, ZSTD_dct_rawContent); + Py_END_ALLOW_THREADS if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, + "failed to load prefix dictionary at chunk %zd", + chunkIndex); + goto finally; + } + + outBuffer.dst = buffer1; + outBuffer.size = buffer1Size; + outBuffer.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = + ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer); + Py_END_ALLOW_THREADS if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", + chunkIndex, ZSTD_getErrorName(zresult)); + goto finally; + } + else if (zresult) { + PyErr_Format(ZstdError, + "chunk %zd did not decompress full frame", + chunkIndex); + goto finally; + } + + buffer1ContentSize = outBuffer.pos; + } + } + + result = PyBytes_FromStringAndSize(parity ? buffer2 : buffer1, + parity ? buffer2ContentSize + : buffer1ContentSize); + +finally: + if (buffer2) { + PyMem_Free(buffer2); + } + if (buffer1) { + PyMem_Free(buffer1); + } + + return result; +} + +typedef struct { + void *sourceData; + size_t sourceSize; + size_t destSize; +} FramePointer; + +typedef struct { + FramePointer *frames; + Py_ssize_t framesSize; + unsigned long long compressedSize; +} FrameSources; + +typedef struct { + void *dest; + Py_ssize_t destSize; + BufferSegment *segments; + Py_ssize_t segmentsSize; +} DecompressorDestBuffer; + +typedef enum { + DecompressorWorkerError_none = 0, + DecompressorWorkerError_zstd = 1, + DecompressorWorkerError_memory = 2, + DecompressorWorkerError_sizeMismatch = 3, + DecompressorWorkerError_unknownSize = 4, +} DecompressorWorkerError; + +typedef struct { + /* Source records and length */ + FramePointer *framePointers; + /* Which records to process. */ + Py_ssize_t startOffset; + Py_ssize_t endOffset; + unsigned long long totalSourceSize; + + /* Compression state and settings. */ + ZSTD_DCtx *dctx; + int requireOutputSizes; + + /* Output storage. */ + DecompressorDestBuffer *destBuffers; + Py_ssize_t destCount; + + /* Item that error occurred on. */ + Py_ssize_t errorOffset; + /* If an error occurred. */ + DecompressorWorkerError error; + /* result from zstd decompression operation */ + size_t zresult; +} DecompressorWorkerState; + +#ifdef HAVE_ZSTD_POOL_APIS +static void decompress_worker(DecompressorWorkerState *state) { + size_t allocationSize; + DecompressorDestBuffer *destBuffer; + Py_ssize_t frameIndex; + Py_ssize_t localOffset = 0; + Py_ssize_t currentBufferStartIndex = state->startOffset; + Py_ssize_t remainingItems = state->endOffset - state->startOffset + 1; + void *tmpBuf; + Py_ssize_t destOffset = 0; + FramePointer *framePointers = state->framePointers; + size_t zresult; + + assert(NULL == state->destBuffers); + assert(0 == state->destCount); + assert(state->endOffset - state->startOffset >= 0); + + /* We could get here due to the way work is allocated. Ideally we wouldn't + get here. But that would require a bit of a refactor in the caller. */ + if (state->totalSourceSize > SIZE_MAX) { + state->error = DecompressorWorkerError_memory; + state->errorOffset = 0; + return; + } + + /* + * We need to allocate a buffer to hold decompressed data. How we do this + * depends on what we know about the output. The following scenarios are + * possible: + * + * 1. All structs defining frames declare the output size. + * 2. The decompressed size is embedded within the zstd frame. + * 3. The decompressed size is not stored anywhere. + * + * For now, we only support #1 and #2. + */ + + /* Resolve ouput segments. */ + for (frameIndex = state->startOffset; frameIndex <= state->endOffset; + frameIndex++) { + FramePointer *fp = &framePointers[frameIndex]; + unsigned long long decompressedSize; + + if (0 == fp->destSize) { + decompressedSize = + ZSTD_getFrameContentSize(fp->sourceData, fp->sourceSize); + + if (ZSTD_CONTENTSIZE_ERROR == decompressedSize) { + state->error = DecompressorWorkerError_unknownSize; + state->errorOffset = frameIndex; + return; + } + else if (ZSTD_CONTENTSIZE_UNKNOWN == decompressedSize) { + if (state->requireOutputSizes) { + state->error = DecompressorWorkerError_unknownSize; + state->errorOffset = frameIndex; + return; + } + + /* This will fail the assert for .destSize > 0 below. */ + decompressedSize = 0; + } + + if (decompressedSize > SIZE_MAX) { + state->error = DecompressorWorkerError_memory; + state->errorOffset = frameIndex; + return; + } + + fp->destSize = (size_t)decompressedSize; + } + } + + state->destBuffers = calloc(1, sizeof(DecompressorDestBuffer)); + if (NULL == state->destBuffers) { + state->error = DecompressorWorkerError_memory; + return; + } + + state->destCount = 1; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + assert(framePointers[state->startOffset].destSize > 0); /* For now. */ + + allocationSize = roundpow2((size_t)state->totalSourceSize); + + if (framePointers[state->startOffset].destSize > allocationSize) { + allocationSize = roundpow2(framePointers[state->startOffset].destSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->destSize = allocationSize; + + destBuffer->segments = calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + /* Caller will free state->dest as part of cleanup. */ + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + + for (frameIndex = state->startOffset; frameIndex <= state->endOffset; + frameIndex++) { + ZSTD_outBuffer outBuffer; + ZSTD_inBuffer inBuffer; + const void *source = framePointers[frameIndex].sourceData; + const size_t sourceSize = framePointers[frameIndex].sourceSize; + void *dest; + const size_t decompressedSize = framePointers[frameIndex].destSize; + size_t destAvailable = destBuffer->destSize - destOffset; + + assert(decompressedSize > 0); /* For now. */ + + /* + * Not enough space in current buffer. Finish current before and + * allocate and switch to a new one. + */ + if (decompressedSize > destAvailable) { + /* + * Shrinking the destination buffer is optional. But it should be + * cheap, so we just do it. + */ + if (destAvailable) { + tmpBuf = realloc(destBuffer->dest, destOffset); + if (NULL == tmpBuf) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->dest = tmpBuf; + destBuffer->destSize = destOffset; + } + + /* Truncate segments buffer. */ + tmpBuf = realloc(destBuffer->segments, + (frameIndex - currentBufferStartIndex) * + sizeof(BufferSegment)); + if (NULL == tmpBuf) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->segments = tmpBuf; + destBuffer->segmentsSize = frameIndex - currentBufferStartIndex; + + /* Grow space for new DestBuffer. */ + tmpBuf = + realloc(state->destBuffers, (state->destCount + 1) * + sizeof(DecompressorDestBuffer)); + if (NULL == tmpBuf) { + state->error = DecompressorWorkerError_memory; + return; + } + + state->destBuffers = tmpBuf; + state->destCount++; + + destBuffer = &state->destBuffers[state->destCount - 1]; + + /* Don't take any chances will non-NULL pointers. */ + memset(destBuffer, 0, sizeof(DecompressorDestBuffer)); + + allocationSize = roundpow2((size_t)state->totalSourceSize); + + if (decompressedSize > allocationSize) { + allocationSize = roundpow2(decompressedSize); + } + + destBuffer->dest = malloc(allocationSize); + if (NULL == destBuffer->dest) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->destSize = allocationSize; + destAvailable = allocationSize; + destOffset = 0; + localOffset = 0; + + destBuffer->segments = + calloc(remainingItems, sizeof(BufferSegment)); + if (NULL == destBuffer->segments) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->segmentsSize = remainingItems; + currentBufferStartIndex = frameIndex; + } + + dest = (char *)destBuffer->dest + destOffset; + + outBuffer.dst = dest; + outBuffer.size = decompressedSize; + outBuffer.pos = 0; + + inBuffer.src = source; + inBuffer.size = sourceSize; + inBuffer.pos = 0; + + zresult = ZSTD_decompressStream(state->dctx, &outBuffer, &inBuffer); + if (ZSTD_isError(zresult)) { + state->error = DecompressorWorkerError_zstd; + state->zresult = zresult; + state->errorOffset = frameIndex; + return; + } + else if (zresult || outBuffer.pos != decompressedSize) { + state->error = DecompressorWorkerError_sizeMismatch; + state->zresult = outBuffer.pos; + state->errorOffset = frameIndex; + return; + } + + destBuffer->segments[localOffset].offset = destOffset; + destBuffer->segments[localOffset].length = outBuffer.pos; + destOffset += outBuffer.pos; + localOffset++; + remainingItems--; + } + + if (destBuffer->destSize > destOffset) { + tmpBuf = realloc(destBuffer->dest, destOffset); + if (NULL == tmpBuf) { + state->error = DecompressorWorkerError_memory; + return; + } + + destBuffer->dest = tmpBuf; + destBuffer->destSize = destOffset; + } +} +#endif + +#ifdef HAVE_ZSTD_POOL_APIS +ZstdBufferWithSegmentsCollection * +decompress_from_framesources(ZstdDecompressor *decompressor, + FrameSources *frames, Py_ssize_t threadCount) { + Py_ssize_t i = 0; + int errored = 0; + Py_ssize_t segmentsCount; + ZstdBufferWithSegments *bws = NULL; + PyObject *resultArg = NULL; + Py_ssize_t resultIndex; + ZstdBufferWithSegmentsCollection *result = NULL; + FramePointer *framePointers = frames->frames; + unsigned long long workerBytes = 0; + Py_ssize_t currentThread = 0; + Py_ssize_t workerStartOffset = 0; + POOL_ctx *pool = NULL; + DecompressorWorkerState *workerStates = NULL; + unsigned long long bytesPerWorker; + + /* Caller should normalize 0 and negative values to 1 or larger. */ + assert(threadCount >= 1); + + /* More threads than inputs makes no sense under any conditions. */ + threadCount = + frames->framesSize < threadCount ? frames->framesSize : threadCount; + + /* TODO lower thread count if input size is too small and threads would just + add overhead. */ + + if (decompressor->dict) { + if (ensure_ddict(decompressor->dict)) { + return NULL; + } + } + + /* If threadCount==1, we don't start a thread pool. But we do leverage the + same API for dispatching work. */ + workerStates = PyMem_Malloc(threadCount * sizeof(DecompressorWorkerState)); + if (NULL == workerStates) { + PyErr_NoMemory(); + goto finally; + } + + memset(workerStates, 0, threadCount * sizeof(DecompressorWorkerState)); + + if (threadCount > 1) { + pool = POOL_create(threadCount, 1); + if (NULL == pool) { + PyErr_SetString(ZstdError, "could not initialize zstd thread pool"); + goto finally; + } + } + + bytesPerWorker = frames->compressedSize / threadCount; + + if (bytesPerWorker > SIZE_MAX) { + PyErr_SetString(ZstdError, + "too much data per worker for this platform"); + goto finally; + } + + for (i = 0; i < threadCount; i++) { + size_t zresult; + + workerStates[i].dctx = ZSTD_createDCtx(); + if (NULL == workerStates[i].dctx) { + PyErr_NoMemory(); + goto finally; + } + + if (decompressor->dict) { + zresult = ZSTD_DCtx_refDDict(workerStates[i].dctx, + decompressor->dict->ddict); + if (zresult) { + PyErr_Format(ZstdError, + "unable to reference prepared dictionary: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + } + + workerStates[i].framePointers = framePointers; + workerStates[i].requireOutputSizes = 1; + } + + Py_BEGIN_ALLOW_THREADS + /* There are many ways to split work among workers. + + For now, we take a simple approach of splitting work so each worker + gets roughly the same number of input bytes. This will result in more + starvation than running N>threadCount jobs. But it avoids + complications around state tracking, which could involve extra + locking. + */ + for (i = 0; i < frames->framesSize; i++) { + workerBytes += frames->frames[i].sourceSize; + + /* + * The last worker/thread needs to handle all remaining work. Don't + * trigger it prematurely. Defer to the block outside of the loop. + * (But still process this loop so workerBytes is correct. + */ + if (currentThread == threadCount - 1) { + continue; + } + + if (workerBytes >= bytesPerWorker) { + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = i; + workerStates[currentThread].totalSourceSize = workerBytes; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)decompress_worker, + &workerStates[currentThread]); + } + else { + decompress_worker(&workerStates[currentThread]); + } + currentThread++; + workerStartOffset = i + 1; + workerBytes = 0; + } + } + + if (workerBytes) { + workerStates[currentThread].startOffset = workerStartOffset; + workerStates[currentThread].endOffset = frames->framesSize - 1; + workerStates[currentThread].totalSourceSize = workerBytes; + + if (threadCount > 1) { + POOL_add(pool, (POOL_function)decompress_worker, + &workerStates[currentThread]); + } + else { + decompress_worker(&workerStates[currentThread]); + } + } + + if (threadCount > 1) { + POOL_free(pool); + pool = NULL; + } + Py_END_ALLOW_THREADS + + for (i = 0; i < threadCount; i++) { + switch (workerStates[i].error) { + case DecompressorWorkerError_none: + break; + + case DecompressorWorkerError_zstd: + PyErr_Format(ZstdError, "error decompressing item %zd: %s", + workerStates[i].errorOffset, + ZSTD_getErrorName(workerStates[i].zresult)); + errored = 1; + break; + + case DecompressorWorkerError_memory: + PyErr_NoMemory(); + errored = 1; + break; + + case DecompressorWorkerError_sizeMismatch: + PyErr_Format(ZstdError, + "error decompressing item %zd: decompressed %zu " + "bytes; expected %zu", + workerStates[i].errorOffset, workerStates[i].zresult, + framePointers[workerStates[i].errorOffset].destSize); + errored = 1; + break; + + case DecompressorWorkerError_unknownSize: + PyErr_Format(PyExc_ValueError, + "could not determine decompressed size of item %zd", + workerStates[i].errorOffset); + errored = 1; + break; + + default: + PyErr_Format(ZstdError, "unhandled error type: %d; this is a bug", + workerStates[i].error); + errored = 1; + break; + } + + if (errored) { + break; + } + } + + if (errored) { + goto finally; + } + + segmentsCount = 0; + for (i = 0; i < threadCount; i++) { + segmentsCount += workerStates[i].destCount; + } + + resultArg = PyTuple_New(segmentsCount); + if (NULL == resultArg) { + goto finally; + } + + resultIndex = 0; + + for (i = 0; i < threadCount; i++) { + Py_ssize_t bufferIndex; + DecompressorWorkerState *state = &workerStates[i]; + + for (bufferIndex = 0; bufferIndex < state->destCount; bufferIndex++) { + DecompressorDestBuffer *destBuffer = + &state->destBuffers[bufferIndex]; + + bws = BufferWithSegments_FromMemory( + destBuffer->dest, destBuffer->destSize, destBuffer->segments, + destBuffer->segmentsSize); + if (NULL == bws) { + goto finally; + } + + /* + * Memory for buffer and segments was allocated using malloc() in + * worker and the memory is transferred to the BufferWithSegments + * instance. So tell instance to use free() and NULL the reference + * in the state struct so it isn't freed below. + */ + bws->useFree = 1; + destBuffer->dest = NULL; + destBuffer->segments = NULL; + + PyTuple_SET_ITEM(resultArg, resultIndex++, (PyObject *)bws); + } + } + + result = (ZstdBufferWithSegmentsCollection *)PyObject_CallObject( + (PyObject *)ZstdBufferWithSegmentsCollectionType, resultArg); + +finally: + Py_CLEAR(resultArg); + + if (workerStates) { + for (i = 0; i < threadCount; i++) { + Py_ssize_t bufferIndex; + DecompressorWorkerState *state = &workerStates[i]; + + if (state->dctx) { + ZSTD_freeDCtx(state->dctx); + } + + for (bufferIndex = 0; bufferIndex < state->destCount; + bufferIndex++) { + if (state->destBuffers) { + /* + * Will be NULL if memory transfered to a + * BufferWithSegments. Otherwise it is left over after an + * error occurred. + */ + free(state->destBuffers[bufferIndex].dest); + free(state->destBuffers[bufferIndex].segments); + } + } + + free(state->destBuffers); + } + + PyMem_Free(workerStates); + } + + POOL_free(pool); + + return result; +} +#endif + +#ifdef HAVE_ZSTD_POOL_APIS +static ZstdBufferWithSegmentsCollection * +Decompressor_multi_decompress_to_buffer(ZstdDecompressor *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"frames", "decompressed_sizes", "threads", NULL}; + + PyObject *frames; + Py_buffer frameSizes; + int threads = 0; + Py_ssize_t frameCount; + Py_buffer *frameBuffers = NULL; + FramePointer *framePointers = NULL; + unsigned long long *frameSizesP = NULL; + unsigned long long totalInputSize = 0; + FrameSources frameSources; + ZstdBufferWithSegmentsCollection *result = NULL; + Py_ssize_t i; + + memset(&frameSizes, 0, sizeof(frameSizes)); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, + "O|y*i:multi_decompress_to_buffer", kwlist, + &frames, &frameSizes, &threads)) { + return NULL; + } + + if (frameSizes.buf) { + frameSizesP = (unsigned long long *)frameSizes.buf; + } + + if (threads < 0) { + threads = cpu_count(); + } + + if (threads < 2) { + threads = 1; + } + + if (PyObject_TypeCheck(frames, ZstdBufferWithSegmentsType)) { + ZstdBufferWithSegments *buffer = (ZstdBufferWithSegments *)frames; + frameCount = buffer->segmentCount; + + if (frameSizes.buf && + frameSizes.len != + frameCount * (Py_ssize_t)sizeof(unsigned long long)) { + PyErr_Format( + PyExc_ValueError, + "decompressed_sizes size mismatch; expected %zd, got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (!framePointers) { + PyErr_NoMemory(); + goto finally; + } + + for (i = 0; i < frameCount; i++) { + void *sourceData; + unsigned long long sourceSize; + unsigned long long decompressedSize = 0; + + if (buffer->segments[i].offset + buffer->segments[i].length > + buffer->dataSize) { + PyErr_Format(PyExc_ValueError, + "item %zd has offset outside memory area", i); + goto finally; + } + + sourceData = (char *)buffer->data + buffer->segments[i].offset; + sourceSize = buffer->segments[i].length; + totalInputSize += sourceSize; + + if (frameSizesP) { + decompressedSize = frameSizesP[i]; + } + + if (sourceSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "item %zd is too large for this platform", i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd is too large for " + "this platform", + i); + goto finally; + } + + framePointers[i].sourceData = sourceData; + framePointers[i].sourceSize = (size_t)sourceSize; + framePointers[i].destSize = (size_t)decompressedSize; + } + } + else if (PyObject_TypeCheck(frames, ZstdBufferWithSegmentsCollectionType)) { + Py_ssize_t offset = 0; + ZstdBufferWithSegments *buffer; + ZstdBufferWithSegmentsCollection *collection = + (ZstdBufferWithSegmentsCollection *)frames; + + frameCount = BufferWithSegmentsCollection_length(collection); + + if (frameSizes.buf && frameSizes.len != frameCount) { + PyErr_Format( + PyExc_ValueError, + "decompressed_sizes size mismatch; expected %zd; got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (NULL == framePointers) { + PyErr_NoMemory(); + goto finally; + } + + /* Iterate the data structure directly because it is faster. */ + for (i = 0; i < collection->bufferCount; i++) { + Py_ssize_t segmentIndex; + buffer = collection->buffers[i]; + + for (segmentIndex = 0; segmentIndex < buffer->segmentCount; + segmentIndex++) { + unsigned long long decompressedSize = + frameSizesP ? frameSizesP[offset] : 0; + + if (buffer->segments[segmentIndex].offset + + buffer->segments[segmentIndex].length > + buffer->dataSize) { + PyErr_Format(PyExc_ValueError, + "item %zd has offset outside memory area", + offset); + goto finally; + } + + if (buffer->segments[segmentIndex].length > SIZE_MAX) { + PyErr_Format( + PyExc_ValueError, + "item %zd in buffer %zd is too large for this platform", + segmentIndex, i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd in buffer %zd " + "is too large for this platform", + segmentIndex, i); + goto finally; + } + + totalInputSize += buffer->segments[segmentIndex].length; + + framePointers[offset].sourceData = + (char *)buffer->data + + buffer->segments[segmentIndex].offset; + framePointers[offset].sourceSize = + (size_t)buffer->segments[segmentIndex].length; + framePointers[offset].destSize = (size_t)decompressedSize; + + offset++; + } + } + } + else if (PyList_Check(frames)) { + frameCount = PyList_GET_SIZE(frames); + + if (frameSizes.buf && + frameSizes.len != + frameCount * (Py_ssize_t)sizeof(unsigned long long)) { + PyErr_Format( + PyExc_ValueError, + "decompressed_sizes size mismatch; expected %zd, got %zd", + frameCount * sizeof(unsigned long long), frameSizes.len); + goto finally; + } + + framePointers = PyMem_Malloc(frameCount * sizeof(FramePointer)); + if (!framePointers) { + PyErr_NoMemory(); + goto finally; + } + + frameBuffers = PyMem_Malloc(frameCount * sizeof(Py_buffer)); + if (NULL == frameBuffers) { + PyErr_NoMemory(); + goto finally; + } + + memset(frameBuffers, 0, frameCount * sizeof(Py_buffer)); + + /* Do a pass to assemble info about our input buffers and output sizes. + */ + for (i = 0; i < frameCount; i++) { + unsigned long long decompressedSize = + frameSizesP ? frameSizesP[i] : 0; + + if (0 != PyObject_GetBuffer(PyList_GET_ITEM(frames, i), + &frameBuffers[i], PyBUF_CONTIG_RO)) { + PyErr_Clear(); + PyErr_Format(PyExc_TypeError, + "item %zd not a bytes like object", i); + goto finally; + } + + if (decompressedSize > SIZE_MAX) { + PyErr_Format(PyExc_ValueError, + "decompressed size of item %zd is too large for " + "this platform", + i); + goto finally; + } + + totalInputSize += frameBuffers[i].len; + + framePointers[i].sourceData = frameBuffers[i].buf; + framePointers[i].sourceSize = frameBuffers[i].len; + framePointers[i].destSize = (size_t)decompressedSize; + } + } + else { + PyErr_SetString(PyExc_TypeError, + "argument must be list or BufferWithSegments"); + goto finally; + } + + /* We now have an array with info about our inputs and outputs. Feed it into + our generic decompression function. */ + frameSources.frames = framePointers; + frameSources.framesSize = frameCount; + frameSources.compressedSize = totalInputSize; + + result = decompress_from_framesources(self, &frameSources, threads); + +finally: + if (frameSizes.buf) { + PyBuffer_Release(&frameSizes); + } + PyMem_Free(framePointers); + + if (frameBuffers) { + for (i = 0; i < frameCount; i++) { + PyBuffer_Release(&frameBuffers[i]); + } + + PyMem_Free(frameBuffers); + } + + return result; +} +#endif + +static PyMethodDef Decompressor_methods[] = { + {"copy_stream", (PyCFunction)Decompressor_copy_stream, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"decompress", (PyCFunction)Decompressor_decompress, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"decompressobj", (PyCFunction)Decompressor_decompressobj, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"read_to_iter", (PyCFunction)Decompressor_read_to_iter, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"stream_reader", (PyCFunction)Decompressor_stream_reader, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"stream_writer", (PyCFunction)Decompressor_stream_writer, + METH_VARARGS | METH_KEYWORDS, NULL}, + {"decompress_content_dict_chain", + (PyCFunction)Decompressor_decompress_content_dict_chain, + METH_VARARGS | METH_KEYWORDS, NULL}, +#ifdef HAVE_ZSTD_POOL_APIS + {"multi_decompress_to_buffer", + (PyCFunction)Decompressor_multi_decompress_to_buffer, + METH_VARARGS | METH_KEYWORDS, NULL}, +#endif + {"memory_size", (PyCFunction)Decompressor_memory_size, METH_NOARGS, NULL}, + {NULL, NULL}}; + +PyType_Slot ZstdDecompressorSlots[] = { + {Py_tp_dealloc, Decompressor_dealloc}, + {Py_tp_methods, Decompressor_methods}, + {Py_tp_init, Decompressor_init}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdDecompressorSpec = { + "zstd.ZstdDecompressor", + sizeof(ZstdDecompressor), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdDecompressorSlots, +}; + +PyTypeObject *ZstdDecompressorType; + +void decompressor_module_init(PyObject *mod) { + ZstdDecompressorType = + (PyTypeObject *)PyType_FromSpec(&ZstdDecompressorSpec); + if (PyType_Ready(ZstdDecompressorType) < 0) { + return; + } + + Py_INCREF((PyObject *)ZstdDecompressorType); + PyModule_AddObject(mod, "ZstdDecompressor", + (PyObject *)ZstdDecompressorType); +} diff --git a/contrib/python/zstandard/py3/c-ext/decompressoriterator.c b/contrib/python/zstandard/py3/c-ext/decompressoriterator.c new file mode 100644 index 00000000000..e575a7a1016 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/decompressoriterator.c @@ -0,0 +1,228 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject *ZstdError; + +static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator *self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->reader); + + if (self->buffer.buf) { + PyBuffer_Release(&self->buffer); + memset(&self->buffer, 0, sizeof(self->buffer)); + } + + if (self->input.src) { + PyMem_Free((void *)self->input.src); + self->input.src = NULL; + } + + PyObject_Del(self); +} + +static PyObject *ZstdDecompressorIterator_iter(PyObject *self) { + Py_INCREF(self); + return self; +} + +static DecompressorIteratorResult +read_decompressor_iterator(ZstdDecompressorIterator *self) { + size_t zresult; + PyObject *chunk; + DecompressorIteratorResult result; + size_t oldInputPos = self->input.pos; + + result.chunk = NULL; + + chunk = PyBytes_FromStringAndSize(NULL, self->outSize); + if (!chunk) { + result.errored = 1; + return result; + } + + self->output.dst = PyBytes_AsString(chunk); + self->output.size = self->outSize; + self->output.pos = 0; + + Py_BEGIN_ALLOW_THREADS zresult = ZSTD_decompressStream( + self->decompressor->dctx, &self->output, &self->input); + Py_END_ALLOW_THREADS + + /* We're done with the pointer. Nullify to prevent anyone from getting a + handle on a Python object. */ + self->output.dst = NULL; + + if (ZSTD_isError(zresult)) { + Py_DECREF(chunk); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + result.errored = 1; + return result; + } + + self->readCount += self->input.pos - oldInputPos; + + /* Frame is fully decoded. Input exhausted and output sitting in buffer. */ + if (0 == zresult) { + self->finishedInput = 1; + self->finishedOutput = 1; + } + + /* If it produced output data, return it. */ + if (self->output.pos) { + if (self->output.pos < self->outSize) { + if (safe_pybytes_resize(&chunk, self->output.pos)) { + Py_XDECREF(chunk); + result.errored = 1; + return result; + } + } + } + else { + Py_DECREF(chunk); + chunk = NULL; + } + + result.errored = 0; + result.chunk = chunk; + + return result; +} + +static PyObject * +ZstdDecompressorIterator_iternext(ZstdDecompressorIterator *self) { + PyObject *readResult = NULL; + char *readBuffer; + Py_ssize_t readSize; + Py_ssize_t bufferRemaining; + DecompressorIteratorResult result; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + result = read_decompressor_iterator(self); + if (result.chunk || result.errored) { + return result.chunk; + } + + /* Else fall through to get more data from input. */ + } + +read_from_source: + + if (!self->finishedInput) { + if (self->reader) { + readResult = + PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer.buf); + + /* Only support contiguous C arrays for now */ + assert(self->buffer.strides == NULL && + self->buffer.suboffsets == NULL); + assert(self->buffer.itemsize == 1); + + /* TODO avoid memcpy() below */ + readBuffer = (char *)self->buffer.buf + self->bufferOffset; + bufferRemaining = self->buffer.len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (readSize) { + if (!self->readCount && self->skipBytes) { + assert(self->skipBytes < self->inSize); + if ((Py_ssize_t)self->skipBytes >= readSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes larger than first input chunk; " + "this scenario is currently unsupported"); + Py_XDECREF(readResult); + return NULL; + } + + readBuffer = readBuffer + self->skipBytes; + readSize -= self->skipBytes; + } + + /* Copy input into previously allocated buffer because it can live + longer than a single function call and we don't want to keep a ref + to a Python object around. This could be changed... */ + memcpy((void *)self->input.src, readBuffer, readSize); + self->input.size = readSize; + self->input.pos = 0; + } + /* No bytes on first read must mean an empty input stream. */ + else if (!self->readCount) { + self->finishedInput = 1; + self->finishedOutput = 1; + Py_XDECREF(readResult); + PyErr_SetString(PyExc_StopIteration, "empty input"); + return NULL; + } + else { + self->finishedInput = 1; + } + + /* We've copied the data managed by memory. Discard the Python object. + */ + Py_XDECREF(readResult); + } + + result = read_decompressor_iterator(self); + if (result.errored || result.chunk) { + return result.chunk; + } + + /* No new output data. Try again unless we know there is no more data. */ + if (!self->finishedInput) { + goto read_from_source; + } + + PyErr_SetString(PyExc_StopIteration, "input exhausted"); + return NULL; +} + +PyType_Slot ZstdDecompressorIteratorSlots[] = { + {Py_tp_dealloc, ZstdDecompressorIterator_dealloc}, + {Py_tp_iter, ZstdDecompressorIterator_iter}, + {Py_tp_iternext, ZstdDecompressorIterator_iternext}, + {Py_tp_new, PyType_GenericNew}, + {0, NULL}, +}; + +PyType_Spec ZstdDecompressorIteratorSpec = { + "zstd.ZstdDecompressorIterator", + sizeof(ZstdDecompressorIterator), + 0, + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, + ZstdDecompressorIteratorSlots, +}; + +PyTypeObject *ZstdDecompressorIteratorType; + +void decompressoriterator_module_init(PyObject *mod) { + ZstdDecompressorIteratorType = + (PyTypeObject *)PyType_FromSpec(&ZstdDecompressorIteratorSpec); + if (PyType_Ready(ZstdDecompressorIteratorType) < 0) { + return; + } +} diff --git a/contrib/python/zstandard/py3/c-ext/frameparams.c b/contrib/python/zstandard/py3/c-ext/frameparams.c new file mode 100644 index 00000000000..951a91d6319 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/frameparams.c @@ -0,0 +1,97 @@ +/** + * Copyright (c) 2017-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#include "python-zstandard.h" + +extern PyObject *ZstdError; + +FrameParametersObject *get_frame_parameters(PyObject *self, PyObject *args, + PyObject *kwargs) { + static char *kwlist[] = {"data", NULL}; + + Py_buffer source; + ZSTD_frameHeader header; + FrameParametersObject *result = NULL; + size_t zresult; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*:get_frame_parameters", + kwlist, &source)) { + return NULL; + } + + zresult = ZSTD_getFrameHeader(&header, source.buf, source.len); + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "cannot get frame parameters: %s", + ZSTD_getErrorName(zresult)); + goto finally; + } + + if (zresult) { + PyErr_Format(ZstdError, + "not enough data for frame parameters; need %zu bytes", + zresult); + goto finally; + } + + result = PyObject_New(FrameParametersObject, FrameParametersType); + if (!result) { + goto finally; + } + + result->frameContentSize = header.frameContentSize; + result->windowSize = header.windowSize; + result->dictID = header.dictID; + result->checksumFlag = header.checksumFlag ? 1 : 0; + +finally: + PyBuffer_Release(&source); + return result; +} + +static void FrameParameters_dealloc(PyObject *self) { + PyObject_Del(self); +} + +static PyMemberDef FrameParameters_members[] = { + {"content_size", T_ULONGLONG, + offsetof(FrameParametersObject, frameContentSize), READONLY, + "frame content size"}, + {"window_size", T_ULONGLONG, offsetof(FrameParametersObject, windowSize), + READONLY, "window size"}, + {"dict_id", T_UINT, offsetof(FrameParametersObject, dictID), READONLY, + "dictionary ID"}, + {"has_checksum", T_BOOL, offsetof(FrameParametersObject, checksumFlag), + READONLY, "checksum flag"}, + {NULL}}; + +PyType_Slot FrameParametersSlots[] = { + {Py_tp_dealloc, FrameParameters_dealloc}, + {Py_tp_members, FrameParameters_members}, + {0, NULL}, +}; + +PyType_Spec FrameParametersSpec = { + "zstd.FrameParameters", + sizeof(FrameParametersObject), + 0, + Py_TPFLAGS_DEFAULT, + FrameParametersSlots, +}; + +PyTypeObject *FrameParametersType; + +void frameparams_module_init(PyObject *mod) { + FrameParametersType = (PyTypeObject *)PyType_FromSpec(&FrameParametersSpec); + if (PyType_Ready(FrameParametersType) < 0) { + return; + } + + Py_INCREF(FrameParametersType); + PyModule_AddObject(mod, "FrameParameters", (PyObject *)FrameParametersType); +} diff --git a/contrib/python/zstandard/py3/c-ext/python-zstandard.h b/contrib/python/zstandard/py3/c-ext/python-zstandard.h new file mode 100644 index 00000000000..a58261d0e40 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/python-zstandard.h @@ -0,0 +1,392 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +#ifndef PYTHON_ZSTANDARD_H +#define PYTHON_ZSTANDARD_H + +#define PY_SSIZE_T_CLEAN +#include <Python.h> +#include <pythoncapi_compat.h> +#include "structmember.h" + +#define ZSTD_STATIC_LINKING_ONLY +#define ZDICT_STATIC_LINKING_ONLY + +#ifdef ZSTD_SINGLE_FILE +#error #include <zstd.c> + +/* We use private APIs from pool.h. We can't rely on availability + of this header or symbols when linking against the system libzstd. + But we know it works when using the bundled single file library. */ +#define HAVE_ZSTD_POOL_APIS +#else +#include <zdict.h> +#include <zstd.h> +#endif + +/* Remember to change the string in zstandard/__init__.py, rust-ext/src/lib.rs, + and debian/changelog as well */ +#define PYTHON_ZSTANDARD_VERSION "0.21.0" + +typedef enum { + compressorobj_flush_finish, + compressorobj_flush_block, +} CompressorObj_Flush; + +/* + Represents a ZstdCompressionParameters type. + + This type holds all the low-level compression parameters that can be set. +*/ +typedef struct { + PyObject_HEAD ZSTD_CCtx_params *params; +} ZstdCompressionParametersObject; + +extern PyTypeObject *ZstdCompressionParametersType; + +/* + Represents a FrameParameters type. + + This type is basically a wrapper around ZSTD_frameParams. +*/ +typedef struct { + PyObject_HEAD unsigned long long frameContentSize; + unsigned long long windowSize; + unsigned dictID; + char checksumFlag; +} FrameParametersObject; + +extern PyTypeObject *FrameParametersType; + +/* + Represents a ZstdCompressionDict type. + + Instances hold data used for a zstd compression dictionary. +*/ +typedef struct { + PyObject_HEAD + + /* Pointer to dictionary data. Owned by self. */ + void *dictData; + /* Size of dictionary data. */ + size_t dictSize; + ZSTD_dictContentType_e dictType; + /* k parameter for cover dictionaries. Only populated by train_cover_dict(). + */ + unsigned k; + /* d parameter for cover dictionaries. Only populated by train_cover_dict(). + */ + unsigned d; + /* Digested dictionary, suitable for reuse. */ + ZSTD_CDict *cdict; + ZSTD_DDict *ddict; +} ZstdCompressionDict; + +extern PyTypeObject *ZstdCompressionDictType; + +/* + Represents a ZstdCompressor type. +*/ +typedef struct { + PyObject_HEAD + + /* Number of threads to use for operations. */ + unsigned int threads; + /* Pointer to compression dictionary to use. NULL if not using dictionary + compression. */ + ZstdCompressionDict *dict; + /* Compression context to use. Populated during object construction. */ + ZSTD_CCtx *cctx; + /* Compression parameters in use. */ + ZSTD_CCtx_params *params; +} ZstdCompressor; + +extern PyTypeObject *ZstdCompressorType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor *compressor; + ZSTD_outBuffer output; + int finished; +} ZstdCompressionObj; + +extern PyTypeObject *ZstdCompressionObjType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor *compressor; + PyObject *writer; + ZSTD_outBuffer output; + size_t outSize; + int entered; + int closing; + char closed; + int writeReturnRead; + int closefd; + unsigned long long bytesCompressed; +} ZstdCompressionWriter; + +extern PyTypeObject *ZstdCompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor *compressor; + PyObject *reader; + Py_buffer buffer; + Py_ssize_t bufferOffset; + size_t inSize; + size_t outSize; + + ZSTD_inBuffer input; + ZSTD_outBuffer output; + int finishedOutput; + int finishedInput; + PyObject *readResult; +} ZstdCompressorIterator; + +extern PyTypeObject *ZstdCompressorIteratorType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor *compressor; + PyObject *reader; + Py_buffer buffer; + size_t readSize; + int closefd; + + int entered; + char closed; + unsigned long long bytesCompressed; + + ZSTD_inBuffer input; + ZSTD_outBuffer output; + int finishedInput; + int finishedOutput; + PyObject *readResult; +} ZstdCompressionReader; + +extern PyTypeObject *ZstdCompressionReaderType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor *compressor; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_buffer inBuffer; + int finished; + size_t chunkSize; +} ZstdCompressionChunker; + +extern PyTypeObject *ZstdCompressionChunkerType; + +typedef enum { + compressionchunker_mode_normal, + compressionchunker_mode_flush, + compressionchunker_mode_finish, +} CompressionChunkerMode; + +typedef struct { + PyObject_HEAD + + ZstdCompressionChunker *chunker; + CompressionChunkerMode mode; +} ZstdCompressionChunkerIterator; + +extern PyTypeObject *ZstdCompressionChunkerIteratorType; + +typedef struct { + PyObject_HEAD + + ZSTD_DCtx *dctx; + ZstdCompressionDict *dict; + size_t maxWindowSize; + ZSTD_format_e format; +} ZstdDecompressor; + +extern PyTypeObject *ZstdDecompressorType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor *decompressor; + size_t outSize; + int finished; + PyObject *unused_data; +} ZstdDecompressionObj; + +extern PyTypeObject *ZstdDecompressionObjType; + +typedef struct { + PyObject_HEAD + + /* Parent decompressor to which this object is associated. */ + ZstdDecompressor *decompressor; + /* Object to read() from (if reading from a stream). */ + PyObject *reader; + /* Size for read() operations on reader. */ + size_t readSize; + /* Whether a read() can return data spanning multiple zstd frames. */ + int readAcrossFrames; + /* Buffer to read from (if reading from a buffer). */ + Py_buffer buffer; + /* Whether to close the inner object on close() */ + int closefd; + + /* Whether the context manager is active. */ + int entered; + /* Whether we've closed the stream. */ + char closed; + + /* Number of bytes decompressed and returned to user. */ + unsigned long long bytesDecompressed; + + /* Tracks data going into decompressor. */ + ZSTD_inBuffer input; + + /* Holds output from read() operation on reader. */ + PyObject *readResult; + + /* Whether all input has been sent to the decompressor. */ + int finishedInput; + /* Whether all output has been flushed from the decompressor. */ + int finishedOutput; +} ZstdDecompressionReader; + +extern PyTypeObject *ZstdDecompressionReaderType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor *decompressor; + PyObject *writer; + size_t outSize; + int entered; + int closing; + char closed; + int writeReturnRead; + int closefd; +} ZstdDecompressionWriter; + +extern PyTypeObject *ZstdDecompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor *decompressor; + PyObject *reader; + Py_buffer buffer; + Py_ssize_t bufferOffset; + size_t inSize; + size_t outSize; + size_t skipBytes; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t readCount; + int finishedInput; + int finishedOutput; +} ZstdDecompressorIterator; + +extern PyTypeObject *ZstdDecompressorIteratorType; + +typedef struct { + int errored; + PyObject *chunk; +} DecompressorIteratorResult; + +typedef struct { + /* The public API is that these are 64-bit unsigned integers. So these can't + * be size_t, even though values larger than SIZE_MAX or PY_SSIZE_T_MAX may + * be nonsensical for this platform. */ + unsigned long long offset; + unsigned long long length; +} BufferSegment; + +typedef struct { + PyObject_HEAD + + PyObject *parent; + BufferSegment *segments; + Py_ssize_t segmentCount; +} ZstdBufferSegments; + +extern PyTypeObject *ZstdBufferSegmentsType; + +typedef struct { + PyObject_HEAD + + PyObject *parent; + void *data; + Py_ssize_t dataSize; + unsigned long long offset; +} ZstdBufferSegment; + +extern PyTypeObject *ZstdBufferSegmentType; + +typedef struct { + PyObject_HEAD + + Py_buffer parent; + void *data; + unsigned long long dataSize; + BufferSegment *segments; + Py_ssize_t segmentCount; + int useFree; +} ZstdBufferWithSegments; + +extern PyTypeObject *ZstdBufferWithSegmentsType; + +/** + * An ordered collection of BufferWithSegments exposed as a squashed collection. + * + * This type provides a virtual view spanning multiple BufferWithSegments + * instances. It allows multiple instances to be "chained" together and + * exposed as a single collection. e.g. if there are 2 buffers holding + * 10 segments each, then o[14] will access the 5th segment in the 2nd buffer. + */ +typedef struct { + PyObject_HEAD + + /* An array of buffers that should be exposed through this instance. */ + ZstdBufferWithSegments **buffers; + /* Number of elements in buffers array. */ + Py_ssize_t bufferCount; + /* Array of first offset in each buffer instance. 0th entry corresponds + to number of elements in the 0th buffer. 1st entry corresponds to the + sum of elements in 0th and 1st buffers. */ + Py_ssize_t *firstElements; +} ZstdBufferWithSegmentsCollection; + +extern PyTypeObject *ZstdBufferWithSegmentsCollectionType; + +int set_parameter(ZSTD_CCtx_params *params, ZSTD_cParameter param, int value); +int set_parameters(ZSTD_CCtx_params *params, + ZstdCompressionParametersObject *obj); +int to_cparams(ZstdCompressionParametersObject *params, + ZSTD_compressionParameters *cparams); +FrameParametersObject *get_frame_parameters(PyObject *self, PyObject *args, + PyObject *kwargs); +int ensure_ddict(ZstdCompressionDict *dict); +int ensure_dctx(ZstdDecompressor *decompressor, int loadDict); +ZstdCompressionDict *train_dictionary(PyObject *self, PyObject *args, + PyObject *kwargs); +ZstdBufferWithSegments * +BufferWithSegments_FromMemory(void *data, unsigned long long dataSize, + BufferSegment *segments, Py_ssize_t segmentsSize); +Py_ssize_t +BufferWithSegmentsCollection_length(ZstdBufferWithSegmentsCollection *); +int cpu_count(void); +size_t roundpow2(size_t); +int safe_pybytes_resize(PyObject **obj, Py_ssize_t size); +void set_io_unsupported_operation(void); + +#endif diff --git a/contrib/python/zstandard/py3/c-ext/pythoncapi_compat.h b/contrib/python/zstandard/py3/c-ext/pythoncapi_compat.h new file mode 100644 index 00000000000..450d7ed6e80 --- /dev/null +++ b/contrib/python/zstandard/py3/c-ext/pythoncapi_compat.h @@ -0,0 +1,307 @@ +// Header file providing new functions of the Python C API to old Python +// versions. +// +// File distributed under the MIT license. +// +// Homepage: +// https://github.com/pythoncapi/pythoncapi_compat +// +// Latest version: +// https://raw.githubusercontent.com/pythoncapi/pythoncapi_compat/master/pythoncapi_compat.h +// +// SPDX-License-Identifier: MIT + +#ifndef PYTHONCAPI_COMPAT +#define PYTHONCAPI_COMPAT + +#ifdef __cplusplus +extern "C" { +#endif + +#include <Python.h> +#include "frameobject.h" // PyFrameObject, PyFrame_GetBack() + + +// Compatibility with Visual Studio 2013 and older which don't support +// the inline keyword in C (only in C++): use __inline instead. +#if (defined(_MSC_VER) && _MSC_VER < 1900 \ + && !defined(__cplusplus) && !defined(inline)) +# define inline __inline +# define PYTHONCAPI_COMPAT_MSC_INLINE + // These two macros are undefined at the end of this file +#endif + + +// Cast argument to PyObject* type. +#ifndef _PyObject_CAST +# define _PyObject_CAST(op) ((PyObject*)(op)) +#endif +#ifndef _PyObject_CAST_CONST +# define _PyObject_CAST_CONST(op) ((const PyObject*)(op)) +#endif + + +// bpo-42262 added Py_NewRef() to Python 3.10.0a3 +#if PY_VERSION_HEX < 0x030A00A3 && !defined(Py_NewRef) +static inline PyObject* _Py_NewRef(PyObject *obj) +{ + Py_INCREF(obj); + return obj; +} +#define Py_NewRef(obj) _Py_NewRef(_PyObject_CAST(obj)) +#endif + + +// bpo-42262 added Py_XNewRef() to Python 3.10.0a3 +#if PY_VERSION_HEX < 0x030A00A3 && !defined(Py_XNewRef) +static inline PyObject* _Py_XNewRef(PyObject *obj) +{ + Py_XINCREF(obj); + return obj; +} +#define Py_XNewRef(obj) _Py_XNewRef(_PyObject_CAST(obj)) +#endif + + +// bpo-39573 added Py_SET_REFCNT() to Python 3.9.0a4 +#if PY_VERSION_HEX < 0x030900A4 && !defined(Py_SET_REFCNT) +static inline void _Py_SET_REFCNT(PyObject *ob, Py_ssize_t refcnt) +{ + ob->ob_refcnt = refcnt; +} +#define Py_SET_REFCNT(ob, refcnt) _Py_SET_REFCNT(_PyObject_CAST(ob), refcnt) +#endif + + +// bpo-39573 added Py_SET_TYPE() to Python 3.9.0a4 +#if PY_VERSION_HEX < 0x030900A4 && !defined(Py_SET_TYPE) +static inline void +_Py_SET_TYPE(PyObject *ob, PyTypeObject *type) +{ + ob->ob_type = type; +} +#define Py_SET_TYPE(ob, type) _Py_SET_TYPE(_PyObject_CAST(ob), type) +#endif + + +// bpo-39573 added Py_SET_SIZE() to Python 3.9.0a4 +#if PY_VERSION_HEX < 0x030900A4 && !defined(Py_SET_SIZE) +static inline void +_Py_SET_SIZE(PyVarObject *ob, Py_ssize_t size) +{ + ob->ob_size = size; +} +#define Py_SET_SIZE(ob, size) _Py_SET_SIZE((PyVarObject*)(ob), size) +#endif + + +// bpo-40421 added PyFrame_GetCode() to Python 3.9.0b1 +#if PY_VERSION_HEX < 0x030900B1 +static inline PyCodeObject* +PyFrame_GetCode(PyFrameObject *frame) +{ + PyCodeObject *code; + assert(frame != NULL); + code = frame->f_code; + assert(code != NULL); + Py_INCREF(code); + return code; +} +#endif + +static inline PyCodeObject* +_PyFrame_GetCodeBorrow(PyFrameObject *frame) +{ + PyCodeObject *code = PyFrame_GetCode(frame); + Py_DECREF(code); + return code; // borrowed reference +} + + +// bpo-40421 added PyFrame_GetCode() to Python 3.9.0b1 +#if PY_VERSION_HEX < 0x030900B1 +static inline PyFrameObject* +PyFrame_GetBack(PyFrameObject *frame) +{ + PyFrameObject *back; + assert(frame != NULL); + back = frame->f_back; + Py_XINCREF(back); + return back; +} +#endif + +static inline PyFrameObject* +_PyFrame_GetBackBorrow(PyFrameObject *frame) +{ + PyFrameObject *back = PyFrame_GetBack(frame); + Py_XDECREF(back); + return back; // borrowed reference +} + + +// bpo-39947 added PyThreadState_GetInterpreter() to Python 3.9.0a5 +#if PY_VERSION_HEX < 0x030900A5 +static inline PyInterpreterState * +PyThreadState_GetInterpreter(PyThreadState *tstate) +{ + assert(tstate != NULL); + return tstate->interp; +} +#endif + + +// bpo-40429 added PyThreadState_GetFrame() to Python 3.9.0b1 +#if PY_VERSION_HEX < 0x030900B1 +static inline PyFrameObject* +PyThreadState_GetFrame(PyThreadState *tstate) +{ + PyFrameObject *frame; + assert(tstate != NULL); + frame = tstate->frame; + Py_XINCREF(frame); + return frame; +} +#endif + +static inline PyFrameObject* +_PyThreadState_GetFrameBorrow(PyThreadState *tstate) +{ + PyFrameObject *frame = PyThreadState_GetFrame(tstate); + Py_XDECREF(frame); + return frame; // borrowed reference +} + + +// bpo-39947 added PyInterpreterState_Get() to Python 3.9.0a5 +#if PY_VERSION_HEX < 0x030900A5 +static inline PyInterpreterState * +PyInterpreterState_Get(void) +{ + PyThreadState *tstate; + PyInterpreterState *interp; + + tstate = PyThreadState_GET(); + if (tstate == NULL) { + Py_FatalError("GIL released (tstate is NULL)"); + } + interp = tstate->interp; + if (interp == NULL) { + Py_FatalError("no current interpreter"); + } + return interp; +} +#endif + + +// bpo-39947 added PyInterpreterState_Get() to Python 3.9.0a6 +#if 0x030700A1 <= PY_VERSION_HEX && PY_VERSION_HEX < 0x030900A6 +static inline uint64_t +PyThreadState_GetID(PyThreadState *tstate) +{ + assert(tstate != NULL); + return tstate->id; +} +#endif + + +// bpo-37194 added PyObject_CallNoArgs() to Python 3.9.0a1 +#if PY_VERSION_HEX < 0x030900A1 +static inline PyObject* +PyObject_CallNoArgs(PyObject *func) +{ + return PyObject_CallFunctionObjArgs(func, NULL); +} +#endif + + +// bpo-39245 made PyObject_CallOneArg() public (previously called +// _PyObject_CallOneArg) in Python 3.9.0a4 +#if PY_VERSION_HEX < 0x030900A4 +static inline PyObject* +PyObject_CallOneArg(PyObject *func, PyObject *arg) +{ + return PyObject_CallFunctionObjArgs(func, arg, NULL); +} +#endif + + +// bpo-1635741 added PyModule_AddObjectRef() to Python 3.10.0a3 +#if PY_VERSION_HEX < 0x030A00A3 +static inline int +PyModule_AddObjectRef(PyObject *module, const char *name, PyObject *value) +{ + Py_XINCREF(value); + int res = PyModule_AddObject(module, name, value); + if (res < 0) { + Py_XDECREF(value); + } + return res; +} +#endif + + +// bpo-40024 added PyModule_AddType() to Python 3.9.0a5 +#if PY_VERSION_HEX < 0x030900A5 +static inline int +PyModule_AddType(PyObject *module, PyTypeObject *type) +{ + const char *name, *dot; + + if (PyType_Ready(type) < 0) { + return -1; + } + + // inline _PyType_Name() + name = type->tp_name; + assert(name != NULL); + dot = strrchr(name, '.'); + if (dot != NULL) { + name = dot + 1; + } + + return PyModule_AddObjectRef(module, name, (PyObject *)type); +} +#endif + + +// bpo-40241 added PyObject_GC_IsTracked() to Python 3.9.0a6. +// bpo-4688 added _PyObject_GC_IS_TRACKED() to Python 2.7.0a2. +#if PY_VERSION_HEX < 0x030900A6 +static inline int +PyObject_GC_IsTracked(PyObject* obj) +{ + return (PyObject_IS_GC(obj) && _PyObject_GC_IS_TRACKED(obj)); +} +#endif + +// bpo-40241 added PyObject_GC_IsFinalized() to Python 3.9.0a6. +// bpo-18112 added _PyGCHead_FINALIZED() to Python 3.4.0 final. +#if PY_VERSION_HEX < 0x030900A6 && PY_VERSION_HEX >= 0x030400F0 +static inline int +PyObject_GC_IsFinalized(PyObject *obj) +{ + return (PyObject_IS_GC(obj) && _PyGCHead_FINALIZED((PyGC_Head *)(obj)-1)); +} +#endif + + +// bpo-39573 added Py_IS_TYPE() to Python 3.9.0a4 +#if PY_VERSION_HEX < 0x030900A4 && !defined(Py_IS_TYPE) +static inline int +_Py_IS_TYPE(const PyObject *ob, const PyTypeObject *type) { + return ob->ob_type == type; +} +#define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST_CONST(ob), type) +#endif + + +#ifdef PYTHONCAPI_COMPAT_MSC_INLINE +# undef inline +# undef PYTHONCAPI_COMPAT_MSC_INLINE +#endif + +#ifdef __cplusplus +} +#endif +#endif // PYTHONCAPI_COMPAT diff --git a/contrib/python/zstandard/py3/ya.make b/contrib/python/zstandard/py3/ya.make new file mode 100644 index 00000000000..27e56f1470d --- /dev/null +++ b/contrib/python/zstandard/py3/ya.make @@ -0,0 +1,59 @@ +# Generated by devtools/yamaker (pypi). + +PY3_LIBRARY() + +VERSION(0.21.0) + +LICENSE(BSD-3-Clause) + +PEERDIR( + contrib/libs/zstd +) + +ADDINCL( + contrib/libs/zstd/include + contrib/python/zstandard/py3/c-ext +) + +NO_COMPILER_WARNINGS() + +NO_LINT() + +SRCS( + c-ext/backend_c.c + c-ext/bufferutil.c + c-ext/compressionchunker.c + c-ext/compressiondict.c + c-ext/compressionparams.c + c-ext/compressionreader.c + c-ext/compressionwriter.c + c-ext/compressobj.c + c-ext/compressor.c + c-ext/compressoriterator.c + c-ext/constants.c + c-ext/decompressionreader.c + c-ext/decompressionwriter.c + c-ext/decompressobj.c + c-ext/decompressor.c + c-ext/decompressoriterator.c + c-ext/frameparams.c +) + +PY_REGISTER( + zstandard.backend_c +) + +PY_SRCS( + TOP_LEVEL + zstandard/__init__.py + zstandard/__init__.pyi +) + +RESOURCE_FILES( + PREFIX contrib/python/zstandard/py3/ + .dist-info/METADATA + .dist-info/top_level.txt + zstandard/py.typed +) + +END() diff --git a/contrib/python/zstandard/py3/zstandard/__init__.py b/contrib/python/zstandard/py3/zstandard/__init__.py new file mode 100644 index 00000000000..7840473e1cb --- /dev/null +++ b/contrib/python/zstandard/py3/zstandard/__init__.py @@ -0,0 +1,210 @@ +# Copyright (c) 2017-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +"""Python interface to the Zstandard (zstd) compression library.""" + +from __future__ import absolute_import, unicode_literals + +# This module serves 2 roles: +# +# 1) Export the C or CFFI "backend" through a central module. +# 2) Implement additional functionality built on top of C or CFFI backend. + +import builtins +import io +import os +import platform + +from typing import ByteString + +# Some Python implementations don't support C extensions. That's why we have +# a CFFI implementation in the first place. The code here import one of our +# "backends" then re-exports the symbols from this module. For convenience, +# we support falling back to the CFFI backend if the C extension can't be +# imported. But for performance reasons, we only do this on unknown Python +# implementation. Notably, for CPython we require the C extension by default. +# Because someone will inevitably want special behavior, the behavior is +# configurable via an environment variable. A potentially better way to handle +# this is to import a special ``__importpolicy__`` module or something +# defining a variable and `setup.py` could write the file with whatever +# policy was specified at build time. Until someone needs it, we go with +# the hacky but simple environment variable approach. +_module_policy = os.environ.get("PYTHON_ZSTANDARD_IMPORT_POLICY", "default") + +if _module_policy == "default": + if platform.python_implementation() in ("CPython",): + from .backend_c import * # type: ignore + + backend = "cext" + elif platform.python_implementation() in ("PyPy",): + from .backend_cffi import * # type: ignore + + backend = "cffi" + else: + try: + from .backend_c import * + + backend = "cext" + except ImportError: + from .backend_cffi import * + + backend = "cffi" +elif _module_policy == "cffi_fallback": + try: + from .backend_c import * + + backend = "cext" + except ImportError: + from .backend_cffi import * + + backend = "cffi" +elif _module_policy == "rust": + from .backend_rust import * # type: ignore + + backend = "rust" +elif _module_policy == "cext": + from .backend_c import * + + backend = "cext" +elif _module_policy == "cffi": + from .backend_cffi import * + + backend = "cffi" +else: + raise ImportError( + "unknown module import policy: %s; use default, cffi_fallback, " + "cext, or cffi" % _module_policy + ) + +# Keep this in sync with python-zstandard.h, rust-ext/src/lib.rs, and debian/changelog. +__version__ = "0.21.0" + +_MODE_CLOSED = 0 +_MODE_READ = 1 +_MODE_WRITE = 2 + + +def open( + filename, + mode="rb", + cctx=None, + dctx=None, + encoding=None, + errors=None, + newline=None, + closefd=None, +): + """Create a file object with zstd (de)compression. + + The object returned from this function will be a + :py:class:`ZstdDecompressionReader` if opened for reading in binary mode, + a :py:class:`ZstdCompressionWriter` if opened for writing in binary mode, + or an ``io.TextIOWrapper`` if opened for reading or writing in text mode. + + :param filename: + ``bytes``, ``str``, or ``os.PathLike`` defining a file to open or a + file object (with a ``read()`` or ``write()`` method). + :param mode: + ``str`` File open mode. Accepts any of the open modes recognized by + ``open()``. + :param cctx: + ``ZstdCompressor`` to use for compression. If not specified and file + is opened for writing, the default ``ZstdCompressor`` will be used. + :param dctx: + ``ZstdDecompressor`` to use for decompression. If not specified and file + is opened for reading, the default ``ZstdDecompressor`` will be used. + :param encoding: + ``str`` that defines text encoding to use when file is opened in text + mode. + :param errors: + ``str`` defining text encoding error handling mode. + :param newline: + ``str`` defining newline to use in text mode. + :param closefd: + ``bool`` whether to close the file when the returned object is closed. + Only used if a file object is passed. If a filename is specified, the + opened file is always closed when the returned object is closed. + """ + normalized_mode = mode.replace("t", "") + + if normalized_mode in ("r", "rb"): + dctx = dctx or ZstdDecompressor() + open_mode = "r" + raw_open_mode = "rb" + elif normalized_mode in ("w", "wb", "a", "ab", "x", "xb"): + cctx = cctx or ZstdCompressor() + open_mode = "w" + raw_open_mode = normalized_mode + if not raw_open_mode.endswith("b"): + raw_open_mode = raw_open_mode + "b" + else: + raise ValueError("Invalid mode: {!r}".format(mode)) + + if hasattr(os, "PathLike"): + types = (str, bytes, os.PathLike) + else: + types = (str, bytes) + + if isinstance(filename, types): # type: ignore + inner_fh = builtins.open(filename, raw_open_mode) + closefd = True + elif hasattr(filename, "read") or hasattr(filename, "write"): + inner_fh = filename + closefd = bool(closefd) + else: + raise TypeError( + "filename must be a str, bytes, file or PathLike object" + ) + + if open_mode == "r": + fh = dctx.stream_reader(inner_fh, closefd=closefd) + elif open_mode == "w": + fh = cctx.stream_writer(inner_fh, closefd=closefd) + else: + raise RuntimeError("logic error in zstandard.open() handling open mode") + + if "b" not in normalized_mode: + return io.TextIOWrapper( + fh, encoding=encoding, errors=errors, newline=newline + ) + else: + return fh + + +def compress(data: ByteString, level: int = 3) -> bytes: + """Compress source data using the zstd compression format. + + This performs one-shot compression using basic/default compression + settings. + + This method is provided for convenience and is equivalent to calling + ``ZstdCompressor(level=level).compress(data)``. + + If you find yourself calling this function in a tight loop, + performance will be greater if you construct a single ``ZstdCompressor`` + and repeatedly call ``compress()`` on it. + """ + cctx = ZstdCompressor(level=level) + + return cctx.compress(data) + + +def decompress(data: ByteString, max_output_size: int = 0) -> bytes: + """Decompress a zstd frame into its original data. + + This performs one-shot decompression using basic/default compression + settings. + + This method is provided for convenience and is equivalent to calling + ``ZstdDecompressor().decompress(data, max_output_size=max_output_size)``. + + If you find yourself calling this function in a tight loop, performance + will be greater if you construct a single ``ZstdDecompressor`` and + repeatedly call ``decompress()`` on it. + """ + dctx = ZstdDecompressor() + + return dctx.decompress(data, max_output_size=max_output_size) diff --git a/contrib/python/zstandard/py3/zstandard/py.typed b/contrib/python/zstandard/py3/zstandard/py.typed new file mode 100644 index 00000000000..e69de29bb2d --- /dev/null +++ b/contrib/python/zstandard/py3/zstandard/py.typed diff --git a/contrib/python/zstandard/ya.make b/contrib/python/zstandard/ya.make new file mode 100644 index 00000000000..75cf08f3572 --- /dev/null +++ b/contrib/python/zstandard/ya.make @@ -0,0 +1,18 @@ +PY23_LIBRARY() + +LICENSE(Service-Py23-Proxy) + +IF (PYTHON2) + PEERDIR(contrib/python/zstandard/py2) +ELSE() + PEERDIR(contrib/python/zstandard/py3) +ENDIF() + +NO_LINT() + +END() + +RECURSE( + py2 + py3 +) |
