aboutsummaryrefslogtreecommitdiffstats
path: root/contrib/libs/jemalloc/ChangeLog
diff options
context:
space:
mode:
authorbugaevskiy <bugaevskiy@yandex-team.ru>2022-02-10 16:46:17 +0300
committerDaniil Cherednik <dcherednik@yandex-team.ru>2022-02-10 16:46:17 +0300
commita6e0145a095c7bb3770d6e07aee301de5c73f96e (patch)
tree1a2c5ffcf89eb53ecd79dbc9bc0a195c27404d0c /contrib/libs/jemalloc/ChangeLog
parentc7f68570483e493f4ddaf946de7b3a420ee621b0 (diff)
downloadydb-a6e0145a095c7bb3770d6e07aee301de5c73f96e.tar.gz
Restoring authorship annotation for <bugaevskiy@yandex-team.ru>. Commit 2 of 2.
Diffstat (limited to 'contrib/libs/jemalloc/ChangeLog')
-rw-r--r--contrib/libs/jemalloc/ChangeLog3036
1 files changed, 1518 insertions, 1518 deletions
diff --git a/contrib/libs/jemalloc/ChangeLog b/contrib/libs/jemalloc/ChangeLog
index e8cb9eec50..e55813b7be 100644
--- a/contrib/libs/jemalloc/ChangeLog
+++ b/contrib/libs/jemalloc/ChangeLog
@@ -1,1518 +1,1518 @@
-Following are change highlights associated with official releases. Important
-bug fixes are all mentioned, but some internal enhancements are omitted here for
-brevity. Much more detail can be found in the git revision history:
-
- https://github.com/jemalloc/jemalloc
-
-* 5.2.1 (August 5, 2019)
-
- This release is primarily about Windows. A critical virtual memory leak is
- resolved on all Windows platforms. The regression was present in all releases
- since 5.0.0.
-
- Bug fixes:
- - Fix a severe virtual memory leak on Windows. This regression was first
- released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
- @interwq)
- - Fix size 0 handling in posix_memalign(). This regression was first released
- in 5.2.0. (@interwq)
- - Fix the prof_log unit test which may observe unexpected backtraces from
- compiler optimizations. The test was first added in 5.2.0. (@marxin,
- @gnzlbg, @interwq)
- - Fix the declaration of the extent_avail tree. This regression was first
- released in 5.1.0. (@zoulasc)
- - Fix an incorrect reference in jeprof. This functionality was first released
- in 3.0.0. (@prehistoric-penguin)
- - Fix an assertion on the deallocation fast-path. This regression was first
- released in 5.2.0. (@yinan1048576)
- - Fix the TLS_MODEL attribute in headers. This regression was first released
- in 5.0.0. (@zoulasc, @interwq)
-
- Optimizations and refactors:
- - Implement opt.retain on Windows and enable by default on 64-bit. (@interwq,
- @davidtgoldblatt)
- - Optimize away a branch on the operator delete[] path. (@mgrice)
- - Add format annotation to the format generator function. (@zoulasc)
- - Refactor and improve the size class header generation. (@yinan1048576)
- - Remove best fit. (@djwatson)
- - Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
-
-* 5.2.0 (April 2, 2019)
-
- This release includes a few notable improvements, which are summarized below:
- 1) improved fast-path performance from the optimizations by @djwatson; 2)
- reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on
- setting the number of background threads. In addition, peak / spike memory
- usage is improved with certain allocation patterns. As usual, the release and
- prior dev versions have gone through large-scale production testing.
-
- New features:
- - Implement oversize_threshold, which uses a dedicated arena for allocations
- crossing the specified threshold to reduce fragmentation. (@interwq)
- - Add extents usage information to stats. (@tyleretzel)
- - Log time information for sampled allocations. (@tyleretzel)
- - Support 0 size in sdallocx. (@djwatson)
- - Output rate for certain counters in malloc_stats. (@zinoale)
- - Add configure option --enable-readlinkat, which allows the use of readlinkat
- over readlink. (@davidtgoldblatt)
- - Add configure options --{enable,disable}-{static,shared} to allow not
- building unwanted libraries. (@Ericson2314)
- - Add configure option --disable-libdl to enable fully static builds.
- (@interwq)
- - Add mallctl interfaces:
- + opt.oversize_threshold (@interwq)
- + stats.arenas.<i>.extent_avail (@tyleretzel)
- + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
- + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
- (@tyleretzel)
-
- Portability improvements:
- - Update MSVC builds. (@maksqwe, @rustyx)
- - Workaround a compiler optimizer bug on s390x. (@rkmisra)
- - Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
- - Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
- - Link against -pthread instead of -lpthread. (@paravoid)
- - Make background_thread not dependent on libdl. (@interwq)
- - Add stringify to fix a linker directive issue on MSVC. (@daverigby)
- - Detect and fall back when 8-bit atomics are unavailable. (@interwq)
- - Fall back to the default pthread_create if dlsym(3) fails. (@interwq)
-
- Optimizations and refactors:
- - Refactor the TSD module. (@davidtgoldblatt)
- - Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
- - Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
- - Optimize ixalloc by avoiding a size lookup. (@interwq)
- - Implement opt.oversize_threshold which uses a dedicated arena for requests
- crossing the threshold, also eagerly purges the oversize extents. Default
- the threshold to 8 MiB. (@interwq)
- - Clean compilation with -Wextra. (@gnzlbg, @jasone)
- - Refactor the size class module. (@davidtgoldblatt)
- - Refactor the stats emitter. (@tyleretzel)
- - Optimize pow2_ceil. (@rkmisra)
- - Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
- - Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
- - Improve error handling for THP state initialization. (@jsteemann)
- - Rework the malloc() fast path. (@djwatson)
- - Rework the free() fast path. (@djwatson)
- - Refactor and optimize the tcache fill / flush paths. (@djwatson)
- - Optimize sync / lwsync on PowerPC. (@chmeeedalf)
- - Bypass extent_dalloc() when retain is enabled. (@interwq)
- - Optimize the locking on large deallocation. (@interwq)
- - Reduce the number of pages committed from sanity checking in debug build.
- (@trasz, @interwq)
- - Deprecate OSSpinLock. (@interwq)
- - Lower the default number of background threads to 4 (when the feature
- is enabled). (@interwq)
- - Optimize the trylock spin wait. (@djwatson)
- - Use arena index for arena-matching checks. (@interwq)
- - Avoid forced decay on thread termination when using background threads.
- (@interwq)
- - Disable muzzy decay by default. (@djwatson, @interwq)
- - Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
- @interwq)
-
- Bug fixes (all only relevant to jemalloc 5.x):
- - Fix background thread index issues with max_background_threads. (@djwatson,
- @interwq)
- - Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
- - Fix opt.prof_prefix initialization. (@davidtgoldblatt)
- - Properly trigger decay on tcache destroy. (@interwq, @amosbird)
- - Fix tcache.flush. (@interwq)
- - Detect whether explicit extent zero out is necessary with huge pages or
- custom extent hooks, which may change the purge semantics. (@interwq)
- - Fix a side effect caused by extent_max_active_fit combined with decay-based
- purging, where freed extents can accumulate and not be reused for an
- extended period of time. (@interwq, @mpghf)
- - Fix a missing unlock on extent register error handling. (@zoulasc)
-
- Testing:
- - Simplify the Travis script output. (@gnzlbg)
- - Update the test scripts for FreeBSD. (@devnexen)
- - Add unit tests for the producer-consumer pattern. (@interwq)
- - Add Cirrus-CI config for FreeBSD builds. (@jasone)
- - Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
- @interwq)
-
- Incompatible changes:
- - Remove --with-lg-page-sizes. (@davidtgoldblatt)
-
- Documentation:
- - Attempt to build docs by default, however skip doc building when xsltproc
- is missing. (@interwq, @cmuellner)
-
-* 5.1.0 (May 4, 2018)
-
- This release is primarily about fine-tuning, ranging from several new features
- to numerous notable performance and portability enhancements. The release and
- prior dev versions have been running in multiple large scale applications for
- months, and the cumulative improvements are substantial in many cases.
-
- Given the long and successful production runs, this release is likely a good
- candidate for applications to upgrade, from both jemalloc 5.0 and before. For
- performance-critical applications, the newly added TUNING.md provides
- guidelines on jemalloc tuning.
-
- New features:
- - Implement transparent huge page support for internal metadata. (@interwq)
- - Add opt.thp to allow enabling / disabling transparent huge pages for all
- mappings. (@interwq)
- - Add maximum background thread count option. (@djwatson)
- - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
- (@interwq)
- - Allow arena index lookup based on allocation addresses via mallctl.
- (@lionkov)
- - Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
- - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
- the active extent selected (to split off from) and the size of the requested
- allocation. (@interwq, @davidtgoldblatt)
- - Add retain_grow_limit to set the max size when growing virtual address
- space. (@interwq)
- - Add mallctl interfaces:
- + arena.<i>.retain_grow_limit (@interwq)
- + arenas.lookup (@lionkov)
- + max_background_threads (@djwatson)
- + opt.lg_extent_max_active_fit (@interwq)
- + opt.max_background_threads (@djwatson)
- + opt.metadata_thp (@interwq)
- + opt.thp (@interwq)
- + stats.metadata_thp (@interwq)
-
- Portability improvements:
- - Support GNU/kFreeBSD configuration. (@paravoid)
- - Support m68k, nios2 and SH3 architectures. (@paravoid)
- - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
- - Fix symbol listing for cross-compiling. (@tamird)
- - Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
- - Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
- - Fix MSVC 2015 & 2017 builds. (@rustyx)
- - Improve RISC-V support. (@EdSchouten)
- - Set name mangling script in strict mode. (@nicolov)
- - Avoid MADV_HUGEPAGE on ARM. (@marxin)
- - Modify configure to determine return value of strerror_r.
- (@davidtgoldblatt, @cferris1000)
- - Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
- - Fix 32-bit build on MSVC. (@rustyx)
- - Fix external symbol on MSVC. (@maksqwe)
- - Avoid a printf format specifier warning. (@jasone)
- - Add configure option --disable-initial-exec-tls which can allow jemalloc to
- be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
- - AArch64: Add ILP32 support. (@cmuellner)
- - Add --with-lg-vaddr configure option to support cross compiling.
- (@cmuellner, @davidtgoldblatt)
-
- Optimizations and refactors:
- - Improve active extent fit with extent_max_active_fit. This considerably
- reduces fragmentation over time and improves virtual memory and metadata
- usage. (@davidtgoldblatt, @interwq)
- - Eagerly coalesce large extents to reduce fragmentation. (@interwq)
- - sdallocx: only read size info when page aligned (i.e. possibly sampled),
- which speeds up the sized deallocation path significantly. (@interwq)
- - Avoid attempting new mappings for in place expansion with retain, since
- it rarely succeeds in practice and causes high overhead. (@interwq)
- - Refactor OOM handling in newImpl. (@wqfish)
- - Add internal fine-grained logging functionality for debugging use.
- (@davidtgoldblatt)
- - Refactor arena / tcache interactions. (@davidtgoldblatt)
- - Refactor extent management with dumpable flag. (@davidtgoldblatt)
- - Add runtime detection of lazy purging. (@interwq)
- - Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
- - Use sysctl on startup in FreeBSD. (@trasz)
- - Use thread local prng state instead of atomic. (@djwatson)
- - Make decay to always purge one more extent than before, because in
- practice large extents are usually the ones that cross the decay threshold.
- Purging the additional extent helps save memory as well as reduce VM
- fragmentation. (@interwq)
- - Fast division by dynamic values. (@davidtgoldblatt)
- - Improve the fit for aligned allocation. (@interwq, @edwinsmith)
- - Refactor extent_t bitpacking. (@rkmisra)
- - Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
- - Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
- - Remove preserve_lru feature for extents management. (@djwatson)
- - Consolidate two memory loads into one on the fast deallocation path.
- (@davidtgoldblatt, @interwq)
-
- Bug fixes (most of the issues are only relevant to jemalloc 5.0):
- - Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
- - Validate returned file descriptor before use. (@zonyitoo)
- - Fix a few background thread initialization and shutdown issues. (@interwq)
- - Fix an extent coalesce + decay race by taking both coalescing extents off
- the LRU list. (@interwq)
- - Fix potentially unbound increase during decay, caused by one thread keep
- stashing memory to purge while other threads generating new pages. The
- number of pages to purge is checked to prevent this. (@interwq)
- - Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
- - Handle 32 bit mutex counters. (@rkmisra)
- - Fix a indexing bug when creating background threads. (@davidtgoldblatt,
- @binliu19)
- - Fix arguments passed to extent_init. (@yuleniwo, @interwq)
- - Fix addresses used for ordering mutexes. (@rkmisra)
- - Fix abort_conf processing during bootstrap. (@interwq)
- - Fix include path order for out-of-tree builds. (@cmuellner)
-
- Incompatible changes:
- - Remove --disable-thp. (@interwq)
- - Remove mallctl interfaces:
- + config.thp (@interwq)
-
- Documentation:
- - Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
-
-* 5.0.1 (July 1, 2017)
-
- This bugfix release fixes several issues, most of which are obscure enough
- that typical applications are not impacted.
-
- Bug fixes:
- - Update decay->nunpurged before purging, in order to avoid potential update
- races and subsequent incorrect purging volume. (@interwq)
- - Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
- locking and/or background threads). This mitigates an initialization
- failure bug for which we still do not have a clear reproduction test case.
- (@interwq)
- - Modify tsd management so that it neither crashes nor leaks if a thread's
- only allocation activity is to call free() after TLS destructors have been
- executed. This behavior was observed when operating with GNU libc, and is
- unlikely to be an issue with other libc implementations. (@interwq)
- - Mask signals during background thread creation. This prevents signals from
- being inadvertently delivered to background threads. (@jasone,
- @davidtgoldblatt, @interwq)
- - Avoid inactivity checks within background threads, in order to prevent
- recursive mutex acquisition. (@interwq)
- - Fix extent_grow_retained() to use the specified hooks when the
- arena.<i>.extent_hooks mallctl is used to override the default hooks.
- (@interwq)
- - Add missing reentrancy support for custom extent hooks which allocate.
- (@interwq)
- - Post-fork(2), re-initialize the list of tcaches associated with each arena
- to contain no tcaches except the forking thread's. (@interwq)
- - Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This
- fixes potential deadlocks after fork(2). (@interwq)
- - Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
- generate corrupt configure scripts. (@jasone)
- - Ensure that the configured page size (--with-lg-page) is no larger than the
- configured huge page size (--with-lg-hugepage). (@jasone)
-
-* 5.0.0 (June 13, 2017)
-
- Unlike all previous jemalloc releases, this release does not use naturally
- aligned "chunks" for virtual memory management, and instead uses page-aligned
- "extents". This change has few externally visible effects, but the internal
- impacts are... extensive. Many other internal changes combine to make this
- the most cohesively designed version of jemalloc so far, with ample
- opportunity for further enhancements.
-
- Continuous integration is now an integral aspect of development thanks to the
- efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
- stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
- side effect the official release frequency may decrease over time.
-
- New features:
- - Implement optional per-CPU arena support; threads choose which arena to use
- based on current CPU rather than on fixed thread-->arena associations.
- (@interwq)
- - Implement two-phase decay of unused dirty pages. Pages transition from
- dirty-->muzzy-->clean, where the first phase transition relies on
- madvise(... MADV_FREE) semantics, and the second phase transition discards
- pages such that they are replaced with demand-zeroed pages on next access.
- (@jasone)
- - Increase decay time resolution from seconds to milliseconds. (@jasone)
- - Implement opt-in per CPU background threads, and use them for asynchronous
- decay-driven unused dirty page purging. (@interwq)
- - Add mutex profiling, which collects a variety of statistics useful for
- diagnosing overhead/contention issues. (@interwq)
- - Add C++ new/delete operator bindings. (@djwatson)
- - Support manually created arena destruction, such that all data and metadata
- are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
- associated with destroyed arenas. (@jasone)
- - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
- merged/destroyed arena statistics via mallctl. (@jasone)
- - Add opt.abort_conf to optionally abort if invalid configuration options are
- detected during initialization. (@interwq)
- - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
- stats dumped during exit if opt.stats_print is true. (@jasone)
- - Add --with-version=VERSION for use when embedding jemalloc into another
- project's git repository. (@jasone)
- - Add --disable-thp to support cross compiling. (@jasone)
- - Add --with-lg-hugepage to support cross compiling. (@jasone)
- - Add mallctl interfaces (various authors):
- + background_thread
- + opt.abort_conf
- + opt.retain
- + opt.percpu_arena
- + opt.background_thread
- + opt.{dirty,muzzy}_decay_ms
- + opt.stats_print_opts
- + arena.<i>.initialized
- + arena.<i>.destroy
- + arena.<i>.{dirty,muzzy}_decay_ms
- + arena.<i>.extent_hooks
- + arenas.{dirty,muzzy}_decay_ms
- + arenas.bin.<i>.slab_size
- + arenas.nlextents
- + arenas.lextent.<i>.size
- + arenas.create
- + stats.background_thread.{num_threads,num_runs,run_interval}
- + stats.mutexes.{ctl,background_thread,prof,reset}.
- {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
- num_owner_switch}
- + stats.arenas.<i>.{dirty,muzzy}_decay_ms
- + stats.arenas.<i>.uptime
- + stats.arenas.<i>.{pmuzzy,base,internal,resident}
- + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
- + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
- + stats.arenas.<i>.bins.<j>.mutex.
- {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
- num_owner_switch}
- + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
- + stats.arenas.i.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
- extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
- {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
- num_owner_switch}
-
- Portability improvements:
- - Improve reentrant allocation support, such that deadlock is less likely if
- e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
- @interwq)
- - Support static linking of jemalloc with glibc. (@djwatson)
-
- Optimizations and refactors:
- - Organize virtual memory as "extents" of virtual memory pages, rather than as
- naturally aligned "chunks", and store all metadata in arbitrarily distant
- locations. This reduces virtual memory external fragmentation, and will
- interact better with huge pages (not yet explicitly supported). (@jasone)
- - Fold large and huge size classes together; only small and large size classes
- remain. (@jasone)
- - Unify the allocation paths, and merge most fast-path branching decisions.
- (@davidtgoldblatt, @interwq)
- - Embed per thread automatic tcache into thread-specific data, which reduces
- conditional branches and dereferences. Also reorganize tcache to increase
- fast-path data locality. (@interwq)
- - Rewrite atomics to closely model the C11 API, convert various
- synchronization from mutex-based to atomic, and use the explicit memory
- ordering control to resolve various hypothetical races without increasing
- synchronization overhead. (@davidtgoldblatt)
- - Extensively optimize rtree via various methods:
- + Add multiple layers of rtree lookup caching, since rtree lookups are now
- part of fast-path deallocation. (@interwq)
- + Determine rtree layout at compile time. (@jasone)
- + Make the tree shallower for common configurations. (@jasone)
- + Embed the root node in the top-level rtree data structure, thus avoiding
- one level of indirection. (@jasone)
- + Further specialize leaf elements as compared to internal node elements,
- and directly embed extent metadata needed for fast-path deallocation.
- (@jasone)
- + Ignore leading always-zero address bits (architecture-specific).
- (@jasone)
- - Reorganize headers (ongoing work) to make them hermetic, and disentangle
- various module dependencies. (@davidtgoldblatt)
- - Convert various internal data structures such as size class metadata from
- boot-time-initialized to compile-time-initialized. Propagate resulting data
- structure simplifications, such as making arena metadata fixed-size.
- (@jasone)
- - Simplify size class lookups when constrained to size classes that are
- multiples of the page size. This speeds lookups, but the primary benefit is
- complexity reduction in code that was the source of numerous regressions.
- (@jasone)
- - Lock individual extents when possible for localized extent operations,
- rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
- - Use first fit layout policy instead of best fit, in order to improve
- packing. (@jasone)
- - If munmap(2) is not in use, use an exponential series to grow each arena's
- virtual memory, so that the number of disjoint virtual memory mappings
- remains low. (@jasone)
- - Implement per arena base allocators, so that arenas never share any virtual
- memory pages. (@jasone)
- - Automatically generate private symbol name mangling macros. (@jasone)
-
- Incompatible changes:
- - Replace chunk hooks with an expanded/normalized set of extent hooks.
- (@jasone)
- - Remove ratio-based purging. (@jasone)
- - Remove --disable-tcache. (@jasone)
- - Remove --disable-tls. (@jasone)
- - Remove --enable-ivsalloc. (@jasone)
- - Remove --with-lg-size-class-group. (@jasone)
- - Remove --with-lg-tiny-min. (@jasone)
- - Remove --disable-cc-silence. (@jasone)
- - Remove --enable-code-coverage. (@jasone)
- - Remove --disable-munmap (replaced by opt.retain). (@jasone)
- - Remove Valgrind support. (@jasone)
- - Remove quarantine support. (@jasone)
- - Remove redzone support. (@jasone)
- - Remove mallctl interfaces (various authors):
- + config.munmap
- + config.tcache
- + config.tls
- + config.valgrind
- + opt.lg_chunk
- + opt.purge
- + opt.lg_dirty_mult
- + opt.decay_time
- + opt.quarantine
- + opt.redzone
- + opt.thp
- + arena.<i>.lg_dirty_mult
- + arena.<i>.decay_time
- + arena.<i>.chunk_hooks
- + arenas.initialized
- + arenas.lg_dirty_mult
- + arenas.decay_time
- + arenas.bin.<i>.run_size
- + arenas.nlruns
- + arenas.lrun.<i>.size
- + arenas.nhchunks
- + arenas.hchunk.<i>.size
- + arenas.extend
- + stats.cactive
- + stats.arenas.<i>.lg_dirty_mult
- + stats.arenas.<i>.decay_time
- + stats.arenas.<i>.metadata.{mapped,allocated}
- + stats.arenas.<i>.{npurge,nmadvise,purged}
- + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
- + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
- + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
- + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}
-
- Bug fixes:
- - Improve interval-based profile dump triggering to dump only one profile when
- a single allocation's size exceeds the interval. (@jasone)
- - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
- pruning backtrace frames in jeprof. (@jasone)
-
-* 4.5.0 (February 28, 2017)
-
- This is the first release to benefit from much broader continuous integration
- testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
- in place for prior releases, it would have caught all of the most serious
- regressions fixed by this release.
-
- New features:
- - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
- transparent huge page integration. (@jasone)
- - Update zone allocator integration to work with macOS 10.12. (@glandium)
- - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
- EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
- during configuration. (@jasone, @ronawho)
-
- Bug fixes:
- - Fix DSS (sbrk(2)-based) allocation. This regression was first released in
- 4.3.0. (@jasone)
- - Handle race in per size class utilization computation. This functionality
- was first released in 4.0.0. (@interwq)
- - Fix lock order reversal during gdump. (@jasone)
- - Fix/refactor tcache synchronization. This regression was first released in
- 4.0.0. (@jasone)
- - Fix various JSON-formatted malloc_stats_print() bugs. This functionality
- was first released in 4.3.0. (@jasone)
- - Fix huge-aligned allocation. This regression was first released in 4.4.0.
- (@jasone)
- - When transparent huge page integration is enabled, detect what state pages
- start in according to the kernel's current operating mode, and only convert
- arena chunks to non-huge during purging if that is not their initial state.
- This functionality was first released in 4.4.0. (@jasone)
- - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
- This regression was first released in 4.0.0. (@jasone, @428desmo)
- - Properly detect sparc64 when building for Linux. (@glaubitz)
-
-* 4.4.0 (December 3, 2016)
-
- New features:
- - Add configure support for *-*-linux-android. (@cferris1000, @jasone)
- - Add the --disable-syscall configure option, for use on systems that place
- security-motivated limitations on syscall(2). (@jasone)
- - Add support for Debian GNU/kFreeBSD. (@thesam)
-
- Optimizations:
- - Add extent serial numbers and use them where appropriate as a sort key that
- is higher priority than address, so that the allocation policy prefers older
- extents. This tends to improve locality (decrease fragmentation) when
- memory grows downward. (@jasone)
- - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
- on Linux 4.5 and newer. (@jasone)
- - Mark partially purged arena chunks as non-huge-page. This improves
- interaction with Linux's transparent huge page functionality. (@jasone)
-
- Bug fixes:
- - Fix size class computations for edge conditions involving extremely large
- allocations. This regression was first released in 4.0.0. (@jasone,
- @ingvarha)
- - Remove overly restrictive assertions related to the cactive statistic. This
- regression was first released in 4.1.0. (@jasone)
- - Implement a more reliable detection scheme for os_unfair_lock on macOS.
- (@jszakmeister)
-
-* 4.3.1 (November 7, 2016)
-
- Bug fixes:
- - Fix a severe virtual memory leak. This regression was first released in
- 4.3.0. (@interwq, @jasone)
- - Refactor atomic and prng APIs to restore support for 32-bit platforms that
- use pre-C11 toolchains, e.g. FreeBSD's mips. (@jasone)
-
-* 4.3.0 (November 4, 2016)
-
- This is the first release that passes the test suite for multiple Windows
- configurations, thanks in large part to @glandium setting up continuous
- integration via AppVeyor (and Travis CI for Linux and OS X).
-
- New features:
- - Add "J" (JSON) support to malloc_stats_print(). (@jasone)
- - Add Cray compiler support. (@ronawho)
-
- Optimizations:
- - Add/use adaptive spinning for bootstrapping and radix tree node
- initialization. (@jasone)
-
- Bug fixes:
- - Fix large allocation to search starting in the optimal size class heap,
- which can substantially reduce virtual memory churn and fragmentation. This
- regression was first released in 4.0.0. (@mjp41, @jasone)
- - Fix stats.arenas.<i>.nthreads accounting. (@interwq)
- - Fix and simplify decay-based purging. (@jasone)
- - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
- deadlocks during thread exit. (@jasone)
- - Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun,
- @jasone)
- - Fix over-sized allocation of arena_t (plus associated stats) data
- structures. (@jasone, @interwq)
- - Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
- - Fix a Valgrind integration bug. (@ronawho)
- - Disallow 0x5a junk filling when running in Valgrind. (@jasone)
- - Fix a file descriptor leak on Linux. This regression was first released in
- 4.2.0. (@vsarunas, @jasone)
- - Fix static linking of jemalloc with glibc. (@djwatson)
- - Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This
- works around other libraries' system call wrappers performing reentrant
- allocation. (@kspinka, @Whissi, @jasone)
- - Fix OS X default zone replacement to work with OS X 10.12. (@glandium,
- @jasone)
- - Fix cached memory management to avoid needless commit/decommit operations
- during purging, which resolves permanent virtual memory map fragmentation
- issues on Windows. (@mjp41, @jasone)
- - Fix TSD fetches to avoid (recursive) allocation. This is relevant to
- non-TLS and Windows configurations. (@jasone)
- - Fix malloc_conf overriding to work on Windows. (@jasone)
- - Forcibly disable lazy-lock on Windows (was forcibly *enabled*). (@jasone)
-
-* 4.2.1 (June 8, 2016)
-
- Bug fixes:
- - Fix bootstrapping issues for configurations that require allocation during
- tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
- - Fix gettimeofday() version of nstime_update(). (@ronawho)
- - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
- - Fix potential VM map fragmentation regression. (@jasone)
- - Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
- - Fix heap profiling context leaks in reallocation edge cases. (@jasone)
-
-* 4.2.0 (May 12, 2016)
-
- New features:
- - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
- an arena's allocations in a single operation. (@jasone)
- - Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
- - Add the --with-version configure option. (@jasone)
- - Support --with-lg-page values larger than actual page size. (@jasone)
-
- Optimizations:
- - Use pairing heaps rather than red-black trees for various hot data
- structures. (@djwatson, @jasone)
- - Streamline fast paths of rtree operations. (@jasone)
- - Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
- - Decommit unused virtual memory if the OS does not overcommit. (@jasone)
- - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
- to avoid unfortunate interactions during fork(2). (@jasone)
-
- Bug fixes:
- - Fix chunk accounting related to triggering gdump profiles. (@jasone)
- - Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
- - Scale leak report summary according to sampling probability. (@jasone)
-
-* 4.1.1 (May 3, 2016)
-
- This bugfix release resolves a variety of mostly minor issues, though the
- bitmap fix is critical for 64-bit Windows.
-
- Bug fixes:
- - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
- even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
- Windows. (@jasone)
- - Fix hashing functions to avoid unaligned memory accesses (and resulting
- crashes). This is relevant at least to some ARM-based platforms.
- (@rkmisra)
- - Fix fork()-related lock rank ordering reversals. These reversals were
- unlikely to cause deadlocks in practice except when heap profiling was
- enabled and active. (@jasone)
- - Fix various chunk leaks in OOM code paths. (@jasone)
- - Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
- - Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
- - Fix a variety of test failures that were due to test fragility rather than
- core bugs. (@jasone)
-
-* 4.1.0 (February 28, 2016)
-
- This release is primarily about optimizations, but it also incorporates a lot
- of portability-motivated refactoring and enhancements. Many people worked on
- this release, to an extent that even with the omission here of minor changes
- (see git revision history), and of the people who reported and diagnosed
- issues, so much of the work was contributed that starting with this release,
- changes are annotated with author credits to help reflect the collaborative
- effort involved.
-
- New features:
- - Implement decay-based unused dirty page purging, a major optimization with
- mallctl API impact. This is an alternative to the existing ratio-based
- unused dirty page purging, and is intended to eventually become the sole
- purging mechanism. New mallctls:
- + opt.purge
- + opt.decay_time
- + arena.<i>.decay
- + arena.<i>.decay_time
- + arenas.decay_time
- + stats.arenas.<i>.decay_time
- (@jasone, @cevans87)
- - Add --with-malloc-conf, which makes it possible to embed a default
- options string during configuration. This was motivated by the desire to
- specify --with-malloc-conf=purge:decay , since the default must remain
- purge:ratio until the 5.0.0 release. (@jasone)
- - Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)
- - Make *allocx() size class overflow behavior defined. The maximum
- size class is now less than PTRDIFF_MAX to protect applications against
- numerical overflow, and all allocation functions are guaranteed to indicate
- errors rather than potentially crashing if the request size exceeds the
- maximum size class. (@jasone)
- - jeprof:
- + Add raw heap profile support. (@jasone)
- + Add --retain and --exclude for backtrace symbol filtering. (@jasone)
-
- Optimizations:
- - Optimize the fast path to combine various bootstrapping and configuration
- checks and execute more streamlined code in the common case. (@interwq)
- - Use linear scan for small bitmaps (used for small object tracking). In
- addition to speeding up bitmap operations on 64-bit systems, this reduces
- allocator metadata overhead by approximately 0.2%. (@djwatson)
- - Separate arena_avail trees, which substantially speeds up run tree
- operations. (@djwatson)
- - Use memoization (boot-time-computed table) for run quantization. Separate
- arena_avail trees reduced the importance of this optimization. (@jasone)
- - Attempt mmap-based in-place huge reallocation. This can dramatically speed
- up incremental huge reallocation. (@jasone)
-
- Incompatible changes:
- - Make opt.narenas unsigned rather than size_t. (@jasone)
-
- Bug fixes:
- - Fix stats.cactive accounting regression. (@rustyx, @jasone)
- - Handle unaligned keys in hash(). This caused problems for some ARM systems.
- (@jasone, @cferris1000)
- - Refactor arenas array. In addition to fixing a fork-related deadlock, this
- makes arena lookups faster and simpler. (@jasone)
- - Move retained memory allocation out of the default chunk allocation
- function, to a location that gets executed even if the application installs
- a custom chunk allocation function. This resolves a virtual memory leak.
- (@buchgr)
- - Fix a potential tsd cleanup leak. (@cferris1000, @jasone)
- - Fix run quantization. In practice this bug had no impact unless
- applications requested memory with alignment exceeding one page.
- (@jasone, @djwatson)
- - Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
- - jeprof:
- + Don't discard curl options if timeout is not defined. (@djwatson)
- + Detect failed profile fetches. (@djwatson)
- - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
- --disable-stats case. (@jasone)
-
-* 4.0.4 (October 24, 2015)
-
- This bugfix release fixes another xallocx() regression. No other regressions
- have come to light in over a month, so this is likely a good starting point
- for people who prefer to wait for "dot one" releases with all the major issues
- shaken out.
-
- Bug fixes:
- - Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of large
- allocations that have been randomly assigned an offset of 0 when
- --enable-cache-oblivious configure option is enabled.
-
-* 4.0.3 (September 24, 2015)
-
- This bugfix release continues the trend of xallocx() and heap profiling fixes.
-
- Bug fixes:
- - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
- allocations when --enable-cache-oblivious configure option is enabled.
- - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
- when resizing from/to a size class that is not a multiple of the chunk size.
- - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
- profile dumping started.
- - Work around a potentially bad thread-specific data initialization
- interaction with NPTL (glibc's pthreads implementation).
-
-* 4.0.2 (September 21, 2015)
-
- This bugfix release addresses a few bugs specific to heap profiling.
-
- Bug fixes:
- - Fix ixallocx_prof_sample() to never modify nor create sampled small
- allocations. xallocx() is in general incapable of moving small allocations,
- so this fix removes buggy code without loss of generality.
- - Fix irallocx_prof_sample() to always allocate large regions, even when
- alignment is non-zero.
- - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
- than dereferencing a potentially invalid tctx.
-
-* 4.0.1 (September 15, 2015)
-
- This is a bugfix release that is somewhat high risk due to the amount of
- refactoring required to address deep xallocx() problems. As a side effect of
- these fixes, xallocx() now tries harder to partially fulfill requests for
- optional extra space. Note that a couple of minor heap profiling
- optimizations are included, but these are better thought of as performance
- fixes that were integral to discovering most of the other bugs.
-
- Optimizations:
- - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
- fast path when heap profiling is enabled. Additionally, split a special
- case out into arena_prof_tctx_reset(), which also avoids chunk metadata
- reads.
- - Optimize irallocx_prof() to optimistically update the sampler state. The
- prior implementation appears to have been a holdover from when
- rallocx()/xallocx() functionality was combined as rallocm().
-
- Bug fixes:
- - Fix TLS configuration such that it is enabled by default for platforms on
- which it works correctly.
- - Fix arenas_cache_cleanup() and arena_get_hard() to handle
- allocation/deallocation within the application's thread-specific data
- cleanup functions even after arenas_cache is torn down.
- - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
- - Fix chunk purge hook calls for in-place huge shrinking reallocation to
- specify the old chunk size rather than the new chunk size. This bug caused
- no correctness issues for the default chunk purge function, but was
- visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
- - Fix heap profiling bugs:
- + Fix heap profiling to distinguish among otherwise identical sample sites
- with interposed resets (triggered via the "prof.reset" mallctl). This bug
- could cause data structure corruption that would most likely result in a
- segfault.
- + Fix irealloc_prof() to prof_alloc_rollback() on OOM.
- + Make one call to prof_active_get_unlocked() per allocation event, and use
- the result throughout the relevant functions that handle an allocation
- event. Also add a missing check in prof_realloc(). These fixes protect
- allocation events against concurrent prof_active changes.
- + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
- in the correct order.
- + Fix prof_realloc() to call prof_free_sampled_object() after calling
- prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
- the same, the tctx could have been prematurely destroyed.
- - Fix portability bugs:
- + Don't bitshift by negative amounts when encoding/decoding run sizes in
- chunk header maps. This affected systems with page sizes greater than 8
- KiB.
- + Rename index_t to szind_t to avoid an existing type on Solaris.
- + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
- match glibc and avoid compilation errors when including both
- jemalloc/jemalloc.h and malloc.h in C++ code.
- + Don't assume that /bin/sh is appropriate when running size_classes.sh
- during configuration.
- + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
- + Link tests to librt if it contains clock_gettime(2).
-
-* 4.0.0 (August 17, 2015)
-
- This version contains many speed and space optimizations, both minor and
- major. The major themes are generalization, unification, and simplification.
- Although many of these optimizations cause no visible behavior change, their
- cumulative effect is substantial.
-
- New features:
- - Normalize size class spacing to be consistent across the complete size
- range. By default there are four size classes per size doubling, but this
- is now configurable via the --with-lg-size-class-group option. Also add the
- --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
- --with-lg-tiny-min options, which can be used to tweak page and size class
- settings. Impacts:
- + Worst case performance for incrementally growing/shrinking reallocation
- is improved because there are far fewer size classes, and therefore
- copying happens less often.
- + Internal fragmentation is limited to 20% for all but the smallest size
- classes (those less than four times the quantum). (1B + 4 KiB)
- and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
- + Chunk fragmentation tends to be lower because there are fewer distinct run
- sizes to pack.
- - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
- "tcache.destroy" mallctls control tcache lifetime and flushing, and the
- MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
- control which tcache is used for each operation.
- - Implement per thread heap profiling, as well as the ability to
- enable/disable heap profiling on a per thread basis. Add the "prof.reset",
- "prof.lg_sample", "thread.prof.name", "thread.prof.active",
- "opt.prof_thread_active_init", "prof.thread_active_init", and
- "thread.prof.active" mallctls.
- - Add support for per arena application-specified chunk allocators, configured
- via the "arena.<i>.chunk_hooks" mallctl.
- - Refactor huge allocation to be managed by arenas, so that arenas now
- function as general purpose independent allocators. This is important in
- the context of user-specified chunk allocators, aside from the scalability
- benefits. Related new statistics:
- + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
- "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
- mallctls provide high level per arena huge allocation statistics.
- + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
- "stats.arenas.<i>.hchunks.<j>.nmalloc",
- "stats.arenas.<i>.hchunks.<j>.ndalloc",
- "stats.arenas.<i>.hchunks.<j>.nrequests", and
- "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
- statistics.
- - Add the 'util' column to malloc_stats_print() output, which reports the
- proportion of available regions that are currently in use for each small
- size class.
- - Add "alloc" and "free" modes for for junk filling (see the "opt.junk"
- mallctl), so that it is possible to separately enable junk filling for
- allocation versus deallocation.
- - Add the jemalloc-config script, which provides information about how
- jemalloc was configured, and how to integrate it into application builds.
- - Add metadata statistics, which are accessible via the "stats.metadata",
- "stats.arenas.<i>.metadata.mapped", and
- "stats.arenas.<i>.metadata.allocated" mallctls.
- - Add the "stats.resident" mallctl, which reports the upper limit of
- physically resident memory mapped by the allocator.
- - Add per arena control over unused dirty page purging, via the
- "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
- "stats.arenas.<i>.lg_dirty_mult" mallctls.
- - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
- feature on/off during program execution.
- - Add sdallocx(), which implements sized deallocation. The primary
- optimization over dallocx() is the removal of a metadata read, which often
- suffers an L1 cache miss.
- - Add missing header includes in jemalloc/jemalloc.h, so that applications
- only have to #include <jemalloc/jemalloc.h>.
- - Add support for additional platforms:
- + Bitrig
- + Cygwin
- + DragonFlyBSD
- + iOS
- + OpenBSD
- + OpenRISC/or1k
-
- Optimizations:
- - Maintain dirty runs in per arena LRUs rather than in per arena trees of
- dirty-run-containing chunks. In practice this change significantly reduces
- dirty page purging volume.
- - Integrate whole chunks into the unused dirty page purging machinery. This
- reduces the cost of repeated huge allocation/deallocation, because it
- effectively introduces a cache of chunks.
- - Split the arena chunk map into two separate arrays, in order to increase
- cache locality for the frequently accessed bits.
- - Move small run metadata out of runs, into arena chunk headers. This reduces
- run fragmentation, smaller runs reduce external fragmentation for small size
- classes, and packed (less uniformly aligned) metadata layout improves CPU
- cache set distribution.
- - Randomly distribute large allocation base pointer alignment relative to page
- boundaries in order to more uniformly utilize CPU cache sets. This can be
- disabled via the --disable-cache-oblivious configure option, and queried via
- the "config.cache_oblivious" mallctl.
- - Micro-optimize the fast paths for the public API functions.
- - Refactor thread-specific data to reside in a single structure. This assures
- that only a single TLS read is necessary per call into the public API.
- - Implement in-place huge allocation growing and shrinking.
- - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
- additional optimizations that reduce maximum lookup depth to one or two
- levels. This resolves what was a concurrency bottleneck for per arena huge
- allocation, because a global data structure is critical for determining
- which arenas own which huge allocations.
-
- Incompatible changes:
- - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
- warnings by default.
- - Assure that the constness of malloc_usable_size()'s return type matches that
- of the system implementation.
- - Change the heap profile dump format to support per thread heap profiling,
- rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
- result, the bundled jeprof must now be used rather than the upstream
- (gperftools) pprof.
- - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
- internally deadlock on some platforms.
- - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
- - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
- "stats.arenas.<i>.bins.<j>.curregs".
- - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
- - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
- MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
-
- Removed features:
- - Remove the *allocm() API, which is superseded by the *allocx() API.
- - Remove the --enable-dss options, and make dss non-optional on all platforms
- which support sbrk(2).
- - Remove the "arenas.purge" mallctl, which was obsoleted by the
- "arena.<i>.purge" mallctl in 3.1.0.
- - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
- detects whether it is running inside Valgrind.
- - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
- "stats.huge.ndalloc" mallctls.
- - Remove the --enable-mremap option.
- - Remove the "stats.chunks.current", "stats.chunks.total", and
- "stats.chunks.high" mallctls.
-
- Bug fixes:
- - Fix the cactive statistic to decrease (rather than increase) when active
- memory decreases. This regression was first released in 3.5.0.
- - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
- in all releases since 2.0.0, which introduced these functions.
- - Fix an OOM-related regression in arena_tcache_fill_small(), which could
- cause cache corruption on OOM. This regression was present in all releases
- from 2.2.0 through 3.6.0.
- - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
- calloc(), and realloc() when profiling is enabled.
- - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
- "secondary" precedence is specified, but sbrk(2) is not supported.
- - Fix fallback lg_floor() implementations to handle extremely large inputs.
- - Ensure the default purgeable zone is after the default zone on OS X.
- - Fix latent bugs in atomic_*().
- - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
- - Fix tls_model configuration to enable the initial-exec model when possible.
- - Mark malloc_conf as a weak symbol so that the application can override it.
- - Correctly detect glibc's adaptive pthread mutexes.
- - Fix the --without-export configure option.
-
-* 3.6.0 (March 31, 2014)
-
- This version contains a critical bug fix for a regression present in 3.5.0 and
- 3.5.1.
-
- Bug fixes:
- - Fix a regression in arena_chunk_alloc() that caused crashes during
- small/large allocation if chunk allocation failed. In the absence of this
- bug, chunk allocation failure would result in allocation failure, e.g. NULL
- return from malloc(). This regression was introduced in 3.5.0.
- - Fix backtracing for gcc intrinsics-based backtracing by specifying
- -fno-omit-frame-pointer to gcc. Note that the application (and all the
- libraries it links to) must also be compiled with this option for
- backtracing to be reliable.
- - Use dss allocation precedence for huge allocations as well as small/large
- allocations.
- - Fix test assertion failure message formatting. This bug did not manifest on
- x86_64 systems because of implementation subtleties in va_list.
- - Fix inconsequential test failures for hash and SFMT code.
-
- New features:
- - Support heap profiling on FreeBSD. This feature depends on the proc
- filesystem being mounted during heap profile dumping.
-
-* 3.5.1 (February 25, 2014)
-
- This version primarily addresses minor bugs in test code.
-
- Bug fixes:
- - Configure Solaris/Illumos to use MADV_FREE.
- - Fix junk filling for mremap(2)-based huge reallocation. This is only
- relevant if configuring with the --enable-mremap option specified.
- - Avoid compilation failure if 'restrict' C99 keyword is not supported by the
- compiler.
- - Add a configure test for SSE2 rather than assuming it is usable on i686
- systems. This fixes test compilation errors, especially on 32-bit Linux
- systems.
- - Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit
- test.
- - Fix/remove flawed alignment-related overflow tests.
- - Prevent compiler optimizations that could change backtraces in the
- prof_accum unit test.
-
-* 3.5.0 (January 22, 2014)
-
- This version focuses on refactoring and automated testing, though it also
- includes some non-trivial heap profiling optimizations not mentioned below.
-
- New features:
- - Add the *allocx() API, which is a successor to the experimental *allocm()
- API. The *allocx() functions are slightly simpler to use because they have
- fewer parameters, they directly return the results of primary interest, and
- mallocx()/rallocx() avoid the strict aliasing pitfall that
- allocm()/rallocm() share with posix_memalign(). Note that *allocm() is
- slated for removal in the next non-bugfix release.
- - Add support for LinuxThreads.
-
- Bug fixes:
- - Unless heap profiling is enabled, disable floating point code and don't link
- with libm. This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64
- systems, makes it possible to completely disable floating point register
- use. Some versions of glibc neglect to save/restore caller-saved floating
- point registers during dynamic lazy symbol loading, and the symbol loading
- code uses whatever malloc the application happens to have linked/loaded
- with, the result being potential floating point register corruption.
- - Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
- backtrace creation in imemalign(). This bug impacted posix_memalign() and
- aligned_alloc().
- - Fix a file descriptor leak in a prof_dump_maps() error path.
- - Fix prof_dump() to close the dump file descriptor for all relevant error
- paths.
- - Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for
- allocation, not just deallocation.
- - Fix a data race for large allocation stats counters.
- - Fix a potential infinite loop during thread exit. This bug occurred on
- Solaris, and could affect other platforms with similar pthreads TSD
- implementations.
- - Don't junk-fill reallocations unless usable size changes. This fixes a
- violation of the *allocx()/*allocm() semantics.
- - Fix growing large reallocation to junk fill new space.
- - Fix huge deallocation to junk fill when munmap is disabled.
- - Change the default private namespace prefix from empty to je_, and change
- --with-private-namespace-prefix so that it prepends an additional prefix
- rather than replacing je_. This reduces the likelihood of applications
- which statically link jemalloc experiencing symbol name collisions.
- - Add missing private namespace mangling (relevant when
- --with-private-namespace is specified).
- - Add and use JEMALLOC_INLINE_C so that static inline functions are marked as
- static even for debug builds.
- - Add a missing mutex unlock in a malloc_init_hard() error path. In practice
- this error path is never executed.
- - Fix numerous bugs in malloc_strotumax() error handling/reporting. These
- bugs had no impact except for malformed inputs.
- - Fix numerous bugs in malloc_snprintf(). These bugs were not exercised by
- existing calls, so they had no impact.
-
-* 3.4.1 (October 20, 2013)
-
- Bug fixes:
- - Fix a race in the "arenas.extend" mallctl that could cause memory corruption
- of internal data structures and subsequent crashes.
- - Fix Valgrind integration flaws that caused Valgrind warnings about reads of
- uninitialized memory in:
- + arena chunk headers
- + internal zero-initialized data structures (relevant to tcache and prof
- code)
- - Preserve errno during the first allocation. A readlink(2) call during
- initialization fails unless /etc/malloc.conf exists, so errno was typically
- set during the first allocation prior to this fix.
- - Fix compilation warnings reported by gcc 4.8.1.
-
-* 3.4.0 (June 2, 2013)
-
- This version is essentially a small bugfix release, but the addition of
- aarch64 support requires that the minor version be incremented.
-
- Bug fixes:
- - Fix race-triggered deadlocks in chunk_record(). These deadlocks were
- typically triggered by multiple threads concurrently deallocating huge
- objects.
-
- New features:
- - Add support for the aarch64 architecture.
-
-* 3.3.1 (March 6, 2013)
-
- This version fixes bugs that are typically encountered only when utilizing
- custom run-time options.
-
- Bug fixes:
- - Fix a locking order bug that could cause deadlock during fork if heap
- profiling were enabled.
- - Fix a chunk recycling bug that could cause the allocator to lose track of
- whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause
- corruption if allocating via sbrk(2) (unlikely unless running with the
- "dss:primary" option specified). This was completely harmless on Linux
- unless using mlockall(2) (and unlikely even then, unless the
- --disable-munmap configure option or the "dss:primary" option was
- specified). This regression was introduced in 3.1.0 by the
- mlockall(2)/madvise(2) interaction fix.
- - Fix TLS-related memory corruption that could occur during thread exit if the
- thread never allocated memory. Only the quarantine and prof facilities were
- susceptible.
- - Fix two quarantine bugs:
- + Internal reallocation of the quarantined object array leaked the old
- array.
- + Reallocation failure for internal reallocation of the quarantined object
- array (very unlikely) resulted in memory corruption.
- - Fix Valgrind integration to annotate all internally allocated memory in a
- way that keeps Valgrind happy about internal data structure access.
- - Fix building for s390 systems.
-
-* 3.3.0 (January 23, 2013)
-
- This version includes a few minor performance improvements in addition to the
- listed new features and bug fixes.
-
- New features:
- - Add clipping support to lg_chunk option processing.
- - Add the --enable-ivsalloc option.
- - Add the --without-export option.
- - Add the --disable-zone-allocator option.
-
- Bug fixes:
- - Fix "arenas.extend" mallctl to output the number of arenas.
- - Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
- is undefined.
- - Fix build break on FreeBSD related to alloca.h.
-
-* 3.2.0 (November 9, 2012)
-
- In addition to a couple of bug fixes, this version modifies page run
- allocation and dirty page purging algorithms in order to better control
- page-level virtual memory fragmentation.
-
- Incompatible changes:
- - Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).
-
- Bug fixes:
- - Fix dss/mmap allocation precedence code to use recyclable mmap memory only
- after primary dss allocation fails.
- - Fix deadlock in the "arenas.purge" mallctl. This regression was introduced
- in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.
-
-* 3.1.0 (October 16, 2012)
-
- New features:
- - Auto-detect whether running inside Valgrind, thus removing the need to
- manually specify MALLOC_CONF=valgrind:true.
- - Add the "arenas.extend" mallctl, which allows applications to create
- manually managed arenas.
- - Add the ALLOCM_ARENA() flag for {,r,d}allocm().
- - Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls,
- which provide control over dss/mmap precedence.
- - Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
- - Define LG_QUANTUM for hppa.
-
- Incompatible changes:
- - Disable tcache by default if running inside Valgrind, in order to avoid
- making unallocated objects appear reachable to Valgrind.
- - Drop const from malloc_usable_size() argument on Linux.
-
- Bug fixes:
- - Fix heap profiling crash if sampled object is freed via realloc(p, 0).
- - Remove const from __*_hook variable declarations, so that glibc can modify
- them during process forking.
- - Fix mlockall(2)/madvise(2) interaction.
- - Fix fork(2)-related deadlocks.
- - Fix error return value for "thread.tcache.enabled" mallctl.
-
-* 3.0.0 (May 11, 2012)
-
- Although this version adds some major new features, the primary focus is on
- internal code cleanup that facilitates maintainability and portability, most
- of which is not reflected in the ChangeLog. This is the first release to
- incorporate substantial contributions from numerous other developers, and the
- result is a more broadly useful allocator (see the git revision history for
- contribution details). Note that the license has been unified, thanks to
- Facebook granting a license under the same terms as the other copyright
- holders (see COPYING).
-
- New features:
- - Implement Valgrind support, redzones, and quarantine.
- - Add support for additional platforms:
- + FreeBSD
- + Mac OS X Lion
- + MinGW
- + Windows (no support yet for replacing the system malloc)
- - Add support for additional architectures:
- + MIPS
- + SH4
- + Tilera
- - Add support for cross compiling.
- - Add nallocm(), which rounds a request size up to the nearest size class
- without actually allocating.
- - Implement aligned_alloc() (blame C11).
- - Add the "thread.tcache.enabled" mallctl.
- - Add the "opt.prof_final" mallctl.
- - Update pprof (from gperftools 2.0).
- - Add the --with-mangling option.
- - Add the --disable-experimental option.
- - Add the --disable-munmap option, and make it the default on Linux.
- - Add the --enable-mremap option, which disables use of mremap(2) by default.
-
- Incompatible changes:
- - Enable stats by default.
- - Enable fill by default.
- - Disable lazy locking by default.
- - Rename the "tcache.flush" mallctl to "thread.tcache.flush".
- - Rename the "arenas.pagesize" mallctl to "arenas.page".
- - Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
- - Change the "opt.prof_accum" default from true to false.
-
- Removed features:
- - Remove the swap feature, including the "config.swap", "swap.avail",
- "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
- - Remove highruns statistics, including the
- "stats.arenas.<i>.bins.<j>.highruns" and
- "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
- - As part of small size class refactoring, remove the "opt.lg_[qc]space_max",
- "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and
- "arenas.[tqcs]bins" mallctls.
- - Remove the "arenas.chunksize" mallctl.
- - Remove the "opt.lg_prof_tcmax" option.
- - Remove the "opt.lg_prof_bt_max" option.
- - Remove the "opt.lg_tcache_gc_sweep" option.
- - Remove the --disable-tiny option, including the "config.tiny" mallctl.
- - Remove the --enable-dynamic-page-shift configure option.
- - Remove the --enable-sysv configure option.
-
- Bug fixes:
- - Fix a statistics-related bug in the "thread.arena" mallctl that could cause
- invalid statistics and crashes.
- - Work around TLS deallocation via free() on Linux. This bug could cause
- write-after-free memory corruption.
- - Fix a potential deadlock that could occur during interval- and
- growth-triggered heap profile dumps.
- - Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
- - Fix chunk_alloc_dss() to stop claiming memory is zeroed. This bug could
- cause memory corruption and crashes with --enable-dss specified.
- - Fix fork-related bugs that could cause deadlock in children between fork
- and exec.
- - Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
- - Fix realloc(p, 0) to act like free(p).
- - Do not enforce minimum alignment in memalign().
- - Check for NULL pointer in malloc_usable_size().
- - Fix an off-by-one heap profile statistics bug that could be observed in
- interval- and growth-triggered heap profiles.
- - Fix the "epoch" mallctl to update cached stats even if the passed in epoch
- is 0.
- - Fix bin->runcur management to fix a layout policy bug. This bug did not
- affect correctness.
- - Fix a bug in choose_arena_hard() that potentially caused more arenas to be
- initialized than necessary.
- - Add missing "opt.lg_tcache_max" mallctl implementation.
- - Use glibc allocator hooks to make mixed allocator usage less likely.
- - Fix build issues for --disable-tcache.
- - Don't mangle pthread_create() when --with-private-namespace is specified.
-
-* 2.2.5 (November 14, 2011)
-
- Bug fixes:
- - Fix huge_ralloc() race when using mremap(2). This is a serious bug that
- could cause memory corruption and/or crashes.
- - Fix huge_ralloc() to maintain chunk statistics.
- - Fix malloc_stats_print(..., "a") output.
-
-* 2.2.4 (November 5, 2011)
-
- Bug fixes:
- - Initialize arenas_tsd before using it. This bug existed for 2.2.[0-3], as
- well as for --disable-tls builds in earlier releases.
- - Do not assume a 4 KiB page size in test/rallocm.c.
-
-* 2.2.3 (August 31, 2011)
-
- This version fixes numerous bugs related to heap profiling.
-
- Bug fixes:
- - Fix a prof-related race condition. This bug could cause memory corruption,
- but only occurred in non-default configurations (prof_accum:false).
- - Fix off-by-one backtracing issues (make sure that prof_alloc_prep() is
- excluded from backtraces).
- - Fix a prof-related bug in realloc() (only triggered by OOM errors).
- - Fix prof-related bugs in allocm() and rallocm().
- - Fix prof_tdata_cleanup() for --disable-tls builds.
- - Fix a relative include path, to fix objdir builds.
-
-* 2.2.2 (July 30, 2011)
-
- Bug fixes:
- - Fix a build error for --disable-tcache.
- - Fix assertions in arena_purge() (for real this time).
- - Add the --with-private-namespace option. This is a workaround for symbol
- conflicts that can inadvertently arise when using static libraries.
-
-* 2.2.1 (March 30, 2011)
-
- Bug fixes:
- - Implement atomic operations for x86/x64. This fixes compilation failures
- for versions of gcc that are still in wide use.
- - Fix an assertion in arena_purge().
-
-* 2.2.0 (March 22, 2011)
-
- This version incorporates several improvements to algorithms and data
- structures that tend to reduce fragmentation and increase speed.
-
- New features:
- - Add the "stats.cactive" mallctl.
- - Update pprof (from google-perftools 1.7).
- - Improve backtracing-related configuration logic, and add the
- --disable-prof-libgcc option.
-
- Bug fixes:
- - Change default symbol visibility from "internal", to "hidden", which
- decreases the overhead of library-internal function calls.
- - Fix symbol visibility so that it is also set on OS X.
- - Fix a build dependency regression caused by the introduction of the .pic.o
- suffix for PIC object files.
- - Add missing checks for mutex initialization failures.
- - Don't use libgcc-based backtracing except on x64, where it is known to work.
- - Fix deadlocks on OS X that were due to memory allocation in
- pthread_mutex_lock().
- - Heap profiling-specific fixes:
- + Fix memory corruption due to integer overflow in small region index
- computation, when using a small enough sample interval that profiling
- context pointers are stored in small run headers.
- + Fix a bootstrap ordering bug that only occurred with TLS disabled.
- + Fix a rallocm() rsize bug.
- + Fix error detection bugs for aligned memory allocation.
-
-* 2.1.3 (March 14, 2011)
-
- Bug fixes:
- - Fix a cpp logic regression (due to the "thread.{de,}allocatedp" mallctl fix
- for OS X in 2.1.2).
- - Fix a "thread.arena" mallctl bug.
- - Fix a thread cache stats merging bug.
-
-* 2.1.2 (March 2, 2011)
-
- Bug fixes:
- - Fix "thread.{de,}allocatedp" mallctl for OS X.
- - Add missing jemalloc.a to build system.
-
-* 2.1.1 (January 31, 2011)
-
- Bug fixes:
- - Fix aligned huge reallocation (affected allocm()).
- - Fix the ALLOCM_LG_ALIGN macro definition.
- - Fix a heap dumping deadlock.
- - Fix a "thread.arena" mallctl bug.
-
-* 2.1.0 (December 3, 2010)
-
- This version incorporates some optimizations that can't quite be considered
- bug fixes.
-
- New features:
- - Use Linux's mremap(2) for huge object reallocation when possible.
- - Avoid locking in mallctl*() when possible.
- - Add the "thread.[de]allocatedp" mallctl's.
- - Convert the manual page source from roff to DocBook, and generate both roff
- and HTML manuals.
-
- Bug fixes:
- - Fix a crash due to incorrect bootstrap ordering. This only impacted
- --enable-debug --enable-dss configurations.
- - Fix a minor statistics bug for mallctl("swap.avail", ...).
-
-* 2.0.1 (October 29, 2010)
-
- Bug fixes:
- - Fix a race condition in heap profiling that could cause undefined behavior
- if "opt.prof_accum" were disabled.
- - Add missing mutex unlocks for some OOM error paths in the heap profiling
- code.
- - Fix a compilation error for non-C99 builds.
-
-* 2.0.0 (October 24, 2010)
-
- This version focuses on the experimental *allocm() API, and on improved
- run-time configuration/introspection. Nonetheless, numerous performance
- improvements are also included.
-
- New features:
- - Implement the experimental {,r,s,d}allocm() API, which provides a superset
- of the functionality available via malloc(), calloc(), posix_memalign(),
- realloc(), malloc_usable_size(), and free(). These functions can be used to
- allocate/reallocate aligned zeroed memory, ask for optional extra memory
- during reallocation, prevent object movement during reallocation, etc.
- - Replace JEMALLOC_OPTIONS/JEMALLOC_PROF_PREFIX with MALLOC_CONF, which is
- more human-readable, and more flexible. For example:
- JEMALLOC_OPTIONS=AJP
- is now:
- MALLOC_CONF=abort:true,fill:true,stats_print:true
- - Port to Apple OS X. Sponsored by Mozilla.
- - Make it possible for the application to control thread-->arena mappings via
- the "thread.arena" mallctl.
- - Add compile-time support for all TLS-related functionality via pthreads TSD.
- This is mainly of interest for OS X, which does not support TLS, but has a
- TSD implementation with similar performance.
- - Override memalign() and valloc() if they are provided by the system.
- - Add the "arenas.purge" mallctl, which can be used to synchronously purge all
- dirty unused pages.
- - Make cumulative heap profiling data optional, so that it is possible to
- limit the amount of memory consumed by heap profiling data structures.
- - Add per thread allocation counters that can be accessed via the
- "thread.allocated" and "thread.deallocated" mallctls.
-
- Incompatible changes:
- - Remove JEMALLOC_OPTIONS and malloc_options (see MALLOC_CONF above).
- - Increase default backtrace depth from 4 to 128 for heap profiling.
- - Disable interval-based profile dumps by default.
-
- Bug fixes:
- - Remove bad assertions in fork handler functions. These assertions could
- cause aborts for some combinations of configure settings.
- - Fix strerror_r() usage to deal with non-standard semantics in GNU libc.
- - Fix leak context reporting. This bug tended to cause the number of contexts
- to be underreported (though the reported number of objects and bytes were
- correct).
- - Fix a realloc() bug for large in-place growing reallocation. This bug could
- cause memory corruption, but it was hard to trigger.
- - Fix an allocation bug for small allocations that could be triggered if
- multiple threads raced to create a new run of backing pages.
- - Enhance the heap profiler to trigger samples based on usable size, rather
- than request size.
- - Fix a heap profiling bug due to sometimes losing track of requested object
- size for sampled objects.
-
-* 1.0.3 (August 12, 2010)
-
- Bug fixes:
- - Fix the libunwind-based implementation of stack backtracing (used for heap
- profiling). This bug could cause zero-length backtraces to be reported.
- - Add a missing mutex unlock in library initialization code. If multiple
- threads raced to initialize malloc, some of them could end up permanently
- blocked.
-
-* 1.0.2 (May 11, 2010)
-
- Bug fixes:
- - Fix junk filling of large objects, which could cause memory corruption.
- - Add MAP_NORESERVE support for chunk mapping, because otherwise virtual
- memory limits could cause swap file configuration to fail. Contributed by
- Jordan DeLong.
-
-* 1.0.1 (April 14, 2010)
-
- Bug fixes:
- - Fix compilation when --enable-fill is specified.
- - Fix threads-related profiling bugs that affected accuracy and caused memory
- to be leaked during thread exit.
- - Fix dirty page purging race conditions that could cause crashes.
- - Fix crash in tcache flushing code during thread destruction.
-
-* 1.0.0 (April 11, 2010)
-
- This release focuses on speed and run-time introspection. Numerous
- algorithmic improvements make this release substantially faster than its
- predecessors.
-
- New features:
- - Implement autoconf-based configuration system.
- - Add mallctl*(), for the purposes of introspection and run-time
- configuration.
- - Make it possible for the application to manually flush a thread's cache, via
- the "tcache.flush" mallctl.
- - Base maximum dirty page count on proportion of active memory.
- - Compute various additional run-time statistics, including per size class
- statistics for large objects.
- - Expose malloc_stats_print(), which can be called repeatedly by the
- application.
- - Simplify the malloc_message() signature to only take one string argument,
- and incorporate an opaque data pointer argument for use by the application
- in combination with malloc_stats_print().
- - Add support for allocation backed by one or more swap files, and allow the
- application to disable over-commit if swap files are in use.
- - Implement allocation profiling and leak checking.
-
- Removed features:
- - Remove the dynamic arena rebalancing code, since thread-specific caching
- reduces its utility.
-
- Bug fixes:
- - Modify chunk allocation to work when address space layout randomization
- (ASLR) is in use.
- - Fix thread cleanup bugs related to TLS destruction.
- - Handle 0-size allocation requests in posix_memalign().
- - Fix a chunk leak. The leaked chunks were never touched, so this impacted
- virtual memory usage, but not physical memory usage.
-
-* linux_2008082[78]a (August 27/28, 2008)
-
- These snapshot releases are the simple result of incorporating Linux-specific
- support into the FreeBSD malloc sources.
-
---------------------------------------------------------------------------------
-vim:filetype=text:textwidth=80
+Following are change highlights associated with official releases. Important
+bug fixes are all mentioned, but some internal enhancements are omitted here for
+brevity. Much more detail can be found in the git revision history:
+
+ https://github.com/jemalloc/jemalloc
+
+* 5.2.1 (August 5, 2019)
+
+ This release is primarily about Windows. A critical virtual memory leak is
+ resolved on all Windows platforms. The regression was present in all releases
+ since 5.0.0.
+
+ Bug fixes:
+ - Fix a severe virtual memory leak on Windows. This regression was first
+ released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
+ @interwq)
+ - Fix size 0 handling in posix_memalign(). This regression was first released
+ in 5.2.0. (@interwq)
+ - Fix the prof_log unit test which may observe unexpected backtraces from
+ compiler optimizations. The test was first added in 5.2.0. (@marxin,
+ @gnzlbg, @interwq)
+ - Fix the declaration of the extent_avail tree. This regression was first
+ released in 5.1.0. (@zoulasc)
+ - Fix an incorrect reference in jeprof. This functionality was first released
+ in 3.0.0. (@prehistoric-penguin)
+ - Fix an assertion on the deallocation fast-path. This regression was first
+ released in 5.2.0. (@yinan1048576)
+ - Fix the TLS_MODEL attribute in headers. This regression was first released
+ in 5.0.0. (@zoulasc, @interwq)
+
+ Optimizations and refactors:
+ - Implement opt.retain on Windows and enable by default on 64-bit. (@interwq,
+ @davidtgoldblatt)
+ - Optimize away a branch on the operator delete[] path. (@mgrice)
+ - Add format annotation to the format generator function. (@zoulasc)
+ - Refactor and improve the size class header generation. (@yinan1048576)
+ - Remove best fit. (@djwatson)
+ - Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
+
+* 5.2.0 (April 2, 2019)
+
+ This release includes a few notable improvements, which are summarized below:
+ 1) improved fast-path performance from the optimizations by @djwatson; 2)
+ reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on
+ setting the number of background threads. In addition, peak / spike memory
+ usage is improved with certain allocation patterns. As usual, the release and
+ prior dev versions have gone through large-scale production testing.
+
+ New features:
+ - Implement oversize_threshold, which uses a dedicated arena for allocations
+ crossing the specified threshold to reduce fragmentation. (@interwq)
+ - Add extents usage information to stats. (@tyleretzel)
+ - Log time information for sampled allocations. (@tyleretzel)
+ - Support 0 size in sdallocx. (@djwatson)
+ - Output rate for certain counters in malloc_stats. (@zinoale)
+ - Add configure option --enable-readlinkat, which allows the use of readlinkat
+ over readlink. (@davidtgoldblatt)
+ - Add configure options --{enable,disable}-{static,shared} to allow not
+ building unwanted libraries. (@Ericson2314)
+ - Add configure option --disable-libdl to enable fully static builds.
+ (@interwq)
+ - Add mallctl interfaces:
+ + opt.oversize_threshold (@interwq)
+ + stats.arenas.<i>.extent_avail (@tyleretzel)
+ + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
+ + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
+ (@tyleretzel)
+
+ Portability improvements:
+ - Update MSVC builds. (@maksqwe, @rustyx)
+ - Workaround a compiler optimizer bug on s390x. (@rkmisra)
+ - Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
+ - Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
+ - Link against -pthread instead of -lpthread. (@paravoid)
+ - Make background_thread not dependent on libdl. (@interwq)
+ - Add stringify to fix a linker directive issue on MSVC. (@daverigby)
+ - Detect and fall back when 8-bit atomics are unavailable. (@interwq)
+ - Fall back to the default pthread_create if dlsym(3) fails. (@interwq)
+
+ Optimizations and refactors:
+ - Refactor the TSD module. (@davidtgoldblatt)
+ - Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
+ - Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
+ - Optimize ixalloc by avoiding a size lookup. (@interwq)
+ - Implement opt.oversize_threshold which uses a dedicated arena for requests
+ crossing the threshold, also eagerly purges the oversize extents. Default
+ the threshold to 8 MiB. (@interwq)
+ - Clean compilation with -Wextra. (@gnzlbg, @jasone)
+ - Refactor the size class module. (@davidtgoldblatt)
+ - Refactor the stats emitter. (@tyleretzel)
+ - Optimize pow2_ceil. (@rkmisra)
+ - Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
+ - Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
+ - Improve error handling for THP state initialization. (@jsteemann)
+ - Rework the malloc() fast path. (@djwatson)
+ - Rework the free() fast path. (@djwatson)
+ - Refactor and optimize the tcache fill / flush paths. (@djwatson)
+ - Optimize sync / lwsync on PowerPC. (@chmeeedalf)
+ - Bypass extent_dalloc() when retain is enabled. (@interwq)
+ - Optimize the locking on large deallocation. (@interwq)
+ - Reduce the number of pages committed from sanity checking in debug build.
+ (@trasz, @interwq)
+ - Deprecate OSSpinLock. (@interwq)
+ - Lower the default number of background threads to 4 (when the feature
+ is enabled). (@interwq)
+ - Optimize the trylock spin wait. (@djwatson)
+ - Use arena index for arena-matching checks. (@interwq)
+ - Avoid forced decay on thread termination when using background threads.
+ (@interwq)
+ - Disable muzzy decay by default. (@djwatson, @interwq)
+ - Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
+ @interwq)
+
+ Bug fixes (all only relevant to jemalloc 5.x):
+ - Fix background thread index issues with max_background_threads. (@djwatson,
+ @interwq)
+ - Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
+ - Fix opt.prof_prefix initialization. (@davidtgoldblatt)
+ - Properly trigger decay on tcache destroy. (@interwq, @amosbird)
+ - Fix tcache.flush. (@interwq)
+ - Detect whether explicit extent zero out is necessary with huge pages or
+ custom extent hooks, which may change the purge semantics. (@interwq)
+ - Fix a side effect caused by extent_max_active_fit combined with decay-based
+ purging, where freed extents can accumulate and not be reused for an
+ extended period of time. (@interwq, @mpghf)
+ - Fix a missing unlock on extent register error handling. (@zoulasc)
+
+ Testing:
+ - Simplify the Travis script output. (@gnzlbg)
+ - Update the test scripts for FreeBSD. (@devnexen)
+ - Add unit tests for the producer-consumer pattern. (@interwq)
+ - Add Cirrus-CI config for FreeBSD builds. (@jasone)
+ - Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
+ @interwq)
+
+ Incompatible changes:
+ - Remove --with-lg-page-sizes. (@davidtgoldblatt)
+
+ Documentation:
+ - Attempt to build docs by default, however skip doc building when xsltproc
+ is missing. (@interwq, @cmuellner)
+
+* 5.1.0 (May 4, 2018)
+
+ This release is primarily about fine-tuning, ranging from several new features
+ to numerous notable performance and portability enhancements. The release and
+ prior dev versions have been running in multiple large scale applications for
+ months, and the cumulative improvements are substantial in many cases.
+
+ Given the long and successful production runs, this release is likely a good
+ candidate for applications to upgrade, from both jemalloc 5.0 and before. For
+ performance-critical applications, the newly added TUNING.md provides
+ guidelines on jemalloc tuning.
+
+ New features:
+ - Implement transparent huge page support for internal metadata. (@interwq)
+ - Add opt.thp to allow enabling / disabling transparent huge pages for all
+ mappings. (@interwq)
+ - Add maximum background thread count option. (@djwatson)
+ - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
+ (@interwq)
+ - Allow arena index lookup based on allocation addresses via mallctl.
+ (@lionkov)
+ - Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
+ - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
+ the active extent selected (to split off from) and the size of the requested
+ allocation. (@interwq, @davidtgoldblatt)
+ - Add retain_grow_limit to set the max size when growing virtual address
+ space. (@interwq)
+ - Add mallctl interfaces:
+ + arena.<i>.retain_grow_limit (@interwq)
+ + arenas.lookup (@lionkov)
+ + max_background_threads (@djwatson)
+ + opt.lg_extent_max_active_fit (@interwq)
+ + opt.max_background_threads (@djwatson)
+ + opt.metadata_thp (@interwq)
+ + opt.thp (@interwq)
+ + stats.metadata_thp (@interwq)
+
+ Portability improvements:
+ - Support GNU/kFreeBSD configuration. (@paravoid)
+ - Support m68k, nios2 and SH3 architectures. (@paravoid)
+ - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
+ - Fix symbol listing for cross-compiling. (@tamird)
+ - Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
+ - Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
+ - Fix MSVC 2015 & 2017 builds. (@rustyx)
+ - Improve RISC-V support. (@EdSchouten)
+ - Set name mangling script in strict mode. (@nicolov)
+ - Avoid MADV_HUGEPAGE on ARM. (@marxin)
+ - Modify configure to determine return value of strerror_r.
+ (@davidtgoldblatt, @cferris1000)
+ - Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
+ - Fix 32-bit build on MSVC. (@rustyx)
+ - Fix external symbol on MSVC. (@maksqwe)
+ - Avoid a printf format specifier warning. (@jasone)
+ - Add configure option --disable-initial-exec-tls which can allow jemalloc to
+ be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
+ - AArch64: Add ILP32 support. (@cmuellner)
+ - Add --with-lg-vaddr configure option to support cross compiling.
+ (@cmuellner, @davidtgoldblatt)
+
+ Optimizations and refactors:
+ - Improve active extent fit with extent_max_active_fit. This considerably
+ reduces fragmentation over time and improves virtual memory and metadata
+ usage. (@davidtgoldblatt, @interwq)
+ - Eagerly coalesce large extents to reduce fragmentation. (@interwq)
+ - sdallocx: only read size info when page aligned (i.e. possibly sampled),
+ which speeds up the sized deallocation path significantly. (@interwq)
+ - Avoid attempting new mappings for in place expansion with retain, since
+ it rarely succeeds in practice and causes high overhead. (@interwq)
+ - Refactor OOM handling in newImpl. (@wqfish)
+ - Add internal fine-grained logging functionality for debugging use.
+ (@davidtgoldblatt)
+ - Refactor arena / tcache interactions. (@davidtgoldblatt)
+ - Refactor extent management with dumpable flag. (@davidtgoldblatt)
+ - Add runtime detection of lazy purging. (@interwq)
+ - Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
+ - Use sysctl on startup in FreeBSD. (@trasz)
+ - Use thread local prng state instead of atomic. (@djwatson)
+ - Make decay to always purge one more extent than before, because in
+ practice large extents are usually the ones that cross the decay threshold.
+ Purging the additional extent helps save memory as well as reduce VM
+ fragmentation. (@interwq)
+ - Fast division by dynamic values. (@davidtgoldblatt)
+ - Improve the fit for aligned allocation. (@interwq, @edwinsmith)
+ - Refactor extent_t bitpacking. (@rkmisra)
+ - Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
+ - Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
+ - Remove preserve_lru feature for extents management. (@djwatson)
+ - Consolidate two memory loads into one on the fast deallocation path.
+ (@davidtgoldblatt, @interwq)
+
+ Bug fixes (most of the issues are only relevant to jemalloc 5.0):
+ - Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
+ - Validate returned file descriptor before use. (@zonyitoo)
+ - Fix a few background thread initialization and shutdown issues. (@interwq)
+ - Fix an extent coalesce + decay race by taking both coalescing extents off
+ the LRU list. (@interwq)
+ - Fix potentially unbound increase during decay, caused by one thread keep
+ stashing memory to purge while other threads generating new pages. The
+ number of pages to purge is checked to prevent this. (@interwq)
+ - Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
+ - Handle 32 bit mutex counters. (@rkmisra)
+ - Fix a indexing bug when creating background threads. (@davidtgoldblatt,
+ @binliu19)
+ - Fix arguments passed to extent_init. (@yuleniwo, @interwq)
+ - Fix addresses used for ordering mutexes. (@rkmisra)
+ - Fix abort_conf processing during bootstrap. (@interwq)
+ - Fix include path order for out-of-tree builds. (@cmuellner)
+
+ Incompatible changes:
+ - Remove --disable-thp. (@interwq)
+ - Remove mallctl interfaces:
+ + config.thp (@interwq)
+
+ Documentation:
+ - Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
+
+* 5.0.1 (July 1, 2017)
+
+ This bugfix release fixes several issues, most of which are obscure enough
+ that typical applications are not impacted.
+
+ Bug fixes:
+ - Update decay->nunpurged before purging, in order to avoid potential update
+ races and subsequent incorrect purging volume. (@interwq)
+ - Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
+ locking and/or background threads). This mitigates an initialization
+ failure bug for which we still do not have a clear reproduction test case.
+ (@interwq)
+ - Modify tsd management so that it neither crashes nor leaks if a thread's
+ only allocation activity is to call free() after TLS destructors have been
+ executed. This behavior was observed when operating with GNU libc, and is
+ unlikely to be an issue with other libc implementations. (@interwq)
+ - Mask signals during background thread creation. This prevents signals from
+ being inadvertently delivered to background threads. (@jasone,
+ @davidtgoldblatt, @interwq)
+ - Avoid inactivity checks within background threads, in order to prevent
+ recursive mutex acquisition. (@interwq)
+ - Fix extent_grow_retained() to use the specified hooks when the
+ arena.<i>.extent_hooks mallctl is used to override the default hooks.
+ (@interwq)
+ - Add missing reentrancy support for custom extent hooks which allocate.
+ (@interwq)
+ - Post-fork(2), re-initialize the list of tcaches associated with each arena
+ to contain no tcaches except the forking thread's. (@interwq)
+ - Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This
+ fixes potential deadlocks after fork(2). (@interwq)
+ - Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
+ generate corrupt configure scripts. (@jasone)
+ - Ensure that the configured page size (--with-lg-page) is no larger than the
+ configured huge page size (--with-lg-hugepage). (@jasone)
+
+* 5.0.0 (June 13, 2017)
+
+ Unlike all previous jemalloc releases, this release does not use naturally
+ aligned "chunks" for virtual memory management, and instead uses page-aligned
+ "extents". This change has few externally visible effects, but the internal
+ impacts are... extensive. Many other internal changes combine to make this
+ the most cohesively designed version of jemalloc so far, with ample
+ opportunity for further enhancements.
+
+ Continuous integration is now an integral aspect of development thanks to the
+ efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
+ stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
+ side effect the official release frequency may decrease over time.
+
+ New features:
+ - Implement optional per-CPU arena support; threads choose which arena to use
+ based on current CPU rather than on fixed thread-->arena associations.
+ (@interwq)
+ - Implement two-phase decay of unused dirty pages. Pages transition from
+ dirty-->muzzy-->clean, where the first phase transition relies on
+ madvise(... MADV_FREE) semantics, and the second phase transition discards
+ pages such that they are replaced with demand-zeroed pages on next access.
+ (@jasone)
+ - Increase decay time resolution from seconds to milliseconds. (@jasone)
+ - Implement opt-in per CPU background threads, and use them for asynchronous
+ decay-driven unused dirty page purging. (@interwq)
+ - Add mutex profiling, which collects a variety of statistics useful for
+ diagnosing overhead/contention issues. (@interwq)
+ - Add C++ new/delete operator bindings. (@djwatson)
+ - Support manually created arena destruction, such that all data and metadata
+ are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
+ associated with destroyed arenas. (@jasone)
+ - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
+ merged/destroyed arena statistics via mallctl. (@jasone)
+ - Add opt.abort_conf to optionally abort if invalid configuration options are
+ detected during initialization. (@interwq)
+ - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
+ stats dumped during exit if opt.stats_print is true. (@jasone)
+ - Add --with-version=VERSION for use when embedding jemalloc into another
+ project's git repository. (@jasone)
+ - Add --disable-thp to support cross compiling. (@jasone)
+ - Add --with-lg-hugepage to support cross compiling. (@jasone)
+ - Add mallctl interfaces (various authors):
+ + background_thread
+ + opt.abort_conf
+ + opt.retain
+ + opt.percpu_arena
+ + opt.background_thread
+ + opt.{dirty,muzzy}_decay_ms
+ + opt.stats_print_opts
+ + arena.<i>.initialized
+ + arena.<i>.destroy
+ + arena.<i>.{dirty,muzzy}_decay_ms
+ + arena.<i>.extent_hooks
+ + arenas.{dirty,muzzy}_decay_ms
+ + arenas.bin.<i>.slab_size
+ + arenas.nlextents
+ + arenas.lextent.<i>.size
+ + arenas.create
+ + stats.background_thread.{num_threads,num_runs,run_interval}
+ + stats.mutexes.{ctl,background_thread,prof,reset}.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+ + stats.arenas.<i>.{dirty,muzzy}_decay_ms
+ + stats.arenas.<i>.uptime
+ + stats.arenas.<i>.{pmuzzy,base,internal,resident}
+ + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
+ + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
+ + stats.arenas.<i>.bins.<j>.mutex.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+ + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
+ + stats.arenas.i.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
+ extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+
+ Portability improvements:
+ - Improve reentrant allocation support, such that deadlock is less likely if
+ e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
+ @interwq)
+ - Support static linking of jemalloc with glibc. (@djwatson)
+
+ Optimizations and refactors:
+ - Organize virtual memory as "extents" of virtual memory pages, rather than as
+ naturally aligned "chunks", and store all metadata in arbitrarily distant
+ locations. This reduces virtual memory external fragmentation, and will
+ interact better with huge pages (not yet explicitly supported). (@jasone)
+ - Fold large and huge size classes together; only small and large size classes
+ remain. (@jasone)
+ - Unify the allocation paths, and merge most fast-path branching decisions.
+ (@davidtgoldblatt, @interwq)
+ - Embed per thread automatic tcache into thread-specific data, which reduces
+ conditional branches and dereferences. Also reorganize tcache to increase
+ fast-path data locality. (@interwq)
+ - Rewrite atomics to closely model the C11 API, convert various
+ synchronization from mutex-based to atomic, and use the explicit memory
+ ordering control to resolve various hypothetical races without increasing
+ synchronization overhead. (@davidtgoldblatt)
+ - Extensively optimize rtree via various methods:
+ + Add multiple layers of rtree lookup caching, since rtree lookups are now
+ part of fast-path deallocation. (@interwq)
+ + Determine rtree layout at compile time. (@jasone)
+ + Make the tree shallower for common configurations. (@jasone)
+ + Embed the root node in the top-level rtree data structure, thus avoiding
+ one level of indirection. (@jasone)
+ + Further specialize leaf elements as compared to internal node elements,
+ and directly embed extent metadata needed for fast-path deallocation.
+ (@jasone)
+ + Ignore leading always-zero address bits (architecture-specific).
+ (@jasone)
+ - Reorganize headers (ongoing work) to make them hermetic, and disentangle
+ various module dependencies. (@davidtgoldblatt)
+ - Convert various internal data structures such as size class metadata from
+ boot-time-initialized to compile-time-initialized. Propagate resulting data
+ structure simplifications, such as making arena metadata fixed-size.
+ (@jasone)
+ - Simplify size class lookups when constrained to size classes that are
+ multiples of the page size. This speeds lookups, but the primary benefit is
+ complexity reduction in code that was the source of numerous regressions.
+ (@jasone)
+ - Lock individual extents when possible for localized extent operations,
+ rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
+ - Use first fit layout policy instead of best fit, in order to improve
+ packing. (@jasone)
+ - If munmap(2) is not in use, use an exponential series to grow each arena's
+ virtual memory, so that the number of disjoint virtual memory mappings
+ remains low. (@jasone)
+ - Implement per arena base allocators, so that arenas never share any virtual
+ memory pages. (@jasone)
+ - Automatically generate private symbol name mangling macros. (@jasone)
+
+ Incompatible changes:
+ - Replace chunk hooks with an expanded/normalized set of extent hooks.
+ (@jasone)
+ - Remove ratio-based purging. (@jasone)
+ - Remove --disable-tcache. (@jasone)
+ - Remove --disable-tls. (@jasone)
+ - Remove --enable-ivsalloc. (@jasone)
+ - Remove --with-lg-size-class-group. (@jasone)
+ - Remove --with-lg-tiny-min. (@jasone)
+ - Remove --disable-cc-silence. (@jasone)
+ - Remove --enable-code-coverage. (@jasone)
+ - Remove --disable-munmap (replaced by opt.retain). (@jasone)
+ - Remove Valgrind support. (@jasone)
+ - Remove quarantine support. (@jasone)
+ - Remove redzone support. (@jasone)
+ - Remove mallctl interfaces (various authors):
+ + config.munmap
+ + config.tcache
+ + config.tls
+ + config.valgrind
+ + opt.lg_chunk
+ + opt.purge
+ + opt.lg_dirty_mult
+ + opt.decay_time
+ + opt.quarantine
+ + opt.redzone
+ + opt.thp
+ + arena.<i>.lg_dirty_mult
+ + arena.<i>.decay_time
+ + arena.<i>.chunk_hooks
+ + arenas.initialized
+ + arenas.lg_dirty_mult
+ + arenas.decay_time
+ + arenas.bin.<i>.run_size
+ + arenas.nlruns
+ + arenas.lrun.<i>.size
+ + arenas.nhchunks
+ + arenas.hchunk.<i>.size
+ + arenas.extend
+ + stats.cactive
+ + stats.arenas.<i>.lg_dirty_mult
+ + stats.arenas.<i>.decay_time
+ + stats.arenas.<i>.metadata.{mapped,allocated}
+ + stats.arenas.<i>.{npurge,nmadvise,purged}
+ + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
+ + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
+ + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
+ + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}
+
+ Bug fixes:
+ - Improve interval-based profile dump triggering to dump only one profile when
+ a single allocation's size exceeds the interval. (@jasone)
+ - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
+ pruning backtrace frames in jeprof. (@jasone)
+
+* 4.5.0 (February 28, 2017)
+
+ This is the first release to benefit from much broader continuous integration
+ testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
+ in place for prior releases, it would have caught all of the most serious
+ regressions fixed by this release.
+
+ New features:
+ - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
+ transparent huge page integration. (@jasone)
+ - Update zone allocator integration to work with macOS 10.12. (@glandium)
+ - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
+ EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
+ during configuration. (@jasone, @ronawho)
+
+ Bug fixes:
+ - Fix DSS (sbrk(2)-based) allocation. This regression was first released in
+ 4.3.0. (@jasone)
+ - Handle race in per size class utilization computation. This functionality
+ was first released in 4.0.0. (@interwq)
+ - Fix lock order reversal during gdump. (@jasone)
+ - Fix/refactor tcache synchronization. This regression was first released in
+ 4.0.0. (@jasone)
+ - Fix various JSON-formatted malloc_stats_print() bugs. This functionality
+ was first released in 4.3.0. (@jasone)
+ - Fix huge-aligned allocation. This regression was first released in 4.4.0.
+ (@jasone)
+ - When transparent huge page integration is enabled, detect what state pages
+ start in according to the kernel's current operating mode, and only convert
+ arena chunks to non-huge during purging if that is not their initial state.
+ This functionality was first released in 4.4.0. (@jasone)
+ - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
+ This regression was first released in 4.0.0. (@jasone, @428desmo)
+ - Properly detect sparc64 when building for Linux. (@glaubitz)
+
+* 4.4.0 (December 3, 2016)
+
+ New features:
+ - Add configure support for *-*-linux-android. (@cferris1000, @jasone)
+ - Add the --disable-syscall configure option, for use on systems that place
+ security-motivated limitations on syscall(2). (@jasone)
+ - Add support for Debian GNU/kFreeBSD. (@thesam)
+
+ Optimizations:
+ - Add extent serial numbers and use them where appropriate as a sort key that
+ is higher priority than address, so that the allocation policy prefers older
+ extents. This tends to improve locality (decrease fragmentation) when
+ memory grows downward. (@jasone)
+ - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
+ on Linux 4.5 and newer. (@jasone)
+ - Mark partially purged arena chunks as non-huge-page. This improves
+ interaction with Linux's transparent huge page functionality. (@jasone)
+
+ Bug fixes:
+ - Fix size class computations for edge conditions involving extremely large
+ allocations. This regression was first released in 4.0.0. (@jasone,
+ @ingvarha)
+ - Remove overly restrictive assertions related to the cactive statistic. This
+ regression was first released in 4.1.0. (@jasone)
+ - Implement a more reliable detection scheme for os_unfair_lock on macOS.
+ (@jszakmeister)
+
+* 4.3.1 (November 7, 2016)
+
+ Bug fixes:
+ - Fix a severe virtual memory leak. This regression was first released in
+ 4.3.0. (@interwq, @jasone)
+ - Refactor atomic and prng APIs to restore support for 32-bit platforms that
+ use pre-C11 toolchains, e.g. FreeBSD's mips. (@jasone)
+
+* 4.3.0 (November 4, 2016)
+
+ This is the first release that passes the test suite for multiple Windows
+ configurations, thanks in large part to @glandium setting up continuous
+ integration via AppVeyor (and Travis CI for Linux and OS X).
+
+ New features:
+ - Add "J" (JSON) support to malloc_stats_print(). (@jasone)
+ - Add Cray compiler support. (@ronawho)
+
+ Optimizations:
+ - Add/use adaptive spinning for bootstrapping and radix tree node
+ initialization. (@jasone)
+
+ Bug fixes:
+ - Fix large allocation to search starting in the optimal size class heap,
+ which can substantially reduce virtual memory churn and fragmentation. This
+ regression was first released in 4.0.0. (@mjp41, @jasone)
+ - Fix stats.arenas.<i>.nthreads accounting. (@interwq)
+ - Fix and simplify decay-based purging. (@jasone)
+ - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
+ deadlocks during thread exit. (@jasone)
+ - Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun,
+ @jasone)
+ - Fix over-sized allocation of arena_t (plus associated stats) data
+ structures. (@jasone, @interwq)
+ - Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
+ - Fix a Valgrind integration bug. (@ronawho)
+ - Disallow 0x5a junk filling when running in Valgrind. (@jasone)
+ - Fix a file descriptor leak on Linux. This regression was first released in
+ 4.2.0. (@vsarunas, @jasone)
+ - Fix static linking of jemalloc with glibc. (@djwatson)
+ - Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This
+ works around other libraries' system call wrappers performing reentrant
+ allocation. (@kspinka, @Whissi, @jasone)
+ - Fix OS X default zone replacement to work with OS X 10.12. (@glandium,
+ @jasone)
+ - Fix cached memory management to avoid needless commit/decommit operations
+ during purging, which resolves permanent virtual memory map fragmentation
+ issues on Windows. (@mjp41, @jasone)
+ - Fix TSD fetches to avoid (recursive) allocation. This is relevant to
+ non-TLS and Windows configurations. (@jasone)
+ - Fix malloc_conf overriding to work on Windows. (@jasone)
+ - Forcibly disable lazy-lock on Windows (was forcibly *enabled*). (@jasone)
+
+* 4.2.1 (June 8, 2016)
+
+ Bug fixes:
+ - Fix bootstrapping issues for configurations that require allocation during
+ tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
+ - Fix gettimeofday() version of nstime_update(). (@ronawho)
+ - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
+ - Fix potential VM map fragmentation regression. (@jasone)
+ - Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
+ - Fix heap profiling context leaks in reallocation edge cases. (@jasone)
+
+* 4.2.0 (May 12, 2016)
+
+ New features:
+ - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
+ an arena's allocations in a single operation. (@jasone)
+ - Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
+ - Add the --with-version configure option. (@jasone)
+ - Support --with-lg-page values larger than actual page size. (@jasone)
+
+ Optimizations:
+ - Use pairing heaps rather than red-black trees for various hot data
+ structures. (@djwatson, @jasone)
+ - Streamline fast paths of rtree operations. (@jasone)
+ - Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
+ - Decommit unused virtual memory if the OS does not overcommit. (@jasone)
+ - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
+ to avoid unfortunate interactions during fork(2). (@jasone)
+
+ Bug fixes:
+ - Fix chunk accounting related to triggering gdump profiles. (@jasone)
+ - Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
+ - Scale leak report summary according to sampling probability. (@jasone)
+
+* 4.1.1 (May 3, 2016)
+
+ This bugfix release resolves a variety of mostly minor issues, though the
+ bitmap fix is critical for 64-bit Windows.
+
+ Bug fixes:
+ - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
+ even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
+ Windows. (@jasone)
+ - Fix hashing functions to avoid unaligned memory accesses (and resulting
+ crashes). This is relevant at least to some ARM-based platforms.
+ (@rkmisra)
+ - Fix fork()-related lock rank ordering reversals. These reversals were
+ unlikely to cause deadlocks in practice except when heap profiling was
+ enabled and active. (@jasone)
+ - Fix various chunk leaks in OOM code paths. (@jasone)
+ - Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
+ - Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
+ - Fix a variety of test failures that were due to test fragility rather than
+ core bugs. (@jasone)
+
+* 4.1.0 (February 28, 2016)
+
+ This release is primarily about optimizations, but it also incorporates a lot
+ of portability-motivated refactoring and enhancements. Many people worked on
+ this release, to an extent that even with the omission here of minor changes
+ (see git revision history), and of the people who reported and diagnosed
+ issues, so much of the work was contributed that starting with this release,
+ changes are annotated with author credits to help reflect the collaborative
+ effort involved.
+
+ New features:
+ - Implement decay-based unused dirty page purging, a major optimization with
+ mallctl API impact. This is an alternative to the existing ratio-based
+ unused dirty page purging, and is intended to eventually become the sole
+ purging mechanism. New mallctls:
+ + opt.purge
+ + opt.decay_time
+ + arena.<i>.decay
+ + arena.<i>.decay_time
+ + arenas.decay_time
+ + stats.arenas.<i>.decay_time
+ (@jasone, @cevans87)
+ - Add --with-malloc-conf, which makes it possible to embed a default
+ options string during configuration. This was motivated by the desire to
+ specify --with-malloc-conf=purge:decay , since the default must remain
+ purge:ratio until the 5.0.0 release. (@jasone)
+ - Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)
+ - Make *allocx() size class overflow behavior defined. The maximum
+ size class is now less than PTRDIFF_MAX to protect applications against
+ numerical overflow, and all allocation functions are guaranteed to indicate
+ errors rather than potentially crashing if the request size exceeds the
+ maximum size class. (@jasone)
+ - jeprof:
+ + Add raw heap profile support. (@jasone)
+ + Add --retain and --exclude for backtrace symbol filtering. (@jasone)
+
+ Optimizations:
+ - Optimize the fast path to combine various bootstrapping and configuration
+ checks and execute more streamlined code in the common case. (@interwq)
+ - Use linear scan for small bitmaps (used for small object tracking). In
+ addition to speeding up bitmap operations on 64-bit systems, this reduces
+ allocator metadata overhead by approximately 0.2%. (@djwatson)
+ - Separate arena_avail trees, which substantially speeds up run tree
+ operations. (@djwatson)
+ - Use memoization (boot-time-computed table) for run quantization. Separate
+ arena_avail trees reduced the importance of this optimization. (@jasone)
+ - Attempt mmap-based in-place huge reallocation. This can dramatically speed
+ up incremental huge reallocation. (@jasone)
+
+ Incompatible changes:
+ - Make opt.narenas unsigned rather than size_t. (@jasone)
+
+ Bug fixes:
+ - Fix stats.cactive accounting regression. (@rustyx, @jasone)
+ - Handle unaligned keys in hash(). This caused problems for some ARM systems.
+ (@jasone, @cferris1000)
+ - Refactor arenas array. In addition to fixing a fork-related deadlock, this
+ makes arena lookups faster and simpler. (@jasone)
+ - Move retained memory allocation out of the default chunk allocation
+ function, to a location that gets executed even if the application installs
+ a custom chunk allocation function. This resolves a virtual memory leak.
+ (@buchgr)
+ - Fix a potential tsd cleanup leak. (@cferris1000, @jasone)
+ - Fix run quantization. In practice this bug had no impact unless
+ applications requested memory with alignment exceeding one page.
+ (@jasone, @djwatson)
+ - Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
+ - jeprof:
+ + Don't discard curl options if timeout is not defined. (@djwatson)
+ + Detect failed profile fetches. (@djwatson)
+ - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
+ --disable-stats case. (@jasone)
+
+* 4.0.4 (October 24, 2015)
+
+ This bugfix release fixes another xallocx() regression. No other regressions
+ have come to light in over a month, so this is likely a good starting point
+ for people who prefer to wait for "dot one" releases with all the major issues
+ shaken out.
+
+ Bug fixes:
+ - Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of large
+ allocations that have been randomly assigned an offset of 0 when
+ --enable-cache-oblivious configure option is enabled.
+
+* 4.0.3 (September 24, 2015)
+
+ This bugfix release continues the trend of xallocx() and heap profiling fixes.
+
+ Bug fixes:
+ - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
+ allocations when --enable-cache-oblivious configure option is enabled.
+ - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
+ when resizing from/to a size class that is not a multiple of the chunk size.
+ - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
+ profile dumping started.
+ - Work around a potentially bad thread-specific data initialization
+ interaction with NPTL (glibc's pthreads implementation).
+
+* 4.0.2 (September 21, 2015)
+
+ This bugfix release addresses a few bugs specific to heap profiling.
+
+ Bug fixes:
+ - Fix ixallocx_prof_sample() to never modify nor create sampled small
+ allocations. xallocx() is in general incapable of moving small allocations,
+ so this fix removes buggy code without loss of generality.
+ - Fix irallocx_prof_sample() to always allocate large regions, even when
+ alignment is non-zero.
+ - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
+ than dereferencing a potentially invalid tctx.
+
+* 4.0.1 (September 15, 2015)
+
+ This is a bugfix release that is somewhat high risk due to the amount of
+ refactoring required to address deep xallocx() problems. As a side effect of
+ these fixes, xallocx() now tries harder to partially fulfill requests for
+ optional extra space. Note that a couple of minor heap profiling
+ optimizations are included, but these are better thought of as performance
+ fixes that were integral to discovering most of the other bugs.
+
+ Optimizations:
+ - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
+ fast path when heap profiling is enabled. Additionally, split a special
+ case out into arena_prof_tctx_reset(), which also avoids chunk metadata
+ reads.
+ - Optimize irallocx_prof() to optimistically update the sampler state. The
+ prior implementation appears to have been a holdover from when
+ rallocx()/xallocx() functionality was combined as rallocm().
+
+ Bug fixes:
+ - Fix TLS configuration such that it is enabled by default for platforms on
+ which it works correctly.
+ - Fix arenas_cache_cleanup() and arena_get_hard() to handle
+ allocation/deallocation within the application's thread-specific data
+ cleanup functions even after arenas_cache is torn down.
+ - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
+ - Fix chunk purge hook calls for in-place huge shrinking reallocation to
+ specify the old chunk size rather than the new chunk size. This bug caused
+ no correctness issues for the default chunk purge function, but was
+ visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
+ - Fix heap profiling bugs:
+ + Fix heap profiling to distinguish among otherwise identical sample sites
+ with interposed resets (triggered via the "prof.reset" mallctl). This bug
+ could cause data structure corruption that would most likely result in a
+ segfault.
+ + Fix irealloc_prof() to prof_alloc_rollback() on OOM.
+ + Make one call to prof_active_get_unlocked() per allocation event, and use
+ the result throughout the relevant functions that handle an allocation
+ event. Also add a missing check in prof_realloc(). These fixes protect
+ allocation events against concurrent prof_active changes.
+ + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
+ in the correct order.
+ + Fix prof_realloc() to call prof_free_sampled_object() after calling
+ prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
+ the same, the tctx could have been prematurely destroyed.
+ - Fix portability bugs:
+ + Don't bitshift by negative amounts when encoding/decoding run sizes in
+ chunk header maps. This affected systems with page sizes greater than 8
+ KiB.
+ + Rename index_t to szind_t to avoid an existing type on Solaris.
+ + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
+ match glibc and avoid compilation errors when including both
+ jemalloc/jemalloc.h and malloc.h in C++ code.
+ + Don't assume that /bin/sh is appropriate when running size_classes.sh
+ during configuration.
+ + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
+ + Link tests to librt if it contains clock_gettime(2).
+
+* 4.0.0 (August 17, 2015)
+
+ This version contains many speed and space optimizations, both minor and
+ major. The major themes are generalization, unification, and simplification.
+ Although many of these optimizations cause no visible behavior change, their
+ cumulative effect is substantial.
+
+ New features:
+ - Normalize size class spacing to be consistent across the complete size
+ range. By default there are four size classes per size doubling, but this
+ is now configurable via the --with-lg-size-class-group option. Also add the
+ --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
+ --with-lg-tiny-min options, which can be used to tweak page and size class
+ settings. Impacts:
+ + Worst case performance for incrementally growing/shrinking reallocation
+ is improved because there are far fewer size classes, and therefore
+ copying happens less often.
+ + Internal fragmentation is limited to 20% for all but the smallest size
+ classes (those less than four times the quantum). (1B + 4 KiB)
+ and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
+ + Chunk fragmentation tends to be lower because there are fewer distinct run
+ sizes to pack.
+ - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
+ "tcache.destroy" mallctls control tcache lifetime and flushing, and the
+ MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
+ control which tcache is used for each operation.
+ - Implement per thread heap profiling, as well as the ability to
+ enable/disable heap profiling on a per thread basis. Add the "prof.reset",
+ "prof.lg_sample", "thread.prof.name", "thread.prof.active",
+ "opt.prof_thread_active_init", "prof.thread_active_init", and
+ "thread.prof.active" mallctls.
+ - Add support for per arena application-specified chunk allocators, configured
+ via the "arena.<i>.chunk_hooks" mallctl.
+ - Refactor huge allocation to be managed by arenas, so that arenas now
+ function as general purpose independent allocators. This is important in
+ the context of user-specified chunk allocators, aside from the scalability
+ benefits. Related new statistics:
+ + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
+ "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
+ mallctls provide high level per arena huge allocation statistics.
+ + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
+ "stats.arenas.<i>.hchunks.<j>.nmalloc",
+ "stats.arenas.<i>.hchunks.<j>.ndalloc",
+ "stats.arenas.<i>.hchunks.<j>.nrequests", and
+ "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
+ statistics.
+ - Add the 'util' column to malloc_stats_print() output, which reports the
+ proportion of available regions that are currently in use for each small
+ size class.
+ - Add "alloc" and "free" modes for for junk filling (see the "opt.junk"
+ mallctl), so that it is possible to separately enable junk filling for
+ allocation versus deallocation.
+ - Add the jemalloc-config script, which provides information about how
+ jemalloc was configured, and how to integrate it into application builds.
+ - Add metadata statistics, which are accessible via the "stats.metadata",
+ "stats.arenas.<i>.metadata.mapped", and
+ "stats.arenas.<i>.metadata.allocated" mallctls.
+ - Add the "stats.resident" mallctl, which reports the upper limit of
+ physically resident memory mapped by the allocator.
+ - Add per arena control over unused dirty page purging, via the
+ "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
+ "stats.arenas.<i>.lg_dirty_mult" mallctls.
+ - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
+ feature on/off during program execution.
+ - Add sdallocx(), which implements sized deallocation. The primary
+ optimization over dallocx() is the removal of a metadata read, which often
+ suffers an L1 cache miss.
+ - Add missing header includes in jemalloc/jemalloc.h, so that applications
+ only have to #include <jemalloc/jemalloc.h>.
+ - Add support for additional platforms:
+ + Bitrig
+ + Cygwin
+ + DragonFlyBSD
+ + iOS
+ + OpenBSD
+ + OpenRISC/or1k
+
+ Optimizations:
+ - Maintain dirty runs in per arena LRUs rather than in per arena trees of
+ dirty-run-containing chunks. In practice this change significantly reduces
+ dirty page purging volume.
+ - Integrate whole chunks into the unused dirty page purging machinery. This
+ reduces the cost of repeated huge allocation/deallocation, because it
+ effectively introduces a cache of chunks.
+ - Split the arena chunk map into two separate arrays, in order to increase
+ cache locality for the frequently accessed bits.
+ - Move small run metadata out of runs, into arena chunk headers. This reduces
+ run fragmentation, smaller runs reduce external fragmentation for small size
+ classes, and packed (less uniformly aligned) metadata layout improves CPU
+ cache set distribution.
+ - Randomly distribute large allocation base pointer alignment relative to page
+ boundaries in order to more uniformly utilize CPU cache sets. This can be
+ disabled via the --disable-cache-oblivious configure option, and queried via
+ the "config.cache_oblivious" mallctl.
+ - Micro-optimize the fast paths for the public API functions.
+ - Refactor thread-specific data to reside in a single structure. This assures
+ that only a single TLS read is necessary per call into the public API.
+ - Implement in-place huge allocation growing and shrinking.
+ - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
+ additional optimizations that reduce maximum lookup depth to one or two
+ levels. This resolves what was a concurrency bottleneck for per arena huge
+ allocation, because a global data structure is critical for determining
+ which arenas own which huge allocations.
+
+ Incompatible changes:
+ - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
+ warnings by default.
+ - Assure that the constness of malloc_usable_size()'s return type matches that
+ of the system implementation.
+ - Change the heap profile dump format to support per thread heap profiling,
+ rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
+ result, the bundled jeprof must now be used rather than the upstream
+ (gperftools) pprof.
+ - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
+ internally deadlock on some platforms.
+ - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
+ - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
+ "stats.arenas.<i>.bins.<j>.curregs".
+ - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
+ - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
+ MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
+
+ Removed features:
+ - Remove the *allocm() API, which is superseded by the *allocx() API.
+ - Remove the --enable-dss options, and make dss non-optional on all platforms
+ which support sbrk(2).
+ - Remove the "arenas.purge" mallctl, which was obsoleted by the
+ "arena.<i>.purge" mallctl in 3.1.0.
+ - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
+ detects whether it is running inside Valgrind.
+ - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
+ "stats.huge.ndalloc" mallctls.
+ - Remove the --enable-mremap option.
+ - Remove the "stats.chunks.current", "stats.chunks.total", and
+ "stats.chunks.high" mallctls.
+
+ Bug fixes:
+ - Fix the cactive statistic to decrease (rather than increase) when active
+ memory decreases. This regression was first released in 3.5.0.
+ - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
+ in all releases since 2.0.0, which introduced these functions.
+ - Fix an OOM-related regression in arena_tcache_fill_small(), which could
+ cause cache corruption on OOM. This regression was present in all releases
+ from 2.2.0 through 3.6.0.
+ - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
+ calloc(), and realloc() when profiling is enabled.
+ - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
+ "secondary" precedence is specified, but sbrk(2) is not supported.
+ - Fix fallback lg_floor() implementations to handle extremely large inputs.
+ - Ensure the default purgeable zone is after the default zone on OS X.
+ - Fix latent bugs in atomic_*().
+ - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
+ - Fix tls_model configuration to enable the initial-exec model when possible.
+ - Mark malloc_conf as a weak symbol so that the application can override it.
+ - Correctly detect glibc's adaptive pthread mutexes.
+ - Fix the --without-export configure option.
+
+* 3.6.0 (March 31, 2014)
+
+ This version contains a critical bug fix for a regression present in 3.5.0 and
+ 3.5.1.
+
+ Bug fixes:
+ - Fix a regression in arena_chunk_alloc() that caused crashes during
+ small/large allocation if chunk allocation failed. In the absence of this
+ bug, chunk allocation failure would result in allocation failure, e.g. NULL
+ return from malloc(). This regression was introduced in 3.5.0.
+ - Fix backtracing for gcc intrinsics-based backtracing by specifying
+ -fno-omit-frame-pointer to gcc. Note that the application (and all the
+ libraries it links to) must also be compiled with this option for
+ backtracing to be reliable.
+ - Use dss allocation precedence for huge allocations as well as small/large
+ allocations.
+ - Fix test assertion failure message formatting. This bug did not manifest on
+ x86_64 systems because of implementation subtleties in va_list.
+ - Fix inconsequential test failures for hash and SFMT code.
+
+ New features:
+ - Support heap profiling on FreeBSD. This feature depends on the proc
+ filesystem being mounted during heap profile dumping.
+
+* 3.5.1 (February 25, 2014)
+
+ This version primarily addresses minor bugs in test code.
+
+ Bug fixes:
+ - Configure Solaris/Illumos to use MADV_FREE.
+ - Fix junk filling for mremap(2)-based huge reallocation. This is only
+ relevant if configuring with the --enable-mremap option specified.
+ - Avoid compilation failure if 'restrict' C99 keyword is not supported by the
+ compiler.
+ - Add a configure test for SSE2 rather than assuming it is usable on i686
+ systems. This fixes test compilation errors, especially on 32-bit Linux
+ systems.
+ - Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit
+ test.
+ - Fix/remove flawed alignment-related overflow tests.
+ - Prevent compiler optimizations that could change backtraces in the
+ prof_accum unit test.
+
+* 3.5.0 (January 22, 2014)
+
+ This version focuses on refactoring and automated testing, though it also
+ includes some non-trivial heap profiling optimizations not mentioned below.
+
+ New features:
+ - Add the *allocx() API, which is a successor to the experimental *allocm()
+ API. The *allocx() functions are slightly simpler to use because they have
+ fewer parameters, they directly return the results of primary interest, and
+ mallocx()/rallocx() avoid the strict aliasing pitfall that
+ allocm()/rallocm() share with posix_memalign(). Note that *allocm() is
+ slated for removal in the next non-bugfix release.
+ - Add support for LinuxThreads.
+
+ Bug fixes:
+ - Unless heap profiling is enabled, disable floating point code and don't link
+ with libm. This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64
+ systems, makes it possible to completely disable floating point register
+ use. Some versions of glibc neglect to save/restore caller-saved floating
+ point registers during dynamic lazy symbol loading, and the symbol loading
+ code uses whatever malloc the application happens to have linked/loaded
+ with, the result being potential floating point register corruption.
+ - Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
+ backtrace creation in imemalign(). This bug impacted posix_memalign() and
+ aligned_alloc().
+ - Fix a file descriptor leak in a prof_dump_maps() error path.
+ - Fix prof_dump() to close the dump file descriptor for all relevant error
+ paths.
+ - Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for
+ allocation, not just deallocation.
+ - Fix a data race for large allocation stats counters.
+ - Fix a potential infinite loop during thread exit. This bug occurred on
+ Solaris, and could affect other platforms with similar pthreads TSD
+ implementations.
+ - Don't junk-fill reallocations unless usable size changes. This fixes a
+ violation of the *allocx()/*allocm() semantics.
+ - Fix growing large reallocation to junk fill new space.
+ - Fix huge deallocation to junk fill when munmap is disabled.
+ - Change the default private namespace prefix from empty to je_, and change
+ --with-private-namespace-prefix so that it prepends an additional prefix
+ rather than replacing je_. This reduces the likelihood of applications
+ which statically link jemalloc experiencing symbol name collisions.
+ - Add missing private namespace mangling (relevant when
+ --with-private-namespace is specified).
+ - Add and use JEMALLOC_INLINE_C so that static inline functions are marked as
+ static even for debug builds.
+ - Add a missing mutex unlock in a malloc_init_hard() error path. In practice
+ this error path is never executed.
+ - Fix numerous bugs in malloc_strotumax() error handling/reporting. These
+ bugs had no impact except for malformed inputs.
+ - Fix numerous bugs in malloc_snprintf(). These bugs were not exercised by
+ existing calls, so they had no impact.
+
+* 3.4.1 (October 20, 2013)
+
+ Bug fixes:
+ - Fix a race in the "arenas.extend" mallctl that could cause memory corruption
+ of internal data structures and subsequent crashes.
+ - Fix Valgrind integration flaws that caused Valgrind warnings about reads of
+ uninitialized memory in:
+ + arena chunk headers
+ + internal zero-initialized data structures (relevant to tcache and prof
+ code)
+ - Preserve errno during the first allocation. A readlink(2) call during
+ initialization fails unless /etc/malloc.conf exists, so errno was typically
+ set during the first allocation prior to this fix.
+ - Fix compilation warnings reported by gcc 4.8.1.
+
+* 3.4.0 (June 2, 2013)
+
+ This version is essentially a small bugfix release, but the addition of
+ aarch64 support requires that the minor version be incremented.
+
+ Bug fixes:
+ - Fix race-triggered deadlocks in chunk_record(). These deadlocks were
+ typically triggered by multiple threads concurrently deallocating huge
+ objects.
+
+ New features:
+ - Add support for the aarch64 architecture.
+
+* 3.3.1 (March 6, 2013)
+
+ This version fixes bugs that are typically encountered only when utilizing
+ custom run-time options.
+
+ Bug fixes:
+ - Fix a locking order bug that could cause deadlock during fork if heap
+ profiling were enabled.
+ - Fix a chunk recycling bug that could cause the allocator to lose track of
+ whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause
+ corruption if allocating via sbrk(2) (unlikely unless running with the
+ "dss:primary" option specified). This was completely harmless on Linux
+ unless using mlockall(2) (and unlikely even then, unless the
+ --disable-munmap configure option or the "dss:primary" option was
+ specified). This regression was introduced in 3.1.0 by the
+ mlockall(2)/madvise(2) interaction fix.
+ - Fix TLS-related memory corruption that could occur during thread exit if the
+ thread never allocated memory. Only the quarantine and prof facilities were
+ susceptible.
+ - Fix two quarantine bugs:
+ + Internal reallocation of the quarantined object array leaked the old
+ array.
+ + Reallocation failure for internal reallocation of the quarantined object
+ array (very unlikely) resulted in memory corruption.
+ - Fix Valgrind integration to annotate all internally allocated memory in a
+ way that keeps Valgrind happy about internal data structure access.
+ - Fix building for s390 systems.
+
+* 3.3.0 (January 23, 2013)
+
+ This version includes a few minor performance improvements in addition to the
+ listed new features and bug fixes.
+
+ New features:
+ - Add clipping support to lg_chunk option processing.
+ - Add the --enable-ivsalloc option.
+ - Add the --without-export option.
+ - Add the --disable-zone-allocator option.
+
+ Bug fixes:
+ - Fix "arenas.extend" mallctl to output the number of arenas.
+ - Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
+ is undefined.
+ - Fix build break on FreeBSD related to alloca.h.
+
+* 3.2.0 (November 9, 2012)
+
+ In addition to a couple of bug fixes, this version modifies page run
+ allocation and dirty page purging algorithms in order to better control
+ page-level virtual memory fragmentation.
+
+ Incompatible changes:
+ - Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).
+
+ Bug fixes:
+ - Fix dss/mmap allocation precedence code to use recyclable mmap memory only
+ after primary dss allocation fails.
+ - Fix deadlock in the "arenas.purge" mallctl. This regression was introduced
+ in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.
+
+* 3.1.0 (October 16, 2012)
+
+ New features:
+ - Auto-detect whether running inside Valgrind, thus removing the need to
+ manually specify MALLOC_CONF=valgrind:true.
+ - Add the "arenas.extend" mallctl, which allows applications to create
+ manually managed arenas.
+ - Add the ALLOCM_ARENA() flag for {,r,d}allocm().
+ - Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls,
+ which provide control over dss/mmap precedence.
+ - Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
+ - Define LG_QUANTUM for hppa.
+
+ Incompatible changes:
+ - Disable tcache by default if running inside Valgrind, in order to avoid
+ making unallocated objects appear reachable to Valgrind.
+ - Drop const from malloc_usable_size() argument on Linux.
+
+ Bug fixes:
+ - Fix heap profiling crash if sampled object is freed via realloc(p, 0).
+ - Remove const from __*_hook variable declarations, so that glibc can modify
+ them during process forking.
+ - Fix mlockall(2)/madvise(2) interaction.
+ - Fix fork(2)-related deadlocks.
+ - Fix error return value for "thread.tcache.enabled" mallctl.
+
+* 3.0.0 (May 11, 2012)
+
+ Although this version adds some major new features, the primary focus is on
+ internal code cleanup that facilitates maintainability and portability, most
+ of which is not reflected in the ChangeLog. This is the first release to
+ incorporate substantial contributions from numerous other developers, and the
+ result is a more broadly useful allocator (see the git revision history for
+ contribution details). Note that the license has been unified, thanks to
+ Facebook granting a license under the same terms as the other copyright
+ holders (see COPYING).
+
+ New features:
+ - Implement Valgrind support, redzones, and quarantine.
+ - Add support for additional platforms:
+ + FreeBSD
+ + Mac OS X Lion
+ + MinGW
+ + Windows (no support yet for replacing the system malloc)
+ - Add support for additional architectures:
+ + MIPS
+ + SH4
+ + Tilera
+ - Add support for cross compiling.
+ - Add nallocm(), which rounds a request size up to the nearest size class
+ without actually allocating.
+ - Implement aligned_alloc() (blame C11).
+ - Add the "thread.tcache.enabled" mallctl.
+ - Add the "opt.prof_final" mallctl.
+ - Update pprof (from gperftools 2.0).
+ - Add the --with-mangling option.
+ - Add the --disable-experimental option.
+ - Add the --disable-munmap option, and make it the default on Linux.
+ - Add the --enable-mremap option, which disables use of mremap(2) by default.
+
+ Incompatible changes:
+ - Enable stats by default.
+ - Enable fill by default.
+ - Disable lazy locking by default.
+ - Rename the "tcache.flush" mallctl to "thread.tcache.flush".
+ - Rename the "arenas.pagesize" mallctl to "arenas.page".
+ - Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
+ - Change the "opt.prof_accum" default from true to false.
+
+ Removed features:
+ - Remove the swap feature, including the "config.swap", "swap.avail",
+ "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
+ - Remove highruns statistics, including the
+ "stats.arenas.<i>.bins.<j>.highruns" and
+ "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
+ - As part of small size class refactoring, remove the "opt.lg_[qc]space_max",
+ "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and
+ "arenas.[tqcs]bins" mallctls.
+ - Remove the "arenas.chunksize" mallctl.
+ - Remove the "opt.lg_prof_tcmax" option.
+ - Remove the "opt.lg_prof_bt_max" option.
+ - Remove the "opt.lg_tcache_gc_sweep" option.
+ - Remove the --disable-tiny option, including the "config.tiny" mallctl.
+ - Remove the --enable-dynamic-page-shift configure option.
+ - Remove the --enable-sysv configure option.
+
+ Bug fixes:
+ - Fix a statistics-related bug in the "thread.arena" mallctl that could cause
+ invalid statistics and crashes.
+ - Work around TLS deallocation via free() on Linux. This bug could cause
+ write-after-free memory corruption.
+ - Fix a potential deadlock that could occur during interval- and
+ growth-triggered heap profile dumps.
+ - Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
+ - Fix chunk_alloc_dss() to stop claiming memory is zeroed. This bug could
+ cause memory corruption and crashes with --enable-dss specified.
+ - Fix fork-related bugs that could cause deadlock in children between fork
+ and exec.
+ - Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
+ - Fix realloc(p, 0) to act like free(p).
+ - Do not enforce minimum alignment in memalign().
+ - Check for NULL pointer in malloc_usable_size().
+ - Fix an off-by-one heap profile statistics bug that could be observed in
+ interval- and growth-triggered heap profiles.
+ - Fix the "epoch" mallctl to update cached stats even if the passed in epoch
+ is 0.
+ - Fix bin->runcur management to fix a layout policy bug. This bug did not
+ affect correctness.
+ - Fix a bug in choose_arena_hard() that potentially caused more arenas to be
+ initialized than necessary.
+ - Add missing "opt.lg_tcache_max" mallctl implementation.
+ - Use glibc allocator hooks to make mixed allocator usage less likely.
+ - Fix build issues for --disable-tcache.
+ - Don't mangle pthread_create() when --with-private-namespace is specified.
+
+* 2.2.5 (November 14, 2011)
+
+ Bug fixes:
+ - Fix huge_ralloc() race when using mremap(2). This is a serious bug that
+ could cause memory corruption and/or crashes.
+ - Fix huge_ralloc() to maintain chunk statistics.
+ - Fix malloc_stats_print(..., "a") output.
+
+* 2.2.4 (November 5, 2011)
+
+ Bug fixes:
+ - Initialize arenas_tsd before using it. This bug existed for 2.2.[0-3], as
+ well as for --disable-tls builds in earlier releases.
+ - Do not assume a 4 KiB page size in test/rallocm.c.
+
+* 2.2.3 (August 31, 2011)
+
+ This version fixes numerous bugs related to heap profiling.
+
+ Bug fixes:
+ - Fix a prof-related race condition. This bug could cause memory corruption,
+ but only occurred in non-default configurations (prof_accum:false).
+ - Fix off-by-one backtracing issues (make sure that prof_alloc_prep() is
+ excluded from backtraces).
+ - Fix a prof-related bug in realloc() (only triggered by OOM errors).
+ - Fix prof-related bugs in allocm() and rallocm().
+ - Fix prof_tdata_cleanup() for --disable-tls builds.
+ - Fix a relative include path, to fix objdir builds.
+
+* 2.2.2 (July 30, 2011)
+
+ Bug fixes:
+ - Fix a build error for --disable-tcache.
+ - Fix assertions in arena_purge() (for real this time).
+ - Add the --with-private-namespace option. This is a workaround for symbol
+ conflicts that can inadvertently arise when using static libraries.
+
+* 2.2.1 (March 30, 2011)
+
+ Bug fixes:
+ - Implement atomic operations for x86/x64. This fixes compilation failures
+ for versions of gcc that are still in wide use.
+ - Fix an assertion in arena_purge().
+
+* 2.2.0 (March 22, 2011)
+
+ This version incorporates several improvements to algorithms and data
+ structures that tend to reduce fragmentation and increase speed.
+
+ New features:
+ - Add the "stats.cactive" mallctl.
+ - Update pprof (from google-perftools 1.7).
+ - Improve backtracing-related configuration logic, and add the
+ --disable-prof-libgcc option.
+
+ Bug fixes:
+ - Change default symbol visibility from "internal", to "hidden", which
+ decreases the overhead of library-internal function calls.
+ - Fix symbol visibility so that it is also set on OS X.
+ - Fix a build dependency regression caused by the introduction of the .pic.o
+ suffix for PIC object files.
+ - Add missing checks for mutex initialization failures.
+ - Don't use libgcc-based backtracing except on x64, where it is known to work.
+ - Fix deadlocks on OS X that were due to memory allocation in
+ pthread_mutex_lock().
+ - Heap profiling-specific fixes:
+ + Fix memory corruption due to integer overflow in small region index
+ computation, when using a small enough sample interval that profiling
+ context pointers are stored in small run headers.
+ + Fix a bootstrap ordering bug that only occurred with TLS disabled.
+ + Fix a rallocm() rsize bug.
+ + Fix error detection bugs for aligned memory allocation.
+
+* 2.1.3 (March 14, 2011)
+
+ Bug fixes:
+ - Fix a cpp logic regression (due to the "thread.{de,}allocatedp" mallctl fix
+ for OS X in 2.1.2).
+ - Fix a "thread.arena" mallctl bug.
+ - Fix a thread cache stats merging bug.
+
+* 2.1.2 (March 2, 2011)
+
+ Bug fixes:
+ - Fix "thread.{de,}allocatedp" mallctl for OS X.
+ - Add missing jemalloc.a to build system.
+
+* 2.1.1 (January 31, 2011)
+
+ Bug fixes:
+ - Fix aligned huge reallocation (affected allocm()).
+ - Fix the ALLOCM_LG_ALIGN macro definition.
+ - Fix a heap dumping deadlock.
+ - Fix a "thread.arena" mallctl bug.
+
+* 2.1.0 (December 3, 2010)
+
+ This version incorporates some optimizations that can't quite be considered
+ bug fixes.
+
+ New features:
+ - Use Linux's mremap(2) for huge object reallocation when possible.
+ - Avoid locking in mallctl*() when possible.
+ - Add the "thread.[de]allocatedp" mallctl's.
+ - Convert the manual page source from roff to DocBook, and generate both roff
+ and HTML manuals.
+
+ Bug fixes:
+ - Fix a crash due to incorrect bootstrap ordering. This only impacted
+ --enable-debug --enable-dss configurations.
+ - Fix a minor statistics bug for mallctl("swap.avail", ...).
+
+* 2.0.1 (October 29, 2010)
+
+ Bug fixes:
+ - Fix a race condition in heap profiling that could cause undefined behavior
+ if "opt.prof_accum" were disabled.
+ - Add missing mutex unlocks for some OOM error paths in the heap profiling
+ code.
+ - Fix a compilation error for non-C99 builds.
+
+* 2.0.0 (October 24, 2010)
+
+ This version focuses on the experimental *allocm() API, and on improved
+ run-time configuration/introspection. Nonetheless, numerous performance
+ improvements are also included.
+
+ New features:
+ - Implement the experimental {,r,s,d}allocm() API, which provides a superset
+ of the functionality available via malloc(), calloc(), posix_memalign(),
+ realloc(), malloc_usable_size(), and free(). These functions can be used to
+ allocate/reallocate aligned zeroed memory, ask for optional extra memory
+ during reallocation, prevent object movement during reallocation, etc.
+ - Replace JEMALLOC_OPTIONS/JEMALLOC_PROF_PREFIX with MALLOC_CONF, which is
+ more human-readable, and more flexible. For example:
+ JEMALLOC_OPTIONS=AJP
+ is now:
+ MALLOC_CONF=abort:true,fill:true,stats_print:true
+ - Port to Apple OS X. Sponsored by Mozilla.
+ - Make it possible for the application to control thread-->arena mappings via
+ the "thread.arena" mallctl.
+ - Add compile-time support for all TLS-related functionality via pthreads TSD.
+ This is mainly of interest for OS X, which does not support TLS, but has a
+ TSD implementation with similar performance.
+ - Override memalign() and valloc() if they are provided by the system.
+ - Add the "arenas.purge" mallctl, which can be used to synchronously purge all
+ dirty unused pages.
+ - Make cumulative heap profiling data optional, so that it is possible to
+ limit the amount of memory consumed by heap profiling data structures.
+ - Add per thread allocation counters that can be accessed via the
+ "thread.allocated" and "thread.deallocated" mallctls.
+
+ Incompatible changes:
+ - Remove JEMALLOC_OPTIONS and malloc_options (see MALLOC_CONF above).
+ - Increase default backtrace depth from 4 to 128 for heap profiling.
+ - Disable interval-based profile dumps by default.
+
+ Bug fixes:
+ - Remove bad assertions in fork handler functions. These assertions could
+ cause aborts for some combinations of configure settings.
+ - Fix strerror_r() usage to deal with non-standard semantics in GNU libc.
+ - Fix leak context reporting. This bug tended to cause the number of contexts
+ to be underreported (though the reported number of objects and bytes were
+ correct).
+ - Fix a realloc() bug for large in-place growing reallocation. This bug could
+ cause memory corruption, but it was hard to trigger.
+ - Fix an allocation bug for small allocations that could be triggered if
+ multiple threads raced to create a new run of backing pages.
+ - Enhance the heap profiler to trigger samples based on usable size, rather
+ than request size.
+ - Fix a heap profiling bug due to sometimes losing track of requested object
+ size for sampled objects.
+
+* 1.0.3 (August 12, 2010)
+
+ Bug fixes:
+ - Fix the libunwind-based implementation of stack backtracing (used for heap
+ profiling). This bug could cause zero-length backtraces to be reported.
+ - Add a missing mutex unlock in library initialization code. If multiple
+ threads raced to initialize malloc, some of them could end up permanently
+ blocked.
+
+* 1.0.2 (May 11, 2010)
+
+ Bug fixes:
+ - Fix junk filling of large objects, which could cause memory corruption.
+ - Add MAP_NORESERVE support for chunk mapping, because otherwise virtual
+ memory limits could cause swap file configuration to fail. Contributed by
+ Jordan DeLong.
+
+* 1.0.1 (April 14, 2010)
+
+ Bug fixes:
+ - Fix compilation when --enable-fill is specified.
+ - Fix threads-related profiling bugs that affected accuracy and caused memory
+ to be leaked during thread exit.
+ - Fix dirty page purging race conditions that could cause crashes.
+ - Fix crash in tcache flushing code during thread destruction.
+
+* 1.0.0 (April 11, 2010)
+
+ This release focuses on speed and run-time introspection. Numerous
+ algorithmic improvements make this release substantially faster than its
+ predecessors.
+
+ New features:
+ - Implement autoconf-based configuration system.
+ - Add mallctl*(), for the purposes of introspection and run-time
+ configuration.
+ - Make it possible for the application to manually flush a thread's cache, via
+ the "tcache.flush" mallctl.
+ - Base maximum dirty page count on proportion of active memory.
+ - Compute various additional run-time statistics, including per size class
+ statistics for large objects.
+ - Expose malloc_stats_print(), which can be called repeatedly by the
+ application.
+ - Simplify the malloc_message() signature to only take one string argument,
+ and incorporate an opaque data pointer argument for use by the application
+ in combination with malloc_stats_print().
+ - Add support for allocation backed by one or more swap files, and allow the
+ application to disable over-commit if swap files are in use.
+ - Implement allocation profiling and leak checking.
+
+ Removed features:
+ - Remove the dynamic arena rebalancing code, since thread-specific caching
+ reduces its utility.
+
+ Bug fixes:
+ - Modify chunk allocation to work when address space layout randomization
+ (ASLR) is in use.
+ - Fix thread cleanup bugs related to TLS destruction.
+ - Handle 0-size allocation requests in posix_memalign().
+ - Fix a chunk leak. The leaked chunks were never touched, so this impacted
+ virtual memory usage, but not physical memory usage.
+
+* linux_2008082[78]a (August 27/28, 2008)
+
+ These snapshot releases are the simple result of incorporating Linux-specific
+ support into the FreeBSD malloc sources.
+
+--------------------------------------------------------------------------------
+vim:filetype=text:textwidth=80