path: root/src/transient_detector.cpp
Commit message | Author | Age | Files | Lines
* Remove dead transient hooks, hack override, and unused RMS helper | Daniil Cherednik | 2026-04-09 | 1 | -73/+0
* atrac3: add boundary transient thresholding to prune low-value gain transitions | Daniil Cherednik | 2026-04-08 | 1 | -4/+54

    Problem
    Gain curve generation emitted many +/-1 level transitions that do not
    correspond to strong local transients. These points consume gain-info bits
    and can create low-level modulation artifacts without improving transient
    handling.

    Solution
    Introduce explicit transient evidence gating at transition boundaries in
    CalcCurve(), and wire it to the existing dynamic min-score path.

    What changed
    - Added BoundaryTransientScore(env, loc, win):
      - computes local ratio around each subframe boundary
      - R = max(max_right/max_left, max_left/max_right)
      - short symmetric window (win=3 subframes)
    - Re-enabled minScore usage in CalcCurve() (previously ignored).
    - For each level transition candidate at loc=sf+1:
      - keep unconditionally if loc==targetSf (tail neutral anchor)
      - keep unconditionally if |deltaLevel| >= 2 (strong step)
      - otherwise keep only if BoundaryTransientScore(loc) >= minScore
    - Added YAML telemetry:
      - transient_min_score
      - transient_window
      - transition_pruned {loc, delta, score}

    Why this is safe
    - Strong transitions are preserved.
    - Rightmost transition is preserved to keep proper return-to-neutral anchoring.
    - Only low-confidence small toggles are removed.

    Measured impact (current branch comparison)
    Baseline: ea4d33b38 (before this change)
    Tracks: show_me_your_spine.wav, 13.wav
    Gain-info bits / points:
    - spine:  191,697 -> 150,297 bits (delta -41,400; -21.6%)
              15,593 -> 10,993 points (delta -4,600)
    - 13.wav: 1,299,035 -> 979,931 bits (delta -319,104; -24.6%)
              97,035 -> 61,579 points (delta -35,456)

    Subjective note
    User listening reports improved sound and fixes for some low-level artifacts.
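The boundary score and the keep/prune rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the committed code: only the name BoundaryTransientScore, its three parameters, the ratio formula, and the three keep conditions come from the message. KeepTransition, the epsilon floor, and the behaviour at the frame edges are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: names follow the commit message, bodies are assumptions.
// env - per-subframe envelope values (non-negative)
// loc - boundary index; the boundary lies between env[loc - 1] and env[loc]
// win - number of subframes inspected on each side of the boundary
float BoundaryTransientScore(const std::vector<float>& env, std::size_t loc, std::size_t win) {
    assert(win > 0);
    if (loc == 0 || loc >= env.size()) {
        return 1.0f; // assumed: no two-sided neighbourhood means no evidence
    }
    const std::size_t leftBegin = (loc > win) ? loc - win : 0;
    const std::size_t rightEnd = std::min(env.size(), loc + win);
    float maxLeft = 0.0f;
    for (std::size_t i = leftBegin; i < loc; ++i) maxLeft = std::max(maxLeft, env[i]);
    float maxRight = 0.0f;
    for (std::size_t i = loc; i < rightEnd; ++i) maxRight = std::max(maxRight, env[i]);
    const float eps = 1e-9f; // assumed floor that avoids division by zero
    maxLeft = std::max(maxLeft, eps);
    maxRight = std::max(maxRight, eps);
    // R = max(max_right/max_left, max_left/max_right), as in the message
    return std::max(maxRight / maxLeft, maxLeft / maxRight);
}

// Hypothetical helper applying the three keep conditions from the message.
bool KeepTransition(const std::vector<float>& env, std::size_t loc, std::size_t targetSf,
                    int deltaLevel, float minScore, std::size_t win = 3) {
    if (loc == targetSf) return true;           // tail neutral anchor
    if (std::abs(deltaLevel) >= 2) return true; // strong step
    return BoundaryTransientScore(env, loc, win) >= minScore;
}
```

With an envelope that steps from 1 to 4 at the boundary, the score is 4 and a +/-1 transition survives a minScore of 2; with a step from 1 to 1.25 the same transition would be pruned unless it is the tail anchor or a step of two or more levels.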
* atrac3: make sticky gain quantization conditional and tune thresholds | Daniil Cherednik | 2026-04-08 | 1 | -5/+34

    Problem
    The distribution-aware sticky quantizer reduced gain-curve bitrate, but in
    some release/transient frames it over-merged nearby transitions. On spine
    around 17.657s (ch1/band2), this collapsed the curve shape and could produce
    an audible spike.

    What changed
    - Added frame-level sticky eligibility gating in CalcCurve().
    - Sticky is now enabled only when both conditions hold:
      - intra-frame ratio is limited:
        max_gain / target <= kStickyMaxIntraFrameRatio
      - inter-frame target jump is limited:
        prev_target / target (symmetric) <= kStickyMaxInterFrameRatio
    - Added local uncertainty guard for sticky hold:
      - require idx span from [subframeLow, subframeHigh] quantization to be
        narrow (idxSpan <= 1) before allowing prev-level hold.
    - Added YAML diagnostics per band/frame to make gating decisions auditable:
      - sticky_frame_eligible
      - sticky_intra_ratio
      - sticky_inter_ratio

    Threshold tuning
    Swept candidate pairs on both tracks:
    - show_me_your_spine.wav
    - 13.wav
    Pairs tested: (5,6), (5,8), (6,8), (6,10), (7,8), (7,10), (8,12)
    Selected:
    - kStickyMaxIntraFrameRatio = 7.0
    - kStickyMaxInterFrameRatio = 10.0

    Reason for selection
    - Keeps safety behavior on known failure site: frame 760, ch1, band2 remains
      sticky_frame_eligible=false and retains non-collapsed curve shape
      (loc 1,2,5,7).
    - Improves gain-modulation bitrate vs previous 6/8 tuning while avoiding
      fully open behavior.

    Measured gain-modulation bits (spine + 13.wav)
    - 6/8:  1,493,639 bits
    - 7/10: 1,490,732 bits (selected, -2,907 bits vs 6/8)
    - 8/12: 1,488,824 bits (lowest in sweep; not selected to keep extra margin)
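The frame-level gate above might look roughly like this. The two constants and their selected values are taken from the message; the function name StickyFrameEligible, the reading of "symmetric" as the larger of the ratio and its inverse, and the handling of non-positive targets are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Values selected in the commit message.
constexpr float kStickyMaxIntraFrameRatio = 7.0f;
constexpr float kStickyMaxInterFrameRatio = 10.0f;

// Hypothetical helper: true when sticky quantization may be used in this frame.
bool StickyFrameEligible(float maxGain, float target, float prevTarget) {
    assert(maxGain >= 0.0f);
    if (target <= 0.0f || prevTarget <= 0.0f) {
        return false; // assumed: without a usable reference, stay conservative
    }
    const float intra = maxGain / target;
    // Assumed meaning of "symmetric": a jump up or down is treated alike.
    const float inter = std::max(prevTarget / target, target / prevTarget);
    return intra <= kStickyMaxIntraFrameRatio && inter <= kStickyMaxInterFrameRatio;
}
```

Under these assumptions a frame whose peak is 8 times the target, or whose target differs from the previous one by a factor of 12 in either direction, falls back to plain quantization.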
* atrac3: add distribution-aware sticky gain quantization | Daniil Cherednik | 2026-04-08 | 1 | -6/+67

    Problem
    Gain curve construction still produced many +/-1 level toggles across long
    runs (e.g. 7<->8 chatter). These transitions are usually quantization noise
    from subframe-level RMS rounding, not real envelope changes, and they
    consume gain bit budget without improving transient protection.

    Feature
    Introduce distribution-aware sticky quantization for subframe gain levels.
    Instead of quantizing only the subframe centre estimate, we also track a
    robust within-subframe range and suppress one-step toggles when the previous
    level is still consistent with that range.

    Implementation
    1) AnalyzeGain now optionally returns per-subframe low/high energy estimates
       (robust inter-quantile bounds from micro-chunk analysis inside each
       subframe).
    2) CalcCurve now accepts optional subframe low/high vectors.
    3) During sfLevel quantization:
       - compute centre level via RelationToIdx(filtered/target)
       - if new level differs from previous by exactly 1, and previous level is
         still inside [idx(low), idx(high)], keep previous level (sticky hold)
    4) CreateSubbandInfo wires the new AnalyzeGain outputs into CalcCurve.
    5) Existing point0 guard/boundary logic remains intact; this feature
       operates earlier at sfLevel formation.

    Why this is safe
    - Only suppresses +/-1 oscillation when previous level is still supported by
      observed subframe distribution.
    - Does not clamp large transitions or remove structurally important points.
    - Keeps curve scan/priority flow unchanged after sfLevel is formed.

    Measured impact on current HEAD (gain-info bits)
    Bit accounting uses ATRAC3 gain syntax: per channel header + per band
    point-count fields + 9 bits per gain point.

    show_me_your_spine.wav:
    - base:        219,552 bits (18,688 points)
    - with sticky: 172,158 bits (13,422 points)
    - saved:       47,394 bits, 5,266 points (-21.59% gain-info bits)

    13.wav:
    - base:        1,537,724 bits (123,556 points)
    - with sticky: 1,146,746 bits (80,114 points)
    - saved:       390,978 bits, 43,442 points (-25.43% gain-info bits)
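Step 3 of the implementation list can be illustrated on already-quantized indices. This sketch deliberately avoids guessing RelationToIdx: it assumes the centre, low, and high estimates have each been mapped to a level index beforehand. StickyQuantize is a hypothetical name, and because a larger amplitude is assumed to map to a smaller index, the range bounds are ordered explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch of the sticky hold on per-subframe level indices.
// centre  - index of the centre estimate for each subframe
// lowIdx  - index of the low energy estimate for each subframe
// highIdx - index of the high energy estimate for each subframe
std::vector<int> StickyQuantize(const std::vector<int>& centre,
                                const std::vector<int>& lowIdx,
                                const std::vector<int>& highIdx) {
    assert(centre.size() == lowIdx.size() && centre.size() == highIdx.size());
    std::vector<int> out(centre.size());
    for (std::size_t sf = 0; sf < centre.size(); ++sf) {
        int level = centre[sf];
        if (sf > 0) {
            const int prev = out[sf - 1];
            // Order the bounds: which of idx(low) and idx(high) is numerically
            // smaller depends on the index convention.
            const int lo = std::min(lowIdx[sf], highIdx[sf]);
            const int hi = std::max(lowIdx[sf], highIdx[sf]);
            const bool oneStep = std::abs(level - prev) == 1;
            const bool prevSupported = prev >= lo && prev <= hi;
            if (oneStep && prevSupported) {
                level = prev; // sticky hold: the toggle is within the observed range
            }
        }
        out[sf] = level;
    }
    return out;
}
```

A 7, 8, 7, 8, 7, 5 sequence whose subframe ranges all cover both 7 and 8 collapses to 7, 7, 7, 7, 7, 5, while the two-level drop at the end passes through unchanged. As a cross-check of the stated accounting, 5,266 points at 9 bits each is 47,394 bits and 43,442 points is 390,978 bits, matching the saved-bits figures.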
* atrac3: allow to configure median filter during gain curve calculation | Daniil Cherednik | 2026-04-08 | 1 | -26/+22
* atrac3: Prefer largest locations of gain curve points. | Daniil Cherednik | 2026-04-08 | 1 | -2/+6
* atrac3: use in.back() as staircase target instead of nextLevel | Daniil Cherednik | 2026-04-08 | 1 | -4/+6

    For non-plateau frames, nextLevel (first lookahead subframe of the next
    frame) can be 6× higher than in.back() on release frames. Using it as the
    staircase target caused tail subframes to appear below target → spurious
    amplifying points (e.g. {level:7, loc:31}) on release tails, and
    underestimated ATT on the peak (33× ratio reduced to 5× because the wrong
    target inflated the denominator).

    Fix: always use in.back() (actual last subframe of the analysis window) as
    the staircase target. That is where the signal truly returns to within this
    frame.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
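The size of the underestimate can be checked from the two figures quoted in this message. The reading below is an assumption about the frame being described: the 33× ratio is taken as peak over in.back(), and nextLevel as about 6 times in.back().

```latex
\frac{\text{peak}}{\text{nextLevel}}
  \approx \frac{\text{peak}}{6 \cdot \text{in.back()}}
  = \frac{33}{6}
  \approx 5.5
```

That is roughly in line with the 5× figure given for the reduced attenuation.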
* atrac3: replace CalcCurve with staircase level-scan algorithm | Daniil Cherednik | 2026-04-08 | 1 | -52/+85

    Replaces the monotone-triplet transient detector with a level-based
    staircase scan that builds the gain curve from the target subframe leftward.
    The new algorithm correctly handles rising transients by attenuating the
    loud peak region rather than the quiet onset.

    Key changes:
    - 3-point median filter on gain[] suppresses isolated spikes
    - Per-sf level = RelationToIdx(filtered[sf] / target)
    - Scan leftward from first-neutral-sf, emit one point per level change
    - Priority trim: keep up to 6 points with largest |ΔLevel| first

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
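Two of the key changes, the 3-point median filter and the priority trim, are easy to show in isolation. The sketch below is an illustration under assumptions: TCurvePoint, Median3, and PriorityTrim are hypothetical names, the end samples are passed through unfiltered, and ties in |ΔLevel| are broken in favour of earlier points, none of which the message specifies.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical point type: Delta is the level change emitted at Loc.
struct TCurvePoint {
    int Level;
    std::size_t Loc;
    int Delta;
};

// 3-point median filter; the first and last samples are copied unchanged
// (an assumption, the commit does not describe edge handling).
std::vector<float> Median3(const std::vector<float>& in) {
    std::vector<float> out(in);
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        const float a = in[i - 1], b = in[i], c = in[i + 1];
        out[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));
    }
    return out;
}

// Keep at most `budget` points, preferring the largest |Delta|, then restore
// location order so the curve can be emitted left to right.
std::vector<TCurvePoint> PriorityTrim(std::vector<TCurvePoint> pts, std::size_t budget = 6) {
    if (pts.size() > budget) {
        std::stable_sort(pts.begin(), pts.end(),
                         [](const TCurvePoint& x, const TCurvePoint& y) {
                             return std::abs(x.Delta) > std::abs(y.Delta);
                         });
        pts.erase(pts.begin() + static_cast<std::ptrdiff_t>(budget), pts.end());
    }
    std::sort(pts.begin(), pts.end(),
              [](const TCurvePoint& x, const TCurvePoint& y) { return x.Loc < y.Loc; });
    return pts;
}
```

An isolated spike such as 1, 9, 1, 1 is flattened to all ones by the filter, and an eight-point candidate list is cut to six by dropping the two smallest steps.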
* atrac3: fix gc_scale at loc=0 using prev_target/target ratio | Daniil Cherednik | 2026-04-08 | 1 | -0/+19

    When the first detected transient is at location 0, the CalcCurve loop
    computed level = RelationToIdx(in[0]/target). But in[0] is the ramp START;
    for loc=0 there is no pre-ramp region, so in[0] is not the right amplitude
    reference for gc_scale, which divides ALL of bufCur (the previous frame's
    MDCT window).

    The external point0 block derives its formula as:
      hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] ≈ target
      point0Level = RelationToIdx(prevTarget / hpfRmsNextMod) ≈ prevTarget/target

    For loc=0, hpfRmsNextModValid is false and the block cannot fire. Override
    curve[0].Level inside CalcCurve with RelationToIdx(savedPrevTarget/target),
    which is consistent with the loc>0 formula and correctly bridges cross-frame
    amplitude.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* atrac3: point0 in HPF domain using prevTarget/hpfRmsNextMod; enable band 2 | Daniil Cherednik | 2026-04-08 | 1 | -0/+1

    Point0 calculation switched from raw-PCM RMS ratio to HPF-filtered domain:
    - prevTarget (stored as ctx.LastTarget from previous CalcCurve call)
      replaces rmsCur
    - hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] replaces
      rmsNextMod (only the pre-ramp constant-level zone, same domain as gain[])
    - TCurveBuilderCtx gains LastTarget field; CalcCurve stores target before
      returning

    Also extends gain control to band 2 (~11–16 kHz) by changing the skip
    threshold from band >= 2 to band >= 3. Perceptually sounds better;
    regression metrics worsen due to broadband measurement not capturing
    per-band HF improvement.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* atrac3: log plateau result and target source inside CalcCurve | Daniil Cherednik | 2026-04-08 | 1 | -1/+12

    Pass yamlLog into CalcCurve so plateau_level, plateau_max_raw,
    plateau_release, and target/source are emitted directly from the function
    that computes them, instead of via stale TCurveBuilderCtx fields. Remove
    LastTarget and LastTargetFromPlateau from the context struct entirely.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* atrac3: plateau-based target selection in CalcCurve | Daniil Cherednik | 2026-04-08 | 1 | -5/+104

    Introduces FindPlateau() which finds the maximum sustained amplitude level
    where at least 3 consecutive subframes exceed it (sliding-window minimum
    approach). The plateau is used as the CalcCurve target instead of nextLevel
    when the frame contains a genuine sustained peak that does not end in a
    release.

    Release detection uses two conditions:
    - Hard tail: last subframe < 10% of plateau (clear ring-down)
    - Soft tail: last subframe < 50% of plateau with no post-plateau recovery
      above 70%

    A MaxRaw guard (plateau >= 40% of frame peak) prevents the quiet noise floor
    from being mistaken for a plateau when the frame contains a much louder
    transient spike.

    When plateau is used as target, pre-plateau quiet regions produce AMP curves
    normalizing toward the sustained peak, reducing the extreme ATT levels that
    were causing post-echo artifacts.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
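The plateau search and the three checks described in this message can be sketched as below. The thresholds (3 subframes, 10%, 50%, 70%, 40%) are from the message; everything else is an assumption for illustration, including the names TPlateauSketch and FindPlateauSketch, the choice of the first best window on ties, and the reading of "post-plateau recovery" as any subframe after the best window that exceeds 70% of the plateau.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for the sketch.
struct TPlateauSketch {
    float Level = 0.0f;  // largest window minimum found
    std::size_t End = 0; // index one past the window that produced Level
    bool Usable = false; // passes the MaxRaw guard and is not a release
};

TPlateauSketch FindPlateauSketch(const std::vector<float>& in, std::size_t run = 3) {
    assert(run > 0);
    TPlateauSketch res;
    if (in.size() < run) return res;
    // Sliding-window minimum: the plateau is the largest value that `run`
    // consecutive subframes all reach.
    for (std::size_t i = 0; i + run <= in.size(); ++i) {
        float winMin = in[i];
        for (std::size_t j = i + 1; j < i + run; ++j) winMin = std::min(winMin, in[j]);
        if (winMin > res.Level) {
            res.Level = winMin;
            res.End = i + run;
        }
    }
    if (res.Level <= 0.0f) return res;
    const float peak = *std::max_element(in.begin(), in.end());
    const float last = in.back();
    bool recovered = false; // assumed reading of "post-plateau recovery"
    for (std::size_t i = res.End; i < in.size(); ++i) {
        if (in[i] > 0.7f * res.Level) recovered = true;
    }
    const bool maxRawOk = res.Level >= 0.4f * peak;
    const bool hardTail = last < 0.1f * res.Level;
    const bool softTail = last < 0.5f * res.Level && !recovered;
    res.Usable = maxRawOk && !hardTail && !softTail;
    return res;
}
```

Under these assumptions a frame that rises to a sustained level and stays there yields a usable plateau, a frame that rings down to a few percent of the plateau is classified as a release, and a single loud spike over a quiet floor is rejected by the MaxRaw guard.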
* atrac3: ratio-scored transients, per-band gain boost, pre-echo reduction | Daniil Cherednik | 2026-03-15 | 1 | -98/+90

    Rewrites the ATRAC3 gain control pipeline to eliminate noise flashes and
    substantially reduce pre-echo artifacts:

    Transient detection & curve building (transient_detector.cpp/h):
    - Replace legacy heuristic detector with ratio-scored DetectTransients:
      score = peak/floor for rising (c/a) or falling (a/c) triplets, scored and
      ranked, top-N kept, sorted by location
    - Add explicit point0 derived from windowed-RMS match between bufCur and the
      curve-modulated bufNext (CalcWindowedRmsAfterCurve)
    - Replace RegionMax with RegionRMS for smoother region amplitude estimate
    - Add per-band detection thresholds kMinScorePerBand[4] = {1.9,1.9,2.1,2.2}
    - Dynamic minScore: scale threshold by min(1.5, max(1.0, overlapRatio)) to
      suppress false-positive curves when previous frame dominates
    - Scale constraint: curve[0].Level >= 3 to cap cross-frame amplification
      at 2x

    Bit allocation (atrac3_bitstream.cpp/h):
    - Add GainBoostPerBand[NumQMF] to TSingleChannelElement, computed in
      CreateSubbandInfo and applied in CalcBitsAllocation
    - levelBoost: compensate for Demodulate's GainLevel[minLevel] attenuation
    - scaleBoost: compensate for next-frame cross-frame scale via lookahead
    - Both capped (kLevelBoostCap=1, kScaleBoostCap=2) to avoid bit starvation

    Upsampler (atrac3denc.cpp):
    - Raise cutoff from 600 Hz to 800 Hz for tighter band separation

    Tests (gain_processor_ut.cpp):
    - Relax fixed curve shape assertions to ExpectCurveReasonable (bounds checks)
    - Relax quantization error bound for dense-event spacing (<=128 samples)

    Results (branch new_psy vs original baseline):
      riddler: pre-echo worse 43->15/479 frames, 0 flashes
      spine:   pre-echo worse 192->107/1804 frames, 0 flashes

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
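The triplet score and the dynamic threshold from the detection bullets can be written down directly. The per-band constants and the min/max scaling are quoted from the message; TripletScore and DynamicMinScore are hypothetical names, and treating equal neighbours as monotone, returning 0 for non-monotone triplets, and the epsilon floor are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Per-band thresholds quoted in the commit message.
constexpr float kMinScorePerBand[4] = {1.9f, 1.9f, 2.1f, 2.2f};

// Hypothetical scoring of one triplet (a, b, c) of consecutive subframe values:
// rising triplets score c/a, falling triplets score a/c.
float TripletScore(float a, float b, float c) {
    const float eps = 1e-9f; // assumed floor that avoids division by zero
    if (a <= b && b <= c) return c / std::max(a, eps); // rising: peak c over floor a
    if (a >= b && b >= c) return a / std::max(c, eps); // falling: peak a over floor c
    return 0.0f; // assumed: non-monotone triplets never qualify
}

// Threshold scaled by min(1.5, max(1.0, overlapRatio)), as in the message.
float DynamicMinScore(int band, float overlapRatio) {
    assert(band >= 0 && band < 4);
    const float scale = std::min(1.5f, std::max(1.0f, overlapRatio));
    return kMinScorePerBand[band] * scale;
}
```

The scaling only ever raises the bar: an overlapRatio below 1 leaves the band threshold unchanged, and anything above 1.5 is clamped, so band 0 ranges from 1.9 to 2.85.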
* Integrate TSpectralUpsampler into ATRAC3 gain control and fix CalcCurve ctx tracking | Daniil Cherednik | 2026-03-08 | 1 | -2/+15

    Encoder (atrac3denc.cpp / atrac3denc.h):
    - Replace old TTransientParam / TransientParamsHistory with
      TSpectralUpsampler-based CreateSubbandInfo(): analyses the upsampled QMF
      band, computes gain[] and nextLevel from the contiguous look-ahead buffer,
      and calls CalcCurve to build ATRAC3 gain-curve points.
    - highFreqRatio guard: skip CalcCurve for sub-bass bands where the HPF
      signal is too weak to produce meaningful gain control.

    CalcCurve (transient_detector.cpp):
    - Fix Issue 1 (FFT-window context mismatch at frame boundary): Store
      ctx.LastLevel = in.back() instead of target (nextLevel). in.back() and the
      next call's gain[0] are both analysis-domain estimates of adjacent
      8-sample blocks, so there is no cross-domain FFT-window divergence that
      produced false boundary transients.
    - Guard against zero savedLastLevel (first frame or post-reset): return
      empty curve rather than emitting scaleLevel=15 (GainLevel=1/2048) which
      would cause extreme amplification in the gain modulator.
    - Tighten gain-point budget to 7 (< MaxGainPointsNum=8) to match the 3-bit
      count field in the ATRAC3 bitstream.

    Tests (gain_processor_ut.cpp):
    - Add BoundaryLevelMismatch suite:
      Issue1_FalseTransientOnConstantTone_AfterOnset,
      Issue1_MdctRoundtrip_NoGain,
      Issue1_MdctRoundtrip_WithGain,
      Issue1_RoundtripWithGainAndQuantization.
    - Quantization test threshold set to 400× kQuantStep: correct two-point gain
      curves for a 9:1 amplitude-ratio signal produce at most ~323× peak error
      (scale×level=16 × ~8× IMDCT base noise); pathological false transients
      would cause signal-level reconstruction errors well above this bound.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* Add CalcCurve and TSpectralUpsampler for transient detection | Daniil Cherednik | 2026-03-06 | 1 | -1/+147

    CalcCurve (transient_detector.cpp/h):
    - Recursive divide-and-conquer FindTransients scans the gain vector for
      monotonic 3-subframe windows (rising or falling); kMinScore=2.0 filters
      out oscillations smaller than a factor of 2 (no-op at Level 4).
    - RelationToIdx maps an amplitude ratio to an ATRAC3 gain Level index.
    - TCurveBuilderCtx carries LastLevel across frames; CalcCurve prepends it as
      a virtual boundary element to detect Location=0 attacks.
    - budget=8 matches ATRAC3 SubbandInfo::MaxGainPointsNum.

    TSpectralUpsampler (transient_spectral_upsampler.cpp/h):
    - Applies a Planck-taper window (ε=0.15) to a 512-sample context window,
      forward-FFTs, applies a 3-bin raised-cosine HPF, zero-pads to 4096 bins,
      and inverse-FFTs to give an 8× upsampled output.
    - Returns highFreqRatio = Σ|X[k]·H[k]|²/Σ|X[k]|²; callers skip CalcCurve
      when this is below kHighFreqThreshold=0.05, preventing false transients
      from Planck noise-floor variation in sub-cutoff frames.

    Tests:
    - gain_processor_ut: upsampled-path blocks added to all FreqDomain tests;
      CalcCurve negative tests (NegativeTests suite).
    - transient_spectral_upsampler_ut: OutputSize, DCIsRemovedByLowCutFilter,
      HighFreqSinePreservesRMS (parametrised), ChirpNoTransient (0→5510 Hz
      sweep at 689 Hz low-cut, Len1024/16384/262144).

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
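The ratio-to-level mapping mentioned for RelationToIdx can be illustrated with a small sketch. The gain table used here, 2^(4 - idx), is inferred from figures quoted elsewhere in this log (Level 4 is a no-op, Level 3 is 2x, Level 15 is 1/2048). The mapping itself is an assumption: it rounds log2 of the ratio to the nearest step and clamps to 0..15, whereas the real RelationToIdx may truncate or clamp differently. GainLevelSketch and RelationToIdxSketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr int kNeutralLevel = 4; // gain 1.0, per "no-op at Level 4"
constexpr int kMaxLevel = 15;    // gain 1/2048, per the later guard commit

// Gain for a level index, assuming the table is 2^(4 - idx).
float GainLevelSketch(int idx) {
    assert(idx >= 0 && idx <= kMaxLevel);
    return std::ldexp(1.0f, kNeutralLevel - idx);
}

// Hypothetical mapping from an amplitude ratio to the nearest level index.
int RelationToIdxSketch(float ratio) {
    if (!(ratio > 0.0f)) {
        return kMaxLevel; // assumed: zero, negative, or NaN maps to the smallest gain
    }
    const float steps = std::round(std::log2(ratio));
    // Clamp in the float domain before converting, so huge ratios stay defined.
    const float idx = std::min(15.0f, std::max(0.0f, 4.0f - steps));
    return static_cast<int>(idx);
}
```

With this table the mapping round-trips every index, and a ratio of 0 lands on Level 15, which is the case the later zero-savedLastLevel guard avoids by returning an empty curve instead.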
* [refactoring] Get rid of TFloat parameter. Use float. | Daniil Cherednik | 2024-12-21 | 1 | -18/+18
* Allow to configure atracdenc to use float instead of double | Daniil Cherednik | 2024-07-09 | 1 | -5/+5

    The default is double for historical reasons. There is actually no reason
    to use double-precision floating-point calculation for all processing. AT3P
    encoding will use float except in places where we really need double
    precision. So this default will be changed to float in the near future and
    this option will be removed.
* Fix CodeQL (Multiplication result converted to larger type) warning | Daniil Cherednik | 2022-02-22 | 1 | -3/+3
* Support for MS Windows platform | Daniil Cherednik | 2019-05-15 | 1 | -1/+1

    - VS build (2017 tested)
    - Media Foundation Framework support to read/write PCM data (instead of
      libsndfile)
* Simplest gain control - scale first part of frame if the attack transient was detected | Daniil Cherednik | 2017-10-17 | 1 | -1/+1
* Set LGPL v2.1 license | Daniil Cherednik | 2017-02-12 | 1 | -0/+18
* Dirty implementation of atrac3 encoder: | Daniil Cherednik | 2016-09-02 | 1 | -8/+34

    - no JS mode
    - constant quantiser for tonal components
    - gain control implemented but produces some artifacts with real signals.
    - etc...
* Compilation warnings fixed | Daniil Cherednik | 2016-07-17 | 1 | -2/+2
* Fix building with cmake < 3.1 | Daniil Cherednik | 2016-07-17 | 1 | -0/+1
* Fix memory corruption: wrong size of HPFBuffer. | Daniil Cherednik | 2016-04-22 | 1 | -7/+5
* Some improvements of transient detector | Daniil Cherednik | 2015-12-24 | 1 | -6/+13

    - use one last subblock of previous frame
    - detect post-echo distortion
* initial implementation of transient detection added | Daniil Cherednik | 2015-12-23 | 1 | -0/+51