| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem
Gain curve generation emitted many +/-1 level transitions that do not correspond
to strong local transients. These points consume gain-info bits and can create
low-level modulation artifacts without improving transient handling.
Solution
Introduce explicit transient evidence gating at transition boundaries in
CalcCurve(), and wire it to the existing dynamic min-score path.
What changed
- Added BoundaryTransientScore(env, loc, win):
- computes local ratio around each subframe boundary
- R = max(max_right/max_left, max_left/max_right)
- short symmetric window (win=3 subframes)
- Re-enabled minScore usage in CalcCurve() (previously ignored).
- For each level transition candidate at loc=sf+1:
- keep unconditionally if loc==targetSf (tail neutral anchor)
- keep unconditionally if |deltaLevel| >= 2 (strong step)
- otherwise keep only if BoundaryTransientScore(loc) >= minScore
- Added YAML telemetry:
- transient_min_score
- transient_window
- transition_pruned {loc, delta, score}
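
The keep/prune rule above can be sketched as follows. This is a minimal illustration, not the code from this commit: `KeepTransition` is a hypothetical helper name, the window clipping at the vector edges is an assumption, and returning a score of 1.0 for an empty or silent side is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Ratio between the loudest subframe on each side of the boundary at `loc`.
// Left window: env[loc-win .. loc-1], right window: env[loc .. loc+win-1],
// both clipped to the vector. Returns 1.0 (no transient evidence) when a side
// is empty or silent; that fallback is an assumption of this sketch.
float BoundaryTransientScore(const std::vector<float>& env, std::size_t loc, std::size_t win) {
    if (loc == 0 || loc >= env.size() || win == 0) return 1.0f;
    const std::size_t lo = loc > win ? loc - win : 0;
    const std::size_t hi = std::min(env.size(), loc + win);
    const float maxLeft = *std::max_element(env.begin() + lo, env.begin() + loc);
    const float maxRight = *std::max_element(env.begin() + loc, env.begin() + hi);
    if (maxLeft <= 0.0f || maxRight <= 0.0f) return 1.0f;
    return std::max(maxRight / maxLeft, maxLeft / maxRight);
}

// Hypothetical helper: decides whether a level transition candidate survives.
bool KeepTransition(const std::vector<float>& env, std::size_t loc, int deltaLevel,
                    std::size_t targetSf, float minScore, std::size_t win = 3) {
    if (loc == targetSf) return true;            // tail neutral anchor
    if (std::abs(deltaLevel) >= 2) return true;  // strong step
    return BoundaryTransientScore(env, loc, win) >= minScore;
}
```

On a flat envelope a +/-1 candidate scores 1.0 and is pruned for any minScore above 1, while a 1 -> 4 step across the boundary scores 4.0 and is kept.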
Why this is safe
- Strong transitions are preserved.
- Rightmost transition is preserved to keep proper return-to-neutral anchoring.
- Only low-confidence small toggles are removed.
Measured impact (current branch comparison)
Baseline: ea4d33b38 (before this change)
Tracks: show_me_your_spine.wav, 13.wav
Gain-info bits / points:
- spine: 191,697 -> 150,297 bits (delta -41,400; -21.6%)
15,593 -> 10,993 points (delta -4,600)
- 13.wav: 1,299,035 -> 979,931 bits (delta -319,104; -24.6%)
97,035 -> 61,579 points (delta -35,456)
Subjective note
User listening reports improved sound and fixes for some low-level artifacts.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem
The distribution-aware sticky quantizer reduced gain-curve bitrate, but in some
release/transient frames it over-merged nearby transitions. On spine around
17.657s (ch1/band2), this collapsed the curve shape and could produce an
audible spike.
What changed
- Added frame-level sticky eligibility gating in CalcCurve().
- Sticky is now enabled only when both conditions hold:
- intra-frame ratio is limited: max_gain / target <= kStickyMaxIntraFrameRatio
- inter-frame target jump is limited: prev_target / target (symmetric) <= kStickyMaxInterFrameRatio
- Added local uncertainty guard for sticky hold:
- require idx span from [subframeLow, subframeHigh] quantization to be narrow
(idxSpan <= 1) before allowing prev-level hold.
- Added YAML diagnostics per band/frame to make gating decisions auditable:
- sticky_frame_eligible
- sticky_intra_ratio
- sticky_inter_ratio
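
A rough sketch of the two gates, using the threshold values selected in the tuning notes of this message. `StickyGate`, `EvalStickyGate` and `StickyHoldAllowed` are hypothetical names, and treating non-positive inputs as "not eligible" is an assumption.

```cpp
#include <algorithm>
#include <cstdlib>

// Threshold values as selected in this commit message.
constexpr float kStickyMaxIntraFrameRatio = 7.0f;
constexpr float kStickyMaxInterFrameRatio = 10.0f;

// Hypothetical result type mirroring the YAML diagnostics.
struct StickyGate {
    float intraRatio;  // sticky_intra_ratio
    float interRatio;  // sticky_inter_ratio
    bool eligible;     // sticky_frame_eligible
};

// Frame-level eligibility: both ratios must stay within their limits.
StickyGate EvalStickyGate(float maxGain, float target, float prevTarget) {
    StickyGate g{0.0f, 0.0f, false};
    // Assumption: degenerate (non-positive) inputs disable sticky for the frame.
    if (maxGain <= 0.0f || target <= 0.0f || prevTarget <= 0.0f) return g;
    g.intraRatio = maxGain / target;
    g.interRatio = std::max(prevTarget / target, target / prevTarget);  // symmetric
    g.eligible = g.intraRatio <= kStickyMaxIntraFrameRatio &&
                 g.interRatio <= kStickyMaxInterFrameRatio;
    return g;
}

// Local uncertainty guard: the quantized [low, high] span must be narrow.
bool StickyHoldAllowed(int idxLow, int idxHigh) {
    return std::abs(idxHigh - idxLow) <= 1;
}
```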
Threshold tuning
Swept candidate pairs on both tracks:
- show_me_your_spine.wav
- 13.wav
Pairs tested:
(5,6), (5,8), (6,8), (6,10), (7,8), (7,10), (8,12)
Selected:
- kStickyMaxIntraFrameRatio = 7.0
- kStickyMaxInterFrameRatio = 10.0
Reason for selection
- Keeps safety behavior on known failure site:
frame 760, ch1, band2 remains sticky_frame_eligible=false
and retains non-collapsed curve shape (loc 1,2,5,7).
- Improves gain-modulation bitrate vs previous 6/8 tuning while avoiding fully
open behavior.
Measured gain-modulation bits (spine + 13.wav)
- 6/8: 1,493,639 bits
- 7/10: 1,490,732 bits (selected, -2,907 bits vs 6/8)
- 8/12: 1,488,824 bits (lowest in sweep; not selected to keep extra margin)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem
Gain curve construction still produced many +/-1 level toggles across long runs
(e.g. 7<->8 chatter). These transitions are usually quantization noise from
subframe-level RMS rounding, not real envelope changes, and they consume gain
bit budget without improving transient protection.
Feature
Introduce distribution-aware sticky quantization for subframe gain levels.
Instead of quantizing only the subframe centre estimate, we also track a robust
within-subframe range and suppress one-step toggles when the previous level is
still consistent with that range.
Implementation
1) AnalyzeGain now optionally returns per-subframe low/high energy estimates
(robust inter-quantile bounds from micro-chunk analysis inside each subframe).
2) CalcCurve now accepts optional subframe low/high vectors.
3) During sfLevel quantization:
- compute centre level via RelationToIdx(filtered/target)
- if new level differs from previous by exactly 1, and previous level is still
inside [idx(low), idx(high)], keep previous level (sticky hold)
4) CreateSubbandInfo wires the new AnalyzeGain outputs into CalcCurve.
5) Existing point0 guard/boundary logic remains intact; this feature operates
earlier at sfLevel formation.
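
Step 3 can be sketched roughly as below. `RatioToLevel` is a hypothetical stand-in for RelationToIdx, assuming level 4 is neutral, one level per factor of two, a 0..15 range, and that a ratio below 1 maps to a level above 4; `QuantizeSticky` is likewise an illustrative name, not the function in the tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for RelationToIdx (see assumptions in the lead-in).
int RatioToLevel(float ratio) {
    if (!(ratio > 0.0f)) return 15;  // zero, negative or NaN: clamp to the last level
    const float r = std::min(16.0f, std::max(ratio, 1.0f / 2048.0f));
    const long level = 4L - std::lround(std::log2(r));
    return static_cast<int>(std::max(0L, std::min(15L, level)));
}

// Quantize per-subframe levels, holding the previous level when the new one
// differs by exactly 1 and the previous one still lies inside [idx(low), idx(high)].
std::vector<int> QuantizeSticky(const std::vector<float>& centre,
                                const std::vector<float>& low,
                                const std::vector<float>& high,
                                float target) {
    assert(centre.size() == low.size() && centre.size() == high.size());
    assert(target > 0.0f);
    std::vector<int> levels(centre.size());
    for (std::size_t sf = 0; sf < centre.size(); ++sf) {
        int level = RatioToLevel(centre[sf] / target);
        if (sf > 0) {
            const int prev = levels[sf - 1];
            const int a = RatioToLevel(low[sf] / target);
            const int b = RatioToLevel(high[sf] / target);
            const bool oneStep = std::abs(level - prev) == 1;
            const bool supported = prev >= std::min(a, b) && prev <= std::max(a, b);
            if (oneStep && supported) level = prev;  // sticky hold
        }
        levels[sf] = level;
    }
    return levels;
}
```

With target 1.0, a centre estimate of 0.6 would quantize one step away from neutral, but if the subframe range is [0.5, 0.9] the previous neutral level is still covered and is held. A jump of two or more levels is never held.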
Why this is safe
- Only suppresses +/-1 oscillation when previous level is still supported by
observed subframe distribution.
- Does not clamp large transitions or remove structurally important points.
- Keeps curve scan/priority flow unchanged after sfLevel is formed.
Measured impact on current HEAD (gain-info bits)
Bit accounting uses ATRAC3 gain syntax: per channel header + per band point-count
fields + 9 bits per gain point.
show_me_your_spine.wav:
- base: 219,552 bits (18,688 points)
- with sticky: 172,158 bits (13,422 points)
- saved: 47,394 bits, 5,266 points (-21.59% gain-info bits)
13.wav:
- base: 1,537,724 bits (123,556 points)
- with sticky: 1,146,746 bits (80,114 points)
- saved: 390,978 bits, 43,442 points (-25.43% gain-info bits)
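
As a cross-check of the figures above: with 9 bits per gain point, and assuming the header and point-count fields do not change between the two runs, the saved bits must equal 9 times the saved points. The helper names below are illustrative only.

```cpp
#include <cstdint>

constexpr std::int64_t kBitsPerGainPoint = 9;  // from the bit accounting note

// Bits saved when only the number of gain points changes.
constexpr std::int64_t SavedBits(std::int64_t pointsBefore, std::int64_t pointsAfter) {
    return (pointsBefore - pointsAfter) * kBitsPerGainPoint;
}

// Everything that is not gain-point payload (headers, point-count fields).
constexpr std::int64_t FixedOverhead(std::int64_t totalBits, std::int64_t points) {
    return totalBits - points * kBitsPerGainPoint;
}

// show_me_your_spine.wav
static_assert(SavedBits(18688, 13422) == 47394, "spine saved bits");
static_assert(FixedOverhead(219552, 18688) == FixedOverhead(172158, 13422),
              "spine overhead unchanged");
// 13.wav
static_assert(SavedBits(123556, 80114) == 390978, "13.wav saved bits");
static_assert(FixedOverhead(1537724, 123556) == FixedOverhead(1146746, 80114),
              "13.wav overhead unchanged");
```

The fixed overhead comes out identical before and after for each track, which is consistent with the savings coming purely from removed gain points.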
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For non-plateau frames, nextLevel (first lookahead subframe of the next
frame) can be 6× higher than in.back() on release frames. Using it as
the staircase target caused tail subframes to appear below target →
spurious amplifying points (e.g. {level:7, loc:31}) on release tails,
and underestimated ATT on the peak (33× ratio reduced to 5× because the
wrong target inflated the denominator).
Fix: always use in.back() (actual last subframe of the analysis window)
as the staircase target. That is where the signal truly returns to
within this frame.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces the monotone-triplet transient detector with a level-based
staircase scan that builds the gain curve from the target subframe
leftward. The new algorithm correctly handles rising transients by
attenuating the loud peak region rather than the quiet onset.
Key changes:
- 3-point median filter on gain[] suppresses isolated spikes
- Per-sf level = RelationToIdx(filtered[sf] / target)
- Scan leftward from first-neutral-sf, emit one point per level change
- Priority trim: keep up to 6 points with largest |ΔLevel| first
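
A small sketch of two of the building blocks listed above, the 3-point median filter and the priority trim. `CurvePoint`, `Median3` and `PriorityTrim` are hypothetical names; copying the endpoints unchanged and not recomputing neighbouring deltas after trimming are simplifying assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// 3-point median filter; the first and last elements are copied unchanged.
std::vector<float> Median3(const std::vector<float>& g) {
    std::vector<float> out(g);
    for (std::size_t i = 1; i + 1 < g.size(); ++i) {
        const float a = g[i - 1], b = g[i], c = g[i + 1];
        out[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));
    }
    return out;
}

// Hypothetical curve point: gain level, subframe location, level change.
struct CurvePoint {
    int level;
    int loc;
    int delta;
};

// Keep at most `budget` points, preferring the largest |delta|, then restore
// location order. Ties keep their original relative order.
std::vector<CurvePoint> PriorityTrim(std::vector<CurvePoint> pts, std::size_t budget = 6) {
    if (pts.size() > budget) {
        std::stable_sort(pts.begin(), pts.end(), [](const CurvePoint& a, const CurvePoint& b) {
            return std::abs(a.delta) > std::abs(b.delta);
        });
        pts.erase(pts.begin() + budget, pts.end());
    }
    std::sort(pts.begin(), pts.end(),
              [](const CurvePoint& a, const CurvePoint& b) { return a.loc < b.loc; });
    return pts;
}
```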
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the first detected transient is at location 0, the CalcCurve loop
computed level = RelationToIdx(in[0]/target). But in[0] is the ramp
START — for loc=0 there is no pre-ramp region, so in[0] is not the right
amplitude reference for gc_scale, which divides ALL of bufCur (the
previous frame's MDCT window).
The external point0 block derives its formula as:
hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] ≈ target
point0Level = RelationToIdx(prevTarget / hpfRmsNextMod) ≈ prevTarget/target
For loc=0, hpfRmsNextModValid is false and the block cannot fire.
Override curve[0].Level inside CalcCurve with RelationToIdx(savedPrevTarget/target)
— consistent with the loc>0 formula and correctly bridges cross-frame amplitude.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Point0 calculation switched from raw-PCM RMS ratio to HPF-filtered domain:
- prevTarget (stored as ctx.LastTarget from previous CalcCurve call) replaces rmsCur
- hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] replaces rmsNextMod
(only the pre-ramp constant-level zone, same domain as gain[])
- TCurveBuilderCtx gains LastTarget field; CalcCurve stores target before returning
Also extends gain control to band 2 (~11–16 kHz) by changing the skip threshold
from band >= 2 to band >= 3. This sounds better perceptually; the regression
metrics worsen because the broadband measurement does not capture the per-band
HF improvement.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
Pass yamlLog into CalcCurve so plateau_level, plateau_max_raw,
plateau_release, and target/source are emitted directly from the
function that computes them, instead of via stale TCurveBuilderCtx
fields. Remove LastTarget and LastTargetFromPlateau from the context
struct entirely.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduces FindPlateau() which finds the maximum sustained amplitude
level where at least 3 consecutive subframes exceed it (sliding-window
minimum approach). The plateau is used as the CalcCurve target instead
of nextLevel when the frame contains a genuine sustained peak that does
not end in a release.
Release detection uses two conditions:
- Hard tail: last subframe < 10% of plateau (clear ring-down)
- Soft tail: last subframe < 50% of plateau with no post-plateau
recovery above 70%
A MaxRaw guard (plateau >= 40% of frame peak) prevents the quiet noise
floor from being mistaken for a plateau when the frame contains a
much louder transient spike.
When plateau is used as target, pre-plateau quiet regions produce
AMP curves normalizing toward the sustained peak, reducing the
extreme ATT levels that were causing post-echo artifacts.
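
The plateau search and the release/MaxRaw checks can be sketched as below. This is one reading of the description, not the committed FindPlateau(): `PlateauInfo` and `FindPlateauSketch` are hypothetical names, and "post-plateau recovery" is interpreted here as the envelope dropping below 70% of the plateau after the best window and later rising back to 70% or more.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct PlateauInfo {
    bool valid;       // passed the MaxRaw guard
    float level;      // best minimum over 3 consecutive subframes
    std::size_t end;  // one past the last subframe of the best window
    bool release;     // frame ends in a release (hard or soft tail)
};

PlateauInfo FindPlateauSketch(const std::vector<float>& g) {
    constexpr std::size_t kRun = 3;
    PlateauInfo p{false, 0.0f, 0, false};
    if (g.size() < kRun) return p;
    // Sliding-window minimum: the plateau is the largest level that kRun
    // consecutive subframes all reach.
    for (std::size_t i = 0; i + kRun <= g.size(); ++i) {
        const float m = *std::min_element(g.begin() + i, g.begin() + i + kRun);
        if (m > p.level) {
            p.level = m;
            p.end = i + kRun;
        }
    }
    const float peak = *std::max_element(g.begin(), g.end());
    if (p.level <= 0.0f || p.level < 0.4f * peak) return p;  // MaxRaw guard
    p.valid = true;
    const float last = g.back();
    const bool hardTail = last < 0.1f * p.level;
    bool dropped = false, recovered = false;
    for (std::size_t i = p.end; i < g.size(); ++i) {
        if (g[i] < 0.7f * p.level) dropped = true;
        else if (dropped) recovered = true;
    }
    const bool softTail = last < 0.5f * p.level && !recovered;
    p.release = hardTail || softTail;
    return p;
}
```

A frame such as {1, 1, 8, 8, 8, 8, 1, 1} yields a plateau of 8 that ends in a soft-tail release, while a single spike of 20 over a floor of 1 fails the MaxRaw guard because the only sustained level is the floor.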
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rewrites the ATRAC3 gain control pipeline to eliminate noise flashes
and substantially reduce pre-echo artifacts:
Transient detection & curve building (transient_detector.cpp/h):
- Replace legacy heuristic detector with ratio-scored DetectTransients:
score = peak/floor for rising (c/a) or falling (a/c) triplets, scored
and ranked, top-N kept, sorted by location
- Add explicit point0 derived from windowed-RMS match between bufCur and
the curve-modulated bufNext (CalcWindowedRmsAfterCurve)
- Replace RegionMax with RegionRMS for smoother region amplitude estimate
- Add per-band detection thresholds kMinScorePerBand[4] = {1.9,1.9,2.1,2.2}
- Dynamic minScore: scale threshold by min(1.5, max(1.0, overlapRatio)) to
suppress false-positive curves when previous frame dominates
- Scale constraint: curve[0].Level >= 3 to cap cross-frame amplification at 2x
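
For illustration, the triplet score and the dynamic threshold from the list above can be written roughly as below. `TripletScore` and `DynamicMinScore` are hypothetical helper names; scoring non-monotone or non-positive triplets as 0 is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Per-band thresholds as listed in this commit message.
constexpr float kMinScorePerBand[4] = {1.9f, 1.9f, 2.1f, 2.2f};

// peak/floor for a monotone triplet (a, b, c): c/a when rising, a/c when falling.
float TripletScore(float a, float b, float c) {
    if (a <= 0.0f || c <= 0.0f) return 0.0f;  // assumption: no score without a floor
    if (a <= b && b <= c) return c / a;       // rising
    if (a >= b && b >= c) return a / c;       // falling
    return 0.0f;                              // not monotone
}

// Threshold scaled by min(1.5, max(1.0, overlapRatio)).
float DynamicMinScore(int band, float overlapRatio) {
    assert(band >= 0 && band < 4);
    return kMinScorePerBand[band] * std::min(1.5f, std::max(1.0f, overlapRatio));
}
```

An overlapRatio at or below 1 leaves the per-band threshold unchanged, and anything above 1.5 saturates at 1.5 times the base value.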
Bit allocation (atrac3_bitstream.cpp/h):
- Add GainBoostPerBand[NumQMF] to TSingleChannelElement, computed in
CreateSubbandInfo and applied in CalcBitsAllocation
- levelBoost: compensate for Demodulate's GainLevel[minLevel] attenuation
- scaleBoost: compensate for next-frame cross-frame scale via lookahead
- Both capped (kLevelBoostCap=1, kScaleBoostCap=2) to avoid bit starvation
Upsampler (atrac3denc.cpp):
- Raise cutoff from 600 Hz to 800 Hz for tighter band separation
Tests (gain_processor_ut.cpp):
- Relax fixed curve shape assertions to ExpectCurveReasonable (bounds checks)
- Relax quantization error bound for dense-event spacing (<=128 samples)
Results (branch new_psy vs original baseline):
riddler: pre-echo worse 43->15/479 frames, 0 flashes
spine: pre-echo worse 192->107/1804 frames, 0 flashes
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tracking
Encoder (atrac3denc.cpp / atrac3denc.h):
- Replace old TTransientParam / TransientParamsHistory with TSpectralUpsampler-
based CreateSubbandInfo(): analyses the upsampled QMF band, computes gain[]
and nextLevel from the contiguous look-ahead buffer, and calls CalcCurve to
build ATRAC3 gain-curve points.
- highFreqRatio guard: skip CalcCurve for sub-bass bands where the HPF signal
is too weak to produce meaningful gain control.
CalcCurve (transient_detector.cpp):
- Fix Issue 1 (FFT-window context mismatch at frame boundary):
Store ctx.LastLevel = in.back() instead of target (nextLevel).
in.back() and the next call's gain[0] are both analysis-domain estimates of
adjacent 8-sample blocks — no cross-domain FFT-window divergence that produced
false boundary transients.
- Guard against zero savedLastLevel (first frame or post-reset): return empty
curve rather than emitting scaleLevel=15 (GainLevel=1/2048) which would cause
extreme amplification in the gain modulator.
- Tighten gain-point budget to 7 (< MaxGainPointsNum=8) to match the 3-bit
count field in the ATRAC3 bitstream.
Tests (gain_processor_ut.cpp):
- Add BoundaryLevelMismatch suite: Issue1_FalseTransientOnConstantTone_AfterOnset,
Issue1_MdctRoundtrip_NoGain, Issue1_MdctRoundtrip_WithGain,
Issue1_RoundtripWithGainAndQuantization.
- Quantization test threshold set to 400× kQuantStep: correct two-point gain
curves for a 9:1 amplitude-ratio signal produce at most ~323× peak error
(scale×level=16 × ~8× IMDCT base noise); pathological false transients would
cause signal-level reconstruction errors well above this bound.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CalcCurve (transient_detector.cpp/h):
- Recursive divide-and-conquer FindTransients scans the gain vector for
monotonic 3-subframe windows (rising or falling); kMinScore=2.0 filters
out oscillations smaller than a factor of 2 (no-op at Level 4).
- RelationToIdx maps an amplitude ratio to an ATRAC3 gain Level index.
- TCurveBuilderCtx carries LastLevel across frames; CalcCurve prepends it
as a virtual boundary element to detect Location=0 attacks.
- budget=8 matches ATRAC3 SubbandInfo::MaxGainPointsNum.
TSpectralUpsampler (transient_spectral_upsampler.cpp/h):
- Applies a Planck-taper window (ε=0.15) to a 512-sample context window,
forward-FFTs, applies a 3-bin raised-cosine HPF, zero-pads to 4096 bins,
and inverse-FFTs to give an 8× upsampled output.
- Returns highFreqRatio = Σ|X[k]·H[k]|²/Σ|X[k]|²; callers skip CalcCurve
when this is below kHighFreqThreshold=0.05, preventing false transients
from Planck noise-floor variation in sub-cutoff frames.
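
The highFreqRatio formula can be sketched as below, assuming a real-valued filter response H[k] so that |X[k]·H[k]|² = |X[k]|²·H[k]². `HighFreqRatio` and `ShouldRunCalcCurve` are illustrative names, and returning 0 for an all-zero spectrum is an assumption.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

constexpr float kHighFreqThreshold = 0.05f;  // from this commit message

// Fraction of spectral energy that survives the high-pass response H.
float HighFreqRatio(const std::vector<std::complex<float>>& X, const std::vector<float>& H) {
    assert(X.size() == H.size());
    double num = 0.0, den = 0.0;
    for (std::size_t k = 0; k < X.size(); ++k) {
        const double power = std::norm(X[k]);  // |X[k]|^2
        num += power * H[k] * H[k];
        den += power;
    }
    return den > 0.0 ? static_cast<float>(num / den) : 0.0f;
}

// Callers skip CalcCurve when the ratio is below the threshold.
bool ShouldRunCalcCurve(float highFreqRatio) {
    return highFreqRatio >= kHighFreqThreshold;
}
```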
Tests:
- gain_processor_ut: upsampled-path blocks added to all FreqDomain tests;
CalcCurve negative tests (NegativeTests suite).
- transient_spectral_upsampler_ut: OutputSize, DCIsRemovedByLowCutFilter,
HighFreqSinePreservesRMS (parametrised), ChirpNoTransient (0→5510 Hz
sweep at 689 Hz low-cut, Len1024/16384/262144).
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| | |
|
| |
|
|
|
|
|
|
|
| |
The default is double for historical reasons.
There is actually no reason to use double-precision floating-point
calculation for all processing. AT3P encoding will use float
except in places where double precision is really needed.
So this default will be changed to float in the near future and
this option will be removed.
|
| | |
|
| |
|
|
|
|
| |
- VS build (2017 tested)
- Media Foundation Framework support to read/write pcm data (instead of
libsndfile)
|
| |
|
|
| |
was detected
|
| | |
|
| |
|
|
|
|
|
| |
- no JS mode
- constant quantiser for tonal components
- gain control is implemented but produces some artifacts with real signals.
- etc...
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
| |
- use one last subblock of previous frame
- detect post-echo distortion
|
| |
|