| Commit message | Author | Age | Files | Lines |
| |
Widen the ASCII fopen mode string with std::wstring's iterator-pair
constructor instead of maintaining a hand-coded mapping. fopen mode
strings contain only ASCII characters (0-127), and every ASCII code
point is a valid wchar_t code unit on Windows.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
|
Reuse the shared UTF-8 path helper in Media Foundation, normalize compressed output open errors, cover AEA encode/decode paths, and expand integration tests for ATRAC1 and decode filenames.
|
Report libsndfile open failures before sample-rate validation, use UTF-16 Windows opens for PCM and compressed containers, and add integration tests for missing input plus UTF-8 input/output filenames.
|
* Add MSYS2 build support
Add a selectable PCM I/O backend so MSVC builds can keep Media Foundation while MSYS2/MinGW builds use libsndfile.
Teach the libsndfile finder about MINGW_PREFIX and add a Windows MSYS2 CI job that builds the libsndfile backend.
* Fix and enable tests for MSYS2 builds
|
AT3 RIFF output always stores a two-channel ATRAC3 stream.
Mono input is encoded as duplicated single channels or as joint stereo with an empty side channel.
|
Conservative initial implementation of ATRAC3 tonal extraction.
This is a safe first step tuned for stability and bitstream compatibility, not maximum aggressiveness. It gives the most benefit on synthetic signals and material with strong, steady pure tones (for example simple electronic leads and solo tonal instruments).
|
- Add shared CalcSpectralFlatnessPerBfu helper in atrac_psy_common
with BFU-table mapping.
- Implement ATRAC3 tonal extraction: compute MDCT energy, estimate
per-BFU flatness, extract up to 5-bin strongest tonal run in
low-flatness BFUs, and zero extracted bins in residual.
- Map extracted tonal bins into TTonalBlocks and integrate them into
bitstream coding.
- Update ATRAC3 bit allocation - reduce residual bits for BFUs with tonal
blocks, and increase tonal quantizer selection.
- Restore --notonal CLI option in main.cpp for A/B comparison.
|
The AT3-in-WAV writer produces headers that Sony's psp_at3tool rejects for
files longer than around forty seconds. The tool prints "input file is
illegal file or over 2G Byte" and refuses to decode. ffmpeg accepts the
file but decodes it without any encoder-delay compensation, leaving a
variable lag of several hundred samples relative to the source. The two
observations have a common root cause: the header we write is missing
fields that downstream decoders rely on.
This patch addresses three concrete issues in src/at3.cpp.
First, the writer emits no fact chunk. The fact chunk is optional in the
general RIFF specification but is how WAVEFORMATEX based codecs announce
the total number of decoded samples per channel. psp_at3tool uses the
sample count together with samples-per-frame to decide how much PCM to
produce and where to stop. Without a fact chunk the tool falls back to a
short default and either truncates output or, for longer streams, rejects
the file outright. ffmpeg uses the same field to skip encoder priming
samples. Sony's own AT3 files carry this chunk with a fixed eight byte
payload containing total_samples and samples_per_frame. We now write the
same structure.
Second, the bytes_per_frame field in the ATRAC3 extradata was hardcoded
to 0x10 with an XXX comment. The correct value for standard ATRAC3 is
0x1000, that is 4096, which corresponds to the PCM bytes represented by
one frame (1024 samples per channel times two channels times two bytes
per sample). Sony's encoder writes 4096 at this offset and both ffmpeg
and psp_at3tool validate against that number. The previous value of
sixteen bytes per frame is nonsensical and was part of why psp_at3tool
misestimated the playback length.
Third, the RIFF chunk_size field was being written as the full file size.
By the RIFF specification this field should hold the size of everything
that follows the field itself, that is file_size minus eight. Writing the
full size is tolerated by ffmpeg but violates the specification and makes
the file look larger than it is to strict parsers.
Because the PCM engine can flush additional frames after the initially
estimated numFrames count (due to look-ahead tail during encoding), the
three length fields chunk_size, total_samples, and subchunk2_size were
stale by one to three frames relative to the actual data on disk. To
keep them consistent, TAt3 now counts frames as WriteFrame is called and
seeks back to overwrite the three length fields in the destructor, so
the final file describes its real contents.
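The three header values described above can be sketched in a few lines of Python. This is an illustration only, not the code in src/at3.cpp: the helper names are hypothetical, and the assumption that both fact-chunk counters are little-endian 32-bit values is inferred from the stated eight-byte payload.

```python
import struct

SAMPLES_PER_FRAME = 1024  # samples per channel in one frame (from the message)
CHANNELS = 2
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def pcm_bytes_per_frame() -> int:
    """PCM bytes represented by one frame; the message expects 0x1000."""
    return SAMPLES_PER_FRAME * CHANNELS * BYTES_PER_SAMPLE


def fact_chunk(total_samples: int) -> bytes:
    """Build a 'fact' chunk: id, payload size (8), then the two counters.

    Assumption: both counters are little-endian 32-bit values.
    """
    payload = struct.pack("<II", total_samples, SAMPLES_PER_FRAME)
    return b"fact" + struct.pack("<I", len(payload)) + payload


def riff_chunk_size(file_size: int) -> int:
    """RIFF chunk_size excludes the 4-byte 'RIFF' id and the size field."""
    return file_size - 8
```

Because the final frame count is only known after the last WriteFrame call, values like these would be recomputed and written over the placeholders at the end of encoding.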
The patch is purely a container metadata fix. The encoded AT3 payload is
byte-identical to before. After this change, output from atracdenc for
long test tracks (90 and 186 seconds, 132 kbps LP2) is accepted and fully
decoded by psp_at3tool in a single pass, and ffmpeg decodes with a
constant small codec latency instead of the previous variable drift.
This made it possible to run a proper triple comparison against Sony's
reference encoder. The comparison previously looked catastrophic (gap
around -22 dB SNR) purely because of the alignment problem; with correct
container headers the gap sits at roughly -0.5 to -1.4 dB SNR.
Signed-off-by: hilman2 <[email protected]>
|
New experimental gain control code for atrac3
|
Problem
Gain curve generation emitted many +/-1 level transitions that do not correspond
to strong local transients. These points consume gain-info bits and can create
low-level modulation artifacts without improving transient handling.
Solution
Introduce explicit transient evidence gating at transition boundaries in
CalcCurve(), and wire it to the existing dynamic min-score path.
What changed
- Added BoundaryTransientScore(env, loc, win):
- computes local ratio around each subframe boundary
- R = max(max_right/max_left, max_left/max_right)
- short symmetric window (win=3 subframes)
- Re-enabled minScore usage in CalcCurve() (previously ignored).
- For each level transition candidate at loc=sf+1:
- keep unconditionally if loc==targetSf (tail neutral anchor)
- keep unconditionally if |deltaLevel| >= 2 (strong step)
- otherwise keep only if BoundaryTransientScore(loc) >= minScore
- Added YAML telemetry:
- transient_min_score
- transient_window
- transition_pruned {loc, delta, score}
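A minimal Python sketch of the gating rule listed above, for illustration. The function names mirror the message, but the handling of an empty window side and the eps floor are assumptions, not taken from the C++ code.

```python
def boundary_transient_score(env, loc, win=3, eps=1e-12):
    """Symmetric max-ratio across the boundary just before subframe `loc`."""
    left = env[max(0, loc - win):loc]
    right = env[loc:loc + win]
    if not left or not right:
        return 1.0  # assumption: no evidence on one side means "no transient"
    max_left = max(max(left), eps)
    max_right = max(max(right), eps)
    return max(max_right / max_left, max_left / max_right)


def keep_transition(env, loc, delta_level, target_sf, min_score, win=3):
    """Apply the three keep rules to one level-transition candidate."""
    if loc == target_sf:          # tail neutral anchor
        return True
    if abs(delta_level) >= 2:     # strong step
        return True
    return boundary_transient_score(env, loc, win) >= min_score
```

With a flat envelope the score is 1.0, so a +/-1 toggle is pruned for any min_score above 1, while a 4x step across the boundary survives.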
Why this is safe
- Strong transitions are preserved.
- Rightmost transition is preserved to keep proper return-to-neutral anchoring.
- Only low-confidence small toggles are removed.
Measured impact (current branch comparison)
Baseline: ea4d33b38 (before this change)
Tracks: show_me_your_spine.wav, 13.wav
Gain-info bits / points:
- spine: 191,697 -> 150,297 bits (delta -41,400; -21.6%)
15,593 -> 10,993 points (delta -4,600)
- 13.wav: 1,299,035 -> 979,931 bits (delta -319,104; -24.6%)
97,035 -> 61,579 points (delta -35,456)
Subjective note
User listening reports improved sound and fixes for some low-level artifacts.
|
Problem
The distribution-aware sticky quantizer reduced gain-curve bitrate, but in some
release/transient frames it over-merged nearby transitions. On spine around
17.657s (ch1/band2), this collapsed the curve shape and could produce an
audible spike.
What changed
- Added frame-level sticky eligibility gating in CalcCurve().
- Sticky is now enabled only when both conditions hold:
- intra-frame ratio is limited: max_gain / target <= kStickyMaxIntraFrameRatio
- inter-frame target jump is limited: prev_target / target (symmetric) <= kStickyMaxInterFrameRatio
- Added local uncertainty guard for sticky hold:
- require idx span from [subframeLow, subframeHigh] quantization to be narrow
(idxSpan <= 1) before allowing prev-level hold.
- Added YAML diagnostics per band/frame to make gating decisions auditable:
- sticky_frame_eligible
- sticky_intra_ratio
- sticky_inter_ratio
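The frame-level eligibility test can be sketched as follows, using the threshold values selected later in this message. The helper names and the eps floor are hypothetical; only the two ratio conditions come from the text.

```python
K_STICKY_MAX_INTRA_FRAME_RATIO = 7.0
K_STICKY_MAX_INTER_FRAME_RATIO = 10.0


def symmetric_ratio(a, b, eps=1e-12):
    """Ratio of the larger to the smaller value, always >= 1."""
    a = max(a, eps)
    b = max(b, eps)
    return max(a / b, b / a)


def sticky_frame_eligible(max_gain, target, prev_target, eps=1e-12):
    """Sticky hold is allowed only when both ratios stay within bounds."""
    intra = max_gain / max(target, eps)
    inter = symmetric_ratio(prev_target, target, eps)
    return (intra <= K_STICKY_MAX_INTRA_FRAME_RATIO
            and inter <= K_STICKY_MAX_INTER_FRAME_RATIO)
```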
Threshold tuning
Swept candidate pairs on both tracks:
- show_me_your_spine.wav
- 13.wav
Pairs tested:
(5,6), (5,8), (6,8), (6,10), (7,8), (7,10), (8,12)
Selected:
- kStickyMaxIntraFrameRatio = 7.0
- kStickyMaxInterFrameRatio = 10.0
Reason for selection
- Keeps safety behavior on known failure site:
frame 760, ch1, band2 remains sticky_frame_eligible=false
and retains non-collapsed curve shape (loc 1,2,5,7).
- Improves gain-modulation bitrate vs previous 6/8 tuning while avoiding fully
open behavior.
Measured gain-modulation bits (spine + 13.wav)
- 6/8: 1,493,639 bits
- 7/10: 1,490,732 bits (selected, -2,907 bits vs 6/8)
- 8/12: 1,488,824 bits (lowest in sweep; not selected to keep extra margin)
|
Problem
Gain curve construction still produced many +/-1 level toggles across long runs
(e.g. 7<->8 chatter). These transitions are usually quantization noise from
subframe-level RMS rounding, not real envelope changes, and they consume gain
bit budget without improving transient protection.
Feature
Introduce distribution-aware sticky quantization for subframe gain levels.
Instead of quantizing only the subframe centre estimate, we also track a robust
within-subframe range and suppress one-step toggles when the previous level is
still consistent with that range.
Implementation
1) AnalyzeGain now optionally returns per-subframe low/high energy estimates
(robust inter-quantile bounds from micro-chunk analysis inside each subframe).
2) CalcCurve now accepts optional subframe low/high vectors.
3) During sfLevel quantization:
- compute centre level via RelationToIdx(filtered/target)
- if new level differs from previous by exactly 1, and previous level is still
inside [idx(low), idx(high)], keep previous level (sticky hold)
4) CreateSubbandInfo wires the new AnalyzeGain outputs into CalcCurve.
5) Existing point0 guard/boundary logic remains intact; this feature operates
earlier at sfLevel formation.
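A rough Python sketch of the sticky hold in step 3. The relation_to_idx function here is a hypothetical stand-in for RelationToIdx: it assumes GainLevel = 2^(4 - level) with nearest-step rounding, which matches the level values quoted elsewhere in this log but is not confirmed by the source.

```python
import math


def relation_to_idx(ratio):
    """Hypothetical stand-in: map an amplitude ratio to a level index 0..15."""
    ratio = max(ratio, 2.0 ** -11)
    level = 4 - round(math.log2(ratio))
    return min(15, max(0, level))


def sticky_level(centre, low, high, target, prev_level):
    """Quantize one subframe, holding prev_level on a supported 1-step toggle."""
    level = relation_to_idx(centre / target)
    if prev_level is not None and abs(level - prev_level) == 1:
        a = relation_to_idx(low / target)
        b = relation_to_idx(high / target)
        if min(a, b) <= prev_level <= max(a, b):
            return prev_level  # previous level still fits the observed range
    return level
```

A change of two or more levels bypasses the hold entirely, so large transitions are never suppressed by this step.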
Why this is safe
- Only suppresses +/-1 oscillation when previous level is still supported by
observed subframe distribution.
- Does not clamp large transitions or remove structurally important points.
- Keeps curve scan/priority flow unchanged after sfLevel is formed.
Measured impact on current HEAD (gain-info bits)
Bit accounting uses ATRAC3 gain syntax: per channel header + per band point-count
fields + 9 bits per gain point.
show_me_your_spine.wav:
- base: 219,552 bits (18,688 points)
- with sticky: 172,158 bits (13,422 points)
- saved: 47,394 bits, 5,266 points (-21.59% gain-info bits)
13.wav:
- base: 1,537,724 bits (123,556 points)
- with sticky: 1,146,746 bits (80,114 points)
- saved: 390,978 bits, 43,442 points (-25.43% gain-info bits)
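The figures above can be cross-checked with a short calculation. It assumes the per-channel header and point-count field sizes are identical in both runs, so the saving is exactly 9 bits per removed point; the function names are illustrative.

```python
BITS_PER_GAIN_POINT = 9  # from the ATRAC3 gain syntax quoted above


def saved_bits(points_before, points_after):
    """Bits saved when only the number of gain points changes."""
    return (points_before - points_after) * BITS_PER_GAIN_POINT


def percent_saved(bits_before, bits_after):
    """Relative reduction in gain-info bits, in percent."""
    return 100.0 * (bits_before - bits_after) / bits_before
```

For show_me_your_spine.wav, 5,266 removed points times 9 gives 47,394 bits, which equals 219,552 minus 172,158; the 13.wav numbers agree in the same way.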
|
Problem
The current point0 guard decides keep/revert only from an early-frame mismatch score.
That can revert a newly inserted point0 even when it is needed for frame-boundary
continuity. In ATRAC3 demodulation, the next frame's first gain level is reused as a
scale term for the overlap region, so removing point0 can change boundary scale by
multiple quantization steps and create audible artifacts.
Root cause
For frames like 13.wav around 45.1s, point0_guard reverted point0 in key bands,
which changed first-point scale and increased boundary mismatch despite a locally
better early-fit score.
Change
- Keep existing early mismatch metric (fit + leakage proxy).
- Add boundary-aware keep criterion inside point0_guard:
* compute desired boundary scale in the same HPF domain:
desiredScale = LimitRel(prevTarget / hpfRmsNextMod)
* compare log2 distance of first-level scale before/after point0 insertion
* if point0 reduces boundary error by a material margin (0.2 bits), force keep
even when early-fit score slightly worsens
- Apply guard only when point0 actually changes the curve.
- Add YAML telemetry for boundary error before/after to support analysis.
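The boundary-aware keep criterion can be illustrated like this. It is a sketch under assumptions: the function names are hypothetical, and whether the 0.2-bit margin is compared with >= or > in the real code is not stated in the message.

```python
import math

K_BOUNDARY_MARGIN_BITS = 0.2  # "material margin" named in the message


def boundary_error_bits(scale, desired_scale):
    """Distance between two scales in log2 units (bits)."""
    return abs(math.log2(scale / desired_scale))


def force_keep_point0(scale_before, scale_after, desired_scale,
                      margin=K_BOUNDARY_MARGIN_BITS):
    """Keep point0 when it reduces boundary error by at least `margin` bits."""
    err_before = boundary_error_bits(scale_before, desired_scale)
    err_after = boundary_error_bits(scale_after, desired_scale)
    return err_before - err_after >= margin
```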
Implementation details
- Added helper utilities to reconstruct subframe-average divisors from curve points
and score early mismatch consistently.
- Updated point0 insertion/update flow to track whether point0 changed.
- Extended guard decision to combine:
* early mismatch tolerance (existing behavior), and
* boundary continuity improvement (new behavior).
Observed effect (focused check)
- On 13 clip (~45.1s), exact bad subframe (t=45.103129s, frame256=7769, sf32=23):
ratio vs no-gain reduced from 9.30x to 1.51x after this change.
- Frame 1942 YAML now shows point0 kept in bands where boundary error drops
substantially.
Notes
- No full regression run in this commit (intentional for fast iteration).
|
For non-plateau frames, nextLevel (first lookahead subframe of the next
frame) can be 6× higher than in.back() on release frames. Using it as
the staircase target caused tail subframes to appear below target →
spurious amplifying points (e.g. {level:7, loc:31}) on release tails,
and underestimated ATT on the peak (33× ratio reduced to 5× because the
wrong target inflated the denominator).
Fix: always use in.back() (actual last subframe of the analysis window)
as the staircase target. That is where the signal truly returns to
within this frame.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Replaces the monotone-triplet transient detector with a level-based
staircase scan that builds the gain curve from the target subframe
leftward. The new algorithm correctly handles rising transients by
attenuating the loud peak region rather than the quiet onset.
Key changes:
- 3-point median filter on gain[] suppresses isolated spikes
- Per-sf level = RelationToIdx(filtered[sf] / target)
- Scan leftward from first-neutral-sf, emit one point per level change
- Priority trim: keep up to 6 points with largest |ΔLevel| first
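Two of the steps above, the 3-point median filter and the priority trim, can be sketched in Python. The tuple layout (loc, level, delta_level) and the choice to leave the two end samples unfiltered are assumptions made for this example.

```python
def median3(gain):
    """3-point median filter; the first and last samples are left as-is."""
    out = list(gain)
    for i in range(1, len(gain) - 1):
        out[i] = sorted(gain[i - 1:i + 2])[1]
    return out


def priority_trim(points, max_points=6):
    """Keep the points with the largest |delta_level|, returned by location.

    `points` is a list of (loc, level, delta_level) tuples.
    """
    if len(points) <= max_points:
        return sorted(points)
    kept = sorted(points, key=lambda p: abs(p[2]), reverse=True)[:max_points]
    return sorted(kept)
```

An isolated spike such as [1, 1, 9, 1, 1] is flattened by the filter before any level is computed, so it cannot generate a pair of spurious points.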
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
When the first detected transient is at location 0, the CalcCurve loop
computed level = RelationToIdx(in[0]/target). But in[0] is the ramp
START — for loc=0 there is no pre-ramp region, so in[0] is not the right
amplitude reference for gc_scale, which divides ALL of bufCur (the
previous frame's MDCT window).
The external point0 block derives its formula as:
hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] ≈ target
point0Level = RelationToIdx(prevTarget / hpfRmsNextMod) ≈ RelationToIdx(prevTarget / target)
For loc=0, hpfRmsNextModValid is false and the block cannot fire.
Override curve[0].Level inside CalcCurve with RelationToIdx(savedPrevTarget/target)
— consistent with the loc>0 formula and correctly bridges cross-frame amplitude.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Point0 calculation switched from raw-PCM RMS ratio to HPF-filtered domain:
- prevTarget (stored as ctx.LastTarget from previous CalcCurve call) replaces rmsCur
- hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] replaces rmsNextMod
(only the pre-ramp constant-level zone, same domain as gain[])
- TCurveBuilderCtx gains LastTarget field; CalcCurve stores target before returning
Also extends gain control to band 2 (~11–16 kHz) by changing the skip threshold
from band >= 2 to band >= 3. Perceptually sounds better; regression metrics
worsen due to broadband measurement not capturing per-band HF improvement.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Pass yamlLog into CalcCurve so plateau_level, plateau_max_raw,
plateau_release, and target/source are emitted directly from the
function that computes them, instead of via stale TCurveBuilderCtx
fields. Remove LastTarget and LastTargetFromPlateau from the context
struct entirely.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Introduces FindPlateau() which finds the maximum sustained amplitude
level where at least 3 consecutive subframes exceed it (sliding-window
minimum approach). The plateau is used as the CalcCurve target instead
of nextLevel when the frame contains a genuine sustained peak that does
not end in a release.
Release detection uses two conditions:
- Hard tail: last subframe < 10% of plateau (clear ring-down)
- Soft tail: last subframe < 50% of plateau with no post-plateau
recovery above 70%
A MaxRaw guard (plateau >= 40% of frame peak) prevents the quiet noise
floor from being mistaken for a plateau when the frame contains a
much louder transient spike.
When plateau is used as target, pre-plateau quiet regions produce
AMP curves normalizing toward the sustained peak, reducing the
extreme ATT levels that were causing post-echo artifacts.
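The plateau search and its guards might look roughly like this in Python. The sliding-window minimum and the 10% / 50% / 70% / 40% thresholds come from the message; the function names, the tie-breaking between equal plateaus, and the reading of "post-plateau" as everything after the winning window are assumptions.

```python
def find_plateau(env, run=3):
    """Highest level sustained over `run` consecutive subframes.

    Returns (level, index_of_last_subframe_in_window) or None.
    """
    best = None
    for i in range(len(env) - run + 1):
        level = min(env[i:i + run])  # sliding-window minimum
        if best is None or level > best[0]:
            best = (level, i + run - 1)
    return best


def plateau_is_target(env, run=3):
    """True when the plateau should replace nextLevel as the target."""
    found = find_plateau(env, run)
    if found is None:
        return False
    level, end = found
    if level < 0.4 * max(env):       # MaxRaw guard: plateau is only noise floor
        return False
    last = env[-1]
    if last < 0.1 * level:           # hard tail: clear ring-down
        return False
    recovered = any(v > 0.7 * level for v in env[end + 1:])
    if last < 0.5 * level and not recovered:  # soft tail
        return False
    return True
```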
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Removes the block that suppressed curves with Level<=2 when ratio was
below 10x or overlap was high, and the soft-min level clamping that
raised extreme levels to 3. Also removes the scale constraint that
forced curve[0].Level >= 3.
These heuristics degraded sound quality. The gain curve analysis
should produce correct levels directly rather than post-hoc clamping.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Firing CalcCurve on noise-floor content wastes bitrate and can produce
extreme Level values when the target amplitude is tiny.
No regressions: spine 22/1804 pre-echo, riddler 9/479, 0 flashes.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
A loc=0 point with level=4 (GainLevel=1.0, neutral) carries no information
when it would be the only curve point — the decoder uses gc_scale=1.0 for
frames with no curve, which is identical. Inserting it wastes bitstream
entries and leaves fewer bits for spectral quantization.
The guard is necessary only when other (non-loc-0) points exist: without a
neutral anchor, the decoder reads gc_scale from the first non-zero-loc point
(e.g. L2@10 → gc_scale=4× on the previous frame's OLA overlap).
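The level values quoted in this log (level 4 is neutral 1.0, L2 gives 4x, level 7 gives 8x amplification, level 15 gives 1/2048) are all consistent with GainLevel = 2^(4 - level). The sketch below uses that inferred relation; the (level, loc) tuple representation and the helper names are hypothetical.

```python
def gain_level(level: int) -> float:
    """Scale factor for a gain level index; level 4 is neutral (1.0)."""
    return 2.0 ** (4 - level)


def is_redundant_curve(points) -> bool:
    """True when the curve is a single neutral point at location 0.

    `points` is a list of (level, loc) tuples.
    """
    return len(points) == 1 and points[0] == (4, 0)
```

Such a curve decodes exactly like an empty one, which is why emitting it only costs bits.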
Result: 0 regressions, marginal post-echo improvement (spine 38→37, riddler 6→5).
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Two heuristics that were workarounds for the old (broken) window domain
are now redundant since point0 is computed correctly at the end of the
band loop using the MDCT-input-domain energy ratio:
1. "Delay early attack points when overlap dominates" — shifted
curvePoints[0].Location +2 when hpfOverlapRatio > 0.9. point0 now
inserts at Location=0 afterward anyway, making the location shift
irrelevant to the frame boundary correction.
2. "Soft-cap first point level under overlap dominance" — forced
curvePoints[0].Level = 4 (neutral) when hpfOverlapRatio > 0.9.
Was suppressing legitimate CalcCurve attenuation; point0 computes
the correct level from rmsCur/rmsNextMod directly.
Result vs previous commit:
spine: post-echo 40→38/1804
riddler: pre-echo 11→9/479
0 noise flashes, SNR unchanged.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Three correctness fixes:
1. Fix MDCT-domain window mismatch in point0 energy computation.
bufCur already has EncodeWindow[i] baked in from the previous
frame's MDCT prep; applying 2*DecodeWindow a second time caused
a systematic ~1.58x bias in rmsCur vs rmsNextMod for stationary
signals. Fix: use bufCur[i] directly for rmsCur; use
EncodeWindow[255-pos] in CalcWindowedRmsAfterCurve so both sides
are in the same MDCT-input domain.
2. Move point0 insertion to after all gain curve modifications.
Previously point0 was computed before aggressive_suppressed gate,
softMin floor, and scale constraint could clear/raise curvePoints,
making the point0 level stale. Now point0 is inserted at the very
end of the band loop using the final curvePoints state.
Also remove the early-return guard (frameEndLevel > 1e-6f) on
scaleBoost: silent-ending frames need the lookahead correction most.
3. Remove dead crossover logic.
The crossover block (rmsCurSub/rmsNextSub comparison) used
2*DecodeWindow on bufCur which already has EncodeWindow baked in,
making rmsCurSub[0] ≈ 0 always. The guard (crossover >= 2) never
fired. Confirmed dead via YAML log analysis across all test tracks.
Result vs baseline (c040b03, branch new_psy_cont):
spine: pre-echo 45→22/1804, post-echo 58→41/1804, SNR +0.5 dB
riddler: pre-echo 11→10/479, post-echo 6/479 unchanged
0 noise flashes maintained on all tracks.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
For quiet-to-loud transients, CalcCurve produces curve points with
level >= 4 (GainLevel <= 1), meaning Modulate amplifies the quiet
prefix of bufNext. The unit test
GainModulation_ReducesSpectralEnergy_QuietToLoudTransient shows that this
reduces HF spectral leakage by 10x by smoothing the step function seen
by the MDCT.
Previously these curves were discarded by a !anyActive guard and only a
p0-only cross-frame correction was applied. Now they fall through to the
normal processing path, with two guards:
1. kMaxAmplifyLevel=7 cap: limits amplification to 8x max.
2. kMinHfrForAmplify=0.3 threshold: skips amplifying curves when HFR is
low. Low HFR means HPF gain[] doesn't represent the full-band signal:
a tiny HPF transient can produce level 9 (32x amplification) on a
loud full-band signal, catastrophically over-inflating MDCT
coefficients and exhausting the bit budget (causing noise flashes).
Also precomputes anyAttenuating before point0 insertion so it correctly
reflects the CalcCurve output.
Results vs baseline (new_psy branch):
spine: pre-echo worse 55->45/1804 (-18%), SNR 20.9->21.9 dB (+1.0)
riddler: pre-echo worse 10->11/479, SNR 23.7->24.2 dB (+0.5)
0 noise flashes on all tracks.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
overlapRatio (full-band PCM energy) was used to raise dynamicMinScore,
suppress aggressive curves, delay early-attack points, and soft-cap
point0 level. This caused bass-heavy previous frames to suppress
legitimate HPF-domain transients in the current frame — the two domains
are only loosely correlated.
Replace with hpfOverlapRatio = mean(prevGain[]) / mean(curGain[]),
both in the HPF-upsampled analysis domain, stored per-band per-channel
in TCurveBuilderCtx::LastHpfEnergy. Full-band overlapRatio is retained
for the kLowOverlapRelax "attack frame" checks (overlapRatio < 0.6)
and for YAML logging.
Regression (0 flashes maintained):
riddler: pre-echo worse 12→10/479, median SNR 25.1→25.4 dB
spine: pre-echo worse 65→55/1804, median SNR unchanged
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
The threshold approach (skip if p0Level > 7) and the clamp approach
(min(p0Level, 7)) are now equivalent since p0Level is computed from
plain windowed RMS of bufNext rather than the inflated rmsNextMod.
Any genuine Level>7 case (bufCur ≤ 1/8 of bufNext) gets capped to
Level 7 (8× amplification) instead of being skipped, providing at
least partial cross-frame normalisation.
|
At the no_active_points skip path, when CalcCurve detected a mild
transient but all adjusted levels ended up >= 4 (no attenuation
needed), the existing code discarded the entire curve including the
computed p0.
If bufCur is significantly quieter than bufNext (Level 5-7 range),
emitting a p0-only curve lets Modulate amplify the quiet bufCur before
MDCT, normalising the cross-frame energy step that causes HF leakage
(pre-echo).
Key details:
- Recompute the ratio against plain windowed RMS of bufNext rather
than rmsNextMod. Using rmsNextMod would inflate p0Level when the
discarded curve points have Level>4, causing over-correction.
- Cap at Level 7 (8x amplification) to avoid bit allocation distortion
from amplifying extremely quiet frames.
Results on test tracks (baseline: 0 flashes, 107/1804 spine pre-echo):
spine: 0 flashes, 64/1804 pre-echo (was 107, -40%)
riddler: 0 flashes, 12/479 pre-echo (was 6, slight increase)
SNR: 20.8 dB spine / 23.5 dB riddler (unchanged)
|
This reverts commit b9afb2b09446e6d322421554234c33b17d138e51.
|
Three complementary guards based on YAML-log analysis of pre-echo frames:
1. Raise kHfrRef 0.30 → 0.50: the HFR-proportional minScore scale now kicks
in earlier, suppressing candidates with score < 5.7× on frames where the
spectral upsampler is operating below half its reference HFR.
2. Add kMinReliableHfr = 0.12: hard skip for bands where HFR is so low that
even the scaled minScore cannot filter all spurious multi-point curves.
Logged as 'skip: low_hfr_unreliable' in the YAML stream.
3. Add kMinGainLevel = 3e-4: skip gain processing entirely when the loudest
32-subframe RMS is below this threshold. Near-silence bands produce
extreme relative ratios from tiny absolute spikes, generating gain curves
that worsen reconstruction noise rather than reducing pre-echo.
Riddler (10 s): pre-echo worse 6 → 5/479, mean SNR gain 22.9 → 23.5 dB.
Spine (30 s): pre-echo worse 107 → 26/1804, mean SNR gain 20.3 → 20.6 dB.
Noise flashes remain 0 on both tracks. All 18 unit tests pass.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Inspired by edge264's YAML logging approach (FOSDEM 2026): emit every
gain control decision as a self-contained YAML stream so any frame can
be grep'd, analyzed with Python, or used to craft custom test curves.
./atracdenc -e atrac3 -i in.wav -o out.oma --yaml-log gain.yaml
Each YAML document covers one frame (frame:/time: header) with channels
and bands nested below. Per-band fields emitted in pipeline order:
high_freq_ratio, overlap_ratio, dynamic_min_score, gain[], next_level
pcm_qmf (256 raw QMF samples, non-modulated, non-windowed)
curve_raw, rms_cur, rms_next_mod, point0_level, crossover
curve_final, max_gain, ratio, level_boost, scale_boost, total_boost
gain_boost (or skip: <reason> at each early-exit point)
New files / changes:
src/yaml_log.h - TYamlFmtGuard (stream format RAII),
YamlWriteFloatSeq (vector + ptr/len overloads)
src/atrac/at3/atrac3.h - YamlLog field in TAtrac3EncoderSettings
src/atrac3denc.h - FrameNum counter + YamlLog pointer on TAtrac3Encoder
src/atrac3denc.cpp - frame header in GetLambda, incremental YAML in
CreateSubbandInfo at every decision point
src/main.cpp - --yaml-log <file> CLI flag
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Bands 2-3 (>~11 kHz) carry little audible masking energy, so pre-echo
there is largely inaudible. Instead of modulating gain in those bands,
detect their transients and redirect the resulting bit boost to band 0
where it benefits audible-range reconstruction.
Also simplify per-band score thresholds to a single kMinScore=1.9
constant — with bands 2-3 no longer emitting gain curves, the
per-band distinction had no measurable effect.
Results vs previous commit:
riddler: 10/479 (unchanged)
spine: 59/1804 (was 61)
flashes: 0
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Rewrites the ATRAC3 gain control pipeline to eliminate noise flashes
and substantially reduce pre-echo artifacts:
Transient detection & curve building (transient_detector.cpp/h):
- Replace legacy heuristic detector with ratio-scored DetectTransients:
score = peak/floor for rising (c/a) or falling (a/c) triplets, scored
and ranked, top-N kept, sorted by location
- Add explicit point0 derived from windowed-RMS match between bufCur and
the curve-modulated bufNext (CalcWindowedRmsAfterCurve)
- Replace RegionMax with RegionRMS for smoother region amplitude estimate
- Add per-band detection thresholds kMinScorePerBand[4] = {1.9,1.9,2.1,2.2}
- Dynamic minScore: scale threshold by min(1.5, max(1.0, overlapRatio)) to
suppress false-positive curves when previous frame dominates
- Scale constraint: curve[0].Level >= 3 to cap cross-frame amplification at 2x
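A simplified Python sketch of the ratio-scored triplet detector and the dynamic threshold described above. Requiring strictly monotone triplets and reporting the middle subframe as the location are assumptions of this example, as are the function names.

```python
def detect_transients(gain, min_score, top_n, eps=1e-12):
    """Score monotone triplets (a, b, c), keep the top-N, sort by location."""
    candidates = []
    for i in range(len(gain) - 2):
        a, b, c = gain[i:i + 3]
        if a < b < c:                      # rising: peak / floor = c / a
            score = c / max(a, eps)
        elif a > b > c:                    # falling: peak / floor = a / c
            score = a / max(c, eps)
        else:
            continue
        if score >= min_score:
            candidates.append((score, i + 1))
    top = sorted(candidates, reverse=True)[:top_n]
    return sorted(loc for _, loc in top)


def dynamic_min_score(base, overlap_ratio):
    """Raise the threshold by up to 1.5x when the previous frame dominates."""
    return base * min(1.5, max(1.0, overlap_ratio))
```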
Bit allocation (atrac3_bitstream.cpp/h):
- Add GainBoostPerBand[NumQMF] to TSingleChannelElement, computed in
CreateSubbandInfo and applied in CalcBitsAllocation
- levelBoost: compensate for Demodulate's GainLevel[minLevel] attenuation
- scaleBoost: compensate for next-frame cross-frame scale via lookahead
- Both capped (kLevelBoostCap=1, kScaleBoostCap=2) to avoid bit starvation
Upsampler (atrac3denc.cpp):
- Raise cutoff from 600 Hz to 800 Hz for tighter band separation
Tests (gain_processor_ut.cpp):
- Relax fixed curve shape assertions to ExpectCurveReasonable (bounds checks)
- Relax quantization error bound for dense-event spacing (<=128 samples)
Results (branch new_psy vs original baseline):
riddler: pre-echo worse 43->15/479 frames, 0 flashes
spine: pre-echo worse 192->107/1804 frames, 0 flashes
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
tracking
Encoder (atrac3denc.cpp / atrac3denc.h):
- Replace old TTransientParam / TransientParamsHistory with
TSpectralUpsampler-based CreateSubbandInfo(): analyses the upsampled QMF
band, computes gain[] and nextLevel from the contiguous look-ahead buffer,
and calls CalcCurve to build ATRAC3 gain-curve points.
- highFreqRatio guard: skip CalcCurve for sub-bass bands where the HPF signal
is too weak to produce meaningful gain control.
CalcCurve (transient_detector.cpp):
- Fix Issue 1 (FFT-window context mismatch at frame boundary):
Store ctx.LastLevel = in.back() instead of target (nextLevel).
in.back() and the next call's gain[0] are both analysis-domain estimates of
adjacent 8-sample blocks — no cross-domain FFT-window divergence that produced
false boundary transients.
- Guard against zero savedLastLevel (first frame or post-reset): return empty
curve rather than emitting scaleLevel=15 (GainLevel=1/2048) which would cause
extreme amplification in the gain modulator.
- Tighten gain-point budget to 7 (< MaxGainPointsNum=8) to match the 3-bit
count field in the ATRAC3 bitstream.
Tests (gain_processor_ut.cpp):
- Add BoundaryLevelMismatch suite: Issue1_FalseTransientOnConstantTone_AfterOnset,
Issue1_MdctRoundtrip_NoGain, Issue1_MdctRoundtrip_WithGain,
Issue1_RoundtripWithGainAndQuantization.
- Quantization test threshold set to 400× kQuantStep: correct two-point gain
curves for a 9:1 amplitude-ratio signal produce at most ~323× peak error
(scale×level=16 × ~8× IMDCT base noise); pathological false transients would
cause signal-level reconstruction errors well above this bound.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
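The two CalcCurve guards in this message can be sketched roughly as follows. This is only an illustration: `TGainPoint`, `FitsCountField` and `GuardCurve` are hypothetical names, not the project's API, and the real code may enforce the budget inside the transient search rather than after it. The only facts taken from the message are that a 3-bit count field can represent at most 7 points and that a zero saved level yields an empty curve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one ATRAC3 gain-curve point.
struct TGainPoint {
    int Level;
    int Location;
};

constexpr unsigned kCountFieldBits = 3;
// Largest point count a 3-bit field can carry: 2^3 - 1 = 7.
constexpr std::size_t kMaxCodedPoints = (1u << kCountFieldBits) - 1;

inline bool FitsCountField(std::size_t numPoints) {
    return numPoints <= kMaxCodedPoints;
}

// Returns an empty curve when no valid previous level exists (first frame
// or post-reset); otherwise passes the curve through after checking that
// it is small enough to be written to the bitstream.
inline std::vector<TGainPoint> GuardCurve(std::vector<TGainPoint> points,
                                          float savedLastLevel) {
    if (savedLastLevel <= 0.0f) {
        return {};
    }
    assert(FitsCountField(points.size()));
    return points;
}
```

With this shape, a first-frame call such as `GuardCurve(points, 0.0f)` produces no gain points at all, rather than a point at the extreme end of the level table.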
|
| | |
| | |
- Add LookAheadBuf[2][4][640] per channel/band with layout
[prev_128 | current_256 | lookahead_256]; the first 512 samples
are a ready-made input for TSpectralUpsampler::Process()
- Return LOOK_AHEAD on first lambda call so the encoder always has
the next frame available when building gain curves
- QMF output goes directly into the appropriate slot (offset 128 or
384) avoiding any intermediate copy; advance via single memmove
- Remove float* in[4] from CreateSubbandInfo; source data now read
from upInput[band]+128 (current slot of LookAheadBuf)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
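A minimal sketch of the buffer layout described above, for a single channel/band slot. The struct and its method names are hypothetical. The exact priming logic (first frame written at offset 128, later frames at 384, then a shift by one frame) is an assumption inferred from the message, not the actual encoder code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kPrevLen = 128;
constexpr std::size_t kFrameLen = 256;
constexpr std::size_t kBufLen = kPrevLen + 2 * kFrameLen;  // 640
constexpr std::size_t kCurOffset = kPrevLen;               // 128
constexpr std::size_t kNextOffset = kPrevLen + kFrameLen;  // 384

// Hypothetical model of one LookAheadBuf[channel][band] row:
// [prev_128 | current_256 | lookahead_256]
struct TLookAheadSlot {
    float Data[kBufLen] = {};
    bool Primed = false;  // false until the very first frame was written

    // Destination for the next QMF output frame (no intermediate copy).
    float* WritePtr() { return Data + (Primed ? kNextOffset : kCurOffset); }

    // Call after a frame was written. Returns false on the first frame,
    // when the caller still has to wait for look-ahead data.
    bool FrameWritten() {
        if (!Primed) {
            Primed = true;
            return false;
        }
        return true;
    }

    // Current frame, and the first 512 samples used as upsampler input.
    const float* Current() const { return Data + kCurOffset; }
    const float* UpsamplerInput() const { return Data; }

    // Shift by one frame: the tail of "current" becomes "prev" and the
    // look-ahead frame becomes "current". Regions overlap, hence memmove.
    void Advance() {
        assert(Primed);
        std::memmove(Data, Data + kFrameLen,
                     (kBufLen - kFrameLen) * sizeof(float));
    }
};
```

After `Advance()`, the look-ahead slot at offset 384 is free for the next QMF frame, so the steady-state cost per frame is one write and one `memmove`.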
|
| | |
| | |
CalcCurve (transient_detector.cpp/h):
- Recursive divide-and-conquer FindTransients scans the gain vector for
monotonic 3-subframe windows (rising or falling); kMinScore=2.0 filters
out oscillations smaller than a factor of 2 (no-op at Level 4).
- RelationToIdx maps an amplitude ratio to an ATRAC3 gain Level index.
- TCurveBuilderCtx carries LastLevel across frames; CalcCurve prepends it
as a virtual boundary element to detect Location=0 attacks.
- budget=8 matches ATRAC3 SubbandInfo::MaxGainPointsNum.
TSpectralUpsampler (transient_spectral_upsampler.cpp/h):
- Applies a Planck-taper window (ε=0.15) to a 512-sample context window,
forward-FFTs, applies a 3-bin raised-cosine HPF, zero-pads to 4096 bins,
and inverse-FFTs to give an 8× upsampled output.
- Returns highFreqRatio = Σ|X[k]·H[k]|²/Σ|X[k]|²; callers skip CalcCurve
when this is below kHighFreqThreshold=0.05, preventing false transients
from Planck noise-floor variation in sub-cutoff frames.
Tests:
- gain_processor_ut: upsampled-path blocks added to all FreqDomain tests;
CalcCurve negative tests (NegativeTests suite).
- transient_spectral_upsampler_ut: OutputSize, DCIsRemovedByLowCutFilter,
HighFreqSinePreservesRMS (parametrised), ChirpNoTransient (0→5510 Hz
sweep at 689 Hz low-cut, Len1024/16384/262144).
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
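The highFreqRatio formula above can be sketched as follows. The function names are hypothetical, and the sketch assumes that H is a real-valued per-bin filter response, so that |X[k]·H[k]|² = |X[k]|²·H[k]². Returning 0 for an all-zero spectrum is also an assumption.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Fraction of spectral energy that survives the high-pass response H:
//   sum |X[k] * H[k]|^2 / sum |X[k]|^2
inline float HighFreqRatio(const std::vector<std::complex<float>>& X,
                           const std::vector<float>& H) {
    assert(X.size() == H.size());
    double total = 0.0;
    double passed = 0.0;
    for (std::size_t k = 0; k < X.size(); ++k) {
        const double energy = static_cast<double>(std::norm(X[k]));  // |X[k]|^2
        const double h = static_cast<double>(H[k]);
        total += energy;
        passed += energy * h * h;
    }
    // Assumption: treat silence as "nothing above the cutoff".
    return total > 0.0 ? static_cast<float>(passed / total) : 0.0f;
}

// Callers skip CalcCurve when the ratio is below the threshold
// (kHighFreqThreshold = 0.05 in the message above).
inline bool ShouldRunCalcCurve(float ratio, float threshold = 0.05f) {
    return ratio >= threshold;
}
```

A frame whose energy sits almost entirely below the cutoff gives a ratio near 0 and is skipped, which is the case the message describes as prone to false transients.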
|
| | |
| | |
Demonstrates handling a burst where the post-burst quiet level (A_after=2)
differs from the pre-burst quiet level (A_before=1).
Strategy: normalise to A_after using {{5,4},{2,12}}:
- scale = GainLevel[5] = 0.5 → amplifies quiet prefix ×2 to A_after
- Level=2 (GainLevel=4) attenuates burst ÷4 to A_after
- Remainder [104..255] already at A_after → untouched
Modulated bufNext is uniform A_after throughout → near-zero HF leakage
in both frame 1 and frame 2. No compensating gain needed. Release ramp
uses gainInc = 2^(-2/8) (GainLevel 4.0→1.0, 2-octave span to neutral).
Includes full TDAC round-trip verification.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
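The release-ramp arithmetic quoted above can be checked with a short sketch. It assumes the level table follows GainLevel[i] = 2^(4-i), which agrees with every value quoted in these messages (GainLevel[5] = 0.5, GainLevel[2] = 4, Level 4 as neutral). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr int kNeutralLevel = 4;  // GainLevel == 1.0
constexpr int kRampLen = 8;       // samples in one transition

// Assumed level table: GainLevel[i] = 2^(4 - i).
inline double GainLevel(int level) {
    assert(level >= 0 && level < 16);
    return std::exp2(static_cast<double>(kNeutralLevel - level));
}

// Per-sample multiplier that carries GainLevel(level) to 1.0 in 8 steps.
// For Level=2 this is 2^(-2/8), as stated in the message.
inline double RampIncrement(int level) {
    return std::exp2(static_cast<double>(level - kNeutralLevel) / kRampLen);
}

// Gain values at the start of the ramp, after each step, and at its end.
inline std::vector<double> RampToNeutral(int level) {
    std::vector<double> gains(kRampLen + 1);
    gains[0] = GainLevel(level);
    const double inc = RampIncrement(level);
    for (int i = 1; i <= kRampLen; ++i) {
        gains[i] = gains[i - 1] * inc;
    }
    return gains;
}
```

For Level=2 the ramp starts at 4.0 and reaches 1.0 after eight multiplications by 2^(-2/8), which is the 2-octave span to neutral that the message mentions.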
|
| | |
| | |
Both tests now normalise to the dominant amplitude using the minimum
number of gain points, with no compensating gain needed in frame 2:
- AttackAndRelease (QUIET→LOUD→QUIET burst): switch from 3-point
{{7,4},{4,12},{7,31}} (normalise to A_loud, needed {{1,1}} frame 2
compensation) to 2-point {{4,4},{1,12}} (scale=1.0 leaves quiet
bufCur unchanged, loud burst attenuated ÷8 to A_quiet, quiet tail
falls in untouched remainder). Strengthen frame 2 assertion to ×10.
- ReleaseAndAttack (LOUD→QUIET→LOUD dip): new test with 2-point
{{4,4},{7,12}} (scale=1.0 leaves loud bufCur unchanged, quiet dip
amplified ×8 to A_loud, loud tail falls in untouched remainder).
Frame 2 is plain A_loud with no compensating gain.
Both include full TDAC round-trip verification.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
| | |
| | |
Demonstrates that a QUIET->LOUD transient inside bufNext can be handled
in the current frame by amplifying the quiet prefix to match the loud
suffix, rather than deferring to the next frame.
Signal (3 frames):
frame 0: quiet (A=1) -- primes overlap
frame 1: quiet[0..55] + attack ramp[56..63] + loud[64..255]
frame 2: all loud (A=8) -- continuation
Gain: {{7, 7}} on frame 1
Level=7 -> GainLevel[7]=0.125 -> scale=0.125 (amplify x8)
Location=7 -> lastPos=56; constant [0..55] amplified x8 to A_loud
Transition [56..63]: level 0.125->1.0 at gainInc_atk=2^(3/8) rate;
signal pre-shaped with matching ramp so Modulate divides it out
exactly -> uniform A_loud*sin across the entire MDCT window.
Remainder [64..255]: loud signal untouched (already at A_loud). No
compensating gain needed on frame 2 -- loud bufCur and loud bufNext
are already matched.
Assertions:
EXPECT_LT(hf1_mod * 10, hf1_nomod) -- frame 1: HF leakage reduced >10x
EXPECT_LE(hf2_mod * 10, hf2_nomod) -- frame 2: no regression
Round-trip: Mdct(Modulate) -> Midct(Demodulate) recovers signal.
frame 1 Midct: Demodulate(siCur=empty, siNext={{7,7}})
frame 2 Midct: Demodulate(siCur={{7,7}}, siNext=empty)
Verified with EXPECT_NEAR(..., 1e-5) over frames 1 and 2.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
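To make the Location/Level arithmetic above concrete, here is a rough sketch that expands a list of gain points into a per-sample envelope (the value each sample would be divided by). It is not the project's Modulate code: `TGainPoint` and `BuildEnvelope` are hypothetical, the level table is assumed to be GainLevel[i] = 2^(4-i), and the exact sample at which each ramp step takes effect is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct TGainPoint {  // hypothetical stand-in
    int Level;
    int Location;
};

constexpr int kNeutralLevel = 4;       // GainLevel == 1.0
constexpr std::size_t kLocScale = 8;   // samples per Location step
constexpr std::size_t kFrameLen = 256;

inline double GainLevel(int level) {
    assert(level >= 0 && level < 16);
    return std::exp2(static_cast<double>(kNeutralLevel - level));
}

// Each point holds its level up to Location*8, then ramps geometrically
// over 8 samples to the next point's level (or to neutral after the last
// point). Samples past the last ramp stay at 1.0 (untouched remainder).
inline std::vector<double> BuildEnvelope(const std::vector<TGainPoint>& pts) {
    std::vector<double> env(kFrameLen, 1.0);
    std::size_t pos = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const int level = pts[i].Level;
        const int next = (i + 1 < pts.size()) ? pts[i + 1].Level : kNeutralLevel;
        const std::size_t lastPos =
            static_cast<std::size_t>(pts[i].Location) * kLocScale;
        assert(lastPos >= pos && lastPos + kLocScale <= kFrameLen);
        double g = GainLevel(level);
        for (; pos < lastPos; ++pos) {
            env[pos] = g;  // constant segment
        }
        const double inc =
            std::exp2(static_cast<double>(level - next) / kLocScale);
        for (std::size_t j = 0; j < kLocScale; ++j, ++pos) {
            env[pos] = g;  // transition segment
            g *= inc;
        }
    }
    return env;
}
```

For `{{7, 7}}` this gives 0.125 over [0..55], a rising transition over [56..63] with per-sample factor 2^(3/8), and 1.0 over [64..255], matching the segments listed in the message.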
|
| | |\
| |/
|/| |
|
| | |
| |
| | |
Signed-off-by: Ronnie Sahlberg <[email protected]>
|