Commit message | Author | Age | Files | Lines
* Drop FileModeToWide allow-list [fix_input_file_path] | Daniil Cherednik | 2026-05-03 | 1 | -40/+2
|   Widen the ASCII fopen mode string with std::wstring's iterator-pair constructor instead of maintaining a hand-coded mapping. ASCII code points (0-127) are valid wchar_t code units on Windows, and ASCII is the only alphabet fopen mode strings use.
|
|   Co-Authored-By: Claude Opus 4.7 <[email protected]>
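A minimal sketch of the widening described above. The helper name WidenAsciiMode is hypothetical (the log does not show the project's code); the point is that the iterator-pair constructor converts each char to one wchar_t, which is only correct because mode strings are pure ASCII. On Windows the result could be passed as the mode argument of _wfopen.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widen an ASCII-only fopen mode ("rb", "w+b", ...) to a
// wide string. Each char becomes exactly one wchar_t, so this is valid only
// for code points 0-127; non-ASCII bytes would not be decoded correctly.
std::wstring WidenAsciiMode(const std::string& mode) {
    for (unsigned char c : mode) {
        assert(c < 0x80 && "fopen modes are ASCII-only");
    }
    return std::wstring(mode.begin(), mode.end());
}
```

Compared with an allow-list of known modes, this accepts any valid mode string without having to enumerate the combinations.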
* Clean up UTF-8 path handling | Daniil Cherednik | 2026-05-03 | 9 | -91/+188
|   Reuse the shared UTF-8 path helper in Media Foundation, normalize compressed-output open errors, cover AEA encode/decode paths, and expand integration tests for ATRAC1 and decode filenames.
* Fix UTF-8 input and output paths | Daniil Cherednik | 2026-05-03 | 9 | -21/+368
|   Report libsndfile open failures before sample-rate validation, use UTF-16 opens on Windows for PCM and compressed containers, and add integration tests for a missing input file and for UTF-8 input/output filenames.
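The log does not show the shared path helper. As a hedged illustration of what "use UTF-16 opens on Windows" involves, the sketch below decodes a UTF-8 path into UTF-16 code units by hand. The name Utf8ToUtf16 is hypothetical, and the real code may simply call the Win32 MultiByteToWideChar function instead; char16_t is used here only to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units. On Windows, where
// wchar_t is 16-bit, the same units could be copied into a std::wstring and
// passed to a wide open call.
std::u16string Utf8ToUtf16(const std::string& in) {
    static const uint32_t kMinForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        uint32_t cp = 0;
        size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (len > in.size() - i) throw std::invalid_argument("truncated UTF-8 sequence");
        for (size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, surrogate code points and values past U+10FFFF.
        if (cp < kMinForLen[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");
        if (cp >= 0x10000) {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += len;
    }
    return out;
}
```

Throwing on malformed input lets the caller report a clear open error instead of passing a mangled path to the OS.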
* Package MSYS2 runtime artifact (#71) | Daniil Cherednik | 2026-05-03 | 2 | -1/+142
|
* Add MSYS2 build support (#70) | Daniil Cherednik | 2026-05-03 | 9 | -37/+143
|   Add a selectable PCM I/O backend so MSVC builds can keep Media Foundation while MSYS2/MinGW builds use libsndfile. Teach the libsndfile finder about MINGW_PREFIX and add a Windows MSYS2 CI job that builds the libsndfile backend. Fix and enable tests for MSYS2 builds.
* Fix mono AT3 RIFF channel count. [issues/67] | Daniil Cherednik | 2026-05-02 | 1 | -1/+1
|   AT3 RIFF output always stores a two-channel ATRAC3 stream, so the header channel count must be two even for mono input. Mono input is encoded either as two duplicated copies of the single channel or as joint stereo with an empty side channel.
* Initialize bit allocator LastLambda during the start allocation. Fix issues/69 | Daniil Cherednik | 2026-05-02 | 1 | -0/+1
|
* Merge pull request #66 from dcherednik/new_psy | Daniil Cherednik | 2026-04-20 | 8 | -18/+581
|\  Conservative initial implementation of ATRAC3 tonal extraction. This is a safe first step tuned for stability and bitstream compatibility rather than maximum aggressiveness. It helps most on synthetic signals and on material with strong, steady pure tones (for example, simple electronic leads and solo tonal instruments).
| * atrac3: reimplement tonal encoding. Use flatness-based tonal extraction [new_psy] | Daniil Cherednik | 2026-04-20 | 8 | -18/+581
| |   - Add a shared CalcSpectralFlatnessPerBfu helper in atrac_psy_common with BFU-table mapping.
| |   - Implement ATRAC3 tonal extraction: compute MDCT energy, estimate per-BFU flatness, extract the strongest tonal run of up to 5 bins in low-flatness BFUs, and zero the extracted bins in the residual.
| |   - Map extracted tonal bins into TTonalBlocks and integrate them into bitstream coding.
| |   - Update ATRAC3 bit allocation: reduce residual bits for BFUs with tonal blocks and increase tonal quantizer selection.
| |   - Restore the --notonal CLI option in main.cpp for A/B comparison.
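The flatness measure named above is presumably the usual spectral flatness: the geometric mean of bin power divided by its arithmetic mean, near 0 for tonal BFUs and near 1 for noise-like ones. The sketch below computes it per BFU from a table of BFU start offsets and extracts the strongest run of up to five bins, zeroing it in the residual. All function and struct names are hypothetical, and the real encoder may use different energy measures, run selection, or thresholds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Spectral flatness of bins [begin, end): geometric mean of power / arithmetic mean of power.
double SpectralFlatness(const std::vector<float>& mdct, size_t begin, size_t end) {
    if (end <= begin) return 1.0;
    const double eps = 1e-12;  // keeps log() finite for zero bins
    double logSum = 0.0, sum = 0.0;
    for (size_t i = begin; i < end; ++i) {
        const double p = static_cast<double>(mdct[i]) * mdct[i] + eps;
        logSum += std::log(p);
        sum += p;
    }
    const double n = static_cast<double>(end - begin);
    return std::exp(logSum / n) / (sum / n);
}

// bfuStart holds NumBfu + 1 offsets; BFU b covers [bfuStart[b], bfuStart[b + 1]).
std::vector<double> FlatnessPerBfu(const std::vector<float>& mdct, const std::vector<size_t>& bfuStart) {
    std::vector<double> out;
    for (size_t b = 0; b + 1 < bfuStart.size(); ++b)
        out.push_back(SpectralFlatness(mdct, bfuStart[b], bfuStart[b + 1]));
    return out;
}

struct TTonalRun { size_t Start = 0; size_t Len = 0; std::vector<float> Values; };

// Find the highest-energy run of up to maxLen consecutive bins inside
// [begin, end), copy it out, and zero those bins in the residual.
TTonalRun ExtractStrongestRun(std::vector<float>& residual, size_t begin, size_t end, size_t maxLen = 5) {
    TTonalRun run;
    if (end <= begin) return run;
    const size_t len = std::min(maxLen, end - begin);
    double best = -1.0;
    for (size_t s = begin; s + len <= end; ++s) {
        double e = 0.0;
        for (size_t i = s; i < s + len; ++i) e += static_cast<double>(residual[i]) * residual[i];
        if (e > best) { best = e; run.Start = s; }
    }
    run.Len = len;
    for (size_t i = run.Start; i < run.Start + len; ++i) {
        run.Values.push_back(residual[i]);
        residual[i] = 0.0f;  // tonal energy now travels in the tonal block instead
    }
    return run;
}
```

A caller would run the extraction only on BFUs whose flatness falls below some threshold; the threshold atracdenc uses is not stated in the log.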
* | at3: write fact chunk, fix bytes_per_frame and chunk sizes | hilman2 | 2026-04-17 | 1 | -5/+50
|/  The AT3-in-WAV writer produces headers that Sony's psp_at3tool rejects for files longer than about forty seconds. The tool prints "input file is illegal file or over 2G Byte" and refuses to decode. ffmpeg accepts the file but decodes it without any encoder-delay compensation, leaving a variable lag of several hundred samples relative to the source. Both observations share a root cause: the header we write is missing fields that downstream decoders rely on. This patch addresses three concrete issues in src/at3.cpp.
|
|   First, the writer emits no fact chunk. The fact chunk is optional in the general RIFF specification, but it is how WAVEFORMATEX-based codecs announce the total number of decoded samples per channel. psp_at3tool uses the sample count together with samples-per-frame to decide how much PCM to produce and where to stop. Without a fact chunk the tool falls back to a short default and either truncates the output or, for longer streams, rejects the file outright. ffmpeg uses the same field to skip encoder priming samples. Sony's own AT3 files carry this chunk with a fixed eight-byte payload containing total_samples and samples_per_frame. We now write the same structure.
|
|   Second, the bytes_per_frame field in the ATRAC3 extradata was hardcoded to 0x10 with an XXX comment. The correct value for standard ATRAC3 is 0x1000 (4096), the number of PCM bytes represented by one frame (1024 samples per channel times two channels times two bytes per sample). Sony's encoder writes 4096 at this offset, and both ffmpeg and psp_at3tool validate against that number. The previous value of sixteen bytes per frame was nonsensical and was part of why psp_at3tool misestimated the playback length.
|
|   Third, the RIFF chunk_size field was written as the full file size. By the RIFF specification this field holds the size of everything that follows the field itself, that is, file_size minus eight. Writing the full size is tolerated by ffmpeg, but it violates the specification and makes the file look larger than it is to strict parsers.
|
|   Because the PCM engine can flush additional frames after the initially estimated numFrames count (due to the look-ahead tail during encoding), the three length fields chunk_size, total_samples, and subchunk2_size were stale by one to three frames relative to the actual data on disk. To keep them consistent, TAt3 now counts frames as WriteFrame is called and, in the destructor, seeks back to overwrite the three length fields, so the final file describes its real contents.
|
|   The patch is purely a container-metadata fix; the encoded AT3 payload is byte-identical to before. After this change, atracdenc output for long test tracks (90 and 186 seconds, 132 kbps LP2) is accepted and fully decoded by psp_at3tool in a single pass, and ffmpeg decodes with a constant small codec latency instead of the previous variable drift. This made it possible to run a proper three-way comparison against Sony's reference encoder, which previously looked catastrophic (a gap of around -22 dB SNR) purely because of the alignment problem, but sits at roughly -0.5 to -1.4 dB SNR once the container headers are correct.
|
|   Signed-off-by: hilman2 <[email protected]>
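To make the three header fixes concrete, here is a hedged sketch that writes a simplified RIFF/WAVE layout into a byte buffer, counts frames as they are written, and patches the length fields at the end (standing in for "seek back in the destructor"). The type TRiffWriter and its helpers are hypothetical; the fmt chunk body, which in a real AT3 file carries the ATRAC3 extradata, is passed in as opaque even-sized bytes; and total_samples is taken as frames x 1024 with no encoder-delay adjustment. None of this is the actual code in src/at3.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kSamplesPerFrame = 1024;
// bytes_per_frame: PCM bytes represented by one frame (1024 samples x 2 channels x 2 bytes).
constexpr uint32_t kBytesPerFrame = kSamplesPerFrame * 2 * 2;
static_assert(kBytesPerFrame == 0x1000, "standard ATRAC3 bytes_per_frame");

void PutTag(std::vector<uint8_t>& b, const char* t) { b.insert(b.end(), t, t + 4); }
void PutLE32(std::vector<uint8_t>& b, uint32_t v) {
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
void PatchLE32(std::vector<uint8_t>& b, size_t off, uint32_t v) {
    for (int i = 0; i < 4; ++i) b[off + i] = static_cast<uint8_t>(v >> (8 * i));
}
uint32_t ReadLE32(const std::vector<uint8_t>& b, size_t off) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(b[off + i]) << (8 * i);
    return v;
}

struct TRiffWriter {
    std::vector<uint8_t> Buf;
    size_t RiffSizeOff = 0, FactSamplesOff = 0, DataSizeOff = 0;
    uint32_t Frames = 0, DataBytes = 0;

    // Writes the header with zero placeholders for the three length fields.
    explicit TRiffWriter(const std::vector<uint8_t>& fmtBody) {
        PutTag(Buf, "RIFF"); RiffSizeOff = Buf.size(); PutLE32(Buf, 0); PutTag(Buf, "WAVE");
        PutTag(Buf, "fmt "); PutLE32(Buf, static_cast<uint32_t>(fmtBody.size()));
        Buf.insert(Buf.end(), fmtBody.begin(), fmtBody.end());
        PutTag(Buf, "fact"); PutLE32(Buf, 8);  // fixed eight-byte payload
        FactSamplesOff = Buf.size(); PutLE32(Buf, 0); PutLE32(Buf, kSamplesPerFrame);
        PutTag(Buf, "data"); DataSizeOff = Buf.size(); PutLE32(Buf, 0);
    }
    void WriteFrame(const std::vector<uint8_t>& frame) {
        Buf.insert(Buf.end(), frame.begin(), frame.end());
        ++Frames;
        DataBytes += static_cast<uint32_t>(frame.size());
    }
    // Overwrite the length fields from what was actually written.
    void Finalize() {
        PatchLE32(Buf, RiffSizeOff, static_cast<uint32_t>(Buf.size() - 8));  // everything after the size field
        PatchLE32(Buf, FactSamplesOff, Frames * kSamplesPerFrame);
        PatchLE32(Buf, DataSizeOff, DataBytes);
    }
};
```

Deriving all three fields from counters updated in WriteFrame keeps them consistent even when the encoder flushes more frames than it originally estimated.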
* [AT3] refactor bitstream allocation loop to bs_encode | Daniil Cherednik | 2026-04-11 | 2 | -267/+309
|
* Remove dead transient hooks, hack override, and unused RMS helper | Daniil Cherednik | 2026-04-09 | 3 | -143/+1
|
* Merge new_psy -> master | Daniil Cherednik | 2026-04-09 | 30 | -386/+7310
|\  New experimental gain-control code for ATRAC3.
| * Fix Windows C++17 build and MSVC flag warnings | Daniil Cherednik | 2026-04-09 | 2 | -9/+11
| |
| * atrac3: add boundary transient thresholding to prune low-value gain transitions | Daniil Cherednik | 2026-04-08 | 1 | -4/+54
| |   Problem
| |   Gain-curve generation emitted many +/-1 level transitions that do not correspond to strong local transients. These points consume gain-info bits and can create low-level modulation artifacts without improving transient handling.
| |
| |   Solution
| |   Introduce explicit transient-evidence gating at transition boundaries in CalcCurve(), and wire it to the existing dynamic min-score path.
| |
| |   What changed
| |   - Added BoundaryTransientScore(env, loc, win):
| |     - computes a local ratio around each subframe boundary
| |     - R = max(max_right/max_left, max_left/max_right)
| |     - short symmetric window (win = 3 subframes)
| |   - Re-enabled minScore usage in CalcCurve() (previously ignored).
| |   - For each level-transition candidate at loc = sf+1:
| |     - keep unconditionally if loc == targetSf (tail neutral anchor)
| |     - keep unconditionally if |deltaLevel| >= 2 (strong step)
| |     - otherwise keep only if BoundaryTransientScore(loc) >= minScore
| |   - Added YAML telemetry:
| |     - transient_min_score
| |     - transient_window
| |     - transition_pruned {loc, delta, score}
| |
| |   Why this is safe
| |   - Strong transitions are preserved.
| |   - The rightmost transition is preserved to keep proper return-to-neutral anchoring.
| |   - Only low-confidence small toggles are removed.
| |
| |   Measured impact (current branch comparison)
| |   Baseline: ea4d33b38 (before this change)
| |   Tracks: show_me_your_spine.wav, 13.wav
| |   Gain-info bits / points:
| |   - spine: 191,697 -> 150,297 bits (delta -41,400; -21.6%); 15,593 -> 10,993 points (delta -4,600)
| |   - 13.wav: 1,299,035 -> 979,931 bits (delta -319,104; -24.6%); 97,035 -> 61,579 points (delta -35,456)
| |
| |   Subjective note
| |   User listening reports improved sound and fixes for some low-level artifacts.
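The keep rule above can be sketched as follows. This is a reconstruction from the commit text only: env is assumed to hold one envelope value per subframe, loc is the boundary between subframes loc-1 and loc, and the small epsilon guarding silent windows is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical BoundaryTransientScore: larger of the right/left and left/right
// peak ratios over a short window on each side of boundary loc.
float BoundaryTransientScore(const std::vector<float>& env, int loc, int win = 3) {
    const float eps = 1e-9f;  // avoids division by zero on silent windows
    const int n = static_cast<int>(env.size());
    float maxLeft = 0.0f, maxRight = 0.0f;
    for (int i = std::max(0, loc - win); i < loc; ++i) maxLeft = std::max(maxLeft, env[i]);
    for (int i = loc; i < std::min(n, loc + win); ++i) maxRight = std::max(maxRight, env[i]);
    maxLeft += eps;
    maxRight += eps;
    return std::max(maxRight / maxLeft, maxLeft / maxRight);
}

// Keep rule for a level-transition candidate at loc = sf + 1.
bool KeepTransition(const std::vector<float>& env, int loc, int deltaLevel, int targetSf, float minScore) {
    if (loc == targetSf) return true;            // tail neutral anchor
    if (std::abs(deltaLevel) >= 2) return true;  // strong step
    return BoundaryTransientScore(env, loc) >= minScore;
}
```

Using the symmetric ratio means rising and falling edges are treated alike, so only the magnitude of the local change decides whether a one-step toggle survives.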
| * atrac3: remove band3 transient boost redirection to band0 | Daniil Cherednik | 2026-04-08 | 1 | -6/+2
| |
| * atrac3: make sticky gain quantization conditional and tune thresholds | Daniil Cherednik | 2026-04-08 | 1 | -5/+34
| |   Problem
| |   The distribution-aware sticky quantizer reduced gain-curve bitrate, but in some release/transient frames it over-merged nearby transitions. On spine around 17.657 s (ch1/band2), this collapsed the curve shape and could produce an audible spike.
| |
| |   What changed
| |   - Added frame-level sticky eligibility gating in CalcCurve().
| |   - Sticky is now enabled only when both conditions hold:
| |     - the intra-frame ratio is limited: max_gain / target <= kStickyMaxIntraFrameRatio
| |     - the inter-frame target jump is limited: prev_target / target (symmetric) <= kStickyMaxInterFrameRatio
| |   - Added a local uncertainty guard for sticky hold:
| |     - require the idx span from [subframeLow, subframeHigh] quantization to be narrow (idxSpan <= 1) before allowing a prev-level hold.
| |   - Added YAML diagnostics per band/frame to make gating decisions auditable:
| |     - sticky_frame_eligible
| |     - sticky_intra_ratio
| |     - sticky_inter_ratio
| |
| |   Threshold tuning
| |   Swept candidate pairs on both tracks:
| |   - show_me_your_spine.wav
| |   - 13.wav
| |   Pairs tested: (5,6), (5,8), (6,8), (6,10), (7,8), (7,10), (8,12)
| |   Selected:
| |   - kStickyMaxIntraFrameRatio = 7.0
| |   - kStickyMaxInterFrameRatio = 10.0
| |
| |   Reason for selection
| |   - Keeps safety behavior on the known failure site: frame 760, ch1, band2 remains sticky_frame_eligible=false and retains a non-collapsed curve shape (loc 1,2,5,7).
| |   - Improves gain-modulation bitrate versus the previous 6/8 tuning while avoiding fully open behavior.
| |
| |   Measured gain-modulation bits (spine + 13.wav)
| |   - 6/8: 1,493,639 bits
| |   - 7/10: 1,490,732 bits (selected, -2,907 bits vs 6/8)
| |   - 8/12: 1,488,824 bits (lowest in sweep; not selected to keep extra margin)
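A sketch of the two gates described above, using the selected thresholds. The function names are hypothetical, and refusing sticky mode when either target is zero is an assumption added so the ratios stay finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

constexpr float kStickyMaxIntraFrameRatio = 7.0f;
constexpr float kStickyMaxInterFrameRatio = 10.0f;

// Hypothetical frame-level gate: the frame peak must be within 7x of the target
// and the target must not jump by more than 10x (either direction) from the
// previous frame.
bool StickyFrameEligible(float maxGain, float target, float prevTarget) {
    if (target <= 0.0f || prevTarget <= 0.0f) return false;  // assumption: no hold on silent targets
    const float intra = maxGain / target;
    const float inter = std::max(prevTarget / target, target / prevTarget);
    return intra <= kStickyMaxIntraFrameRatio && inter <= kStickyMaxInterFrameRatio;
}

// Local guard: the subframe's low/high estimates must quantize at most one level apart.
bool StickyHoldAllowed(int idxLow, int idxHigh) {
    return std::abs(idxHigh - idxLow) <= 1;
}
```

The frame gate disables merging exactly where transitions are most likely real (large peaks or big cross-frame jumps), while the local guard refuses a hold when a single subframe is itself ambiguous.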
| * atrac3: add distribution-aware sticky gain quantization | Daniil Cherednik | 2026-04-08 | 3 | -10/+80
| |   Problem
| |   Gain-curve construction still produced many +/-1 level toggles across long runs (e.g. 7<->8 chatter). These transitions are usually quantization noise from subframe-level RMS rounding, not real envelope changes, and they consume gain bit budget without improving transient protection.
| |
| |   Feature
| |   Introduce distribution-aware sticky quantization for subframe gain levels. Instead of quantizing only the subframe centre estimate, we also track a robust within-subframe range and suppress one-step toggles when the previous level is still consistent with that range.
| |
| |   Implementation
| |   1) AnalyzeGain now optionally returns per-subframe low/high energy estimates (robust inter-quantile bounds from micro-chunk analysis inside each subframe).
| |   2) CalcCurve now accepts optional subframe low/high vectors.
| |   3) During sfLevel quantization:
| |      - compute the centre level via RelationToIdx(filtered/target)
| |      - if the new level differs from the previous one by exactly 1, and the previous level is still inside [idx(low), idx(high)], keep the previous level (sticky hold)
| |   4) CreateSubbandInfo wires the new AnalyzeGain outputs into CalcCurve.
| |   5) The existing point0 guard/boundary logic remains intact; this feature operates earlier, at sfLevel formation.
| |
| |   Why this is safe
| |   - Only suppresses +/-1 oscillation when the previous level is still supported by the observed subframe distribution.
| |   - Does not clamp large transitions or remove structurally important points.
| |   - Keeps the curve scan/priority flow unchanged after sfLevel is formed.
| |
| |   Measured impact on current HEAD (gain-info bits)
| |   Bit accounting uses ATRAC3 gain syntax: per-channel header + per-band point-count fields + 9 bits per gain point.
| |   show_me_your_spine.wav:
| |   - base: 219,552 bits (18,688 points)
| |   - with sticky: 172,158 bits (13,422 points)
| |   - saved: 47,394 bits, 5,266 points (-21.59% gain-info bits)
| |   13.wav:
| |   - base: 1,537,724 bits (123,556 points)
| |   - with sticky: 1,146,746 bits (80,114 points)
| |   - saved: 390,978 bits, 43,442 points (-25.43% gain-info bits)
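The sticky hold in step 3 can be sketched as below. RelationToIdx is reconstructed here under an assumption: GainLevel[level] = 2^(4 - level), which matches statements elsewhere in this log (level 4 is neutral, level 7 means 8x amplification), so level = round(4 - log2(ratio)) clamped to 0..15. StickyLevels and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumed mapping: level = round(4 - log2(ratio)), clamped to the 4-bit range.
int RelationToIdx(float ratio) {
    const int level = static_cast<int>(std::lround(4.0 - std::log2(ratio)));
    return std::clamp(level, 0, 15);
}

// Hypothetical sticky quantization: centre/low/high are per-subframe energy
// estimates (centre plus robust low/high bounds).
std::vector<int> StickyLevels(const std::vector<float>& centre, const std::vector<float>& low,
                              const std::vector<float>& high, float target) {
    std::vector<int> levels;
    for (size_t sf = 0; sf < centre.size(); ++sf) {
        int level = RelationToIdx(centre[sf] / target);
        if (!levels.empty()) {
            const int prev = levels.back();
            // Louder energy maps to a smaller index, so order the bounds explicitly.
            const int a = RelationToIdx(low[sf] / target);
            const int b = RelationToIdx(high[sf] / target);
            const int lo = std::min(a, b), hi = std::max(a, b);
            if (std::abs(level - prev) == 1 && prev >= lo && prev <= hi)
                level = prev;  // sticky hold: previous level is still supported
        }
        levels.push_back(level);
    }
    return levels;
}
```

With a degenerate range (low == high == centre) the function reduces to plain per-subframe quantization, which makes the chatter it suppresses easy to see side by side.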
| * atrac3: make point0 guard boundary-aware to avoid overlap artifacts | Daniil Cherednik | 2026-04-08 | 1 | -1/+131
| |   Problem
| |   The current point0 guard decides keep/revert only from an early-frame mismatch score. That can revert a newly inserted point0 even when it is needed for frame-boundary continuity. In ATRAC3 demodulation, the next frame's first gain level is reused as a scale term for the overlap region, so removing point0 can change the boundary scale by multiple quantization steps and create audible artifacts.
| |
| |   Root cause
| |   For frames like 13.wav around 45.1 s, point0_guard reverted point0 in key bands, which changed the first-point scale and increased boundary mismatch despite a locally better early-fit score.
| |
| |   Change
| |   - Keep the existing early mismatch metric (fit + leakage proxy).
| |   - Add a boundary-aware keep criterion inside point0_guard:
| |     * compute the desired boundary scale in the same HPF domain: desiredScale = LimitRel(prevTarget / hpfRmsNextMod)
| |     * compare the log2 distance of the first-level scale before/after point0 insertion
| |     * if point0 reduces boundary error by a material margin (0.2 bits), force keep even when the early-fit score slightly worsens
| |   - Apply the guard only when point0 actually changes the curve.
| |   - Add YAML telemetry for boundary error before/after to support analysis.
| |
| |   Implementation details
| |   - Added helper utilities to reconstruct subframe-average divisors from curve points and score early mismatch consistently.
| |   - Updated the point0 insertion/update flow to track whether point0 changed.
| |   - Extended the guard decision to combine:
| |     * early mismatch tolerance (existing behavior), and
| |     * boundary continuity improvement (new behavior).
| |
| |   Observed effect (focused check)
| |   - On the 13 clip (~45.1 s), at the exact bad subframe (t=45.103129s, frame256=7769, sf32=23), the ratio vs no-gain dropped from 9.30x to 1.51x after this change.
| |   - Frame 1942 YAML now shows point0 kept in bands where boundary error drops substantially.
| |
| |   Notes
| |   - No full regression run in this commit (intentional, for fast iteration).
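A hedged sketch of the boundary-aware keep criterion. It assumes GainLevel[level] = 2^(4 - level) (consistent with this log's "level 4 = neutral" and "L2 gives gc_scale=4x"), and it models LimitRel as a clamp to the range representable by levels 0..15; both, and all function names, are assumptions rather than the encoder's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed: GainLevel[level] = 2^(4 - level), level 4 = neutral (1.0).
float GainLevel(int level) { return std::ldexp(1.0f, 4 - level); }

// Hypothetical LimitRel: clamp a relation into the range levels 0..15 can express.
float LimitRel(float rel) { return std::clamp(rel, GainLevel(15), GainLevel(0)); }

// Boundary error in bits: |log2(first-level scale / desired scale)|.
float BoundaryErrorBits(int firstLevel, float desiredScale) {
    return std::fabs(std::log2(GainLevel(firstLevel) / desiredScale));
}

constexpr float kBoundaryMarginBits = 0.2f;

// Hypothetical point0_guard decision: keep point0 if the early-fit check passes,
// or if point0 lowers the boundary error by at least kBoundaryMarginBits.
bool KeepPoint0(bool earlyFitOk, int firstLevelWithout, int firstLevelWith,
                float prevTarget, float hpfRmsNextMod) {
    if (earlyFitOk) return true;
    const float desired = LimitRel(prevTarget / hpfRmsNextMod);
    const float before = BoundaryErrorBits(firstLevelWithout, desired);
    const float after = BoundaryErrorBits(firstLevelWith, desired);
    return before - after >= kBoundaryMarginBits;
}
```

Measuring the error in log2 units makes the 0.2-bit margin independent of absolute signal level: a factor-of-4 mismatch is always 2 bits.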
| * atrac3: allow to configure median filter during gain curve calculation | Daniil Cherednik | 2026-04-08 | 1 | -26/+22
| |
| * atrac3: trim redundant point0 and skip point0 on band 3 | Daniil Cherednik | 2026-04-08 | 1 | -1/+9
| |
| * atrac3: Prefer largest locations of gain curve points. | Daniil Cherednik | 2026-04-08 | 1 | -2/+6
| |
| * atrac3: use in.back() as staircase target instead of nextLevel | Daniil Cherednik | 2026-04-08 | 1 | -4/+6
| |   For non-plateau frames, nextLevel (the first lookahead subframe of the next frame) can be 6× higher than in.back() on release frames. Using it as the staircase target made tail subframes appear below target, which produced spurious amplifying points (e.g. {level:7, loc:31}) on release tails and underestimated ATT on the peak (a 33× ratio was reduced to 5× because the wrong target inflated the denominator).
| |
| |   Fix: always use in.back() (the actual last subframe of the analysis window) as the staircase target. That is the level the signal actually returns to within this frame.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: replace CalcCurve with staircase level-scan algorithm | Daniil Cherednik | 2026-04-08 | 2 | -54/+87
| |   Replaces the monotone-triplet transient detector with a level-based staircase scan that builds the gain curve from the target subframe leftward. The new algorithm correctly handles rising transients by attenuating the loud peak region rather than the quiet onset.
| |
| |   Key changes:
| |   - 3-point median filter on gain[] suppresses isolated spikes
| |   - Per-sf level = RelationToIdx(filtered[sf] / target)
| |   - Scan leftward from the first neutral sf, emitting one point per level change
| |   - Priority trim: keep up to 6 points, largest |ΔLevel| first
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
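The four key changes can be strung together as a sketch. Several details are assumptions not stated in the log: RelationToIdx uses the mapping GainLevel[level] = 2^(4 - level); the scan starts from the last subframe, which is neutral when the target is the last subframe (as a later commit makes it); and each point's level is taken to cover the subframes to the left of its location. TGainPoint, Median3 and StaircaseScan are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct TGainPoint { int Level; int Location; };

// Assumed mapping: level = round(4 - log2(ratio)), clamped to 0..15 (4 = neutral).
int RelationToIdx(float ratio) {
    const int level = static_cast<int>(std::lround(4.0 - std::log2(ratio)));
    return std::clamp(level, 0, 15);
}

// 3-point median filter; the first and last samples pass through unchanged.
std::vector<float> Median3(const std::vector<float>& x) {
    std::vector<float> y(x);
    for (size_t i = 1; i + 1 < x.size(); ++i) {
        const float a = x[i - 1], b = x[i], c = x[i + 1];
        y[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));
    }
    return y;
}

// Hypothetical staircase scan: walk right-to-left, emit {level, sf + 1} whenever
// the quantized level changes, keep at most maxPoints points preferring the
// largest level steps, and return them ordered by location.
std::vector<TGainPoint> StaircaseScan(const std::vector<float>& gain, float target, size_t maxPoints = 6) {
    struct TCand { TGainPoint Point; int Step; };
    const std::vector<float> f = Median3(gain);
    if (f.empty()) return {};
    std::vector<TCand> cands;
    int right = RelationToIdx(f.back() / target);
    for (int sf = static_cast<int>(f.size()) - 2; sf >= 0; --sf) {
        const int level = RelationToIdx(f[sf] / target);
        if (level != right) {
            cands.push_back({{level, sf + 1}, std::abs(level - right)});
            right = level;
        }
    }
    std::stable_sort(cands.begin(), cands.end(),
                     [](const TCand& a, const TCand& b) { return a.Step > b.Step; });
    if (cands.size() > maxPoints) cands.resize(maxPoints);
    std::vector<TGainPoint> pts;
    for (const TCand& c : cands) pts.push_back(c.Point);
    std::sort(pts.begin(), pts.end(),
              [](const TGainPoint& a, const TGainPoint& b) { return a.Location < b.Location; });
    return pts;
}
```

In the burst example of the test, the loud subframes 4-5 get level 1 (attenuation by 8x under the assumed mapping) while the quiet onset stays neutral, which is the "attenuate the peak, not the onset" behaviour the commit describes; the isolated spike is removed by the median filter and produces no points.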
| * atrac3: fix gc_scale at loc=0 using prev_target/target ratio | Daniil Cherednik | 2026-04-08 | 1 | -0/+19
| |   When the first detected transient is at location 0, the CalcCurve loop computed level = RelationToIdx(in[0]/target). But in[0] is the ramp start: for loc=0 there is no pre-ramp region, so in[0] is not the right amplitude reference for gc_scale, which divides all of bufCur (the previous frame's MDCT window).
| |
| |   The external point0 block derives its formula as:
| |     hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] ≈ target
| |     point0Level = RelationToIdx(prevTarget / hpfRmsNextMod) ≈ RelationToIdx(prevTarget / target)
| |
| |   For loc=0, hpfRmsNextModValid is false and the block cannot fire. Override curve[0].Level inside CalcCurve with RelationToIdx(savedPrevTarget/target), which is consistent with the loc>0 formula and correctly bridges the cross-frame amplitude.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: point0 in HPF domain using prevTarget/hpfRmsNextMod; enable band 2 | Daniil Cherednik | 2026-04-08 | 3 | -37/+80
| |   Point0 calculation switched from the raw-PCM RMS ratio to the HPF-filtered domain:
| |   - prevTarget (stored as ctx.LastTarget from the previous CalcCurve call) replaces rmsCur
| |   - hpfRmsNextMod = mean(gain[0..loc-1]) / GainLevel[pts[0].Level] replaces rmsNextMod (only the pre-ramp constant-level zone, same domain as gain[])
| |   - TCurveBuilderCtx gains a LastTarget field; CalcCurve stores target before returning
| |
| |   Also extends gain control to band 2 (~11–16 kHz) by changing the skip threshold from band >= 2 to band >= 3. It sounds perceptually better; regression metrics worsen because the broadband measurement does not capture the per-band HF improvement.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: log plateau result and target source inside CalcCurve | Daniil Cherednik | 2026-04-08 | 3 | -3/+17
| |   Pass yamlLog into CalcCurve so plateau_level, plateau_max_raw, plateau_release, and target/source are emitted directly from the function that computes them instead of via stale TCurveBuilderCtx fields. Remove LastTarget and LastTargetFromPlateau from the context struct entirely.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: plateau-based target selection in CalcCurve | Daniil Cherednik | 2026-04-08 | 1 | -5/+104
| |   Introduces FindPlateau(), which finds the maximum sustained amplitude level that at least 3 consecutive subframes exceed (sliding-window minimum approach). The plateau is used as the CalcCurve target instead of nextLevel when the frame contains a genuine sustained peak that does not end in a release.
| |
| |   Release detection uses two conditions:
| |   - Hard tail: last subframe < 10% of plateau (clear ring-down)
| |   - Soft tail: last subframe < 50% of plateau with no post-plateau recovery above 70%
| |
| |   A MaxRaw guard (plateau >= 40% of frame peak) prevents the quiet noise floor from being mistaken for a plateau when the frame contains a much louder transient spike.
| |
| |   When the plateau is used as target, pre-plateau quiet regions produce AMP curves normalizing toward the sustained peak, reducing the extreme ATT levels that were causing post-echo artifacts.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
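A sketch of plateau detection with the quoted thresholds. The plateau is computed as the maximum over all 3-subframe windows of the window minimum (a naive sliding-window minimum). How "no post-plateau recovery above 70%" is measured is not specified, so here it is assumed to mean that no subframe after the plateau window exceeds 70% of the plateau. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TPlateau { float Level = 0.0f; size_t End = 0; bool Valid = false; };

// Largest level that `run` consecutive subframes all reach (max of window minima).
TPlateau FindPlateau(const std::vector<float>& env, size_t run = 3) {
    TPlateau best;
    if (env.size() < run) return best;
    for (size_t s = 0; s + run <= env.size(); ++s) {
        const float m = *std::min_element(env.begin() + s, env.begin() + s + run);
        if (!best.Valid || m > best.Level) best = {m, s + run, true};
    }
    return best;
}

// Release test using the thresholds from the commit message.
bool IsRelease(const std::vector<float>& env, const TPlateau& p) {
    const float last = env.back();
    if (last < 0.1f * p.Level) return true;  // hard tail: clear ring-down
    float post = 0.0f;                        // assumed meaning of "post-plateau recovery"
    for (size_t i = p.End; i < env.size(); ++i) post = std::max(post, env[i]);
    return last < 0.5f * p.Level && post <= 0.7f * p.Level;  // soft tail
}

// Use the plateau as target only if it passes the MaxRaw guard and is not a release.
bool UsePlateauAsTarget(const std::vector<float>& env, const TPlateau& p) {
    const float peak = *std::max_element(env.begin(), env.end());
    return p.Valid && p.Level >= 0.4f * peak && !IsRelease(env, p);
}
```

The MaxRaw guard matters in the spike case of the test: every window contains at least one quiet subframe, so the "plateau" is the noise floor, and comparing it with the frame peak rejects it.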
| * atrac3: remove aggressive_suppressed heuristic and level clamping | Daniil Cherednik | 2026-04-08 | 1 | -49/+0
| |   Removes the block that suppressed curves with Level<=2 when the ratio was below 10x or overlap was high, and the soft-min level clamping that raised extreme levels to 3. Also removes the scale constraint that forced curve[0].Level >= 3. These heuristics degraded sound quality; the gain-curve analysis should produce correct levels directly rather than relying on post-hoc clamping.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: suppress gain curves on near-silent frames (maxGain < 1e-4) | Daniil Cherednik | 2026-04-08 | 1 | -0/+14
| |   Firing CalcCurve on noise-floor content wastes bitrate and can produce extreme Level values when the target amplitude is tiny. No regressions: spine 22/1804 pre-echo, riddler 9/479, 0 flashes.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: skip point0 insertion when level is neutral and no other points exist | Daniil Cherednik | 2026-04-08 | 1 | -1/+7
| |   A loc=0 point with level=4 (GainLevel=1.0, neutral) carries no information when it would be the only curve point: the decoder uses gc_scale=1.0 for frames with no curve, which is identical. Inserting it wastes bitstream entries and leaves fewer bits for spectral quantization. The guard is necessary only when other (non-loc-0) points exist: without a neutral anchor, the decoder reads gc_scale from the first non-zero-loc point (e.g. L2@10 → gc_scale=4× on the previous frame's OLA overlap).
| |
| |   Result: 0 regressions, marginal post-echo improvement (spine 38→37, riddler 6→5).
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: remove delay and soft-cap heuristics made redundant by point0 | Daniil Cherednik | 2026-04-08 | 1 | -18/+0
| |   Two heuristics that were workarounds for the old (broken) window domain are now redundant, since point0 is computed correctly at the end of the band loop using the MDCT-input-domain energy ratio:
| |   1. "Delay early attack points when overlap dominates": shifted curvePoints[0].Location by +2 when hpfOverlapRatio > 0.9. point0 now inserts at Location=0 afterward anyway, making the location shift irrelevant to the frame-boundary correction.
| |   2. "Soft-cap first point level under overlap dominance": forced curvePoints[0].Level = 4 (neutral) when hpfOverlapRatio > 0.9. This suppressed legitimate CalcCurve attenuation; point0 computes the correct level from rmsCur/rmsNextMod directly.
| |
| |   Result vs previous commit:
| |   - spine: post-echo 40→38/1804
| |   - riddler: pre-echo 11→9/479
| |   - 0 noise flashes, SNR unchanged.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: fix window domain, move point0 after curve, remove dead code | Daniil Cherednik | 2026-04-08 | 1 | -104/+58
| |   Three correctness fixes:
| |   1. Fix the MDCT-domain window mismatch in the point0 energy computation. bufCur already has EncodeWindow[i] baked in from the previous frame's MDCT prep; applying 2*DecodeWindow a second time caused a systematic ~1.58x bias in rmsCur vs rmsNextMod for stationary signals. Fix: use bufCur[i] directly for rmsCur, and use EncodeWindow[255-pos] in CalcWindowedRmsAfterCurve so both sides are in the same MDCT-input domain.
| |   2. Move point0 insertion to after all gain-curve modifications. Previously point0 was computed before the aggressive_suppressed gate, softMin floor, and scale constraint could clear or raise curvePoints, making the point0 level stale. Now point0 is inserted at the very end of the band loop using the final curvePoints state. Also remove the early-return guard (frameEndLevel > 1e-6f) on scaleBoost: silent-ending frames need the lookahead correction most.
| |   3. Remove dead crossover logic. The crossover block (rmsCurSub/rmsNextSub comparison) used 2*DecodeWindow on bufCur, which already has EncodeWindow baked in, making rmsCurSub[0] ≈ 0 always. The guard (crossover >= 2) never fired. Confirmed dead via YAML log analysis across all test tracks.
| |
| |   Result vs baseline (c040b03, branch new_psy_cont):
| |   - spine: pre-echo 45→22/1804, post-echo 58→41/1804, SNR +0.5 dB
| |   - riddler: pre-echo 11→10/479, post-echo 6/479 unchanged
| |   - 0 noise flashes maintained on all tracks.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: emit amplifying gain curves for rising transients | Daniil Cherednik | 2026-04-08 | 1 | -41/+25
| |   For quiet-to-loud transients, CalcCurve produces curve points with level >= 4 (GainLevel <= 1), meaning Modulate amplifies the quiet prefix of bufNext. The unit test GainModulation_ReducesSpectralEnergy_QuietToLoudTransient demonstrates that this reduces HF spectral leakage by 10x by smoothing the step function seen by the MDCT.
| |
| |   Previously these curves were discarded by a !anyActive guard and only a p0-only cross-frame correction was applied. Now they fall through to the normal processing path, with two guards:
| |   1. kMaxAmplifyLevel=7 cap: limits amplification to 8x max.
| |   2. kMinHfrForAmplify=0.3 threshold: skips amplifying curves when HFR is low. Low HFR means HPF gain[] doesn't represent the full-band signal: a tiny HPF transient can produce level 9 (32x amplification) on a loud full-band signal, catastrophically over-inflating MDCT coefficients and exhausting the bit budget (causing noise flashes).
| |
| |   Also precomputes anyAttenuating before point0 insertion so it correctly reflects the CalcCurve output.
| |
| |   Results vs baseline (new_psy branch):
| |   - spine: 55->45/1804 worse pre-echo (-18%), SNR 20.9->21.9 dB (+1.0)
| |   - riddler: 10->11/479 worse pre-echo, SNR 23.7->24.2 dB (+0.5)
| |   - 0 noise flashes on all tracks.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
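The two guards can be sketched as follows, assuming that a level above neutral amplifies by 2^(level - 4), which reproduces the log's "level 7 = 8x" and "level 9 = 32x". ApplyAmplifyGuards and AmplificationFactor are hypothetical names; the real code applies these checks inside its band loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

constexpr int kNeutralLevel = 4;
constexpr int kMaxAmplifyLevel = 7;  // 8x
constexpr float kMinHfrForAmplify = 0.3f;

// Amplification Modulate applies at a given level, assuming GainLevel[level] = 2^(4 - level).
float AmplificationFactor(int level) { return std::ldexp(1.0f, level - kNeutralLevel); }

// Hypothetical guard: drop an amplifying curve when HFR is too low (returns
// false, levels untouched); otherwise cap every level at kMaxAmplifyLevel.
bool ApplyAmplifyGuards(std::vector<int>& levels, float highFreqRatio) {
    const bool amplifying = std::any_of(levels.begin(), levels.end(),
                                        [](int l) { return l > kNeutralLevel; });
    if (amplifying && highFreqRatio < kMinHfrForAmplify) return false;
    for (int& l : levels) l = std::min(l, kMaxAmplifyLevel);
    return true;
}
```

Checking HFR before capping means a curve driven by an unrepresentative HPF transient is discarded outright rather than merely softened.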
| * atrac3: use HPF-domain overlap ratio for transient suppression decisions | Daniil Cherednik | 2026-04-08 | 2 | -8/+23
| |   overlapRatio (full-band PCM energy) was used to raise dynamicMinScore, suppress aggressive curves, delay early-attack points, and soft-cap the point0 level. This caused bass-heavy previous frames to suppress legitimate HPF-domain transients in the current frame, because the two domains are only loosely correlated.
| |
| |   Replace it with hpfOverlapRatio = mean(prevGain[]) / mean(curGain[]), both in the HPF-upsampled analysis domain, stored per band and per channel in TCurveBuilderCtx::LastHpfEnergy. Full-band overlapRatio is retained for the kLowOverlapRelax "attack frame" checks (overlapRatio < 0.6) and for YAML logging.
| |
| |   Regression (0 flashes maintained):
| |   - riddler: pre-echo worse 12→10/479, median SNR 25.1→25.4 dB
| |   - spine: pre-echo worse 65→55/1804, median SNR unchanged
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: use min(p0Level, 7) instead of threshold guard for p0-only | Daniil Cherednik | 2026-04-08 | 1 | -3/+5
| |   The threshold approach (skip if p0Level > 7) and the clamp approach (min(p0Level, 7)) are now equivalent, since p0Level is computed from the plain windowed RMS of bufNext rather than the inflated rmsNextMod. Any genuine Level>7 case (bufCur ≤ 1/8 of bufNext) is capped to Level 7 (8× amplification) instead of being skipped, providing at least partial cross-frame normalisation.
| * atrac3: emit p0-only curve for quiet-to-loud frame boundary | Daniil Cherednik | 2026-04-08 | 1 | -0/+28
| |   At the no_active_points skip path, when CalcCurve detected a mild transient but all adjusted levels ended up >= 4 (no attenuation needed), the existing code discarded the entire curve, including the computed p0. If bufCur is significantly quieter than bufNext (Level 5-7 range), emitting a p0-only curve lets Modulate amplify the quiet bufCur before the MDCT, normalising the cross-frame energy step that causes HF leakage (pre-echo).
| |
| |   Key details:
| |   - Recompute the ratio against the plain windowed RMS of bufNext rather than rmsNextMod. Using rmsNextMod would inflate p0Level when the discarded curve points have Level>4, causing over-correction.
| |   - Cap at Level 7 (8x amplification) to avoid bit-allocation distortion from amplifying extremely quiet frames.
| |
| |   Results on test tracks (baseline: 0 flashes, 107/1804 spine pre-echo):
| |   - spine: 0 flashes, 64/1804 pre-echo (was 107, -40%)
| |   - riddler: 0 flashes, 12/479 pre-echo (was 6, slight increase)
| |   - SNR: 20.8 dB spine / 23.5 dB riddler (unchanged)
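The p0-only level and its cap can be worked through with a short sketch. It assumes level = round(4 - log2(rmsCur / rmsNextPlain)), which agrees with the statement that bufCur at 1/8 of bufNext corresponds to Level 7; returning neutral on zero RMS is an added assumption that keeps log2 finite. P0OnlyLevel and ShouldEmitP0Only are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed mapping: level = round(4 - log2(ratio)), clamped to 0..15.
int RelationToIdx(float ratio) {
    const int level = static_cast<int>(std::lround(4.0 - std::log2(ratio)));
    return std::clamp(level, 0, 15);
}

constexpr int kNeutralLevel = 4;
constexpr int kMaxP0Level = 7;  // 8x amplification cap

// Hypothetical p0-only level from the previous window's RMS and the plain
// windowed RMS of bufNext; a quieter bufCur yields a level above neutral.
int P0OnlyLevel(float rmsCur, float rmsNextPlain) {
    if (rmsCur <= 0.0f || rmsNextPlain <= 0.0f) return kNeutralLevel;  // assumption: no correction on silence
    return std::min(RelationToIdx(rmsCur / rmsNextPlain), kMaxP0Level);
}

// A p0-only curve is only worth emitting when it actually amplifies.
bool ShouldEmitP0Only(int p0Level) { return p0Level > kNeutralLevel; }
```

For example, a bufCur four times quieter than bufNext gives Level 6 (4x), while a 32x gap would map to Level 9 and is clamped to 7.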
| * Revert "atrac3: add HFR/silence gates to reduce spurious gain curves" | Daniil Cherednik | 2026-04-08 | 1 | -56/+2
| |   This reverts commit b9afb2b09446e6d322421554234c33b17d138e51.
| * atrac3: add HFR/silence gates to reduce spurious gain curves | Daniil Cherednik | 2026-04-08 | 1 | -2/+56
| |   Three complementary guards based on YAML-log analysis of pre-echo frames:
| |   1. Raise kHfrRef 0.30 → 0.50: the HFR-proportional minScore scale now kicks in earlier, suppressing candidates with score < 5.7× on frames where the spectral upsampler is operating below half its reference HFR.
| |   2. Add kMinReliableHfr = 0.12: a hard skip for bands where HFR is so low that even the scaled minScore cannot filter out all spurious multi-point curves. Logged as 'skip: low_hfr_unreliable' in the YAML stream.
| |   3. Add kMinGainLevel = 3e-4: skip gain processing entirely when the loudest of the 32 subframe RMS values is below this threshold. Near-silent bands produce extreme relative ratios from tiny absolute spikes, generating gain curves that worsen reconstruction noise rather than reducing pre-echo.
| |
| |   Riddler (10 s): pre-echo worse 6 → 5/479, mean SNR gain 22.9 → 23.5 dB.
| |   Spine (30 s): pre-echo worse 107 → 26/1804, mean SNR gain 20.3 → 20.6 dB.
| |   Noise flashes remain 0 on both tracks. All 18 unit tests pass.
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: add YAML gain control debug logging (--yaml-log) | Daniil Cherednik | 2026-04-08 | 5 | -3/+227
| |   Inspired by edge264's YAML logging approach (FOSDEM 2026): emit every gain-control decision as a self-contained YAML stream so any frame can be grepped, analyzed with Python, or used to craft custom test curves.
| |
| |     ./atracdenc -e atrac3 -i in.wav -o out.oma --yaml-log gain.yaml
| |
| |   Each YAML document covers one frame (frame:/time: header), with channels and bands nested below. Per-band fields emitted in pipeline order:
| |     high_freq_ratio, overlap_ratio, dynamic_min_score, gain[], next_level
| |     pcm_qmf (256 raw QMF samples, non-modulated, non-windowed)
| |     curve_raw, rms_cur, rms_next_mod, point0_level, crossover
| |     curve_final, max_gain, ratio, level_boost, scale_boost, total_boost
| |     gain_boost (or skip: <reason> at each early-exit point)
| |
| |   New files / changes:
| |     src/yaml_log.h          - TYamlFmtGuard (stream format RAII), YamlWriteFloatSeq (vector + ptr/len overloads)
| |     src/atrac/at3/atrac3.h  - YamlLog field in TAtrac3EncoderSettings
| |     src/atrac3denc.h        - FrameNum counter + YamlLog pointer on TAtrac3Encoder
| |     src/atrac3denc.cpp      - frame header in GetLambda, incremental YAML in CreateSubbandInfo at every decision point
| |     src/main.cpp            - --yaml-log <file> CLI flag
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
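The log names TYamlFmtGuard and YamlWriteFloatSeq but not their signatures. The sketch below shows one plausible shape under that assumption: an RAII guard that restores a stream's flags and precision, and vector and pointer/length overloads that write a YAML flow sequence. The signatures and the fixed-precision formatting are guesses, not the contents of src/yaml_log.h.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical RAII guard: restores the stream's format flags and precision on scope exit.
class TYamlFmtGuard {
public:
    explicit TYamlFmtGuard(std::ostream& os) : Os(os), Flags(os.flags()), Precision(os.precision()) {}
    ~TYamlFmtGuard() { Os.flags(Flags); Os.precision(Precision); }
    TYamlFmtGuard(const TYamlFmtGuard&) = delete;
    TYamlFmtGuard& operator=(const TYamlFmtGuard&) = delete;
private:
    std::ostream& Os;
    std::ios_base::fmtflags Flags;
    std::streamsize Precision;
};

// Hypothetical: write "key: [v0, v1, ...]" as a YAML flow sequence.
void YamlWriteFloatSeq(std::ostream& os, const char* key, const float* data, size_t len, int precision = 4) {
    TYamlFmtGuard guard(os);
    os.setf(std::ios_base::fixed, std::ios_base::floatfield);
    os.precision(precision);
    os << key << ": [";
    for (size_t i = 0; i < len; ++i) os << (i ? ", " : "") << data[i];
    os << "]\n";
}

void YamlWriteFloatSeq(std::ostream& os, const char* key, const std::vector<float>& v, int precision = 4) {
    YamlWriteFloatSeq(os, key, v.data(), v.size(), precision);
}
```

The guard lets each logging call change number formatting freely without leaking fixed-point or precision settings into the rest of the stream.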
| * atrac3: skip gain modulation for bands 2-3, redirect bit boost to band 0 | Daniil Cherednik | 2026-03-15 | 1 | -3/+15
| |   Bands 2-3 (above ~11 kHz) carry little audible masking energy, so pre-echo there is largely inaudible. Instead of modulating gain in those bands, detect their transients and redirect the resulting bit boost to band 0, where it benefits audible-range reconstruction.
| |
| |   Also simplify the per-band score thresholds to a single kMinScore=1.9 constant: with bands 2-3 no longer emitting gain curves, the per-band distinction had no measurable effect.
| |
| |   Results vs previous commit:
| |   - riddler: 10/479 (unchanged)
| |   - spine: 59/1804 (was 61)
| |   - flashes: 0
| |
| |   Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| * atrac3: ratio-scored transients, per-band gain boost, pre-echo reduction  (Daniil Cherednik, 2026-03-15, 7 files, -205/+587)
| | Rewrites the ATRAC3 gain control pipeline to eliminate noise flashes and
| | substantially reduce pre-echo artifacts:
| |
| | Transient detection & curve building (transient_detector.cpp/h):
| | - Replace the legacy heuristic detector with ratio-scored DetectTransients:
| |   score = peak/floor for rising (c/a) or falling (a/c) triplets; triplets are
| |   ranked by score, the top N are kept, and the survivors are sorted by location
| | - Add an explicit point0 derived from a windowed-RMS match between bufCur and
| |   the curve-modulated bufNext (CalcWindowedRmsAfterCurve)
| | - Replace RegionMax with RegionRMS for a smoother region amplitude estimate
| | - Add per-band detection thresholds kMinScorePerBand[4] = {1.9,1.9,2.1,2.2}
| | - Dynamic minScore: scale the threshold by min(1.5, max(1.0, overlapRatio))
| |   to suppress false-positive curves when the previous frame dominates
| | - Scale constraint: curve[0].Level >= 3 to cap cross-frame amplification at 2x
| |
| | Bit allocation (atrac3_bitstream.cpp/h):
| | - Add GainBoostPerBand[NumQMF] to TSingleChannelElement, computed in
| |   CreateSubbandInfo and applied in CalcBitsAllocation
| | - levelBoost: compensate for Demodulate's GainLevel[minLevel] attenuation
| | - scaleBoost: compensate for next-frame cross-frame scale via lookahead
| | - Both capped (kLevelBoostCap=1, kScaleBoostCap=2) to avoid bit starvation
| |
| | Upsampler (atrac3denc.cpp):
| | - Raise the cutoff from 600 Hz to 800 Hz for tighter band separation
| |
| | Tests (gain_processor_ut.cpp):
| | - Relax fixed curve shape assertions to ExpectCurveReasonable (bounds checks)
| | - Relax the quantization error bound for dense-event spacing (<=128 samples)
| |
| | Results (branch new_psy vs original baseline):
| |   riddler: frames with worse pre-echo 43 -> 15 (of 479), 0 flashes
| |   spine:   frames with worse pre-echo 192 -> 107 (of 1804), 0 flashes
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
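The ratio scoring and the dynamic threshold can be sketched as below, assuming `gain` holds the per-subframe level estimates and that a triplet is three consecutive entries (a, b, c). DetectTransientsSketch and DynamicMinScore are hypothetical names; the c/a and a/c scores, the top-N selection, the re-sort by location and the min(1.5, max(1.0, overlapRatio)) factor come from the commit text. The sketch does not merge overlapping triplets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TTransientSketch {
    std::size_t Location;  // index of the first element of the triplet
    float Score;           // c/a for a rising triplet, a/c for a falling one
};

// Threshold scaling from the commit text: base * min(1.5, max(1.0, overlapRatio)).
float DynamicMinScore(float baseMinScore, float overlapRatio) {
    return baseMinScore * std::min(1.5f, std::max(1.0f, overlapRatio));
}

// Hypothetical: score every monotonic triplet, keep the topN scores that reach
// minScore, then return the survivors in time order.
std::vector<TTransientSketch> DetectTransientsSketch(const std::vector<float>& gain,
                                                     float minScore, std::size_t topN) {
    std::vector<TTransientSketch> found;
    for (std::size_t i = 0; i + 2 < gain.size(); ++i) {
        const float a = gain[i], b = gain[i + 1], c = gain[i + 2];
        float score = 0.0f;
        if (a <= b && b <= c && a > 0.0f) {
            score = c / a;  // rising: peak/floor
        } else if (a >= b && b >= c && c > 0.0f) {
            score = a / c;  // falling: peak/floor
        }
        if (score >= minScore) {
            found.push_back({i, score});
        }
    }
    // Rank by score, highest first, and keep at most topN.
    std::sort(found.begin(), found.end(),
              [](const TTransientSketch& x, const TTransientSketch& y) { return x.Score > y.Score; });
    if (found.size() > topN) {
        found.resize(topN);
    }
    // Re-sort by position so curve points come out in time order.
    std::sort(found.begin(), found.end(),
              [](const TTransientSketch& x, const TTransientSketch& y) { return x.Location < y.Location; });
    return found;
}
```

With gains {1,1,1,1,4,8,8,8} and a base threshold of 1.9, the triplets at indices 2 (score 4), 3 (score 8) and 4 (score 2) pass, and topN=2 keeps indices 2 and 3. With overlapRatio=2.0 the threshold becomes 1.9 × 1.5 = 2.85, so the index-4 triplet is rejected outright.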
| * Integrate TSpectralUpsampler into ATRAC3 gain control and fix CalcCurve ctx tracking  (Daniil Cherednik, 2026-03-08, 4 files, -140/+637)
| | Encoder (atrac3denc.cpp / atrac3denc.h):
| | - Replace the old TTransientParam / TransientParamsHistory with a
| |   TSpectralUpsampler-based CreateSubbandInfo(): it analyses the upsampled QMF
| |   band, computes gain[] and nextLevel from the contiguous look-ahead buffer,
| |   and calls CalcCurve to build ATRAC3 gain-curve points.
| | - highFreqRatio guard: skip CalcCurve for sub-bass bands where the HPF signal
| |   is too weak to produce meaningful gain control.
| |
| | CalcCurve (transient_detector.cpp):
| | - Fix Issue 1 (FFT-window context mismatch at the frame boundary): store
| |   ctx.LastLevel = in.back() instead of target (nextLevel). in.back() and the
| |   next call's gain[0] are both analysis-domain estimates of adjacent 8-sample
| |   blocks, so there is no cross-domain FFT-window divergence of the kind that
| |   produced false boundary transients.
| | - Guard against a zero savedLastLevel (first frame or post-reset): return an
| |   empty curve rather than emitting scaleLevel=15 (GainLevel=1/2048), which
| |   would cause extreme amplification in the gain modulator.
| | - Tighten the gain-point budget to 7 (< MaxGainPointsNum=8) to match the
| |   3-bit count field in the ATRAC3 bitstream.
| |
| | Tests (gain_processor_ut.cpp):
| | - Add the BoundaryLevelMismatch suite:
| |   Issue1_FalseTransientOnConstantTone_AfterOnset,
| |   Issue1_MdctRoundtrip_NoGain, Issue1_MdctRoundtrip_WithGain,
| |   Issue1_RoundtripWithGainAndQuantization.
| | - Quantization test threshold set to 400× kQuantStep: correct two-point gain
| |   curves for a 9:1 amplitude-ratio signal produce at most ~323× peak error
| |   (scale×level=16 × ~8× IMDCT base noise); pathological false transients
| |   would cause signal-level reconstruction errors well above this bound.
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
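A sketch of the boundary handling described above, under stated assumptions: `in` is the current frame's per-block analysis gain vector, a stored level of 0 means "no history", and the previous frame's last level is prepended as a virtual boundary element (the prepending itself is described in the original CalcCurve commit further down this log). TCurveCtxSketch, PrepareCurveInputSketch and ClampPointCountSketch are hypothetical names; only the in.back() rule, the empty-curve guard and the 7-point budget come from this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Largest point count a 3-bit field can carry.
constexpr std::size_t kCodedPointBudget = 7;
static_assert(kCodedPointBudget == (1u << 3) - 1, "budget must fit the 3-bit count field");

struct TCurveCtxSketch {
    float LastLevel = 0.0f;  // 0 means no history: first frame or after a reset
};

// Hypothetical: build the vector a curve builder would scan. Returns an empty
// vector when there is no usable history, so no curve gets emitted.
std::vector<float> PrepareCurveInputSketch(TCurveCtxSketch& ctx, const std::vector<float>& in) {
    const float saved = ctx.LastLevel;
    if (!in.empty()) {
        ctx.LastLevel = in.back();  // analysis-domain value, not the look-ahead nextLevel
    }
    if (saved <= 0.0f || in.empty()) {
        return {};
    }
    std::vector<float> out;
    out.reserve(in.size() + 1);
    out.push_back(saved);  // virtual boundary element: exposes a jump at Location=0
    out.insert(out.end(), in.begin(), in.end());
    return out;
}

// Never emit more points than the bitstream can count.
std::size_t ClampPointCountSketch(std::size_t candidates) {
    return std::min(candidates, kCodedPointBudget);
}
```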
| * Add one-frame look-ahead to ATRAC3 encoder with contiguous upsampler buffer  (Daniil Cherednik, 2026-03-07, 2 files, -12/+56)
| | - Add LookAheadBuf[2][4][640] per channel/band with the layout
| |   [prev_128 | current_256 | lookahead_256]; the first 512 samples are a
| |   ready-made input for TSpectralUpsampler::Process()
| | - Return LOOK_AHEAD on the first lambda call so the encoder always has the
| |   next frame available when building gain curves
| | - QMF output goes directly into the appropriate slot (offset 128 or 384),
| |   avoiding any intermediate copy; the buffer advances via a single memmove
| | - Remove float* in[4] from CreateSubbandInfo; source data is now read from
| |   upInput[band]+128 (the current slot of LookAheadBuf)
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
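A single-band sketch of the [prev_128 | current_256 | lookahead_256] layout, assuming the QMF stage can write straight into a pointer it is handed. TLookAheadBufSketch and its methods are hypothetical; Commit() returning false on the first frame stands in for the LOOK_AHEAD return, and the real buffer is additionally indexed by channel and band ([2][4][640]).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical single-band buffer: [prev_128 | current_256 | lookahead_256].
class TLookAheadBufSketch {
public:
    static constexpr std::size_t kPrev = 128;
    static constexpr std::size_t kFrame = 256;
    static constexpr std::size_t kTotal = kPrev + 2 * kFrame;  // 640

    // Where the QMF stage writes its next 256 samples: the current slot
    // (offset 128) for the very first frame, the look-ahead slot (384) after that.
    float* WriteSlot() { return Buf.data() + (Primed ? kPrev + kFrame : kPrev); }

    // Call after writing. false: only one frame buffered so far (the LOOK_AHEAD
    // case); true: the current frame can be encoded with its look-ahead.
    bool Commit() {
        if (!Primed) {
            Primed = true;
            return false;
        }
        return true;
    }

    // After encoding: shift samples [256, 640) down to [0, 384) in one memmove.
    void Advance() {
        std::memmove(Buf.data(), Buf.data() + kFrame, (kTotal - kFrame) * sizeof(float));
    }

    const float* UpsamplerInput() const { return Buf.data(); }  // first 512 samples
    const float* Current() const { return Buf.data() + kPrev; }
    const float* LookAhead() const { return Buf.data() + kPrev + kFrame; }

private:
    std::array<float, kTotal> Buf{};
    bool Primed = false;
};
```

Advance() turns the last 128 samples of the old current frame into the new prev_128 and the old look-ahead into the new current frame, leaving offset 384 free for the next QMF output.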
| * Add CalcCurve and TSpectralUpsampler for transient detection  (Daniil Cherednik, 2026-03-06, 8 files, -1/+2093)
| | CalcCurve (transient_detector.cpp/h):
| | - Recursive divide-and-conquer FindTransients scans the gain vector for
| |   monotonic 3-subframe windows (rising or falling); kMinScore=2.0 filters out
| |   oscillations smaller than a factor of 2 (no-op at Level 4).
| | - RelationToIdx maps an amplitude ratio to an ATRAC3 gain Level index.
| | - TCurveBuilderCtx carries LastLevel across frames; CalcCurve prepends it as
| |   a virtual boundary element to detect Location=0 attacks.
| | - budget=8 matches ATRAC3 SubbandInfo::MaxGainPointsNum.
| |
| | TSpectralUpsampler (transient_spectral_upsampler.cpp/h):
| | - Applies a Planck-taper window (ε=0.15) to a 512-sample context window,
| |   forward-FFTs, applies a 3-bin raised-cosine HPF, zero-pads to 4096 bins,
| |   and inverse-FFTs to give an 8× upsampled output.
| | - Returns highFreqRatio = Σ|X[k]·H[k]|²/Σ|X[k]|²; callers skip CalcCurve when
| |   this is below kHighFreqThreshold=0.05, preventing false transients from
| |   Planck noise-floor variation in sub-cutoff frames.
| |
| | Tests:
| | - gain_processor_ut: upsampled-path blocks added to all FreqDomain tests;
| |   CalcCurve negative tests (NegativeTests suite).
| | - transient_spectral_upsampler_ut: OutputSize, DCIsRemovedByLowCutFilter,
| |   HighFreqSinePreservesRMS (parametrised), ChirpNoTransient (0→5510 Hz sweep
| |   at 689 Hz low-cut, Len1024/16384/262144).
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
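The window and the gating ratio can be sketched without an FFT, assuming the spectrum X[k] and the high-pass response H[k] are supplied by the caller (the 3-bin raised-cosine shape is not spelled out here, so H is a parameter). The Planck-taper formula is the standard one, with a taper of εN samples at each end of an (N+1)-sample window; PlanckTaperSketch, HighFreqRatioSketch and ShouldRunCalcCurveSketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Standard Planck-taper window with len = N+1 samples and taper fraction eps:
// w[0] = w[N] = 0, a smooth rise over the first eps*N samples, 1 in the middle,
// and w[N-n] = w[n].
std::vector<double> PlanckTaperSketch(std::size_t len, double eps) {
    std::vector<double> w(len, 1.0);
    if (len < 2) {
        return w;
    }
    const double N = static_cast<double>(len - 1);
    const double taper = eps * N;  // taper width in samples at each end
    for (std::size_t n = 0; n < len; ++n) {
        const double pos = static_cast<double>(n);
        const double m = std::min(pos, N - pos);  // distance to the nearest edge
        if (m == 0.0) {
            w[n] = 0.0;
        } else if (m < taper) {
            w[n] = 1.0 / (1.0 + std::exp(taper / m - taper / (taper - m)));
        }
    }
    return w;
}

// highFreqRatio = sum |X[k] H[k]|^2 / sum |X[k]|^2 (0 for an all-zero spectrum).
double HighFreqRatioSketch(const std::vector<std::complex<double>>& X,
                           const std::vector<double>& H) {
    double filtered = 0.0;
    double total = 0.0;
    for (std::size_t k = 0; k < X.size() && k < H.size(); ++k) {
        filtered += std::norm(X[k] * H[k]);  // std::norm is the squared magnitude
        total += std::norm(X[k]);
    }
    return total > 0.0 ? filtered / total : 0.0;
}

constexpr double kHighFreqThreshold = 0.05;

// CalcCurve runs only when enough energy survives the high-pass filter.
bool ShouldRunCalcCurveSketch(double highFreqRatio) {
    return highFreqRatio >= kHighFreqThreshold;
}
```

At the midpoint of each taper (m = εN/2) the exponent is 2 - 2 = 0, so the window passes through exactly 0.5 there.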
| * Add AttackAndRelease_LevelRise gain modulation test  (Daniil Cherednik, 2026-02-27, 1 file, -0/+207)
| | Demonstrates handling a burst where the post-burst quiet level (A_after=2)
| | differs from the pre-burst quiet level (A_before=1).
| |
| | Strategy: normalise to A_after using {{5,4},{2,12}}:
| |   - scale = GainLevel[5] = 0.5 → amplifies the quiet prefix ×2 to A_after
| |   - Level=2 (GainLevel=4) attenuates the burst ÷4 to A_after
| |   - Remainder [104..255] is already at A_after → untouched
| |
| | The modulated bufNext is uniform A_after throughout → near-zero HF leakage
| | in both frame 1 and frame 2. No compensating gain is needed.
| |
| | The release ramp uses gainInc = 2^(-2/8) (GainLevel 4.0→1.0, a 2-octave span
| | to neutral).
| |
| | Includes full TDAC round-trip verification.
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
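The per-sample envelope implied by these gain-curve descriptions can be sketched as follows, assuming GainLevel[L] = 2^(4-L) (which matches GainLevel[5]=0.5 and GainLevel[2]=4 above), Location counted in 8-sample units, an 8-sample geometric ramp from each point's level to the next (or to 1.0 after the last point), and Modulate dividing by the envelope. All names are hypothetical, and only the 256-sample bufNext half is modelled, not the scale applied to bufCur.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One gain point written {Level, Location}, as in {{5,4},{2,12}}.
struct TGainPointSketch {
    int Level;
    int Location;  // in 8-sample units: Location=12 -> lastPos=96
};

// Assumed level table: GainLevel[L] = 2^(4 - L), so L=4 -> 1.0 (neutral).
float GainLevelSketch(int level) { return std::ldexp(1.0f, 4 - level); }

// Hypothetical per-sample envelope for the 256-sample bufNext half.
std::array<float, 256> GainEnvelopeSketch(const std::vector<TGainPointSketch>& points) {
    constexpr int kStep = 8;  // Location unit and ramp length
    std::array<float, 256> env{};
    env.fill(1.0f);  // everything after the last ramp stays untouched
    int pos = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        const float from = GainLevelSketch(points[i].Level);
        const float to = (i + 1 < points.size()) ? GainLevelSketch(points[i + 1].Level) : 1.0f;
        const int lastPos = points[i].Location * kStep;
        for (; pos < lastPos && pos < 256; ++pos) {
            env[pos] = from;  // hold this point's level up to its location
        }
        const float inc = std::pow(to / from, 1.0f / kStep);  // e.g. 2^(-2/8) for 4.0 -> 1.0
        float g = from;
        for (int j = 0; j < kStep && pos < 256; ++j, ++pos) {
            env[pos] = g;  // 8-sample geometric transition toward the next level
            g *= inc;
        }
    }
    return env;
}

// Modulate divides by the envelope (GainLevel 0.5 amplifies x2);
// Demodulate multiplies it back.
void ModulateSketch(std::array<float, 256>& buf, const std::array<float, 256>& env) {
    for (std::size_t i = 0; i < buf.size(); ++i) buf[i] /= env[i];
}

void DemodulateSketch(std::array<float, 256>& buf, const std::array<float, 256>& env) {
    for (std::size_t i = 0; i < buf.size(); ++i) buf[i] *= env[i];
}
```

For {{5,4},{2,12}} this gives 0.5 on [0..31], a ramp on [32..39], 4.0 on [40..95], the 4.0→1.0 release ramp on [96..103] and 1.0 on [104..255], matching the untouched remainder quoted above.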
| * Refactor AttackAndRelease and add ReleaseAndAttack gain modulation tests  (Daniil Cherednik, 2026-02-27, 1 file, -80/+266)
| | Both tests now normalise to the dominant amplitude using the minimum number
| | of gain points, with no compensating gain needed in frame 2:
| |
| | - AttackAndRelease (QUIET→LOUD→QUIET burst): switch from the 3-point
| |   {{7,4},{4,12},{7,31}} (normalise to A_loud; needed {{1,1}} frame 2
| |   compensation) to the 2-point {{4,4},{1,12}} (scale=1.0 leaves the quiet
| |   bufCur unchanged, the loud burst is attenuated ÷8 to A_quiet, and the
| |   quiet tail falls in the untouched remainder). Strengthen the frame 2
| |   assertion to ×10.
| | - ReleaseAndAttack (LOUD→QUIET→LOUD dip): new test with the 2-point
| |   {{4,4},{7,12}} (scale=1.0 leaves the loud bufCur unchanged, the quiet dip is
| |   amplified ×8 to A_loud, and the loud tail falls in the untouched
| |   remainder). Frame 2 is plain A_loud with no compensating gain.
| |
| | Both include full TDAC round-trip verification.
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
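The level arithmetic behind both 2-point curves, assuming GainLevel[L] = 2^(4-L) (consistent with scale=1.0 at Level 4) and that Modulate divides the signal by GainLevel. Both normalisations then reduce to the same 8:1 loud/quiet ratio implied by the ÷8 and ×8 figures:

```latex
\begin{aligned}
\mathrm{GainLevel}[L] &= 2^{\,4-L}, \qquad \mathrm{GainLevel}[4] = 1 \;(\text{scale} = 1.0)\\
\text{AttackAndRelease } \{\{4,4\},\{1,12\}\}:\quad
  \frac{A_{\mathrm{loud}}}{\mathrm{GainLevel}[1]} &= \frac{A_{\mathrm{loud}}}{8} = A_{\mathrm{quiet}}\\
\text{ReleaseAndAttack } \{\{4,4\},\{7,12\}\}:\quad
  \frac{A_{\mathrm{quiet}}}{\mathrm{GainLevel}[7]} &= \frac{A_{\mathrm{quiet}}}{1/8} = 8\,A_{\mathrm{quiet}} = A_{\mathrm{loud}}
\end{aligned}
```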
| * Add GainModulation_ReducesSpectralEnergy_QuietToLoudTransient test  (Daniil Cherednik, 2026-02-26, 1 file, -0/+202)
| | Demonstrates that a QUIET->LOUD transient inside bufNext can be handled in
| | the current frame by amplifying the quiet prefix to match the loud suffix,
| | rather than deferring it to the next frame.
| |
| | Signal (3 frames):
| |   frame 0: quiet (A=1) -- primes overlap
| |   frame 1: quiet[0..55] + attack ramp[56..63] + loud[64..255]
| |   frame 2: all loud (A=8) -- continuation
| |
| | Gain: {{7, 7}} on frame 1
| |   Level=7 -> GainLevel[7]=0.125 -> scale=0.125 (amplify x8)
| |   Location=7 -> lastPos=56; constant [0..55] amplified x8 to A_loud
| |   Transition [56..63]: level 0.125->1.0 at gainInc_atk=2^(3/8) rate; the
| |     signal is pre-shaped with a matching ramp so Modulate divides it out
| |     exactly -> uniform A_loud*sin across the entire MDCT window.
| |   Remainder [64..255]: loud signal untouched (already at A_loud).
| |
| | No compensating gain is needed on frame 2: loud bufCur and loud bufNext are
| | already matched.
| |
| | Assertions:
| |   EXPECT_LT(hf1_mod * 10, hf1_nomod)  -- frame 1: HF leakage reduced >10x
| |   EXPECT_LE(hf2_mod * 10, hf2_nomod)  -- frame 2: no regression
| |
| | Round-trip: Mdct(Modulate) -> Midct(Demodulate) recovers the signal.
| |   frame 1 Midct: Demodulate(siCur=empty, siNext={{7,7}})
| |   frame 2 Midct: Demodulate(siCur={{7,7}}, siNext=empty)
| |   Verified with EXPECT_NEAR(..., 1e-5) over frames 1 and 2.
| |
| | Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
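The attack-ramp arithmetic for {{7, 7}}, assuming GainLevel[L] = 2^(4-L), A_quiet=1 and A_loud=8 as in the signal description, and writing the test tone as sin(ωn) (its actual frequency is not given here). The envelope g[n] steps geometrically from 0.125 to 1.0 over [56..63]; pre-shaping the signal by the same g[n] is what makes Modulate's division leave a uniform A_loud·sin:

```latex
\begin{aligned}
g_{\mathrm{inc}} &= \left(\frac{\mathrm{GainLevel}[4]}{\mathrm{GainLevel}[7]}\right)^{1/8}
  = \left(\frac{1}{0.125}\right)^{1/8} = 2^{3/8}\\
g[n] &= \begin{cases}
  0.125, & 0 \le n \le 55\\
  0.125 \cdot 2^{3(n-56)/8}, & 56 \le n \le 63\\
  1, & 64 \le n \le 255
\end{cases}\\
x[n] &= A_{\mathrm{loud}}\, g[n] \sin(\omega n)
  \;\Rightarrow\; \frac{x[n]}{g[n]} = A_{\mathrm{loud}} \sin(\omega n), \qquad 0 \le n \le 255
\end{aligned}
```

For n ≤ 55 this gives x[n] = 8 · 0.125 · sin(ωn) = A_quiet · sin(ωn), and the ramp reaches 0.125 · 2^3 = 1 exactly at n = 64.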
| * Merge branch 'master' into new_psy  (Daniil Cherednik, 2026-02-23, 3 files, -0/+3)
| |\
| |/
|/|
* | Compile fix for missing cstdint (#54)  (Ronnie Sahlberg, 2025-11-12, 3 files, -0/+3)
| | Signed-off-by: Ronnie Sahlberg <[email protected]>