Page 1 of 4

MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 1:21 pm
by Near
This thread is to discuss the audio sampling rate and method for MSU1 FMV playback.

Initial design goal: hardware flexibility. Audio is a simple system of "play or pause track #n". Video is a raw block of data. No codecs, no lossiness, no fixed playback rates, no fixed resolutions, no fixed color depths or palettes. Strictly speaking, the MSU1 has no concept of video. It's a software implementation detail.

Proposed change: a method to ensure long video sequences can be played without the audio desynchronizing. We want identical behavior between emulator and hardware implementations of MSU1, at least to the extent that it continues to allow for flexible hardware implementations.

Audio current design: audio is stored in a 44100hz 16-bit stereo file. This is not a hard requirement of MSU1 itself. Audio could be a 96000hz 24-bit 7.1 surround MP3 (only on an emulator), theoretically, and the same SNES-side software would just work with it. But of course, if we deviate from the 44100hz .pcm file format, it complicates compatibility between implementations.

Video current design: due to limited bandwidth, you have to make concessions. You can trade horizontal and/or vertical resolution in return for higher framerates. Or you can limit both for smaller video sizes (and thus, longer maximum video lengths.) For instance, video can be 240x160@15fps, or 224x144@20fps.

Video format timings:
NTSC non-interlace = 21477272/(1364*262-4) = 60.0991482hz
NTSC interlace = 21477272/(1364*525)*2 = 59.9840022 / 2 = 29.9920011hz

PAL non-interlace = 21281370/(1364*312) = 50.0069789hz
PAL interlace = 21281370/(1364*625)*2 = 49.9269677 / 2 = 24.9634839hz

NTSC standard video = 60 / 1.001 = 59.9400599 / 2 = 29.97003
PAL standard video = 50hz / 2 = 25hz

Proposed simplification: I don't think we should consider running PAL FMVs on an unmodified NTSC console, or NTSC FMVs on a PAL console. Further, I don't think we should consider interlaced FMVs being played in non-interlaced mode, or vice versa.

ikari's proposed solution: we adjust audio sample playback in the MSU1 to compensate for official NTSC / PAL video.
ikari_01 wrote:The MSU playback sample rate equals one sample per 59378938/122500 SNES master clocks (rounded to nearest neighbor). Audio output is synced to the SNES master clock in order to achieve frame synchronization even with slightly differently clocked consoles, and especially on modified PAL consoles. 21477270/(59378938/122500) Hz is an empirical value based on frame cycle count measurements (NTSC) and the assumption that audio should be synchronous to a source video material @29.97fps and 44100Hz, played back on the SNES at 2 SNES frames per video frame. On a perfectly tuned NTSC SNES the actual playback rate equals 44308.06Hz, on a perfectly tuned PAL SNES it's 43903.91Hz.
byuu's proposed solution: when we want our videos to be based off of NTSC source material, we perform our own sample rate conversion on the audio track of the initial video. A very simple linear or hermite filter should be fine. However, we do still have to consider the slight oscillator variance of the SNES CPU. The official specification is 21477272hz. Ceramic oscillators are terrible at +/-5%, but thankfully the SNES CPU oscillator is crystal, with a tolerance of +/-0.00000125%.

I think that basing the playback rate at NTSC / PAL video rates rules out the possibility of playing back videos not in NTSC / PAL format. Given that MSU1 is entirely lossless, the possibility exists to render base video at any refresh rate we like. I suspect that nearly all MSU1 video will be from NTSC / PAL sources, but one could conceivably CG render a sequence at the native SNES rate of 60.09hz / 2.

I'm also concerned that if we play back the audio at a different rate than .pcm stores, that the time lengths of the tracks will shift. Eg you'll see 10:00 in your media player, but 10:04 will be what you actually get on hardware.

However, I do think there may be strong incentive to base audio playback rate off of the SNES master clock. That +/-0.00000125% tolerance may be minimal, but it may quickly add up into big trouble. So I might propose a hardware design of ouputting one sample after every 487 SNES CPU clock ticks, which gives us 44101hz on NTSC. PAL is not quite so easy, as 21281370hz isn't very divisible by 44100. But combine a tying to the SNES CPU clock to audio resampling, and the same video file should play perfectly on every MSU1 implementation, be it hardware or emulator. It does however require access to the SNES CPU, and a non-trivial clock divider.

Other solutions? feel free to share your input below.

Re: MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 3:35 pm
by tepples
In [url=http://forums.nesdev.com/viewtopic.php?p=125890#p125890]this thread[/url], byuu wrote:
tepples wrote:So I've gathered that some means to synchronize streaming audio with in-game animation is needed.
Only in nocash's mind.
And in my own. Mostly I want characters' lips in the animations to match up with the voice acting in the cut scenes; sync loss is noticeable if they drift 0.1 s apart. And I'm trying to future-proof it for 2019 when the Dance Dance Revolution patents start expiring. Until I found out about the claim construction verdict in Konami v. Roxor, I was planning on making a DDR clone for Game Boy Advance. The effort to do so resulted in a music player app for GBA using the GSM Full Rate codec. But all I really need for these is a readable playback position register. This would allow playback to remain as underspecified as it was before, except that the program can see how far along it is and adjust the animation playback rate.

CD analogy: the Q subcode.

Re: MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 5:13 pm
by ikari_01
byuu wrote: byuu's proposed solution: when we want our videos to be based off of NTSC source material, we perform our own sample rate conversion on the audio track of the initial video. A very simple linear or hermite filter should be fine. However, we do still have to consider the slight oscillator variance of the SNES CPU. The official specification is 21477272hz. Ceramic oscillators are terrible at +/-5%, but thankfully the SNES CPU oscillator is crystal, with a tolerance of +/-0.00000125%.
Many people in PAL countries want 60Hz but do not have a genuine NTSC console, so they mod their consoles to 60Hz. This results in a quite large install base of "PAL60" consoles mimicking NTSC consoles (according to $213F) but running at 21.282MHz.

Re: MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 5:35 pm
by nocash
ikari_01 wrote:The MSU playback sample rate equals one sample per 59378938/122500 SNES master clocks (rounded to nearest neighbor). Audio output is synced to the SNES master clock in order to achieve frame synchronization even with slightly differently clocked consoles, and especially on modified PAL consoles. 21477270/(59378938/122500) Hz is an empirical value based on frame cycle count measurements (NTSC) and the assumption that audio should be synchronous to a source video material @29.97fps and 44100Hz, played back on the SNES at 2 SNES frames per video frame. On a perfectly tuned NTSC SNES the actual playback rate equals 44308.06Hz, on a perfectly tuned PAL SNES it's 43903.91Hz.
At first glance, it's really hard to understand why you haven't used 44100Hz.

At second glance, the part about "source video material @29.97fps and 44100Hz" - are that common rates for videos (like .mpg's)?
Then I would roughly get the idea:
The SNES has a frame rate of ~60.0988Hz (30.0494Hz when skipping each 2nd frame).
So the SNES video would be a bit faster as in the video source (30.0494 instead 29.97).
So the SNES audio should be also played a bit faster as in audio source (44100*30.0494/29.97 = 44216.83).
Hmmm... but that doesn't explain your 44308.06Hz though.

Re: MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 6:04 pm
by Near
<tepples> Mostly I want characters' lips in the animations to match up with the voice acting in the cut scenes

Again, please understand that it was never a design goal to play two-hour movies. Nobody wants to watch Avatar on their SNES.

I agree that audio should not noticeably desync within a two-minute cutscene, but we need to be reasonable here: perfection is the enemy of good. If we make the sync requirements too complex, it will rule out certain hardware designs.

Perhaps I've underestimated how quickly a difference of 44100 -> 44308 would accumulate. That would be a half-second desync in two minutes, which is pretty bad.

<ikari> Many people in PAL countries want 60Hz but do not have a genuine NTSC console, so they mod their consoles to 60Hz. This results in a quite large install base of "PAL60" consoles mimicking NTSC consoles (according to $213F) but running at 21.282MHz.

A/V timing is a very important matter: you need to match the hardware and software to do it right. I don't believe it'd be wise to worry about this case officially.

Is there a reason PAL users can't run an NTSC console? It seems easier than the myriad of soldering and switches necessary to pull off those PAL60 consoles.

But if there's something simple we can do to perfectly sync A/V on all variations (NTSC60, PAL50, PAL60 and the probably non-existent NTSC50; all with and without interlace mode enabled), then we may as well do so.

<nocash> Hmmm... but that doesn't explain your 44308.06Hz though.

From my perspective, I see:

60.0991482 / 59.9400599 = 1.002654123 * 44100 = 44217.046830479
The result is a drift of ~150ms per minute of video. Still too high.

What I would rather see is to take an NTSC video, strip out the audio to a WAV file, and then scale it by 0.997353959

What I am wondering is if we should drive the audio by exactly 44100hz, or try and sync it as a multiple against the CPU clock.

21477272 / 487 is a really nice number. That's a drift of 1.5ms per minute. A human being wouldn't notice an issue at all even in a ten minute video.

At ~60MB per minute of video (224x144@30fps), you can't exceed 73 minutes of video even if you wanted to. And given realistic download constraints, nobody is going to want to have a 4GB SNES game download hosted anywhere.

Let's also consider that the SNES itself has this problem in a completely unfixable manner. There's a guy on Youtube who makes videos demonstrating this all the time. Two otherwise identical revision SNES consoles will have slight oscillator differences, which means long intro songs in eg Super Mario World 2 will desync slightly if you watch it all the way through. And that's official software.

Re: MSU1 A/V synchronization

Posted: Sat Feb 22, 2014 6:23 pm
by tepples
It's fixable on Super NES. The S-CPU can send the current animation position to the S-SMP, or the S-SMP can send the current song position to the S-CPU. That's all I'm asking for: a way for the S-CPU to read the elapsed time.

As for data size, sync remains a problem whether the lip movement is FMV or just sprites.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 4:48 am
by nocash
byuu wrote:
nocash wrote:44100 vs 43800 differs by almost 1%. So if you play a 120 second movie, then audio could become offsync by around 1 second towards end of the movie.
It was designed to play short 1-2 minute openings
...
You're nitpicking and undermining my efforts
...
Perhaps I've underestimated how quickly a difference of 44100 -> 44308 would accumulate. That would be a half-second desync in two minutes, which is pretty bad.
No comment on that.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 5:33 am
by Near
So is there any reason resampling the audio stream from an NTSC video by (NTSC framerate / SNES framerate) won't work?

That gives us two options:

1. run a separate oscillator at 44100hz. With the SNES CPU oscillator's tolerance (~10ppm?) that's a max drift of 210 clocks/sec. That means we'd have to run for 102272 seconds (1704.5 minutes) to drift by a full second in the absolute worst possible case.

2. run at NTSC clock / 487. That means we'd stay perfectly in sync forever, but compared to the source PCM file, it'd drift by one second every 44100 seconds (every 735 minutes.)

Personally, I think both solutions are fine. Maximum theoretical run-time would be 120 minutes for a full 4GB of video data. Maximum real-world run-time will still be under 2 minutes for what MSU1 is intended for.

Barring any flaws in this thinking, I'd like to keep the spec at 44100hz officially. It'd be ideal to not change it unless we have to.
nocash wrote:No comment on that.
And yet your post is comment #125908

Could you please at least keep your passive aggressive responses in PM instead of derailing this thread? We're busy trying to address your concern in this thread.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 7:49 am
by ikari_01
I'm all for proposal no. 2 as it takes into account possible clock differences between consoles and during operation. Though I'd prefer running at master clock/487 instead of being fixed to NTSC.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 8:05 am
by Near
Cool! I love progress. Let's work off proposal 2, then.

So for NTSC: 315 / 88 * 6000000 / 44100 = 487.012987013

But for PAL (and PAL60): 21281370 / 44100 = 482.570748299
21281370 / 482 = 44152.219917012
21281370 / 483 = 44060.807453416

Let's round that to being off by 40 samples per second. That means we drift by one second every 1102 seconds, or every 18.375 minutes.
This is very much well within intended use cases, but perhaps not ideal for the theoretical "watch an entire movie on MSU1" scenario.
Note that this only applies to the difference between PCM files played back at 44100hz versus SNES consoles playing back the PCM files.
On actual hardware, so long as you scale your audio correctly, no drifting will ever occur.

A possible consideration: would it be possible to output one sample after 482 PAL clocks, the next sample after 483 PAL clocks, and then repeat? Yes, every other sample will be faster by 0.2074689%. I don't know how audible that would be to humans.

But if that worked, we'd end up with much less drifting. It'd be effectively 44106.513685214hz.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 9:32 am
by ikari_01
I'd just stick with 487 because otherwise you'd lose sync at PAL60 as opposed to NTSC, and as far as I can tell autodetecting PAL60 vs. NTSC is impossible for SNES software.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 10:16 am
by Near
Is it possible to tell the difference between NTSC and regular PAL50 in your firmware?

I would imagine you'd need your boot BIOS to read $213f and save the bit somewhere to be used by the MSU1 support.

What I'm thinking would be ideal:
Decided on option: NTSC -> pin 1 (master clock) / 487 = 44101hz
Best option: PAL -> pin 1 (master clock) / 482.5 {every two samples: 482x A, 1x A+B/2, 482x B} = 44106hz
Barring the above: PAL -> pin 1 (master clock) / 483 = 44061hz
Barring the above: PAL -> pin 1 (master clock) / 487 = 43699hz
Wildcard: just output at pure 44100hz using your own oscillator in PAL mode. Won't sync to the CPU perfectly, though.

NTSC / 487 is a one second difference every 44100 seconds.
<best> PAL / 482.5 is a one second difference every 7350 seconds.
<better> PAL / 483 is a one second difference every 1131 seconds.
<worst> PAL / 487 is a one second difference every 110 seconds.

Ultimately ... if the only option is a single divider, I'd suggest 487 everywhere. It's almost perfect for NTSC. And I suspect most if not all MSU1 titles will be NTSC or PAL60 games.

We could take a nuclear option for now and declare that MSU1 perfect-sync is only for NTSC titles, and worry about PAL perfect-sync support only if and when a strong desire for it emerges.

What I really love about the NTSC / 487 option is that it doesn't matter whether you output at 44100hz (your own oscillator) or at NTSC / 487 = 44101hz (reuse SNES oscillator) ... the time to drift by a noticeable degree is longer than entire movies, so it doesn't matter at all in practice. Anything you can fit on the 4GB of storage will stay synced, even taken to ad absurdum extremes of entire movies. And because of that, we don't have to change the spec for NTSC! :D

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 10:29 am
by nocash
byuu wrote:This is very much well within intended use cases, but perhaps not ideal for the theoretical "watch an entire movie on MSU1" scenario.
That is no problem - at least not once when sd2snes and higan do agree on using the same sample rate.
At that point, movie players could simply adopt themselves to that sample rate.
byuu wrote:Note that this only applies to the difference between PCM files played back at 44100hz versus SNES consoles playing back the PCM files.
On actual hardware, so long as you scale your audio correctly, no drifting will ever occur.
No please. If there is an agreement to use, for example, 44123Hz for .pcm files then that should be considered as the official rate which has to be used on both emulators and real hardware.
In that case devrs would need to resample their audio files to 44123Hz before converting them to .pcm format. So that no drifting would occur.

Assuming that you won't like that 44123Hz example, you also do this:
byuu wrote:A possible consideration: would it be possible to output one sample after 482 PAL clocks, the next sample after 483 PAL clocks, and then repeat? Yes, every other sample will be faster by 0.2074689%. I don't know how audible that would be to humans. But if that worked, we'd end up with much less drifting. It'd be effectively 44106.513685214hz.
That isn't audible to humans.
If you want exact 44100Hz then you could also use fixed-point additions to get 487.012987013 steps on NTSC.

Of course you would still need to resample audio for movies.
If the source movie has 44.1kHz and "slightly lower frame rate" than SNES...
then one could solve that by treating the 44.1kHz source as "slightly higher than 44.1kHz"...
which would mean that one would need to downsample the audio from to "slightly higher than 44.1kHz" to 44.1kHz.
Did somebody (not) understand that concept?

Some thoughts about PAL / NTSC:

Using the same divider for both PAL and NTSC has two advantages:
- It would work for NTSC and PAL60.
- One wouldn't need different dividers in PAL and NTSC cartridges.
Downside is that PAL has slower master clock so .pcm playback would have lower sample rate on PAL than NTSC (if everybody is aware of that "feature" than it could be dealt with; NTSC games wouldn't even need to be aware of it and work automatically on PAL60).

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 10:49 am
by nocash
byuu wrote:Ultimately ... if the only option is a single divider, I'd suggest 487 everywhere. It's almost perfect for NTSC. And I suspect most if not all MSU1 titles will be NTSC or PAL60 games.
I like that idea, too.

For NTSC, the official .pcm specification would change from 44100Hz to 44101.17Hz, right?
That change won't disturb older 44100Hz based NTSC software. And even most newer software could stay using 44100Hz source data (or if people should want microsecond-perfect synchronization, then they could resample their source data to 44101.17Hz).

And for PAL, the official .pcm specification would change from 44100Hz to 43698.911Hz, right?
That change might slightly disturb older 44100Hz based PAL software - but I guess there isn't much fine-tuned PAL software yet anyways (Super Road Blasters maybe). And any newer PAL programs could deal with the new 43698.911Hz value. That's great.

Re: MSU1 A/V synchronization

Posted: Sun Feb 23, 2014 10:53 am
by Near
<nocash> If you want exact 44100Hz then you could also use fixed-point additions to get 487.012987013 steps on NTSC.

Oh man, why didn't I think of that? That's a brilliant idea!

ikari, how hard would it be to do fixed-point step additions on the sd2snes?

On each clock tick, add 487.012987013 for NTSC, and 482.570748299 for PAL. When the value is >= 44100, output a sample and subtract 44100.

Obviously to handle the floating-point, we'd store the fractional bits in the low 16-bits and the whole number bits in the upper 16-bits.

If you could do this, that would be absolutely perfect!! :D

<nocash> And for PAL, the official .pcm specification would change from 44100Hz to 43698.911Hz, right?

Yes. But I still want to put this as the worst case for now. If we have to do it, we will.

Note that keeping the same clock rate for NTSC as PAL60 means that you will need different PCM files for NTSC versus PAL60 if you drive it by the SNES master clock. That could be an issue. But we could make an "NTSC->PAL60 PCM converter tool" for people to use.