State of Optical Preservation?
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
State of Optical Preservation?
Disclaimer: This ain't about piracy. I don't want to hear it. I am interested in technical details, and if someone gets talking about commercial games and file sharing, into the pit they go; this is Sparta.
-------
So in years past, I've never really gotten the warm fuzzies from any of the optical media dump formats I've encountered. The one I use for my own dumps, just because it has the fewest problems, is bin/cue, since it is open and does work with a decent amount of stuff. Plus, you can extract ISO data tracks and WAV audio tracks from them. Still, it's a bit unwieldy: you have the two files, when I would have it that all the necessary track descriptors would *be* part of the bin somehow, standardized for things like word size and endianness.
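For reference, here's roughly what a minimal cuesheet for a hypothetical mixed-mode disc looks like (filename, modes, and timestamps invented for illustration); every line of it is track metadata living outside the bin:

Code:

FILE "disc.bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    PREGAP 00:02:00
    INDEX 01 21:35:42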
Anywho, lower down the rung, I always despised the ubiquity of DiscJuggler and Alcohol 120% in the realm of optical media preservation. The cdi and mdf/mds formats still haunt my dreams. I'm glad I don't have to tussle with any of that stuff these days.
However, even though bin/cue clearly beats these in my mind, it still seems to be lacking to me, and I want to figure out what might work better. Unfortunately my most natural thought at present, to simply dd(1) from the /dev entry for an optical drive, is not attainable due to my lack of any such drive on a UNIX-y computer. I'm also averse to bootstrapping a new system on my one tower just to find out that isn't a winner.
Has anyone wrestled this problem into submission? Is there an ideal format for capturing accurate images from diverse optical media that may not be so simple as an audio CD or single-track ISO9660 filesystem?
Pardon the harsh disclaimer too but ya, I can't stress enough how much I don't want this to turn into any sort of untoward conversation.
-
Pokun
- Posts: 3442
- Joined: Tue May 28, 2013 5:49 am
- Location: Hokkaido, Japan
Re: State of Optical Preservation?
I agree, the BIN/CUE format seems to have the largest support but is unwieldy with its dual files. The name of the BIN is written in the CUE, which means renaming the BIN also requires editing the CUE.
I much prefer a single image file if possible.
MAME has the CHD (Compressed Hunks of Data) format, which was originally for HDD images for arcade games that keep extra data on an HDD (in addition to the ROM chips with the actual game on them) but was later expanded to cover GD-ROM, CD-ROM, and other optical disc media. I guess now it's more of a universal format for any game that uses a relatively large media format compared to traditional game media like ROM chips, tapes, and magnetic disks.
For MAME, which emulates a huge number of systems, this is very useful: it means they don't need to support a ton of different image file formats for everything, or favor a few select formats; they just need to make sure the converter supports all the common formats.
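For anyone curious, the converter is chdman, which ships with MAME. If I remember the options right, round-tripping a bin/cue looks something like this:

Code:

# bin/cue (two files) in, single compressed CHD out
chdman createcd -i game.cue -o game.chd
# and back out again for tools that want bin/cue
chdman extractcd -i game.chd -o game.cue -ob game.bin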
-
Joe
- Posts: 773
- Joined: Mon Apr 01, 2013 11:17 pm
Re: State of Optical Preservation?
How much of the disc do you want to preserve? For CDs, the "ideal" format would be something capable of storing a raw EFM capture, plus maybe wobble if that was ever used for something other than ATIP or console copy protection. (And since we're talking about ideal, let's go with something a bit smarter than a raw bitstream so we don't have to store tons of redundant data.)
Being able to store a raw EFM capture isn't much good if you can't capture raw EFM in the first place, though. Normal CD drives will decode and error-correct the EFM, split the subcode from the audio frames, split the disc into tracks according to the TOC in the subcode in the lead-in area, deinterleave and error-correct the audio frames, and, for data tracks, descramble and error-correct those frames into data sectors. Better hope your disc doesn't have any mastering errors. (Mastering errors may be intentional, either for non-standard formats or for copy protection.)
segaloco wrote (Thu Oct 23, 2025 4:08 pm): "Still, it's a bit unwieldy: you have the two files, when I would have it that all the necessary track descriptors would *be* part of the bin somehow, standardized for things like word size and endianness."
The TOC is stored in the subcode in the lead-in area. Usually bin/cue doesn't include the subcode, and even when it does, CD drives don't let you read the lead-in area.
segaloco wrote (Thu Oct 23, 2025 4:08 pm): "Unfortunately my most natural thought at present, to simply dd(1) from the /dev entry for an optical drive, is not attainable due to my lack of any such drive on a UNIX-y computer. I'm also averse to bootstrapping a new system on my one tower just to find out that isn't a winner."
You basically get an ISO file if you do that. Works great if you only want the user data portion from the mode 1 or mode 2 form 1 data track on a CD with only one track, not so great if you want anything else from a CD.
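To make the contrast concrete, the difference looks roughly like this (device path illustrative; raw-read support varies by drive):

Code:

# user data only, 2048 bytes per sector: a plain ISO of the single data track
dd if=/dev/sr0 of=track01.iso bs=2048
# full 2352-byte sectors plus a separate TOC file: needs a dedicated ripper like cdrdao
cdrdao read-cd --device /dev/sr0 --read-raw --datafile disc.bin disc.toc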
Fortunately, most of the problems with preserving CDs stem from their "analog cassette tape but digital and on LaserDisc" origin and don't affect newer optical disc formats. I'm only really familiar with how CDs work, though, so I can't comment on the "ideal" preservation format for anything else.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
Darn, that sounds really messy. Did the CD architects never hear of UNIX and a device being representable as a linear array of the bits inside that medium? All the weirdness in CD structure is such a pain, especially given you don't run into the same on countless other media formats.
While the wobble and stuff would also be nice, I'm mainly interested in anything the console is executing or using as data for execution. So for instance, Dreamcast discs have at least two data tracks: the low-density inner band with CD-ROM-format tracks and the high-density outer band with GD-ROM-format tracks. There might also be CD-DA tracks in the low-density section, and I don't know if this is the case or not, but I imagine it would be possible to have CD-DA-ish tracks in the high-density portion too.
In that situation, for instance, I'd just want complete images of each data track and then the CD-DA as PCM/wav/whatever. The control data is an interesting prospect, and a more thorough preservation would do this of course, but at the very least I want to find a consistent, open format that works *well* for backing up multiple tracks that cross Rainbow-book definitions. That would at least be a way to ensure everything, including bootstraps sitting on alternate tracks, weird file BLOBs, etc., is actually in the image, so it is fully useful for reverse engineering. For instance, you at the very least want to ISO a Sega CD disc because the bootstrap is stored in sector 0; putting the disc in a drive and copying the files misses the CD-DA and bootstrap sectors. Granted, bin/cue has been well enough in this realm, I just find it unnecessarily complicated compared with taking a hard disk dump à la:

Code:

dd if=/dev/sda of=disk.img bs=1M

That's why I thought maybe that would work, that a dd off a CD would actually do what UNIX says and linearly copy the device as a file. Somehow someone decided that a CD won't get treated as a file....
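To be fair to bin/cue, splitting one back out into the per-track files I want is at least mechanical; if I'm remembering its usage right, bchunk does it in one shot:

Code:

# -w writes audio tracks as WAV; outputs track01.iso, track02.wav, and so on
bchunk -w disc.bin disc.cue track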
-
Joe
- Posts: 773
- Joined: Mon Apr 01, 2013 11:17 pm
Re: State of Optical Preservation?
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "While the wobble and stuff would also be nice, I'm mainly interested in anything the console is executing or using as data for execution."
The good news is, most CD drives used in game consoles work like PC CD drives, so the data you can extract with a PC CD drive pretty closely resembles the data the game console would be able to read from the disc. The bad news is, it's really really hard to do better than pretty close.
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "There might also be CD-DA tracks in the low-density section, and I don't know if this is the case or not, but I imagine it would be possible to have CD-DA-ish tracks in the high-density portion too."
Many games have CDDA tracks in the high-density area. The high-density area is almost exactly the same as a regular CD, just higher density. It wouldn't be too inaccurate to say there are two CD-ROMs on every GD-ROM. Heck, you can trick some DVD drives into reading the high-density area, and they think it's an ordinary CD when they do it.
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "In that situation, for instance, I'd just want complete images of each data track and then the CD-DA as PCM/wav/whatever."
I don't see how what you're asking for is any different from what you can already do with .cue files. Although, once you start looking at mixed-mode CDs, you run into the "fun" problem that CDs weren't designed for byte-accurate seeking: different drives have different ideas of how to line up the audio with the subcode timing information, and yes, that includes the factory "drives" that cut glass masters. Data tracks include some extra timing information so the drive can correct the offset while it's reading a data track, but that means you'll have either duplicated or missing data at the boundaries between audio and data tracks.
There are even some drives that can't do byte-accurate seeking, but those are rare.
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "The control data is an interesting prospect, and a more thorough preservation would do this of course, but at the very least I want to find a consistent, open format that works *well* for backing up multiple tracks that cross Rainbow-book definitions."
You've got your pick of CHD, bin/toc, and maybe bin/cue with some nonstandard extensions. I don't know what "control data" you're thinking of. If you're talking about the TOC, that's how you tell the difference between data and audio, so you need to store that information somewhere. If you're talking about subcode, there are consoles that use it as intended (e.g. Saturn playing CD+G) and as copy protection (e.g. various PlayStation games).
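For subcode specifically, cdrdao can capture it alongside the sectors on drives that support it; the invocation is something like this (option spelling from memory):

Code:

# raw sectors plus R-W subchannel data, with the track layout written to disc.toc
cdrdao read-cd --device /dev/sr0 --read-raw --read-subchan rw_raw --datafile disc.bin disc.toc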
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "Somehow someone decided that a CD won't get treated as a file...."
CDs were designed as a consumer audio format, so ease of copying probably wasn't much of a consideration. If they did consider it, they wouldn't have wanted to make it easy.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
I guess what irks me is yes, yes I am fully aware you need to store that data. When I dd a hard drive, I get that data in the same file; it's called a superblock. There is ultimately, in some way, shape, or form, a linear array of data, maybe checksummed, maybe error corrected, whatever, that, when interpreted, yields the logical structure of the multi-format disc. Whatever *this* is, my frustration is that *this* is not part of a dump; *this* is represented abstractly as a secondary cuesheet that can be corrupted, lost, or overwritten independently of the disc data.
The CD is one object. It has a finite amount of objective data encoded on its physical surface to describe the contents. Why is a dump not:
TOC Data
Track 1
Track 2
Track 3
Like a hard disk is
Superblock
Partition Header A
Partition A
Partition Header B
Partition B
It can be one thing. One physical object *could* be described by a linear array of the necessary data to recreate the data on the media. That it is instead bifurcated in what is sadly still the most practical format is what irks me. Tools that don't properly implement cuesheets irk me. That I've had to hand-edit the same cuesheet multiple times for each different program I had to put it through IRKS me.
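And with a hard disk that really is how it plays out; the metadata rides along inside the one image (device name illustrative):

Code:

dd if=/dev/sda of=disk.img bs=1M
# the partition table is read straight out of the image, no sidecar file needed
fdisk -l disk.img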
Why is it so bad? Linear preservation of data encoded on a track has been a thing in computing since magtape and PPT. That for optical media specifically the situation is so dire and daunting makes absolutely no sense given the media was invented *long after* the computing world decided a linear array of information was...well...linear and complete.
Sorry this is now a rant, but I'm done. I guess I'm frustrated with bin/cue and have wondered for over a decade why there isn't literally anything better. My .img multi-partition USB stick image of a Linux installer doesn't need a damn cuesheet. Why is CD track metadata *so* special that it needs such a backwards abstraction that none of the other exotic disks, tapes, and ROMs that existed for decades needed? Heck, the iNES format even settled an abstract ROM banking scenario without 100% more files to misplace and corrupt in the process...
Thanks for playing along though, I guess I just have to keep using bin/cue, I was hoping I had just overlooked some magical format that was a linear array of the linear data stored on the linear path on the linear surface of the finitely sized object that I can see and hold in one hand because it is one single finite object without magic entropic quantum variability that would justify such an arcane and mystifying family of absolute garbage disc image formats....
-
stan423321
- Posts: 127
- Joined: Wed Sep 09, 2020 3:08 am
Re: State of Optical Preservation?
It is my understanding that the earliest CD emulators using cue/bin were companions to simple mixed-mode disc authoring tools. The idea wasn't "let's dump the discs we bought", it was "let's make sure whatever we send to the CD factory works". Thus, cue + bin (for the error-hardened filesystem) + WAVs (for CD audio). Don't ask me why they merged the files into the bin already; maybe it helped with simulating the timing.
This format eventually started being used for dealing with dumps as well. The thing is, some CDs screwed with the simple heuristics for track splitting. There are continuous recordings marked as separate tracks, and then there's CD-XA. The dumping people were not in the mood for making even more files, so they threw more things into the bin.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
Well, and the thing is, a superblock is similarly something that doesn't just magically exist on its own. When authoring a CD, yes, that directory information originally exists as abstract things like a cuesheet.
Just like an HD is just an empty superblock until you issue a bunch of commands to make the files. Sure, bootstrapping the filesystem, you can do that. But then you dd it and copy the entire structure, data *and* metadata. To me, bin/cue comes off as instead: "Well, I could just dump this linear disk, but instead I'm going to write a shell script that reruns each mkdir, cat, touch, etc. to manually recreate the directory structure." Yeah sure, it works, I guess, but it is HORRIBLY inefficient, and again now you have a whole second file to keep track of. It just feels... lazy?
That said, there was a format called "shar", otherwise known as "shell archive", that was largely that, although still one file. The idea was you generate a shell script with the blobs stashed in the right places, and all the filesystem metadata involved was shell commands to recreate it. Compare with "tar", aka tape archive, where it is more like a dd dump, what with the filesystem metadata simply being part of the binary image rather than a bunch of script commands. Things like shar were invented for transmitting data over the "uucp" network, back before widespread Internet access and all of that.
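As a sketch of the shape of it (not real shar output, just the idea):

Code:

#!/bin/sh
# shar-style: the metadata is executable commands, the data rides inline
mkdir -p docs
cat > docs/readme.txt << 'EOF'
hello from inside the archive
EOF
# tar-style, for contrast: metadata and data serialized together into one binary image
tar -cf archive.tar docs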
Anyway, it just bewilders me that CD is the odd one out. That you do weird mastering stuff for a magtape... but can dump a magtape as a single file. That you do weird mastering stuff for a QIC tape... but can dump a QIC tape as a single file. That you do weird mastering stuff for an FDS QuickDisk... but can dump a QuickDisk as one file. What paint thinner were the optical media folks huffing...
-
Pokun
- Posts: 3442
- Joined: Tue May 28, 2013 5:49 am
- Location: Hokkaido, Japan
Re: State of Optical Preservation?
segaloco wrote (Fri Oct 24, 2025 3:26 pm): "Darn, that sounds really messy. Did the CD architects never hear of UNIX and a device being representable as a linear array of the bits inside that medium? All the weirdness in CD structure is such a pain, especially given you don't run into the same on countless other media formats."
Because LaserDisc technology can basically be traced back to phonograph records, which are all-analog and predate Unix by a hundred years. Even LD uses analog video and either analog or digital audio. The Compact Disc derives from the LD and introduced an all-digital design as we entered the digital era.
Also, CD-R technology didn't exist from the start; CDs weren't supposed to be writable, unlike magnetic disks and tapes. They were also originally made only for audio playback (which of course can easily be recorded to a tape), just like the phonograph record was.
Not all of the information on a CD is meant to be readable from a normal CD-ROM drive. The wobble data used by the Saturn and PS1 for copy protection (and region protection in the case of the PS1) abuses the laser error correction hardware. By encoding the protection code in the wobbles, the code is cleverly recovered by the console's protection hardware from the laser error correction output and checked against the expected value. I don't think you can get access to that without at least a modified disc drive, and of course no CD burner can reproduce the wobbles (that's the point).
The PS1 has the wobble data in the TOC in the inner ring, while the Saturn has it in the ring area on the outer rim.
Subchannel data used for things like CD+G should be readable by most CD-ROM drives, or they wouldn't be able to play such content. Not all drives can dump the areas where the control data is, though.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
Geeze... I'm so tired of standards being suggestions. If something is a *standard* and it isn't required, it is completely meaningless. The lazy engineers who don't want to get their ducks in a row and apply any QA to their processes aren't the ones stuck dealing with the historical mess they've made...
-
Joe
- Posts: 773
- Joined: Mon Apr 01, 2013 11:17 pm
Re: State of Optical Preservation?
When you consider that the first PC CD drives were basically audio CD players with a microprocessor connected to the audio output to translate "audio" from a data CD into usable data, the weird limitations make a bit more sense. And they have no reason to go back and fix it, since it works well enough for everybody except archivists.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
Sounds like serial transmission over modems being an adjunct of the voiceband POTS service rather than something catered to in QA from the beginning.
Still, that points to a bit of dishonesty with the consumer, adding on some functionality to a technology while still claiming your thing is a CD or whatever. What is the point of the Rainbow books if not to be prescriptive of the data formats? It should be unambiguous enough that there isn't wiggle room for these weird little variations in the first place. Ironic, given the point of a memory format is to preserve information...
-
lidnariq
- Site Admin
- Posts: 11803
- Joined: Sun Apr 13, 2008 11:12 am
Re: State of Optical Preservation?
segaloco wrote (Sun Oct 26, 2025 12:25 am): "What is the point of the Rainbow books if not to be prescriptive of the data formats? It should be unambiguous enough that there isn't wiggle room for these weird little variations in the first place."
The Rainbow books indeed were prescriptive. Being incompatible with the standard was deliberate, after unlicensed copying was ubiquitous on the previous generation of consoles.
-
segaloco
- Posts: 913
- Joined: Fri Aug 25, 2023 11:56 am
Re: State of Optical Preservation?
Well, I still think the cuesheet should just be a header on the file, nyeh. It makes sense though; I do appreciate the perspective that the state of things derives largely from mastering also being a separate TOC-plus-data process. I just think in terms of mass storage, be it tapes, platters, flash, whatever.
-
creaothceann
- Posts: 863
- Joined: Mon Jan 23, 2006 7:47 am
- Location: Germany
Re: State of Optical Preservation?
Just "adding on some functionality" worked great for NTSC, closed captioning and teletext. It's also how we got to enjoy consoles with 240p output.segaloco wrote: Sun Oct 26, 2025 12:25 am that points to a bit of dishonesty with the consumer, adding on some functionality to a technology while still claiming your thing is a CD or whatever
segaloco wrote (Sun Oct 26, 2025 12:25 am): "It should be unambiguous enough that there isn't wiggle room for these weird little variations in the first place"
Reminds me of analog TV in general.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10