Dragon Quest Disassembly

segaloco
Posts: 437
Joined: Fri Aug 25, 2023 11:56 am

Re: Dragon Quest Disassembly

Post by segaloco »

Sorry for the curt notification, I was still grumpy about something at the time. To be a bit more thorough, this is done as far as I can tell regarding addresses and text. There are still select "magic numbers" that are not tied to their sources. For instance, NPC string IDs are still just bytes, not constants derived from where the members of the array actually land at build time. That's stuff that I intend to work on in the second pass of this while working in the Dragon Warrior stuff.

The crux of it all is the awk script in the util folder. Since ca65 doesn't support UTF-8 as far as I can tell, I went ahead and shunted that to an outside tool. I had to make a few odd choices to ensure the ROM still builds 1:1. For instance, there is one whole string in the strings file where someone inadvertently provided a handakuten as a period. In Japanese, as I'm sure you know, both are a small circle, but one is added as accenting to ha-hi-fu-he-ho kana to change them to pa-pi-pu-pe-po, whereas the other is used much like a conventional period. I could see someone making that mistake in a hurry. As a result, that one period uses another symbol in the Japanese codepage that also looks like an empty circle, but that's admittedly a kludge.

Another inconsistency is that there are two ways that accented kana are expressed in strings:

- In one approach, a special control suffix (in this case 0xF8 or 0xF9) is provided immediately following a kana needing decoration. The text engine then interprets this as meaning to apply a dakuten (0xF8) or handakuten (0xF9) to the immediately preceding kana.

- In the other approach, all accented hiragana characters are also provided in the "common strings" library, meaning you can trim them to one byte in most strings. This is done inconsistently through the codebase, but luckily cleanly along the lines that I separated files. To handle this at build time, there is a sed script that further converts the request for the single-byte version *to* the multi-byte one above if the file has the right extension. I used .j and .j2 (for Japanese text) but am not really married to an extension. I went with a different extension so I could use make(1)'s inference rules for .j -> .s and .j2 -> .s rather than some temp file or changing other rules to subjectively handle the awk filtering.
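
As a rough illustration of the first approach (a base kana followed by a 0xF8 or 0xF9 suffix), here is a minimal Python sketch. It is not the repo's awk/sed tooling. The tile indices in `BASE_TILE` and the function name `encode_kana` are made up for the example; only the 0xF8/0xF9 suffix values come from the post above.

```python
import unicodedata

DAKUTEN = 0xF8      # control suffix: apply dakuten to the preceding kana
HANDAKUTEN = 0xF9   # control suffix: apply handakuten to the preceding kana

# Unicode combining marks produced by NFD decomposition of accented kana.
COMBINING = {"\u3099": DAKUTEN, "\u309A": HANDAKUTEN}

# Hypothetical tile indices; the real values live in the game's CHR layout.
BASE_TILE = {"か": 0x10, "は": 0x20, "ひ": 0x21}


def encode_kana(text):
    """Encode kana as base tile bytes, with accents as trailing suffix bytes."""
    out = []
    # NFD splits an accented kana into its base kana plus a combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch in COMBINING:
            out.append(COMBINING[ch])
        else:
            out.append(BASE_TILE[ch])
    return bytes(out)
```

With these assumed tiles, `encode_kana("が")` yields the base byte for か followed by 0xF8, which is the two-byte form the text engine would interpret.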

One reason I can see that they *may* have avoided the string versions of the accented hiragana is that, unlike the global strings library, the common strings library does *not* have the offset of each string precalculated and staged in a table. In other words, global strings have the index stashed in a table somewhere with even alignment, so if you want string 3, you go look at index 3 in the table and that tells you the word offset from the base of the string library.
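
To make the table idea concrete, here is a small Python sketch of an offset-table lookup. It assumes 16-bit little-endian offsets measured from the start of the string data; the actual table layout in the ROM may differ, and `build_library` and `lookup_offset` are hypothetical names.

```python
import struct


def build_library(strings):
    """Pack strings back to back and record each start offset in a table."""
    table = b""
    blob = b""
    for s in strings:
        table += struct.pack("<H", len(blob))  # 16-bit offset of this string
        blob += s
    return table, blob


def lookup_offset(table, index):
    """Constant-time lookup: entry `index` sits at byte `index * 2`."""
    (offset,) = struct.unpack_from("<H", table, index * 2)
    return offset
```

The lookup cost is the same for every index, no matter how long the earlier strings are.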

With the common strings, instead, it's a 0xFF-delimited list, and I'm pretty sure that every time a string index is passed, the code has to walk the collection from the start until it reaches that entry, meaning your lookup time scales with how many strings precede it and how long they are. I'm still a bit fuzzy on this, but that's the gist I'm getting from the code now that it's all broken open. I haven't studied the text handling of the US release yet to determine how much similarity there is, but that was the area with the most differences when comparing stuff with existing US Dragon Warrior analysis. I suspect there may be enough changes that I shunt the US and Japanese text handlers into separate conditionally included files rather than putting little ifdefs in the middle of things, depending on how different they are.
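
The traversal described here can be sketched in Python as follows. This is a model of the lookup as the post understands it, not a transcription of the game's 6502 routine, and `find_common_string` is a hypothetical name.

```python
TERMINATOR = 0xFF  # delimiter between entries in the common strings list


def find_common_string(library, index):
    """Walk past `index` terminators, then return the bytes of that entry."""
    pos = 0
    for _ in range(index):
        # Skip one whole string: jump to just past its terminator.
        pos = library.index(TERMINATOR, pos) + 1
    end = library.index(TERMINATOR, pos)
    return library[pos:end]
```

The loop runs once per preceding string and each step scans that string's bytes, so the cost grows with both the count and the length of the entries before the one requested.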

Either way, if you find yourself using this for anything and have any questions, I'm happy to illuminate the still shadowy parts. This has been a couple of years in the making, and it's satisfying to finally have what I believe is complete labeling and, aside from CNROM bank considerations, full relocatability of code and a high degree of relocatability of data without having to manually adjust pointers and indices all over the place.

The trickiest thing to nail down to an assembler/linker-generated value rather than bytes right in the code is that common string library. Since the strings are variable length and there is no offset table, there's no consistent spacing I could use for a constant that reduces down to each string index. I'm still puzzling on the best way to make the string IDs derive from their labels in code rather than just knowing which one is the first, the second, the third, etc. in the list.
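
One possible direction, offered only as a sketch and not as anything the repo does: since the ID of a common string is just its position in the list, a small generator could emit one equate per label in list order. The label names and the `STR_ID_` prefix below are hypothetical; the `name = value` form is ordinary ca65 constant assignment.

```python
def emit_string_ids(labels, prefix="STR_ID_"):
    """Return one ca65-style equate per label, numbered by list position."""
    return [f"{prefix}{label} = {i}" for i, label in enumerate(labels)]


# Example with made-up labels, in the order the strings appear in the list.
lines = emit_string_ids(["HERO", "SLIME", "DRAGON"])
```

Reordering or inserting a string then only requires regenerating the equates, at the cost of one more generated file in the build.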

Anywho, I added a section to the readme about how the text works, at least the code generation parts of it, so that should illuminate the reasoning behind anything you actually see in strings like control characters.
Pokun
Posts: 2951
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Dragon Quest Disassembly

Post by Pokun »

Yeah, I remember the game also had two ways to represent dakuten/handakuten: as a separate character when entering the hero's name, and as diacritical marks placed between the rows in dialog (so that only every other row could be used for the text).

It's nice that you bothered implementing a system for handling strings in kana, it makes it much easier to read (provided you can fluently read kana) and search. Especially since, as you said, the text portions are where DQ and DW differ the most and deserve to be documented thoroughly.
It's too bad ca65 doesn't support UTF-8 though. That's another thing I like about 64tass, along with the fact that it allows defining things like control characters for use in strings, but ca65 is the more popular assembler for NES.

Definitely a fine job, as deserved for one of the most important games in gaming history. I think DQ1 has SMB1-level importance since it basically gave birth to the whole JRPG genre by merging Wizardry and Ultima and also popularized the genre beyond its nerdy roots.
segaloco
Posts: 437
Joined: Fri Aug 25, 2023 11:56 am

Re: Dragon Quest Disassembly

Post by segaloco »

I appreciate the comparison to SMB1, and I agree, that's largely why I chose Dragon Quest for examining an RPG engine. It's a genre-defining title that spawned not only its own legacy but innumerable other series and titles. Its importance to the proliferation of RPGs on home consoles cannot be overstated.

My main reasoning behind using the cc65 suite is that it resembles the standard UNIX programming environment so much that the practices I use in other areas meld nicely, without having to learn a whole bunch of different stuff just for my 6502-focused things. That also makes choices like this one more exportable: I could theoretically use this same awk(1) script with GNU as out of the box, since .byte is also a directive over there. The sed script would need touchup since .dbyt is not. An alternative fix would be to change my constants so that the two-byte characters are expressed in reverse order, then I could use .word. Still, that's a lot less rewrite than, say, if I had relied on the intrinsic UTF-8 of some other toolkit. Now I've got a home-grown solution that makes that not matter at all.
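
The .dbyt versus .word point comes down to byte order, which this Python sketch models. It assumes .dbyt emits a 16-bit value high byte first (as ca65 does) and that .word emits it low byte first as a 16-bit quantity, which holds for ca65 but depends on the target in GNU as. The value 0x12F8 is a made-up "kana tile + dakuten suffix" pair.

```python
import struct


def dbyt(value):
    """Model of ca65 .dbyt: 16-bit value emitted high byte first."""
    return struct.pack(">H", value)


def word(value):
    """Model of a 16-bit little-endian .word: low byte first."""
    return struct.pack("<H", value)


def swap_bytes(value):
    """Exchange the high and low bytes of a 16-bit constant."""
    return ((value & 0xFF) << 8) | (value >> 8)


# A constant stored in reverse order and emitted with .word produces the
# same bytes as the original constant emitted with .dbyt.
assert word(swap_bytes(0x12F8)) == dbyt(0x12F8)
```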
segaloco
Posts: 437
Joined: Fri Aug 25, 2023 11:56 am

Re: Dragon Quest Disassembly

Post by segaloco »

Ouch, so I discovered an issue with my awk(1)-based tooling for handling Japanese UTF-8 text in source files. As it turns out, POSIX does require applications to acknowledge that the LANG and LC_* variables exist, and governs that an implementation can provide locales other than the C/POSIX locale, but it does not bridge the two, in that it does not require individual utilities to *honor* other locales the implementation supports. This is quite a pain, as it means a system can advertise POSIX conformance as well as a host of locales beyond POSIX, but there is nothing enforcing the quality of implementation of those other locales. POSIX essentially just states you can say they're there, you can say your applications are POSIX, but POSIX isn't going to require you to then ensure the two play with each other.

In any case, what this has led to is that the version of awk(1) shipped with macOS explicitly does *not* support multi-byte locales, even though macOS itself includes said locales. This is quite a pain, because it means this script of mine does not work on macOS. For now I've touched up the repo to indicate that this is unfortunately only functional on systems providing gawk or another awk(1) implementation that is cognizant of the locales on the machine.
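
The practical difference between a locale-aware tool and a byte-oriented one can be shown with a short Python sketch. It only models the two views of the same text; it does not reproduce any particular awk(1) implementation, and the function names are made up.

```python
def char_view(text):
    """What a multi-byte-aware tool sees: one unit per character."""
    return list(text)


def byte_view(text):
    """What a byte-oriented tool sees: one unit per UTF-8 byte."""
    return list(text.encode("utf-8"))


kana = "が"
assert len(char_view(kana)) == 1   # a single character
assert len(byte_view(kana)) == 3   # three bytes in UTF-8
```

A script that maps "one character" to "one table entry" breaks under the byte view, because each kana arrives as three separate units that match nothing in the table.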

Does anyone have any experience with supporting this sort of thing in a portable way? I've been considering digging into the iconv(1) utility as a way to do the locale conversions, but I don't know how well that will work in practice because iconv(1) maps encodings to encodings, not encodings to arbitrary text like my awk(1) script does. It would work if all the tiles stay in the same place, because it would just transform multi-byte kana <xyz> to a single-byte CHR index. However, one of the points of my implementation was that awk(1) transforms the character into a corresponding enumerated value that then points to whatever the character is defined as, so that if a character moves around, you change one equate rather than the conversion script. This is annoying because my own personal reasons for being super POSIX-y are so I can jump between macOS and GNU without thinking about it, and potentially migrate back to FreeBSD in the future under the same auspices.
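
The "character to enumerated value" idea can be sketched like this in Python. It stands in for the awk(1) script's role rather than reproducing it; the symbol names such as `CHR_A` and the function name are hypothetical.

```python
# Hypothetical mapping from kana to symbolic names. The numeric values of
# these names would be defined once, as equates, on the assembler side.
SYMBOLS = {"あ": "CHR_A", "い": "CHR_I", "う": "CHR_U"}


def to_byte_directive(text):
    """Turn a kana string into a .byte line of symbolic constants."""
    return "\t.byte\t" + ", ".join(SYMBOLS[ch] for ch in text)
```

Because the output names symbols instead of raw tile numbers, moving a tile only means changing one equate; the conversion step itself stays untouched.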

Thoughts?
lidnariq
Site Admin
Posts: 11609
Joined: Sun Apr 13, 2008 11:12 am

Re: Dragon Quest Disassembly

Post by lidnariq »

In my opinion, if you're using awk or sed, you should be using perl...
segaloco
Posts: 437
Joined: Fri Aug 25, 2023 11:56 am

Re: Dragon Quest Disassembly

Post by segaloco »

Perl isn't a guarantee, but awk(1) has the weight of POSIX behind it. Everywhere I've posted this I've mentioned that angle, and I've been told Python, Lua, and now Perl, none of which fit the bill. I've had some success using iconv(1) and just need to finish redoing the tooling to use it. That way iconv(1) is the only piece that touches the UTF-8, and the rest is just handling single-byte representations in ASCII. This should result in a portable build between UNIX-likes again without some external dependency beyond the cc65 suite.
Oziphantom
Posts: 1740
Joined: Tue Feb 07, 2017 2:03 am

Re: Dragon Quest Disassembly

Post by Oziphantom »

You might be better off aiming for CodePage 932 (aka "Windows CodePage 932", aka Shift-JIS) for Japanese support, but wanting to support Mac will probably be the issue. The new NextOS Macs might have 932 support, or they might stick to the "better" CodePage 10001 that Mac historically used.
segaloco
Posts: 437
Joined: Fri Aug 25, 2023 11:56 am

Re: Dragon Quest Disassembly

Post by segaloco »

Ideally Apple would just support the locales they ship with the tools they ship rather than making me go shop around for another solution, but far be it from Apple to actually put any effort into the UNIX-like subsystem they want to so proudly advertise with.

Using other codepages just lands me in the same boat, with awk(1) probably not natively supporting it, requiring something to translate into something awk(1) can speak. I'm having success with iconv(1) so I think I'm going to stick with that doing the encoding conversion to something awk(1) can mess with. I'm realizing it probably should be fine to have iconv(1) spit out whole strings, I'm just defining a character encoding where codepoint <xyz> is byte sequence <abc>, the latter byte sequence just *happens* to be the string I want in ASCII. Heck, if there's a liberal enough limit on the max allowable sequence length, I might just have iconv produce the entire preprocessing directive, cut awk(1) out entirely, but we'll see.
Dwedit
Posts: 5069
Joined: Fri Nov 19, 2004 7:35 pm

Re: Dragon Quest Disassembly

Post by Dwedit »

Shift-JIS is very badly designed. A multibyte character can contain a regular ASCII character as its second byte, like backslash or pipe. UTF-8 was designed to avoid that problem.
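
A quick way to see the problem, using Python's built-in `shift_jis` codec: the katakana ソ encodes to two bytes, and the second one is the same byte as an ASCII backslash. The helper name is made up for the example.

```python
def has_ascii_byte(encoded):
    """True if any byte of the encoded text falls in the ASCII range."""
    return any(b < 0x80 for b in encoded)


so = "ソ"
sjis = so.encode("shift_jis")   # two bytes, the second is 0x5C
utf8 = so.encode("utf-8")       # three bytes, all 0x80 or above

assert sjis[1] == ord("\\")
assert has_ascii_byte(sjis)
assert not has_ascii_byte(utf8)
```

Any tool that scans bytes for a backslash or another ASCII delimiter can therefore split a Shift-JIS character in half, which cannot happen with UTF-8.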