Tokugawa Corporate Forums

Retro Japanese Computing
It is currently March 28th, 2024, 12:45 pm

All times are UTC




[ 3 posts ]
PostPosted: February 22nd, 2015, 2:00 pm 

Joined: May 15th, 2010, 1:35 am
Posts: 701
I've been cataloging PC-8801 disks recently.

The d88 file format is great, but it makes comparing files tricky. I have been thinking about techniques and best practices for "scrubbing" d88 files to eliminate duplicates and create consistency. (Redumping everything with a KryoFlux would be nice, but...)

1) Split multi-disk images
The multi-disk archive feature of the d88 format is extremely convenient, but not all emulators support it, and we need to split these images up to compare individual disks. There are a number of tools to do this, such as d88edit, d88uty, divd88, and the d88split Perl script.
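For reference, a multi-disk d88 is just several single-disk images stored back to back, and each image records its own total size (header included) as a little-endian 32-bit value at 1Ch. A minimal Python sketch of a splitter, assuming those size fields are correct; the function names and the `<stem>_<n>.d88` output naming are hypothetical, not taken from any of the tools above:

```python
import struct
from pathlib import Path

HEADER_MIN = 0x20    # fixed part of the header, before the track table
SIZE_OFFSET = 0x1C   # uint32 LE: total size of this disk image, header included


def split_d88(data: bytes) -> list[bytes]:
    """Split a (possibly multi-disk) d88 file into single-disk images.

    Assumes the file is simply images concatenated back to back and that
    every image's size field at 1Ch is correct.
    """
    disks = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < HEADER_MIN:
            raise ValueError(f"truncated header at offset {pos:#x}")
        (size,) = struct.unpack_from("<I", data, pos + SIZE_OFFSET)
        if size < HEADER_MIN or pos + size > len(data):
            raise ValueError(f"bad disk size {size:#x} at offset {pos:#x}")
        disks.append(data[pos:pos + size])
        pos += size
    return disks


def split_file(path: str) -> list[Path]:
    """Write each disk to <stem>_<n>.d88 next to the original (hypothetical naming)."""
    src = Path(path)
    outputs = []
    for i, disk in enumerate(split_d88(src.read_bytes()), start=1):
        out = src.with_name(f"{src.stem}_{i}.d88")
        out.write_bytes(disk)
        outputs.append(out)
    return outputs
```

A single-disk file comes back as a one-element list, so the same call can be run over a whole folder; a size field that points past the end of the file raises an error instead of silently producing a short image.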

2) Remove internal disk name
In a d88 file, the first 16 bytes hold an arbitrary disk name/comment, followed by a null terminator at 10h. Clearing this field is the quickest way to eliminate most duplicates. On the other hand, the internal disk name is often valuable for identification purposes, so make a note of the name before removing it. Also, I've seen some disks where the name extends into the reserved area (11h-19h) that follows the name field. Some emulators and tools don't like this, so it's a good idea to zero out all bytes from 0h to 19h.
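A minimal Python sketch of this step: read the name first (so it can be logged), then zero 00h-19h. The name is read all the way to 19h to catch names that overflow into the reserved area. Decoding it as Shift-JIS is an assumption, since the field is free-form; undecodable bytes are replaced rather than raising. The function names are hypothetical.

```python
NAME_AND_RESERVED = 0x1A  # bytes 00h-19h: name (00h-10h) + reserved area (11h-19h)


def disk_name(image: bytes) -> str:
    """Return the internal name, reading up to 19h in case it overflows the name field.

    Shift-JIS decoding is an assumption; the field is free-form.
    """
    raw = image[:NAME_AND_RESERVED].split(b"\x00", 1)[0]
    return raw.decode("shift_jis", errors="replace").strip()


def clear_disk_name(image: bytes) -> bytes:
    """Zero bytes 00h-19h and leave everything from 1Ah onward untouched."""
    return bytes(NAME_AND_RESERVED) + image[NAME_AND_RESERVED:]
```

Logging `disk_name()` to a CSV before calling `clear_disk_name()` keeps the identification value of the name without letting it affect the file hash.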

3) Check the write-protect flag (1Ah)
This is where things get tricky. Obviously most games need to save data at some point, so for consistency, the easiest thing to do is write-enable all disks (set byte 1Ah to 00h). This is what TOSEC does, and it's a logical choice.

The problem is that software can check whether a disk is write-protected or not. Misty Blue and Emerald Dragon do this, for example. Usually the check is that the disk is writable, but a program could just as easily require it to be write-protected. I have also discovered at least two disks that "self-destruct" unless write protection is enabled. I don't know whether it's copy protection or simply bad coding, but if you do not write-protect the disk and do not have a backup... it's gone forever.

In general, write protection should be ON for most retail disks, because that is how the original floppies were published. Only save disks and user disks should be write-enabled. On the PC-8801, many retail game disks used notchless plastic covers, so it was actually impossible to make them writable (unless you cut a notch yourself with scissors or something).
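The flag itself is a single byte at 1Ah: 00h means write-enabled and 10h means write-protected. A minimal sketch for reading and setting it; deciding which disks count as save or user disks is still a manual call, and these helpers (hypothetical names) only touch the byte:

```python
WP_OFFSET = 0x1A
WP_ENABLED = 0x00     # writable
WP_PROTECTED = 0x10   # write-protected


def is_write_protected(image: bytes) -> bool:
    # Treating any nonzero value as protected is an assumption for odd images.
    return image[WP_OFFSET] != WP_ENABLED


def set_write_protect(image: bytes, protect: bool) -> bytes:
    """Return a copy of the image with the write-protect byte set or cleared."""
    buf = bytearray(image)
    buf[WP_OFFSET] = WP_PROTECTED if protect else WP_ENABLED
    return bytes(buf)
```

Since only one byte changes, running this on a retail disk and a TOSEC write-enabled copy of the same dump should make their hashes match again.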

4) Update older 672-byte (160-entry track table) images to the newer 688-byte (164-entry track table) format.
If my understanding is correct, this simply means inserting an extra 16 bytes (four empty track-table entries) into the header, adding 16 to each nonzero offset in the track table, and adding 16 to the total disk size at 1Ch. The dumped data itself is unaffected. Is there any reason not to do this? Could this conceivably break a working disk image?

Incidentally, if you feed VFIC a d88 with a 672-byte header and set d88 as the output format, VFIC will create a 688-byte header automatically.
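A minimal Python sketch of the conversion described above. It assumes an old-format image can be recognised by its first track's data starting right at 672 (2A0h), i.e. immediately after the header; anything else, including images that already have a 688-byte header, is returned unchanged. The function name is hypothetical, and this is not how VFIC does it internally.

```python
import struct

HEADER_OLD = 0x20 + 160 * 4   # 672 bytes: fixed header + 160 track offsets
HEADER_NEW = 0x20 + 164 * 4   # 688 bytes: fixed header + 164 track offsets


def upgrade_header(image: bytes) -> bytes:
    """Convert a 160-entry track table to 164 entries; sector data is copied as-is."""
    (size,) = struct.unpack_from("<I", image, 0x1C)
    offsets = struct.unpack_from("<160I", image, 0x20)
    nonzero = [o for o in offsets if o]
    if not nonzero or min(nonzero) != HEADER_OLD:
        return image  # already 688-byte, or not recognisable: leave it alone
    grow = HEADER_NEW - HEADER_OLD  # 16 bytes
    # Shift every real offset by 16; unformatted tracks (offset 0) stay 0,
    # and the four new entries (tracks 160-163) are empty.
    new_offsets = [o + grow if o else 0 for o in offsets] + [0] * 4
    header = bytearray(image[:0x20])
    struct.pack_into("<I", header, 0x1C, size + grow)  # total size includes header
    header += struct.pack("<164I", *new_offsets)
    return bytes(header) + image[HEADER_OLD:]
```

Running it twice is harmless, because the second pass sees the first track at 688 and returns the image untouched.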


PostPosted: February 22nd, 2015, 2:47 pm 

Joined: June 7th, 2008, 8:51 am
Posts: 928
Location: South Africa
I'm not sure I entirely understand your objective, but de-duping across all formats, including the PC98 ones, is a perennial problem.

I know you have used my Dimwit program, though with what success I don't know. I have been working on it, and one feature is that, with logging turned on, you can drag and drop multiple disks and it will log the MD5 of each disk's data portion. Not the whole disk, just the unpacked sector data (which is mostly what you are trying to compare).
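The idea of hashing only the unpacked sector data can be sketched in a few lines of Python. This is not Dimwit's code (its exact method isn't described here); it is an assumption-laden illustration that walks the track table, skips each 16-byte sector header, and feeds the sector payloads to MD5 in the order they are stored. Because the disk header is never hashed, the disk name and write-protect flag cannot affect the result. Function names are hypothetical.

```python
import hashlib
import struct

TABLE_START = 0x20     # track table begins after the 32-byte fixed header
MAX_TRACKS = 164       # newer 688-byte header; older images have 160 entries
SECTOR_HEADER = 16     # C, H, R, N, sector count, density, deleted, status, reserved, size


def track_offsets(image: bytes) -> list[int]:
    """Nonzero track offsets, stopping where the first track's data begins.

    Stopping at the first data offset lets the same code read 160- and 164-entry tables.
    """
    offsets = []
    end = min(len(image), TABLE_START + MAX_TRACKS * 4)
    pos = TABLE_START
    while pos + 4 <= end:
        (off,) = struct.unpack_from("<I", image, pos)
        if off:
            offsets.append(off)
            end = min(end, off)   # the table cannot extend past track data
        pos += 4
    return offsets


def sector_data_md5(image: bytes) -> str:
    """MD5 over sector payloads only, in stored order; disk header and sector IDs are skipped."""
    md5 = hashlib.md5()
    for track in track_offsets(image):
        pos = track
        (count,) = struct.unpack_from("<H", image, pos + 4)  # sectors in this track
        for _ in range(count):
            (size,) = struct.unpack_from("<H", image, pos + 0x0E)  # payload length
            start = pos + SECTOR_HEADER
            md5.update(image[start:start + size])
            pos = start + size
    return md5.hexdigest()
```

One design choice worth knowing about: sector IDs (C/H/R/N) are not hashed, so two dumps with identical payloads but different sector numbering would match, while the same sectors stored in a different physical order would not.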

Disclaimers: it is obviously a work in progress and may have bugs. It is also quite slow at .d88s, as there is quite a lot of unpacking to do.

If you wish to try it again the current version is at

http://www.blackdiamond.co.za/slt/Dimwit.7z

Screenshot at

http://www.blackdiamond.co.za/slt/Dimwit1.png

If it would be useful to add a column in the .csv log output for the write-protect, I'm sure I could do that, or any other field off the header (but the disk name is likely to cause issues - anything and everything in there!)

Judging from the file spec, I cannot see any way that changing from a 672-byte to a 688-byte header could affect anything. I am sure it is quite safe.

I'm not sure if this helps. If Dimwit is not working for you I'd love to know why, and apologize for wasting your time. Or if you have any ideas for a helpful tool for this, please say so.


PostPosted: February 23rd, 2015, 6:06 am 

Joined: May 15th, 2010, 1:35 am
Posts: 701
Hi peter_j, thanks for replying. After writing this post, my next task was to PM you about Dimwit.

Basically, my objective is to systematically process the d88 headers so that files with the same sector data will have the same file hash. Then you can just throw everything into a ROM manager with a DAT file and sort thousands of files instantly.

I'm not worried about cross-format duplicates for now, just focusing on d88 only.

Take Emerald Dragon as an example. For this game, there are at least 7 different d88 sets commonly floating around, plus the files in the TOSEC set.
Total: 58 files (some merged together), 22.5MB

After applying steps 1-4 and removing duplicates based on MD5 hash, we get...
Total: 20 files, 7.3MB.

The number of files and the total size are now about 1/3 of the original. Saving 15MB of disk space is not much, but multiply that by around 1600 different PC-8801 games to get an idea of the potential reduction.

Steps 1-4 can be performed automatically with the right tool, like a Perl script, or a command-line utility called from a batch file. Currently I'm using a modified version of H. Tomari's d88split, and some hex editor scripts. An all-in-one tool with robust error checking would be better, but I can't program my way out of a paper bag. >_<

Now we only need to manually inspect 20 files instead of 58. For this part, I use a hex editor and peter_j's great tools like Dimwit and D88Viewer. For example, after eliminating duplicates, there are only two copies of the Emerald Dragon Ending disk, and it's easy to see that one is an 80-track dump, and the other is an 82-track dump.
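Differences like the 80-track versus 82-track dumps can also be flagged automatically before opening a viewer. A minimal sketch, assuming the track table holds one entry per track per side (so a 40-cylinder double-sided 2D disk fills 80 entries) and that media byte 1Bh uses 00h/10h/20h for 2D/2DD/2HD; the function names and dictionary keys are hypothetical:

```python
import struct

TABLE_START = 0x20
MAX_TRACKS = 164
MEDIA_TYPES = {0x00: "2D", 0x10: "2DD", 0x20: "2HD"}  # byte 1Bh


def track_offsets(image: bytes) -> list[int]:
    """Nonzero track offsets; reading stops where the first track's data begins."""
    offsets = []
    end = min(len(image), TABLE_START + MAX_TRACKS * 4)
    pos = TABLE_START
    while pos + 4 <= end:
        (off,) = struct.unpack_from("<I", image, pos)
        if off:
            offsets.append(off)
            end = min(end, off)
        pos += 4
    return offsets


def describe(image: bytes) -> dict:
    """Summarise the header fields that usually explain why two dumps differ."""
    offsets = track_offsets(image)
    media = image[0x1B]
    return {
        "write_protected": image[0x1A] != 0,
        "media": MEDIA_TYPES.get(media, f"unknown ({media:#04x})"),
        "tracks": len(offsets),  # count of nonzero entries, not the highest index
        "header_bytes": min(offsets) if offsets else None,  # 672 or 688 when sane
    }
```

Writing these fields into the same CSV as the hashes makes it quick to see, for example, that two surviving copies of a disk differ only in track count.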

The idea is to scrub thousands of d88 files automatically with a script, remove all the hash duplicates, and then manually inspect the remainder as needed.
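Once every file has been through steps 1-4, an MD5 of the whole file is enough to find duplicates, which is what makes the ROM-manager approach work. A minimal sketch of the de-duplication pass, assuming the scrubbed images sit in one directory tree; it only reports redundant copies and deletes nothing. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_md5(paths) -> dict[str, list[Path]]:
    """Map each whole-file MD5 to the files that share it, preserving input order."""
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.md5(Path(p).read_bytes()).hexdigest()
        groups[digest].append(Path(p))
    return dict(groups)


def find_duplicates(directory, pattern="*.d88") -> dict[str, list[Path]]:
    """Only the hash groups containing more than one file."""
    groups = group_by_md5(sorted(Path(directory).rglob(pattern)))
    return {h: ps for h, ps in groups.items() if len(ps) > 1}


def redundant_copies(directory, pattern="*.d88") -> list[Path]:
    """Every path except the first (alphabetically) in each duplicate group."""
    return [p for ps in find_duplicates(directory, pattern).values() for p in ps[1:]]
```

On case-sensitive file systems the default pattern will not match `.D88`, so a second pass with `pattern="*.D88"` (or renaming first) may be needed.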

