Monday, June 25, 2007

Hacking through the night (Status Update)

It's 5:30 a.m. over here. I am somewhat tired after spending the night hacking on the first decompiler prototype for SCUMM v5 (the antipasto codenamed "Dirty D"), and figured I might as well whip up a quick status update on my blog. These past few weeks I have been quite busy with university work and personal matters, so even though I started writing Dirty D earlier than planned, I am now barely on schedule. There are about 10 days left until the first prototype "milestone", and my slacker sense warns me of more sleepless nights ahead. I still have to do my first svk push to my tools branch in the ScummVM repository, but that will happen later today once I am on campus (unless I am run over by a bus or abducted by aliens). Anyway, I recently talked to my mentor about the general design of antipasto. Here is a quick summary:


The decompiler will roughly consist of two parts.

The first comprises general utilities for extracting and processing bytecode from game scripts, plus bytecode-dependent backends that use these utilities to produce a uniform temporary opcode listing whose structure is independent of the bytecode variant. During processing, every control-flow-manipulating opcode of a bytecode variant has to be replaced by a semantically identical counterpart in the temporary format.
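
To make this split a little more concrete, here is a minimal sketch of what such a temporary format and a backend feeding it could look like. Python is used only for brevity (Dirty D itself is written in Scheme), and everything in it is hypothetical: the classes `Stmt`, `Goto` and `CondGoto`, the decoded-instruction tuples, and the mnemonics `jump` and `jumpIfNot` are illustrative stand-ins, not real SCUMM v5 opcodes or descumm code. The point is that the backend resolves variant-specific relative jumps into absolute targets in a shared representation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical temporary format shared by all bytecode backends.
@dataclass
class Stmt:
    addr: int
    text: str  # an already-decompiled, non-control-flow statement


@dataclass
class Goto:
    addr: int
    target: int  # absolute address of the jump target


@dataclass
class CondGoto:
    addr: int
    cond: str    # the jump is taken when cond holds
    target: int  # absolute address of the jump target


Temp = Union[Stmt, Goto, CondGoto]

# Hypothetical decoded instruction: (address, length, mnemonic, operands).
Insn = Tuple[int, int, str, tuple]


def toy_backend(insns: List[Insn]) -> List[Temp]:
    """Translate a toy bytecode variant into the temporary format.

    Assumes relative jump offsets count from the end of the jump
    instruction; real SCUMM v5 encoding details are not modelled.
    """
    out: List[Temp] = []
    for addr, length, op, args in insns:
        nxt = addr + length  # address of the following instruction
        if op == "jump":
            out.append(Goto(addr, nxt + args[0]))
        elif op == "jumpIfNot":
            cond, offset = args
            out.append(CondGoto(addr, f"!({cond})", nxt + offset))
        else:
            out.append(Stmt(addr, f"{op}({', '.join(map(str, args))});"))
    return out
```

For example, a 3-byte `jumpIfNot` at address 0 with condition `a == 1` and offset 4 becomes `CondGoto(0, "!(a == 1)", 7)`, so the control flow analysis never needs to know how the original variant encoded its jumps.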

The second part holds the decompiler's control flow analysis. It transforms a partial decompilation in the temporary format, as produced by one of the bytecode variant backends, into a semantically identical "program" in which low-level control flow statements such as gotos are resolved into higher-level looping and branching constructs where possible.
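
A minimal sketch of the kind of transformation this second part performs, again in Python and again with hypothetical names; it is not how antipasto actually does it. It turns a forward conditional goto over a block into an `if`, and a backward conditional goto into a `do ... while` loop, leaving any jump that fits neither pattern as a plain goto. It assumes no jump enters a region from outside and does not emit labels; a real structurer would have to handle both.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Stmt:
    addr: int
    text: str


@dataclass
class CondGoto:
    addr: int
    cond: str    # the jump is taken when cond holds
    target: int  # absolute address of the jump target


Temp = Union[Stmt, CondGoto]


def self_contained(region: List[Temp], lo: int, hi: int) -> bool:
    """True if every jump inside region lands within [lo, hi]."""
    return all(not isinstance(x, CondGoto) or lo <= x.target <= hi
               for x in region)


def structure(code: List[Temp], end: int, depth: int = 0) -> List[str]:
    """Rewrite conditional gotos into do-while loops and if blocks.

    end is the address just past the last instruction in code.
    Assumes no jump enters a region from outside it; jumps that fit
    neither pattern are kept as gotos (labels are not emitted here).
    """
    pad = "    " * depth
    out: List[str] = []
    i = 0
    while i < len(code):
        ins = code[i]
        # A later backward jump to this instruction closes a do-while loop.
        loop_end: Optional[int] = next(
            (k for k in range(i, len(code))
             if isinstance(code[k], CondGoto) and code[k].target == ins.addr),
            None)
        if loop_end is not None:
            tail = code[loop_end]
            body = code[i:loop_end]
            if self_contained(body, ins.addr, tail.addr):
                out.append(f"{pad}do {{")
                out += structure(body, tail.addr, depth + 1)
                out.append(f"{pad}}} while ({tail.cond});")
                i = loop_end + 1
                continue
        # A forward conditional jump over a block becomes an if statement.
        if isinstance(ins, CondGoto) and ins.target > ins.addr:
            j = i + 1
            while j < len(code) and code[j].addr < ins.target:
                j += 1
            lands = ins.target == (code[j].addr if j < len(code) else end)
            body = code[i + 1:j]
            lo = body[0].addr if body else ins.target
            if lands and self_contained(body, lo, ins.target):
                out.append(f"{pad}if (!({ins.cond})) {{")
                out += structure(body, ins.target, depth + 1)
                out.append(f"{pad}}}")
                i = j
                continue
        # Fallback: emit the instruction unchanged.
        if isinstance(ins, CondGoto):
            out.append(f"{pad}if ({ins.cond}) goto L{ins.target};")
        else:
            out.append(f"{pad}{ins.text}")
        i += 1
    return out
```

Fed the listing `i = 0;`, `if (i == 3) goto 3`, `beep();`, `i = i + 1;`, `if (i < 5) goto 1`, `stopScript();` at addresses 0 to 5, this yields a `do { if (!(i == 3)) { beep(); } i = i + 1; } while (i < 5);` followed by `stopScript();`, with no gotos left.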

Rationale:

By decoupling the control flow analysis part of the decompiler from the bytecode-dependent decompilation code, we go a long way toward making the decompiler easier to extend with support for new bytecode formats. Apart from that, introducing a temporary decompilation format might also make more general ways of specifying bytecode formats in part one of antipasto apparent. Every bit of initial bytecode processing that can be ripped out of a bytecode variant backend and pressed deep into the guts of Dirty D makes it easier to add support for new bytecode variants in the future.

And that's what I'd like.

Friday, June 8, 2007

Report #1 -- Noodles and Double D

Week #1


I spent most of my first week rereading descumm's source code and prodding my mentor to write up a little document describing the SCUMM v5 bytecode format. He delivered it on Sunday, and it proved quite useful for getting a better grasp of descumm's inner workings: I simply kept it side by side with a hex view of one of the Monkey Island 1 game scripts that Andreas Muegge was kind enough to send me, and with descumm's decompilation of that script.

I also started thinking about recognizing control structures within bytecode (from conditional gotos/jumps, etc.), but since there is more drudgery to get through before this becomes relevant, I have not yet written up my ideas concisely.

Week #2


On Tuesday I actually started programming a decompiler for version 5 of SCUMM in Scheme. According to my original project schedule, this wasn't supposed to happen for another two weeks, while I sketched out an initial design. For this first prototype, however, I will lean heavily on descumm's opcode handling code so that I can work on certain improvements earlier.

The prototype is named antipasto, and the series of versions I plan to implement will henceforth be referred to as the family of antipasti. The first of these bears the code name "Dirty D" and will essentially be a SCUMM v5 decompiler with improved control structure recognition.