DISX4 - Full-Screen Interactive Disassembler
by Bruce Tomlin - bruce@xi6.com


DISX4 is a multi-CPU disassembler for various vintage CPU architectures, mostly 8-bit. It will keep track of live changes to label references.


What disx4 is intended for:
- making a decent disassembly that can be saved as a .asm file that you can tweak later
- not for making the perfect disassembly that needs no further touching

Cool things that it doesn't do:
- track RAM references
- deal with split references (high half in one instruction, low half in another)
- combine separate binary files (if your code comes from multiple ROMs, you will need to create a combined binary first)
- handle multiple code segments with different address ranges


=== BUILDING ===

This uses a standard makefile, so use the "make" command to build it. It should work in a standard Linux or MacOS environment with C++ developer tools installed. If this isn't sufficient information, then google for "how to use make".

On some systems, if you have not previously used the compiler, "make" might have trouble finding it. Consult the documentation of your OS distribution for more information.

If you plan on modifying the code, you might want to use "make depend" first. This generates a file named ".depend" with make dependencies for the .h files. That way, when you change a common .h file, all files that use it will be automatically rebuilt. The "make depend" command will probably complain about missing standard includes; you should ignore these warnings.

I haven't tried to build this on Windows. It might work with either Cygwin or the Linux subsystem. At least one person has tried with Cygwin, but its system headers defined "addr_t" for some reason.


=== REFERENCES ===

Each line of disassembly can have one external reference, to either code or data. Note that some CPUs have 16-bit immediate instructions which can potentially be either an address or a number. This reference address will be used for creating labels, and for code tracing. Word or longword data lines will only allow a reference from the first word on the line.

Label creation can be suppressed for certain addresses by using the Control-R command on the address.

Address 0x0000 is specifically prevented from automatically generating a reference, because it is almost never used as such.


=== TRACING ===

If the label is a code reference, the T command can disassemble into referenced code after finishing the current block of code. This can quickly disassemble large ranges of code, but it can also be misled by "tricks" like subroutines that pop the return address, read data from that address, and then put it back before returning. Note that the Z-80 series is particularly vulnerable to blank (0x20) characters causing bogus code references, because 0x20 is also the opcode for a relative branch (JR NZ). The "rip-stop" functionality attempts to detect the most obvious situations.


=== COMMENTS ===

Comments can be attached to any code line by using the semicolon (';') command. This will bring up the current comment (if any) for editing. Press Return to save the line, press Escape to cancel changes, or delete the entire comment line and press Return to remove the comment.

Comments are saved in a ".cmt" file, which consists of lines with the address in hexadecimal ASCII, a blank, the comment text, and a newline. They are saved in sorted order, but can be loaded in a non-sorted order. If all comments have been removed, the ".cmt" file is deleted.
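
As an illustration of the ".cmt" layout described above, here is a minimal Python sketch that reads and writes such text. It is not taken from the disx4 source. The function names, the tolerance for empty lines, and the 4-digit uppercase hex width used when writing are all assumptions made for the example.

```python
def parse_cmt(text):
    """Parse .cmt-style text into a dict of {address: comment}."""
    comments = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: quietly skip empty lines
        # Hex address, one blank, then the comment text up to the newline.
        addr_text, _, comment = line.partition(" ")
        comments[int(addr_text, 16)] = comment
    return comments


def format_cmt(comments):
    """Write comments back out in sorted address order."""
    # Assumption: addresses are written as 4 uppercase hex digits.
    return "".join(f"{addr:04X} {text}\n"
                   for addr, text in sorted(comments.items()))


# Input may be unsorted; output is sorted by address.
sample = "1234 second comment\n0100 entry point\n"
parsed = parse_cmt(sample)
print(format_cmt(parsed), end="")
```

Because loading accepts any order, a hand-edited file does not need to be re-sorted before use.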

If a comment "falls off" because its address is now in the middle of a line, it is not immediately deleted, but cannot be seen until the address starts a line again. However, when saving, such hidden comments will not be saved.

NOTE: Comments are not affected by the Undo command!


=== UNDO ===

There is minimal undo support, in the form of one saved state. It is saved automatically by the shift-C and shift-T commands, and manually by the control-U key command. To restore the saved state, use the shift-U key command. But it's really not in a useful state right now.


=== COMMAND LINE ===

Usage:
    disx4 [options] [binfile]

Options:
    -c cpu         select default CPU type
    -c ?           show list of supported CPU types
    -b xxxx        hexadecimal base address
    -s xxxx        hexadecimal size of binary data
    -o xxxx        hexadecimal offset to start of data in file
    -a             create binfile.asm and exit
    -l             create binfile.lst and exit
    -!             don't load binfile.ctl

If started up with no parameters, it will show an empty file, and the selected CPU type will be undefined. If "binfile.ctl" is loaded, its saved -b, -s, and -o parameters are used unless overridden.
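
When disx4 is driven from a script, the options listed above can be assembled into an argument list. The Python sketch below is only an illustration. The helper name is hypothetical, and the CPU name "6502" and the file name "rom.bin" are assumptions (use "-c ?" to see the real CPU names). Numeric values are written as bare hexadecimal, matching the "xxxx" form in the usage text.

```python
def disx4_args(binfile, cpu=None, base=None, size=None, offset=None,
               make_asm=False, make_lst=False, ignore_ctl=False):
    """Build an argv list using only the documented disx4 options."""
    args = ["disx4"]
    if cpu is not None:
        args += ["-c", cpu]
    # -b, -s and -o all take bare hexadecimal numbers.
    for flag, value in (("-b", base), ("-s", size), ("-o", offset)):
        if value is not None:
            args += [flag, f"{value:X}"]
    if make_asm:
        args.append("-a")
    if make_lst:
        args.append("-l")
    if ignore_ctl:
        args.append("-!")
    args.append(binfile)
    return args


# An 8K image loaded at E000, skipping a 16-byte header, written out as .asm.
cmd = disx4_args("rom.bin", cpu="6502", base=0xE000, size=0x2000,
                 offset=0x10, make_asm=True)
print(" ".join(cmd))
```

The resulting list could be passed to something like subprocess.run(), but that step is left out here because it depends on where disx4 is installed.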

"-c cpu" will select the default and current CPU type in a new .ctl file.

"-c ?" will show a list of supported CPU types.

"-b xxxx" will specify the load address of the first byte

"-s xxxx" will specify the maximum size of the code image to be loaded.

"-o xxxx" will specify the offset in the file to the first byte of the code image. This allows skipping a header.

"-!" will prevent automatic loading of "binfile.ctl". This will ignore any existing work in "binfile.ctl".

"-a" and "-l" will generate a .asm and/or .lst file, then return to the command line.


=== SCREEN NAVIGATION KEYS ===

Up-arrow        Move up one line
Down-arrow      Move down one line

Home            Move to first line
End             Move to last line

Page Up         Scroll down a page at a time, while staying on the same screen line
Page Down       Scroll up a page at a time, while staying on the same screen line
Space bar       Same as Page Down

Control-B       Move to top of screen, then move up a page at a time
Control-D       Move to bottom of screen, then move down a page at a time


=== SINGLE-KEY COMMANDS ===

0 - 9           Digit keys let you specify a count of 0-99 for some commands.
                The escape key will clear the count. Some commands have a maximum count.

x               (no count) Disassemble as raw data (the default state)
a               (max count 40) Disassemble as ASCII text
b               (max count 32) Disassemble as single bytes
h               (max count 32) Disassemble as hex bytes with no formatting
w               (max count 20) Disassemble as 2-byte words
shift-W         (max count 20) Disassemble as reverse 2-byte words
ctrl-W          (max count 1)  Disassemble as word address minus 1 (for 6502 jump tables)
\               (max count 16) Disassemble as 4-byte longwords
| (shift-\)     (max count 16) Disassemble as reverse 4-byte longwords
d               (max count 20) Disassemble as decimal words (unsigned)
shift-D         (max count 16) Disassemble as decimal longwords (signed)
_ (underline)   (max count 40) Disassemble as EBCDIC text
I               (max count 16) Disassemble as binary, 0x55 = 01010101B
X               (max count 16) Disassemble as visible binary, 0x55 = _X_X_X_X
O               (max count 16) Disassemble as visible binary, 0x55 = O_O_O_O_
                See "xobits.h"/"xobits.s" for the "visible binary" equates.
*               Repeat previous data format command with same count
( and )         Expand or shrink the size of a line of data.

o               Disassemble current word as a position-independent offset in a table
                starting at the most recent label. This is commonly found on certain
                architectures such as 68k. If the target address has no label, a data
                label will be created. If the target is already disassembled as code, a
                code label will be created.

c               Disassemble current address as code. Ignored if already disassembled
                as current CPU, or if an illegal instruction.

shift-C         Disassemble as code until unconditional branch or illegal instruction.

T               Trace disassemble from current instruction, same rules as shift-C except
                that all code references are followed and disassembled as well.

ctrl-T          Disassemble as a word and trace from the referenced address. This is
                intended for jump tables. The reference address will become a code label
                when appropriate.

l               Toggle a pre-instruction blank line before this code/data line.

shift-L         Toggle label type at this address between none, data, and code. Note that
  or            if the current line is a "EQU $-n" line, this will only remove the label.
control-L       You must use "x" to break up the previous instruction, change the label
                state, then fix it with "c" afterward.

^               Toggle label type at address referenced by the current instruction.

control-R       Toggle the no-reference attribute for selected address, to prevent labels
                from being automatically created. This is useful for certain "magic
                number" addresses like 0x1000, and almost anything at address 0000-00FF.

control-U       Save current state for the next undo.
shift-U         Undo to last saved state. State is automatically saved before shift-C and
                shift-T commands.

@               Go to the address referenced by the current instruction.
< and >         Go backward and forward along "@" or ":xxxx" usages.
[ and ]         Go backward and forward to the next non-EQU label.

!               Search the entire code for any reference to the label at the current line,
                and delete the label if not found. The address of the first reference is
                reported.

~ (tilde)       Re-center the selected line towards the middle of the screen. This is
                useful when at the very top or bottom of the screen and you need to see a
                few more lines.

"               Change the hint flags for the current address. There are two bits of
                hint flags that rotate through all four combinations.

$               Change the default hint flags for newly disassembled instructions.
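
To make a few of the data formats above concrete, here is a small Python sketch. It is not the disx4 implementation, and all function names are made up for the example. The bit renderings follow the 0x55 examples given for the I, X, and O commands literally. The "address minus 1" helper relies on the 6502 RTS instruction adding 1 to the address it pulls from the stack. Treating the "o" offset as a signed big-endian 16-bit value is an assumption.

```python
def binary_text(value):
    """Plain binary in the style of the I command: 0x55 -> 01010101B."""
    return f"{value:08b}B"


def visible_bits(value, one, zero):
    """Render a byte most-significant bit first, one character per bit."""
    return "".join(one if value & (0x80 >> bit) else zero for bit in range(8))


def rts_table_target(low, high):
    """A 6502 RTS-style table stores (target - 1), low byte first."""
    return (((high << 8) | low) + 1) & 0xFFFF


def offset_target(table_base, high, low):
    """Table base plus a 16-bit offset (assumed signed and big-endian)."""
    offset = (high << 8) | low
    if offset & 0x8000:
        offset -= 0x10000
    return table_base + offset


print(binary_text(0x55))             # I command style
print(visible_bits(0x55, "X", "_"))  # X command style
print(visible_bits(0x55, "_", "O"))  # O command style, per the README example
print(f"{rts_table_target(0xFF, 0x0F):04X}")
```

Note that in the O rendering the README example puts "O" where the bit is clear, so the two visible formats are not simply the same picture with a different letter.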


=== TEXT COMMANDS ===

: command       Enter a text command
/ text          Search forward - Searches are case insensitive and collapse all spaces
                together. Searches also ignore the hex address and data fields.
? text          Search backward
; comment       Edits a comment at the current address. (code/data lines only)


While typing in the command line:

Backspace or DEL  Delete character before cursor, or exit input mode if line is empty.
Carriage return   Finish line and start executing it.
Escape            Abort current line and exit input mode.
Left/Right Arrow  Move the cursor left or right.
Ctrl-U / Ctrl-X   Erase from the cursor position to the beginning of the line.
Ctrl-A            Move cursor to start of line.
Ctrl-E            Move cursor to end of line.


Command list:

quit / q        Exit the program. If the disassembly status has been changed, you must
                use ":q!" to override the changes, or save first with ":w".

cpu <cputype>   Sets the current CPU type for disassembly. If defcpu has not yet been set,
                it will also be set to this CPU type.

defcpu <cputype> Sets the default CPU type that will be used for such things as number
                 syntax (0FFH vs $FF) and default endian order for pseudo-ops like DW.

list            Create the file "binfile.lst" as a listing-style file of the whole image.

asm             Create the file "binfile.asm" as a source-style file of the whole image.

save / w        Save the current disassembly state into "binfile.ctl".

load <file> [!] [Bxxxx] [Sxxxx] [Oxxxx]
                Load the binary file.
                - If "file.ctl" exists, it will be used.
                - If the file name is a single "-" character, the current file
                  will be reloaded.
                - If ! is used, the disassembly state in an existing "file.ctl" is
                  completely ignored.
                - The B/S/O parameters will override what is stored in the .ctl
                  file, but will not be saved into the .ctl file until an actual
                  save/w command is done. There must not be a blank before the
                  xxxx address. Changing Oxxxx will almost certainly cause all
                  saved disassembly information to be incorrect.
                  Warning: if the parameters result in a smaller data area,
                  some disassembly state could be lost!

label / l [name] If name is specified, a custom label is added for the current
                code address. If no name is specified, any custom label for the
                current address is removed.

tabs [!] [n] [n] [n] [n] [n]
                Without any parameters, shows the current column widths.
                With parameters, sets column widths.
                "!" causes listing files to generate hard (0x09) tabs. When
                this is used, all column widths must be multiples of 8.
                The five columns are listing address, hex object code, label,
                opcode, and operands. The default values are: ! 8 16 8 8 16

xxxx            A hexadecimal number goes to that address in the image. ":0"
                should always move to the start of the image, just like the
                Home key. The current address is added to the < / > stack
                before going to the new address.

$               Moves to the end of the image, just like the End key.

rst [########]  Sets extra instruction bytes used by the 8080/Z80 RST instruction.
                An 8 digit number sets all the values at once. A value of 1 causes RST
                instructions to disassemble as two bytes, and a value of 9 causes RST
                instructions to disassemble as ten bytes. Default is 00000000.
                If no parameter is provided, it shows the current state.
                Note that changing this will not automatically re-scan the code. You
                must go to each RST instruction and re-disassemble it with the 'c' key.
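
The relation between the rst digits and instruction length can be sketched in Python as follows. This is only an illustration. The function name is hypothetical, and the mapping of the digits, left to right, onto RST 0 through RST 7 is an assumption that the text above does not state.

```python
def rst_lengths(setting="00000000"):
    """Return the total length in bytes of each of the eight RST instructions."""
    if len(setting) != 8 or any(ch not in "0123456789" for ch in setting):
        raise ValueError("expected exactly 8 decimal digits")
    # Each digit is the number of extra bytes after the 1-byte RST opcode.
    # Assumption: digit 0 belongs to RST 0, digit 7 to RST 7.
    return [1 + int(ch) for ch in setting]


print(rst_lengths("01000009"))
```

With the default setting every RST is a plain one-byte instruction.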


=== RIP-STOP ===

An auto-tracing disassembler can be very powerful, and with great power comes great responsibility! Each instruction set has its own quirks when tracing reaches data that isn't code. The problem is that many bogus labels can get created, which are a pain to clean up, and the trace could run wild over even more data. Each CPU disassembler can report that a given instruction is suspicious and that tracing should stop. The shift-C command stops immediately, and the shift-T command stops the current branch. To manually disassemble an offending instruction, use the lowercase "c" command.

So far this is mostly in the Z-80 and 6502 disassemblers. Here are some of the various trigger conditions:

Z-80:
- three NOP instructions in a row (repeated 0x00)
- two RST 38H instructions in a row (repeated 0xFF)
- LD r,r where r is the same register
- repeated LD r1,r2 with the same pair of registers
- branches with an offset of 0x00 or 0xFF
- DJNZ with a forward offset (this is common on 8051, but rarely used on Z-80)

6502:
- two BRK instructions in a row (repeated 0x00)
- branches with an offset of 0xFF
- the rip-stop code also tries to detect "always branch" conditions such as BEQ+BNE

68HC11:
- two NOP or SWI instructions in a row (repeated 0x01 or 0x3F)
- STX $FFFF (repeated 0xFF)
- branches with an offset of 0xFF

8008:
- repeated MOV r,r with the same pair of registers
- two NOP instructions in a row
- two HLT instructions in a row (repeated 0xFF)
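
A few of the Z-80 triggers above can be expressed as a simple byte test. The Python sketch below is not the actual rip-stop code; the function name and return strings are invented for the example, and it assumes "pos" is already at an instruction boundary. The LD r,r opcodes come from the Z-80 encoding 01dddsss with both register fields equal (0x76 is HALT, so it is not in the set).

```python
# LD B,B  LD C,C  LD D,D  LD E,E  LD H,H  LD L,L  LD A,A
LD_SAME_REG = {0x40, 0x49, 0x52, 0x5B, 0x64, 0x6D, 0x7F}


def z80_suspicious(data, pos):
    """Return a reason string if the bytes at pos look like data, else None."""
    window = bytes(data[pos:pos + 3])
    if window == b"\x00\x00\x00":
        return "three NOPs in a row"
    if window[:2] == b"\xff\xff":
        return "two RST 38H in a row"
    if window and window[0] in LD_SAME_REG:
        return "LD r,r with the same register"
    return None


print(z80_suspicious(b"\x00\x00\x00\xc9", 0))
print(z80_suspicious(b"\x3e\x20\xc9", 0))
```

Several of the LD r,r opcodes are also printable ASCII ('@', 'I', 'R', 'd', 'm'), which is one reason a check like this tends to fire when a trace wanders into text.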


=== HINT FLAGS ===

Some disassemblers can use a few extra hints on what to do. There are two flag bits for each address, which can be one of four combinations. The meaning of these is different for each disassembler.

65816: Some instructions change length depending on the run-time value of two processor status flags. These have been assigned to set the "LONGA" (immediate loads for the accumulator are 16 bits) and "LONGI" (immediate loads for the index registers are 16 bits) flags.

0 = none, 1 = LONGI, 2 = LONGA, 3 = LONGI + LONGA

8008: Loads of register pairs are by default detected and combined into LXI instructions. In some cases (such as a branch into the second instruction) you will want to override this.

0,2 = combine LXI, 1,3 = don't combine LXI

8048: The bank for long jumps is by default determined by tracing back in the code for SEL MB0/MB1 instructions, but sometimes this is wrong. You can use hints to force which bank to use.

0 = automatic, 1 = current bank, 2 = SEL MB0 bank, 3 = SEL MB1 bank
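
As an example of how a two-bit hint can change the result, here is a Python sketch for the 65816 case. It is not the disx4 code and the names are made up. It covers only the three basic immediate loads (LDA #, LDX #, LDY #) and uses the bit assignment listed above, where 1 = LONGI and 2 = LONGA.

```python
LONGI = 1  # index register immediates are 16 bits
LONGA = 2  # accumulator immediates are 16 bits

ACC_IMMEDIATE = {0xA9}          # LDA #
INDEX_IMMEDIATE = {0xA2, 0xA0}  # LDX #, LDY #


def immediate_length(opcode, hint):
    """Instruction length in bytes, or None for opcodes not covered here."""
    if opcode in ACC_IMMEDIATE:
        return 3 if hint & LONGA else 2
    if opcode in INDEX_IMMEDIATE:
        return 3 if hint & LONGI else 2
    return None


for hint in range(4):
    print(hint, immediate_length(0xA9, hint), immediate_length(0xA2, hint))
```

A wrong hint shifts every following instruction boundary by one byte, which is why these flags are kept per address.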


=== LABELS ===

Addresses can be overridden with custom labels. This lets you name things, or use labels from existing source code, and it lets you add address references for an external ROM.

To add a label, use the ":L" or ":LABEL" command. If you specify a name, a label is added at the current address, otherwise a label is deleted at the current address. Labels are saved in a text file called "file.sym", which can be edited when not running the disassembler.

NOTE: THIS FEATURE IS CURRENTLY EXPERIMENTAL AND UNDER DEVELOPMENT
(but it works rather well for just a few hours of effort!)

- there is no way yet to add labels outside the file range from within the disassembler; you must edit the .sym file manually
- when using the :ASM or :LIST commands, labels outside the binary file do not get labels generated
- there is no check for duplicate labels
- labels are only generated in contexts where code addresses are used, and currently this excludes zero-page addresses
- labels can not be attached to "EQU $-1" lines


=== MISCELLANEOUS STUFF ===

* If the .bin file has been moved to a different directory, the path saved in the .ctl file will not match. There will still be an attempt to load the .ctl file from the same directory as the .bin file.

* The 8048 disassembler has to guess the SEL MB0/MB1 flag for the JMP and CALL instruction bank address. This is done by searching backwards as long as there's contiguous 8048 code. If none is found, it defaults to the bank at the current address. It is possible for a called subroutine to change the bank, and further calls/jumps after the call will have to be manually overridden with hints.

* Not all of the disassembler cores have been thoroughly tested. Some (ARM and PIC) were never quite finished.

* The PIC disassembler creates label names using the byte address rather than the word address.

* The PIC disassembler requires the binary be in little-endian format (low instruction byte first).

* The 78K0 disassembler is mostly working, but the code that I had hoped used it was actually for a different CPU, so it hasn't really been tested.

* Because Thumb often uses the low bit of an address to indicate Thumb code, longword references to Thumb code will be displayed as the even address "+1". You can not use the "^" command to put a label at the even address, but will have to manually add a label at the even address.
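
The Thumb low-bit convention in the last item can be sketched in Python like this. The function names and the "L" plus 8 hex digits fallback label are hypothetical, chosen only for the example, and this is not how disx4 itself names labels.

```python
def split_thumb_ref(value):
    """Split a longword reference into (even address, Thumb flag)."""
    return value & ~1, bool(value & 1)


def format_thumb_ref(value, labels):
    """Show a reference as the label at the even address, plus "+1" if odd."""
    addr, is_thumb = split_thumb_ref(value)
    name = labels.get(addr, f"L{addr:08X}")  # hypothetical label format
    return f"{name}+1" if is_thumb else name


labels = {0x08000100: "main"}
print(format_thumb_ref(0x08000101, labels))
print(format_thumb_ref(0x08000100, labels))
```

The label always lives at the even address, which is why it has to be added there by hand.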


=== DISASSEMBLER STATUS ===

This is a rough estimation of the quality of each disassembler:

1 = early attempt, not even sure if all opcodes are correct
2 = actual code has been disassembled and makes some sense, but no way to re-assemble
3 = it generates code that has been re-assembled, but not a lot of it
4 = disassembled code has been regularly re-assembled and compared
5 = it has been used a lot, but some sub-types might be less than perfect

5  disz80.cpp    - heavy usage and re-assemblies, but does not support 8080 mnemonics
5  dis6502.cpp   - 6502 tested with many re-assemblies, 65816 barely tested
5  dis68HC11.cpp - tested with many re-assemblies, but few 6303 examples
5  dis6809.cpp   - heavy usage and re-assemblies, no 6309 support
5  dis68k.cpp    - heavy usage and re-assemblies
4  dis8051.cpp   - tested with many re-assemblies
3  dis8048.cpp   - tested with re-assemblies, but suffers from guessing bank selects
3  disz8.cpp     - tested on a few re-assemblies
3  dis8008.cpp   - tested on a few re-assemblies
2  dis4004.cpp   - tested on the Busicom calculator ROM
2  dis1802.cpp   - sort of works, but few samples and it suffers from split references
2  disf8.cpp     - sort of works, but not enough examples
2  dispic.cpp    - sort of works, but better testing needed
1  disarm.cpp    - still needs a lot of work, can suffer from split references
2  dis7810.cpp   - only tested on one code example
1  dis78K0.cpp   - the code that I thought was 78K0 turned out to be 78K3, so no examples
2  dis78K3.cpp   - only tested on one code example
2  dis8086.cpp   - very minimal, only useful for Small model code without segments
2  dispdp11.cpp  - sort of works, but not enough examples, and it looks weird in hexadecimal
3  disthumb.c    - needs work, and trouble dealing with odd code addresses (Thumb flag)

=== Q AND A ===

Q: Why is it version "4"?
A: Because I've had disassemblers for a very long time. Version "2" was when I added tracing and a configuration file. It was very detailed, and it could do a good job, but it was a real pain to use. Version 3 was rewriting everything to use C++ inheritance. But it was still such a pain to use that for many years I had been wanting to do an interactive disassembler "someday". Once I finally started writing the code, it only took about two months for V4 to become functional. The V3 disassembler cores were then ported to V4.

Q: Why use ncurses and not a real windowing system?
A: Because it is more portable between Unix-like operating systems, and text UIs are underrated. I kept all the use of ncurses to a single .cpp file so that there should only be one layer to replace later. The key bindings were mostly inspired by vi and bash.

Q: Why only one level of undo?
A: Because it's better than zero levels, and doing more would be a lot of work. Actually, I haven't really used it yet, so it's mostly there as a placeholder.

Q: It crashed and now I can't see what I'm typing!
A: When an app using ncurses crashes, it can leave the terminal in an inconvenient state. The command "stty sane" should return things to normal. If it locks up, pressing control-C twice should force exit to the command line with ncurses properly cleaned up.

REMEMBER, IF IT CRASHES YOU MAY NEED TO 'stty sane' TO GET THE TERMINAL BACK

Q: It crashed with an "assert"!
A: Try to reproduce the problem. Go back and do stuff up to right before the problem, then save. Try a step at a time until it crashes. What is needed to debug this is the binary file, the .ctl file from just before the crash, and what keys to press to cause the crash.


=== HINTS AND TIPS ===

* Sometimes you have a binary of an unknown CPU type. It is very helpful to identify hex opcodes before disassembling so that you know which CPU type to select, and also to know where the hardware-defined entry points are. Sometimes even determining the base address of code can be difficult.

* Common endians, opcodes, and vectors for various CPUs:

LE 8080 - CD xx xx / C3 xx xx / C9 / F5 E5 D5 ... D1 E1 F1 / 11 xx xx / 21 xx xx
        - reset at 0000, vector entry points at 0008 0010 0018 0020 0028 0030 0038 0066
LE 6502 - A9 xx / 4C xx xx / 20 xx xx / 60
        - vector address words at FFFx
BE Z8   - D6 xx xx / 8D xx xx / AF
        - reset at 000C, vector address words at 0000-000B
LE 8048 - 02 xx xx / 12 xx xx / 83
        - reset at 0000, interrupt entry points at 0003 0007
BE 8051 - 02 xx xx / 12 xx xx / 22
        - reset at 0000, interrupt entry points at 0003 000B 0013 001B 0023
BE 6800 - BD xx xx / 7E xx xx / 39
        - vector address words at FFFx
BE 6809 - BD xx xx / 
        - vector address words at FFFx
BE 1802 - Dx / Cx xx xx / 70 / 3x xx / F8 xx By F8 xx Ay
BE 68000 - 4E56 0000 / 4E5E 4E75
         - vector address longwords at 0000-00FF
LE 8086 - return instructions C2 C3 CA CB / branch instructions 7xrr with rr F0-10

* CPUs without 16-bit load instructions are harder to work with because there are no register loads of full addresses. You can still put a label at the address, but the references will have to be manually replaced in the resulting .asm file. It is a good idea to put the reference in a comment.

* When you have subroutine calls followed by inline ASCII text or other non-code data that can't disassemble properly, one workaround is:

- use the search function to find one of the bytes of the call instruction ("/ db xxH")
- use "C" to convert the subroutine call to code
- repeat for all such calls
- code tracing will now automatically stop at the already disassembled subroutine call when tracing
- this is of course not as simple as it sounds
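
For the "vector address words at FFFx" hint in the opcode table above, a quick way to find the hardware entry points of a 6502 ROM is to read the last six bytes of the image. The Python sketch below assumes the image is mapped so that its last byte sits at address FFFF. The function name and the made-up 8K test ROM are for illustration only.

```python
def read_6502_vectors(image, base):
    """Read the little-endian NMI, RESET and IRQ/BRK vectors at FFFA-FFFF."""
    if base + len(image) != 0x10000:
        raise ValueError("image must end at address FFFF")

    def word(addr):
        index = addr - base
        return image[index] | (image[index + 1] << 8)

    return {"NMI": word(0xFFFA), "RESET": word(0xFFFC), "IRQ/BRK": word(0xFFFE)}


# A made-up 8K ROM at E000 with only the vector bytes filled in.
rom = bytearray(0x2000)
rom[-6:] = bytes([0x00, 0xE1, 0x00, 0xE0, 0x80, 0xE1])
vectors = read_6502_vectors(rom, 0xE000)
for name, addr in vectors.items():
    print(f"{name}: {addr:04X}")
```

If the three values all land inside the image and look like plausible code addresses, that is a good sign that both the CPU guess and the base address are right.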


=== CHANGELOG ===

4.0.0 - 2022-05-27

Original version

4.1.0 - 2022-06-23

New features:
- Added support for user comments.
- 8051, 8048: added rip-stop for repeated 00H and FFH
- Command line input can now use left and right arrow keys.
- Tab stops are now saved to the .ctl file.

Bugs fixed:
- Fixed a possible crash when started with no filename.
- A7H in ASCII mode no longer produces " '''+80H "
- ORG line no longer shows a label
- 68HC11: fixed opcodes that were missed in a table update (18A6 etc.)
- The ')' command now clears the pre-LF attribute of added bytes.
- Fixed an infinite loop when search started from "END" line.
- Width of address/hex fields for search is now taken from tabs[] array.
- It is now possible to change tab stops.
- [ and ] commands will now skip past a pre-LF line.

4.2.0 - 2023-10-31

New features:
- added 68HC05 disassembler
- added ctrl-T command to trace at refaddr of current line
- added a minimal 8086 disassembler
- added a minimal PDP-11 disassembler
- added 8008 disassembler
- added Intel mnemonics for 8080 and 8085 disassembly
- added custom labels

Bugs fixed:
- 68HC11: fixed AIM/OIM/EIM/TIM instructions for 6303
- 6502: fixed some branch-always combinations
- F8: updated instruction definitions
- fixed a caching bug when deleting a label or comment

4.3.0 - 2024-07-18

New features:
- added 6301 to 68HC11 disassembler
- added ARM Thumb disassembler
- added 4004 disassembler

Bugs fixed:
- ORG was only being generated with a 16-bit address
- 68HC11: ASLD was incorrectly being recognized for 6800/6802/6808
- 68HC11: ABY instruction was not recognized
- 8048: hints now affect bank choice for long JMP/CALL

================