But Turbo Pascal and MS-DOS are way out of style now, so I ported the code to plain standard C. It should run on any Unix-like operating system. It should also run on any operating system that provides Unix-style file and command-line handling. In particular, it runs fine under Mac OS X, which is what I use on my laptop.
Porting it to C wasn't enough, though. I had added some nice features like macros to the 6502 assembler and I wanted them in the Z-80 assembler too. But I didn't want to have to copy and paste code every time I added a new feature. So I turned the code inside out, and made the common code into a gigantic .h file. This made writing an assembler for a new CPU easy enough that I was able to write a 6809 assembler in one day, plus another day for debugging.
Unlike most "generic" assemblers, I make an effort to conform to the standard mnemonics and syntax for a CPU, as you'd find them in the chip manufacturer's documentation. I'm a bit looser on the pseudo-ops, trying to be inclusive whenever possible so that existing code has a better chance of working with fewer changes, especially code written back in the '80s.
This is a two-pass assembler. That means that on the first pass it figures out where all the labels go, then on the second pass it generates code. I know there are popular multi-pass assemblers out there (like DASM for 6502), and they have their own design philosophy. I'm sticking with the philosophy that was used by the old EDTASM assemblers for the TRS-80. There are a few EDTASM-isms that you might notice, if you know what to look for.
Because it's a two-pass assembler, there are some things you can't do. You can't ORG to a label that hasn't been defined yet: on the first pass the label has no value, but on the second pass it will, so your code will go into a different location and all your labels will be at the wrong address. This is called a "phase error". For the same reason, you can't use a label that hasn't been defined yet with DS or ALIGN, because they affect the current location.
Some CPUs like the 6502 and 6809 have different instructions which can provide smaller, faster code based on the size of an operand. To make this work, during the second pass the assembler keeps an extra flag in the symbol table that tells whether the symbol was already known at this point in the first pass. If it wasn't, the assembler knows to use the longer form to avoid a phase error. The 6809 assembler syntax uses "<" (force 8-bits) and ">" (force 16-bits) to override this decision. The 6502 assembler can also override this with a ">" before an absolute or absolute-indexed address operand. (Note that this usage is different from "<" and ">" as the low/high byte of a word value.)
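To make that decision concrete, here's a minimal C sketch. The `force_t` type, the `known_in_pass1` flag, and `operand_size()` are hypothetical names for illustration, not asmx's actual internals, and the sketch assumes the direct page is $0000-$00FF (true for the 6502 zero page; on the 6809 the DP register can move it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size override taken from the operand prefix. */
typedef enum { FORCE_NONE, FORCE_SHORT, FORCE_LONG } force_t;

/* Returns the operand size in bytes: 1 = direct/zero page, 2 = extended/absolute.
 * known_in_pass1: was the symbol already defined at this point in pass 1?
 * Assumes the direct page is $0000-$00FF. */
int operand_size(uint16_t value, bool known_in_pass1, force_t force)
{
    if (force == FORCE_SHORT) return 1;   /* "<" forces the 8-bit form */
    if (force == FORCE_LONG)  return 2;   /* ">" forces the 16-bit form */

    /* A forward reference had no value on pass 1, so pass 1 reserved two
     * bytes; pass 2 must do the same or every later label would shift. */
    if (!known_in_pass1) return 2;

    return (value <= 0xFF) ? 1 : 2;
}
```

Because a forward reference gets the long form on both passes, the byte count can't change between passes, which is what prevents the phase error.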
Some assemblers can only output code in binary. This might be nice if you're making a video game cartridge ROM, but it's really not very flexible. Intel and Motorola both came up with very nice text file formats which don't require any kind of padding when you do an ORG instruction, and don't require silly "segment" definitions just to keep DS instructions from generating object code. Then, following the Unix philosophy of making tools that can connect to other tools, you can pipe the object code to another utility which makes the ROM image.
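As an illustration of why the text formats are convenient, here's a sketch of how one Intel hex data record is built: a colon, a byte count, a 16-bit address, a record type (00 for data), the data bytes, and a checksum chosen so that all the bytes in the record sum to zero modulo 256. The function name is made up, and this is not asmx's own output code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one Intel hex data record (record type 00) into out.
 * out needs room for 11 + 2*len + 1 characters; len must be at most 255. */
void ihex_data_record(char *out, uint16_t addr, const uint8_t *data, size_t len)
{
    /* The checksum covers the count, both address bytes, the type and the data. */
    unsigned sum = (unsigned)len + (addr >> 8) + (addr & 0xFFu);   /* type 00 adds 0 */
    int n = sprintf(out, ":%02X%04X00", (unsigned)len, (unsigned)addr);

    for (size_t i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Two's complement of the low byte, so the whole record sums to 0 mod 256. */
    sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
}
```

For example, the bytes 01 02 03 at address 0000 become `:03000000010203F7`. A file ends with the end-of-file record `:00000001FF`. Since every record carries its own address, an ORG just starts a new record at the new address, and nothing has to be padded.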
Anyhow, it works pretty well for what I want it to do.
- Bruce -
make
This will create the asmx binary in the src sub-directory. That's it. Now you might want to copy it to your /usr/local/bin or ~/bin directory, but that's your choice.
If you are using a Unix-like OS such as Linux, OS X, or BSD, you can also use:
make install
This will install the binaries to ~/bin, unless you change the makefile to install it somewhere else. Symbolic links are generated so that each CPU assembler can be used with a separate command.
If you can't use the makefile, the simplest way is this:
gcc *.c -o asmx
Windows users should install Cygwin as the easiest way to get GCC.
asmx [options] srcfile
Here are the command line options:
| -- | end of options |
| -e | show errors to screen |
| -w | show warnings to screen |
| -l [filename] | make a listing file, default is srcfile.lst |
| -o [filename] | make an object file, default is srcfile.hex or srcfile.s9 |
| -d label[[:]=value] | define a label, and assign an optional value |
| -s9 | output object file in Motorola S-record format (16-bit address) |
| -s19 | output object file in Motorola S-record format (16-bit address) |
| -s28 | output object file in Motorola S-record format (24-bit address) |
| -s37 | output object file in Motorola S-record format (32-bit address) |
| -b [base[-end]] | output object file as binary with optional base/end addresses |
| -t [reclen] | output object file in TRSDOS format (implies -C Z80) |
| -T [reclen] | output object file as TRS-80 cassette file (implies -C Z80) |
| -c | send object code to stdout |
| -C cputype | specify default CPU type (currently 6502) |
| -@ | print the old EDTASM-style pass/error messages |
Example:
asmx -l -o -w -e program.asm
This assembles the source file "program.asm", shows warnings and errors to the screen, creates a listing file "program.asm.lst", and puts the object code in an Intel hex format file named "program.asm.hex". (Binary files get named "program.asm.bin", and Motorola S9 files get an extension of .s9, .s19, .s28, or .s37.)
Notes:
The '--' option is needed when you use -l, -o, or -b as the last option on the command line with no parameters, so that they don't try to eat up your source file name. It's really better to just put -l and -o first in the options.
The value in -d must be a number. No expressions are allowed. The valid forms are:
| -d label | defines the label as EQU 0 |
| -d label=value | defines the label as EQU value |
| -d label:=value | defines the label as SET value |
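Here's a minimal sketch of how those three forms could be told apart. The `define_t` struct and `parse_define()` are hypothetical, and the value is converted with `strtol()` using base 0 (decimal, 0x-prefixed hex, or leading-0 octal), which is an assumption; asmx's own number syntax for -d may differ.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing one -d argument. */
typedef struct {
    char name[64];
    long value;
    bool is_set;        /* true for label:=value (SET), false for EQU */
} define_t;

/* Parse "label", "label=value" or "label:=value". Returns false if malformed. */
bool parse_define(const char *arg, define_t *d)
{
    const char *eq = strchr(arg, '=');
    size_t name_len = eq ? (size_t)(eq - arg) : strlen(arg);

    d->value = 0;       /* "-d label" means EQU 0 */
    d->is_set = false;

    if (eq) {
        if (name_len > 0 && arg[name_len - 1] == ':') {   /* ":=" means SET */
            d->is_set = true;
            name_len--;
        }
        char *end;
        d->value = strtol(eq + 1, &end, 0);
        if (end == eq + 1 || *end != '\0')   /* a plain number only, no expressions */
            return false;
    }
    if (name_len == 0 || name_len >= sizeof d->name)
        return false;
    memcpy(d->name, arg, name_len);
    d->name[name_len] = '\0';
    return true;
}
```

Rejecting anything left over after the number is what enforces the "no expressions" rule: `-d X=1+2` fails because `+2` remains unparsed.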
By default, object code is written as an Intel hex file unless one of the -s9, -s19, -s28, -s37, -b, -t, or -T options is specified.
The -b option specifies the base (start) address of your binary file, and optionally an end address. If you are making code for a ROM at address range 0xC000-0xFFFF, use "-b 0xC000-0xFFFF", and the first byte of the object file will be whatever belongs at 0xC000. Anything at a lower address is not written to the file, any gaps are filled with 0xFF, and nothing past 0xFFFF is written. The object file is not padded out to the full address range, though. Be careful about using large ORG values without an end address, or the resulting binary file could become VERY large.
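The base/end behavior can be sketched as a 64K memory image that starts out filled with 0xFF, from which only the range from the base address up to the lower of the end address and the highest byte actually assembled gets written. This assumes a 16-bit address space and uses made-up names (`image_t`, `image_put()`, `image_write_bin()`); it illustrates the rules above, not how asmx buffers its output.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 64K image of the assembled code. */
typedef struct {
    uint8_t mem[0x10000];
    long    top;               /* highest address written so far, -1 if none */
} image_t;

void image_init(image_t *img)
{
    memset(img->mem, 0xFF, sizeof img->mem);   /* gaps read back as 0xFF */
    img->top = -1;
}

void image_put(image_t *img, uint16_t addr, uint8_t byte)
{
    img->mem[addr] = byte;
    if ((long)addr > img->top)
        img->top = addr;
}

/* Write from base up to the lower of end and the highest address assembled.
 * Nothing below base or above end is written, and the file is not padded
 * out to end. Returns the number of bytes written. */
long image_write_bin(const image_t *img, FILE *f, uint16_t base, uint16_t end)
{
    long last = (img->top < (long)end) ? img->top : (long)end;
    if (last < (long)base)
        return 0;
    return (long)fwrite(img->mem + base, 1, (size_t)(last - base + 1), f);
}
```

With code at $0200, $C000 and $C003 and "-b 0xC000-0xFFFF", this writes four bytes: the byte at $C000, two 0xFF fill bytes, and the byte at $C003. The byte at $0200 is dropped because it's below the base.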
The -c and -o options are incompatible. Attempting to use both will result in an error. Normal screen output (pass number, total errors, error messages, etc.) always goes to stderr.
Unary operations take a single value and do something with it. The supported unary operations are:
| + val | positive of val |
| - val | negative of val |
| ~ val | bitwise NOT of val |
| ! val | logical NOT of val (returns 1 if val is zero, else 0) |
| < val | low 8 bits of val |
| > val | high 8 bits of val |
| ..DEF sym | returns 1 if symbol 'sym' has already been defined |
| ..UNDEF sym | returns 1 if symbol 'sym' has not been defined yet |
| ( expr ) | parentheses for grouping sub-expressions |
| [ expr ] | square brackets can be used as parentheses when necessary |
| 'c' | one-character constant, equal to the ASCII value of c |
| 'cc' | two-character constant, equal to the ASCII values of cc; the first character is the high byte |
| H(val) | high 8 bits of val; whitespace not allowed before '(' |
| L(val) | low 8 bits of val; whitespace not allowed before '(' |
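A small C sketch of the byte-extraction and character-constant rules, assuming expression values are held in 32-bit signed integers (an assumption; the helper names are made up for illustration):

```c
#include <stdint.h>

int32_t op_low(int32_t v)  { return v & 0xFF; }          /* <val and L(val) */
int32_t op_high(int32_t v) { return (v >> 8) & 0xFF; }   /* >val and H(val) */
int32_t op_not(int32_t v)  { return v == 0; }            /* !val: 1 if zero, else 0 */

/* 'c' or 'cc': in a two-character constant the first character is the high byte. */
int32_t char_const(const char *s, int len)
{
    return (len == 1) ? (unsigned char)s[0]
                      : (((unsigned char)s[0] << 8) | (unsigned char)s[1]);
}
```

So 'AB' evaluates to $4142, and >$1234 and <$1234 give $12 and $34.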
NOTE: with the Z-80, (expr), H(val), and L(val) will likely not work at the start of an expression because of Z-80 operand syntax. Likewise with the 6809, <val and >val may have special meaning at the start of an operand.
Binary operations take two values and do something with them. The supported binary operations are:
| x * y | x multiplied by y |
| x / y | x divided by y |
| x % y | x modulo y |
| x + y | x plus y |
| x - y | x minus y |
| x << y | x shifted left by y bits |
| x >> y | x shifted right by y bits |
| x & y | bitwise x AND y |
| x \| y | bitwise x OR y |
| x ^ y | bitwise x XOR y |
| x = y | comparison operators, return 1 if condition is true (note that = and == are the same) |
| x == y | |
| x != y | |
| x < y | |
| x <= y | |
| x > y | |
| x >= y | |
| x && y | logical AND of x and y (returns 1 if x != 0 and y != 0) |
| x \|\| y | logical OR of x and y (returns 1 if x != 0 or y != 0) |
Numbers:
| . | current program counter |
| * | |
| $ | |
| $nnnn | hexadecimal constant |
| nnnnH | |
| 0xnnnn | |
| nnnn | decimal constant |
| nnnnD | |
| nnnnO | octal constant |
| aaabbbA | split-octal constant, where aaa and bbb are the high and low bytes (377377A = 0xFFFF) |
| %nnnn | binary constant |
| nnnnB | |
Hexadecimal constants of the form "nnnnH" don't need a leading zero if there is no label defined with that name.
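Here's a sketch of a parser for the constant forms in the table. It's illustrative only: the function names are hypothetical, it assumes the low byte of a split-octal constant is always the last three digits before the A, and it leaves ".", "*", a bare "$", and the symbol-table check for nnnnH names to the caller. Overflow is not checked.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Convert digits s[0..n) in the given base; false if any digit is invalid. */
static bool digits(const char *s, size_t n, int base, long *out)
{
    long v = 0;
    if (n == 0) return false;
    for (size_t i = 0; i < n; i++) {
        int c = toupper((unsigned char)s[i]);
        int d = isdigit(c) ? c - '0' : (c >= 'A' && c <= 'F') ? c - 'A' + 10 : 99;
        if (d >= base) return false;
        v = v * base + d;
    }
    *out = v;
    return true;
}

/* Parse one numeric constant in any of the forms listed above. */
bool parse_number(const char *s, long *out)
{
    size_t n = strlen(s);
    if (n == 0) return false;
    if (s[0] == '$') return digits(s + 1, n - 1, 16, out);    /* $nnnn */
    if (s[0] == '%') return digits(s + 1, n - 1, 2, out);     /* %nnnn */
    if (n > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        return digits(s + 2, n - 2, 16, out);                 /* 0xnnnn */

    switch (toupper((unsigned char)s[n - 1])) {               /* suffix forms */
    case 'H': return digits(s, n - 1, 16, out);
    case 'D': return digits(s, n - 1, 10, out);
    case 'O': return digits(s, n - 1, 8, out);
    case 'B': return digits(s, n - 1, 2, out);
    case 'A': {                               /* split octal: aaabbbA */
        long hi, lo;
        if (n < 5) return false;              /* at least 1 + 3 digits + 'A' */
        if (!digits(s, n - 4, 8, &hi) || !digits(s + n - 4, 3, 8, &lo))
            return false;
        if (hi > 0xFF || lo > 0xFF) return false;
        *out = (hi << 8) | lo;
        return true;
    }
    default:  return digits(s, n, 10, out);   /* plain decimal */
    }
}
```

Checking the suffix before falling back to decimal is what lets "0FFH" and "1010B" work, and it also means something like "12B" is rejected rather than silently read as hex.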
Operator precedence:
( ) [ ]
unary operators: + - ~ ! < > ..DEF ..UNDEF
* / %
+ -
< <= > >= = == !=
& && | || ^ << >>
WARNING:
Shifts and AND, OR, and XOR have a lower precedence than the comparison
operators! You must use parentheses when combining them with comparison operators!
Example:
Use "(OPTIONS & 3) = 2", not "OPTIONS & 3 = 2". The former checks the
lowest two bits of the label OPTIONS; the latter compares "3 = 2"
first and then computes OPTIONS & 0, which always results in zero.
Also, unary operators have higher precedence, so if X = 255, "<X + 1" is 256, but "<(X + 1)" is 0.
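The two pitfalls can be checked with a few lines of C. Each function spells out one grouping with explicit parentheses, so the results don't depend on C's own precedence table (which is not the same as asmx's; in C, for instance, shifts bind tighter than comparisons). The function names are made up for illustration.

```c
#include <stdint.h>

/* "(OPTIONS & 3) = 2": test the low two bits. */
int32_t masked_compare(int32_t options)   { return (options & 3) == 2; }
/* How "OPTIONS & 3 = 2" actually groups: OPTIONS & (3 = 2). */
int32_t unmasked_compare(int32_t options) { return options & (3 == 2); }

/* "<X + 1": the unary low-byte operator applies to X first. */
int32_t low_then_add(int32_t x) { return (x & 0xFF) + 1; }
/* "<(X + 1)": the addition happens first. */
int32_t add_then_low(int32_t x) { return (x + 1) & 0xFF; }
```

With OPTIONS = 6 the first pair gives 1 and 0; with X = 255 the second pair gives 256 and 0, matching the text.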
With the 6809 assembler, a leading "<" or ">" often refers to an addressing mode. If you really want to use the low-byte or high-byte operator, surround the whole thing with parentheses, like "(<LABEL)". This does not apply to immediate mode, so "LDA #<LABEL" will use the low byte of LABEL.
NOTE: ..DEF and ..UNDEF do not work with local labels (the ones that start with '@' or '.').
Labels must begin in the first column of the source file when they are declared, and may optionally have a ":" following them. Opcodes with no label must have at least one blank character before them.
Local labels are defined starting with "@" or ".". The assembler glues the local label, including its "@" or ".", onto the end of the last non-local code label defined so far, making a unique label. Examples: "@1", "@99", ".TEMP", and "@LOOP". Until the next non-local label, they can be referenced by this short form. They appear in the symbol table with a long form such as "LABEL@1" or "LABEL.1", but cannot be referenced by this full name. Local labels starting with a "." can also be made subroutine-local by using the SUBROUTINE pseudo-op.
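The naming rule can be sketched like this. The `scope_t` struct and `expand_local()` are hypothetical; the sketch assumes label names have already been converted to upper case, and an empty `subroutine` field stands for "no SUBROUTINE name active".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical scope state while assembling. */
typedef struct {
    char last_label[32];   /* last non-local code label */
    char subroutine[32];   /* name from the latest SUBROUTINE, "" if none */
} scope_t;

/* Expand a local label such as "@1" or ".LOOP" to its symbol-table form.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_local(const scope_t *sc, const char *local, char *out, size_t outsize)
{
    const char *prefix = sc->last_label;
    if (local[0] == '.' && sc->subroutine[0] != '\0')
        prefix = sc->subroutine;           /* "." labels follow SUBROUTINE */
    int n = snprintf(out, outsize, "%s%s", prefix, local);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

After START, "@1" becomes "START@1"; once SUBROUTINE FOO is active, ".LABEL" becomes "FOO.LABEL" while "@1" still attaches to START.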
Comments may either be started with a "*" as the first non-blank character of a line, or with a ";" in the middle of the line.
Lines after the END pseudo-op are ignored as though they were comments, except for LIST and OPT lines.
NOTE: All of the data pseudo-ops like DB, DW, and DS have a limit of 1023 bytes of initialized data. (This can be changed in asmx.h if you really need it bigger.)
Examples:
        DS 5            ; skip 5 bytes (generates no object code)
        DS 6,"*"        ; assemble 6 asterisks
Note that no forward-reference values are allowed for the length because this would cause phase errors.
Examples:
        FCC /TEXT/      ; 4 bytes "TEXT"
        FCC \TEXT\      ; 4 bytes "TEXT"
In this assembler, FCC is extended so that after the first string it works like DB, only with a different quote character. Also, the string delimiter can be doubled inside the string to include the delimiter itself in the string.
Examples:
        FCC /TEXT//TEXT/        ; 9 bytes "TEXT/TEXT"
        FCC /TEXT/,0            ; 5 bytes "TEXT" followed by a null
        FCC /TEXT/,0,/TEXT/     ; 9 bytes "TEXT", null, "TEXT"
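A sketch of the delimited form, including the doubled-delimiter rule. The function name is hypothetical and it handles only one string; the caller would continue parsing DB-style items from the returned position.

```c
#include <stddef.h>

/* Parse one delimited FCC string; s[0] is the delimiter character.
 * Copies the text into out and returns its length, or -1 if the string is
 * unterminated or does not fit. *rest is set just past the closing
 * delimiter, where a comma may introduce more DB-style items. */
int fcc_delimited(const char *s, char *out, size_t outsize, const char **rest)
{
    if (s[0] == '\0')
        return -1;                       /* no delimiter at all */

    char delim = s[0];
    size_t n = 0;
    const char *p = s + 1;

    for (;;) {
        if (*p == '\0')
            return -1;                   /* no closing delimiter */
        if (*p == delim) {
            if (p[1] != delim)
                break;                   /* single delimiter ends the string */
            p++;                         /* doubled delimiter: keep one copy */
        }
        if (n >= outsize)
            return -1;
        out[n++] = *p++;
    }
    *rest = p + 1;
    return (int)n;
}
```

For "/TEXT//TEXT/" this yields the 9 bytes "TEXT/TEXT", and for "/TEXT/,0" it yields 4 bytes and leaves ",0" for the DB-style part.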
There is also a second mode where the length is specified, the text has no quotes, and the text is padded to the specified length with blanks. Be aware that if the text is too short, it will copy more data from your source line, even if you have a comment in the line! However, it will stop copying when it encounters a tab character.
Example:
        FCC 9,TEXT              <- this is 9 bytes: "TEXT" padded with 5 blanks
        FCC 9,TEXT;comm         <- this is 9 bytes "TEXT;comm"
        FCC 9,TEXT;comment      <- this is 9 bytes "TEXT;comm", then an error from "ent"
Examples:
        HEX 123456      ; assembles to hex bytes 12, 34, and 56
        HEX 78 9ABC DE  ; assembles to hex bytes 78, 9A, BC and DE
        HEX 1 2 3 4     ; Error: hexadecimal digits must be in pairs
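A minimal sketch of the HEX operand rules, assuming any comment has already been stripped from the operand (the function names are hypothetical):

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (isdigit(c)) return c - '0';
    c = toupper(c);
    return (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
}

/* Parse a HEX operand into out. Digits must come in pairs; blanks may
 * separate pairs. Returns the byte count, or -1 on an unpaired digit,
 * a bad character, or overflow of out. */
int parse_hex_operand(const char *s, unsigned char *out, size_t outsize)
{
    size_t n = 0;
    while (*s) {
        if (*s == ' ' || *s == '\t') { s++; continue; }
        int hi = hexval((unsigned char)s[0]);
        int lo = (hi >= 0 && s[1]) ? hexval((unsigned char)s[1]) : -1;
        if (hi < 0 || lo < 0 || n >= outsize) return -1;
        out[n++] = (unsigned char)((hi << 4) | lo);
        s += 2;
    }
    return (int)n;
}
```

Reading two digits at a time is what makes "1 2 3 4" an error: after the "1", the next character is a blank instead of a second digit.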
ELSIF is the same as "ELSE" followed by "IF", only without the need for an extra ENDIF.
Example:
        IF ..undef mode
         ERROR mode not defined!
        ELSIF mode = 1
         JSR mode1
        ELSIF mode = 2
         JSR mode2
        ELSE
         ERROR Invalid value of mode!
        ENDIF
IF statements inside a macro only work inside that macro. When a macro is defined, IF statements are checked for matching ENDIF statements.
| LIST ON / OPT LIST | Turn on listing |
| LIST OFF / OPT NOLIST | Turn off listing |
| LIST MACRO / OPT MACRO | Turn on macro expansion in listing |
| LIST NOMACRO / OPT NOMACRO | Turn off macro expansion in listing |
| LIST EXPAND / OPT EXPAND | Turn on data expansion in listing |
| LIST NOEXPAND / OPT NOEXPAND | Turn off data expansion in listing |
| LIST SYM / OPT SYM | Turn on symbol table in listing |
| LIST NOSYM / OPT NOSYM | Turn off symbol table in listing |
| LIST TEMP / OPT TEMP | Turn on temp symbols in symbol table listing |
| LIST NOTEMP / OPT NOTEMP | Turn off temp symbols in symbol table listing |
| OPT EXACT / OPT NOOPT | Turn off assembler-specific optimizations |
| OPT NOEXACT / OPT OPT | Turn on assembler-specific optimizations (Z80, 68K) |
The default is listing on, macro expansion off, data expansion on, symbol table on, exact off.
Macro calls can be nested to a maximum of 10 levels. (This can be changed in asmx.c if you really need it bigger.)
Example:
TWOBYTES MACRO parm1, parm2     ; start recording the macro
        DB parm1, parm2
        ENDM                    ; stop recording the macro
        TWOBYTES 1, 2           ; use the macro - expands to "DB 1, 2"
An alternate form with the macro name after MACRO, instead of as a label, is also accepted. A comma after the macro name is optional.
        MACRO plusfive parm
        DB (parm)+5
        ENDM
When a macro is invoked with insufficient parameters, the remaining parameters are replaced with a null string. It is an error to invoke a macro with too many parameters.
Macro parameters can be inserted without surrounding whitespace by using the '##' concatenation operator.
TEST    MACRO labl
labl ## 1 DB 1
labl ## 2 DB 2
        ENDM
        TEST HERE       ; labl ## 1 gets replaced with "HERE1"
                        ; labl ## 2 gets replaced with "HERE2"
Macro parameters can also be inserted by using the backslash ("\") character. This method also includes a way to access the actual number of macro parameters supplied, and a unique identifier for creating temporary labels.
\0 = number of macro parameters
\1..\9 = nth macro parameter
\? = unique ID per macro invocation (padded with leading zeros to five digits)
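Here's a rough sketch of the backslash substitution for one line of a macro body. The function name and the invocation counter passed in for \? are assumptions for illustration; it does not handle named parameters or the ## operator.

```c
#include <stdio.h>
#include <string.h>

/* Expand \0, \1..\9 and \? in one macro body line.
 * args/nargs: the actual parameters; missing ones expand to "".
 * invocation: a per-call counter used for \? (five digits, zero padded).
 * Returns 0 on success, -1 if out is too small. */
int expand_backslash(const char *line, const char *const *args, int nargs,
                     unsigned invocation, char *out, size_t outsize)
{
    size_t n = 0;
    char tmp[16];

    if (outsize == 0) return -1;
    for (const char *p = line; *p; p++) {
        const char *ins = NULL;
        if (p[0] == '\\' && p[1] == '0') {
            snprintf(tmp, sizeof tmp, "%d", nargs);          /* \0: count */
            ins = tmp;
        } else if (p[0] == '\\' && p[1] >= '1' && p[1] <= '9') {
            int i = p[1] - '1';
            ins = (i < nargs) ? args[i] : "";                /* \1..\9 */
        } else if (p[0] == '\\' && p[1] == '?') {
            snprintf(tmp, sizeof tmp, "%05u", invocation);   /* \?: unique ID */
            ins = tmp;
        }
        if (ins) {
            size_t len = strlen(ins);
            if (n + len >= outsize) return -1;
            memcpy(out + n, ins, len);
            n += len;
            p++;                          /* skip the character after '\' */
        } else {
            if (n + 1 >= outsize) return -1;
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return 0;
}
```

With parameters FOO and 3 on the seventh invocation, "L\? LDA #\2" becomes "L00007 LDA #3", which gives each expansion its own temporary label.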
NOTE: The line with the ENDM may have a label, and that will be included in the macro definition. However, if you include a backslash escape before the ENDM, the ENDM will not be recognized, and the macro definition will not end. Be careful!
| NONE | No CPU type selected |
| 1802 | RCA 1802 |
| 6502 | MOS Technology 6502 |
| 6502U | MOS Technology 6502 with undocumented instructions |
| 65C02 | Rockwell 65C02 |
| 65816 65C816 | Western Design Center 65C816 |
| 68K 68000 | Motorola 68000 |
| 68010 | Motorola 68010 |
| 6805 68HC05 | Motorola 6805 |
| 68HSC08 | Motorola 68HSC08 (6805 variant) |
| 6809 | Motorola 6809 |
| 6309 | Hitachi 6309 |
| 6800 6802 6808 | Motorola 6800 |
| 6801 6803 | Motorola 6801 |
| 6303 | Hitachi 6303 (6800 variant) |
| 6811 68HC11 68HC711 68HC811 68HC99 | Motorola 68HC11 variants |
| 68HC16 | Motorola 68HC16 |
| 8048 | Intel 8048 |
| 8051 8052 8031 8032 | Intel 8051 variants |
| 8080 | Intel 8080 |
| 8085 | Intel 8085 |
| 8080Z | Intel 8080 with Z-80 JR and DJNZ opcodes |
| 8085U | Intel 8085 with undocumented instructions |
| Z80 | Zilog Z-80 |
| Z180 | Zilog Z-180 |
| GBZ80 | Gameboy Z-80 variant |
| Z8085 | Intel 8085 with Z-80 mnemonics |
| Z8 | Zilog Z8 |
| 8008 | Intel 8008 |
| F8 | Fairchild F8 |
| TOM | Atari Jaguar GPU |
| JERRY | Atari Jaguar DSP |
| ARM | ARM (32-bit little-endian) |
| ARM_BE | ARM big-endian |
| ARM_LE | ARM little-endian |
| THUMB | ARM Thumb (16-bit little-endian) |
| THUMB_BE | ARM Thumb big-endian |
| THUMB_LE | ARM Thumb little-endian |
| SCMP | National Semiconductor SC/MP |
At the start of each pass, the CPU type defaults to the assembler specified in the "-C" command line option, if any, or else to the assembler type determined from the name of the executable used on the command line. The latter is useful with symbolic links on Unix-type systems: asmx looks at the end of the executable name it was invoked with and selects the matching CPU type.
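Picking the CPU type from the program name might look something like this sketch. The link names (such as `asm6502`), the case-insensitive comparison, and the longest-match rule are assumptions, and only a few entries from the CPU table are included. It does show why a longest match is useful, since both Z80 and GBZ80 are suffixes of `asmgbz80`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A few of the CPU type names from the table above (not the full list). */
static const char *const cpu_names[] = { "6502", "65C02", "6809", "Z80", "GBZ80", "8080", "8080Z" };

/* Case-insensitive check that name ends with suffix. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (s > n) return 0;
    for (size_t i = 0; i < s; i++)
        if (toupper((unsigned char)name[n - s + i]) != toupper((unsigned char)suffix[i]))
            return 0;
    return 1;
}

/* Return the CPU name that is the longest suffix of the program's base
 * name, or NULL if none matches (e.g. when invoked as plain "asmx"). */
const char *cpu_from_argv0(const char *argv0)
{
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;

    const char *best = NULL;
    for (size_t i = 0; i < sizeof cpu_names / sizeof cpu_names[0]; i++)
        if (ends_with(base, cpu_names[i]) &&
            (best == NULL || strlen(cpu_names[i]) > strlen(best)))
            best = cpu_names[i];
    return best;
}
```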
If no default assembler is specified, the DW/WORD and DL/LONG pseudo-ops will generate errors because they do not know which endian order to use.
Opcodes for the selected processor will have priority over generic pseudo-ops. However, assemblers for CPUs which have a "SET" opcode have been specifically designed to pass control to the generic "SET" pseudo-op.
At the start of each assembler pass, all segment pointers are reset to zero, and the null segment becomes the current segment.
SEG.U is for DASM compatibility. DASM uses SEG.U to indicate an "uninitialized" segment. This is necessary because its DS pseudo-op always generates data even when none is specified. Since the DS pseudo-op in this assembler normally doesn't generate any data, uninitialized segments aren't supported as such.
RSEG is for compatibility with vintage Atari 7800 source code.
Example:
START
.LABEL NOP ; this becomes "START.LABEL"
        SUBROUTINE foo
.LABEL NOP ; this becomes "FOO.LABEL"
        SUBROUTINE bar
.LABEL NOP ; this becomes "BAR.LABEL"
        SUBROUTINE
LABEL
.LABEL NOP ; this becomes "LABEL.LABEL"
This is primarily intended for using DS pseudo-ops to create data structure offsets, using WORDSIZE 8.
WARNING: using a forward-referenced value could cause phase errors!
See http://www.wolldingwacht.de/if/z-spec.html for more information on the compressed text format.
| U | Undefined | this symbol was referenced but never defined |
| M | Multiply defined | this symbol was defined more than once with different values (only the first is kept) |
| S | SET | this symbol was defined with the SET pseudo-op, or from the -dLABEL:=VALUE command line option |
| E | EQU | this symbol was defined with the EQU pseudo-op, or from the -dLABEL=VALUE command line option |
<label> MACRO start recording the macro with the name <label>
ENDM end the macro
LIST MACRO enable macro expansion in listings
LIST NOMACRO disable macro expansion in listings (default)
<label> use the macro
Note that macros can not currently call other macros.
This is a somewhat shaky feature right now, and doesn't handle things like word alignment when the DB pseudo-op is used, or even adjusting the address at the left margin of the listing file.