asmx semi-generic assembler
Okay, so it's not really generic, just semi-generic. This started
from an 8080 assembler I wrote in Turbo Pascal back in my college days
when I had a class where I had to write an 8080 emulator. The assembler
wasn't part of the class, but no way was I going to hand assemble code
again like I did back in my early TRS-80 days. Then when I started
dabbling with ColecoVision and Atari 2600 code, I made a Z-80 and a 6502
assembler from it.
But Turbo Pascal and MS-DOS are way out of style now, so I ported the
code to plain standard C. It should run on any Unix-like operating
system. It should also run on any operating system that provides
Unix-style file and command-line handling. In particular, it runs fine
under Mac OS X, which is what I use on my laptop.
Porting it to C wasn't enough, though. I had added some nice features
like macros to the 6502 assembler and I wanted them in the Z-80
assembler too. But I didn't want to have to copy and paste code every
time I added a new feature. So I turned the code inside out, and made
the common code into a gigantic .h file. This made writing an
assembler for a new CPU easy enough that I was able to write a 6809
assembler in one day, plus another day for debugging.
Unlike most "generic" assemblers, I make an effort to conform to the
standard mnemonics and syntax for a CPU, as you'd find them in the chip
manufacturer's documentation. I'm a bit looser on the pseudo-ops,
trying to be inclusive whenever possible so that existing code has a
better chance of working with fewer changes, especially code written
back in the '80s.
This is a two-pass assembler. That means that on the first pass it
figures out where all the labels go, then on the second pass it
generates code. I know there are popular multi-pass assemblers out
there (like DASM for 6502), and they have their own design philosophy.
I'm sticking with the philosophy that was used by the old EDTASM
assemblers for the TRS-80. There are a few EDTASM-isms that you might
notice, if you know what to look for.
But being a two-pass assembler, there are some things you can't do.
You can't ORG to a label that hasn't been defined yet, because on
the second pass it'll have a value, and your code will go into a
different location, and all your labels will be at the wrong
address. This is called a "phase error". You also can't use a
label that hasn't been defined yet with DS or ALIGN because they
affect the current location.
Some CPUs, like the 6502 and 6809, have different instructions which
can provide smaller, faster code depending on the size of an operand.
To make this work, the assembler keeps an extra flag in the symbol
table during the second pass, which tells whether the symbol was
already known at this point in the first pass. If it wasn't, the first
pass had to assume the longer form, so the assembler uses the longer
form again to avoid a phase error. The 6809 assembler
syntax uses "<" (force 8-bits) and ">" (force 16-bits) to override
this decision. The 6502 assembler can also override this with a
">" before an absolute or absolute-indexed address operand. (Note
that this usage is different from "<" and ">" as a high/low byte
of a word value.)
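A minimal sketch of the size decision described above, written in C (the language asmx is ported to). The function name, the FORCE_* constants, and the known_in_pass1 parameter are hypothetical stand-ins for the symbol-table flag; the sketch models only the 6502 zero-page vs. absolute choice and ignores the 6809 direct page register.

```c
/* Hypothetical sketch of the 6502 zero-page vs. absolute decision.
   Returns the operand size in bytes: 1 (zero page) or 2 (absolute). */
enum { FORCE_NONE, FORCE_ABSOLUTE };

int operand_bytes(long value, int known_in_pass1, int force) {
    if (force == FORCE_ABSOLUTE)
        return 2;                  /* ">" prefix: always absolute */
    if (!known_in_pass1)
        return 2;                  /* pass 1 had to assume 2 bytes, so keep it */
    return (value >= 0 && value <= 0xFF) ? 1 : 2;
}
```

A label referenced before its definition gets 2 bytes in pass 1. In pass 2 its value is known, but because known_in_pass1 is 0 it still gets 2 bytes, so every later label lands at the same address on both passes.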
Some assemblers can only output code in binary. This might be nice
if you're making a video game cartridge ROM, but it's really not
very flexible. Intel and Motorola both came up with very nice text
file formats which don't require any kind of padding when you do an
ORG instruction, and don't require silly "segment" definitions
just to keep DS instructions from generating object code. Then,
following the Unix philosophy of making tools that can connect to
other tools, you can pipe the object code to another utility which
makes the ROM image.
Anyhow, it works pretty well for what I want it to do.
- Bruce -
HOW TO BUILD asmx
The standard way to build asmx is using the makefile:
    make
This will create the asmx binary in the src sub-directory.
That's it. Now you might want to copy it to your
/usr/local/bin or ~/bin directory, but that's your choice.
If you are using a Unix-like OS such as Linux, OS X, or BSD, you
can also use:
    make install
This will install the binaries to ~/bin, unless you change the
makefile to install it somewhere else. Symbolic links are generated
so that each CPU assembler can be used with a separate command.
If you can't use the makefile, the simplest way is this:
    gcc *.c -o asmx
Windows users should install Cygwin as the easiest way to get GCC.
To run asmx, give it the name of your assembler source file and
whatever options you want:
    asmx [options] srcfile
Here are the command line options:
    --                   end of options
    -e                   show errors to screen
    -w                   show warnings to screen
    -l [filename]        make a listing file, default is srcfile.lst
    -o [filename]        make an object file, default is srcfile.hex or srcfile.s9
    -d label[[:]=value]  define a label, and assign an optional value
    -s9                  output object file in Motorola S9 format (16-bit address)
    -s19                 output object file in Motorola S9 format (16-bit address)
    -s28                 output object file in Motorola S9 format (24-bit address)
    -s37                 output object file in Motorola S9 format (32-bit address)
    -b [base[-end]]      output object file as binary with optional base/end addresses
    -c                   send object code to stdout
    -C cputype           specify default CPU type (currently 6502)
Example:
    asmx -l -o -w -e program.asm
This assembles the source file "program.asm", shows warnings and
errors to the screen, creates a listing file "program.asm.lst", and
puts the object code in an Intel hex format file named "program.asm.hex".
(Binary files get named "program.asm.bin", and Motorola S9 files get an
extension of .s9, .s19, .s28, or .s37.)
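The -s19, -s28 and -s37 names follow the usual Motorola convention of pairing data record type S1, S2 or S3 (16-, 24- or 32-bit address) with end record S9, S8 or S7. A sketch of formatting one data record follows; the function name and signature are hypothetical, and header records, terminator records, and the record length asmx actually uses are not shown.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one Motorola S-record data record into buf.
   type 1 = S1 (16-bit address), 2 = S2 (24-bit), 3 = S3 (32-bit).
   buf must hold at least 2 * len + 15 characters.
   Returns the number of characters written. */
int srec_data(char *buf, int type, unsigned long addr,
              const uint8_t *data, unsigned len) {
    int abytes = type + 1;                  /* S1: 2, S2: 3, S3: 4 address bytes */
    unsigned count = len + abytes + 1;      /* address + data + checksum bytes */
    unsigned sum = count;
    int n = sprintf(buf, "S%d%02X", type, count);
    for (int i = abytes - 1; i >= 0; i--) { /* address, most significant byte first */
        unsigned b = (unsigned)((addr >> (8 * i)) & 0xFF);
        n += sprintf(buf + n, "%02X", b);
        sum += b;
    }
    for (unsigned i = 0; i < len; i++) {
        n += sprintf(buf + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    n += sprintf(buf + n, "%02X", ~sum & 0xFF); /* ones' complement checksum */
    return n;
}
```

For the bytes A9 00 60 at address 0xC000, this produces "S106C000A9006030".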
The '--' option is needed when -l, -o, or -b is the last option on the
command line and has no parameter, so that it doesn't consume your source
file name. It's really better to just put -l and -o first in the options.
The value in -d must be a number. No expressions are allowed. The
valid forms are:
|-d label || defines the label as EQU 0
|-d label=value || defines the label as EQU value
|-d label:=value || defines the label as SET value
By default, object code is written as an Intel hex file unless the -s or
-b option is specified.
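For reference, a sketch of one Intel hex data record: a colon, byte count, 16-bit address, record type 00, the data, and a two's complement checksum. The function name is hypothetical, only the low 16 address bits are handled (no extended-address records), and the number of bytes per record asmx actually emits is not stated here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one Intel hex data record (record type 00) into buf, which must
   hold at least 2 * len + 12 characters. Returns the characters written. */
int ihex_record(char *buf, unsigned addr, const uint8_t *data, unsigned len) {
    unsigned sum = len + ((addr >> 8) & 0xFF) + (addr & 0xFF); /* type 00 adds 0 */
    int n = sprintf(buf, ":%02X%04X00", len, addr & 0xFFFF);
    for (unsigned i = 0; i < len; i++) {
        n += sprintf(buf + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* checksum: two's complement of the low byte of the sum */
    n += sprintf(buf + n, "%02X", (0x100 - (sum & 0xFF)) & 0xFF);
    return n;
}
```

For A9 00 60 at 0xC000 this gives ":03C00000A9006034". A standard Intel hex file ends with the end-of-file record ":00000001FF".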
The value in -b specifies the start (base) address for your binary file,
and optionally an end address. If you are making code for a ROM at address
range 0xC000-0xFFFF, use "-b 0xC000-0xFFFF", and the first byte of the object
file will be whatever belongs at 0xC000. Anything at a lower address is not
written to the file, any gaps are filled with 0xFF, and no bytes past 0xFFFF
are written. The object file is not padded out to the full address range. Be
careful about using large ORG values without an end address, or the resulting
binary file could become VERY large.
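The base/end rules can be sketched as a copy out of a memory image. This document doesn't describe asmx's internal structures, so the Image type, its used[] flags, and make_binary are hypothetical; pass 0xFFFF as the end to mimic "no end address" in a 16-bit space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64K memory image: one data byte and one "written" flag per
   address. asmx's real internal structures may differ. */
typedef struct {
    uint8_t data[0x10000];
    uint8_t used[0x10000];
} Image;

/* Copy addresses base..end into out (room for end - base + 1 bytes).
   Gaps are filled with 0xFF, and output stops at the highest address that
   was actually written, so the file is not padded out to 'end'.
   Returns the number of bytes produced. */
size_t make_binary(const Image *img, unsigned base, unsigned end, uint8_t *out) {
    long last = -1;
    for (unsigned a = base; a <= end && a <= 0xFFFF; a++)
        if (img->used[a])
            last = (long)a;
    if (last < 0)
        return 0;                       /* nothing in range */
    size_t n = (size_t)(last - (long)base) + 1;
    for (size_t i = 0; i < n; i++) {
        unsigned a = base + (unsigned)i;
        out[i] = img->used[a] ? img->data[a] : 0xFF;
    }
    return n;
}
```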
The -c and -o options are incompatible. Attempting to use both will
result in an error. Normal screen output (pass number, total errors,
error messages, etc.) always goes to stderr.
Whenever a value is needed, it goes through the expression evaluator.
The expression evaluator will attempt to do the arithmetic needed to
get a result.
Unary operations take a single value and do something with it. The
supported unary operations are:
|+ val ||positive of val
|- val ||negative of val
|~ val ||bitwise NOT of val
|! val ||logical NOT of val (returns 1 if val is zero, else 0)
|< val ||low 8 bits of val
|> val ||high 8 bits of val
|..DEF sym ||returns 1 if symbol 'sym' has already been defined
|..UNDEF sym ||returns 1 if symbol 'sym' has not been defined yet
|( expr ) ||parentheses for grouping sub-expressions
|[ expr ] ||square brackets can be used as parentheses when necessary
|'c' ||one-character constant, equal to the ASCII value of c
|'cc' ||two-character constant; the first character is the high byte
|H(val) ||high 8 bits of val; whitespace not allowed before '('
|L(val) ||low 8 bits of val; whitespace not allowed before '('
NOTE: with the Z-80, (expr), H(val), and L(val) will likely not work at
the start of an expression because of Z-80 operand syntax. Likewise,
with the 6809, <val and >val may have special meaning at the start of
an expression.
Binary operations take two values and do something with them. The
supported binary operations are:
|x * y ||x multiplied by y
|x / y ||x divided by y
|x % y ||x modulo y
|x + y ||x plus y
|x - y ||x minus y
|x << y ||x shifted left by y bits
|x >> y ||x shifted right by y bits
|x & y ||bitwise x AND y
|x | y ||bitwise x OR y
|x ^ y ||bitwise x XOR y
|x = y ||comparison operators, return 1 if the condition is true, else 0
|x == y ||(note that = and == are the same)
|x != y
|x < y
|x <= y
|x > y
|x >= y
|x && y ||logical AND of x and y (returns 1 if x != 0 and y != 0)
|x || y ||logical OR of x and y (returns 1 if x != 0 or y != 0)
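Other terms that can appear in an expression:
|. ||current program counter
|$nnnn ||hexadecimal constant
|0xnnnn ||hexadecimal constant
|nnnnH ||hexadecimal constant
|nnnn ||decimal constant
|nnnnO ||octal constant
|%nnnn ||binary constant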
Hexadecimal constants of the form "nnnnH" don't need a leading zero if
there is no label defined with that name.
Operator precedence, from highest to lowest:
( ) [ ]
unary operators: + - ~ ! < > ..DEF ..UNDEF
* / %
+ -
< <= > >= = == !=
& && | || ^ << >>
Shifts and AND, OR, and XOR have a lower precedence than the comparison
operators! You must use parentheses when combining them with comparison
operators! Use "(OPTIONS & 3) = 2", not "OPTIONS & 3 = 2". The former checks
the lowest two bits of the label OPTIONS; the latter evaluates "3 = 2" first,
which is 0, so the whole expression is always zero.
Also, unary operators have higher precedence, so if X = 255, "<X + 1" is
256, but "<(X + 1)" is 0.
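The precedence rules above can be checked with a small precedence-climbing evaluator. This is a sketch, not asmx's evaluator: it only understands decimal constants, brackets, the unary operators and the binary operators, and the names eval_expr, match_op and apply are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a precedence-climbing evaluator for the table above.
   Symbols, other number formats, '.', ..DEF/..UNDEF and error
   reporting are left out. */

static const char *p;  /* cursor into the expression text */

/* Binary operators by level, lowest precedence first. Longer spellings come
   first within a level so that "&&" is not read as "&". */
static const char *const ops[4][8] = {
    {"&&", "||", "<<", ">>", "&", "|", "^", NULL},
    {"<=", ">=", "==", "!=", "<", ">", "=", NULL},
    {"+", "-", NULL},
    {"*", "/", "%", NULL},
};

static void skip_ws(void) { while (*p == ' ' || *p == '\t') p++; }

static long eval_level(int level);

static long eval_unary(void) {
    skip_ws();
    char c = *p;
    if (c != '\0' && strchr("+-~!<>", c)) {
        p++;
        long v = eval_unary();             /* unary binds tighter than binary */
        switch (c) {
        case '-': return -v;
        case '~': return ~v;
        case '!': return !v;
        case '<': return v & 0xFF;         /* low byte */
        case '>': return (v >> 8) & 0xFF;  /* high byte */
        default:  return v;                /* unary + */
        }
    }
    if (c == '(' || c == '[') {
        p++;
        long v = eval_level(0);
        skip_ws();
        if (*p == ')' || *p == ']') p++;
        return v;
    }
    char *end;
    long v = strtol(p, &end, 10);
    p = end;
    return v;
}

/* If an operator of this level is at p, consume it and return its index. */
static int match_op(int level) {
    skip_ws();
    for (int i = 0; ops[level][i] != NULL; i++) {
        const char *op = ops[level][i];
        size_t n = strlen(op);
        if (strncmp(p, op, n) != 0) continue;
        if (n == 1 && (*op == '<' || *op == '>') && p[1] == *op)
            continue;                      /* "<<" or ">>" is a shift, not a compare */
        p += n;
        return i;
    }
    return -1;
}

static long apply(const char *op, long x, long y) {
    if (!strcmp(op, "&&")) return x && y;
    if (!strcmp(op, "||")) return x || y;
    if (!strcmp(op, "<<")) return x << y;
    if (!strcmp(op, ">>")) return x >> y;
    if (!strcmp(op, "&"))  return x & y;
    if (!strcmp(op, "|"))  return x | y;
    if (!strcmp(op, "^"))  return x ^ y;
    if (!strcmp(op, "<=")) return x <= y;
    if (!strcmp(op, ">=")) return x >= y;
    if (!strcmp(op, "!=")) return x != y;
    if (!strcmp(op, "<"))  return x < y;
    if (!strcmp(op, ">"))  return x > y;
    if (*op == '=')        return x == y;  /* "=" and "==" are the same */
    if (!strcmp(op, "+"))  return x + y;
    if (!strcmp(op, "-"))  return x - y;
    if (!strcmp(op, "*"))  return x * y;
    if (y == 0)            return 0;       /* "/" or "%" by zero: error omitted */
    return *op == '/' ? x / y : x % y;
}

static long eval_level(int level) {
    if (level > 3) return eval_unary();
    long x = eval_level(level + 1);
    int i;
    while ((i = match_op(level)) >= 0)     /* left to right within a level */
        x = apply(ops[level][i], x, eval_level(level + 1));
    return x;
}

long eval_expr(const char *text) {
    p = text;
    return eval_level(0);
}
```

With this table, eval_expr("6 & 3 = 2") is 0 while eval_expr("(6 & 3) = 2") is 1, and eval_expr("1 << 4 < 17") is 2 because the comparison is done before the shift.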
With the 6809 assembler, a leading "<" or ">" often refers to an addressing
mode. If you really want to use the low-byte or high-byte operator, surround
the whole thing with parentheses, like "(<LABEL)". This does not apply to
immediate mode, so "LDA #<LABEL" will use the low byte of LABEL.
..DEF and ..UNDEF do not work with local labels (the ones that start with
'@' or '.').
LABELS AND COMMENTS
Labels must consist of alphanumeric characters or underscores, and must
not begin with a digit. Examples are "FOO", "_BAR", and "BAZ_99". Labels
are limited to 255 characters. Labels may also contain '$' characters, but
must not start with one.
Labels must begin in the first column of the source file when they are
declared, and may optionally have a ":" following them. Opcodes with no
label must have at least one blank character before them.
Local labels are defined starting with "@" or ".". This glues whatever
is after the "@" or "." to the last non-temporary code label defined so far,
making a unique label. Examples: "@1", "@99", ".TEMP", and "@LOOP". These
can be referenced by this short form until the next non-local label. They
appear in the symbol table with a long form of "LABEL@1" or "LABEL.1", but
cannot be referenced by this full name. Local labels starting with a "."
can also be made subroutine-local by using the SUBROUTINE pseudo-op.
Comments may either be started with a "*" as the first non-blank character
of a line, or with a ";" in the middle of the line.
Lines after the END pseudo-op are ignored as though they were comments,
except for LIST and OPT lines.
These are all the opcodes that have nothing to do with the instruction
set of the CPU. All pseudo-ops can be preceded with a "." (example:
".BYTE" works the same as "BYTE").
All of the data pseudo-ops like DB, DW, and DS have a limit of 1023
bytes of initialized data. (This can be changed in asmx.h if
you really need it bigger.)
.6502 / .68HC11 / etc.
The CPU type can be specified this way in addition to the CPU and
PROCESSOR pseudo-ops.
Creates a text string preceded by a single byte indicating the length
of the string. This is equivalent to a Pascal-style string.
Generates an error if expr is false (equals zero).
ALIGN
This ensures that the next instruction or data will be located on a
power-of-two boundary. The parameter must be a power of two (2, 4,
8, 16, etc.).
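The address ALIGN moves to can be computed with the usual power-of-two rounding trick. This sketch only computes the new location (align_up is a hypothetical name); it says nothing about what, if anything, asmx emits for the skipped bytes.

```c
#include <assert.h>

/* Round loc up to the next multiple of n, where n must be a power of two.
   For example, align_up(0x1001, 16) is 0x1010. */
unsigned long align_up(unsigned long loc, unsigned long n) {
    assert(n != 0 && (n & (n - 1)) == 0);   /* ALIGN requires a power of two */
    return (loc + n - 1) & ~(n - 1);
}
```

An address that is already aligned is left unchanged, which is why "ALIGN 2" on an even address does nothing.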
CPU
This is an alias for PROCESSOR.
DB / BYTE / DC.B / FCB
Defines one or more constant bytes in the code. You can use as many
comma-separated values as you like. Strings use either single or
double quotes. Doubled quotes inside a string assemble to a quote
character. The backslash ("\") can escape a quote, or it can
represent a tab ("\t"), linefeed ("\n"), or carriage return ("\r")
character. Hex escapes ("\xFF") are also supported.
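The string rules above (doubled quotes, backslash escapes) can be sketched as a small scanner. This is a hypothetical illustration, not asmx's parser: parse_db_string is an invented name, and accepting at most two digits after "\x" is an assumption.

```c
/* Value of one hex digit, or -1. */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sketch: s points at an opening ' or " quote. The string's bytes are stored
   in out; returns the byte count, or -1 if the closing quote is missing. */
int parse_db_string(const char *s, unsigned char *out) {
    char q = *s++;
    int n = 0;
    for (;;) {
        char c = *s++;
        if (c == '\0')
            return -1;                          /* unterminated string */
        if (c == q) {
            if (*s != q)
                return n;                       /* closing quote */
            s++;                                /* doubled quote: one quote char */
            out[n++] = (unsigned char)q;
        } else if (c == '\\' && *s != '\0') {
            char e = *s++;
            if (e == 't')      out[n++] = '\t';
            else if (e == 'n') out[n++] = '\n';
            else if (e == 'r') out[n++] = '\r';
            else if (e == 'x') {                /* up to two hex digits (assumed) */
                int v = 0, d, digits = 0;
                while (digits < 2 && (d = hexval(*s)) >= 0) {
                    v = v * 16 + d;
                    s++;
                    digits++;
                }
                out[n++] = (unsigned char)v;
            } else {
                out[n++] = (unsigned char)e;    /* \' \" \\ and anything else */
            }
        } else {
            out[n++] = (unsigned char)c;
        }
    }
}
```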
DW / WORD / DC.W / FDB
Defines one or more constant 16-bit words in the code, using the
native endianness of the CPU. With the 6502, Z-80, and 8080, the
low byte comes first; with the 6809, the high byte comes first.
Quoted text strings are padded to a multiple of two bytes. The
data is not aligned to a 2-byte address.
DL / LONG / DC.L
Defines one or more constant 32-bit words in the code, using the
native endianness of the CPU. With the 6502, Z-80, and 8080, the least
significant byte comes first; with the 6809, the most significant byte comes first.
Quoted text strings are padded to a multiple of four bytes. The
data is not aligned to a 4-byte address.
DRW
Define Reverse Word - just like DW, except the bytes are reversed
from the current endian setting.
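The byte orders used by DW, DL and DRW can be sketched with one helper. put_value is a hypothetical name; modeling DRW as the same call with the endian flag inverted follows the description above, not asmx's source.

```c
#include <stdint.h>

/* Store the low nbytes bytes of value (2 for DW, 4 for DL) in out, in
   big-endian order (6809) or little-endian order (6502, Z-80, 8080).
   DRW would call this with big_endian inverted. */
void put_value(uint8_t *out, uint32_t value, int nbytes, int big_endian) {
    for (int i = 0; i < nbytes; i++) {
        int shift = big_endian ? 8 * (nbytes - 1 - i) : 8 * i;
        out[i] = (uint8_t)(value >> shift);
    }
}
```

On a little-endian CPU, "DW $1234" stores 34 12 and "DRW $1234" stores 12 34.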
DS / RMB / BLKB
Skips a number of bytes, optionally initialized.
        DS    5       ; skip 5 bytes (generates no object code)
        DS    6,"*"   ; assemble 6 asterisks
Note that no forward-reference values are allowed for the length
because this would cause phase errors.
ERROR
This prints a custom error message.
This is an alias for ALIGN 2.
FCC
Motorola's equivalent to DB with a string. Each string starts and
ends with the same character. The start/end character must not be
alphanumeric or an underscore.
        FCC   /TEXT/      ; 4 bytes "TEXT"
        FCC   \TEXT\      ; 4 bytes "TEXT"
In this assembler, FCC is extended by allowing it to work like DB
afterward, only with a different quote character. Also, the
string delimiter can be repeated twice inside the string to
include the delimiter in the string.
        FCC   /TEXT//TEXT/          ; 9 bytes "TEXT/TEXT"
        FCC   /TEXT/,0              ; 5 bytes "TEXT" followed by a null
        FCC   /TEXT/,0,/TEXT/       ; 9 bytes "TEXT", null, "TEXT"
There is also a second mode where the length is specified, the text has
no quotes, and the text is padded to the specified length with blanks.
Be aware that if the text is too short, it will copy more data from your
source line, even if you have a comment in the line! However, it will
stop copying when it encounters a tab character.
        FCC   9,TEXT          <- this is 9 bytes "TEXT     "
        FCC   9,TEXT;comm     <- this is 9 bytes "TEXT;comm"
        FCC   9,TEXT;comment  <- this is 9 bytes "TEXT;comm", then an error from "ent"
END
This marks the end of code. After the END statement, all input is
ignored except for LIST and OPT lines.
EQU / = / SET / :=
Sets a label to a value. The difference between EQU and SET is that
a SET label is allowed to be redefined later in the source code.
EQU and '=' are equivalent, and SET and ':=' are equivalent.
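The EQU/SET distinction can be sketched as a redefinition check in a symbol table. Everything here is hypothetical (a tiny fixed-size table, invented names); treating a mix of EQU and SET on the same name as an error is an assumption, and only the "keep the first value and flag it" behavior comes from the M flag described in the symbol table dump section.

```c
#include <string.h>

/* Hypothetical sketch of the EQU / SET redefinition rules, using a tiny
   fixed-size table. Not asmx's actual symbol table. */
#define MAX_SYMS 64

typedef struct {
    char name[32];
    long value;
    int  is_set;     /* 1 = defined with SET or :=, 0 = EQU or = */
    int  multiply;   /* like the "M" flag: redefined with a different value */
} Sym;

static Sym table[MAX_SYMS];
static int nsyms;

Sym *find_symbol(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Returns 0 on success, -1 on an illegal redefinition (or a full table). */
int define_symbol(const char *name, long value, int is_set) {
    Sym *s = find_symbol(name);
    if (s != NULL) {
        if (s->is_set && is_set) {   /* SET may be redefined freely */
            s->value = value;
            return 0;
        }
        if (s->value == value)       /* same value again: not an error */
            return 0;
        s->multiply = 1;             /* keep the first value, flag the error */
        return -1;
    }
    if (nsyms == MAX_SYMS)
        return -1;
    s = &table[nsyms++];
    strncpy(s->name, name, sizeof s->name - 1);  /* table is zeroed, so it stays terminated */
    s->value = value;
    s->is_set = is_set;
    return 0;
}
```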
HEX
Defines raw hexadecimal data. Individual hex bytes may be separated by
spaces.
        HEX   123456        ; assembles to hex bytes 12, 34, and 56
        HEX   78 9ABC DE    ; assembles to hex bytes 78, 9A, BC and DE
        HEX   1 2 3 4       ; Error: hexadecimal digits must be in pairs
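The pairing rule in these examples can be sketched as follows. parse_hex_operand and nibble are hypothetical names, and the operand is assumed to have had any ";" comment stripped already.

```c
#include <ctype.h>

/* Value of a character already known to be a hex digit. */
static int nibble(char c) {
    return isdigit((unsigned char)c) ? c - '0'
                                     : toupper((unsigned char)c) - 'A' + 10;
}

/* Sketch of HEX operand parsing: digits must come in pairs, and spaces or
   tabs may separate pairs. Returns the byte count, or -1 if a digit is not
   part of a pair (or a non-hex character appears). */
int parse_hex_operand(const char *s, unsigned char *out) {
    int n = 0;
    while (*s != '\0') {
        if (*s == ' ' || *s == '\t') { s++; continue; }
        if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
            return -1;
        out[n++] = (unsigned char)(nibble(s[0]) * 16 + nibble(s[1]));
        s += 2;
    }
    return n;
}
```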
IF expr / ELSE / ELSIF expr / ENDIF
Conditional assembly. Depending on the value in the IF statement,
code between it and the next ELSE / ELSIF / ENDIF, and code between
an ELSE and an ENDIF, may or may not be assembled.
ELSIF is the same as "ELSE" followed by "IF", only without the need
for an extra ENDIF.
        IF    ..UNDEF mode
        ERROR mode not defined!
        ELSIF mode = 1
        * (code for mode 1)
        ELSIF mode = 2
        * (code for mode 2)
        ELSE
        ERROR Invalid value of mode!
        ENDIF
IF statements inside a macro only work inside that macro. When
a macro is defined, IF statements are checked for matching ENDIF
statements.
This inserts the contents of the named binary file into the object
code output. The size of the binary file is shown in the listing.
INCLUDE
This starts reading source code from the named file. The file is
read once in each pass. INCLUDE files can be nested to a maximum
of 10 levels. (This can be changed in asmx.c if you really need
it bigger.)
LIST / OPT
These set assembler options. Currently, the options are:
|LIST ON / OPT LIST ||Turn on listing
|LIST OFF / OPT NOLIST ||Turn off listing
|LIST MACRO / OPT MACRO ||Turn on macro expansion in listing
|LIST NOMACRO / OPT NOMACRO ||Turn off macro expansion in listing
|LIST EXPAND / OPT EXPAND ||Turn on data expansion in listing
|LIST NOEXPAND / OPT NOEXPAND ||Turn off data expansion in listing
|LIST SYM / OPT SYM ||Turn on symbol table in listing
|LIST NOSYM / OPT NOSYM ||Turn off symbol table in listing
|LIST TEMP / OPT TEMP ||Turn on temp symbols in symbol table listing
|LIST NOTEMP / OPT NOTEMP ||Turn off temp symbols in symbol table listing
The default is listing on, macro expansion off, data expansion on,
symbol table on.
MACRO / ENDM
Defines a macro. The macro is expanded whenever its name is
used as an opcode. Parameters are named on the MACRO line; when
the macro is invoked, each parameter name inside the macro is
replaced by the corresponding argument.
Macro calls can be nested to a maximum of 10 levels. (This can
be changed in asmx.c if you really need it bigger.)
TWOBYTES MACRO parm1, parm2    ; start recording the macro
         DB    parm1, parm2
         ENDM                  ; stop recording the macro
         TWOBYTES 1, 2         ; use the macro - expands to "DB 1, 2"
An alternate form with the macro name after MACRO, instead of as
a label, is also accepted. A comma after the macro name is optional.
        MACRO plusfive parm
When a macro is invoked with insufficient parameters, the remaining
parameters are replaced with a null string. It is an error to invoke
a macro with too many parameters.
Macro parameters can be inserted without surrounding whitespace
by using the '##' concatenation operator.
TEST    MACRO labl
labl ## 1 DB 1
labl ## 2 DB 2
        ENDM
        TEST  HERE    ; labl ## 1 gets replaced with "HERE1"
                      ; labl ## 2 gets replaced with "HERE2"
Macro parameters can also be inserted by using the backslash ("\")
character. This method also includes a way to access the actual
number of macro parameters supplied, and a unique identifier for
creating temporary labels.
\0 = number of macro parameters
\1..\9 = nth macro parameter
\? = unique ID per macro invocation (padded with leading zeros to five digits)
NOTE: The line with the ENDM may have a label, and that will be included in the
macro definition. However, if you include a backslash escape before the ENDM, the
ENDM will not be recognized, and the macro definition will not end. Be careful!
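The backslash escapes can be sketched as a single pass over one macro body line. This is a hypothetical illustration: expand_line and append are invented names, the invocation ID is passed in by the caller, and named parameters and "##" are not handled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append s to out (capacity outsz), keeping out NUL-terminated. */
static void append(char *out, size_t outsz, size_t *n, const char *s) {
    while (*s != '\0' && *n + 1 < outsz)
        out[(*n)++] = *s++;
    out[*n] = '\0';
}

/* Hypothetical sketch of backslash substitution in one macro body line:
   \0 -> number of arguments, \1..\9 -> that argument (empty if missing),
   \? -> unique invocation ID padded to five digits. outsz must be >= 1. */
void expand_line(const char *line, const char *const *args, int nargs,
                 int id, char *out, size_t outsz) {
    size_t n = 0;
    char tmp[16];
    out[0] = '\0';
    while (*line != '\0') {
        if (line[0] == '\\' && line[1] == '0') {
            snprintf(tmp, sizeof tmp, "%d", nargs);
            append(out, outsz, &n, tmp);
            line += 2;
        } else if (line[0] == '\\' && line[1] >= '1' && line[1] <= '9') {
            int k = line[1] - '1';
            append(out, outsz, &n, k < nargs ? args[k] : "");
            line += 2;
        } else if (line[0] == '\\' && line[1] == '?') {
            snprintf(tmp, sizeof tmp, "%05d", id);
            append(out, outsz, &n, tmp);
            line += 2;
        } else {
            char one[2] = { *line++, '\0' };
            append(out, outsz, &n, one);
        }
    }
}
```

With arguments "HERE" and "5" and invocation ID 42, the body line "L\?: DB \2, \0" becomes "L00042: DB 5, 2".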
ORG
Sets the origin address of the following code. This defaults to
zero at the start of each assembler pass.
PROCESSOR
This selects a specific CPU type to assemble code for. Some assemblers
support multiple CPU sub-types. Currently supported CPU types are:
|NONE ||No CPU type selected
|1802 ||RCA 1802
|6502 ||MOS Technology 6502
|6502U ||MOS Technology 6502 with undocumented instructions
|65C02 ||Rockwell 65C02
|6809 ||Motorola 6809
|6800 6802 ||Motorola 6800
|6801 6803 ||Motorola 6801
|6805 ||Motorola 6805
|6303 ||Hitachi 6303 (6800 variant)
|68HC11 ||Motorola 68HC11 variants
|68HC16 ||Motorola 68HC16
|68HSC08 ||Motorola 68HSC08 (6805 variant)
|68K 68000 ||Motorola 68000
|68010 ||Motorola 68010
|8051 ||Intel 8051 variants
|8080 ||Intel 8080
|8085 ||Intel 8085
|8085U ||Intel 8085 with undocumented instructions
|ARM ||ARM (32-bit little-endian)
|ARM_BE ||ARM big-endian
|ARM_LE ||ARM little-endian
|THUMB ||ARM Thumb (16-bit little-endian)
|THUMB_BE ||ARM Thumb big-endian
|THUMB_LE ||ARM Thumb little-endian
|F8 ||Fairchild F8
|TOM ||Atari Jaguar GPU
|JERRY ||Atari Jaguar DSP
|Z80 ||Zilog Z-80
|GBZ80 ||Gameboy Z-80 variant
At the start of each pass, this defaults to the assembler specified
in the "-C" command line option, if any, or the assembler type determined
from the name of the executable used on the command line. The latter is
useful with soft-links when using Unix-type systems. In that case, the
default assembler name can be determined by looking at the end of the
executable name used to invoke asmx, then selecting that CPU type.
If no default assembler is specified, the DW/WORD and DL/LONG pseudo-ops
will generate errors because they do not know which endian order to use.
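The soft-link trick can be sketched as a suffix match on the invoked program name. The cpu_names table is a subset of the PROCESSOR list above, and cpu_from_argv0 is hypothetical; preferring the longest matching suffix (so a link named "asmgbz80" selects GBZ80 rather than Z80) is a design choice of this sketch, not necessarily what asmx does.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: pick the default CPU from the name asmx was invoked
   as, e.g. a soft link named "asm6809". Subset of the PROCESSOR names. */
static const char *const cpu_names[] = {
    "1802", "6502", "6502U", "65C02", "6809", "68HC11", "8080", "8085",
    "Z80", "GBZ80", "ARM", "ARM_BE", "THUMB", NULL
};

static int ends_with_nocase(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    if (lx > ls) return 0;
    for (size_t i = 0; i < lx; i++)
        if (toupper((unsigned char)s[ls - lx + i]) != toupper((unsigned char)suffix[i]))
            return 0;
    return 1;
}

/* Returns the matching CPU name, or NULL if there is no default CPU. */
const char *cpu_from_argv0(const char *argv0) {
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;          /* strip any directory part */
    const char *best = NULL;
    for (int i = 0; cpu_names[i] != NULL; i++)
        if (ends_with_nocase(base, cpu_names[i]) &&
            (best == NULL || strlen(cpu_names[i]) > strlen(best)))
            best = cpu_names[i];             /* longest suffix wins */
    return best;
}
```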
Opcodes for the selected processor will have priority over generic
pseudo-ops. However, assemblers for CPUs which have a "SET" opcode have
been specifically designed to pass control to the generic "SET" pseudo-op.
REND
Ends an RORG block. A label in front of REND receives the relocated
address of the last relocated byte in the RORG / REND block, plus 1.
RORG
Sets the relocated origin address of the following code. Code
in the object file still goes to the same addresses that follow
the previous ORG, but labels and branches are handled as though
the code were being assembled starting at the RORG address.
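RORG and REND amount to keeping two location counters: one for where bytes go in the object file, and one for the values labels receive. The Loc type and its functions are a hypothetical sketch; how asmx handles an ORG inside an RORG block is not described here, and this sketch simply resets both counters.

```c
/* Hypothetical sketch of ORG / RORG / REND bookkeeping. */
typedef struct {
    unsigned long phys;     /* where bytes go in the object file */
    unsigned long logical;  /* address used for labels and branches */
    int in_rorg;
} Loc;

void do_org(Loc *l, unsigned long addr)  { l->phys = l->logical = addr; l->in_rorg = 0; }
void do_rorg(Loc *l, unsigned long addr) { l->logical = addr; l->in_rorg = 1; }
void emit(Loc *l, unsigned long nbytes)  { l->phys += nbytes; l->logical += nbytes; }

/* Leave the RORG block; returns the value a label on the REND line gets. */
unsigned long do_rend(Loc *l) {
    unsigned long label_value = l->logical;  /* last relocated byte + 1 */
    l->logical = l->phys;
    l->in_rorg = 0;
    return label_value;
}
```

After ORG $8000, three bytes, RORG $0200 and five more bytes, the code sits at $8003-$8007 in the object file, while a label on the REND line gets $0205.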
SEG / RSEG / SEG.U segment
Switches to a new code segment. Code segments are simply different
sections of code which get assembled to different addresses. They
remember their last location when you switch back to them. If no
segment name is specified, the null segment is used.
At the start of each assembler pass, all segment pointers are reset
to zero, and the null segment becomes the current segment.
SEG.U is for DASM compatibility. DASM uses SEG.U to indicate an
"uninitialized" segment. This is necessary because its DS pseudo-op
always generates data even when none is specified. Since the DS
pseudo-op in this assembler normally doesn't generate any data,
uninitialized segments aren't supported as such.
RSEG is for compatibility with vintage Atari 7800 source code.
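A sketch of segments that remember their own location counters. The Segment table and seg_switch/segs_reset are hypothetical names; SEG.U and RSEG are not modeled, and the location is set directly here instead of through ORG.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: every segment keeps its own location counter, so
   switching back to a segment resumes where it left off. */
#define MAX_SEGS 16

typedef struct {
    char name[32];
    unsigned long loc;   /* this segment's current location */
} Segment;

static Segment segs[MAX_SEGS];
static int nsegs;
static Segment *cur;     /* current segment */

/* Switch to the named segment ("" = the null segment), creating it at
   location zero if needed. Returns NULL if the table is full. */
Segment *seg_switch(const char *name) {
    for (int i = 0; i < nsegs; i++)
        if (strcmp(segs[i].name, name) == 0)
            return cur = &segs[i];
    if (nsegs == MAX_SEGS)
        return NULL;
    Segment *s = &segs[nsegs++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->loc = 0;
    return cur = s;
}

/* Start of a pass: forget every segment's location and select the null one. */
void segs_reset(void) {
    nsegs = 0;
    seg_switch("");
}
```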
SUBROUTINE / SUBR name
This sets the scope for temporary labels beginning with a dot.
At the start of each pass, and when this pseudo-op is used with
no name specified, temporary labels beginning with a dot use the
previous non-temporary label, just like temporary labels
beginning with an '@'.
START   NOP
.LABEL  NOP             ; this becomes "START.LABEL"
        SUBROUTINE FOO
.LABEL  NOP             ; this becomes "FOO.LABEL"
        SUBROUTINE BAR
.LABEL  NOP             ; this becomes "BAR.LABEL"
        SUBROUTINE
LABEL   NOP
.LABEL  NOP             ; this becomes "LABEL.LABEL"
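How the long local-label names are formed can be sketched as a prefix lookup: '@' labels always use the last non-local label, and '.' labels use the SUBROUTINE name when one is set. The Scope type and expand_local are hypothetical names for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of local-label expansion. */
typedef struct {
    char last_label[256];   /* most recent non-local label */
    char subr[256];         /* name set by SUBROUTINE, or "" if none */
} Scope;

/* Write the full symbol-table name for a label as written in the source. */
void expand_local(const Scope *s, const char *name, char *out, size_t outsz) {
    if (name[0] == '@')
        snprintf(out, outsz, "%s%s", s->last_label, name);
    else if (name[0] == '.')
        snprintf(out, outsz, "%s%s", s->subr[0] ? s->subr : s->last_label, name);
    else
        snprintf(out, outsz, "%s", name);   /* ordinary label: unchanged */
}
```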
WORDSIZE
Specifies the CPU's word size in bits. This is for CPUs which do not
support byte addressing. If the word size is zero, the native CPU
word size is used. Currently only the Jaguar DSP/GPU uses a word
size that is not equal to 8.
This is primarily intended for using DS pseudo-ops to create data
structure offsets, using WORDSIZE 8.
ZSCII
Creates a compressed text string in the version 1 Infocom format.
Otherwise, this works exactly like the DB pseudo-op. Note that this
will always generate a multiple of two bytes of data.
WARNING: using a forward-referenced value could cause phase errors!
See http://www.wolldingwacht.de/if/z-spec.html for more information
on the compressed text format.
There is also one CPU-specific pseudo-op:
With the 6809 assembler, this sets the current value of the
direct page register, for determining whether to use direct or
extended mode. It defaults to zero at the start of each assembler pass.
SYMBOL TABLE DUMP
The symbol table is dumped at the end of the listing file. Each
symbol shows its name, value, and flags. The flags are:
|U||Undefined||this symbol was referenced but never defined
|M||Multiply defined||this symbol was defined more than once with different values (only the first is kept)
|S||SET||this symbol was defined with the SET pseudo-op, or from the -dLABEL:=VALUE command line option
|E||EQU||this symbol was defined with the EQU pseudo-op, or from the -dLABEL=VALUE command line option
Version 1.1 changes (April 1995)
(this version was on the original Starpath "Stella Gets a New Brain" CD)
Version 1.2 changes (September 1996)
- Added macro parameters
- Added FCB, FDB, RMB, BYTE, WORD, and BLKB pseudo-ops
- Added binary constants (xxxxxxxxB and %xxxxxxxx)
- Added ASLA, ROLA, LSRA, and RORA opcodes
- Added '$' by itself as the current location, in addition to '.' and '*'
- DB and DW opcodes now work properly with multiple operands
- Added &, |, <<, and >> to expression evaluator. Each of these operators
has the same precedence, below + and -.
- Default file extension was changed from .A65 to .ASM
- Allowed command line options to be specified in lowercase
- Added a new RORG "relative ORG" pseudo-op to allow assembling code that
will later be moved to a different address
Version 1.3 changes (December 1996)
- No more than one macro could be defined, because the macro list wasn't
being linked properly.
- Errorlevel 1 is now returned if any errors occurred during assembly.
Version 1.4 changes (February 2002)
- Added FCC pseudo-op
- Lines beginning with "*" are now comments
- Jump indirect wouldn't work correctly if the line had a comment
- No error was generated if too many macro parameters were used
- Maximum macro parameters increased from 5 to 10
- Lines from macro expansions with errors are now displayed
- Symbol table is now displayed with more than one column
- Lines that generate more than 3 bytes are flagged with a "+"
Version 1.5 changes (2004-02-24)
- Converted from Pascal to C.
- Added 65C02 opcodes. They can be disabled at compile time with a #define.
- Improved output with more than 3 bytes of object code from a line.
- Now outputs code in Intel Hex format. Old object format still available
by changing a #define.
Version 1.6 changes (2004-04-30)
- Separated common code so that my Z-80 assembler could use the new features
like macros and INCLUDE.
- Fixed a bug that would cause phase errors with one-character label names.
- Fixed a bug that would prevent control pseudo-ops from working after END
- Added DC.B, DC.W, .BYTE, and .WORD pseudo-ops for compatibility with source
code written for various other assemblers.
- DB now handles double quotes, \r \n \t \' and \". A bug in the handling of
paired quotes ("foo""bar") was fixed.
- new DRW pseudo-op to do a DW in reverse order (such as for Atari 7800 Maria
chip display lists)
- added INCLUDE pseudo-op. Maximum nesting level is set to 10 by a #define.
- added PROCESSOR 6502/PROCESSOR 65C02 pseudo-op to select 65C02 instructions,
and for improved DASM compatibility. The default is 6502 instructions only.
- increased the size of bytStr from 256 to 1024 bytes. This will allow more
data for long-data pseudo-ops like DB, etc.
- added support for DS pseudo-op with an initializer. This is currently
limited to a max of 1024 bytes. (DB, DW, and FCC also have this limit, but
do not yet have a bounds check for it.)
- added HEX pseudo-op. This is also currently limited to a max of 1024 bytes.
- [ and ] can now be used as parentheses in expressions, for better DASM source
compatibility.
- REND pseudo-op added for better DASM source compatibility. Note that a label
on the REND line will receive the relocated value of the last relocated code
byte + 1.
Version 1.7 changes (2004-08-25)
- Added a 6809 assembler back-end.
- Added an 8085 assembler back-end. Couldn't just add 8080/8085 style opcodes
to the Z-80 assembler because the JP opcode with no condition in the Z-80
style mnemonics conflicts with JP in the 8080/8085 style mnemonics.
Irony: this assembler originally started out a long time ago as an 8080
assembler, and now it is again! I did this new 8080 code from scratch,
rather than porting the old Turbo Pascal code to C.
- Added a 68HC11 assembler back-end, with support for 6800, 6801, and 6303
instruction set variations.
- Added support for undocumented 6502 instructions, enabled with "PROCESSOR
6502U". Only the most reliable and useful instructions are implemented:
3-cycle NOP (mnemonic NOP3), LAX, DCP, ISB, RLA, RRA, SAX, SLO, SRE, ANC,
ARR, ASR, and SBX.
- Added pseudo-ops to 6502 assembler to select CPU type without PROCESSOR
pseudo-op: .6502, .6502U, .65C02
- Added conditional assembly using IF <expr> / ELSE / ELSIF <expr> / ENDIF.
Maximum official nesting level is 255, but beyond that it will still work if
you don't try weird stuff like multiple ELSE statements in a row. <expr> is
an expression that is false if it is equal to 0 and true if it is not equal
to 0.
- Added new operators to the expression evaluator, because IF needed them:
..def symbol (returns 1 if symbol is defined)
..undef symbol (returns 1 if symbol is not defined)
&&, || (boolean and/or, returns either 1 or 0)
< <= > >= = == != (returns either 1 or 0; = and == are the same)
Beware the operator precedence when using these!
- Added new H() and L() operators for compatibility with vintage Atari 7800
source code, equivalent to >() and <(). Note that no whitespace is allowed
before the left paren.
- Symbols may now contain the '$' character, but they must not start with it.
This, combined with H() and L(), will allow the original Atari 7800 ROM
source code to compile as-is.
- Single quote operator can now handle 2-byte constants. The first character
becomes the high byte of the constant.
- Added 0xnnnn hexadecimal constants (but not 0nnn octal constants).
- Fixed problems with DB, DW, DS, and FCC pseudo-ops.
- Added bounds checking to DB, DW, and FCC pseudo-ops. They are now limited to
MAX_BYTSTR - 1 bytes (1023).
- Added ERROR pseudo-op.
- Added ALIGN pseudo-op.
- Added SEG/RSEG/SEG.U pseudo-op.
- INCLUDE pseudo-op can now accept file names surrounded by single or double
quotes.
- Handled division by zero.
- Improved handling of form feed characters in source files.
- Fixed some bugs in macro handling, and expanded maximum macro parameters from
10 to 30.
- Temporary labels can now start with '.' in addition to '@'.
- Labels can be defined from the command line with "-d label=value". If value
is not specified, the label is set to zero.
- Added -9 command line option to generate Motorola S9-record code output.
- Added -c command line option to send object code to stdout, to allow piping
- Console output now goes to stderr.
- Error messages now include file name and line number.
- Bytes out of -127..256 range now generate a warning instead of an error.
- Rewrote the documentation.
Version 1.7.1 changes (2004-10-20)
- Added exclusive-or ("^") to the expression evaluator. This has the same
precedence as the existing AND and OR operators.
- DB has been modified to handle the case of arithmetic parameters which
start with a single-quoted character, such as "DB 'X'+$80".
- 6809 conditional long branches were off by one.
- Changed 68HC11 assembler file name to all lowercase.
- Found out that asm68hc11.c wasn't being included in the zip archive anyhow.
- Added an F8 assembler back-end. Since I have no real code to test this on,
it should be considered experimental.
Version 1.7.2 changes (2005-08-21)
- Added the SUBROUTINE pseudo-op.
- Added BLKB pseudo-op as an alias to DS.
- Fixed a bug that would cause errors if the line after an ENDM
was not a blank line.
- F8 assembler relative branches were off by one.
- Other minor changes to F8 assembler, including allowing expressions for
register numbers in more places, and 'A' and 'B' as hex register numbers
where appropriate.
- Tweaked symbol table output to show up to 19 characters of symbol names,
and to fit in 80 columns.
- ENDIF lines and the guts of MACRO declarations did not respect the
LIST OFF setting.
- Macro parameters can now be inserted without surrounding whitespace using
the '##' concatenation operator, similar to how it works in C macros.
- 6502 assembler can now force absolute or absolute-indexed addressing mode
(instead of zero-page and zero-page-indexed) by using '>' in front of the
address, similar to how the 6809 does it.
- When not using the -w option, warning lines would still print to the
screen.
- Z-80 assembler can now take parameters after a RST instruction, which are
interpreted as DB bytes. Example: RST 08H,'('
- Fixed a bug that could cause phase errors if a forward-referenced label was
used with DW.
- started work on REP pseudo-op (code still under construction)
- Added an 1802 assembler back-end.
- Added opcode table of common pseudo-ops to asmguts.h, and removed "DoStdOpcode"
pass-thru in order to keep more generic stuff out of the CPU-specific .c files.
- CPU_BIG_ENDIAN/CPU_LITTLE_ENDIAN #define now automatically controls
DW/DRW opcodes and Instr3W/Instr4W/Instr5W calls.
- Cleanups for GCC 4 stricter type checking: string signedness with Str255 type
and forward declaration of OpcdTab.
- Short branch range check was allowing branches too far forward.
- Z-80 assembler couldn't do "CP (a+b)/256" (or SUB, AND, OR, XOR) because it was
expecting "CP (HL)" etc. Now it falls back and tries to evaluate the parens as
an immediate operand.
- Added OPT NOEXPAND to disable expanding hex output for more than one listing line.
- All standard pseudo-ops can now optionally start with a period, such as .DB and .DW.
- Standard pseudo-ops can now start with a period in column 1.
- Symbol table output handles long symbols better by stretching really long symbol
names into multiple columns.
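The Z-80 fix above comes down to deciding whether a leading '(' wraps the entire operand, as in "(HL)", or only part of an expression, as in "(a+b)/256". Here is a minimal sketch of that test in C. `parens_wrap_operand` is a hypothetical name, not a function from asmx, and the sketch does not skip over quoted character constants such as '('.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true only if the operand is one parenthesized group, such as
   "(HL)" or "(IX+5)". For "(a+b)/256" or "(a)+(b)" it returns false, so
   the caller can evaluate the operand as an immediate expression. */
bool parens_wrap_operand(const char *s)
{
    size_t len = strlen(s);
    int depth = 0;

    /* Ignore trailing blanks left over from operand splitting. */
    while (len > 0 && s[len - 1] == ' ')
        len--;
    if (len < 2 || s[0] != '(' || s[len - 1] != ')')
        return false;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == '(')
            depth++;
        else if (s[i] == ')')
            depth--;
        /* The first '(' closed before the last character, so the
           parens cover only part of the expression. */
        if (depth == 0 && i < len - 1)
            return false;
    }
    return depth == 0;
}
```

A caller would try the register-indirect forms only when this returns true, and otherwise hand the whole operand to the expression evaluator.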
Version 1.7.3 changes (2006-01-23)
Version 1.7.4 changes (2006-11-09)
- Updated Z-80 assembler with improved parsing techniques from the 8051 assembler.
- In Z-80 assembler, "LD BC,(foo - 1) * 256" wouldn't assemble properly.
- Added ZSCII pseudo-op to generate Infocom-style compressed text strings
in the version 1 format. (versions 2 and later use a dictionary table
which is impractical to supply or generate)
- Fixed a bug that caused long DB, DW, etc. statements to list most of their
lines in a macro when macro listing was turned off.
- Fixed a bug that could cause a negative result to appear as "FFFF" in the
listing for the EQU and RORG pseudo-ops.
- Added LIST SYM / LIST NOSYM to disable symbol table in listing file.
- Added "FLAG.BIT" format to EQU / SET pseudo-op in 8051 assembler.
- ELSIF pseudo-op would cause a "Too many operands" error if its matching IF
expression evaluated to true.
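For readers unfamiliar with ZSCII packing: each 16-bit word holds three 5-bit Z-characters, and the last word of a string has its top bit set. The sketch below is a simplified illustration, not asmx's ZSCII code. It assumes alphabet A0 only (space is Z-character 0, 'a' through 'z' are 6 through 31), pads the final word with Z-character 5, and rejects anything that would need a shift code. `zscii_pack` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a character to a Z-character in alphabet A0. Returns -1 for
   characters this sketch cannot encode without shift codes. */
static int zchar_a0(char c)
{
    if (c == ' ')
        return 0;
    if (c >= 'a' && c <= 'z')
        return 6 + (c - 'a');
    return -1;
}

/* Pack text three Z-characters per word, padding with Z-character 5
   and setting bit 15 on the last word. Returns the number of words
   written, or 0 on error (empty strings are not handled). */
size_t zscii_pack(const char *text, uint16_t *out, size_t max_words)
{
    size_t n = 0;

    while (*text != '\0') {
        uint16_t word = 0;
        for (int k = 0; k < 3; k++) {
            int z = 5;                          /* padding */
            if (*text != '\0') {
                z = zchar_a0(*text++);
                if (z < 0)
                    return 0;
            }
            word = (uint16_t)((word << 5) | z);
        }
        if (n == max_words)
            return 0;
        out[n++] = word;
    }
    if (n > 0)
        out[n - 1] = (uint16_t)(out[n - 1] | 0x8000u);  /* end marker */
    return n;
}
```

For example, "hi" packs into the single word 0xB5C5: 'h' is 13, 'i' is 14, one padding 5, plus the end bit.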
Version 1.8 changes (2007-01-11)
- made changes to allow for unified assembler
- got rid of instr vs bytStr distinction and INSTR_MAX
- new common error calls: BadMode() MissingOperand()
- InstrFoo() calls renamed to be a bit more descriptive (Instr5W -> InstrBBBW, etc.)
- made endian a variable, not a #define
- removed usage() from individual assemblers
- NUKED HARD TABS in C source code
- updated other assemblers to use FindReg/GetReg from Z80
- listing files now use eight more columns for hex data, and with new InstrFoo() calls,
spaces can now be put between opcodes and operands in hex data
- common assembler code is now mostly 32-bit clean
- added a 68000 assembler. 68010-only instructions can be enabled with a #define, except
BKPT was left enabled because it is still semi-valid for 68000/68008
- 68HC16 referred to PSHM as "PUSHM" in a few places
- 68HC16 was mis-assembling ANDP and ORP as 8-bit immediate instructions
- 8051 was mis-assembling ORL/ANL/XRL dir,#imm instructions
- 8051 was mis-assembling ORL/ANL/XRL dir,A instructions
- added new ASCIIC pseudo-op for counted-length (Pascal-style) strings
- single-quoted character constants can now use the same backslash escapes as
the double-quoted strings in the DB pseudo-op
- added support for "three-tab" listing format with hex data in 24 columns
which puts blanks between parts of instructions
- added support for 32-bit addresses and symbols in listings (requires three-tab)
- symbols can now start with a $ if you set a #define, but this prevents
hexadecimal constants starting with a $ from working
- single-quoted constants are no longer limited to two bytes
- CR-terminated source files would do bad things
- object code output for Intel hex records now supports 32-bit addresses using record
types 04 and 05
- object code output for Motorola S9 records now supports 32-bit addresses using new
"-s##" command line option, where ## is the type of file: 9, 19, 28, or 37. The number
is used for the file name of the object code output file. 9 and 19 are identical except
for the name of the object code output file.
- fixed a really subtle bug that would cause ALIGN/EVEN to use up bytes when they weren't
supposed to (this bug dates back to the first appearance of ALIGN!)
- DS pseudo-op no longer allows forward-declared lengths, which cause phase errors
- DC.W and DC.L now support text strings with null padding alignment after every string literal.
Note that alignment is internal-only, and any alignment error at the start will be preserved.
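Intel hex records carry only a 16-bit address on their own. A type 04 (extended linear address) record supplies the upper 16 bits for the data records that follow, and a type 05 record carries a 32-bit start address. A minimal sketch of formatting a record and its checksum, with hypothetical function names; it builds the type 04 record but not type 05.

```c
#include <stdint.h>
#include <stdio.h>

/* Format one Intel hex record into out (needs 12 + 2*len chars). The
   checksum is the two's complement of the sum of all bytes after the
   colon: byte count, address, record type, and data. */
void ihex_record(char *out, uint8_t type, uint16_t addr,
                 const uint8_t *data, uint8_t len)
{
    unsigned sum = len + (addr >> 8) + (addr & 0xFFu) + type;

    out += sprintf(out, ":%02X%04X%02X", (unsigned)len, (unsigned)addr,
                   (unsigned)type);
    for (unsigned i = 0; i < len; i++) {
        out += sprintf(out, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    sprintf(out, "%02X\n", (0x100u - (sum & 0xFFu)) & 0xFFu);
}

/* Type 04 record: the upper 16 bits of a 32-bit address, big-endian,
   with an address field of 0000. */
void ihex_upper_address(char *out, uint32_t addr)
{
    const uint8_t upper[2] = { (uint8_t)(addr >> 24), (uint8_t)(addr >> 16) };
    ihex_record(out, 0x04, 0x0000, upper, 2);
}
```

For address 0x00012345 this produces ":020000040001F9"; the data records that follow would then use 0x2345 as their 16-bit address.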
Version 2.0a1 changes (2007-01-12)
- Unified all assemblers into one binary. CPU type selection is as follows:
- if the CPU type can be determined from the executable name (either by renaming the
executable or creating a Unix file link to it), that becomes the default CPU type at
the start of each assembler pass
- using the CPU or PROCESSOR pseudo-op selects a new CPU type
- using a "." followed by the CPU type (".68000") also selects a new CPU type
- note that if no CPU has been selected and there is no default, the DW and DL pseudo-ops
will generate an error because they don't know what the current endian setting should be
- symbol table dump is in the widest address width used
- 8080 and 8085 are now broken up into separate CPU types along with 8085U for the
undocumented 8085 instructions
- SUB pseudo-op alias was changed to SUBR because so many CPUs already have a SUB opcode
- added an 8048 assembler, with no specific support for 8041/8021/8022 subsets yet
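One way to pick a default CPU from the executable name is to strip the directory part of argv[0] and look at what follows a fixed prefix. The prefix convention here ("asm" or "asmx-" followed by the CPU name, as in a link named asm6809) and the function name are assumptions for illustration; the real name matching in asmx may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: return the CPU-type part of the program name, such as
   "6809" for "/usr/local/bin/asm6809", or NULL if there is none. */
const char *cpu_from_progname(const char *argv0)
{
    const char *base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;   /* strip directories */

    if (strncmp(base, "asmx", 4) == 0)
        base += 4;                 /* plain "asmx" means no default CPU */
    else if (strncmp(base, "asm", 3) == 0)
        base += 3;
    else
        return NULL;

    if (*base == '-')
        base++;
    return (*base != '\0') ? base : NULL;
}
```

Created with something like `ln -s asmx asm6809`, the link would then start each pass with the 6809 selected, while a CPU or PROCESSOR pseudo-op could still switch to another type.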
Version 2.0a2 changes (2007-01-13)
- Z80 would allow LD r,I or LD r,R with destination registers other than A
- added Gameboy variant to Z80 assembler
- fix from 2.0a1: ".CPUTYPE" in column 1 now works
- Z80 ADD/ADC/SBC can be made to not require the "A," if you #define NICE_ADD
Version 2.0a3 changes (2007-01-14)
- Fixed a bug with 68K "CMP Dn,Dn" that got the registers reversed
- 68K branches are made as short as possible if the destination is a known value
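When a 68K branch target is already known, choosing the shortest form is a range check on the displacement. A sketch assuming plain 68000 encodings: the displacement is measured from the instruction address plus 2, Bcc.S holds an 8-bit displacement in the opcode word, and an 8-bit value of 0 is reserved to mean that a 16-bit displacement word follows. `pick_branch_size` is a hypothetical helper.

```c
#include <stdint.h>

/* Values are instruction lengths in bytes; 0 means out of range. */
enum branch_size { BRANCH_TOO_FAR = 0, BRANCH_SHORT = 2, BRANCH_WORD = 4 };

/* Pick the smallest 68000 Bcc/BRA/BSR form for a known target. */
enum branch_size pick_branch_size(uint32_t instr_addr, uint32_t target)
{
    int32_t disp = (int32_t)(target - (instr_addr + 2));

    /* A short displacement of 0 would be read as "word follows". */
    if (disp >= -128 && disp <= 127 && disp != 0)
        return BRANCH_SHORT;
    if (disp >= -32768 && disp <= 32767)
        return BRANCH_WORD;
    return BRANCH_TOO_FAR;
}
```

Branches to forward references are not known on the first pass, which is why this kind of shrinking only makes sense when the destination is a known value, as the entry above says; otherwise instruction sizes could differ between passes.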
Version 2.0a4 changes (2007-02-02)
- added a 6805 assembler
- fix from 2.0a1: symbol table dump was always 32-bits, now fixed
- fix from 2.0a2: Gameboy Z80 support had CPU type flags moved from opcode table parm
to opcode type instead.
- fixed a problem with ..DEF/..UNDEF which would cause them to return incorrect results
during the first assembler pass, resulting in phase errors if they were used for conditional assembly.
- 68K: EA of "(operand)" or "(operand.W)" or "(operand.L)" did not work
- 68K: EA of "0(An)" or "(0,An)" now assembles as "(An)" if offset contains no forward defs
- 68K: "TST (1)*4(A1)" didn't work but "MOVE.L (A0,D0.W),1*4(A1)" did
- 68K: ADD/CMP/SUB Dn,An now assembles as ADDA/CMPA/SUBA, but EA,An still reports an error
- -d command line option now does a SET rather than an EQU if you use ":=" instead of "="
- added 65C816 support to 6502 assembler
- rearranged multi-CPU handling so that each CPU type could have its own endian and listing
parameters, and its own opcode table, too. This was primarily done because of 65C816.
- added a new ADDR_24 listing parameter for CPUs with 24-bit addresses such as 68000 and 65C816
- Macros could be used before they are defined. In the first pass there would be an error,
but in the second pass code would be generated and phase errors would happen. Now an
error is generated if a macro is used before it is defined.
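The "=" versus ":=" distinction in the -d option can be handled by checking the character just before the '='. A sketch with a hypothetical `parse_define` helper. It requires a value, and it uses strtol with base 0 as a stand-in for asmx's own number syntax, so "0x8000" is read as hex but a leading 0 would also be read as octal.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse of "NAME=VALUE" (EQU) or "NAME:=VALUE" (SET).
   Returns false for a missing name, an oversized name, or a bad value. */
bool parse_define(const char *arg, char *name_out, size_t name_size,
                  long *value, bool *is_set)
{
    const char *eq = strchr(arg, '=');
    size_t name_len;
    char *end;

    if (eq == NULL || eq == arg)
        return false;
    *is_set = (eq[-1] == ':');          /* ":=" requests SET semantics */
    name_len = (size_t)(eq - arg) - (*is_set ? 1 : 0);
    if (name_len == 0 || name_len >= name_size)
        return false;
    memcpy(name_out, arg, name_len);
    name_out[name_len] = '\0';

    *value = strtol(eq + 1, &end, 0);
    return end != eq + 1 && *end == '\0';   /* reject "12z" etc. */
}
```

The caller would then define the symbol with SET when `is_set` is true, so a later SET in the source can change it, and with EQU otherwise.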
Version 2.0b1 changes (2007-02-05)
- 68K: "EOR Dn,Dn" did not assemble properly
- ADDR_24 list mode was one character too wide for DB data
- Added Atari Jaguar GPU/DSP support.
- ROLQ instruction equivalent to RORQ 32-n,Rn
- pre-defined JR/JUMP condition codes
Caveat: labels are byte addresses, so if you use MOVE PC,Rn and compare the address,
or if you make a table of addresses, you need to divide by two. This needs some way
to inherently divide Tom/Jerry code addresses by 2. I'll probably figure something
out before the final 2.0 version.
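The ROLQ alias works because rotating a 32-bit value left by n is the same as rotating it right by 32-n. The C below only demonstrates that identity; the function names are hypothetical and it does not model the range limits of the Jaguar's immediate field.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken mod 32). */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Rotate left by n, expressed as a right rotate by 32 - n, the same
   substitution as ROLQ n,Rn -> RORQ 32-n,Rn. */
uint32_t rol32_via_ror(uint32_t x, unsigned n)
{
    return ror32(x, 32 - (n & 31));
}
```

For example, rotating 0x12345678 left by 8 gives 0x34567812, which is exactly what a right rotate by 24 produces.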
Version 2.0b2 changes (2007-02-22)
- added a CPU type of "NONE"
- when printing an error message, extra hex data listing lines are no longer printed to stderr
- 68K/GBZ80 changed some value range warnings to errors
- 68K: longwords in instructions are now spaced in the listing as longwords when possible
- 68K: ADDI/CMPI/SUBI #imm,An now assembles as ADDA/CMPA/SUBA
- 68K: ADD/CMP/SUB EA,An now assembles as ADDA/CMPA/SUBA
- 68K: warning when branch could be shorter
- 68K: merged changes back to 1.8.2
- assembler labels in column 1 are now checked for starting with a non-numeric character
- added INCBIN pseudo-op to include binary files
- INCLUDE wasn't allowing blanks in file names, even when quotes were used
- a binary object file can now be generated with the "-b [baseaddr]" option
- hex baseaddr must be specified with a leading "0x"
- out-of-order bytes will cause a padding fill of 0xFF bytes
- bytes below baseaddr will not be put into file
- added support for CPUs with an addressing granularity (the resolution of a pointer)
larger than a byte, such as the Jaguar GPU/DSP. When a code label is defined, or the
current location pointer is read (with '.', '*', or '$'), it is divided by the number
of bytes per word. This can be overridden with the WORDSIZE pseudo-op, which either
specifies a new word size or, if zero, returns to the CPU's native width.
This is a somewhat shaky feature right now, and doesn't handle things like word
alignment when the DB pseudo-op is used, or even adjusting the address at the left
margin of the listing file. I'll see about improving it if someone starts using this feature.
- added ASCIZ/ASCIIZ pseudo-op, which is like DB except that it adds a null to the end
of the data
- added a new listing option TEMP/NOTEMP to not list temp symbols (containing a "."
or "@") in symbol table at end of listing file
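One reading of the -b rules above, sketched in C: keep an in-memory image starting at baseaddr, pre-fill it with 0xFF so gaps left by out-of-order or skipped bytes come out as padding, and drop bytes outside the image. The struct, the function names, and the fixed 256-byte buffer are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory image for "-b baseaddr" output. */
struct bin_image {
    uint32_t base;
    uint32_t size;        /* usable bytes, at most sizeof data */
    uint32_t used;        /* highest offset written + 1 */
    uint8_t  data[256];
};

void bin_init(struct bin_image *img, uint32_t base, uint32_t size)
{
    img->base = base;
    img->size = size > sizeof img->data ? (uint32_t)sizeof img->data : size;
    img->used = 0;
    memset(img->data, 0xFF, sizeof img->data);   /* padding fill */
}

void bin_put(struct bin_image *img, uint32_t addr, uint8_t byte)
{
    /* Bytes below base, or past the end of the image, are dropped. */
    if (addr < img->base || addr - img->base >= img->size)
        return;

    uint32_t off = addr - img->base;
    img->data[off] = byte;
    if (off + 1 > img->used)
        img->used = off + 1;
}
```

The file would then be the first `used` bytes of `data`, written with something like fwrite(img.data, 1, img.used, fp).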
Version 2.0b3 changes (2007-07-07)
- Added 68HSC08 support to 6805 assembler.
- Fixed a bug with some direct-page instructions in the 6805 assembler.
- Invalid CPU type in -C command line option is now a fatal error.
- Fixed a really old bug in the expression evaluator that would allow missing
operands, then fixed some bugs in the 8048 assembler that the fix brought out.
- Fixed a serious bug in conditional assembly. In an IF...ELSIF...ELSE...ENDIF block, if
any condition before the last ELSIF was true, the ELSE block would be assembled.
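A common way to avoid that kind of bug is to keep a sticky "some branch was already taken" flag for each nesting level, so a later false ELSIF cannot clear it. A minimal sketch with hypothetical names, assuming each IF pushes one `struct cond_level`; it is not the asmx implementation.

```c
#include <stdbool.h>

/* One level of IF...ELSIF...ELSE...ENDIF nesting. */
struct cond_level {
    bool parent_active;   /* were we assembling when the IF was seen? */
    bool taken;           /* has any branch at this level been true? */
    bool active;          /* is the current branch being assembled? */
};

void cond_if(struct cond_level *c, bool parent_active, bool expr)
{
    c->parent_active = parent_active;
    c->active = parent_active && expr;
    c->taken = c->active;
}

void cond_elsif(struct cond_level *c, bool expr)
{
    /* Only the first true branch is assembled; taken never resets. */
    c->active = c->parent_active && !c->taken && expr;
    if (c->active)
        c->taken = true;
}

void cond_else(struct cond_level *c)
{
    /* ELSE is assembled only if no IF or ELSIF here was true. */
    c->active = c->parent_active && !c->taken;
    c->taken = true;
}
```

With IF true, ELSIF false, ELSE, the flag set by the IF survives the false ELSIF, so the ELSE block is skipped.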
Version 2.0b4 changes (2007-08-06)
- Made HTML version of documentation.
- Started adding ARM and ARM Thumb instruction sets, at the ARM5 level.
(Thumb is currently complete, ARM is missing LDC/STC and ADR/ADRL.)
- Changed 'parm' in opcode table to a u_long to handle more complex
- Changed opcode parser to accept opcodes ending in wildcards.
Version 2.0b5 changes (2009-10-31)
- Added ARM LDC/STC instructions.
- ARM still needs range checks
- Added ASSERT pseudo-op.
- -b command line option now takes an end address (ex: "0x0000-0xFFFF")
- Error checking is now done on numbers in -b and -d command line options.
- Added -t command line option to output TRSDOS format binary files.
- Started to deprecate "@" for temp labels (#define TEMP_LBLAT to enable @-labels)
- Z80 can now use labels starting with and containing '$' and '@' (for some old TRSDOS code I had)
(This is enabled by the OPT_ATSYM and OPT_DOLLARSYM options in the AddCPU call.)
- Added 6309 support to 6809 assembler
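A sketch of parsing the -b range with the kind of error checking these entries describe: each number must carry the "0x" prefix, overflow and trailing junk are rejected, and a range whose end is below its start is refused. The function names are hypothetical, and this version accepts only hex.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Parse "0x" followed by hex digits, advancing *p past them. */
static bool parse_hex(const char **p, unsigned long *out)
{
    const char *s = *p;
    unsigned long v = 0;
    int digits = 0;

    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return false;                       /* "0x" prefix is required */
    for (s += 2; isxdigit((unsigned char)*s); s++, digits++) {
        int d = isdigit((unsigned char)*s)
                    ? *s - '0'
                    : tolower((unsigned char)*s) - 'a' + 10;
        if (v > (ULONG_MAX >> 4))
            return false;                   /* would overflow */
        v = (v << 4) | (unsigned long)d;
    }
    if (digits == 0)
        return false;
    *out = v;
    *p = s;
    return true;
}

/* Hypothetical parser for "0x0000-0xFFFF" or a lone "0x8000". */
bool parse_bin_range(const char *arg, unsigned long *start,
                     unsigned long *stop, bool *has_stop)
{
    const char *p = arg;

    if (!parse_hex(&p, start))
        return false;
    *has_stop = false;
    if (*p == '-') {
        p++;
        if (!parse_hex(&p, stop))
            return false;
        *has_stop = true;
    }
    /* Trailing junk or a reversed range is an error. */
    return *p == '\0' && (!*has_stop || *stop >= *start);
}
```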
Version 2.0b6 changes (2009-xx-xx)
- Added parameter to -t to allow choosing TRSDOS binary block size (for easier confirmation of disassembled code).
- Removed classic EDTASM pass/errors messages by default. They can be enabled with the -@ command line parameter.
- FIXME: 6309 AIM/EIM/OIM/TIM instructions incorrect (xIM #immed,DIR/IDX/EXT), TFM not allowed to use CC, DP, W, V, 0, 00, PC
- FIXME: "*" in column 1 comments no longer work with 6502?
- Code needs to be reformatted to Linux kernel style for reliable SCM merges
- * error or warning if -b and data is out of range? (reset warning at ORG?)
- * make test files for assembler pseudo-ops
- DS should be scaled by the WORDSIZE?
- The SUBROUTINE pseudo-op needs to be tweaked. It should either define the subroutine name
as a label, or use the label on the left side of the line as the name of the subroutine.
- change "out of range" warnings in DB/DW to errors?
- need to test what happens with 32-bit symbols on 16-bit and 24-bit address CPUs
- negative symbols vs $8000-$FFFF symbols? maybe RefSym should sign-extend from 16 bits?
- masking or wraparound of values for location counter? (sign-extend locPtr as "." too?)
- 65C816: need addressing mode force characters
- add Z80 undocumented instructions
- "Z8085" CPU type for 8085 with Z80 syntax
- double-check 8048 instruction set, and add support for 8048 variants
- see if it's possible to get labels starting with "$" compatible with $xxxx hex constants,
maybe in RefSym?
- Implement REP (or REPEAT) pseudo-op (currently under construction).
- Implement ".FOO." operators? (.SHL. .AND. .OR., etc.) If I do this, I will probably change
..DEF and ..UNDEF to .DEF. and .UNDEF.
- 6809 WARNDP pseudo-op? (I think this was for "direct page could be used here" warnings on absolute addressing mode?)
- Linkable/relocatable object code files (long-term 3.0 goal).
- 8051 EQU needs to AND with 0xFF87, not 0x87?