ASMX semi-generic assembler

- - -

FOREWORD
========

Okay, so it's not really generic, just semi-generic.

This started from an 8080 assembler I wrote in Turbo Pascal back in my college days when I had a class where I had to write an 8080 emulator. The assembler wasn't part of the class, but no way was I going to hand assemble code again like I did back in my early TRS-80 days. Then when I started dabbling with ColecoVision and Atari 2600 code, I made a Z-80 and a 6502 assembler from it.

But Turbo Pascal and MS-DOS are way out of style now, so I ported the code to plain standard C. It should run on any Unix-like operating system. It should also run on any operating system that provides Unix-style file and command-line handling. In particular, it runs fine under Mac OS X, which is what I use on my laptop.

Porting it to C wasn't enough, though. I had added some nice features like macros to the 6502 assembler and I wanted them in the Z-80 assembler too. But I didn't want to have to copy and paste code every time I added a new feature. So I turned the code inside out, and made the common code into a gigantic .h file. This made writing an assembler for a new CPU easy enough that I was able to write a 6809 assembler in one day, plus another day for debugging.

Unlike most "generic" assemblers, I make an effort to conform to the standard mnemonics and syntax for a CPU, as you'd find them in the chip manufacturer's documentation. I'm a bit looser on the pseudo-ops, trying to be inclusive whenever possible so that existing code has a better chance of working with fewer changes, especially code written back in the '80s.

This is a two-pass assembler. That means that on the first pass it figures out where all the labels go, then on the second pass it generates code. I know there are popular multi-pass assemblers out there (like DASM for 6502), and they have their own design philosophy. I'm sticking with the philosophy that was used by the old EDTASM assemblers for the TRS-80.
There are a few EDTASM-isms that you might notice, if you know what to look for.

But being a two-pass assembler, there are some things you can't do. You can't ORG to a label that hasn't been defined yet, because on the second pass it'll have a value, and your code will go into a different location, and all your labels will be at the wrong address. This is called a "phase error". You also can't use a label that hasn't been defined yet with DS or ALIGN because they affect the current location.

Some CPUs like the 6502 and 6809 have different instructions which can provide smaller faster code based on the size of an operand. To make this work, the assembler keeps an extra flag in the symbol table during the second pass, which tells if the symbol was known yet at this point in the first pass. Then the assembler can know to use the longer form to avoid a phase error. The 6809 assembler syntax uses "<" (force 8-bits) and ">" (force 16-bits) to override this decision. The 6502 assembler can also override this with a ">" before an absolute or absolute-indexed address operand. (Note that this usage is different from "<" and ">" as a high/low byte of a word value.)

Some assemblers can only output code in binary. This might be nice if you're making a video game cartridge ROM, but it's really not very flexible. Intel and Motorola both came up with very nice text file formats which don't require any kind of padding when you do an ORG instruction, and don't require silly "segment" definitions just to keep DS instructions from generating object code. Then, following the Unix philosophy of making tools that can connect to other tools, you can pipe the object code to another utility which makes the ROM image.

Anyhow, it works pretty well for what I want it to do.

- Bruce -

- - -

BUILDING IT
===========

Version 2.0 now has a make file that should work on Unix/Linux type operating systems. Just run the "make" command from the src sub-directory and it will create the asmx binary.
To install it, type "make install" and it will install to ~/bin (unless you change the makefile to install it somewhere else). Symbolic links are generated so that each CPU assembler can be used with a separate command, as in version 1.

If you can't use the make file, the simplest way is this:

    gcc *.c -o asmx

That's it. Now you might want to copy it to your /usr/local/bin or ~/bin directory, but that's your choice.

- - -

RUNNING IT
==========

Just give it the name of your assembler source file, and whatever options you want.

    asm6502 [options] srcfile

Here are the command line options:

    --                  end of options
    -e                  show errors to screen
    -w                  show warnings to screen
    -l [filename]       make a listing file, default is srcfile.lst
    -o [filename]       make an object file, default is srcfile.hex or srcfile.s9
    -d label[[:]=value] define a label, and assign an optional value
    -s9                 output object file in Motorola S9 format (16-bit address)
    -s19                output object file in Motorola S9 format (16-bit address)
    -s28                output object file in Motorola S9 format (24-bit address)
    -s37                output object file in Motorola S9 format (32-bit address)
    -b [baseaddr]       output object file as binary with optional base address
    -c                  send object code to stdout
    -C cputype          specify default CPU type (currently 6502)

Example:

    asm6502 -l -o -w -e program.asm

This assembles the 6502 source file "program.asm", shows warnings and errors to the screen, creates a listing file "program.asm.lst", and puts the object code in an Intel hex format file named "program.asm.hex". (Binary files get named "program.asm.bin", and Motorola S9 files get an extension of .s9, .s19, .s28, or .s37.)

Notes:

The '--' option is needed when you use -l, -o, or -b as the last option on the command line, so that they don't try to eat up your source file name. It's really better to just put -l and -o first in the options.

The value in -d must be a number. No expressions are allowed.
The valid forms are:

    -d label            defines the label as EQU 0
    -d label=value      defines the label as EQU value
    -d label:=value     defines the label as SET value

By default, object code is written as an Intel hex file unless the -s or -b option is specified.

The value in -b specifies the start address for your binary file. If you are making code for a ROM that starts at 0xC000, use "-b 0xC000" and the first byte of the object file will be whatever belongs at 0xC000. Anything at a lower address is not written to the file, and any gaps are filled with 0xFF. Be careful about using large ORG values or the resulting binary file could become VERY large.

The -c and -o options are incompatible. Attempting to use both will result in an error.

Normal screen output (pass number, total errors, error messages, etc.) always goes to stderr.

- - -

EXPRESSIONS
===========

Whenever a value is needed, it goes through the expression evaluator. The expression evaluator will attempt to do the arithmetic needed to get a result.

Unary operations take a single value and do something with it. The supported unary operations are:

    + val           positive of val
    - val           negative of val
    ~ val           bitwise NOT of val
    ! val           logical NOT of val (returns 1 if val is zero, else 0)
    < val           low 8 bits of val
    > val           high 8 bits of val
    ..DEF sym       returns 1 if symbol 'sym' has already been defined
    ..UNDEF sym     returns 1 if symbol 'sym' has not been defined yet
    ( expr )        parentheses for grouping sub-expressions
    [ expr ]        square brackets can be used as parentheses when necessary
    'c'             One or two character constants, equal to the ASCII value
    'cc'            of c or cc. In the two-byte case, the first character is
                    the high byte.
    H(val)          high 8 bits of val; whitespace not allowed before '('
    L(val)          low 8 bits of val; whitespace not allowed before '('

NOTE: with the Z-80, (expr), H(val), and L(val) will likely not work at the start of an expression because of Z-80 operand syntax. Likewise with the 6809, <val and >val may have special meaning at the start of an operand.
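
To make the byte-picking operators and the character constants a bit more concrete, here is a small Python sketch of the arithmetic they describe. This is only an illustration, not code from the assembler itself. The function names (low, high, char_const, logical_not) are made up for this example, and it assumes "high 8 bits" means bits 8 through 15 of a 16-bit value.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of asmx. They mirror the arithmetic described in the table above.

def low(val):
    """Low 8 bits of val, like '< val' or L(val)."""
    return val & 0xFF

def high(val):
    """High 8 bits of val, like '> val' or H(val).
    Assumption: 'high' means bits 8..15 of a 16-bit value."""
    return (val >> 8) & 0xFF

def logical_not(val):
    """Logical NOT, like '! val': 1 if val is zero, else 0."""
    return 1 if val == 0 else 0

def char_const(text):
    """Value of a one- or two-character constant such as 'c' or 'cc'.
    In the two-character case the first character is the high byte."""
    if not 1 <= len(text) <= 2:
        raise ValueError("expected one or two characters")
    value = 0
    for ch in text:
        value = (value << 8) | ord(ch)
    return value

if __name__ == "__main__":
    print(hex(low(0x1234)))       # low byte of $1234
    print(hex(high(0x1234)))      # high byte of $1234
    print(hex(char_const("AB")))  # 'A' lands in the high byte
```

So a word like $1234 splits into a high byte of $12 and a low byte of $34, and the two-character constant 'AB' comes out as $4142 because 'A' ($41) is the high byte.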
Binary operations take two values and do something with them. The supported binary operations are:

    x * y           x multiplied by y
    x / y           x divided by y
    x % y           x modulo y
    x + y           x plus y
    x - y           x minus y
    x << y          x shifted left by y bits
    x >> y          x shifted right by y bits
    x & y           bitwise x AND y
    x | y           bitwise x OR y
    x = y           comparison operators, return 1 if condition is true
    x == y          (note that = and == are the same)
    x < y
    x <= y
    x > y
    x >= y
    x && y          logical AND of x and y (returns 1 if x != 0 and y != 0)
    x || y          logical OR of x and y (returns 1 if x != 0 or y != 0)

Numbers:

    .               current program counter
    *               current program counter
    $               current program counter
    $nnnn           hexadecimal constant
    nnnnH           hexadecimal constant
    0xnnnn          hexadecimal constant
    nnnn            decimal constant
    nnnnD           decimal constant
    nnnnO           octal constant
    %nnnn           binary constant
    nnnnB           binary constant

Hexadecimal constants of the form "nnnnH" don't need a leading zero if there is no label defined with that name.

Operator precedence:

    ( ) [ ]
    unary operators: + - ~ ! < > ..DEF ..UNDEF
    * / %
    + -
    < <= > >= = == !=
    & && | || << >>

WARNING: Shifts and bitwise AND and OR have a lower precedence than the comparison operators! You must use parentheses in cases like this!

Example: Use "(OPTIONS & 3) = 2", not "OPTIONS & 3 = 2". The former checks the lowest two bits of the label OPTIONS, the latter compares "3 = 2" first, which always results in zero.

Also, unary operators have higher precedence, so if X = 255, "" often refers to an addressing mode. If you really want to use the low-byte or high-byte operator, surround the whole thing with parentheses, like "(
MACRO start recording the macro with the name