ASMX semi-generic 8-bit assembler

- - -

FOREWORD
========

Okay, so it's not really generic, just semi generic.

This started from an 8080 assembler I wrote in Turbo Pascal back in my
college days when I had a class where I had to write an 8080 emulator.
The assembler wasn't part of the class, but no way was I going to hand
assemble code again like I did back in my early TRS-80 days. Then when
I started dabbling with ColecoVision and Atari 2600 code, I made a Z-80
and a 6502 assembler from it.

But Turbo Pascal and MS-DOS are way out of style now, so I ported the
code to plain standard C. It should run on any Unix-like operating
system. It should also run on any operating system that provides
Unix-style file and command-line handling. In particular, it runs fine
under Mac OS X, which is what I use on my laptop.

Porting it to C wasn't enough, though. I had added some nice features
like macros to the 6502 assembler and I wanted them in the Z-80
assembler too. But I didn't want to have to copy and paste code every
time I added a new feature. So I turned the code inside out, and made
the common code into a gigantic .h file. This made writing an assembler
for a new CPU easy enough that I was able to write a 6809 assembler in
one day, plus another day for debugging.

Unlike most "generic" assemblers, I make an effort to conform to the
standard mnemonics and syntax for a CPU, as you'd find them in the chip
manufacturer's documentation. I'm a bit looser on the pseudo-ops,
trying to be inclusive whenever possible so that existing code has a
better chance of working with fewer changes, especially code written
back in the '80s.

This is a two-pass assembler. That means that on the first pass it
figures out where all the labels go, then on the second pass it
generates code. I know there are popular multi-pass assemblers out
there (like DASM for 6502), and they have their own design philosophy.
I'm sticking with the philosophy that was used by the old EDTASM
assemblers for the TRS-80. There are a few EDTASM-isms that you might
notice, if you know what to look for.

But being a two-pass assembler, there are some things you can't do. You
can't ORG to a label that hasn't been defined yet, because on the
second pass it'll have a value, and your code will go into a different
location, and all your labels will be at the wrong address. This is
called a "phase error". You also can't use a label that hasn't been
defined yet with DS or ALIGN because they affect the current location.

Some CPUs like the 6502 and 6809 have different instructions which can
provide smaller, faster code based on the size of an operand. To make
this work, the assembler keeps an extra flag in the symbol table during
the second pass, which tells if the symbol was known yet at this point
in the first pass. Then the assembler can know to use the longer form
to avoid a phase error. The 6809 assembler syntax uses "<" and ">" to
override this decision.

Some assemblers like to output code in binary. This might be nice if
you're making a video game cartridge ROM, but it's really not very
flexible. Intel and Motorola both came up with very nice text file
formats which don't require any kind of padding when you do an ORG
instruction, and don't require silly "segment" definitions just to keep
DS instructions from generating object code. Then, following the Unix
philosophy of making tools that can connect to other tools, you can
pipe the object code to another utility which makes the ROM image.

Anyhow, it works pretty well for what I want it to do.

- Bruce -

- - -

BUILDING IT
===========

While I normally compile this using Apple's XCode development
environment, that's completely unnecessary if all you want to do is use
it. All it takes is a single command to GCC to compile it. No makefile,
no autoconf, no nothing else.
For example, to compile the 6502 assembler, just do this:

    gcc asm6502.c -o asm6502

That's it. Now you will probably want to copy it to your
/usr/local/bin or ~/bin directory, but that's your choice.

- - -

RUNNING IT
==========

Just give it the name of your assembler source file, and whatever
options you want.

    asm6502 [options] srcfile

Here are the command line options:

    -l [filename]      make a listing file, default is srcfile.lst
    -o [filename]      make an object file, default is srcfile.hex
                       or srcfile.s9 (if -s option specified)
    -d label[=value]   define a label, and assign an optional value
                       to it, default is zero
    -e                 show errors to screen
    -w                 show warnings to screen
    -9                 create S9 object code instead of Intel hex
    -p                 pipe mode: object file goes to stdout
    --                 end of options

Example:

    asm6502 -l -o -w -e program.asm

This assembles the 6502 source file "program.asm", shows warnings and
errors to the screen, creates a listing file "program.asm.lst", and
puts the object code in an Intel hex format file named
"program.asm.hex".

Notes:

  The '--' option is needed when you use -l or -o as the last option on
  the command line, so that they don't try to eat up your source file
  name. It's really better to just put -l and -o first in the options.

  The value in -d must be a number. No expressions.

  The -p and -o options are incompatible. Only the last one on the
  command line will be used. Normal screen output (pass number, total
  errors, error messages, etc.) always goes to stderr.

- - -

EXPRESSIONS
===========

Whenever a value is needed, it goes through the expression evaluator.
The expression evaluator will attempt to do the arithmetic needed to
get a result.

Unary operations take a single value and do something with it. The
supported unary operations are:

    + val         positive of val
    - val         negative of val
    ~ val         bitwise NOT of val
    ! val         logical NOT of val (returns 1 if val is zero, else 0)
    < val         low 8 bits of val
    > val         high 8 bits of val
    ..DEF sym     returns 1 if symbol 'sym' has already been defined
    ..UNDEF sym   returns 1 if symbol 'sym' has not been defined yet
    ( expr )      parentheses for grouping sub-expressions
    [ expr ]      square brackets can be used as parentheses when
                  necessary
    'c'           One or two character constants, equal to the ASCII
    'cc'          value of c or cc. In the two-byte case, the first
                  character is the high byte.
    H(val)        high 8 bits of val; whitespace not allowed before '('
    L(val)        low 8 bits of val; whitespace not allowed before '('

Binary operations take two values and do something with them. The
supported binary operations are:

    x * y         x multiplied by y
    x / y         x divided by y
    x % y         x modulo y
    x + y         x plus y
    x - y         x minus y
    x << y        x shifted left by y bits
    x >> y        x shifted right by y bits
    x & y         bitwise x AND y
    x | y         bitwise x OR y
    x = y         comparison operators, return 1 if condition is true
    x == y        (note that = and == are the same)
    x < y
    x <= y
    x > y
    x >= y
    x && y        logical AND of x and y (returns 1 if x != 0 and
                  y != 0)
    x || y        logical OR of x and y (returns 1 if x != 0 or y != 0)

Numbers:

    .             current program counter
    *             current program counter
    $             current program counter
    $nnnn         hexadecimal constant
    nnnnH         hexadecimal constant
    0xnnnn        hexadecimal constant
    nnnn          decimal constant
    nnnnD         decimal constant
    nnnnO         octal constant
    %nnnn         binary constant
    nnnnB         binary constant

Hexadecimal constants of the form "nnnnH" don't need a leading zero if
there is no label defined with that name.

Operator precedence, highest first:

    ( ) [ ]
    unary operators: + - ~ ! < > ..DEF ..UNDEF
    * / %
    + -
    < <= > >= = == !=
    & && | || << >>

WARNING: Shifts and bitwise AND and OR have a lower precedence than the
comparison operators! You must use parentheses in cases like this!

Example:

  Use "(OPTIONS & 3) = 2", not "OPTIONS & 3 = 2". The former checks the
  lowest two bits of the label OPTIONS, the latter compares "3 = 2"
  first, which always results in zero.
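To make the trap concrete, here is a small sketch in Python. It is not
part of asmx, and the helper names (asm_eq, with_parentheses,
without_parentheses) are made up for illustration. Python's own
precedence rules differ from the assembler's, so the grouping that the
assembler would apply is spelled out with explicit function calls.

```python
def asm_eq(a, b):
    """Model the assembler's '=' operator: 1 if equal, else 0."""
    return 1 if a == b else 0


def with_parentheses(options):
    # "(OPTIONS & 3) = 2": mask the low two bits, then compare.
    return asm_eq(options & 3, 2)


def without_parentheses(options):
    # "OPTIONS & 3 = 2" groups as "OPTIONS & (3 = 2)" because the
    # comparison binds tighter than bitwise AND. (3 = 2) is 0, so
    # the whole expression is 0 no matter what OPTIONS holds.
    return options & asm_eq(3, 2)


if __name__ == "__main__":
    # Hypothetical OPTIONS values, chosen only to show both outcomes.
    for options in (0, 1, 2, 3, 6):
        print(options, with_parentheses(options),
              without_parentheses(options))
```

Only the parenthesized form ever returns 1, for values such as 2 and 6
whose low two bits are binary 10.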
Also, unary operators have higher precedence, so if X = 255,
"<X + 1" is 256, but "<(X + 1)" is 0.

With the 6809 assembler, a leading "<" or ">" often refers to an
addressing mode. If you really want to use the low-byte or high-byte
operator, surround the whole thing with parentheses, like "(<val)".

MACRO    start recording the macro with the name