

  80x86 assembly language BCD math
  --------------------------------

  Contents:	Copyright and disclaimer
		General considerations
		BCD formats, signs, min & max
		BCD-related CPU and FPU instructions
		Programming notes
		- General notes
		- Interfacing and user responsibilities
		- Assembly formalities
		- Procedural formalities
		- Procedure interrelations
		- Algorithms
		- BCDASM equates and macros list
		Credits
		Related material




  Copyright and disclaimer
  ------------------------

  This text and the accompanying assembly source files and
  binary files ("BCDASM") are copyright Morten Elling, 1997.

  You're free to use BCDASM for educational purposes and in
  software (freeware, shareware, or commercial) but use it
  at your own risk. BCDASM is offered without any guarantee.

  You're free to distribute the BCDASM file archive through
  any media at your disposal on condition that the contents
  of the file archive are not modified.


  June, 1997

  Morten Elling
  Ellemarksvej 12
  DK-8000 Aarhus C
  Denmark

  <mailto:elling@post1.tele.dk>




  General considerations
  ----------------------

  Numbers in business application software must be large and
  precise. Accounting books must balance, so floating-point
  math, with its potential for rounding errors, is unsuitable.

  Performing business math in the CPU registers is OK if you
  can make do with numbers in the range +21,474,836.47 to
  -21,474,836.48 (32 bit). Inflation-ridden nations, large
  companies, and Italian car dealers need bigger numbers.

  BCD math is basic in-memory integer math that lets you handle
  numbers of considerable sizes, typically 18 digits but with
  the capacity for several thousand digits. By tradition, BCDs
  are used for business math and BCD schemes are supported by
  several computers. All chips in Intel's iAPX 86 family (PCs)
  execute BCD instructions.


  Numbers in BCD format are not as storage-efficient as numbers
  in binary and they don't process quite as fast as their binary
  cousins (additional CPU instructions are required to adjust
  results). But being the sole base 10 format in an otherwise
  completely hexadecimal and octal world, BCDs do offer some
  advantages. Base 10 feels natural to humans (at least until
  genetic engineering equips us with 16 fingers). BCDs are easy
  to debug and to convert to and from ASCII, and are easily
  divided or multiplied by powers of 10.
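
  To illustrate the last point: multiplying or dividing a packed
  BCD by 10 is a shift by one nibble. The C sketch below is not
  part of BCDASM; the names bcd_mul10 and bcd_div10 are made up
  for this example, and 'n' is assumed to count only the digit
  bytes (the sign byte is left out).

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply an n-byte little-endian packed BCD digit field by 10 by
   moving every digit up one nibble. Returns the digit shifted out
   of the top (non-zero means the result did not fit). */
unsigned bcd_mul10(uint8_t *digits, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned out = digits[i] >> 4;              /* digit leaving this byte */
        digits[i] = (uint8_t)((digits[i] << 4) | carry);
        carry = out;
    }
    return carry;
}

/* Divide the same kind of digit field by 10 by moving every digit
   down one nibble. Returns the remainder (the digit shifted out). */
unsigned bcd_div10(uint8_t *digits, size_t n)
{
    unsigned carry = 0;
    for (size_t i = n; i-- > 0; ) {
        unsigned out = digits[i] & 0x0Fu;           /* digit leaving this byte */
        digits[i] = (uint8_t)((digits[i] >> 4) | (carry << 4));
        carry = out;
    }
    return carry;
}
```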

  BCDs are integers. Other than supplying a routine that will
  display any number of decimals, BCDASM supports only integer
  operations. This doesn't mean that you cannot use BCDs for
  floating-point operations; it's unusual, though, and it will
  require extra programming but it may pay off if your primary
  concern is multi-digit results without rounding errors. With
  BCDs, keeping track of the decimal point is easier than with
  binary numbers.
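
  As a rough illustration of how little work an implied decimal
  point takes, here is a C sketch that prints a packed signed
  BCD with a fixed number of decimals. It is not BCDASM's own
  formatting routine; bcd_format and bcd_digit are hypothetical
  names, the number is assumed to be little-endian with the sign
  in the top byte, and 'decimals' is assumed to be smaller than
  the number of digits.

```c
#include <stddef.h>
#include <stdint.h>

/* Digit i of a little-endian packed BCD (0 = least significant). */
static unsigned bcd_digit(const uint8_t *bcd, size_t i)
{
    uint8_t b = bcd[i / 2];
    return (i & 1) ? (unsigned)(b >> 4) : (unsigned)(b & 0x0F);
}

/* Hypothetical helper: write the number as text with 'decimals'
   digits after the point. 'size' is the BCD byte size including the
   sign byte; 'out' must hold 2 * (size - 1) + 3 chars.
   Returns the length of the string. */
size_t bcd_format(const uint8_t *bcd, size_t size, size_t decimals, char *out)
{
    size_t top = (size - 1) * 2;    /* number of digits left to print */
    size_t n = 0;

    /* Skip leading zeros but keep one digit before the point. */
    while (top > decimals + 1 && bcd_digit(bcd, top - 1) == 0)
        top--;

    if (bcd[size - 1] & 0x80)       /* sign lives in the top bit */
        out[n++] = '-';
    while (top-- > 0) {
        out[n++] = (char)('0' + bcd_digit(bcd, top));
        if (top == decimals && decimals != 0)
            out[n++] = '.';
    }
    out[n] = '\0';
    return n;
}
```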


  If you are programming for a PC equipped with an FPU
  (floating-point unit), consider using it. This requires the
  use of 10-byte packed BCDs which is the only BCD format
  recognized by the FPU. It also requires a conversion of the
  BCD numbers to and from the FPU's temporary real format.
  None of the BCDASM routines use FPU instructions since they
  all use a caller-supplied BCD size.




  BCD formats
  -----------

  BCD stands for Binary Coded Decimal.

  BCD numbers are made up of the decimal digits 0 through 9,
  each represented in binary:

                Dec     Hex     Binary
                0       00h     0000
                1       01h     0001
                2       02h     0010
                3       03h     0011
                4       04h     0100
                5       05h     0101
                6       06h     0110
                7       07h     0111
                8       08h     1000
                9       09h     1001

  BCD numbers can be packed (the usual form), storing two
  digits per byte (the more significant digit in the high-order
  4 bits), or unpacked, storing one digit per byte (leaving the
  high-order 4 bits unused). The low-order 4 bits of a byte are
  also called the low nibble, the high-order 4 bits the high
  nibble.

  Packed BCD is sometimes called 'packed decimal' or 'decimal
  integer'. Unpacked BCD is sometimes called 'unpacked decimal'
  or 'ASCII integer'.

        The decimal number 8 is represented as 08h in
        packed BCD, and 08h in unpacked BCD. The decimal
        number 28 is 28h in packed BCD (but 1Ch in binary)
        and would require two unpacked BCD bytes.
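
        The same conversions written out in C, as a small
        sketch (the function names are invented for this
        example and the inputs are assumed to be in range):

```c
#include <stdint.h>

/* Binary value 0..99 -> one packed BCD byte, e.g. 28 -> 28h. */
uint8_t bin_to_packed(unsigned value)
{
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* One packed BCD byte -> binary value, e.g. 28h -> 28 (1Ch). */
unsigned packed_to_bin(uint8_t packed)
{
    return (unsigned)(packed >> 4) * 10u + (packed & 0x0Fu);
}

/* One packed BCD byte -> two unpacked bytes, low digit first. */
void packed_to_unpacked(uint8_t packed, uint8_t out[2])
{
    out[0] = packed & 0x0F;         /* low nibble  */
    out[1] = packed >> 4;           /* high nibble */
}
```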


  Assemblers usually have limited support for BCD handling: the
  DT directive, which defines a ten-byte unit of storage, and
  the TBYTE operator, which can create a pointer to a ten-byte
  variable. In Borland's Turbo Assembler (TASM), the DT
  directive defaults to packed BCD format but may be used for
  other data types as well; there is no assembler directive to
  support unpacked BCD.

  Examples (TASM.EXE v4.1):

  Pnum dt 81659247              ; Packed BCD (10 bytes)
  Unum db 7,4,2,9,5,6,1,8,0,0
       db 0,0,0,0,0,0,0,0,0,0   ; Unpacked BCD (20 bytes)
  Dnum dt 81659247d             ; 'd' postfix for decimal number
  Nnum dt -81659247             ; Negative packed BCD

  The numbers above are stored in memory as (hexadecimal):
  Pnum 47  92  65  81  00  00  00  00  00  00
  Unum 07  04  02  09  05  06  01  08  00  00
       00  00  00  00  00  00  00  00  00  00
  Dnum 6F  05  DE  04  00  00  00  00  00  00
  Nnum 47  92  65  81  00  00  00  00  00  80 ; Top bit = 1

  i.e. the usual Intel little-endian format (the lower in
  memory, the less significant).

        NOTE:
        Borland's 32-bit assemblers (TASM32 v4.0, v5.0)
        do not recognize the unary minus operator when
        BCDs are initialized with the DT directive, i.e.
        only non-negative BCD values can be initialized. 
        Borland's 16-bit assemblers do, however, accept 
        negative and zero BCD values with DT.


  The term 'most significant byte' is avoided here and in the
  source files, in the interest of clarity. Logically, which is
  'most significant' in the variable Nnum above: the byte at
  offset 9, at offset 8, or at offset 3? It depends on what
  you're considering: the BCD format (the sign byte), the
  numeric range (the top digit byte), or the current variable
  (its highest non-zero byte).

  The sign bit (bit 79 of a tbyte) is 1 for negative numbers, 0
  for positive or zero. The whole top _byte_ is used for the
  sign, i.e. 00h or 80h (limiting the number range somewhat, but
  making BCDs easier to program). To change a BCD from positive
  to negative, only the top bit is changed. Note that the FPU
  examines only the sign bit in the top byte of a tbyte; the
  remaining bits (bits 72-78) are ignored.
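
  In C terms, the sign handling described above comes down to
  three one-liners. This is a sketch with invented names, valid
  for any BCD size whose top byte holds the sign:

```c
#include <stddef.h>
#include <stdint.h>

/* Non-zero if the sign bit (bit 7 of the top byte) is set. */
int bcd_is_negative(const uint8_t *bcd, size_t size)
{
    return (bcd[size - 1] & 0x80) != 0;
}

/* Change sign: only the top bit is flipped, the digits stay put. */
void bcd_negate(uint8_t *bcd, size_t size)
{
    bcd[size - 1] ^= 0x80;
}

/* Force the top byte to 00h or 80h by clearing its low 7 bits. */
void bcd_clean_sign(uint8_t *bcd, size_t size)
{
    bcd[size - 1] &= 0x80;
}
```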


  Assuming the top byte is used for the sign, we're left with
  (10 - 1) * 2 = 18 digits for the number, hence the range of
  a 10-byte packed signed BCD is:

        minimum -999,999,999,999,999,999
        maximum +999,999,999,999,999,999

  or - since BCDs are primarily used for business calculations -

            +/-$9,999,999,999,999,999.99

  (almost 10 quadrillions in USA and France; 10,000 billions
  elsewhere. Anyway, it was a lot of money in 1976).



        NOTE:
        The 10-byte packed signed BCD described above is
        the basic BCD type supported by the floating-
        point unit (80x87), and it's the one generally
        used in assembly source files.

        However, BCDs come in many flavors. Examples
        include:
        - 8-byte packed BCDs which fit better into the
          80386's 32-bit registers
        - A 6-byte packed BCD with 47 bits used for the
          number and 1 bit for the sign (more storage-
          efficient).
        - A 12-byte unpacked BCD (easy ASCII conversion)
        - BCDs stored big-endian, even on Intel machines.
          The numbers are easier to read this way when
          debugging, but perhaps they originated from a
          non-PC computer.





  BCD-related CPU and FPU instructions
  (details in a separate file)
  ------------------------------------

  The following members of the 8086 instruction set were
  designed to support BCD math:

  - Packed BCD    (affects the AL and flags registers)
          DAA     Decimal Adjust after Addition
          DAS     Decimal Adjust after Subtraction
  - Un-packed BCD (affects the AL, AH, and flags registers)
          AAA     ASCII Adjust after Addition
          AAS     ASCII Adjust after Subtraction
          AAM     ASCII Adjust after Multiplication
          AAD     ASCII Adjust before Division
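
  To show what the adjust instructions buy you, the C sketch
  below reproduces the net effect of ADC followed by DAA on two
  valid packed BCD bytes, and uses it to add two multi-byte
  numbers. It is an illustration only: it does not model the
  CPU's flag handling bit for bit, it assumes every nibble is in
  the range 0-9, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Net effect of "ADC AL,src" + "DAA" for valid packed BCD bytes:
   add two 2-digit numbers plus an incoming carry (0 or 1) and
   return the 2-digit result; *carry receives the decimal carry. */
static uint8_t adc_daa(uint8_t a, uint8_t b, unsigned *carry)
{
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu) + *carry;
    unsigned hi = (unsigned)(a >> 4) + (unsigned)(b >> 4);

    if (lo > 9) {                   /* low digit overflowed: adjust */
        lo -= 10;
        hi += 1;
    }
    *carry = hi > 9;                /* high digit overflowed: carry out */
    if (*carry)
        hi -= 10;
    return (uint8_t)((hi << 4) | lo);
}

/* dst += src over n digit bytes (little-endian, sign byte excluded).
   Returns the carry out of the top digit. */
unsigned bcd_add_digits(uint8_t *dst, const uint8_t *src, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++)
        dst[i] = adc_daa(dst[i], src[i], &carry);
    return carry;
}
```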


  Two FPU instructions support ten-byte BCD numbers:
  - Packed BCD
          FBLD    Load packed decimal
          FBSTP   Store packed decimal and pop
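
  Conceptually, FBLD turns the 18 digits and the sign bit into
  one binary value. A C sketch of that conversion, using a
  64-bit integer instead of the FPU's temporary real format
  (tbyte_bcd_to_int is a made-up name; all nibbles are assumed
  to be valid digits):

```c
#include <stdint.h>

/* Value of a 10-byte packed signed BCD as a binary integer.
   Only bit 7 of byte 9 is examined for the sign. The largest
   magnitude, 18 nines, fits in an int64_t. */
int64_t tbyte_bcd_to_int(const uint8_t bcd[10])
{
    int64_t v = 0;
    for (int i = 8; i >= 0; i--)    /* most significant digits first */
        v = v * 100 + (bcd[i] >> 4) * 10 + (bcd[i] & 0x0F);
    return (bcd[9] & 0x80) ? -v : v;
}
```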




  Programming notes
  -----------------

  --- General notes

  BCDASM supports C, (16-bit) Pascal, and assembly language.

  The source files include routines that perform basic math,
  comparison, and conversion of packed, signed BCD numbers
  (files bcd*.asm). A single module demonstrates operations on
  unpacked, unsigned BCDs (file bcduu.asm).

  The application programming interface (API) of each procedure
  (function) is described in detail in the procedure's source
  file header. The headers have been extracted to the bcdapi.txt
  file which serves as the BCDASM API reference. Similarly, the
  algorithms used in BCDASM are described in comments embedded
  in the source files.

  In this section, therefore, you'll find a description of some
  general topics related to BCDASM, but not of the API.




  --- Interfacing and user responsibilities

  All users must ensure that BCDASM routines run on an 80186
  or compatible (later) processor. No other initialization or
  shutdown steps are required. BCDASM code does nothing but move
  bytes and bits in memory; it makes no operating system calls
  and uses no interrupts, port I/O, or file I/O.


  Assembly language users should use a .MODEL statement with a
  STDCALL, C, or PASCAL language specifier and include the
  bcd.asi header file in modules that call BCDASM procedures.
  (Note that a few procedures return carry, sign, or zero flags,
  but the comparison and shift routines return flags other than
  those you may expect from their names.)

  C language users must include the bcd.h header file in source
  files referencing BCDASM functions.

  Both assembly language and C language users should link with
  the appropriate BCDASM library file: bcd????.lib where ???? is
  a memory model code. Run makelib.bat without parameters to see
  a list of available memory models. Reassembly of the source
  code for 16-bit models requires TASM v3.2 or later, whereas
  the 32-bit flat model source code requires TASM v4.0 or later.

  Turbo Pascal users must reference the unit BCD (compiled as
  bcd.tpu) in a USES statement to access BCDASM procedures and
  functions (16-bit far code). Note that strings returned by
  BCDASM are not Pascal-format but zero-terminated, C-style;
  for convenience, the WriteZStr function is included in the
  unit file.


  BCDASM code typically adds 2-3 KB to the size of an executable
  (max. 4 KB if all routines are linked), and the stack usage is 
  max. 100 bytes (size of passed arguments and local variables).


  All arguments passed to BCDASM routines must be correct; error
  checking is minimal. In particular, pointers must be valid and
  the byte size of BCD variables must be even and obey the
  following limits.

  (In short: if you keep your BCD size even, and below 24 KB
  - enough to hold 49,000+ digits - you're safe.)

    16-bit limits:
    Min.               4 bytes
    Max.          0FFFCh bytes (65,532 decimal)
    Max.          07FFEh bytes (32,766 decimal) (*)

    32-bit limits:
    Min.               4 bytes
    Max.      0FFFFFFFCh bytes (approx. 4 gigabytes)
    Max.      07FFFFFFEh bytes (approx. 2 gigabytes) (*)

	(*) Half-size operands required for sign extension,
	    multiplication, and division; bcdSx, bcdImul,
	    and bcdIdiv return results that are twice the
	    size of the source operand.

	Although BCDASM can handle the maximum sizes, your
	operating system (or your patience) may not. Divide
	the max. size figures by 3 or 4 to get the practical
	limit imposed by the bcdFmt routine (BCD-to-ASCII).
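
  Since BCDASM does little checking itself, callers may want to
  validate sizes up front. A C sketch for the 16-bit limits
  listed above (the macro and function names are hypothetical
  and not defined in bcd.h):

```c
#include <stddef.h>

#define BCD16_MIN_SIZE 4u           /* minimum byte size             */
#define BCD16_MAX_SIZE 0xFFFCu      /* maximum byte size (65,532)    */
#define BCD16_MAX_HALF 0x7FFEu      /* maximum for double-size results */

/* Non-zero if 'size' is an even byte count within the 16-bit limits.
   Pass needs_double_result != 0 for operations whose result is twice
   the size of the source operand. */
int bcd16_size_ok(size_t size, int needs_double_result)
{
    size_t max = needs_double_result ? BCD16_MAX_HALF : BCD16_MAX_SIZE;
    return size >= BCD16_MIN_SIZE && size <= max && (size % 2) == 0;
}

/* Digits available when the top byte is reserved for the sign. */
size_t bcd_digit_capacity(size_t size)
{
    return (size - 1) * 2;
}
```

  With these definitions, a 10-byte BCD yields 18 digits and a
  24 KB (24,576-byte) BCD yields 49,150 digits.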


  If necessary, users must devise a method to initialize BCD
  variables in software, typically by manipulating an array of
  bytes. Assembly language users have the option of using the DT
  directive when initializing variables.
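
  One possible way to do this from C is to fill the byte array
  from a string of decimal digits. The sketch below is an
  assumption about how such a helper might look, not a BCDASM
  routine; bcd_from_string is an invented name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: set a packed signed BCD of 'size' bytes from
   text such as "1234" or "-1234". Returns 0 on success, -1 if the
   text is empty, has too many digits, or contains a non-digit (in
   the last case the BCD is left partly written). */
int bcd_from_string(uint8_t *bcd, size_t size, const char *text)
{
    int negative = 0;
    size_t len, i;

    if (*text == '+' || *text == '-')
        negative = (*text++ == '-');
    len = strlen(text);
    if (len == 0 || len > (size - 1) * 2)
        return -1;

    memset(bcd, 0, size);
    for (i = 0; i < len; i++) {
        char c = text[len - 1 - i];         /* digit i, lowest first */
        if (c < '0' || c > '9')
            return -1;
        bcd[i / 2] |= (uint8_t)((c - '0') << ((i & 1) ? 4 : 0));
    }
    if (negative)
        bcd[size - 1] = 0x80;               /* sign in the top byte */
    return 0;
}
```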

  All BCDASM modules assume that bit 7 of the top byte of a BCD
  number holds the sign and that bits 0-6 of the top byte are
  undefined.




  --- Assembly formalities

  The assembly source files were written for Borland's Turbo
  Assembler v4.1 (MASM syntax). In order to keep symbol names
  case-sensitive, TASM's /ml switch must be used, and include
  files must be pointed to using the /i switch. No other
  switches are needed since the source code has been arranged to
  allow TASM's one-pass assembly. You can, however, pass the
  desired memory model and processor to TASM, e.g.

    cd .\src
    tasm /ml/i..\include\ /dMDL=16sp /dCPU=.8086 *.asm, ..\obj\;

  Refer to the model.inc file for these definitions; the default
  model is <large, pascal> and the default processor is <.186>.
  Except for bcd.asi, no include file contains an 'include'
  statement.


  Three features are used that require a recent version of TASM:
  - prototyping (PROCDESC, extended CALL)     (TASM v3.2+)
  - enhanced macro facilities (VARARG, REST)  (TASM v3.0+)
  - model FLAT                                (TASM v4.0+)

  Unless you want the 32-bit flat model, it is fairly simple to
  rewrite BCDASM for an earlier version of TASM (and slightly
  less simple to rewrite it for Microsoft's assembler). After
  two years with Ideal mode syntax, I'm back to MASM syntax but
  kept the good habit of bracketing memory references.

  TASM's built-in high-level language support (PROC, USES, ARG,
  LOCAL, RET, ENDP statements) is used extensively throughout
  the source code. While it isn't necessary to use a .MODEL
  statement to make use of the HLL or prototyping features, it
  does make things somewhat easier:
  - canned segments, available with .CODE, .DATA etc.
  - language and code distance specifiers automatically
    transferred to PROC, PROCDESC, PUBLIC, EXTRN etc.
  - predefined CODEPTR and DATAPTR pointer types as well as
    several model-related equates
  - predefined .STARTUP and .EXIT macros available
  - group override with OFFSET operator not needed

  Despite TASM's user guide, ARG, LOCAL, and DATAPTR don't work
  with 32-bit code unless used with a .MODEL FLAT statement;
  without it, args/locals are equated relative to BP, not EBP
  (grrrrr...).




  --- Procedural formalities

  Procedure entry and exit conditions are described in detail in
  each procedure header (see bcdapi.txt). Most procedures return
  information in the accumulator (AX/EAX) and -- for the benefit
  of .asm users -- a few return flags.

  In the 16-bit memory models, no assumptions are made about the
  ES register or direction flag on entry to a procedure. In the
  models that use near data pointers (tiny, small, and medium),
  DS is assumed to be = SS; in all other models, DS register
  contents are unimportant to BCDASM routines (no variables in
  DGROUP). Registers are preserved according to the requirements
  of the selected model language (refer to the @uses macro in
  modelt.inc). BCDASM supports C, STDCALL, and PASCAL. If the
  direction flag is used, it is always cleared on exit.

  In the 32-bit flat model, Win32 rules apply: DS = ES = SS,
  direction flag clear on entry and on exit. EAX, ECX, and EDX
  registers may be modified, the rest are preserved. Calling
  convention is STDCALL.


  Assembly language users who want 16-bit 'flat' functionality
  (near data models) can achieve this by equating @isDSeqESeqSS
  to one (1); it'll save ES loads and ES overrides. Equating
  @isDirectionUp to one (1) saves redundant CLD instructions.
  See modelt.inc for these equates.




  --- Procedure interrelations

  No 'internal' calls; each module is self-contained.




  --- Algorithms

  Basic multi-digit number-processing all the way, using LODS,
  STOS, and LOOP instructions. BCD multiplication is slow
  because this operation involves heavy use of MUL and AAM
  instructions which are among the very slowest on an 8086 but
  part of the BCD scheme. BCD division is similarly slow due to
  a divide-by-repeated-subtraction approach.
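
  To give an idea of the digit-by-digit work involved, here is a
  schoolbook multiplication of two unpacked BCD numbers in C.
  Each step forms a binary value of at most 99 (what MUL would
  produce, plus carries) and splits it into tens and ones, which
  is the job AAM does. This is a generic sketch under those
  assumptions, not a transcription of the BCDASM routine, and
  the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* product = a * b for unpacked BCD numbers (one digit 0..9 per byte,
   least significant digit first). 'product' must have room for
   na + nb digits and must not overlap 'a' or 'b'. */
void bcd_mul_unpacked(const uint8_t *a, size_t na,
                      const uint8_t *b, size_t nb,
                      uint8_t *product)
{
    size_t i, j;

    for (i = 0; i < na + nb; i++)
        product[i] = 0;

    for (j = 0; j < nb; j++) {
        unsigned carry = 0;
        for (i = 0; i < na; i++) {
            /* digit product + running digit + carry: max 81+9+9 = 99 */
            unsigned t = (unsigned)a[i] * b[j] + product[i + j] + carry;
            carry = t / 10;                     /* AAM: tens -> AH */
            product[i + j] = (uint8_t)(t % 10); /* AAM: ones -> AL */
        }
        product[j + na] = (uint8_t)carry;       /* still zero before this row */
    }
}
```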

  I've emphasized modularity and clarity over speed during the
  development since multi-digit in-memory calculations are used
  primarily for their capacity.

  For much the same reasons, I've used equates (rax, rbx etc.)
  for the actual register names (uppercase AX, BX etc. are used
  for explicit 16-bit operations; lowercase eax, ebx etc. for
  32-bit). Attempting to write assembly for both 16-bit and
  32-bit like this is definitely _not_ to be recommended but I
  wanted to try it out for this occasion. True 80386+ support
  is rudimentary.

  BCDASM leaves room for improvement; grep for "ToDo" to locate
  a few places.




  --- BCDASM equates and macros list

  Defined by TASM:              Defined or redefined by:
  - @Model, @Interface,
    @CodeSize, @DataSize,
    DATAPTR                     .model directive
  - @data, @stack               .model directive,
                                  segment opening
  - @WordSize                   segment opening,
                                  processor directive
  - @Startup                    built-in .startup macro


  (The equates and macros defined in model.inc and
  modelt.inc are not redefined by any BCDASM module)

  Equates defined in model.inc:
  - CPU, MDL, @fartext

  Equates defined in modelt.inc:
  - @isCodeNear, @isDataFar, @isDataNear, @isStackFar,
    @isStackNear, @isUse32, @isWin32, @is386
  - @isDirectionUp, @isDSeqESeqSS
  - @ES, @dui, @uint, @dsaddr, @dsr, @ssr, @nullptr
  - @bptr, @uiptr, @wptr, sh
  - @fram

  Macros defined in modelt.inc:
  - @CODESEG, @alignn, @ptype, @proto, @uses
  - @LDSEGM, @LDS, @LES, @cld, @shl, @shr
  - @enter, @leave, ENTERW, ENTERD, LEAVEW, LEAVED





  Credits
  -------

  BCDASM is my own work but it may not have reached the .LIB
  state without a glance or two in "intel.zip" from 1990, a
  collection of assembly source files which unfortunately has
  no named author and no copyright notice. Specifically, the
  division and comparison routines in BCDASM were inspired by
  bcddiv.asm and bcdcmp.asm in "intel.zip" but completely
  rewritten for this occasion.

  The algorithm used in the multiplication routine was taken
  from Ray Duncan's MPMUL1.ASM which appeared in PC Magazine
  Nov 28, 1989 (vol. 8 no. 20).




  Related material
  ----------------

  HugeCalc by Neil J. Rubenking, includes Turbo Pascal v6.0
  source code. HugeCalc performs +, -, *, /, exponential, and
  factorial math on integers of up to 254 digits, using Pascal's
  string type to store the numbers as digits. Implemented as a
  command-line calculator, it supports unsigned numbers only.
  Downloadable at <http://www.pcmag.com> as part of
  VOL10N16.ZIP, the Sept 24, 1991 PC Magazine file archive.


  MONEY, a C++ money class by Adolfo di Mare, includes Borland
  C++ v2.0 source code. Uses C's double type (floor(double *
  100.0)) with no fractional part to store the numbers, allowing
  15+ digit precision. Downloadable at Simtel as
  <ftp://ftp.simtel.net/pub/simtelnet/msdos/cpluspls/money.zip>.


