FWKCS(TM) Contents_Signature System, Ver. 2.04, 1995 Aug 30.
(C)Copyright Frederick W. Kantor 1989, 1995. All rights reserved.



New or changed in FWKCS version 2.04:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On the basis of experimental tests, the contents_signature ("cs") originally
introduced in this software in the 1980's would appear to carry a typical
pairwise statistical error rate of less than, or of the order of, one part in
ten trillion (1/(10,000,000,000,000)) -- more than 1000 times as good as the
32_bit CRC.  This additional statistical resolution has been important in
serving the needs of electronic bulletin boards, because they often contain
more files than can be reliably distinguished by the 32_bit CRC.  This has
played a role in reducing the risk of accident during the automatic
recognition of duplicate files with changed names.

In the years since the original contents_signature was introduced, file
collections have grown until they contain so many files that the original cs
does not provide enough statistical resolution for the level of reliability
desired.

Starting with version 2.04, FWKCS supports a new, enhanced contents_signature
("long cs"), and continues to support the original contents_signature ("short
cs").  The new, enhanced FWKCS cs has an estimated typical statistical
pairwise error rate of less than, or of the order of, one part in 1.0 E+51;
that is, less than, or of the order of, one part in 1,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000.  The original cs and the new
long cs are generated using assembly_language code, in a conveniently short
time.  The new long cs includes the original short cs.  The two cs's can be
used in the same data base:  FWKCS uses a set of rules to automatically
handle the combination.
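
As a rough illustration of why collection size matters, the pairwise error
rates quoted above can be turned into an expected number of accidental
matches.  The sketch below is not part of FWKCS.  It assumes that every pair
of distinct files is compared independently, it takes 2**-32 as an idealized
rate for a 32_bit CRC, and the collection size of one million files is a
hypothetical figure chosen only for the arithmetic.

```python
def expected_false_matches(n_files, pairwise_rate):
    """Expected number of accidental matches among all distinct pairs."""
    pairs = n_files * (n_files - 1) // 2
    return pairs * pairwise_rate


N_FILES = 1_000_000  # hypothetical collection size (assumption)

RATES = {
    "32_bit CRC (idealized 2**-32)": 2.0 ** -32,
    "short cs (1 part in 1.0 E+13)": 1.0e-13,
    "long cs  (1 part in 1.0 E+51)": 1.0e-51,
}

for label, rate in RATES.items():
    print(f"{label}: {expected_false_matches(N_FILES, rate):.3g}")
```

Under these assumptions the CRC alone would be expected to give on the order
of a hundred accidental matches, the short cs about 0.05, and the long cs a
number too small to matter.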

This enhanced statistical resolution permits FWKCS to serve the needs of the
bulletin board community, makes the FWKCS contents_signature a more powerful
tool for identifying and protecting intellectual property, and can assist in
using FWKCS in situations involving larger statistical bases.

With this enhanced statistical resolution, FWKCS continues to provide
split_second lookup for finding matching contents_signatures.

Background
~~~~~~~~~~
FWKCS is the premier system for automatically recognizing duplicate files
and duplicate zipfiles, independent of filename.  It is used on major
electronic bulletin board systems; quality control is backed by more than
5,000,000 node_hours on giant systems, through which have passed copies of
a large fraction of all the different shareware zipfile products that our
civilization has seen.

When used on a network or in a multitasking environment, FWKCS can provide
24_hour_a_day operation with no down_time for maintenance -- all normal
maintenance operations, including consolidating the data base etc, are
transparent to users; they can be done while the system is up and running and
handling traffic. It is even possible to search the system and rebuild the
data base while the system is up and running.

Series 2.nn includes features to help you protect your system from becoming
involved in software piracy.  For ease in updating, the anti-piracy resource
material for use with FWKCS is being distributed in a companion series,
FWKCXnnn.ZIP; that series started with FWKCX001.ZIP, issued 1995 Jan 16
(note: that series number is not tied to the FWKCS version number).  This
resource material allows you to use powerful FWKCS features for the automatic
recognition and automatic blocking of files, independent of filename.  For
where to get the most recent release, see "Note 2:" near the end of
README.TXT.

The executable code needed is provided in this package (FWKCS204.ZIP).
See especially XCLEANUP.BAT and FLAG_REV.BAT (both are automatically
installed in your \CS directory; for on_line help in your \CS directory,
while in that directory do  CSM <enter> ).

------------------------

For current users, below is a summary of what is new or changed in FWKCS(TM)
Version 2.04.  This release includes a program (REPLACE.BAT) which lets you
replace your existing, working version of FWKCS, Ver. 1.12 or later, while
keeping your working CS lists, logs, special messages, and configuration.

Changes in FWKCS.EXE:
~~~~~~~~~~~~~~~~~~~~~
1. To control the generation and use of the original contents_signature and
   the new, longer contents_signature, the following commands have been
   added, where N = 0...5, M = 0...15 :

  internal variable setting FWKCS N.M /dcs (see CS line on  FWKCS /d!  screen)

      environment variable FWKCSL=N.M (if not (0...5).(0...15), then ignored)

                  preface option &N.M

 Auxiliary Function 2...6 option &N.M

   In each case, for N.M

      N is bit mapped, 0...5 :

         0 - compatibility mode (default).
         1 - make long cs for plain files.
         2 - if long cs input, then require a long cs for match.
         4 - use only short cs for match even if long cs input.

              For example, "3" means do 1 and do 2.

      M is bit mapped, 0...15 :

         0 - compatibility mode (default).
         1 - if long_cs data found in zipfile, make long cs.
         2 - if missing, prepare long_cs data (+ measure filelengths).
         4 - prepare data whether or not missing.
         8 - revise central directory of zipfile.

         Notes: 2+4 is allowed, but component 4 automatically overrides
                component 2; e.g., 6 has same effect as 4, 14 has same effect
                as 12;

                FWKCS /1 may call PKUNZIP (using whatever name is specified
                in line "6." on the FWKCS /d screen) to unzip files;

                if (2 or 4) and not 8, then 1 is automatically enabled;

                a zip_encrypted (PKZIP option -s) file is not processed to
                make a long cs;

                if a zipfile contains one or more entries for which a long
                cs is not provided, then its zipfile_contents_signature is
                short rather than long;

                if on analysis by FWKCS a zipfile appears defective, FWKCS
                may still prepare long cs's, but in the case of single
                zipfiles returns errorlevel = 7;

                for making the composite long_zipfile_contents_signature,
                each file with 32_bit_CRC=0 and uncompressed_filelength=0
                is skipped; if every file in a zipfile has CRC = 0 and
                uncompressed filelength = 0, the long zipfile contents
                signature includes the MD5 hash for an empty file (this
                avoids treating the non_zero definition for the MD5 nul as
                cumulative); MD5 hash is written in the style set by its
                originators: the lowest byte is at the left, broken into
                its high hexadecimal "nibble" (a "nibble" is 4 bits)
                followed by its low hexadecimal "nibble", then the next
                higher byte is written in the same style (high hexadecimal
                nibble, then low hexadecimal nibble), etc; the long zcs
                uses the MD5 128_bit numbers summed mod 2^128 before being
                converted to an MD5_style string of hexadecimal characters;

                when revising the zipfile central directory to insert MD5
                data, FWKCS also inserts the measured uncompressed file
                length (to avoid tampering);

                when processing and revising zipfiles, FWKCS can process
                zipfiles which contain files which have DOS filenames, long
                filenames, filenames which contain gaps, and filenames with
                multiple "." (e.g., OS/2, Unix), including zipped paths each
                of which can contain up to 127 levels of subdirectories;

                if the zipfile has a Zipfile Authenticity Verification stamp,
                its AV stamp is preserved.

         For example, 1.11 means make long cs for plain files, make long
         cs's for zipfiles if missing and revise the zipfiles, use long
         cs's of plain files and of files in zipfiles, and allow match
         testing against long or short cs (if both types are compared,
         statistical resolution is limited by shorter cs).
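
   The bit mapping for N.M can be sketched as follows.  This is only an
   illustration of the rules listed above, not FWKCS code; the function
   name decode_nm and the shortened action texts are hypothetical, and
   reading "N.M" as two decimal integers separated by "." is an assumption.

```python
N_BITS = {
    1: "make long cs for plain files",
    2: "if long cs input, require a long cs for match",
    4: "use only short cs for match even if long cs input",
}
M_BITS = {
    1: "if long_cs data found in zipfile, make long cs",
    2: "if missing, prepare long_cs data",
    4: "prepare data whether or not missing",
    8: "revise central directory of zipfile",
}


def decode_nm(setting):
    """Return (n_actions, m_actions) for 'N.M', or None if out of range."""
    try:
        n_text, m_text = setting.split(".")
        n, m = int(n_text), int(m_text)
    except ValueError:
        return None
    if not (0 <= n <= 5 and 0 <= m <= 15):
        return None  # outside (0...5).(0...15)
    if m & 4:
        m &= ~2  # component 4 overrides component 2
    if (m & 6) and not (m & 8):
        m |= 1  # (2 or 4) and not 8: component 1 is enabled
    n_actions = [text for bit, text in N_BITS.items() if n & bit]
    m_actions = [text for bit, text in M_BITS.items() if m & bit]
    return n_actions, m_actions


print(decode_nm("1.11"))
```

   With these rules, decode_nm("0.6") gives the same result as
   decode_nm("0.4"), and decode_nm("0.14") the same as decode_nm("0.12"),
   matching the notes above.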

2. A long zipfile contents signature is supported, generated in a way which
   avoids treating the non_zero MD5 hash for a zero_length file as cumulative
   (see 'Zipfile_Contents_Signature ("zcs")' via the FWKCS204.REF Table of
   Contents).
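
   A minimal sketch of the MD5 part of this rule is given below.  It is
   not the FWKCS assembly_language code and it covers only the MD5 sum,
   not the CRC or filelength parts of the signature.  It assumes that
   "lowest byte at the left" means each 128_bit hash is read as a
   little_endian number before summing; the function name is hypothetical,
   and the file contents are passed in directly rather than read from a
   zipfile.

```python
import hashlib
import zlib

MOD = 1 << 128


def md5_sum_component(file_contents):
    """Sum per-file MD5 values mod 2**128; return an MD5_style hex string."""
    total = 0
    counted = 0
    for data in file_contents:
        if zlib.crc32(data) == 0 and len(data) == 0:
            continue  # entries with CRC = 0 and filelength = 0 are skipped
        digest = hashlib.md5(data).digest()
        # Lowest byte is at the left, so read the digest little_endian.
        total = (total + int.from_bytes(digest, "little")) % MOD
        counted += 1
    if counted == 0:
        # Every entry was skipped: use MD5 of an empty file, once.
        return hashlib.md5(b"").hexdigest()
    # Write lowest byte first, each byte as high nibble then low nibble.
    return total.to_bytes(16, "little").hex()


print(md5_sum_component([b"", b""]))
```

   Because empty entries are skipped, adding zero_length files to a
   zipfile does not change this sum, and the result does not depend on the
   order of the entries.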

3. New function, /1z- , to remove MD5 data from zipfile central directory.

4. When a long cs is present, the "Column_17" flags appear instead in
   column 50.

5. Sorting CSLIST on flags, filenames, etc, with mixed long and short cs's:
   /s option A - adjust key pointers as needed to allow for different cs
      lengths: the key positions are defined as for the original cs's, and
      those which fall on column 17 or later are automatically shifted when
      a long cs is found.

   For example, to sort on filenames,

                   FWKCS CSLIST1.SRT OUTFILE /sa18:12
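
   The column adjustment can be pictured with the short sketch below.  It
   is an illustration only.  It assumes that "18:12" means a key starting
   in column 18 with a length of 12 characters, it uses the shift of 33
   columns implied by column 17 moving to column 50, and it leaves the
   question of how a long cs is recognized to the caller (the is_long
   argument is hypothetical).

```python
LONG_CS_SHIFT = 33  # column 17 for a short cs becomes column 50


def sort_key(line, start_col, length, is_long):
    """Extract a key; columns are 1-based and defined as for a short cs."""
    col = start_col
    if is_long and start_col >= 17:
        col += LONG_CS_SHIFT  # keys at column 17 or later are shifted
    return line[col - 1:col - 1 + length]


# Synthetic lines: filler for the cs, a flag, then a 12-character name.
short_line = "0" * 16 + "X" + "FILE.ZIP    "
long_line = "0" * 49 + "X" + "FILE.ZIP    "

print(repr(sort_key(short_line, 18, 12, is_long=False)))
print(repr(sort_key(long_line, 18, 12, is_long=True)))
```

   Both calls pick out the same filename field, so lines with either kind
   of cs can be ordered on one key definition.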

6. Revised code to support scanning for duplicates and making MULTCNT.RPT,
   /1sm , with mixed long and short cs's; can override with N=4 in /&N.M/ or
   in environment variable FWKCSL=N.M

7. New function /c4 , to convert a contents_signature list containing
   long_cs's and short_cs's to all short_cs's; the named outfile is deleted
   if there is no output:

                   FWKCS CSLIST.SRT /c4 OLDCS

   Note that the output file should be sorted before being indexed.

   If you have multiple lines with different column_50 flags and the same
   contents_signature, you may want to separate out those lines which contain
   a j, k, l, or r flag in column_50 before using /c4, process them
   separately, append them to the main processed file using FWKQA, and then
   sort using  FWKCS filename /s  .

8. New option  c  for use with function f (Find) or g (Get), to test only the
   32_bit CRC (first 8 characters) for a match; the input can be as short as
   8 characters (the hexadecimal representation for a 32_bit CRC).

9. New option  c  under function /c2 : /c2c tests only the 32_bit CRC (first
   8 characters) for a match; the input can be as short as 8 characters (the
   hexadecimal representation for a 32_bit CRC).

10. FWKCS checks to see if the computer supports 32_bit code; if so, it uses
    32_bit code where appropriate for generating contents_signatures.

11. New preface option /&p , to suppress 32_bit code.

12. Modified function /A7.2, so that /A7.2!! can be used to divide a CSLIST
    containing both long and original contents_signatures into two separate
    files each containing only one kind of contents_signature. For any other
    character used as a flag, the search automatically adjusts for the
    presence of a long cs, so that flags in column 17 for original cs's and
    column 50 for long cs's are treated as equivalent.

13. When processing zipfile uploads and using the new enhanced cs, FWKCS can
    now provide virus testing for files which have DOS filenames, long
    filenames, filenames which contain gaps, and filenames with multiple "."
    (e.g., OS/2, Unix), including zipped paths each of which can contain up
    to 127 levels of subdirectories; multiple files with the same name but
    different zipped paths can be processed, without permitting one file to
    overwrite and block the virus testing of another file with the same name.

14. New format option for function /A7.8 : w1 option for single space
    listing of filenames, ELSE doublespace.

15. The "swap" commands  #; swap all,  #luvz; swap if List Unzip scanV Zip,
    long available under Auxiliary Functions 2-6, are now available as
    "preface options" which can be specified before any family of non_preface
    functions.  Note that the swap command cannot appear as the first preface
    option, because the combination "/#" is reserved for other use. However,
    the default time value for tN is 3 seconds, so you can use the preface
    option combination "/t3#;" without affecting the t value.

16. New option  0  for the reVise function, v0 instead of v , to keep only
    the first entry, each, for an unflagged short cs with CRC=0 Filelength=0,
    and for an unflagged long cs with CRC=0 Filelength=0 MD5=MD5(nul); this
    also applies to Concurrent revision using /tNNNv0cNNNNN.

17. Added FWKCS header line, and registration, to messages prepared under
    Auxiliary Functions 5 and 6 (BBS); added new option &wN N=0 1 2,
      0 put statement at top of FWKCS mid_message
      1 put at top of (composite) output
      2 put at bottom of (composite) output
    0 is default, other values are enabled in registered copy.

18. Modified code re registration key, to accept a registration name up to 64
    characters long (spaces count as characters); it still accepts the prior
    keys.

19. Added ability to find zipfile central directory information in the
    presence of a variety of pointer errors; this permits processing of
    "VENDINFO.DIZ" files, which, as of this writing, do not comply with the
    published standards for zipfile central directory structure and contents.

20. Added option v7 under Auxiliary Functions 4 - 6, to not count VENDINFO.DIZ
    files found inside zipfiles as zipfile_in_zipfile.

21. Added option v7 under /1 commands, to treat VENDINFO.DIZ as nonzip.

22. Added z7 option under Auxiliary Functions 3 - 6, so that if a filename
    has the extension .ZIP, then the file is required to be a zipfile.
    For Auxiliary Functions 5, 6, added in &a; (how to treat files ATTACHed
    to messages (e.g., PCBoard 15.0 or later)) a corresponding option z7, to
    retain any z7 restriction used for non_ATTACHed files (default is to drop
    any such restriction when processing an ATTACHed file).

23. Added /1 options re committing output file:
    o - commit Output of contents_signature lines, use default buffer size
        (networks)
    o1 - like o, but write cs output promptly for each processed file or
        zipfile.

24. Modified /1 option m to report an overflow when more than 4157 matches
    are found in a single case; if that occurs, mNNN can be used to capture
    samples, with the ability to display a much larger number of matches
    at the end of the sample line.

25. Added option e for index command /i, to Evaluate contents_signature
    format for both short cs and long cs; when e not used, /i can make an
    index for a list containing long cs, short cs, and 32_bit CRC.

26. Added four exit errorlevels:

      90 - successive lines not in ascending ASCII order.
      91 - bad contents_signature.
      92 - line too long.
      93 - line too short.

    Note: if sorting using option A (see 5, above), and errorlevel 93 is
    generated when a line is too short for a keyed sort, the value reported
    for the minimum required line length is that for a line using a short cs;
    add 33 (decimal) when the line starts with a long cs.
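
    The kind of test behind errorlevels 90, 92, and 93 can be sketched as
    below.  This is an illustration, not the FWKCS logic: the length limits
    are supplied by the caller because the actual limits are not stated
    here, the order of the tests is an assumption, equal successive lines
    are assumed to be acceptable, and the contents_signature format test
    (errorlevel 91) is left out.

```python
def check_list(lines, min_len, max_len):
    """Return 0 if the list passes, else 90, 92, or 93 as listed above."""
    previous = None
    for line in lines:
        if len(line) > max_len:
            return 92  # line too long
        if len(line) < min_len:
            return 93  # line too short
        if previous is not None and line < previous:
            return 90  # successive lines not in ascending ASCII order
        previous = line
    return 0


print(check_list(["AAA", "AAB", "AAB"], min_len=3, max_len=10))
```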

27. Added /1 display option:

      g onGoing display of filecount + current d:\path\filename.ext.

28. Fixed a bug in keyed sorting when not_last key open or longer than line.

29. Fixed a bug which, apparently under certain circumstances under a certain
    network driver, could result in accidental deletion of files in the \CS
    directory.

30. Modified code to avoid a shift of file date when deleting spurious tails
    from .GIF files on HPFS drive under OS/2 Warp 3.

31. Various minor changes.

Changes in FWKCSC.COM:
~~~~~~~~~~~~~~~~~~~~~~
1. Added test for UPLOAD at second position on command tail when FWKCSC is
   running as a bulletin board system client under an FWKCS host and there
   has not been a timely reply from host:

     IF UPLOAD
     AND a directory has been designated for storing unchecked uploads
     AND running as BBS client
      THEN move the input file to that target directory
           copy any file description to a companion target directory (with
             ".D" added to directory name) and use the same name of the file
             it describes for the copy of the description

2. Various minor changes.

Changes in FWKDG.COM:
~~~~~~~~~~~~~~~~~~~~~
1. Added  v  Verbose text, to capture full entry from text descriptions in
   directories DIR0...DIR999 made in same format used in Clark Development
   Company's PCBoard, including multiline file descriptions. (added to
   support new FWKFF.COM, below; for an example of how to make a search file
   for a BBS, see "Using CRLF0, FWKCS, FWKFF, and FWKDG together" (without
   the quotes) in FWKCS204.REF)

2. Added support for text directory names listed on command line, wildcards
   OK; can use d:\path\.

   Format: FWKDG (drive (lastdriv)) /tNNNoption ((d:\path\)textdir) ...)

   The number of text directory identifiers (each of which can contain wild
   cards * ?) which can be listed on the command line is limited by the DOS
   command line length of 127 characters.

   If a text directory is specified on the command line, the default of
   searching text directories DIR0 - DIR999 in the current directory is
   suppressed; to also search the default text directories, use "," (without
   quotes) as an entry on the command line; to suppress search of any text
   directory, put " ." alone after the /option.

   To suppress searching drives, put a "." (without quotes) before the
   /option, instead of a drive letter.

3. If on network, FWKDG opens text directory files "read_only deny_none";
   tNNN option specifies how long FWKDG tries to access a text directory
   file not known to be missing.

4. Various minor changes.

Changes in FWKFT.COM:
~~~~~~~~~~~~~~~~~~~~~
1. Can now directly open file listed on command line for input, and in that
   mode can support network read_only deny_none (it can still accept
   redirected input).

2. Various minor changes.

Changes in FWKQA.COM:
~~~~~~~~~~~~~~~~~~~~~
1. Added network feature of opening files "deny_none".

2. Various minor changes.

Changes in other .COM programs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Various minor changes in FWKHI.COM, DSA.COM, FWKCSS.COM, FWKCST.COM,
CRLF0.COM ("FWKCRLF0(TM)"), FWKEM.COM, FWKM.COM, and FWKLW.COM.

New .COM program:  FWKFF.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FWKFF(TM) provides conveniently fast, case sensitive search, starting in the
left column, of a sorted (ascending) ASCII file, where each line ends with
"carriage_return line_feed" (hexadecimal 0d,0a).  Although FWKFF does not
provide the phenomenal speed of FWKCS's contents_signature search, this small
program fits well with human_interface applications, and does not require any
index.

   Format:  FWKFF (/options) "search string" (<) INFILE (>(>) OUTFILE )

   For use with sorted (ascending) ASCII text files (line terminator = 0d,0a).
   To include quote " in search string, use "" .
   Comparison for match starts in left column.
   Comparison is case sensitive.

   When opening file directly in network or multitask environment,
   FWKFF opens INFILE 'read_only deny_none'.

   options:
      * - show this help screen; set errorlevel = 99 decimal.
      c - Capitalize search string before searching (won't find lower case
            item).
      f - text block begins Flush left (indented lines included in text
            block).
    ver - set exit errorlevel per version number sans ".".

   exit errorlevel:
     0 - match found.
     1 - match not found.
    99 - see help screen.
    re system error: exit errorlevel = DOS error + 100 decimal.
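
   The idea of searching a sorted file from the left column without an
   index can be sketched with a binary search.  The sketch below is not
   the FWKFF code.  It works on a list of lines already held in memory,
   returns every line that starts with the search string (whether FWKFF
   reports one line or several is not stated here), and reuses only the
   exit errorlevels 0 and 1 from the table above.  The function name is
   hypothetical.

```python
import bisect


def find_prefix(sorted_lines, search, capitalize=False):
    """Return (errorlevel, matches) for lines that start with `search`."""
    if capitalize:
        search = search.upper()  # like option c: lower case items not found
    # In an ascending list, all lines starting with `search` are adjacent
    # and begin at the first line that is not less than `search`.
    index = bisect.bisect_left(sorted_lines, search)
    matches = []
    while index < len(sorted_lines) and sorted_lines[index].startswith(search):
        matches.append(sorted_lines[index])
        index += 1
    return (0 if matches else 1), matches


lines = ["ALPHA.ZIP  C:\\A\\", "BETA.ZIP   C:\\B\\", "beta.txt   C:\\B\\"]
print(find_prefix(lines, "BETA"))
```

   The comparison is case sensitive, so in this made_up list "BETA" and
   "beta" find different lines.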

FWKFF is especially convenient for use with DIRGUIDE.TXT, routinely prepared
using FWKDG and FWKCS.  (DIRGUIDE.TXT provides a convenient list of all the
files on a system, their d:\path\, and the identification of the respective
text directories DIR0-DIR999, if present and suitably formatted, in which
they appear.)

FWKFF can also be used to search a sorted text file containing multiline
entries where each text block starts flush left and the rest of the lines in
that text block are indented; e.g., multiline file descriptions used with
Clark Development Company's PCBoard. (for a detailed example, see
"Using CRLF0, FWKCS, FWKFF, and FWKDG together" (without the quotes) in
FWKCS204.REF)  The new GT.BAT (below) uses FWKFF.COM.

Search time reported for finding a file in a 3.79 Meg copy of DIRGUIDE.TXT,
using a 486, was less than half a second; running the same test with an 8088,
the reported search time was less than 2 seconds.

This simple search program can be used with many different sorted files.

New .COM program:  FWKTLCSL.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Special utility to truncate a file at end of last " cs " line. (for
   S_REVCSL.BAT crash recovery)

New .COM program:  FWK1D21.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Special utility to convert long_cs column_17 structure flag 1Dh to 21h,
   or vice versa; e.g., for sending long_cs data in email.

New .COM program:  FWKFACC.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Utility to test file accessibility, e.g., for use in batch programs in a
   network or multitasking environment. If file is not accessible, FWKFACC
   returns the extended DOS errorlevel, which can be used to control
   branching in .BAT programs.

New or changed .BAS and .BAT programs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See item 6, below, about change in calling S_REVCSL.BAT.

1. New: SET_CS.BAT, to set internal defaults for contents_signature length
   and usage.

2. New: BLOKSORT.BAT, to sort single_line or multiline text block files,
   where the first line of each multiline block of text starts flush left,
   and all other lines in each block are indented at least one space (every
   line must end with 0d,0a); shipped with maximum multiline block size =
   2000 bytes; can be set to up to 4000 bytes; maximum filesize is limited by
   various operating systems to slightly less than 2, or slightly less than
   4, Gigabytes, drive space permitting.
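
   The block structure described here can be illustrated with a short
   sketch.  It is not BLOKSORT.BAT; it sorts a list of lines held in
   memory, ignores the block size and filesize limits, and treats any
   line that begins with a space as a continuation of the block above it.
   The function name is hypothetical.

```python
def sort_blocks(lines):
    """Sort text blocks: a flush_left line plus its indented followers."""
    blocks = []
    for line in lines:
        if line.startswith(" ") and blocks:
            blocks[-1].append(line)  # indented: continues the current block
        else:
            blocks.append([line])    # flush left: starts a new block
    blocks.sort()                    # compares first lines, then followers
    return [line for block in blocks for line in block]


sample = [
    "ZEBRA.ZIP  last entry",
    "  second line of ZEBRA description",
    "APPLE.ZIP  first entry",
    "  second line of APPLE description",
]
for line in sort_blocks(sample):
    print(line)
```

   Each indented line stays attached to the flush_left line above it, so
   a multiline description moves as a unit.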

3. New: GT.BAT, template for batch program to get text re file location
   and/or text description. This template should be edited to set the correct
   d:\path for where the files DIRGUIDE.TXT, FILEDESC.SRT, and (optional
   file) ALL_F.SRT are stored. GT.BAT uses the new FWKFF.COM (above), and may
   use FWKCS and CRLF0.

4. New: MISSED.BAT, to find and test zipfiles which lack long_cs data.

5. Modified REVCSL.BAT and S_REVCSL.BAT, so that in transferring unique
   signatures from the prior data base, the zipfile contents signature
   ("zcs") of a zipfile which contained only one file is dropped in favor of
   the contents signature of the file it contained. Added support for long
   contents_signatures.

6. Modified S_REVCSL.BAT, to support use of long contents_signatures; added
   crash recovery, to resume building new data base after interruption (uses
   new FWKTLCSL.COM). S_REVCSL.BAT now has two input digits on the
   command line. The new first digit specifies buffering for output, using
   the new FWKCS network commit option.

7. Modified CSAMACS.BAT and SETFWKCS.BAT to automatically install

     #v; - (swap most of FWKCS.EXE out of memory when running virus test
       programs) a default setting in macro [x]. Virus test programs have
       been getting bigger, and may not be able to use memory above 1 Meg
       because BBS software may already be using it.

     v7 - do not count VENDINFO.DIZ files as zip in zip.

     z7 - if file has .ZIP extension, the file is required to be a zipfile.

   If you wish to remove  #v; , v7 , or z7 from macro [x],
     run GET_DFLT in your \CSA directory to create PUT_DFLT.BAT,
     search PUT_DFLT.BAT for the first line which ends with dX (it starts
       with "FWKCS"),
     remove the triplet "#V;" (without the quotation marks), and/or
     remove the pair "V7" (without the quotation marks), and/or
     remove the pair "Z7" (without the quotation marks), and
     run the modified copy of PUT_DFLT.BAT.

8. Modified GET_DFLT.BAT, so that if it finds a prior PUT_DFLT.BAT in the
   current directory, it renames prior PUT_DFLT.BAT as PUT_DFLT.OLD ...
   PUT_DFLT.OL7.

9. Modified CSM.BAT and CSAM.BAT on_line help menus, to add new material,
   etc.

10. Numerous changes, for compatibility with changes and new features in
    FWKCS.EXE described above.

11. Corrected a bug in FINISH.BAT which in some cases reduced execution
    speed.

12. Various other changes.

Changes in docs:
~~~~~~~~~~~~~~~~
1. Various changes re the new long cs, new options, etc.


Notes:
~~~~~~
1. The remote lookup functions, including Rcrosref, are available in a
   relatively small kit, FWKLU204.ZIP, released 1995 Aug 30.

   Most of the remote lookup functions (but without Rcrosref), are available
   in a special, even smaller kit, FWKLZ204.ZIP, released 1995 Aug 30.
   FWKLZ204.ZIP does not require registration.

   If you run a BBS, you may wish to get FWKLU204.ZIP and FWKLZ204.ZIP for
   your users, especially if your BBS is a "feeder BBS" and many of your
   users are other BBS's.  The kits come with instructions, and FWKLU204.ZIP
   contains a short bulletin, FWKLU204.BLT, suitable for posting.

2. The longer form of FWKCS contents_signature includes the 32_bit CRC,
   the uncompressed file length, and the "MD5" hash:

   Thanks, to   R. Rivest, of MIT Laboratory for Computer Science
                and RSA Data Security, Inc., for introducing the MD5
                algorithm and placing it in the public domain. (see
                RFC1321, April 1992, including the statement, "The MD5
                algorithm is being placed in the public domain for
                review and possible adoption as a standard.").

                FWKCS uses an algorithm which generates the 128_bit
                "MD5" hash. Noting also the work of Colin Plumb (1993),
                there are at least four different logical sequences
                which satisfy the truth table for generating an MD5
                hash; the one used here is different from, and faster
                than, the one provided by Rivest. Note is made also of
                work by Ray Gwinn (1995). The high_speed 32_bit and
                16_bit embodiment for the algorithm used in this
                application is by Fred Kantor.

                To the extent that the code used in FWKCS.EXE may,
                directly or indirectly, be derivative of the C program
                copyrighted (1991) by RSA Data Security, Inc. ("RSA"),
                please note RSA's public statement,

                  'License to copy and use this software is granted
                   provided that it is identified as the "RSA Data
                   Security, Inc. MD5 Message-Digest Algorithm" in all
                   material mentioning or referencing this software or
                   this function.'

                  'License is also granted to make and use derivative
                   works provided that such works are identified as
                   "derived from the RSA Data Security, Inc. MD5
                   Message-Digest Algorithm" in all material mentioning
                   or referencing the derived work.'

