       BIGSORT V4.1: A Fast in-memory sort for files of any size.
          ----------------------------------------------------
                         User Supported Program
                Continued use requires a donation of $20


(C)1988-93 Turgut Kalfaoglu <turgut@frors12.bitnet>,<turgut@frmop11.bitnet>


BIGSORT uses the fastest known sorting algorithm to sort files that can 
be as large as your swapping area (not RAM) allows. A wide range of 
options along with multiple key fields enable you to pinpoint the 
desired sorting method.

BIGSORT is especially well suited for use in batch files, and for being 
called from other programs. It returns specific error codes, and never 
prompts for verification or additional information. Because it always 
uses the defined (primary) collating sequence for your country, BIGSORT 
places your national characters in the correct order.

Unless the /Verbose option is specified, BIGSORT never writes its 
messages to the standard output, so that they do not end up in an output 
file. Its messages are directed to "standard error" instead. Under OS/2, 
it is possible to redirect stderr to a file, if desired.

This program is shareware: Registration allows you to stay up-to-date on 
enhancements to the product, and enables you to purchase the source 
code.


Usage:

BIGSORT [options] < inputfile > outputfile

If you omit the '< inputfile' part, BIGSORT will wait for input from 
the keyboard. If that is what you wish, enter the data, ending each 
line with the RETURN key, then enter CTRL-Z to finish the entry.

If you omit the '> outputfile' part, BIGSORT will send its output to the 
screen.

For some online help, type
      BIGSORT HELP


The normal usage of BIGSORT is either through OS/2 "pipes", or through 
redirection. Pipes enable a program's output to be sent as input to a 
second program. This is specified by using the "|" symbol between the 
two programs. Redirection is similar: it allows the output of a command 
to be sent to a file, instead of being displayed on the screen. The 
">" symbol indicates that the output should be sent to the file, not to 
the screen. Note that the > symbol causes the previous contents of the 
file to be lost. The >> symbol can be used to append to the previous 
contents.
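
The filter behavior described above (data on standard input and 
standard output, messages on standard error) can be sketched in a few 
lines of Python. This is only an illustration, not BIGSORT's own code: 
the function name sort_stream is made up for this example, and the sort 
uses Python's plain character order rather than a country collating 
sequence.

```python
import io


def sort_stream(inp, out, err):
    """Read lines from inp, write them sorted to out, report on err."""
    lines = inp.read().splitlines()
    lines.sort()  # plain code-point order, NOT a national collating sequence
    for line in lines:
        out.write(line + "\n")
    # Messages go to err, so they never land in a redirected output file.
    err.write(f"{len(lines)} lines sorted\n")


# In-memory stand-ins for "< inputfile", "> outputfile" and stderr.
data_in = io.StringIO("pear\napple\nfig\n")
data_out = io.StringIO()
messages = io.StringIO()
sort_stream(data_in, data_out, messages)
```

In a real filter the three arguments would be sys.stdin, sys.stdout and 
sys.stderr, so that pipes and redirection work as described.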



Options
-------

    Use options to change the default behavior of BIGSORT, which is:
        * Start sorting on the first position of each line,
        * Do a case-sensitive alphanumeric sort,
        * Reserve room for 100,000 lines of input file. (Lines, not 
          bytes).

    If you wish to use multiple options, you need to separate them by 
spaces. The options available with this version are:
  
  /+nnn   where nnn is a number, causes BIGSORT to start sorting from 
          that column. If omitted, BIGSORT sorts the file starting from 
          the first character.
          
  /+nnn-mmm where nnn and mmm are the column numbers, causes the program 
          to focus only on the area between those two columns. This 
          option can be repeated as many times as necessary to specify 
          secondary sort keys. See the section on multiple ranges.
            
  /R      Reverses the sort order. If this option is specified, the 
          field is sorted in descending order.

  /Ds     Specifies the symbol used as the delimiter in date fields. 
          's' can be either a dash "-", a slash "/", a period "." or 
          nothing. If "/D " is specified, BIGSORT assumes that the 
          digits are attached to each other, like 19921220, to 
          specify a date of Dec 20th, 1992. Default: /D-
          
  /I      Ignore case. Without this option, A comes before a, and Z comes
          before a. Use this option to prevent this.

  /MMDDYY The field is a date field, in the format of MM-DD-YY. Unless 
          the /D option is specified, BIGSORT assumes that dashes 
          separate the digits.
  /DDMMYY Similar - the date field is in DD-MM-YY format.
  /YYMMDD Similar - the date field is in YY-MM-DD format.
  
  /Snnnn  Specifies the index size for the file. One index entry per 
          line in your file is needed to load the file. Normally 
          BIGSORT reserves room for 100 thousand lines. If the number 
          of lines (or records) in your file is more than 100 thousand, 
          you need to specify the /S option. For example, if your file 
          is 200 thousand lines, and you expect it to grow, you may tell 
          BIGSORT to reserve room for 500 thousand:  /S500000

  /N      Indicates that the field specified is a numeric field, thus a 
          numeric comparison should take place. This prevents leading 
          blanks and other characters from interfering with the sorting 
          order of numbers. If /N is specified, BIGSORT compares the 
          field contents after converting them to floating-point 
          numbers. This ensures accurate sorting of numbers with decimal 
          places.
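
To see why a numeric comparison matters, here is a small Python sketch 
that sorts the same three fields first as text and then as 
floating-point numbers. It only illustrates the idea behind /N. The 
name numeric_key is made up for this example, and how BIGSORT itself 
treats fields that are not valid numbers is not covered here (the 
sketch simply fails on them).

```python
def numeric_key(field):
    """Convert a field to a float; float() ignores surrounding blanks."""
    return float(field)


values = ["100", " 25", "9.5"]

# Character-by-character comparison: the blank sorts first, "1" before "9".
as_text = sorted(values)

# Numeric comparison: 9.5 < 25 < 100, regardless of leading blanks.
as_numbers = sorted(values, key=numeric_key)

print(as_text)
print(as_numbers)
```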
          

Specifying Ranges
-----------------

Ranges limit the area that BIGSORT focuses on. For example, if you 
wanted to sort the files displayed with the OS/2 DIR command, based on 
the file sizes, you could tell BIGSORT to sort the data based on the 
information contained between columns 17 and 27. The option for that 
would look like:  "/+17-27".
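
As a rough picture of what a range selects, the Python sketch below 
cuts a column range out of one line of DIR output. It assumes that 
columns are counted from 1 and that both ends of the range are 
included; the helper name field is made up for this example.

```python
def field(line, start, end):
    """Return columns start..end of line (1-based, both ends included)."""
    return line[start - 1:end]


# One DIR line, assembled so that each part lands in a known column:
# date in columns 2-9, time in 12-16, size right-aligned up to column 26,
# EA size in column 38, name from column 41.
line = " 12-22-92  7:14p" + "5638".rjust(10) + "0".rjust(12) + "  BIGSORT.BAK"

size_field = field(line, 17, 27)   # what "/+17-27" would look at
name_field = field(line, 41, 80)   # a range may run past the end of a line
date_field = field(line, 1, 9)

print(repr(size_field), repr(name_field), repr(date_field))
```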

BIGSORT parses options from left to right, changing its internal 
defaults as it goes along. When it comes upon a range field, it records 
the current setting of such options as "/R", "/I", "/N" and the 
date-related options, for that range. It then resets these options to 
the defaults, to enable you to construct a completely new set of rules 
for your second sort field specification, if any.

Let's see this with an example. Let's try to sort DIR's output on 
creation date, and then on the name. The idea is to display the files in 
chronological order, with files created on the same day sorted by name 
among themselves.

Here is a ruler line, followed by a sample output of the DIR command:
12345678901234567890123456789012345678901234567890

  9-11-92  8:01p     <DIR>         922  .
  9-11-92  8:01p     <DIR>         690  ..
 12-22-92  7:14p      5638           0  BIGSORT.BAK
 12-22-92  7:42p      5267           0  BIGSORT.C
 12-22-92  7:14p       869           0  BIGSORT.H
 12-22-92  8:44p      5167           0  BIGSORT.TXT
 12-22-92  7:12p      3004           0  COMPARES.C
 (...)


The command to give to BIGSORT to sort the above list would be:

DIR | BIGSORT /MMDDYY /d- /+1-9 /i /+41-80 > myfile.output

Let's analyze this command. When OS/2 "sees" this line, it first erases 
and opens "myfile.output", which will store the results of the 
operation. It then runs the "DIR" command, passing its output to BIGSORT 
with all the parameters.

When BIGSORT starts, it defaults to using its case-sensitive 
alphanumeric sort. The first option changes this to a date sort. The 
second option specifies that the month, day and year digits are 
separated by dashes, which is the default, by the way. When we specify 
the range, "/+1-9", our /MMDDYY option is recorded as the desired sort 
method for the first range, and BIGSORT resets the sort method to 
case-sensitive alphanumeric sort. Now it reads the "/I" option and 
switches its current sort method to case-insensitive alphanumeric sort. 
When it reads the "/+41-80" range specifier, it records that our second 
field selection should be sorted using an alphanumeric, but 
case-insensitive sort.

When BIGSORT is done, it sends the result to "standard output" which has 
been redirected to a file with the "> myfile.output" part of the 
command. 

Each time a range is specified, the specified options are recorded for 
that range, and the options are reset. Thus, in some cases you may have 
to specify the same option several times on the command line. For 
example, to sort on the last name, then on the first name, both using a 
case-insensitive sort, you need something like:

        TYPE myfile | BIGSORT /i /+10-25 /i /+27-40
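
The record-and-reset rule can be sketched in Python as follows. This is 
one reading of the description above, not BIGSORT's actual parser: the 
names parse_options, DEFAULTS and the dictionary keys are made up for 
this example, option letters are assumed to be accepted in either case, 
and /S, /Verbose and options given after the last range are left out.

```python
import re

# Settings that are recorded per range and then reset (per the text above).
DEFAULTS = {"reverse": False, "ignore_case": False, "numeric": False,
            "date_format": None, "delimiter": "-"}
DATE_FORMATS = {"/MMDDYY", "/DDMMYY", "/YYMMDD"}
RANGE = re.compile(r"^/\+(\d+)(?:-(\d+))?$")


def parse_options(args):
    """Walk the options left to right; each range captures the settings
    seen so far, after which the settings fall back to the defaults."""
    current = dict(DEFAULTS)
    ranges = []
    for arg in args:
        upper = arg.upper()
        match = RANGE.match(arg)
        if match:
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else None
            ranges.append({"start": start, "end": end, **current})
            current = dict(DEFAULTS)          # reset for the next range
        elif upper == "/R":
            current["reverse"] = True
        elif upper == "/I":
            current["ignore_case"] = True
        elif upper == "/N":
            current["numeric"] = True
        elif upper in DATE_FORMATS:           # checked before the /D test
            current["date_format"] = upper[1:]
        elif upper.startswith("/D"):
            current["delimiter"] = arg[2:]
        else:
            raise ValueError(f"option not handled in this sketch: {arg}")
    return ranges


for entry in parse_options("/MMDDYY /d- /+1-9 /i /+41-80".split()):
    print(entry)
```

Run on the DIR example, the first range comes out as a date field and 
the second as a case-insensitive alphanumeric field. Leaving out the 
second /i in the name example would make the second range 
case-sensitive again.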


Multiple Ranges
---------------

BIGSORT accepts up to twenty ranges. This means that you can specify up 
to twenty different "zones" in your data for BIGSORT to sort on. 
Multiple key fields are useful if you wish, for example, that your 
database output be sorted on the date field, and within that, on the 
last name of the person. You can tell BIGSORT that the first field 
definition corresponds to a date, and the second one to a name. This 
way, BIGSORT will continue sorting records on the name, if the dates are 
identical.
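
The date-then-name idea can be illustrated with a short Python sketch 
that compares records on a tuple of keys: the second key only decides 
the order when the first keys are equal. The record layout (date in 
columns 1-8, name in columns 10-30) and all function names are made up 
for this example. Two-digit years are compared as they stand, and only 
a non-empty delimiter is handled.

```python
def field(line, start, end):
    """Return columns start..end of line (1-based, both ends included)."""
    return line[start - 1:end]


def date_key_mmddyy(text, delimiter="-"):
    """Turn 'MM-DD-YY' into (year, month, day) so that it sorts by date."""
    month, day, year = (int(part) for part in text.strip().split(delimiter))
    return (year, month, day)


def sort_key(line):
    # First key: the date. Second key: the name, compared without case.
    return (date_key_mmddyy(field(line, 1, 8)), field(line, 10, 30).lower())


records = ["12-22-92 Smith", " 9-11-92 Jones", "12-22-92 adams"]
result = sorted(records, key=sort_key)
print(result)
```

With a case-sensitive second key, "Smith" would come before "adams", 
since capital letters sort ahead of small ones.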


BIGSORT and multiple files:
---------------------------

If you wish to sort and merge several files into one, you can do it with 
one command under OS/2. Just do a:

TYPE *.* | BIGSORT > result.txt

Note that the TYPE command also sends the filename of the file it is 
processing, to "standard output". You may have to manually remove such 
records.

BIGSORT Swap Area:
------------------

BIGSORT creates no swap area of its own, but uses OS/2 to allocate the 
memory necessary to load the entire file, along with an index pointer 
for each line. You can "guesstimate" that an input file of 8MB will 
occupy a little over 8MB of RAM when BIGSORT is working on that file. 
OS/2 will first allocate all available RAM, then provide the rest from 
its swap area. Since this area grows and shrinks automatically, there is 
nothing wrong with having a temporarily large swap file. 
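
The "a little over 8MB" figure can be worked out roughly as the file 
size plus the index. The Python sketch below assumes 4 bytes per index 
pointer and an index sized by the /S value (100,000 by default); both 
are assumptions made for this illustration, not documented figures.

```python
POINTER_BYTES = 4  # assumption: one 32-bit pointer per index entry


def estimated_bytes(file_bytes, index_entries=100_000):
    """Rough memory need: the whole file plus one pointer per index entry."""
    return file_bytes + index_entries * POINTER_BYTES


eight_mb = 8 * 1024 * 1024
print(estimated_bytes(eight_mb))            # file plus default index
print(estimated_bytes(eight_mb, 500_000))   # same file with /S500000
```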


Shareware
---------

BIGSORT represents countless hours of work. Please contribute to the 
shareware discipline by sending the $20 registration fee to:

        Turgut Kalfaoglu
        1378 Sok. 8/10
        Izmir 35210
        Turkey
        


Source Code
-----------

BIGSORT is compiled under IBM C/SET 2, at CSD level 28. Clear and 
well-documented source code is available ONLY to registered users. Send 
$10 (to cover shipping charges) and a blank disk (3.5-inch format) to 
the above address to receive the source code.

The author encourages you to register, and also to send in comments or 
requests for additional features. You do not need to register this 
software to request additional features or to report problems or 
suggestions. However, regular use requires a donation of $20 to the 
above address.



Update History
--------------

Version 4.1:
------------

Implements the "/N" option. No bugs were found with 4.0.

Version 4:
----------

Implements multiple key fields.
Implements sort ranges.
Implements the /D option.
Implements country-specific information.

Version 4 is almost a complete re-write with division of source code 
into five segments.


New Version: V3
---------------
Implements unlimited input filesize.



New Version: (V2.1 to V2.2)
---------------------------

Added features: It can now handle dates as well!
                Now /R and /I can be used at the same time.
                Improved performance, but code size still about
                the same (you should see the tricks that were done
                to keep it that way :)

Bugs Removed:   None were found in V2.1

