NAME

	soundx - genealogy soundexing of files and input.

SYNOPSIS

	soundx  [-Mode] [Flags] [Input_File] [Output_File]
	soundx2 [-Mode] [Flags] [Input_File] [Output_File]

DESCRIPTION

	This manual documents the soundx and soundx2 programs.  The 
	programs are identical in operation and function; however,
	soundx is a 16-bit DOS program and soundx2 is a full 32-bit OS/2
	program. The purpose of this program is to take a surname and
	output the census soundex value.  This program has four basic
	modes of operation: interactive, file oriented, piped and
	piped/file.  The various modes are explained later in the
	examples section.
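
	As a rough illustration of the soundex value described above,
	the following Python sketch codes a surname using the commonly
	published American Soundex rules.  It is only an assumption that
	soundx follows exactly these rules; the names CODES and soundex
	are hypothetical and this is not the program's own source.

```python
# Letter-to-digit table from the commonly published American Soundex rules.
# Assumption: soundx uses the same table; its source is not shown here.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return a four-character soundex code, or "" if name has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            result += code                 # new consonant group: emit its digit
        if c not in "HW":
            previous = code                # vowels reset the group; H and W do not
    return (result + "000")[:4]            # pad or truncate to letter + 3 digits


print(soundex("vandevanter"))
```

	Under these rules the sketch gives the same values as the
	interactive session in the examples section.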

	The registered version of the program can also use a prefix file
	for name prefix removal since the rules for soundex processing
	require a name with prefixes to be processed with and without
	the prefix.  The program will detect that a prefix file called
	prefix.sou is in either the current directory or the directory
	that the soundex program is in. The program will read the prefix
	list and parse each surname until every prefix in the list has
	been removed.  The prefix processing can have some interesting
	side effects which are discussed in the description of the
	interactive mode.

OPTIONS

	MODE

		-h, -H, -?

			Help mode: gives the usage prompt for the program.  No other
			options are parsed if this mode is entered on the command line.

		-i, -I

			Interactive mode: This allows the user to enter a name to be
			soundex processed and see the results of the conversion.  This
			mode is handy when prefix processing is enabled because it will
			output the remaining part of the name that was used to generate
			the soundex value after each prefix is removed.  This lets one
			see why a name like VanDeVanter has four soundex values: one
			each for VanDeVanter, DeVanter, Vanter, and ter.

	FLAGS

		-d, -D

			Specifies the field delimiter used to separate the different
			fields on a line.  The default separator is a tab character.
			If -d is entered without any other character immediately
			following it, the separator becomes a space.  To change to a
			comma, just place the "-d," option on the command line.  Any
			other character can be specified in a similar manner.
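
			The delimiter rules above can be sketched in Python as follows.
			The function name delimiter_from_flags is hypothetical, and how
			soundx actually parses its command line is an assumption.

```python
def delimiter_from_flags(flags):
    """Return the field delimiter implied by a list of command-line flags."""
    delimiter = "\t"                       # default separator: tab
    for flag in flags:
        if flag[:2] in ("-d", "-D"):
            # A bare -d means space; otherwise use the character right after it.
            delimiter = flag[2] if len(flag) > 2 else " "
    return delimiter


print(repr(delimiter_from_flags(["-d,"])))
```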

EXAMPLES

	Soundx has what can be considered four modes of operation.  Some
	modes are quite common in the UNIX world but not as common in
	the DOS and OS/2 worlds.

	INTERACTIVE

		The first mode, which is quite common in the DOS world, is
		interactive mode. The command line for this is:

				soundx -i

		This mode allows the user to type in a surname and see the
		results of soundexing immediately upon pushing the enter
		(return) key.  The problem with this mode is the results are not
		preserved and the user must type in each input every time. An
		example of an interactive session would look like:

				c:>soundx -i
					Enter name to convert (999 to end):  vandevanter
					vandevanter soundex value is V531
					devanter soundex value is D153
					vanter soundex value is V536
					ter soundex value is T600
					Enter name to convert (999 to end):  999
				c:>

		In the example the prefix file had "van" and "de" as valid
		prefix values, so after processing the full name the prefix
		"van" was removed, which results in "devanter".  Again the
		soundex value was calculated and the next prefix, "de", was
		removed, which resulted in "vanter".  Removing "van" once more
		left "ter".  This recursive behavior will continue until there
		is no name left to process or until there are no prefixes left
		to remove.  Normally one would only want the first two results;
		however, I didn't want to preclude the possibility that someone
		would want more than just the two.
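
		The repeated prefix removal in the session above can be sketched
		in Python.  The function name prefix_variants is hypothetical,
		and the rule for choosing between several matching prefixes
		(first match in list order) is an assumption, since the order
		soundx uses is not documented here.

```python
def prefix_variants(name, prefixes):
    """List the name followed by each shorter form left after removing a prefix."""
    variants = []
    current = name.lower()
    while current:                         # stop when no name is left
        variants.append(current)
        # Assumption: the first listed prefix that matches is the one removed.
        match = next((p for p in prefixes if p and current.startswith(p)), None)
        if match is None:
            break                          # stop when no prefix is left to remove
        current = current[len(match):]
    return variants


print(prefix_variants("vandevanter", ["van", "de"]))
```

		Each entry in the returned list is a name that would then get its
		own soundex value.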

	FILE ORIENTED

		The second mode, which is usually more useful and reasonably
		common in the DOS world, is file oriented.  This mode requires
		an input and an output file to be specified on the command line.
		The program then reads the input file, does soundexing on it,
		and places the results in the output file.  This mode does have
		one undesirable side effect: it OVERWRITES the output file.  The
		command line for this type of session is shown below.

				c:>soundx input_file output_file
				c:>

	PIPED

		The third mode, which is potentially the most useful and is
		quite common in the UNIX world, is piped mode.  This mode has
		soundx act like a filter to a data stream.  This allows many
		small and specialized programs to be chained together in order
		to change the data in any way one might want.  An example of
		this type of command line is:

				c:>type input_file | soundx | sort >> output_file
				c:>

		The previous line uses the built-in DOS command "type", which
		would normally just list the input_file to the screen, to
		provide the "data stream". The "|" operator redirects the output
		of "type" from the screen to the input of soundx which then
		performs the soundex processing on each line it receives and
		sends it to the output stream. The next "|" then redirects the
		stream to the sort program which puts the data stream in order
		and places the resulting data on the output stream.  The ">>"
		operator then appends the data into the output_file without
		overwriting the current contents.  This is a fairly trivial
		example of piping, and it is recommended that one read the DOS
		manual or, better yet, a good manual on UNIX for a more thorough
		treatment.
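
		For readers new to filters, the pattern soundx follows in piped
		mode can be sketched in Python.  The function filter_lines and
		its transform argument are hypothetical; transform stands in for
		the soundex processing, and str.upper is used here only as a
		placeholder.

```python
import io


def filter_lines(source, sink, transform):
    """Read source one line at a time and write each transformed line to sink."""
    for line in source:
        sink.write(transform(line.rstrip("\r\n")) + "\n")


# Demonstration with in-memory streams; a real filter would pass
# sys.stdin and sys.stdout instead.
demo_in = io.StringIO("smith\njones\n")
demo_out = io.StringIO()
filter_lines(demo_in, demo_out, str.upper)
print(demo_out.getvalue(), end="")
```

		Because only one line is held at a time, a filter written this
		way places no limit on the number of lines in the stream.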

	PIPED/FILE

		The piped/file mode is also quite common in the UNIX world.  In
		this mode input is usually taken from a file and output to a
		data stream like the type command did in the previous example.
		Soundx has this mode along with a less common mode where it
		takes the input from the data stream and puts the results into a
		data file.  The usage, as one might expect, is a hybrid between
		piped and file modes.  Two examples are shown below:

				c:>type input_file | soundx output_file
				c:>soundx input_file | sort >> output_file

		The first example takes its input from the data stream, as the
		piped example does, but places the output directly into the
		output_file.  The second example reads the input_file and places
		the results of the soundexing on the data stream as if "type
		input_file | soundx | sort >> output_file" had been typed
		instead.

HISTORY

	This program was conceived to help my mother in her genealogy
	habit.  She had access to large databases of names that required
	soundexing so Soundx was born.  She has also been my beta tester
	and helpful in some of the design decisions required by the
	program.

QUIRKS

	The program currently only accepts the surname to be soundexed
	as the first item on each line.  Leading white space and
	non-alphabetic characters are discarded up to the first
	alphabetic character or occurrence of the separator character.
	This will occasionally result in some names getting soundexed
	that you were not expecting; however, this behavior can usually
	be corrected by changing the separator character the program
	uses.

	One problem that exists when processing database type files is
	how to handle what is considered improperly formatted data.
	People who use databases tend to think of things in terms of
	fields whereas we programmers think in terms of tokens.  In some
	cases the terms are synonymous and in others they are not.  In
	the case of this program I considered the token to be the
	surname that was being soundex processed.  If the database had a
	field of "Last Name" then the token and field were the same;
	however, if the field was "Name" then the token was only a part
	of the field.  This second case raised the problem of how to
	handle something that wasn't a valid character and wasn't
	defined as the token separator, for example when the fields are
	separated by a tab but the name field has a comma separating the
	last name (my token) from the rest of the name.  To solve this
	problem I decided to treat all "funny" characters as a
	terminator but flag the field as having something strange in it.
	The way I handled this was to put a * after the soundex value to
	flag the user to look at the result.  In most cases the result
	will be correct but I prefer the conservative approach of
	flagging it anyway.
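
	The token handling described above can be sketched in Python.
	The function name extract_surname is hypothetical, and treating
	every non-letter (including spaces, hyphens and apostrophes) as
	a "funny" character is an assumption about what soundx
	considers valid.

```python
def extract_surname(line, delimiter="\t"):
    """Return (surname, flagged) for the first item on a line."""
    field = line.rstrip("\r\n").split(delimiter, 1)[0]  # text before the separator
    start = 0
    while start < len(field) and not field[start].isalpha():
        start += 1                         # discard leading non-letters
    end = start
    while end < len(field) and field[end].isalpha():
        end += 1                           # the token is the run of letters
    flagged = end < len(field)             # something odd ended the token early
    return field[start:end], flagged


print(extract_surname("Pennock, Leonard\t1993"))
print(extract_surname("Pennock, Leonard\t1993", delimiter=","))
```

	With the default tab separator the comma ends the token and the
	result is flagged; naming the comma as the separator gives the
	same token without the flag.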

LIMITATIONS

	The program can only handle lines up to 4090 characters long.
	I'm not sure what happens after that but it will not be pretty,
	especially in DOS.

	The program has no limit on the actual number of lines in a file
	as it reads, processes and outputs one line at a time; however,
	if you are outputting to a file, ensure that there is enough
	disk space to write a second copy of the input file along with
	the soundex values (an extra 6 bytes per line).  If the program
	runs out of disk space things will probably get a little weird.
	OS/2 shouldn't have a problem but who knows with DOS.  Also,
	realize that when doing piping operations under DOS each program
	runs to completion into a temporary file before the next one is
	invoked, so you may actually have three or more copies of the
	file for some piping operations.

	It is possible with large prefix files that the DOS version
	could run out of memory in which case it will give an error
	message and quit running.  This should not be a problem until
	the prefix file grows over 300k so I wouldn't worry too much. 
	If you hit this limit let me know and I'll have to rethink how I
	handle the prefixes.  OS/2 does not have a memory limitation
	because of the use of virtual memory.

BUGS

	There are currently no known bugs in the program other than the
	limitations listed above.  This is not to say that there are not
	any bugs but just to say I don't know about them.  If you
	encounter something you consider a bug send me a bug report at
	the registration address.  Please include a disk containing the
	file that you were processing which caused the bug along with a
	detailed description of what the bug was.  If I determine that it
	is a real bug and you are the first person to report it you will
	be entitled to a free upgrade.  As an alternative to sending me
	a disk you can e-mail me the bug report via GENIE.  My GENIE
	address is L.Pennock.

REGISTRATION

	This program is shareware, not freeware.  You are allowed to use
	this program for a limited period of time, less than 30 days, to
	determine if it fits your needs.  After the evaluation period
	you either need to register the program or discontinue its use. 
	Registration is in one of two forms. The first and least
	expensive is to send $5 to ease your conscience; however, this
	will not get you the version that does prefix processing. The
	second and preferred method is to send $15 which will entitle
	you to receive the latest full DOS and OS/2 version that will
	have prefix processing and any new features I may add enabled.
	To register send your name, disk preference (3.5" or 5.25"), and
	check or money order to:

				L. K. Pennock
				LP Engineering
				951 S. Sunset Court
				Chandler, AZ 85224

	Please make the check or money order payable in US dollars and
	made out to L. K. Pennock. If you don't specify a disk size you
	will receive a 5.25" disk. If you have any suggestions for
	improving the program you may include them at the same time. You
	can send suggestions anyway but I am more likely to listen to
	paying contributors.  Remember that only you can make shareware
	work.

AUTHORS

	Copyright 1993, 1994 LP Engineering: Leonard K. Pennock

