CBNORMAL 0.3a  (32-bit version) Copyright Rob Weir, 1994-96

CompuServe: 71165,2722
Internet: rweir@cybercom.net

This program is free for personal use.

=======================================================================
WARNING: This program produces modified ChessBase data files, something
quite difficult, and quite undocumented.  This program seems to work for
me, but don't you think it would be better if you made a backup of your
BIG ChessBase database before using me?!
=======================================================================

New in Version 0.3a!

Fixed bug where we were ignoring the Pivot option in CBNORMAL.INI.

Also, expanded the "AddComma" option to change:
"Kasparov Gary" -> "Kasparov,Gary"

=======================================================================

New in Version 0.3!

This is a 32-bit port with many new features:

a) Rewritten to take advantage of 32-bit features, like memory-mapped
files and additional available memory.

b) Several additional heuristics for fixing malformed headers

c) CBNORMAL.INI used to turn on/off the various modifications

d) Practice mode, which shows you exactly what would be modified

=======================================================================
New in Version 0.2!

I've fixed a few bugs:

1) I now detect and reject games which have checksum errors or games
which require a user ID to access.  I've provided a program called
CBCHKSUM.EXE, which you can download, that can read and unprotect such
data files.

2) I now enforce a 47 character limit on the names and source
fields.

3) I detect and ignore user-defined substitutions which are circular, i.e.
"New York"="New York City".

4) Fixed a bug where a field could be corrupted if the length was increased
via a substitution.

Other changes:

5) When calculating what determines a word (the smallest unit for
user-defined substitutions) I now use a sequence of letters and numbers.
Version 0.1 defined a word as a sequence of letters.

6) Changed the display of percent progress

7) New code for doing the search-and-replace operations which is faster


Additions:

8) Changed date fields of year=1792 to year=blank (0).

9) Remove text in parentheses from names.

=======================================================================

Files you now have:

CBNORMAL.TXT  the file you are reading
CBNORMAL.EXE  the CBNORMAL program
CBNORMAL.INI  CBNORMAL configuration file
NAMESUB.DAT   sample name-field substitutions
PLACESUB.DAT  sample source-field substitutions
PLAYERS.DAT   large list of preferred player names

=======================================================================
The program CBNORMAL takes a ChessBase data file and creates a new data file
with user-defined modifications made to the text of the player and source
fields in the game header.

CBNORMAL produces a new file (always called CBNORMAL.CBF/CBI)
which contains the modified games.  The original data files are left
untouched.

Why would you want a program like this?  If you are like me, you get games
(directly or indirectly) from many different sources: the Internet,
CompuServe, or other BBS's.  These games come in many different formats, like CBF,
PGN, NTF, NIC, etc.  After I go through the effort of converting the games
over to a single data format (CBF), I find that the games do not follow a
common layout of the game headers.  For example, I'll see the same game in
several different styles:

Karpov,A-Kasparov,G
Moscow

A. Karpov-G. Kasparov
Moskva

Karpow.A - Kasparov.G
Moskau


And so on. We see differences in spelling, usage of accented characters
and umlauts, punctuation, spacing, etc.  I wrote CBNORMAL to help with this problem.

CBNORMAL reads a ChessBase file and creates a new file with the following
modifications:

1) Convert "foreign" characters to the nearest ASCII representation.  So,
an accented letter 'i' is converted to a plain 'i'.  German umlauts are
converted in the standard way (o-umlaut goes to oe, etc). German "ess-zet"
is converted to "ss".

2) The players' names are put in a standard format like "Karpov,A-Kasparov,G".
This may involve rearranging the names, deleting extraneous spaces and
punctuation, etc.

3) User-defined substitutions are applied.  These are defined in the text
files "NAMESUB.DAT" and "PLACESUB.DAT", which apply to the players field
and source field respectively.  Think of these files as a list of
search-and-replace operations which are applied, in order, to every game.
Each entry in these files looks like this:

"Kortschnoj"="Kortchnoi"

The string on the left is the string to be searched for, and the string on
the right is the replacement.  The search is case-sensitive and applies only
to whole words, not to portions of a word.  To be exact, CBNORMAL looks for
the search pattern.  If it finds it, and the character before the beginning
of the pattern is not a letter or number, and the character after the pattern
is not a letter or number, then the substitution is made.

For example, if you have an entry in your PLACESUB.DAT which says:

"corr"="cr"

this will match the following strings:

"corr match"
"thematic corr"
"corr(9)"

but will not match these:

"Corr match"
"correspondence"
"corr9"

Take a look at the two files NAMESUB.DAT and PLACESUB.DAT to get a feel
for what can be done with them.
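
The whole-word rule described above can be sketched in a few lines of
Python.  This is only an illustration, not the code CBNORMAL itself uses.
The names parse_entries, substitute and apply_all are made up for this
example, and I am assuming that "letter or number" means the plain ASCII
letters and digits and that lines which do not look like "search"="replace"
are simply skipped.

```python
import re

# Hypothetical pattern for one entry line such as "Kortschnoj"="Kortchnoi".
ENTRY = re.compile(r'^\s*"([^"]*)"\s*=\s*"([^"]*)"\s*$')


def parse_entries(lines):
    """Return (search, replacement) pairs in file order.

    Assumption: lines that do not match the entry format are ignored.
    """
    pairs = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group(1):
            pairs.append((m.group(1), m.group(2)))
    return pairs


def is_word_char(ch):
    # Assumption: a "letter or number" is an ASCII letter or digit.
    return ch.isascii() and ch.isalnum()


def substitute(text, search, replacement):
    """Replace case-sensitive, whole-word occurrences of search in text."""
    out = []
    i = 0
    while i < len(text):
        j = text.find(search, i)
        if j < 0:
            break
        end = j + len(search)
        # The characters on either side must not be letters or numbers.
        before_ok = j == 0 or not is_word_char(text[j - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            out.append(text[i:j])
            out.append(replacement)
            i = end
        else:
            # Not a whole word: keep the text and look again one step later.
            out.append(text[i:j + 1])
            i = j + 1
    out.append(text[i:])
    return "".join(out)


def apply_all(text, pairs):
    """Apply every substitution, in order, to one header field."""
    for search, replacement in pairs:
        text = substitute(text, search, replacement)
    return text
```

With the single entry "corr"="cr", apply_all turns "corr(9)" into "cr(9)"
and "thematic corr" into "thematic cr", but leaves "Corr match",
"correspondence" and "corr9" alone, matching the examples above.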

Along with the user-defined substitutions, CBNORMAL knows how to correct
several categories of malformed headers.  All of these heuristics can be
individually enabled or disabled in CBNORMAL.INI.  Examine this file for
more details.
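
To make modification 1 (the character conversion) more concrete, here is
a rough sketch in Python.  It is not taken from CBNORMAL.  It works on
Unicode strings, whereas the real data files use their own character
encoding, and the function name to_ascii is made up.  The text above only
says that o-umlaut goes to "oe" and ess-zet to "ss"; the other umlaut
entries, including the capital letters, are my assumption, as is leaving
a character unchanged when it has no obvious ASCII base letter.

```python
import unicodedata

# Umlauts and ess-zet get a two-letter expansion instead of a bare letter.
# Assumption: capitals expand to "Ae", "Oe", "Ue".
SPECIAL = {
    "\u00e4": "ae",  # a-umlaut
    "\u00f6": "oe",  # o-umlaut
    "\u00fc": "ue",  # u-umlaut
    "\u00c4": "Ae",  # A-umlaut
    "\u00d6": "Oe",  # O-umlaut
    "\u00dc": "Ue",  # U-umlaut
    "\u00df": "ss",  # ess-zet
}


def to_ascii(text):
    """Map accented characters to a nearby ASCII spelling."""
    # Compose first so that "o" + combining diaeresis is seen as o-umlaut.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Decompose, then drop the accent marks and keep the base letter.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if ord(c) < 128)
            # Assumption: characters with no ASCII base are left as they are.
            out.append(base if base else ch)
    return "".join(out)
```

For example, to_ascii maps H, u-umlaut, "bner" to "Huebner", and an
accented 'i' to a plain 'i'.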
=======================================================================
CBNORMAL is easy to use.  You just pass in the name of a ChessBase file
as an argument and let it run.

For example, if you have a ChessBase file of World Championship games
called WCH.CBI and WCH.CBF, you run CBNORMAL like this:

CBNORMAL WCH.CBF

The results will be found in the files CBNORMAL.CBF and CBNORMAL.CBI.

There is also a practice mode, which is entered like this:

CBNORMAL WCH.CBF -practice

In this case, no output database is created, but instead three text files:

NAMES.TXT    - A list of all the modifications which were made to player names
PLACES.TXT   - A list of all the modifications which were made to the source fields
SUBCOUNT.TXT - A tally of all user-defined substitutions which were made

=======================================================================
Future directions?

I'd like to do the following for future releases:

1) Add regular expression support for more sophisticated searches

2) Add more heuristics for dealing with malformed headers

3) Figure out how Dutch and German names should be expanded when they are
abbreviated.  When does vWijk become van Wijk or von Wijk, or when does vd
become "von der" versus "von den" versus "van der" and "van den"?

4) Add option for users to selectively enable/disable the automatically
applied heuristics, like character conversions.

5) Instead of just deleting the text within parentheses in the name field,
I'd like to make sense of it.  If it is a round number or country, I'd like
to add it to the source field.  If it is a rating, I should apply it
to the rating field.  If it is an ECO code, I should put it there.

Let me know if there is something you think would be really neat for
CBNORMAL to do, or if you come up with a useful set of name or place
substitutions you'd like to share.
=======================================================================


