
The following is in response to many questions I have seen posted.

  -Paul

ps: with the recent release of PGNSort, another, perhaps easier, method
    for breaking out a large file by ECO or opening descriptions is possible.
    The procedure outlined here is still useful for updating individual
    ECO or opening collections from files composed of miscellaneous games.


(Re) Organizing Your PGN Collection
===================================

There are many reasons for building a collection of chess games, but common
to all is some element of research: reviewing the data for games of
interest. This means there must be a method to locate those games.

The large chess file is one approach and I have heard from someone who
maintains files holding 10,000 games each. One problem then is finding
software which will handle such files. Perhaps only a large editor or word
processor is suitable.

ChessU4 and most of the other "U4" utilities are somewhat more modest--the
optimum file size is 4000 games.

Splitting games among smaller files can also have a disadvantage--one must
have a system to locate a given game across multiple files. To reduce
multiple-file searches, there are two reasonable methods for organizing
files:

	1) by ECO (Encyclopaedia of Chess Openings) codes
	2) by PGN Opening descriptions

The second method has only been possible since early 1994 when PGN was
introduced. Realistically, it became practical about a year later, when
chess software was finally developed to assign opening descriptions.

Two "U4" programs which will add the "Opening" tag and description to PGN
games are NORMAL.exe and ECOClass.exe. NORMAL does so by expanding
previously assigned ECO codes into descriptions. ECOClass is more
sophisticated--it will assign both the ECO codes and the descriptions.

	Using NORMAL
	  - games must have ECO codes
	  - the file, ECOIND.txt, available in many libraries, must
	    be in the local (NORMAL) directory
	  - run NORMAL with switch "expandECO=1"

	Using ECOClass
	  - any valid PGN game can be classified
	  - all materials necessary for classification are included
	    with the download
	  - see the ECOClass help file for instructions (ReadMeEC.txt)



If 1000 games in a file of high-quality (GM) games were assigned
descriptions, we might find this distribution,

	195 - Queen's Gambit
	170 - Sicilian
	 75 - English
	 75 - King's Indian
	 75 - Ruy Lopez
	 50 - Queen's Indian
	 40 - Caro-Kann
	 30 - French
	 30 - Gruenfeld
	 30 - Nimzo-Indian
	 30 - Pirc

That would account for all but 200 of the games. Right away we can see a
method for distributing the games across several files.
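
For anyone who would rather let a script do the tallying, here is a minimal
Python sketch of how such a distribution could be counted. It is not part of
any "U4" program. It assumes every game carries an [Opening "..."] tag on a
line of its own, and it treats the text before the first comma as the opening
family. The name opening_counts is made up for this illustration.

```python
import re
from collections import Counter

# Assumption: each Opening tag sits on its own line, as in PGN export format.
OPENING_TAG = re.compile(r'^\[Opening "([^"]*)"\]', re.MULTILINE)


def opening_counts(pgn_text):
    """Count games per opening family (the text before the first comma)."""
    counts = Counter()
    for name in OPENING_TAG.findall(pgn_text):
        family = name.split(",")[0].strip()
        counts[family] += 1
    return counts


sample = '''[Event "A"]
[Opening "Queen's Gambit, Declined"]

1.d4 d5 2.c4 e6 1/2-1/2

[Event "B"]
[Opening "Sicilian"]

1.e4 c5 1-0

[Event "C"]
[Opening "Queen's Gambit, Slav Defense"]

1.d4 d5 2.c4 c6 0-1
'''

counts = opening_counts(sample)
for family, n in counts.most_common():
    print(f"{n:4d} - {family}")
```

Games with no Opening tag are simply not counted, so the total may come up
short of the number of games in the file.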

Is it going to be a big job? Not at all. The main problem will be one of
concentration--not making any mistakes which would botch the process and
cause one to start over. An old-fashioned pencil and paper will be
necessary to take down game counts and make sure everything was accounted
for at the end.

First, however, check how the opening descriptions are spelled. If the games
have been taken from numerous sources, it is perhaps best to reclassify them
all so that standard opening names are assigned.

Looking just at the Queen's Gambit we might find these descriptions,

[Opening "Queen's Gambit"]
[Opening "Queen's Gambit, Accepted"]
[Opening "Queen's Gambit, Declined"]
[Opening "Queen's Gambit, Declined, Anti-Meran Defense"]
[Opening "Queen's Gambit, Declined, Exchange System"]
[Opening "Queen's Gambit, Declined, Meran Defense"]
[Opening "Queen's Gambit, Declined, Semi-Slav Defense"]
[Opening "Queen's Gambit, Slav Defense"]

"Queen's Gambit" -- that makes sense. All we have to do is remember the
apostrophe. It's not a bad idea either to include the starting quote in the
search as some opening names are used again in variations of other openings.

Now return to our mixed file of 1000 games. Call it file AAAAA.txt.

Start ChessU4 with AAAAA.txt as input then enter the find command,

	> f*

You will be prompted for a string...

	> "Queen's Gambit

...then a file name to save the selected games...

	> BBBBB.txt

ChessU4 will reply with, "195 games written to BBBBB.txt." A new prompt
will appear,

	Save the NOT component of selected games? y/n >

Enter "y" (yes). Save to a third file,

	> CCCCC.txt

Again, the echo, "805 games written to CCCCC.txt."

The "NOT component save" subtracts out the games from the first iteration,
saving what's left to a separate file. Our source file for the second pass
will now be file CCCCC.txt, not AAAAA.txt.
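
The select-then-subtract idea can be sketched in a few lines of Python. This
is only an illustration of the logic, not how ChessU4 itself is written. It
assumes each game begins with an [Event tag and that the search is a plain,
case-sensitive substring match. The names split_games and select_and_rest
are hypothetical.

```python
def split_games(pgn_text):
    """Split PGN text into games. Assumes every game starts with an [Event tag."""
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append(current)
            current = []
        current.append(line)
    games.append(current)
    # Drop chunks that hold nothing but blank lines.
    return ["\n".join(g).strip() + "\n" for g in games if any(l.strip() for l in g)]


def select_and_rest(games, needle):
    """Return (selected, rest): games containing needle, and the NOT component."""
    selected = [g for g in games if needle in g]
    rest = [g for g in games if needle not in g]
    return selected, rest


sample = '''[Event "A"]
[Opening "Queen's Gambit, Declined"]

1.d4 d5 2.c4 e6 1/2-1/2

[Event "B"]
[Opening "Sicilian"]

1.e4 c5 1-0

[Event "C"]
[Opening "Queen's Gambit, Slav Defense"]

1.d4 d5 2.c4 c6 0-1
'''

games = split_games(sample)
selected, rest = select_and_rest(games, '"Queen\'s Gambit')
# The pencil-and-paper check: nothing lost, nothing counted twice.
assert len(selected) + len(rest) == len(games)
print(len(selected), "selected,", len(rest), "left for the next pass")
```

Feeding "rest" back in as the source for the next search string mirrors the
restart on file CCCCC.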

Now, to continue the process, do a restart (r) with ChessU4, opening the
file CCCCC. Keep recording the game counts and file names in your process
log to ensure accuracy.

Once file CCCCC has been opened, you can start from the top again, doing an
initial "f*" search on,

	> "Sicilian

When you're through, the final "NOT component" file will contain just the
miscellaneous openings--those that weren't selected out by any of the
searches. You'll then want to rename the files (though you probably already
guessed that meaningful file names could have been used from the start).

You might also want to combine several of the less-common openings into a
single file. You _could_ do this by appending but perhaps a single-pass
method is simpler,

	> f*
	> <"Nimzo-Indian>,<"Gruenfeld>,<"Pirc>

The bracketed <> strings separated by commas tell U4 to select matches on
string 1 *OR* string 2 *OR* string 3. If the commas are omitted, the
condition is *AND* (not apt to suit the example above).
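
To make the OR/AND rule concrete, here is a small Python sketch of one way
such a search string could be interpreted. It is a guess at the logic
described above, not ChessU4's actual parser. In particular, how U4 treats a
string that mixes comma-separated and adjacent brackets is not covered here;
this sketch simply treats any comma between brackets as OR. The names
parse_query and matches are hypothetical.

```python
import re


def parse_query(query):
    """Return (terms, mode). Commas between <...> terms mean OR, none means AND."""
    terms = re.findall(r"<([^<>]*)>", query)
    if not terms:
        # No brackets at all: the whole string is a single search term.
        return [query], "AND"
    mode = "OR" if re.search(r">\s*,\s*<", query) else "AND"
    return terms, mode


def matches(game_text, query):
    """True if the game text satisfies the query under the OR/AND rule."""
    terms, mode = parse_query(query)
    combine = any if mode == "OR" else all
    return combine(term in game_text for term in terms)


game = '[Opening "Gruenfeld"]'
print(matches(game, '<"Nimzo-Indian>,<"Gruenfeld>'))  # OR: one term is enough
print(matches(game, '<"Nimzo-Indian><"Gruenfeld>'))   # AND: both terms required
```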


Other ChessU4 Searches
======================

The next time you want to find games matching a certain position, you'll
know where to look. ChessU4 has two types of position searches. The basic
position search requires a terminated line of notation,

  Queen's Gambit line
  1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.Nc3 dxc4 5.a4 Bf5 6.Ne5 e6 7.f3 Bb4 1/2

Put it as the only (or top) line in the default file, GamesU4.txt. Start
ChessU4 and open the Queen's Gambit file (mine was BBBBB). Select the
position search option...

	> p

...then verify that the search line is in file GamesU4. With roughly 200
Queen's Gambit games, you should get several matches. If you did not, use
the "p*" position search command,

	> p*

This tells ChessU4 to keep backing up towards the start of the line until a
match is found. Once a match _has_ been found, answering "No" to the "search
complete?" prompt will continue the search at an ever shallower depth.
(Pressing Enter is the same as answering "No.")

The games output to a file will be ordered from the nearest matches on
down. Use the (r) restart option to open and review the saved games.
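
The "nearest matches first" ordering can be pictured with a rough Python
sketch. Note the simplification: this compares move sequences from move 1,
so it will miss transpositions, whereas ChessU4 matches positions. It also
assumes plain movetext with no comments or variations. The names plies,
common_depth and rank_by_depth are invented for the illustration.

```python
import re

RESULTS = {"1-0", "0-1", "1/2", "1/2-1/2", "*"}


def plies(movetext):
    """Strip move numbers and the result token; return the list of half-moves."""
    tokens = re.sub(r"\d+\.+", " ", movetext).split()
    return [t for t in tokens if t not in RESULTS]


def common_depth(line, game):
    """Number of half-moves the two sequences share from the start."""
    depth = 0
    for a, b in zip(line, game):
        if a != b:
            break
        depth += 1
    return depth


def rank_by_depth(search_line, games):
    """Return (depth, game) pairs, deepest match first; non-matches are dropped."""
    target = plies(search_line)
    scored = [(common_depth(target, plies(g)), g) for g in games]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])


line = "1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.Nc3 dxc4 5.a4 Bf5 6.Ne5 e6 7.f3 Bb4 1/2"
games = [
    "1.d4 d5 2.c4 e6 3.Nc3 Nf6 1/2-1/2",
    "1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.Nc3 dxc4 5.a4 Bf5 6.e3 e6 1-0",
    "1.e4 c5 0-1",
]
ranked = rank_by_depth(line, games)
for depth, game in ranked:
    print(depth, game)
```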


That's enough for a start at reorganization of a chess game collection.
There are many other possible approaches. One such would be to use PGNSrt
followed by CChunk and process all the divisions in a single pass.

ps: don't forget that ChessU4 can search game headers and notation in other
ways as well. Here's one example that finds all Alekhine's Defense games by
searching on the ECO codes,

	> f*
	> <ECO "B02>,<ECO "B03>,<ECO "B04>,<ECO "B05>

...and another that could be used to split games into the five ECO
volumes...

	> f*
	> ECO "A
	...
	> ECO "B	<etc.>

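For comparison, the five-volume split can be sketched in Python as a single
pass that groups games by the first letter of the ECO code. This is only an
illustration, not a "U4" feature. It assumes each game is already a separate
string and carries its ECO tag on a line of its own; games without a usable
code land under "?". The name by_volume is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: ECO codes are a letter A-E followed by two digits, e.g. "B02".
ECO_TAG = re.compile(r'^\[ECO "([A-E])\d\d', re.MULTILINE)


def by_volume(games):
    """Group games by ECO volume letter; untagged games go under '?'."""
    volumes = defaultdict(list)
    for game in games:
        found = ECO_TAG.search(game)
        volumes[found.group(1) if found else "?"].append(game)
    return dict(volumes)


games = [
    '[Event "A"]\n[ECO "B02"]\n\n1.e4 Nf6 1-0\n',
    '[Event "B"]\n[ECO "D10"]\n\n1.d4 d5 2.c4 c6 1/2-1/2\n',
    '[Event "C"]\n[ECO "B04"]\n\n1.e4 Nf6 2.e5 Nd5 3.d4 d6 4.Nf3 0-1\n',
    '[Event "D"]\n\n1.g4 *\n',
]
volumes = by_volume(games)
for letter in sorted(volumes):
    print(letter, len(volumes[letter]))
```
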
  [fin]


