How ThinkTank (tm) Works
by: Rod da Silva



The Beginnings of An Idea

As an experienced CA-Clipper trainer of several years
now, I am often approached by students during the breaks
or after my class with source code to some program asking
me if I can help find the problem they are having.
What I found was that when looking at students' code
samples my eye would automatically spot the very
improper programming techniques that I was alluding to
and warning against in the class.  These seemed to jump off
the page and it became apparent to me that without ever
knowing what the source code I was looking at did, I could
nonetheless make many suggestions on how to improve it.

Suggestions for improvement ranging from speeding the
code up, to making it more readable would fill my head as
I read the code, and I had to struggle hard to filter those
distractions out, in order to remain focused on the job at
hand of searching for the critical problem that was
preventing the software from working as desired.
Usually the problem wasn't hard to spot.  You know, the
kind that really just needed another pair of eyes to find it.
But what amazed me was how often the root of the
problem could be traced back to improper technique being
used in the first place.

So I began to entertain the idea of writing a piece of
software that would use the checklist of "proper
techniques" that I had built up through experience over the
years, and apply them to source code for the purposes of
spotting potential problem areas.  I imagined the software
being much more than a common syntax checker (often
called a lint utility).  It would know about CA-Clipper, and
CA-Clipper relevant coding techniques.  It would know,
for example, the rules under which a PRIVATE could be
safely changed into a LOCAL without adversely affecting
the program.  And it would be smart enough to find all
such PRIVATE variables in a program and recommend
they be changed to LOCALs.  Finally, it would educate the user
by giving a detailed description of what they are doing
wrong and how to go about correcting the situation.
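
To make the PRIVATE versus LOCAL rule concrete, here is a
minimal sketch.  The routine and variable names are
hypothetical, and the rule shown (a PRIVATE is a safe
candidate when no called routine refers to it and it is
never reached through the macro operator) is my shorthand
for illustration, not the full checklist.

```clipper
// Sketch only: all routine and variable names are hypothetical.
// Compile with the /n switch so that Main() is the startup procedure.
PROCEDURE Main()
   ? SafeTotal()      // displays 30
   ? UnsafeTotal()    // displays 30
   RETURN

// nSum is used only inside this routine and never through the macro
// operator, so declaring it LOCAL instead would not change behavior.
FUNCTION SafeTotal()
   PRIVATE nSum := 0
   nSum := nSum + 10
   nSum := nSum + 20
   RETURN nSum

// nRunning is updated by AddTo(), a routine called from here.
// A LOCAL is not visible to called routines, so changing this
// PRIVATE to a LOCAL would break AddTo().
FUNCTION UnsafeTotal()
   PRIVATE nRunning := 0
   AddTo( 10 )
   AddTo( 20 )
   RETURN nRunning

FUNCTION AddTo( nAmount )
   MEMVAR nRunning                    // resolved at run time
   nRunning := nRunning + nAmount     // updates the caller's PRIVATE
   RETURN NIL
```

A tool of the kind described would flag nSum as a candidate
for conversion and leave nRunning alone.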

Such a program would tirelessly analyze a set of source
code, looking for violations of all the technique "rules" it
knew about, and once found, would report the location and
description of the problem and a detailed recommendation
describing how to correct the problem.  What the software
would NOT do is attempt to diagnose logic problems, as this
would be virtually impossible.  Only the programmer
knows what he/she wants to do.  Instead the software
could only look for sets of program statements that are
used improperly, independent of the context they are used
in.

In short, what I was contemplating was a grammar
checker for source code!  Just like the grammar checker in
your word processor cannot improve the entertainment
value of the short story you are working on in any
meaningful way, the software I had in mind would not be
able to improve the logic design of your program.
However, in a similar manner to the way in which the
grammar checker can tighten up your short story and
make it flow more smoothly adding to its sense of
cohesion, this software idea would do the same for your
source code.

I imagined how useful it would be if I could take some of
my older code and run it through this software idea of
mine and have it automatically suggest where I could
make changes that would directly affect my bottom line of
choice - speed, memory usage, readability, portability,
etc.

Well, after convincing myself that the idea was at least not
out of the realm of possibility, and that it would be a very
useful piece of software to have, I began to ask myself
some harder questions concerning just exactly how I was
going to build this software.  The problems I would face
were not going to be small.  But then again, neither was
the ambition level.

Such were the humble beginnings of a software project
known as ThinkTank.  This short white paper endeavors
to explain my approach to building such a software
program.



A Triad of Technology

It became clear to me that ThinkTank was going to be
made up of three major technologies: a compiler to parse
source code and translate it into a database of source code
elements for later ease of manipulation; an efficient
repository to store this database along with the knowledge
base of rules for good programming practice; and a
built-in, dynamic macro language that would facilitate the
application of the rules (whether built-in or user-defined)
against the database in order to obtain recommendations
for improvement or other forms of useful output.  These
three components of ThinkTank are discussed in more
detail below.


The Core Compiler

ThinkTank's core compiler is responsible for converting
the source code of the system you want analyzed into a
database of source code elements.  The primary purpose of
ThinkTank is to analyze your source code.  However,
working directly with the ASCII text that makes up your
source files is very difficult due to the string matching that
is necessary.  Instead, a compiler was written that
understands CA-Clipper syntax and is able to translate it
into another more malleable form - a relational database.

The database exhaustively describes every relationship
between each syntactic element of the source code.  The
output of the compiler is literally a complete set of related
data tables which describe the relationships between the
various syntactic elements (entities) of your source code.
Tables such as:

  - Function Declarations
  - Parameters
  - Variable Declarations
  - Function Calls
  - Arguments
  - Variable Assignments
  - Variable Accesses
  - Constants
  - Operators
  - Expressions
  - Statements and Statement Blocks
  - Constructs
  - etc.

are generated full of information pertaining to each
instance of the entity in question.  Moreover, each of these
entities is cross-referenced to other related entities where
appropriate and to its location in the source code.

By translating the source code into such a database, the
compiler effectively "flattens" it (and the inherent
relationships and hierarchies within it) into the more
flexible form of a relational database.  This alternative yet
equivalent form of your source code lends itself better to
the manipulation required by the analysis portion of the
system.
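
As a rough illustration of this flattening, the sketch
below stores the variable declarations of a three-line
fragment as records in an ordinary DBF.  It is not the
actual ThinkTank schema: only the VariablDec alias and its
Name, Type and IsMacro fields appear in the heuristic
example later in this paper, while the file name, the
FuncName and LineNo fields and the AddVarDec() helper are
assumptions made for this example.

```clipper
// Hypothetical sketch: apart from the VariablDec alias and its Name,
// Type and IsMacro fields, every name below is an assumption.
// Compile with the /n switch so that Main() is the startup procedure.
//
// Source fragment being described (line numbers for illustration):
//   10  FUNCTION CalcTotal( aItems )
//   11     LOCAL nTotal := 0
//   12     PRIVATE &cVarName

PROCEDURE Main()
   DBCREATE( "vardec", { { "NAME",     "C", 10, 0 }, ;
                         { "TYPE",     "C", 10, 0 }, ;
                         { "ISMACRO",  "L",  1, 0 }, ;
                         { "FUNCNAME", "C", 10, 0 }, ;
                         { "LINENO",   "N",  5, 0 } } )
   USE vardec ALIAS VariablDec NEW EXCLUSIVE

   // One record per declaration found in the fragment
   AddVarDec( "nTotal",   "LOCAL",   .F., "CalcTotal", 11 )
   AddVarDec( "cVarName", "PRIVATE", .T., "CalcTotal", 12 )

   // The declarations can now be listed like any other table
   GO TOP
   DO WHILE .NOT. EOF()
      ? VariablDec->FuncName, VariablDec->LineNo, ;
        VariablDec->Type, VariablDec->Name, VariablDec->IsMacro
      SKIP
   ENDDO
   USE
   RETURN

// Append one declaration record to the current work area
STATIC PROCEDURE AddVarDec( cName, cType, lMacro, cFunc, nLine )
   APPEND BLANK
   VariablDec->Name     := cName
   VariablDec->Type     := cType
   VariablDec->IsMacro  := lMacro
   VariablDec->FuncName := cFunc
   VariablDec->LineNo   := nLine
   RETURN
```

Once the declarations are records rather than text, a
question such as "which variables are created with the
macro operator?" becomes a simple field test instead of a
string search.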

The ThinkTank compiler is designed to work with CA-
Clipper .PPOs rather than .PRGs.  A .PPO is the form the
source code takes after the preprocessor front-end of the
CA-Clipper compiler has finished translating all
preprocessor directives and User-Defined Commands
(UDCs) into their equivalent function calls.  This form of
your source is syntactically simpler to work with than the
original .PRG and the compiler needed to translate it is
likewise easier to build.

[NOTE: Working with the .PPO does NOT represent a
loss of source code information.  Clearly all syntactic
elements of the program remain after the preprocessing
stage otherwise your program wouldn't function as you
had intended.  To create a .PPO you simply compile the
.PRG using the /p switch.]
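
The following small example suggests what the preprocessor
removes.  The file name, the DEFAULT ... TO command and
the Retry() function are made up for illustration, and the
.PPO text shown in the comments is approximate (spacing in
a real .PPO may differ).

```clipper
// demo.prg (hypothetical file name and contents)
// Compile with:  CLIPPER demo /n /p
#define MAX_TRIES 3
#xcommand DEFAULT <var> TO <val> => ;
          IF <var> == NIL ; <var> := <val> ; ENDIF

PROCEDURE Main()
   ? Retry()       // displays 3
   ? Retry( 5 )    // displays 5
   RETURN

FUNCTION Retry( nTries )
   DEFAULT nTries TO MAX_TRIES
   RETURN nTries

// In demo.ppo the directives are gone and the command has been
// expanded, so Retry() reads roughly as follows:
//
//    FUNCTION Retry( nTries )
//       IF nTries == NIL ; nTries := 3 ; ENDIF
//       RETURN nTries
```

Every statement in the .PPO is plain CA-Clipper syntax, so
a compiler reading it never has to know what DEFAULT ... TO
or MAX_TRIES meant.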


The Repository

Technically speaking, ThinkTank is not implemented as a
true expert system since it currently does not perform
inferencing (forward or backward chaining) of any kind.
However, it does bear many similarities to an expert
system since it has a knowledge base of rules and facts and
it does perform a diagnostic function (a classical expert
system problem domain).  What sets ThinkTank apart
from typical expert systems (besides its implementation) is
that unlike most expert systems, ThinkTank does not
require you to enter in any "symptoms" or other forms of
facts into the knowledge base.

Normally a knowledge base is made up of expert supplied
rules which operate on user supplied facts that are
different for each problem being solved.  However,
ThinkTank's knowledge base improves upon this idea by
generating its own facts automatically!  The facts, of
course, are the elements buried within every aspect of your
source code.  'x' is a variable declaration, 'y' is a function
declaration, 'b' is assigned a literal codeblock which has
two parameters and calls functions 'a', 'b', and 'c', etc.
The database automatically generated for you by the core
compiler amounts to a complex web of interrelated facts
about your source code that completely describe the
application being analyzed.

Depending on the size of the application you want to
analyze, ThinkTank can produce dozens of tables worth
of facts, each having possibly thousands of records and up
to several indexes ordering it in different ways.  In
addition to this unique set of data the system generates
for each and every application it is run against, ThinkTank
must also, of course, store all the built-in expert rules
which are applied across each application being analyzed
by the software.

Managing all this data in its many forms as a traditional
set of DBF/NTX tables/indexes would have produced too
complex a system to manage efficiently.  Instead a brand
new, proprietary repository was architected from the
ground up which supports fast access to all the data in the
system while only using a single DOS file handle.
Designed as a true CA-Clipper Replaceable Database
Driver (RDD) this repository serves as the backend for the
ThinkTank system (which is itself written in CA-Clipper
with a generous supplement of C/C++).  All data, from the
above mentioned knowledge base of rules and facts, to the
data dictionary or meta-data that describes the entire
system itself, is stored in a single DOS file that
dynamically grows as required.

Many of the advanced features of the proprietary
repository file format, such as true variable length records,
never needing PACKing, sophisticated data dictionary,
etc., will lie buried behind the scenes, unbeknownst to the
end user.  It is exactly these hidden technologies that allow
this approach to be used and greatly simplify the job of
implementing a project such as ThinkTank.  The
repository will remain an unassuming hero to the ThinkTank cause.


The Built-In Macro Language

The third major component of ThinkTank is the built-in
macro language that allows for the specification of "rules"
or "heuristics" to be applied to the facts manufactured by
the compiler.  This powerful sub-system allows virtually
any question to be asked of your source code.  If you can
answer a question by looking in your source code, then a
heuristic can be written to automate it.

The first order of business was choosing a macro
language.  One thing I didn't want to do was force people
to learn another language, so I decided to use CA-Clipper
itself.  In addition to being highly CA-Clipper compatible,
I required that it be compilable.  That is, routines
written with the macro language had to have performance
approaching that of CA-Clipper itself, and so a macro
interpreter was ruled out in favor of a macro compiler.

The macro language is completely integrated into the
product.  From within macros you can call any CA-
Clipper function, or your own user-defined macro
functions, or any combination of the two!  As a true subset
of CA-Clipper, it will support many of your favorite CA-
Clipper constructs including code blocks and arrays.  You
can have your macros prompt for data and generate
output in any form you desire (to built-in ThinkTank
generic reports, to the screen, to DBFs you create, or low-
level to DOS files).  Working with it will be as familiar as
working with CA-Clipper itself.

As an example, to create a report on all variables that are
dynamically created using the CA-Clipper macro operator,
(e.g., PRIVATE &x), you would enter a short macro
language function similar to the following:

FUNCTION FindMacroVarDecs
  // Create a new report to hold output and save workarea
  LOCAL nReportHandle :=  ;
           NewReport( "Macro Created Variables",... )
  LOCAL cOrgArea := ALIAS()

  // Move into the Variable Declarations work area
  SELECT VariablDec
  GO TOP

  // Loop  processing all macro created variables
  DO WHILE .NOT. EOF()
     IF VariablDec->IsMacro .AND. ;
          VariablDec->Type $ ;
               "PRIVATE PUBLIC DECLARE"
         // Add relevant information to report
         AddToReport(nReportHandle,;
               FIELD->Name, RECNO(),... )
     ENDIF
     SKIP
  ENDDO

  // Close report and restore workarea
  CloseReport( nReportHandle )
  SELECT( cOrgArea )
RETURN NIL

Once the program has been entered into the Heuristic
Editor Window, you simply click the Compile button and
save it along with a description, etc.  Now the heuristic is
automatically added to the main menu of Reports to be run
when you like.

By studying the example above it becomes clear why it
was essential that the repository be implemented as a true
RDD.  This construction allows the macro language to use
the standard CA-Clipper Database Manipulation
Language (DML) to access the various fields and tables it
requires.  This compiled macro language routine will
actually access the very Variable Declarations table that
was generated by ThinkTank's core compiler.  That is,
this expert rule or heuristic is directly accessing the facts it
needs to be able to generate a report with a proper
explanation of why macro (operator) created variables are
inefficient, along with a hint or two on how to correct the
situation.

Since the database schema for the information generated
by the compiler will be completely documented, and the
Repository is a true CA-Clipper RDD, you can simply
write your own functions to access the data tables and their
indexes directly in any manner you like.  The heuristic
compiler (the second compiler in the system) will come
complete with a debugger to help you find and eliminate
any logic bugs you might have.

Of course you don't have to write CA-Clipper code to get
information out of the system.  Dozens of pre-defined, pre-
compiled heuristics will be included allowing you to glean
valuable information from your source code immediately.
Each of these built-in heuristics will be implemented with
a macro language function similar to the example above,
and this source code will be provided so that you can see
how the heuristic was constructed or even modify it for
your own purposes if you desire.

Some interesting examples of the kinds of pre-defined
heuristics that you can expect include:

  -  Unreferenced Variables Report
  -  Undeclared Variables Report
  -  Routines with Multiple RETURNs Report
  -  Input Points Report
  -  Output Points Report
  -  Recursive Routines Report
  -  Out of Routine PRIVATE Variable Access Report
  -  etc.

There will even be special interactive reports which will
allow you to specify a function name for example, and
return a list of every function that makes a call to the
function, etc.
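
The logic behind such a "who calls this function?" report
can be sketched in a few lines.  To keep the example
self-contained it uses an in-memory array in place of the
Function Calls table, whose real layout is not shown in
this paper; the array contents and the CallersOf() function
are hypothetical.

```clipper
// Sketch only: aCalls stands in for the Function Calls table.
// Each row is { caller, callee }.  All names are hypothetical.
// Compile with the /n switch so that Main() is the startup procedure.
PROCEDURE Main()
   LOCAL aCalls := { { "Main",     "OpenFiles" }, ;
                     { "Main",     "MainMenu"  }, ;
                     { "MainMenu", "EditCust"  }, ;
                     { "EditCust", "OpenFiles" } }
   LOCAL aCallers := CallersOf( aCalls, "OpenFiles" )

   // Lists Main and EditCust, one per line
   AEVAL( aCallers, { |cName| QOUT( cName ) } )
   RETURN

// Return the names of all routines that call cTarget, without
// duplicates.  Names are compared in upper case because CA-Clipper
// identifiers are not case sensitive.
FUNCTION CallersOf( aCalls, cTarget )
   LOCAL aResult := {}
   LOCAL cCaller
   LOCAL i

   cTarget := UPPER( cTarget )
   FOR i := 1 TO LEN( aCalls )
      cCaller := aCalls[ i, 1 ]
      IF UPPER( aCalls[ i, 2 ] ) == cTarget .AND. ;
         ASCAN( aResult, { |c| UPPER( c ) == UPPER( cCaller ) } ) == 0
         AADD( aResult, cCaller )
      ENDIF
   NEXT
   RETURN aResult
```

In the product itself the same question would be put to
the repository through the standard DML rather than to an
array, in the manner of the FindMacroVarDecs example above.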



The Knowledge Base

Clearly, one of the most attractive aspects of ThinkTank is
its built-in knowledge base.  When I thought up this
concept I recognized that for this product to be successful,
it would have to be loaded with several different
perspectives on programming technique, rather than just
my own.  We all have different approaches to different
problems, and it occurred to me that one of this product's
strengths could be its ability to introduce new approaches
to the programmer.  Programming == Choice.  And any
product that could automatically offer a programmer
choices in approach to a given situation would be a
valuable asset indeed.

The big problem was where to obtain the knowledge to
encode into the system?  It didn't take long to realize that
the best way to get the widest available base of
programming experiences was to reach out to you, the CA-
Clipper community, and ask for your help.  The idea of
harnessing thousands of years of collective programming
experience from the corners of the CA-Clipper community
excited me.

Imagine being able to have hundreds of your peers
critiquing your source code simultaneously in the privacy
of your own office or home, without them even knowing
it!  It seemed to be a natural conclusion that anyone using
a tool that embodied the collective knowledge of an
industry, would be sure to benefit from that knowledge in
some way.  This is the basic principle underlying
ThinkTank:  Tap an industry's experience in order to
provide new solutions to old problems.

So we decided to actively solicit the CA-Clipper
community for rules or heuristics to be added to our expert
knowledge base of rules.  We devised a rule template and
disseminated it throughout the CA-Clipper community, so
that contributors can formally describe their tips and tricks
for inclusion in the product with full credit.  We offer a
free copy of the software to anyone who contributes a rule
that is not already in the knowledge base.  It is our hope
that this unique approach will instill a sense of team-work
across the entire industry toward a single goal - building a
software program that can collectively raise the level of
our industry's programming abilities.



The Possibilities are Endless

In the most general sense, ThinkTank is designed to allow
you to retrieve as much information about the structure of
your system as you need.  Whether it is a system you are
intimately familiar with and has just become too large to
keep track of, or you are inheriting someone else's legacy
system, you will find ThinkTank an easy way to analyze
the details of your source code that can elude you during
casual inspection.

With the built-in macro language, the analysis can be for
any purpose you choose - from finding out information on
how better to optimize your program, to simply breaking
down the details of your code's operation to gain a better
understanding of it.

While measuring your source code against the insights of
the industry will be of interest to many ThinkTank users,
there are countless other uses for the product as well.  For
example, because of the system's open architecture it is
entirely possible to write your own translators or mini code
generators with it.  You could, for example, write a
heuristic to find all databases that your program opens,
and then use the DBSTRUCT() function to determine their
structures.  With these structures in hand you could
automatically generate header files of manifest constants
which map field names to field positions.  Such header
files could be used to write array-based scatter/gather
routines that are easy to read and maintain.
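
The generator half of that idea might look like the sketch
below.  DBSTRUCT(), DBCREATE() and the low-level file
functions are standard CA-Clipper; the GenFieldHeader()
function, the file names and the ALIAS_FIELD naming scheme
for the constants are assumptions made for this example.
The table is created on the spot only so that the example
has something to inspect.

```clipper
// Sketch only: GenFieldHeader(), the file names and the naming
// scheme for the constants are hypothetical.
// Compile with the /n switch so that Main() is the startup procedure.
#include "dbstruct.ch"

#define CRLF  ( CHR( 13 ) + CHR( 10 ) )

PROCEDURE Main()
   LOCAL nHandle

   // Create a small table so there is a structure to read
   DBCREATE( "cust", { { "CUSTNO",  "N",  6, 0 }, ;
                       { "NAME",    "C", 30, 0 }, ;
                       { "BALANCE", "N", 10, 2 } } )
   USE cust NEW

   // Write one manifest constant per field to cust.ch
   nHandle := FCREATE( "cust.ch" )
   IF nHandle != -1
      FWRITE( nHandle, GenFieldHeader( ALIAS(), DBSTRUCT() ) )
      FCLOSE( nHandle )
   ENDIF
   USE
   RETURN

// Build header text that maps each field name to its position
FUNCTION GenFieldHeader( cAlias, aStruct )
   LOCAL cText := "// Field positions for " + cAlias + CRLF
   LOCAL i

   FOR i := 1 TO LEN( aStruct )
      cText += "#define " + cAlias + "_" + ;
               PADR( aStruct[ i, DBS_NAME ], 10 ) + " " + ;
               LTRIM( STR( i ) ) + CRLF
   NEXT
   RETURN cText
```

A scatter/gather routine that includes the generated header
can then refer to aRecord[ CUST_NAME ] rather than
aRecord[ 2 ].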

Another possibility would be to write a heuristic that
would actually automatically make the changes that the
tool recommended to you!  Since you have access to every
aspect of your source code (including the original
source itself), there is nothing stopping you from
generating any type of output you wish including
augmented source code.  While the first version of
ThinkTank will not make changes to your source code, a
natural progression for this product is to be able to apply
the changes it recommends, automatically to your code
upon your confirmation.  Well, you wouldn't have to wait
for us to implement this feature if you really wanted it
now.  ThinkTank Ver. 1.0 will provide everything you
need to do it yourself!

If you are really feeling up for a challenge, consider that
you have everything you need to do your own type
inferencing in order to automatically generate strongly-
typed CA-Visual Objects or C/C++ code from your CA-
Clipper applications.  ThinkTank provides you all the
information inherently embedded in your application in a
familiar relational database format, along with a powerful
macro language that allows you to manipulate it any way
imaginable.

Virtually any tedious, mechanical task that you carry out
on your source code today, can be automated with an
appropriate heuristic in ThinkTank.  Many such ideas
jump to mind and ThinkTank as a tool has the ability to
spawn many interesting "spin off" products and services.
Whether it be a specific set of rules for a given domain
(e.g. an OOP rules set for Class(y) users, or a rules set to
help spot problem areas when converting CA-Clipper to
CA-Visual Objects), or a separate product such as one of
the code generators or translators mentioned above,
ThinkTank has all the industrial strength power to help
you mine the wealth of information that lies dormant in
application source code.



An Eye to the Future

Finally, I did not neglect the future when I designed this
product.  It has been clear to me for a very long time that
programmers have a lot in common regardless of the
language they use.  They face the same problems every
day, only their problem domain really changes.  That's
why ThinkTank was designed to be largely language
independent.

The first version of ThinkTank will support only the CA-
Clipper language.  However, we fully expect to provide
compilers and knowledge bases for other languages such
as CA-Visual Objects, C/C++, COBOL, Pascal, xBase,
Basic, etc., in the future.  The secret to being able to
support all these different languages with essentially the
same ThinkTank software is that the database that is
generated by the core compiler abstracts away the details
of any given language.  It has tables such as Variable &
Routine Declarations, and deals with generic concepts
such as Loops & Decision Constructs and Expressions
with Operators and Constants, etc.  These notions are
common to all modern programming languages.
Thus ThinkTank is already well on its way to supporting
multiple languages by simply supplying multiple front-end
core compilers and of course, language specific knowledge
bases which capture the nuances of each language.



Biography

Rod da Silva is president of Software Perspectives,
Toronto, Canada, and the chief architect of ThinkTank.
He is a professional trainer specializing in CA-Visual
Objects and CA-Clipper.  Rod has been a featured
speaker at technical conferences all over the world.  He is
a contributing editor and a regular columnist in the Clipper
Advisor magazine and has also been published in several
other CA-Clipper journals.  He was a runner-up for
Nantucket's Clipper Champion Award for outstanding
application development in 1991.


