Issue #002
March, 1996


Contents:

Comparing C/C++ With Java Part 2 - Sizes of Primitive Types
Chars, Unicode, and File I/O
A Way of Doing Class Initialization
What Happens When You Output a Character?
An Annotated Example of Java Usage
Interfacing to an Applet


INTRODUCTION

In this issue we continue our introduction to the Java language.  The
centerpiece is a substantial annotated program example.  We will also
talk about Java I/O in several contexts, and show an interesting
technique for doing class initialization.


COMPARING C/C++ WITH JAVA PART 2 - SIZES OF PRIMITIVE TYPES

If you've used C or C++ at all you will be familiar with common
fundamental types like char, int, and double.  Java also has these
types, but with a couple of twists.

The first new angle is that the types are of uniform size across all
Java implementations.  Specifically, sizes in bits are:

        Type            Size (bits)

        boolean         N/A
        byte            8
        char            16
        short           16
        int             32
        long            64
        float           32
        double          64

The boolean type is not integral and so no size is listed.  It can
have the values true and false.  The character type is 16 bits using
the Unicode character set, more about which below.

The advantages of uniform sizes are obvious.  Even today it is still
very easy to stumble across code that is non-portable because someone
assumed that an "int" would hold more than 16 bits (it doesn't on most
PCs) or that a long and a pointer are the same size.

Java has no sizeof() operator like C and C++.  With uniform data
sizes, and the compiler handling the details of computing the size of
space needed for an allocation statement like:

        long x[] = new long[189];

there is not the same need for such an operator.
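
For readers on a later JDK, the fixed sizes listed earlier can be read
back from the wrapper classes.  The sketch below assumes the SIZE
constants, which were added to the library in Java 5 and so are not
part of the JDK this issue describes.  The class name PrimitiveSizes
is made up for illustration.

```java
// Hypothetical demo class; relies on the SIZE constants from Java 5 and later.
class PrimitiveSizes {
    // Returns "type=bits" pairs for the integral and floating-point types.
    static String describe() {
        return "byte=" + Byte.SIZE
            + " char=" + Character.SIZE
            + " short=" + Short.SIZE
            + " int=" + Integer.SIZE
            + " long=" + Long.SIZE
            + " float=" + Float.SIZE
            + " double=" + Double.SIZE;
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // The runtime works out the storage for the elements itself;
        // the arithmetic here is only for display.
        long[] x = new long[189];
        System.out.println(x.length * (Long.SIZE / 8) + " bytes of elements");
    }
}
```

The constants report the size of the value, not of any object that
might wrap it, which is about as close as Java comes to sizeof.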

One drawback to this approach is that if the size of a data type is
not a "natural" fit with the underlying hardware, some penalties in
performance can be expected.  For example, it's true today that the
natural size for long is 32 bits on many machines, and requiring that
such a type be 64 bits may result in slower code.  But with the rapid
pace of change in hardware, this concern isn't that significant,
especially when weighed against the benefits of uniform sizes.


CHARS, UNICODE, AND FILE I/O

In the last section we mentioned that a char in Java is 16 bits,
stored as two bytes.  For ASCII and Latin-1 text the high byte is 0,
and several Java library classes and methods let you specify the high
byte explicitly.  Here's an example of byte and character I/O that
illustrates some of these points, in a file "uni.java":

        import java.io.*;

        public class uni {
                public static void main(String args[])
                {
                        InputStream istr = null;

                        try {
                                istr = new FileInputStream("testfile");
                        }
                        catch (FileNotFoundException e) {
                                System.err.println("*** file not found ***");
                                System.exit(1);
                        }

                        try {
                                int b;
                                String s = "";
                                while ((b = istr.read()) != -1) {
                                        s += (char)b;
                                }
                                System.out.print(s);
                        }
                        catch (IOException e) {
                                System.err.println("*** read error ***");
                                System.exit(1);
                        }

                        System.exit(0);
                }
        }

In this example, we attempt to open a file input stream to an input
file "testfile", catching an exception and bailing out if the open
fails (more about exceptions below).  Note that we don't close the
file explicitly.  This is done by something akin to a C++ destructor,
a method called finalize() that is invoked when garbage collection is
done.  Because collection may happen late, or not at all before the
program exits, this is acceptable only in a short program like this
one; calling close() explicitly is the safer habit.  We will talk
about this area at some point;  the semantics of resource cleanup and
freeing are different in Java because of delayed object destruction.
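
For programs that run longer than this one, the stream can be closed
explicitly rather than left to finalize().  The following is a minimal
sketch of that approach using try/finally.  The class ReadAndClose and
its readAll() method are hypothetical names, not part of uni.java.

```java
import java.io.*;

// Hypothetical helper class, not part of uni.java.
class ReadAndClose {
    // Reads a whole file as one-byte characters and always closes the stream.
    static String readAll(String path) throws IOException {
        InputStream istr = new FileInputStream(path);
        try {
            StringBuffer sb = new StringBuffer();
            int b;
            while ((b = istr.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        } finally {
            // Runs whether the loop finished or threw, so the file
            // descriptor is released without waiting for the collector.
            istr.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(args.length > 0 ? args[0] : "testfile"));
    }
}
```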

Then we read bytes from the file using the read() method.  The bytes
are returned as ints, so that -1 can be used to indicate end of file
(C has a similar trick with EOF).  We take each int (byte) and cast it
to a character and append it to a String object that we'd initialized
to the empty string.  Finally, we print the string.
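
The reason read() returns an int can be seen with a byte whose value
is 0xFF.  The sketch below uses an in-memory stream in place of a file
so that it is self-contained.  ReadReturnsInt is a made-up class name.

```java
import java.io.ByteArrayInputStream;

// Hypothetical demo class showing the values read() hands back.
class ReadReturnsInt {
    // Collects every value read() returns, including the final -1.
    static int[] readValues(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int[] out = new int[data.length + 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = in.read();
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x41 is 'A'; 0xFF would be -1 if it were returned as a byte.
        int[] v = readValues(new byte[] { 0x41, (byte) 0xFF });
        for (int i = 0; i < v.length; i++) {
            System.out.println(v[i]);   // prints 65, 255, -1 on separate lines
        }
    }
}
```

Each byte comes back in the range 0 to 255, which leaves -1 free to
mean end of file without colliding with any data value.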

A String object has a sequence of characters in it, and we have
converted the input bytes that were read into characters and shoved
them into the string.  Since characters are Unicode, we have converted
a sequence of input bytes into Unicode.

But it's not quite this easy.  The cast implicitly supplies a 0 for
the high byte of the character.  That is right only when each input
byte is an ASCII or Latin-1 character, so the code is not very
portable.  A better way to express the line:

        s += (char)b;

would be:

        byte x[] = {(byte)b};
        s += new String(x, 0);

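In other words, build an array of bytes and construct a String from
them, with the high byte fill value (the second constructor argument)
explicitly specified.  (Later JDK releases deprecate this constructor
in favor of ones that take a character encoding.)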
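
To make the idea of a high byte fill value concrete, here is a small
sketch that assembles a 16-bit char from an explicit high byte and a
low byte by hand.  It illustrates the arithmetic only and is not how
the String class is implemented.  HighByteDemo and toChar() are
hypothetical names.

```java
// Hypothetical demo class; toChar() is an illustration, not a library call.
class HighByteDemo {
    // Combine an explicit high byte with a low byte read from a stream.
    static char toChar(int hiByte, int lowByte) {
        return (char) (((hiByte & 0xFF) << 8) | (lowByte & 0xFF));
    }

    public static void main(String[] args) {
        int b = 0x41;                                  // as returned by read()
        System.out.println(toChar(0, b));              // prints A
        System.out.println((int) toChar(0x04, 0x10));  // prints 1040 (U+0410)
    }
}
```

With a high byte of 0 the result matches the plain cast; any other
fill value moves the byte into a different block of Unicode.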

We will be saying more about Java I/O in the future.  The Java library
has a variety of classes and methods for dealing with input and output
of various types.  The I/O example shown above illustrates a way of
doing low-level input.  There are higher-level mechanisms available in
the library.


A WAY OF DOING CLASS INITIALIZATION

In C++ one can use constructors to initialize class object instances
when they're created, and employ static data members that are
initialized when the program starts.  But what if you'd like some code
to be executed once for a given class, to kind of set things up for
the class?  One way of doing this in Java is to say:

        public class A {
                static {
                        System.out.println("got to static initialization");
                }
                public static void main(String args[])
                {
                        System.out.println("start of main()");
                        A c1 = new A();
                        A c2 = new A();
                }
        }

No matter how many instances of objects of class A are created, the
block of code at the top will be executed only one time, when the
class is first initialized.  In this example that happens before
main() starts, so the block serves as a hook to do class setup at the
beginning of execution.
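
The once-only behavior can be checked with a pair of counters.  This
is a sketch along the lines of class A above; InitOrder and its fields
are names invented for the illustration.

```java
// Hypothetical demo class; counts static-block and constructor executions.
class InitOrder {
    static int staticRuns = 0;   // times the static block has executed
    static int instances = 0;    // times the constructor has executed

    static {
        staticRuns++;
    }

    InitOrder() {
        instances++;
    }

    public static void main(String[] args) {
        new InitOrder();
        new InitOrder();
        System.out.println("static block runs: " + staticRuns);  // 1
        System.out.println("instances created: " + instances);   // 2
    }
}
```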


WHAT HAPPENS WHEN YOU OUTPUT A CHARACTER?

The technique shown in the previous section has one very important
use.  When you say:

        System.out.println("x");

what happens?  It's interesting to trace through the sequence of
operations used to output a character.

In the first place, System is a class defined in the Java library.  It
is a wrapper class that you do not actually create object instances
of (its constructor is private), nor may you derive from the System
class, because it is declared as "final".  In C++ such a class is
sometimes referred to as a "static global class".

System.out is defined as:

        public static PrintStream out;

meaning that it's available to all and that there is only one object
instance of PrintStream for "out".  This PrintStream stream
corresponds to standard output, kind of like file descriptor 1 in
UNIX, stdout in C, or cout in C++.  Similar streams are established
for input and standard error output.

The output stream is initialized via a static initialization block of
the type illustrated above.  The actual code is:

        out = new PrintStream(new BufferedOutputStream(
            new FileOutputStream(FileDescriptor.out), 128), true);

This is a mouthful that says that a PrintStream is based on a
BufferedOutputStream (with a 128-byte buffer) which is based on a
FileOutputStream with a specified file descriptor, and that output is
line buffered (the final "true" argument).

Saying:

        System.out.println("xxx");

means that you're invoking the println(String) method for a
PrintStream.  Doing so immediately results in the sequence:

        PrintStream.print("xxx");
        PrintStream.write('\n');

PrintStream.print("xxx") contains a loop that iterates over the
characters in the String ("xxx" is a String, not an array of
characters) calling PrintStream.write() for each.  PrintStream.write()
calls out.write(), implementing line buffering as it goes.

What is out.write()?  When the output stream was initialized, we
created a PrintStream object and said that it should be based on a
BufferedOutputStream.  "out" is an instance variable of a class
FilterOutputStream from which PrintStream derives ("extends"), and out
is set to reference a BufferedOutputStream object.  In a similar way,
BufferedOutputStream is based on FileOutputStream.

out.write() in BufferedOutputStream collects bytes into a buffer
(whose size was specified in the creation line illustrated above).
When the buffer becomes full, out.flush() is called.  This results in
a different write() being called in the FileOutputStream class.  It
writes a sequence of bytes to the file descriptor specified when the
stream was created.  This last method is native; that is, it is
implemented in C or assembly language and not in Java code itself.
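
The buffering step can be observed by putting a counting stream where
the FileOutputStream would normally sit.  The following is a sketch
under that substitution.  BufferDemo, CountingSink, and run() are
hypothetical names, and the sink stands in for the real file
descriptor.

```java
import java.io.*;

// Hypothetical demo class; the sink replaces the FileOutputStream layer.
class BufferDemo {
    // Sink that only counts how many bytes actually reach it.
    static class CountingSink extends OutputStream {
        int received = 0;
        public void write(int b) { received++; }
        public void write(byte[] buf, int off, int len) { received += len; }
    }

    // Writes n bytes through a buffer of the given size and reports how many
    // bytes had reached the sink before and after the explicit flush().
    static int[] run(int bufSize, int n) throws IOException {
        CountingSink sink = new CountingSink();
        BufferedOutputStream out = new BufferedOutputStream(sink, bufSize);
        for (int i = 0; i < n; i++) {
            out.write('x');
        }
        int before = sink.received;
        out.flush();
        return new int[] { before, sink.received };
    }

    public static void main(String[] args) throws IOException {
        int[] small = run(128, 3);     // fits in the buffer
        int[] large = run(128, 200);   // overflows the buffer
        System.out.println("3 bytes:   " + small[0] + " before flush, "
            + small[1] + " after");
        System.out.println("200 bytes: " + large[0] + " before flush, "
            + large[1] + " after");
    }
}
```

With only three bytes written, nothing reaches the sink until flush()
is called; with more bytes than the buffer holds, part of the output
is pushed down early.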

This approach to I/O is quite flexible and powerful, and names like
"stream nesting" and "stream filtering" are used to describe it.  It's
not a terribly efficient approach, however, especially since Java
itself is interpreted and many of the higher layers of the system are
written in Java itself.
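
Stream nesting is not limited to System.out; the same layering can be
built by hand.  The sketch below assembles the three-layer chain
described above but ends it in a ByteArrayOutputStream, an assumption
made so that the result can be inspected in memory.  NestedStreams and
printThroughChain() are invented names.

```java
import java.io.*;

// Hypothetical demo class; mirrors the System.out chain, ending in memory.
class NestedStreams {
    // PrintStream -> BufferedOutputStream (128 bytes) -> in-memory sink.
    static String printThroughChain(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps =
            new PrintStream(new BufferedOutputStream(sink, 128), true);
        ps.println(text);   // autoflush pushes the line through the buffer
        return sink.toString();
    }

    public static void main(String[] args) {
        String result = printThroughChain("xxx");
        System.out.println("bytes captured: " + result.length());
    }
}
```

Swapping the innermost stream is all it takes to redirect the output,
which is the flexibility that the nesting buys.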

One other note:  when trying to figure out just what methods are
called in an example like the one in this section, it's helpful to use
the profiling feature of JDK:

        $ java -prof xxx

This shows called methods, who called them, and how many times they
were called.


AN ANNOTATED EXAMPLE OF JAVA USAGE

Here is a longer example of a complete Java program (not an applet). 
This program does simple expression evaluation, so for example, input
of:

        (1 + 2) * (3 + 4)

yields a value of 21.

If you're not familiar with this sort of programming, similar to what
is found in language compilers themselves, a brief explanation is in
order.  The program takes input and splits it into what are called
tokens, logical chunks of input.  For the input above, the tokens are:

        (
        1
        +
        2
        )
        *
        (
        3
        +
        4
        )

and the white space is elided.  Then the program tries to make sense
of the stream of input tokens.  It implicitly applies a grammar:

        expr -> term | expr [+-] term

        term -> fact | term [*/] fact

        fact -> number | ( expr )

Don't worry too much if you don't understand this.  It's a way of
describing the structure of input.  You can think of it as a way of
converting an input expression into the Reverse Polish Notation that
some older calculators used to use.
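
The connection to Reverse Polish Notation can be made concrete.  The
sketch below walks the same three grammar rules but, instead of
computing a value, emits each number as it is seen and each operator
after its operands.  It is a separate illustration and not part of
calc.java.  ToRpn and its methods are hypothetical names.

```java
// Hypothetical demo class: recursive descent over the grammar, emitting RPN.
class ToRpn {
    private final String in;                      // input text
    private int pos = 0;                          // current position
    private final StringBuilder out = new StringBuilder();

    private ToRpn(String s) { in = s; }

    // Skip blanks and return the next character, or 0 at end of input.
    private char peek() {
        while (pos < in.length() && in.charAt(pos) == ' ') pos++;
        return pos < in.length() ? in.charAt(pos) : 0;
    }

    // Append one token to the output, separated by single spaces.
    private void emit(String tok) {
        if (out.length() > 0) out.append(' ');
        out.append(tok);
    }

    // expr -> term | expr [+-] term
    private void expr() {
        term();
        for (char c = peek(); c == '+' || c == '-'; c = peek()) {
            pos++;                      // consume the operator
            term();
            emit(String.valueOf(c));    // operator follows its operands
        }
    }

    // term -> fact | term [*/] fact
    private void term() {
        fact();
        for (char c = peek(); c == '*' || c == '/'; c = peek()) {
            pos++;
            fact();
            emit(String.valueOf(c));
        }
    }

    // fact -> number | ( expr )
    private void fact() {
        char c = peek();
        if (c == '(') {
            pos++;
            expr();
            if (peek() != ')') throw new IllegalArgumentException("missing )");
            pos++;
        } else if (Character.isDigit(c)) {
            int start = pos;
            while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
            emit(in.substring(start, pos));
        } else {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
    }

    static String convert(String s) {
        ToRpn p = new ToRpn(s);
        p.expr();
        if (p.peek() != 0) throw new IllegalArgumentException("trailing input");
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("(1 + 2) * (3 + 4)"));  // 1 2 + 3 4 + *
    }
}
```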

Here is the actual program, in a file "calc.java".  We will have more
to say about this program in the next section below.  Annotations are
given in /* */ comments, while regular program comments use //.
(Note: we're not trying to do anything fancy with comments for JavaDoc
purposes, a subject to be presented another time.)

        import java.io.*;
        
        public class calc {
                private String in_line;                 // input line
                private int in_len;                     // input line length
                private int currpos;                    // position in line
        /*
        The input line, its length, and the current position in it.
        */
                private byte curr_tok;                  // current token
                private int val_token;                  // value if num
        /*
        The current token and its value if it's a number.
        */
                private boolean had_err;                // error in parsing
        /*
        Used to record whether a parsing error occurred on the input.
        Exception handling could also be used for this purpose, and
        is used for another type of error (divide by 0).
        */
                private static final byte T_NUM = 1;    // token values
                private static final byte T_LP = 2;
                private static final byte T_RP = 3;
                private static final byte T_PLUS = 4;
                private static final byte T_MINUS = 5;
                private static final byte T_MUL = 6;
                private static final byte T_DIV = 7;
                private static final byte T_EOF = 8;
                private static final byte T_BAD = 9;
        /*
        Possible token values.  These are private (available only to the
        class), static (shared across all class object instances), and
        final (constant).
        */
                // get next token from input line
                private void get_token()
                {
                        // skip whitespace
        
                        while (currpos < in_len) {
                                char cc = in_line.charAt(currpos);
        /*
        in_line.charAt(currpos) returns the current character from
        the string.
        */
                                if (cc != ' ' && cc != '\t')
                                        break;
                                currpos++;
                        }
        
                        // at end of line?
        
                        if (currpos >= in_len) {
                                curr_tok = T_EOF;
                                return;
                        }
        
                        // grab token
        
                        char cc = in_line.charAt(currpos);
                        currpos++;
                        if (cc == '+' || cc == '-')
                                curr_tok = (cc == '+' ? T_PLUS : T_MINUS);
                        else if (cc == '*' || cc == '/')
                                curr_tok = (cc == '*' ? T_MUL : T_DIV);
                        else if (cc == '(' || cc == ')')
                                curr_tok = (cc == '(' ? T_LP : T_RP);
        /*
        This block of code could also be handled via a switch statement
        or in a couple of other ways.
        */
                        else if (Character.isDigit(cc)) {
                                int n = Character.digit(cc, 10);
                                while (currpos < in_len) {
                                        cc = in_line.charAt(currpos);
                                        if (!Character.isDigit(cc))
                                                break;
                                        currpos++;
                                        n = n * 10 + Character.digit(cc, 10);
                                }
                                val_token = n;
                                curr_tok = T_NUM;
        /*
        The above code grabs a number.  Character.isDigit(char) is a method
        of the Character class that returns true if the character is a
        digit.  Character.digit(char, int) converts a character to a number
        for a given number base (10 in this case).
        
        The primitive types like char have corresponding class types, though
        you cannot call a method directly on a primitive type object.  You
        must instead use the techniques illustrated here.
        */
                        }
                        else {
                                curr_tok = T_BAD;
                        }
        /*
        The case where the token can't be recognized.
        */
                }
        
                // constructor, used to set up the input line
                public calc(String s)
                {
                        in_line = s;
                        in_len = in_line.length();
                        currpos = 0;
                        had_err = false;
                        get_token();
                }
        /*
        The constructor sets up an object instance for doing calculations.  We
        set up the input line, clear any error condition, and grab the first
        token.
        */
                // addition and subtraction
                private double expr()
                {
                        // get first term
        
                        double d = term();
        
                        // additional terms?
        
                        while (curr_tok == T_PLUS || curr_tok == T_MINUS) {
                                byte t = curr_tok;
                                get_token();
                                if (t == T_PLUS)
                                        d += term();
                                else
                                        d -= term();
                        }
                        return d;
                }
        /*
        This and the next method are similar.  They grab a term() or fact()
        and then check to see if there are more of them.  This matches input
        like:
        
                1 + 2 + 3 + 4 ...
        
        As each token is consumed, another one is grabbed.
        */
                // multiplication and division
                private double term()
                {
                        // get first factor
        
                        double d = fact();
        
                        // additional factors?
        
                        while (curr_tok == T_MUL || curr_tok == T_DIV) {
                                byte t = curr_tok;
                                get_token();
                                if (t == T_MUL)
                                        d *= fact();
                                else {
                                        double d2 = fact();
                                        if (d2 == 0.0 && !had_err)
                                               throw new ArithmeticException();
                                        d /= d2;
        /*
        This code is similar to expr() above but we check for division by 0
        and throw an arithmetic exception if we find it.  We will see below
        where this exception is handled.
        */
                                }
                        }
                        return d;
                }
        
                // numbers and parentheses
                private double fact()
                {
                        double d;
        
                        // numbers
        
                        if (curr_tok == T_NUM) {
                                d = val_token;
                                get_token();
                        }
        /*
        If a number, retrieve the value stored in val_token.
        */
                        // parentheses
        
                        else if (curr_tok == T_LP) {
                                get_token();
                                d = expr();
                                if (curr_tok != T_RP) {
                                        had_err = true;
                                        d = 0.0;
                                }
                                get_token();
                        }
        /*
        If (, then grab the expression inside and check for ).  If not found,
        record that we had an error.  We could also throw an exception at this
        point.
        */
                        // garbage
        
                        else {
                                had_err = true;
                                get_token();
                                d = 0.0;
                        }
        /*
        The token was not recognized, so we have bad input.
        */
                        return d;
                }
        
                // parse input and get and print value
                public String get_value()
                {
                        double d;
        
                        try {
                                d = expr();
                        }
                        catch (ArithmeticException ae) {
                                return "*** divide by 0 ***";
                        }
                        if (had_err || curr_tok != T_EOF)
                                return "*** syntax error ***";
                        else
                                return String.valueOf(d);
        /*
        Here is where we actually try to get the value of the expression.  We
        convert its value back to a String for reasons of flexibility in
        handling error conditions.
        
        Division by 0 will result in an exception being thrown and caught
        here.
        
        If we encountered an error, or if we've not exhausted the input string
        (for example, for input "((0)))"), then we also flag an error. 
        Otherwise, we return the string value of the double using the method
        String.valueOf(double).
        */
                }
        
        
                // get a line of input from the keyboard
                private static String getline()
                {
                        DataInput di = new DataInputStream(System.in);
                        String inp;
        
                        try {
                                inp = di.readLine();
                        }
                        catch (IOException ignored) {
                                inp = null;
                        }
        /*
        This is a wrapper function to get a line of input from the keyboard.
        */
                        return inp;
                }
        
                // driver
                public static void main(String args[])
                {
                        String inp = "";
        
                        // command line arguments
        
                        if (args.length > 0) {
                                for (int i = 0; i < args.length; i++)
                                        inp = inp + args[i];
                                calc c = new calc(inp);
                                System.out.println(c.get_value());
        /*
        If there are command-line arguments, we will append them into one
        string using the "+" operator and then evaluate the value of the
        expression. args.length is the number of command-line arguments, and
        args[i] is the i-th argument.
        
        The line:
        
                calc c = new calc(inp);
        
        creates a new calc object and calls its constructor with inp as the
        String argument to the constructor.
        
        c.get_value() returns the expression value as a String.
        */
                        }
        
                        // no command line arguments, prompt user
        
                        else {
                                for (;;) {
                                        System.out.print("input string: ");
                                        System.out.flush();
        /*
        We flush output here because it's normally line buffered and we've not
        output a newline character.
        */
                                        inp = getline();
                                        if (inp == null)
                                                break;
        /*
        End of input.
        */
                                        calc c = new calc(inp);
                                        System.out.println(c.get_value());
                                }
                        }
                }
        }
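
One detail of term() in the listing above deserves a short sketch.
The program throws ArithmeticException itself because division of
doubles by zero does not raise an exception in Java; only integer
division does.  DivideByZero and its methods are names made up for
this illustration.

```java
// Hypothetical demo class contrasting double and int division by zero.
class DivideByZero {
    // Floating-point division follows IEEE 754: no exception is thrown,
    // and the result is Infinity or NaN.
    static double divDouble(double a, double b) {
        return a / b;
    }

    // Integer division by zero is the case that throws on its own.
    static boolean intDivisionThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(divDouble(1.0, 0.0));        // prints Infinity
        System.out.println(divDouble(0.0, 0.0));        // prints NaN
        System.out.println(intDivisionThrows(1, 0));    // prints true
    }
}
```

Without the explicit check in term(), an input such as 1/0 would yield
Infinity as its value instead of the divide-by-0 message.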


INTERFACING TO AN APPLET

Suppose that we want to take the above calculator program and call it
from an applet.  How would we do this?  Here's a simple example of an
applet that will interface with the calculator code.

        import java.awt.*;

        public class applet extends java.applet.Applet {
                public void paint(Graphics g)
                {
                        String input_expr = getParameter("input_expr");
                        calc c = new calc(input_expr);
                        String out = c.get_value();
                        g.drawString("Input = " + input_expr, 25, 50);
                        g.drawString("Value = " + out, 25, 75);
                }
        }

This is similar to the applet illustrated in the last issue, save for
the lines:

        String input_expr = getParameter("input_expr");
        calc c = new calc(input_expr);
        String out = c.get_value();

The last two of these we saw in the example above.  The first line
illustrates how one can get parameters passed to the applet from HTML
code, kind of similar to command-line parameters.  The corresponding
HTML to run this applet would be:

        <html>
        <head>
        <title>Interface to Calculator Applet
        </title>
        </head>
        <body>

        <applet code="applet.class" width=150 height=150>
        <param name=input_expr value="1/2/3*4">
        </applet>

        </body>

        </html>

This HTML is similar to that illustrated in newsletter #001, save for
the line:

        <param name=input_expr value="1/2/3*4">

which actually passes in the parameter value.  When this applet is
executed, the result will be something like:

        Input = 1/2/3*4

        Value = 0.666667
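
The value follows from the left-to-right grouping that the term() loop
applies: ((1/2)/3)*4.  A minimal sketch of the same arithmetic, step
by step, is shown here.  LeftToRight is an invented class name, and
the number of digits printed depends on the JDK's conversion of double
to String.

```java
// Hypothetical demo class reproducing the evaluation order of 1/2/3*4.
class LeftToRight {
    // Applies the operators one at a time, in the order term() sees them.
    static double[] steps() {
        double d = 1.0;
        double afterFirst = d / 2;            // 1/2
        double afterSecond = afterFirst / 3;  // (1/2)/3
        double afterThird = afterSecond * 4;  // ((1/2)/3)*4
        return new double[] { afterFirst, afterSecond, afterThird };
    }

    public static void main(String[] args) {
        double[] s = steps();
        for (int i = 0; i < s.length; i++) {
            System.out.println(String.valueOf(s[i]));
        }
    }
}
```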


ACKNOWLEDGEMENTS

Thanks to Thierry Ciot, Mike Paluka, and Alan Saldanha for help with
proofreading.


SUBSCRIPTION INFORMATION / BACK ISSUES

To subscribe to the newsletter, send mail to majordomo@world.std.com
with this line as its message body:

subscribe java_letter

Back issues are available via FTP from:

        rmii.com /pub2/glenm/javalett

or on the Web at:

        http://www.rmii.com/~glenm

-------------------------

Copyright (c) 1996 Glen McCluskey.  All Rights Reserved.

This newsletter may be further distributed provided that it is copied
in its entirety, including the newsletter number at the top and the
copyright and contact information at the bottom.

Glen McCluskey & Associates
Professional Computer Consulting
Internet: glenm@glenmccl.com
Phone: (800) 722-1613 or (970) 490-2462
Fax: (970) 490-2463
FTP: rmii.com /pub2/glenm/javalett (for back issues)
Web: http://www.rmii.com/~glenm
