Sunday, February 28, 2010

Keep It Local

Deciding whether a variable should be a member of a class or a local variable may not be clear to newer developers. I have always found the following rule of thumb to work best:
If it can be local, make it local.

That is, if at all possible, make the variable local. Making it a class member should be a last resort, not the default. Why should local be preferred over a class data member? For the same reason that local variables are preferred over global variables. Local variables are easier to understand. They only exist in the block they are defined in and can only directly influence code in this block. Data members exist for the entire lifetime of the object they are part of. Determining if they are initialized correctly and their effect requires one to look at every method of the class. In a multi-threaded environment, synchronization of access to data members is another complication that local variables do not have. Data members also increase the memory footprint of your objects.

Variables that should be data members are ones that hold the persistent state of the object that must be maintained between method calls. Variables that hold transient values that are only needed temporarily should be local variables.

So when reviewing a class, look carefully at each of its data members and be sure that you can justify why it's a data member and not a local variable. If you can't justify it, fix it.

Sunday, May 31, 2009

The Decorator Pattern

I volunteered to do a presentation to the Java User's Group at my work and adapted an old lecture to discuss the Decorator design pattern. I thought this would also be a good post too. Source code is available from a link at the bottom of the post.

Patterns are techniques used to approach solving a common programming problem. They are object-oriented, but language independent. The Decorator pattern is a very useful pattern, but not one that gets a lot of attention. The "Gang of Four" book (by Gamma, Helm, et.al. see below) defines the purpose of this pattern as:
Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality.
A decorator class adds a new capability to an existing object by impersonating the original object and intercepting methods calls to it. By intercepting it can modify the normal result that the original method call would have. To impersonate the original, both the original class and the decorator class must implement the same interface.


Typically, the decorator object's methods will perform some additional operation and also call the original object's method. This enhances the result of the original operation in some way.



The advantages of the decorator pattern are:
  • Decorators can be chained to add multiple behaviors
  • Very flexible reuse of code
  • Can be runtime decision (unlike inheritance)
  • Avoids huge inheritance trees
As an example of using the decorator pattern consider Java's InputStream/OutputStream classes. Java defines subclasses of these (FilterInputStream/FilterOutputStream) specifically designed to create decorators.

Let's look at how to create a decorator for any OutputStream using FilterOutputStream.


The out attribute of the FilterOutputStream is a reference to the OutputStream that the class is decoratoring. This attribute is initialized by the constructor of FilterOutputStream. The default implementations of the FilterOutputStream methods just forward their call exactly to its out attribute.

We'll create a decorator named DebugOutputStream that traces the method calls on an OutputStream. First lets look at the beginning of its definition:

public class DebugOutputStream extends FilterOutputStream {

  private String name;

  public DebugOutputStream(String name, OutputStream out) {
    super(out);
    this.name = name;
}

The name parameter is used to label the output of this particular DebugOutputStream (useful if there is more than one). The out parameter is passed up to the FilterOutputStream and is used to initialize it's protected out attribute.

Let's look at what the close method of DebugOutputStream:

public void close() throws IOException {

  StringBuilder msg = new StringBuilder(
     "DebugOutputStream(" + name + "): called void close()");
  try {

    out.close();   // Call to decorated object

  } catch( IOException e) {

    msg.append( " Exception: " + e + " thrown");
    throw e;

  } catch( RuntimeException e) {

     msg.append( " Exception: " + e + " thrown");
    throw e;

  } finally {

     System.out.println( msg);

  }
}
Most of the code is composing a message to print out information about the call to close on the decorated object (i.e., referenced in the out attribute).

Here's an example of using the DebugOutputStream to spy on how GZIPOutputStream works.


InputStream input = new BufferedInputStream( 
  new FileInputStream( args[0]));
OutputStream output = new DebugOutputStream( "GZIPOutputStream",
  new GZIPOutputStream(
  new DebugOutputStream( "FileOutputStream",
  new FileOutputStream( args[1]))));

byte [] buffer = new byte[4096];
int numRead;
while( (numRead = input.read(buffer)) > 0) {

  output.write( buffer, 0, numRead);

}
input.close();
output.close();

Note that the BufferedInputStream is acting as a decorator to the FileInputStream. Two DebugOutputStreams are used to trace themethod calls at two places. The first one (given the name "GZIPOutputStream") will trace the calls made to the GZIPOutputStream. The second one will trace the calls made to the FileOutputStream.

The output of this program on a sample input file is:
DebugOutputStream(FileOutputStream): called void write( byte []) (10 bytes)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 3424)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 253)
DebugOutputStream(FileOutputStream): called void close()
DebugOutputStream(GZIPOutputStream): called void close()

From this we can see that first a 10 byte header is written out to the file, then we see the data is being given to the GZIPOutputStream in 4096 byte chunks. But the GZIPOutputStream is not writing the data out immediately. Only after all the data from the input file has been exhausted is the compressed data written out to the file and in 512 byte chunks. For a bigger input file, compressed data would have been written out before the end. GZIPOutputStream has some internal cache that is written out when it fills up or the stream is closed. The latter is happening here.

I also used this same technique to help with layout problems with Swing panels. One can create a JComponent or JPanel decorator that prints out how the methods used by LayoutManagers (like getSize(), getPreferredSize(), etc.) to see how the LayoutManager is interrogating your class to lay it out.
Next, let's look at another use of a decorator to add a more practical enhancement to an input/output stream, encryption. I am not an expert on cryptology at all, so please don't use this code for any information you really need to secure! To encrypt data, I exclusive OR the data with a sequence of pseudo-random bytes as a mask. I then use the fact that:
(x ^ mask) ^ mask == x
to decrypt the data (^ is the exclusive OR operator in Java, C and C++). By exclusive OR'ing the encrypted data with the same sequence of pseudo-random bytes I will get the original data back.
The abstract EncryptingOutputStream class extends FilterOutputStream and overrides the write(int) and write( byte [], int, int) methods. The write( byte []) method calls write( byte[], int, int) so I don't need to override it. An abstract method void encrypt( byte []) is defined that must be overridden to do the actual encryption of the data. (This is an example of the Template Method pattern.)
Similarly, there is an abstract DecryptingInputStream class that extends FilterInputStream and overrides the int read() and int read( byte [], int, int) methods. It defines an abstract method void decrypt( byte [], int, int) that must decrypt the data in the specified section of the array passed to it.
Concrete classes named XOREncryptingOutputStream and XORDecryptingInputStream are defined that define the appropriate abstract methods. Both classes are passed a key to their constructors. This key is used as a seed to a random number generator. If the key is the same, the same pseudo-random numbers will be generated to encrypt/decrypt the data.
As an example, putting all this together I compressed then encrypted (with the key 1234567890) a web page and put it on my web site at http://www.drpaulcarter.com/mystery. Let's look at how to use the classes discussed here to read the web page. We need to create a chain of InputStreams that read the web page, decrypt it and then uncompress it. We can do that with the following code:

new GZIPInputStream(
        new XORDecryptingInputStream(
        new URL(url).openStream(), key)));
The url variable is string containing "http://www.drpaulcarter.com/mystery" Then we only need to read from the GZIPInputStream as we would any InputStream. Here's a sequence diagram showing what happens as data is read:

To download and display the web page type:
java test.DecryptingBrowser http://www.drpaulcarter.com/mystery 1234567890
An important point to realize is the InputStream reading the data from the web server knows nothing about compression or encryption. However, we have dynamically added these capabilities by chaining it to together with decorators that add these capabilities. This is the power of the decorator pattern.
I originally developed this example back in 1998 when I was teaching Java. This was in the Java 1.2 days. In creating this presentation, I discovered that Java 1.4 added encryption streams, javax.crypto.CipherInputStream and javax.crypto.CipherOutputStream. They both use a javax.crypto.Cipher object to do the encryption and decryption. This is yet another pattern, the Strategy pattern.
The full example code of the classes described here can be found here.

Recommended books:


Wednesday, May 13, 2009

typedef in C and C++

I like C++, but it is a complex language. After programming in it for 19 years, you'd think I would know it pretty well, but I'm still learning things even now.

Recently, I discovered that typedef works differently in C++ than in C. In C, you can't repeat a typedef even if it's exactly the same as the original definition. For example:

typedef int T;
typedef int T;

will give you a compiler error in C. However, this is perfectly legal in C++. I'm pretty sure this wasn't the case in early C++. I wonder when this changed? Anybody know?

I found a very good web site for the differences in C and C++ here:
http://david.tribble.com/text/cdiffs.htm

Saturday, March 21, 2009

Finding the source of an error in an inlined function using gdb

Yesterday at work, I was trying to find the source of a crash from a coredump using gdb and discovered that if the crash was caused by inlined code, the stack trace that gdb shows does not show you the information you might need to determine the exact location. In this case, gdb told me that the crash occurred in a call to std::string::size(). However, there was no call to this method in the next function up in the stack trace. The reason was that std::string::size() was being called from another inlined function, but gdb wasn't showing what that function that was. After fumbling around for about 30 minutes I was able to figure out that the crash was happening inside a std::map::find() call (the map's key was a std::string). I want to document the procedure that I used here.

Let's use a simple program that illustrates the problem (in a file named gdbtest.cpp):

#include <cstdio>

inline void f1( int *p)
{
(*p)++; // Line 5
}

inline void f2( int * p)
{
std::printf("%p\n", p); // Line 10
f1(p);
}

int main()
{
int i;

f2(&i);
std::printf( "%d\n", i); // Line 19
f2(0);
std::printf( "%d\n", i);
f2(&i);
std::printf( "%d\n", i);

return 0;
}

Here it's obvious which line is going to cause the segmentation fault, but the program is useful to demonstrate the general technique.

Build the executable with:
g++ -g -finline -o gdbtest gdbtest.cpp
The -finline flag tells g++ to inline the functions even in debug mode.

Make sure that core files are enabled by typing:
ulimit -c unlimited

Now run the program:
./gdbtest

The output I get from a run (I'm running 64-bit Linux):
0x7fff2cdee1fc
1
(nil)
Segmentation fault (core dumped)

Start up gdb with the corefile:
gdb gdbtest core

gdb displays it's header, etc. at the bottom appears:
#0 0x00000000004005fb in main () at gdbtest.cpp:5
5 (*p)++;
(gdb)

The hex address will probably be different for you. If you type where (or info stack) to see the stack trace you get (user input is in italics):
(gdb) where
#0 0x00000000004005fb in main () at gdbtest.cpp:5

This is not very useful, there is only one stack frame shown, main's. This does not show us which line in the actual body of the main function is the source of the segmentation fault. Line 5 is the line in the inlined f1 function that is called multiple times.

The key to determining the real location is the hex address listed. This is the address of the machine instruction that caused the fault. You can use the info line gdb command to show the source line that maps to a code address.

Typing:
info line *0x4005fb
displays:
Line 5 of "gdbtest.cpp" starts at address 0x4005f7 and ends at 0x400606 .

Again the actual address values will probably be different for you. Here gdb is telling you that line 5 maps to the code from address 0x4005f7 to 0x400606. For ordinary code, a source line will map to only a single range of code. However, inline code will be mapped to multiple places. Every place that the code is inlined to will be mapped to the inline source. In the example above, line 5 will mapped to three different places since it is used it f2 which itself is inlined and used in three different places. The problem is to find which of these three places is causing the segmentation fault.

The key is to look at the code around the faulting code to determine where we are in main. From the above, we see that line 5 starts at address 0x4005f7, so the previous instruction will end at address 0x4005f6, we can use info line to see what line that is:
(gdb) info line *0x4005f6
Line 10 of "gdbtest.cpp" starts at address 0x4005dc and ends at 0x4005f7 .

This tells us that the line executed before the crashing line was line 10. This is just what we expected, but doesn't tell us which call to f2 caused the crash. If we continue moving backward, using address 0x4005db (the address right before 0x4005dc where line 10 starts):
(gdb) info line *0x4005db
Line 19 of "gdbtest.cpp" starts at address 0x4005c2 and ends at 0x4005dc .

Line 19 is in the body of main and this tells us that it is second call to f2 (line 20) that caused the crash.

It appears that gdb has added a new option to the disassemble command, /m that would display the assembly code with addresses mixed in with the source code. This would allow one to easily determine the location. However, this option is not in the most recent version of gdb I have access to (6.8)

Sunday, March 08, 2009

UML Considered Harmful

At work, we use Unified Modeling Language (UML) to document our designs. I think UML has its place as part of the design process and documentation, but I believe that UML is not in itself sufficient to document a non-trivial design. I have been to numerous design reviews where a few UML class and sequence diagrams are the only description of the design. I find these inadequate to communicate more than a vague idea of the overall design.

For a complete description of the design, I feel that a full design document with words and pseudocode is a requirement. I get the impression that UML diagrams are being used as a way to say that a design is documented (for documentation requirements like CMMI) without really providing any meaningful documentation.

If anyone is out there, how is your company documenting design? Do you think your process works?

Monday, December 22, 2008

Jar files and indexing

Recently at work, I ran into an interesting "feature" of how jar files and indexing work with Java and ant.

For background on indexing see:
http://closingbraces.net/2007/05/13/jarclasspathandindex/

On some other developer's machines, some java applications were failing with NoClassDefFoundError exceptions. However, I personally never saw this error. The jar's with the classes that were not being found were listed in the Class-Path entry of the manifest of the jar that used them. I was extremely puzzled by this. I could fix the problem by explicitly adding the jar's to the classpath, but I didn't understand why I needed to do this when everything worked fine on my machine.

From the title of this post you can probably guess that the problem involved indexing. Our build process (using ant), creates a jar (let's call in myjar.jar) with the <jar> task, then it runs "jar -i" on it to index it (using an ). If all this happens everything works fine. However, there seems to be a race condition between when ant closes the jar file that it created with the <jar> task and when the "jar -i" command is run. When this happens, the "jar -i" command fails because the file is still locked and the index is not created.

I will go into exactly why this caused the error in a minute, but first I want to look at how I first tried to fix this because it also brings up an important point on the problem. The <jar> task has an optional attribute named "index" which can be used to have ant index the file. I tried using this instead of calling "jar -i" on it, but it did not fix the problem. Why didn't it? Because setting index="true" in the <jar> task does not do the same thing as running "jar -i" on the jar! This is very uninituitive! So what is the difference? Using the <jar> task, on the classes in the jar itself are added to the index; however, using "jar -i" the classes in the jar and all the classes used by the jar are added to the index.

Why does this make a difference? The key is in the second link above. Quoting:
The executable jar must either not have a META-INF/INDEX.LST file, or if it does have such an INDEX.LST file this needs to list the contents of both the executable jar and all of the other jars as well. Anything not in this list will not be found on the class path, regardless of the “Class-Path” entry in the manifest...

Two things to note, the filename I see is META-INF/INDEX.LIST (not LST) and the link goes on to say:
and regardless of any command-line “-classpath” (which is ignored anyway).
This did not apply in my case. We are using java -cp file.jar file.MainClass to run our apps, not java -jar file.jar. In a nutshell, if the jar has an index, it's Class-Path entry is ignored.

In my case, our application uses jacorb.jar (from JacORB). It uses two other jars: logkit-1.2.jar and avalon-framework-4.1.5.jar. These are the two jars I had to add to the classpath to fix the problem. My jar has jacorb.jar listed in its Class-Path entry.

If everything builds properly and myjar.jar is indexed using "jar -i", everything works because entries for all the classes used by myjar.jar are included in the index (including the ones in logkit-1.2.jar and avalon-framework-4.1.5.jar. The Class-Path entry is ignored since the index is present.

However, if the "jar -i" command fails, then the Class-Path entry is in effect and classes in jacorb.jar can be resolved, but not the classes that jacorb.jar uses. Why not? Because jacorb.jar was indexed using ant. So, it has an index, but it only contains its own classes, not the ones in the other two jars.

If you want to see this problem yourself. You can download the code I used to verify this from here. It has 3 jars: myjar.jar, dependjar.jar and depend2jar.jar. myjar uses dependjar which uses depend2jar. This is equivalent to myjar.jar which uses jacorb.jar which uses logkit-1.2.jar.

The ant script by default will build the jars without indexing and use jar -i to create indexed versions of the jars named imyjar.jar, idependjar.jar and idepend2jar.jar.
If you run ant with -Dindex="yes" it will index the plain jars (myjar.jar, dependjar.jar and depend2jar.jar) using ant's index method.
To recreate the problem I saw, type:
ant
ant build.dependjar -Dindex="yes"
java -cp myjar.jar myjar.MyClass
The first line builds all the plain jars with no indexing. The second line replaces dependjar.jar with one using ant's index funciton (just like jacorb.jar). The last line demonstrates the error that occurs. You can also verify that adding dependjar.jar to the classpath doesn't fix the error either.

In general I see several ways to fix this problem.
  1. Avoid using indexing
  2. Put all the jars on the classpath
  3. Fully index all jars with "jar -i"
Hopefully this post will help anyone else running into this problem.

Sunday, December 21, 2008

Time to start using this blog?

I guess it might be time to start using this blog.

Here's some background. I'm a software developer and ex-CS professor. When I was teaching, I wrote a book on 32-bit x86 assembly language that I provide free on the web.

In my current job, I mostly program in C++ with some Java. I'm also a big fan of Python. I expect that most of my posts with be dealing with these languages (and assembly of course).