Sunday, May 31, 2009

The Decorator Pattern

I volunteered to do a presentation to the Java User's Group at my work and adapted an old lecture to discuss the Decorator design pattern. I thought this would also be a good post too. Source code is available from a link at the bottom of the post.

Patterns are techniques used to approach solving a common programming problem. They are object-oriented, but language independent. The Decorator pattern is a very useful pattern, but not one that gets a lot of attention. The "Gang of Four" book (by Gamma, Helm, et.al. see below) defines the purpose of this pattern as:
Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality.
A decorator class adds a new capability to an existing object by impersonating the original object and intercepting methods calls to it. By intercepting it can modify the normal result that the original method call would have. To impersonate the original, both the original class and the decorator class must implement the same interface.


Typically, the decorator object's methods will perform some additional operation and also call the original object's method. This enhances the result of the original operation in some way.



The advantages of the decorator pattern are:
  • Decorators can be chained to add multiple behaviors
  • Very flexible reuse of code
  • Can be runtime decision (unlike inheritance)
  • Avoids huge inheritance trees
As an example of using the decorator pattern consider Java's InputStream/OutputStream classes. Java defines subclasses of these (FilterInputStream/FilterOutputStream) specifically designed to create decorators.

Let's look at how to create a decorator for any OutputStream using FilterOutputStream.


The out attribute of the FilterOutputStream is a reference to the OutputStream that the class is decoratoring. This attribute is initialized by the constructor of FilterOutputStream. The default implementations of the FilterOutputStream methods just forward their call exactly to its out attribute.

We'll create a decorator named DebugOutputStream that traces the method calls on an OutputStream. First lets look at the beginning of its definition:

public class DebugOutputStream extends FilterOutputStream {

  private String name;

  public DebugOutputStream(String name, OutputStream out) {
    super(out);
    this.name = name;
}

The name parameter is used to label the output of this particular DebugOutputStream (useful if there is more than one). The out parameter is passed up to the FilterOutputStream and is used to initialize it's protected out attribute.

Let's look at what the close method of DebugOutputStream:

public void close() throws IOException {

  StringBuilder msg = new StringBuilder(
     "DebugOutputStream(" + name + "): called void close()");
  try {

    out.close();   // Call to decorated object

  } catch( IOException e) {

    msg.append( " Exception: " + e + " thrown");
    throw e;

  } catch( RuntimeException e) {

     msg.append( " Exception: " + e + " thrown");
    throw e;

  } finally {

     System.out.println( msg);

  }
}
Most of the code is composing a message to print out information about the call to close on the decorated object (i.e., referenced in the out attribute).

Here's an example of using the DebugOutputStream to spy on how GZIPOutputStream works.


InputStream input = new BufferedInputStream( 
  new FileInputStream( args[0]));
OutputStream output = new DebugOutputStream( "GZIPOutputStream",
  new GZIPOutputStream(
  new DebugOutputStream( "FileOutputStream",
  new FileOutputStream( args[1]))));

byte [] buffer = new byte[4096];
int numRead;
while( (numRead = input.read(buffer)) > 0) {

  output.write( buffer, 0, numRead);

}
input.close();
output.close();

Note that the BufferedInputStream is acting as a decorator to the FileInputStream. Two DebugOutputStreams are used to trace themethod calls at two places. The first one (given the name "GZIPOutputStream") will trace the calls made to the GZIPOutputStream. The second one will trace the calls made to the FileOutputStream.

The output of this program on a sample input file is:
DebugOutputStream(FileOutputStream): called void write( byte []) (10 bytes)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 4096)
DebugOutputStream(GZIPOutputStream): called void write(byte [], 0, 3424)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 512)
DebugOutputStream(FileOutputStream): called void write(byte [], 0, 253)
DebugOutputStream(FileOutputStream): called void close()
DebugOutputStream(GZIPOutputStream): called void close()

From this we can see that first a 10 byte header is written out to the file, then we see the data is being given to the GZIPOutputStream in 4096 byte chunks. But the GZIPOutputStream is not writing the data out immediately. Only after all the data from the input file has been exhausted is the compressed data written out to the file and in 512 byte chunks. For a bigger input file, compressed data would have been written out before the end. GZIPOutputStream has some internal cache that is written out when it fills up or the stream is closed. The latter is happening here.

I also used this same technique to help with layout problems with Swing panels. One can create a JComponent or JPanel decorator that prints out how the methods used by LayoutManagers (like getSize(), getPreferredSize(), etc.) to see how the LayoutManager is interrogating your class to lay it out.
Next, let's look at another use of a decorator to add a more practical enhancement to an input/output stream, encryption. I am not an expert on cryptology at all, so please don't use this code for any information you really need to secure! To encrypt data, I exclusive OR the data with a sequence of pseudo-random bytes as a mask. I then use the fact that:
(x ^ mask) ^ mask == x
to decrypt the data (^ is the exclusive OR operator in Java, C and C++). By exclusive OR'ing the encrypted data with the same sequence of pseudo-random bytes I will get the original data back.
The abstract EncryptingOutputStream class extends FilterOutputStream and overrides the write(int) and write( byte [], int, int) methods. The write( byte []) method calls write( byte[], int, int) so I don't need to override it. An abstract method void encrypt( byte []) is defined that must be overridden to do the actual encryption of the data. (This is an example of the Template Method pattern.)
Similarly, there is an abstract DecryptingInputStream class that extends FilterInputStream and overrides the int read() and int read( byte [], int, int) methods. It defines an abstract method void decrypt( byte [], int, int) that must decrypt the data in the specified section of the array passed to it.
Concrete classes named XOREncryptingOutputStream and XORDecryptingInputStream are defined that define the appropriate abstract methods. Both classes are passed a key to their constructors. This key is used as a seed to a random number generator. If the key is the same, the same pseudo-random numbers will be generated to encrypt/decrypt the data.
As an example, putting all this together I compressed then encrypted (with the key 1234567890) a web page and put it on my web site at http://www.drpaulcarter.com/mystery. Let's look at how to use the classes discussed here to read the web page. We need to create a chain of InputStreams that read the web page, decrypt it and then uncompress it. We can do that with the following code:

new GZIPInputStream(
        new XORDecryptingInputStream(
        new URL(url).openStream(), key)));
The url variable is string containing "http://www.drpaulcarter.com/mystery" Then we only need to read from the GZIPInputStream as we would any InputStream. Here's a sequence diagram showing what happens as data is read:

To download and display the web page type:
java test.DecryptingBrowser http://www.drpaulcarter.com/mystery 1234567890
An important point to realize is the InputStream reading the data from the web server knows nothing about compression or encryption. However, we have dynamically added these capabilities by chaining it to together with decorators that add these capabilities. This is the power of the decorator pattern.
I originally developed this example back in 1998 when I was teaching Java. This was in the Java 1.2 days. In creating this presentation, I discovered that Java 1.4 added encryption streams, javax.crypto.CipherInputStream and javax.crypto.CipherOutputStream. They both use a javax.crypto.Cipher object to do the encryption and decryption. This is yet another pattern, the Strategy pattern.
The full example code of the classes described here can be found here.

Recommended books:


Wednesday, May 13, 2009

typedef in C and C++

I like C++, but it is a complex language. After programming in it for 19 years, you'd think I would know it pretty well, but I'm still learning things even now.

Recently, I discovered that typedef works differently in C++ than in C. In C, you can't repeat a typedef even if it's exactly the same as the original definition. For example:

typedef int T;
typedef int T;

will give you a compiler error in C. However, this is perfectly legal in C++. I'm pretty sure this wasn't the case in early C++. I wonder when this changed? Anybody know?

I found a very good web site for the differences in C and C++ here:
http://david.tribble.com/text/cdiffs.htm

Saturday, March 21, 2009

Finding the source of an error in an inlined function using gdb

Yesterday at work, I was trying to find the source of a crash from a coredump using gdb and discovered that if the crash was caused by inlined code, the stack trace that gdb shows does not show you the information you might need to determine the exact location. In this case, gdb told me that the crash occurred in a call to std::string::size(). However, there was no call to this method in the next function up in the stack trace. The reason was that std::string::size() was being called from another inlined function, but gdb wasn't showing what that function that was. After fumbling around for about 30 minutes I was able to figure out that the crash was happening inside a std::map::find() call (the map's key was a std::string). I want to document the procedure that I used here.

Let's use a simple program that illustrates the problem (in a file named gdbtest.cpp):

#include <cstdio>

inline void f1( int *p)
{
(*p)++; // Line 5
}

inline void f2( int * p)
{
std::printf("%p\n", p); // Line 10
f1(p);
}

int main()
{
int i;

f2(&i);
std::printf( "%d\n", i); // Line 19
f2(0);
std::printf( "%d\n", i);
f2(&i);
std::printf( "%d\n", i);

return 0;
}

Here it's obvious which line is going to cause the segmentation fault, but the program is useful to demonstrate the general technique.

Build the executable with:
g++ -g -finline -o gdbtest gdbtest.cpp
The -finline flag tells g++ to inline the functions even in debug mode.

Make sure that core files are enabled by typing:
ulimit -c unlimited

Now run the program:
./gdbtest

The output I get from a run (I'm running 64-bit Linux):
0x7fff2cdee1fc
1
(nil)
Segmentation fault (core dumped)

Start up gdb with the corefile:
gdb gdbtest core

gdb displays it's header, etc. at the bottom appears:
#0 0x00000000004005fb in main () at gdbtest.cpp:5
5 (*p)++;
(gdb)

The hex address will probably be different for you. If you type where (or info stack) to see the stack trace you get (user input is in italics):
(gdb) where
#0 0x00000000004005fb in main () at gdbtest.cpp:5

This is not very useful, there is only one stack frame shown, main's. This does not show us which line in the actual body of the main function is the source of the segmentation fault. Line 5 is the line in the inlined f1 function that is called multiple times.

The key to determining the real location is the hex address listed. This is the address of the machine instruction that caused the fault. You can use the info line gdb command to show the source line that maps to a code address.

Typing:
info line *0x4005fb
displays:
Line 5 of "gdbtest.cpp" starts at address 0x4005f7 and ends at 0x400606 .

Again the actual address values will probably be different for you. Here gdb is telling you that line 5 maps to the code from address 0x4005f7 to 0x400606. For ordinary code, a source line will map to only a single range of code. However, inline code will be mapped to multiple places. Every place that the code is inlined to will be mapped to the inline source. In the example above, line 5 will mapped to three different places since it is used it f2 which itself is inlined and used in three different places. The problem is to find which of these three places is causing the segmentation fault.

The key is to look at the code around the faulting code to determine where we are in main. From the above, we see that line 5 starts at address 0x4005f7, so the previous instruction will end at address 0x4005f6, we can use info line to see what line that is:
(gdb) info line *0x4005f6
Line 10 of "gdbtest.cpp" starts at address 0x4005dc and ends at 0x4005f7 .

This tells us that the line executed before the crashing line was line 10. This is just what we expected, but doesn't tell us which call to f2 caused the crash. If we continue moving backward, using address 0x4005db (the address right before 0x4005dc where line 10 starts):
(gdb) info line *0x4005db
Line 19 of "gdbtest.cpp" starts at address 0x4005c2 and ends at 0x4005dc .

Line 19 is in the body of main and this tells us that it is second call to f2 (line 20) that caused the crash.

It appears that gdb has added a new option to the disassemble command, /m that would display the assembly code with addresses mixed in with the source code. This would allow one to easily determine the location. However, this option is not in the most recent version of gdb I have access to (6.8)

Sunday, March 08, 2009

UML Considered Harmful

At work, we use Unified Modeling Language (UML) to document our designs. I think UML has its place as part of the design process and documentation, but I believe that UML is not in itself sufficient to document a non-trivial design. I have been to numerous design reviews where a few UML class and sequence diagrams are the only description of the design. I find these inadequate to communicate more than a vague idea of the overall design.

For a complete description of the design, I feel that a full design document with words and pseudocode is a requirement. I get the impression that UML diagrams are being used as a way to say that a design is documented (for documentation requirements like CMMI) without really providing any meaningful documentation.

If anyone is out there, how is your company documenting design? Do you think your process works?