JPA2 Serialized Blob Never Really Null

This one got me late on a Friday (actually after work so I was doubly frustrated) and is to do with how JPA2 persists attributes of items that are not of basic Java types and which are also not entities. In order words attributes that implement Serializable and get shoved into BLOB fields.

As a general rule I try to keep my database as BLOB free as possible. I just plain don’t trust something I can’t easily read! Now that I’ve come across this I have another good reason for not trusting them.

One of the entities I need to store in the application I’m currently working on is a collection of configuration parameters for running backups. Some of the backups need to be scheduled so one of the fields in the configuration settings is of type TimerHandle. The idea being that using the handle I can get the Timer back whenever I want to cancel or modify it. This works really well, the only fly in the ointment is that TimerHandle has to be stored in a BLOB.

To cut a long story short there came a point where I needed to get only those backup configurations that had timers set on them. The obvious way of doing that is to execute this JPQL query:

SELECT b FROM BackupConfiguration b WHERE b.timerHandle IS NOT NULL

I’m guessing that just about everyone thinks that is going to work but in actual fact it gives the wrong result. You see when JPA2 persists an object (that implements Serialiazable) there is no such thing as null it would appear. If I have a configuration object that doesn’t have a timer handle set when I check the database I see a BLOB field with five bytes of data in it:

ac ed 00 05 70

which de-serializes into a null field correctly. I’m guessing that what JPA is doing is using an object to represent null and this is the serialized form of that object. Indeed if you Google for those five bytes you will turn up a number of pages (example) that show those bytes as the start of a serialized object. If I create a create a timer I find the first five bytes in the BLOB are:

ac ed 00 05 73

Odd that the fifth byte is different but not totally unexpected. Of course this leaves with with a bit of a problem. There doesn’t appear to be any way to query for only those configurations where the timer handle is null.

Update 1

So I did a bit of reading and I now know what the five bytes “ac ed 00 05 70” mean. The first two tell the system that the stream being read in is a serialization stream. The next two define the serialization protocol version and the last byte indicates that we are dealing with a null object. A byte of 73 tells the system to create a new object. Well, I have a warm fuzzy that I know what the bytes mean but I still don’t know how to query for a null!