Saturday, November 7, 2009

Relational storage using JAXB, JPA and HyperJAXB

Let us imagine a situation when you are writing an application sitting on the ESB, or providing a web service enabled interface. In both cases you will probably end up having some XML Schema describing your data transfer format. Also, good chances are that you will have to have a RDBMS schema, too. Usually both are designed, implemented and documented separately. But when you write your application from scratch, you have a unique possibility to seriously reduce amount of work need to be done. Here I want to share some of my ideas how to achieve this.

Few variants are possible. You can start with DDL, generate JPA-annotated beans using Hibernate Tools for example, and then annotate it with some JAXB stuff (or use something not requiring annotations at all, like XStream). You will be able to get an XSD using some JavaBeans-to-XSD tool (like you have in IBM Rational tools when generating WSDLs for given beans), or just generate a bunch of sample XML files by marshaling some test data and use any modern XML editor to generate XSD by example. But I must warn you that the XSD will be rather ugly, because usually you don't have too much control over the whole process, that's why the proper tooling is vital for success. Nevertheless, this approach is quite common and supported by many powerful tools, such as my favorite IBM Rational Software Architect 7.5.

From the other hand, you can start with some UML and generate Java Beans, then put all the necessary annotations (both JAXB and JPA) inside. But you will have to do a double amount of work fixing the beans structure and hardcoding annotations for both technologies. Once again, you will need to fix the resulting XSD manually, that's for sure.

What I prefer to do is design the data layer entirely in XSD (which maps to Java classes in a more natural way than DDL), then generate Java beans using JAXB or something similar, enrich it with JPA annotations and generate DDL as the result. Sounds pretty nice, but there are some problems, particularly at the JPA annotations stage. The major one is that you lose flexibility in XSD changing, because after each change your beans will be generated again and you will lose all your precious annotations (yes, you can keep JPA configuration in XML, but this approach has its own drawbacks, and it's beyond the scope of this post anyway).

Fortunately there exists a tool which reduces (and in some cases can even eliminate it at all) the amount of the manual work at the JPA stage. It is implemented by Aleksei Valikov and called HyperJAXB 3 (version 1 is outdated and version 2 supports Hibernate instead of pure JPA). It is more-or-less a plugin set for XJC, adding JPA annotations to JAXB-generated beans. It is distributed as an easy to use bundle containing everything (including build.xml) to start converting your XSDs to JAXB + JPA enabled Java Beans.

Here comes the concepts of relationships and keys. The problem is, when not specified explicitly, HyperJAXB creates an autogenerated integer ID for every entity, which is not always what you really want to get. Let me explain. I want to automate not only the data schema creation, but also the API to access data. For example, when there comes an XML from the client, I don't want to check if the same piece of data is already present in my DB (in fact it can be a hell lot of work to do, especially when some complex, deeply nested XML structures are considered). I want the intelligent middleware to merge it automatically with the existing data. But it won't happen by default, because of the autogenerated IDs, which are always issued new. So, if you will try to persist the same XML twice, it will put two copies in your DB, which is probably not what you expect to see.

To fix this behavior, you need to explicitly specify the meaningful IDs for your entities. With HyperJAXB you can specify it directly in your XSD by marking some of the entity fields with the special XSD annotation in the same way you document your schemas (For details please consult HyperJAXB documentation. Yes, you still can't use the XSD uniqueness constraints support because it's a bit too hard to parse by XJC-like tool). Same is true for the relationship options – for example, you can define one-to-many with all the possible JPA attributes, so it's quite flexible in case you need it.

So far, so good. Things start being complicated when you have compound IDs (and I think it happens really soon in the real life applications). Fortunately HyperJAXB gained support for IdClass few weeks ago. But here comes another problem of “cascading” IDs – when you have a complex primary key (in entity A) used as a foreign key in another entity (B), which at the same time is a part of its (B's) primary key. We had a brief discussion on this topic with Aleksei (BTW, I want to say THANKS! for his great instant support for HyperJAXB and related stuff), and it seems to be too hard to implement in the foreseeable future due to some limitations of XJC extension mechanisms. So, the only way out is to tweak generated beans manually. Well, I think it is something like 20% of the job, and another 80% is done automatically.

So, in the end the results are pretty sweet. You design your data structure as XML Schema (which itself can be generated based on XML samples). Then you put some metadata (mostly ID information) inside to make sure it is complete. Now you run HyperJAXB against your XSDs and fix the generated Java Beans in case you need to handle some “complex” situations. Then, using one of the tools (either Hibernate Tools or OpenJPA Mapping Tool) you get DDL in a matter of seconds. After that you are able to (un)marshal your entities in a single line of code using JAXB, and load / merge them from / to any RDBMS in a single LOC using JPA. That totally eliminates all the serialization and storage-related code, reducing complexity, improving reliability, bla-bla-bla :)

P. S.: I must warn you that there are some issues when using this approach with different JPA providers. I've tried both Hibernate and OpenJPA and they seem to be not really compatible with each other for anything but the most trivial models. It worth a separate post and I'm not going to explain it here, just to make it short: I prefer OpenJPA over Hibernate nowadays, because I find it less restricting in defining relationships and complex IDs. Also, from the tooling point of view, those are pretty close.

P. P. S.: HyperJAXB functionality is growing rapidly, so I won't be surprised to see the majority of the issues mentioned in this post addressed in the upcoming releases. I wish this project good luck and can't wait to see it gaining popularity in the JEE development community.

P. P. P. S.: Code samples and / or detailed tutorial will come later, because it is a considerable amount of work and I'm pretty busy nowadays. Though, if you have any questions regarding the aforementioned approach, please feel free to ask.