Monday, December 31, 2012

Top 12 on Maven Central

For my project, which I hope to share more about soon, I have a full copy of Maven Central and some other repositories. Since the work I do is related to dependencies, I have a list of artifacts in ranking order. I based this ranking on popularity (the number of transitive inbound dependencies) and on weight. Dependencies are calculated per program, using its latest version. There were almost 40.000 programs in the database. This is not an exact science and some heuristics were used. However, a top twelve to close 2012 sounds interesting.
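
To make "transitive inbound dependencies" concrete, here is a minimal sketch of the kind of count I mean; the class, the map name and its structure are mine, not the actual code of the project:

    import java.util.*;

    public class InboundDependencyCounter {
        /**
         * Counts the transitive inbound dependencies of an artifact.
         * 'dependents' maps each artifact to the artifacts that directly
         * depend on it, i.e. the reversed dependency graph.
         */
        public static int transitiveInbound(String artifact,
                Map<String, Set<String>> dependents) {
            Set<String> seen = new HashSet<String>();
            Deque<String> todo = new ArrayDeque<String>();
            todo.push(artifact);
            while (!todo.isEmpty()) {
                Set<String> direct = dependents.get(todo.pop());
                if (direct == null)
                    continue;
                for (String d : direct)
                    if (seen.add(d))     // count each dependent only once
                        todo.push(d);
            }
            return seen.size();          // the artifact itself is excluded
        }
    }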

#1 Hamcrest Core — I had never heard of it before. It turns out that this is a library that adds matchers to JUnit, making test assertions more readable. Its (for me unexpected) popularity is likely caused by JUnit, which depends on it (and actually embeds it). The inbound dependency counts are almost equal (27772 for JUnit versus 27842 for Hamcrest).
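
To give a feel for what the matchers buy you, a made-up JUnit 4 test using a couple of the core Hamcrest matchers could look like this:

    import org.junit.Test;
    import static org.junit.Assert.assertThat;
    import static org.hamcrest.CoreMatchers.*;

    public class PriceTest {
        @Test
        public void discountedPriceIsLower() {
            int price = discount(100);
            // the matcher style reads almost like a sentence
            assertThat(price, is(equalTo(90)));
            assertThat(price, is(not(100)));
        }

        // hypothetical code under test: 10% discount
        static int discount(int price) {
            return price - price / 10;
        }
    }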

#2 JUnit — A regression testing framework written by Erich Gamma and Kent Beck. It is used by developers to implement unit tests in Java. It has more than 10000 direct dependent projects and is likely the most depended-upon project.

#3 JavaBeans(TM) Activation Framework — The JavaBeans(TM) Activation Framework is used by the JavaMail(TM) API to manage MIME data. It is, for me, a perfect example of a library that was over-designed during the initial excitement about Java. It has a complete command framework, but I doubt it is used anywhere. However, the JavaMail library did provide a useful abstraction, and it depended on the activation framework.

#4 JavaMail API — The illustrious JavaMail library, developed before there even was a Java Community Process. It provides functionality to mail text from Java (which, few people seem to know, can also be done with the URL class, but that is another story). It is still actively maintained: the artifact was updated less than 10 months ago.
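
For those who never used it, a minimal sketch of sending a plain-text mail with the JavaMail API looks roughly like this (host and addresses are placeholders, and a reachable SMTP relay is assumed):

    import java.util.Properties;
    import javax.mail.*;
    import javax.mail.internet.*;

    public class SendMail {
        public static void main(String[] args) throws MessagingException {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com"); // assumed relay
            Session session = Session.getInstance(props);

            Message msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("me@example.com"));
            msg.setRecipients(Message.RecipientType.TO,
                    InternetAddress.parse("you@example.com"));
            msg.setSubject("Happy 2013");
            msg.setText("Sent with the illustrious JavaMail API");

            Transport.send(msg);
        }
    }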

#5 Genesis Configuration :: Logging — Provides the common logging configuration used by the build process, primarily to collect test output into 'target/test.log'. Surprisingly, it has over 20.000 transitive inbound dependencies, likely because seemingly every Geronimo project depends on it.

#6 oro — I remember using Oro somewhere south of 1999; it was a regular expression library, needed because Java did not support regular expressions before 1.4. It turns out that Oro was retired 7 years ago and should not be used anymore. Still, it also has over 20.000 inbound dependencies. At first sight, many Apache projects still seem to depend on it even though the project recommends using Java's built-in regular expressions instead.
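
What Oro used to offer is covered by java.util.regex since 1.4; a tiny, made-up example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExample {
        public static void main(String[] args) {
            // extract a version number from a file name
            Pattern version = Pattern.compile("(\\d+)\\.(\\d+)");
            Matcher m = version.matcher("hamcrest-core-1.3.jar");
            if (m.find())
                System.out.println(m.group(1) + "." + m.group(2)); // 1.3
        }
    }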

#7 XML Commons External Components XML APIs — xml-commons provides an Apache-hosted set of DOM, SAX, and JAXP interfaces for use in other XML-based projects. The hope is to standardize on both a common version and a packaging scheme for these XML standards interfaces to make the lives of both developers and users easier. The External Components portion of xml-commons contains interfaces that are defined by external standards organizations. It has not been updated for 7 years (I guess XML's heyday is over by now).

#8 OpenEJB :: Dependencies :: JavaEE API — An open source, modular, configurable and extendable EJB Container System and EJB Server. The popularity of this library is likely caused by the fact that log4j depends on it.

#9 & #10 mockobjects:mockobjects-core — A library to make mock objects. It is over 8 years ago when it was updated but it still has more than 20.000 inbound dependencies.

#11 org.apache.geronimo.specs:geronimo-jms_1.1_spec — Provides a clean-room version of the JMS specification. Since this ended up so surprisingly high, I looked at where its popularity came from. It turns out that, once again, log4j is the culprit.

#12 Apache Log4j — Which brings us to the artifact that is pushing all these previous artifacts to greater heights than they deserve. log4j is directly referenced by a very large number of projects. The following image shows its dependency tree:
Why a log library should depend on the Java EE API is a bit of a puzzle. Anyway, happy 2013! Peter Kriens

Monday, December 17, 2012

The Looming Threat to Java

A meteorite likely caused the demise of the dinosaurs; since then we tend to use the term dinosaurs for people who are too set in their ways to see what is coming. Though an awful lot of practitioners still feel Java is the new kid on the block, we must realize that the language is in its midlife after 20 years of heavy use. The young and angry spirits that fought the battle to use Java over C++ have long since ended up in the manager's seat. Java today has become the incumbent. So can we keep on grazing the green and lush fields without having to worry about any meteorites coming our way?

In 1996 applets were the driving force behind Java in the browser. They were supposed to bring programmability to the browser in an attempt to kill off Microsoft's dominance on the desktop. While applets got totally messed up by Sun due to a complete lack of understanding of the use case (they did it again with Web Start), Java's silly little brother Javascript grew up and has recently become an exciting platform for UI applications. With the advent of the Web Hypertext Application Technology Working Group (WHATWG) that specified HTML5, we finally have a desktop environment that achieves the dream of very portable code with an unbelievable graphic environment for a large range of devices.

"Great", you think "we support HTML5 and Javascript from our web frameworks. So what's the problem?" Well, the problem (for Java at least) is that AJAX now has grown up and is calls itself JSON. Basically, all those fancy Java web frameworks lost there reason of existence. The consequence of a grown up programming environment in the browser is that the server architecture must adapt or go in extintction. Adapt in a very fundamental way.

One of the primary tenets of our industry is encapsulation. Best practice is to hide your internal data and provide access through get/set methods. On top of these objects we design elaborate APIs to modify those objects. As long as we remain in a single process, things actually work amazingly well, as the success of object-oriented technology demonstrates. However, once the objects escape to other processes, the advantages are less clear. Anybody who has worked with object-relational mapping (JPA, Hibernate, etc.) or communication architectures knows the pain of ensuring that the receiver properly understands these "private" instance fields. You might have a chance in a homogeneous system under central control, but in an Internet world such systems are rare and will become rarer. Unfortunately, clinging to object-oriented technologies has given us APIs that work very badly in large-scale distributed systems.
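
A made-up contrast to illustrate the point: the encapsulated form works well inside one VM, but the moment the data must cross a process boundary the "private" state has to be exposed and agreed upon anyway:

    // in-VM style: state is hidden, behaviour lives with the data
    class Account {
        private long balanceInCents;

        public void deposit(long cents) { balanceInCents += cents; }
        public long getBalance()        { return balanceInCents; }
    }

    // wire style: a dumb record whose public fields *are* the contract
    // that both sides (server and browser) must understand
    class AccountDTO {
        public String id;
        public long   balanceInCents;
    }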

The first time I became aware of this problem was with Java security in 1997. The security model of Java is very object-oriented, hiding the semantics of a security grant behind a user-defined method call (implies). Though very powerful, its cost is high. Not only is it impossible to optimize (the method call is not required to return the same answer under the same conditions), it is also virtually impossible to provide the user interface with this authorization information. Though a browser-based program cannot be trusted to enforce security, the authorization information is crucial for building good user interfaces. Few things are more annoying than being able to push a button and then being told you're not allowed to push that button. Such an unauthorized button should obviously not have been visible in the first place. Remote procedure calls for such fine-grained authorization checks are neither feasible nor desirable from a scalability point of view.
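
A hedged sketch of the issue (the permission class is invented for illustration): because implies is user-defined code, only the holder of the object can answer the question, the answer cannot be cached, and it certainly cannot be shipped to a browser:

    import java.security.Permission;
    import java.util.Calendar;

    public class BusinessHoursPermission extends Permission {

        public BusinessHoursPermission(String name) { super(name); }

        @Override
        public boolean implies(Permission p) {
            int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
            // the answer even changes over time, so it cannot be cached
            return p instanceof BusinessHoursPermission
                && hour >= 9 && hour < 17;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BusinessHoursPermission
                && ((Permission) o).getName().equals(getName());
        }

        @Override public int hashCode()      { return getName().hashCode(); }
        @Override public String getActions() { return ""; }
    }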

Another, more recent problem is the JSR 303 Bean Validation API. This specification uses a very clever technique to create elaborate validation schemes. It is incredibly powerful, but it relies on inheritance and annotations. When the UI is built on the server this is a neat tool, but when the UI is executed remotely you are stuck with a lot of obtuse information that is impossible to transfer to the browser, where the user could be guided into providing the right input. Simple regular expressions may not be nearly as powerful, but they are trivial to share between browser and server.
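
A small, made-up example of the contrast; the annotations are real JSR 303, the field and the pattern are invented:

    import javax.validation.constraints.NotNull;
    import javax.validation.constraints.Pattern;

    public class Registration {

        // only the server can evaluate this declaration
        @NotNull
        @Pattern(regexp = "[a-z][a-z0-9_]{2,15}")
        public String userName;

        // trivially shareable with the browser as a plain string
        public static final String USER_NAME_RE = "[a-z][a-z0-9_]{2,15}";
    }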

The last example is plain API design. Most of the APIs I've designed rely heavily on object references. A reference works fine within the same VM but has no meaning outside that VM. Once you go to a distributed model, you need object identities that can travel between processes. Anybody who has had to provide an API to MBeans knows how painful it is to create a distributed API on top of a purely object-oriented API. It requires a lot of mapping and caching code for no obvious purpose. A few weeks ago I tried to use the OSGi User Admin but found myself having to do this kind of busy-work over and over again. In the end I designed a completely new API (and implementation) that assumes that today many Java APIs must be usable in distributed environments.
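
A sketch of the difference (these interfaces are my own, deliberately not the real OSGi User Admin API): the first form only has meaning inside one VM, the second can cross process boundaries because it deals in identities:

    import java.util.Collection;

    // reference style: arguments are live objects, useless outside this VM
    interface UserRegistry {
        User getUser(String name);
        void addMember(Group group, User user);
    }

    // identity style: every argument and result can travel between processes
    interface UserStore {
        Collection<String> getMemberIds(String groupId);
        void addMember(String groupId, String userId);
    }

    interface User  { String getName(); }
    interface Group { String getName(); }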

To prevent Java from becoming obsolete we must therefore rethink the way we design APIs. For many applications today the norm is to be a service in a network of peers, where even the browser is becoming one of the peers. Every access to such a service is a remote procedure call. Despite the unbelievable increase in network speed, a remote procedure call will always be slower than a local call, not to mention the difference in reliability. APIs must therefore be designed to minimize round trips and data transfers. Instead of optimizing for local programs, I think it is time to start thinking globally so we can avoid this upcoming meteorite called HTML5.