There are no decent XML APIs.

Yesterday, after struggling briefly with dom4j and trying to convince it not to go fetch external DTDs (thanks haus bob for showing me the fugly hack required), I became sad and despondent. What is our crime, such that we must be punished with such crappy XML APIs?
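
For the curious, here is a minimal sketch of one common way to stop a parser from fetching external DTDs. It is not the dom4j hack mentioned above (that one isn't reproduced here); it assumes the JDK's own DOM parser and an `EntityResolver` that answers every external lookup with an empty stream. The class name `OfflineParse` and the unreachable DTD URL are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OfflineParse {
    /** Parses XML without fetching any external DTD or entity. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external lookup with an empty stream, so the parser
        // never goes to the network or the file system for a DTD.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document whose DTD lives on a host that cannot be reached.
        String xml = "<!DOCTYPE config SYSTEM \"http://unreachable.invalid/config.dtd\">"
                + "<config><name>demo</name></config>";
        System.out.println(parse(xml).getDocumentElement().getTagName());
    }
}
```

The obvious cost is that anything the DTD would have supplied (entity definitions, default attribute values) is silently gone, so this only suits documents that don't depend on it.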

The only two contenders to try and address this that I’m aware of are dom4j and jdom. Ideologically, dom4j wins because James Strachan is more competent than Jason Hunter. Practically speaking though, jdom has a much more civilised and natural API.

Unfortunately, jdom seems to be growing uncontrollably, with every release an order of magnitude fatter than the previous one. It also looks like its developers have entirely lost interest in the whole thing (shades of xdoclet), and are now off into lalaland to play with shinier toys. We’re promised a 1.0 version in Q1 2004, which has long since come and gone.

So why can’t there be a decent XML API? Why must a filtered list of child nodes consistently violate the principle of least surprise when you insert into it? Why should both of those APIs constantly surprise you with their behaviour, until of course you’ve been brainwashed into it seeming ‘natural’?

So are there any nice java-friendly lightweight wrappers for XML manipulation, or is that a particular problem that’s considered too boring for today’s fast-paced dedicated professional fappers?

36 Responses to “There are no decent XML APIs.”

  1. Joseph Ottinger Says:

    You left out EXML and XOM.

    EXML is so useful that WebObjects almost immediately made it impossible to download – you now get it as part of GLUE.

    XOM is an unknown to me, but ERH certainly seems infatuated with it, since it’s his project.

  2. Lyndon Says:

    The perl XML XPath APIs seem pretty good…

  3. MB Says:

    I find myself in a similar situation – sick of JDOM rotting on the vine, generally appalled by dom4j.

    Someone else mentioned XOM, http://www.cafeconleche.org/XOM

    Haven’t used it, but I remember reading an interview with Elliotte Rusty Harold, the author, on Artima:

    http://www.artima.com/intv/xomdesign.html

    Anyone out there using it who cares to comment? Especially interested in hearing from users of JDOM.

  5. heard of google hani ? Says:

    Ignorant fool... XOM is what you’re looking for.

    p.s. heard of google ?

  6. Anonymous Bastard Says:

    If you want a very simple API and absolutely correct XML, XOM IS the way to go.

  7. Rob Misek Says:

    simple. write your own. ;-)

  8. morten wilken Says:

    my sentiments exactly!
    we NEED to have this incorporated in the JDK, and in that case i don’t much care if it is jdom or dom4j style…
    case in point: Rome, which uses jdom. I personally use dom4j in all my applications and it annoys me to lug around 2 different xml apis.

    who wants to use SAX and DOM directly?

    sincerely
    morten wilken

  9. Dan Barber Says:

    Well, it’s not an XML api…but I’ve found myself using Castor to handle XML binding. My code uses basic beans. Castor provides the serialization/deserialization….works well when passing XML in webservices…I have used Ant to script the castor code generation task.

  10. AR Says:

    Give XMLBeans a try (unless you’re an XML Schema-challenged kind of guy).

  11. Egg Man Says:

    I WANT MY BILE! This is a lame blog entry. I want fury and passion and stuff.

  12. Akuma Says:

    I’d recommend XmlBeans, if I didn’t like the project myself, and want to shield them from being a future bile blog target :)

  13. Gabriel Mihalache Says:

    Why not use, *gasp*, the stuff in the JDK?

  14. Eskorbutin Says:

    LOL. The guy is not capable of figuring out how to do some Java XML processing with the several tools available out there, besides the obvious JDOM and dom4j.

    You’re such a retard Hani. Unable to do your own fucking homework and research.

    Go pursue a career as a standup comedian instead of working with computers. It’s beyond you.

  15. Anonymous Says:

    Ditto on XMLBeans

  16. Mr. Wobbet Says:

    Most of the XML parsing API really are crap. Worked w/ W3C DOM via DOM4J, SAX, JDOM, etc. and hated every one of ‘em. Just hated it. Went with JDOM ’cause it was the one API that made a general kind of sense. But…

    I am currently in the process of ripping JDOM out of my application in favor of SAX. My big issue is that memory allocation is *way* out of control w/ JDOM, and as the instance documents that are flowing into the pipe increase in size, the failure rate due to OutOfMemoryError does as well.

    But SAX is, well, kinda simplistic and from a thread management and control perspective woeful. If you don’t have threading issues SAX works well (and it really makes me want to have a switch statement that works on String values so that I can more cleanly handle element terminations).

    I haven’t played with it yet but have you taken a look at StAX? That will be my next test for XML parsing technology. If the fact that your friend Mr. Beattie endorses it prevents you from reviewing it, oh well. From reading the glossy, four-color, buzzword compliant literature it solves the memory problems of the DOM architectures and the threading issues of the SAX architecture and also lets you cleanly get out of the parsing when you decide that you’ve had enough rather than being forced to wait for SAX to tell you that you’ve had enough. Sounds promising.

    One thing that I do like from the reading that I’ve done is you don’t have to worry about your code looking like you’ve glued together thirty different libraries that all ignore Java standards. It actually uses an Iterator to run through the element events. A plain and simple Iterator. If they can get XML parsing down to an Iterator, I’m pretty much sold right there…

    rjsjr
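
    A rough sketch of what the Iterator-style loop and the early exit described above can look like, assuming the `javax.xml.stream` API that StAX (JSR 173) defines and that later JDKs bundle. The class name `HeaderReader`, the method `firstTitle` and the `title` element are invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class HeaderReader {
    /** Returns the text of the first <title> element, or null if there is none. */
    static String firstTitle(String xml) throws XMLStreamException {
        // XMLEventReader extends java.util.Iterator, so this is a plain pull loop.
        XMLEventReader events =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()
                        && "title".equals(event.asStartElement().getName().getLocalPart())) {
                    // Returning here abandons the rest of the document unparsed.
                    return events.getElementText();
                }
            }
            return null;
        } finally {
            events.close();
        }
    }
}
```

    The caller decides when to ask for the next event and when to stop, which is the difference from SAX callbacks that the comment is pointing at.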

  17. bob mcwhirter Says:

    Also, fwiw, StAX might be considered harmful to Hani since it’s now at the haus.

    http://stax.codehaus.org/

  18. LalalandInhabitant Says:

    PriDE O/R-Mapper released

    From:
    http://www.theserverside.com/news/thread.tss?thread_id=26754

    PriDE “takes pride” in being really lightweight and transparent compared to JDO implementations or O/R Mappers like Hibernate

    Is it like… gays take pride in being, well…, gay!?!?

    I bet the author / authors is / are gay….

  19. gay programmer Says:

    who said programmers can’t be gay?

  20. Kevin Says:

    Anyone used xml pull yet?

    http://www.xmlpull.org. VERY simple to use, very fast (faster than sax2), very tiny in size (10K library for KXML2, 23K for XPP3). As a snippet:

    import java.io.*;
    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.mxp1.MXParser; // the XPP3 implementation

    InputStream is = new BufferedInputStream(new FileInputStream(someFile));
    XmlPullParser parser = new MXParser();
    parser.setInput(is, null); // null lets the parser work out the encoding

    parser.next(); // skips over initial DOCUMENT START portion of file

    int event;
    // make sure we are at start of document and it's valid
    parser.require(XmlPullParser.START_TAG, null, "topNodeName");

    // Start iterating, telling parser when YOU want
    // the next node, not waiting for it to fire
    // events
    while ((event = parser.nextTag()) != XmlPullParser.END_TAG)
    {
        if ("subnodeName".equals(parser.getName()))
        {
            // handle subnodeName
            String attr1Val = parser.getAttributeValue(null, "attr");
            String text = parser.nextText();
        }
        else if….
        {
        }
    }

    It’s really that easy. You are in control of when you want the next node, you can skip large numbers of nodes you don’t care about very quickly, and you can stop the parsing at any point. For example, if you only need to parse, say, the header out of an xml file where the header is at the top and below it are millions of nodes making the xml file HUGE, DOM/JDOM/DOM4J would choke the JVM, if not the OS. Sax2 would spend a LOT of time parsing EVERY event until the document was done, even though you only care about the header info at the top. With xml pull, you simply parse the header part and return out of the loop. Don’t parse any more. That’s it! I have seen a full xml file parse 3x faster than sax2 and use less memory.

    Frankly, I hate DOM/JDOM stuff because they can only work with small files. Anything that forces you to load in the entire file, and at that takes up 2 to 8x more memory than the file itself, sux. Maybe that is because I am a freak about resources, memory, speed, etc, but I subscribe to “only use it if you need it, and allocate it at the time you need it”.

    Give it a try. You’ll like it. Especially for configuration files and such. Replacing property files is a breeze with xml pull.

  21. LalalandInhabitant Says:

    > I WANT MY BILE! This is a lame blog entry. I want
    > fury and passion and stuff.

    > Posted by Egg Man

    Egg Man,

    you sound like a gay. flaming gay!!!! you know that???

  22. Elliotte Rusty Harold Says:

    Given that you need to not load external DTDs, XOM won’t work for you; at least not in version 1.0. If someone convinces me they have a well-informed reason to not load external DTDs and that’s the only thing keeping them from using XOM, I’ll consider adding this feature in 1.1 or 2.0; but it’s not going to happen tomorrow, regardless. :-(

  23. Sulka Says:

    How about xstream? Depending on what you do, it might be the simplest thing.

    http://xstream.codehaus.org/

  24. Rob Fletcher Says:

    How about; because maybe I’m running offline or behind a firewall and the last thing I need is an XML handling API that throws a hissy fit when it can’t find a DTD reference or to spend half an hour delving around in some obscure proxy configuration API? I hate libraries that blithely assume there is internet access wherever they may be running.

  25. Anonymous Says:

    I can’t believe this: I actually found something useful in these comments!!! Next the hell will freeze…

  26. Eric Foster-Johnson Says:

    I really like kXML, at http://www.kxml.org (actually a redirect). The pull-based parsers, like kXML, are much easier to use.

  27. Anonymous Says:

    Oracle’s stuff seems to work ok so far. We’re running a BC4J project and it sorta kinda comes free.

    http://otn.oracle.com/tech/xml/xdk/xdk_java.html

  28. Mark Hughes Says:

    I got so sick of dealing with fucking Xerces, the slowest XML processor possible, that I wrote my own “Least XML” parser.

    LeastXML violates the spec in several ways, but it’s insanely fast, reads all non-pathological XML documents, reports errors fairly intelligently, and has the simplest and most Java-native API I could make.

    No guarantees that it’ll be useful to anyone else, but I put an older version of it under a BSD-like license in my Thought project:

    Enjoy.

  29. Mark Hughes Says:

    Okay, that didn’t work. How about this:

    http://kuoi.asui.uidaho.edu/~kamikaze/Thought/

  30. aaa Says:

    Jade’s Sax parser is pretty decent in terms of speed, yet it is not the issue i guess.
    http://jade.dautelle.com/api/com/dautelle/xml/sax/RealtimeParser.html

  31. Cameron Says:

    SAX sucks.

    DOM is dumb.

    Can’t we talk about Java APIs for XML, not some C-like callback crap?

  32. Oleg Proudnikov Says:

    There is an effort to standardize pull parsing: StAX – Streaming API for XML – JSR 173
    JSR-173
    BEA STAX Home page
    Specification PDF
    Quick summary notes

  33. pappin Says:

    I read someone say it should be part of the JDK… Fuck, no more extra crap and bloat that we can’t dump because it’s become part of the standard API!

    With each release the latest “craze” API gets included, which no one ends up using… which means it’s even harder to distribute software because the poor sod on the other end has to download another 20 megs of useless crap. For example, do we really need the CORBA api *in the JDK* when very few use it (it doesn’t even conform to the Sun standard of package naming)?

    fuck!

  34. pappin Says:

    LalalandInhabitant you homophobe… couldn’t you come up with an insult that didn’t make you sound like a troglodyte?

  35. Ruby Baby Says:

    REXML is the best XML API ever. However, it’s only available for Ruby (in fact, it’s bundled with Ruby). Frankly, that’s a plus, but if you’re insanely wedded to Java, you’re as screwed as you claim to be.

    puts persons[1].elements["address"].attributes["city"]

    What’s easier than that?
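
    For comparison, a hedged sketch of roughly the same lookup using only the XPath support that ships in the JDK (`javax.xml.xpath`). The `people` and `person` element names and the class `CityLookup` are assumptions, since the comment doesn't show the document; only `address` and `city` come from the one-liner above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CityLookup {
    /** Evaluates an XPath expression against an XML string and returns the result as text. */
    static String evaluate(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Hypothetical document shape; XPath positions start at 1, so the
        // Ruby index persons[1] corresponds to person[2] here.
        String xml = "<people>"
                + "<person><address city='Oslo'/></person>"
                + "<person><address city='Lima'/></person>"
                + "</people>";
        System.out.println(evaluate(xml, "/people/person[2]/address/@city"));
    }
}
```

    It is wordier than the Ruby, though most of the extra lines are setup that a helper method hides after the first use.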

  36. Cowtowncoder Says:

    A couple of additional notes: Xerces is actually not all that slow. Test it, and for the features you get it’s only marginally (10-25%) slower in non-validating SAX mode than the fastest pull parsers. But to test it fairly, do NOT just use a single run; use a longer span (much like server apps would use it; after 20 or so runs HotSpot has optimized it nicely). I say this only because I actually have tested it to compare performance and I think it’s unfair to discredit the most complete and stable Java-based XML parser there is (but I’m no Xerces developer or related to anyone who is).

    Having said that, there are faster alternatives. kXML was mentioned, and then there’s the newest StAX-compliant high-performance parser, Woodstox. It’s getting close to 1.0 and is open source, and it also has a related StaxMate project that further simplifies StAX-based iterating. Google should find the project’s home page; it was linked to from the blog at http://www.cafeconleche.org as well.
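
    The "don't time a single run" advice at the top of this comment might be sketched like this. It is an illustration only, assuming the JDK's default SAX parser in non-validating mode; `ParseTimer` and its parameters are hypothetical names, and a serious comparison would want a proper benchmark harness rather than `System.nanoTime()`.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ParseTimer {
    /**
     * Parses the document warmupRuns times without timing, so HotSpot can
     * compile the hot paths, then returns the mean nanoseconds per parse
     * over timedRuns further parses.
     */
    static long averageNanos(byte[] xml, int warmupRuns, int timedRuns) throws Exception {
        // A factory is non-validating unless told otherwise.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // ignores every event
        for (int i = 0; i < warmupRuns; i++) {
            parser.parse(new ByteArrayInputStream(xml), handler);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            parser.parse(new ByteArrayInputStream(xml), handler);
        }
        return (System.nanoTime() - start) / timedRuns;
    }
}
```

    Calling it with something like 20 warm-up runs and a few hundred timed runs, then doing the same for the other parser, gets closer to the server-style measurement described above than a one-shot timing does.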
