Categories
databases programming

Export spreadsheet to plain XML with OpenOffice 3.x (works!)

Microsoft Office:

  1. Costs stupid amounts of money
  2. Isn’t very good
  3. Is only available on windows
  4. …but “usually” works

OpenOffice:

  1. Is completely free
  2. Is an almost exact clone of Microsoft Office circa Office 2000/XP
  3. Is open-source (so that sometimes you can easily fix it yourself, quite surprisingly!)
  4. …but apart from the Microsoft Word part, “often doesn’t work”

I’ve been using OpenOffice’s Word clone as a complete replacement for Word for the past 3 years, and it’s been perfect. Previously, I used to use Word *and* OpenOffice, because the latter had some big bugs left in it.

Sadly, OpenOffice’s Excel clone is … often shockingly buggy. I won’t go into the details. But this post is about one missing/broken feature in particular: OpenOffice by default saves / exports XML which (for most people, and all simple uses) is unusable/unreadable – and is very hard to convert with XSLT. Read on for a script that will fix this for you…

An Obsession, with Compression…

In a well-intentioned-but-actually-stupid attempt to make the filesize slightly smaller, it implements a trivial form of compression that – in any imperative programming language – would be easy to decompress.

Sadly, the primary language for XML processing is XSLT – which is a functional language. Functional languages fundamentally (I keep using this word, but it’s true – this is a core part of the theory of the basic precepts of the language!) *cannot* handle that form of decompression. By design. By mathematical proof.

NB: you can workaround this, of course, but doing so within a functional language is difficult for most people.

There is no option in OO to “not use this optimization”. The file format for OO requires that the output XML file be compressed as a ZIP file anyway! So this compression trick is pointless!, maybe? (unless you are using non-XSLT code to process it. Which is pretty sad, since XSLT/XML are designed to be used together)

Several people have tried to provide exporters for OO to get around this (hmm. It’s a tad ironic that the built-in file-format is so hard to work with that people have to write custom exporters instead). Unfortunately, most of these exporters WILL FAIL on any real spreadsheet and WILL CORRUPT THE DATA, because the exporter-authors didn’t look carefully enough at the incoming data.

I’ve found one instance of an otherwise quite nice exporter that has the data corruption bug here – http://digitalimprint.com/misc/oooexport/ (NB: I’ve emailed my fixed version to the author of that page – hopefully they will replace their version with the fixed one)

I’ve found one instance of a “fixed” exporter that worksaround some of the data corruption, BUT … is missing the basic features everyone wants of “use the column label as the XML tag”.

And I’ve found loads and loads of broken attempts :(.

FIXED: an XML exporter that seems to work even on large, complex spreadsheets

So … after dusting off my XSLT skills, I’ve taken the former one, rewritten most of it, and here’s a fixed exporter for OpenOffice that will write your spreadsheet out as simple XML. I’ve been using this to convert XL spreadsheets to various things, including custom SQL databases (XLS can be opened directly in OpenOffice; OO exports as XML using my script; I then run a final XSLT from the command line using SAXON which converts the XML into a series of SQL INSERT/CREATE/UPDATE statements, as appropriate).

To install this, you need to follow the instructions at the http://digitalimprint.com/misc/oooexport/ site, but overwrite the “generic_xml.xslt” file that creates with the one provided here: (sorry, I haven’t got around to packaging it myself yet – I’m hoping this gets adopted by the digitalimprint folks!)

Download this file, save it as “generic_xml.xslt”, and copy it over the top of the one that oooexport installed; NB: different Operating Systems will install that file in different locations
generic_xml.xslt

I have some unit tests, but I’m trying to work out where to put them online for easy download (I’m almost tempted to start an SF.net project page just to hold them. Unless there’s somewhere better, I’ll probably do that)

Oh, BTW – here’s a small design flaw in OO: when you run an export in OO, any error messages are ignored and replaced with a useless error dialog that just says “writing to file failed” (it doesn’t even give the filename!). Those error messages are hiding on the serial console – if you’re a sufficiently advanced unix sysadmin to know what a serial console is, and how to read it.

Anyway, if you try running the exporter and it fails, now you know where to look…

7 replies on “Export spreadsheet to plain XML with OpenOffice 3.x (works!)”

Terrific! Exactly what I needed – and works like a charm too. I skipped the whole oooexport thing and just created an export filter using your xslt. You just saved me a couple of hours! Thank you!

Thank you Adam, found this today and it saved me.

What would we do without the web, a good search engine, and smart people willing to share their knowledge and ideas?

This certainly fixes the issue with oooexport crashing, but it appears to omit tags where the preceding cell has the same contents.

Table Contents
First|Last|Gender
John|Doe|Male
Jane|Doe|Female
George|George|Male

Results
JohnDoeMale
JaneDoeFemale
GeorgeMale

George is missing from the last row.

If you change the cell to “George1” then the tag is output.

Right. I believ I already Fixed that in the version that I sent to the ooo website; have you tried using the eraion from there?

Yes I tried the version from hxxp://digitalimprint.com/misc/oooexport/ and it causes calc to hang. After much searching found this website, changed the filter to use your version and it completes fine.

Are you saying there is a newer version than the one on here ? If so, can you point me it please ?

Damn.

Now the OOo folks have broken OpenOffice so badly that I can’t run the XSLT *at all*.

(Apple updated the location of Java, and apparently the OOo folks didn’t notice, and just allowed the current and recent OO’s to stop working – upgraded, still broken, and now I’ve got to try and do a clean re-install, and just “hope” it fixes the bug)

OpenOffice is working again – they finally fixed it.

I’m using the XML exporter on a project right now, managing large XL documents from Google Docs.

One note: Google docs seems to export OpenOffice files in a “strange” format which breaks OOo’s export process. Can’t tell if it’s a bug in the export code, or in OOo.

The workaround: when downloading from Google Docs, select “save as … XL” – even though you’re going to open it inside OpenOffice! – because Google saves the XL in a purer / cleaner form. It strips out some of the rubbish it includes in the OOo file.

Comments are closed.