DVDs and Documents

The high-definition DVD format struggle is over. Toshiba's High-definition DVD (HD-DVD) was slugging it out in the market with Sony's BLU-RAY disc format. BLU-RAY has won, and no one except the creators of HD-DVD is really sorry. I have an HD-DVD player stuck upstairs in a closet (inherited from the previous owner when we moved into our new house), and even I don't care. I never bought any HD-DVDs, you see.

The problem with the battle between HD-DVD and BLU-RAY was that they were competing standards doing exactly the same job for high-definition video players. Sure, the tech-geeks who cared knew that they were different, but for people who just wanted to watch cinema-quality video at home, two standards that did the same thing simply meant that no one bought any equipment or movies in either format. We've all seen this movie before, you see, so we knew how it ended. Back in the 1980s it was called "Betamax vs. VHS". I was the proud owner of a Betamax video player back then as well. This time everyone waited until one of the competing standards won.

I do have concerns about a Sony-controlled format becoming the standard, because Sony's "content" division has long been a fan of digital restrictions management, or DRM. But I'm still glad the conflict is over. Having one standard format means all I have to do is wait for the freedom-loving digital underground to break the copy protection on the format so I can back up my movie purchases, and then it'll be safe to buy equipment and movies. For a while, I suppose, until the next attempt to change the standard format.

The arguments over high-definition digital video standards hold a salutary lesson for the world of document formats. Microsoft, long the owner of the dominant document formats (the ubiquitous .DOC, .XLS and .PPT file types), is attempting to force all computer users to standardize on its latest effort. This new format goes by the unwieldy name of Office Open XML (OOXML). The name is so confusing that even Microsoft executives often mistakenly call it "Open Office XML", a name more commonly associated with their Free Software competitor, OpenOffice.

The trouble is, there's already an International Organization for Standardization (ISO) standard document format, Open Document Format (ODF), so Microsoft is trying to get their OOXML format made into an ISO standard as well. They don't seem to care that they might break the international standards process, or ISO itself, by doing so. I've already written about this process in my column "The Definition of Insanity", but I haven't written much about the OOXML standard itself, or about the changes it is currently undergoing in order to pass as an ISO standard.

If you've ever subscribed to the Microsoft Developer Network, or MSDN as it's commonly known, then you'll find the OOXML "standard" document familiar. It's a typical example of MSDN-style technical documentation. It isn't badly written; indeed, for proprietary documentation it's about as good as it gets. But, as I've said before of Microsoft documentation, it's fuzzy on the details. It's not a standards document, something you could use to create an implementation from scratch unambiguously, without a great deal of trial-and-error testing against Microsoft's own version of the same "standard".

For comparison, look at the Internet Engineering Task Force (IETF) "Request for Comments" (RFC) documents, which are publicly available on the Web. They use key words such as "MUST", "REQUIRED", "SHALL", "SHOULD", "MAY" and "OPTIONAL" (defined in RFC 2119), and these words have precise meanings in the standard, so an implementor knows exactly which behavior is mandatory and which is optional. The OOXML spec just doesn't use the precise language that a real specification needs. It was almost certainly written by documentation professionals, not by engineers who actually understand the needs of the implementors of a standard. But of course the real goal isn't to encourage other implementations; it's to bless the one existing Microsoft Office implementation as a standard at whatever cost.

As has been widely reported, National Standards Bodies noted many technical flaws in OOXML in their comments. ECMA (originally the European Computer Manufacturers Association, now Ecma International), the front group that Microsoft used to insert OOXML into the ISO process, then produced resolutions for these comments. I've spent the last few weeks going through them to see whether they fixed the original flaws, and it's been a very illuminating task.

In some cases they did resolve the problems; in others they pushed back and claimed there was no flaw in the first place. For the most part, though, they were remarkably open to adding extra features that seemed to resolve the issues. So much so that I began to realize two things. First, that ECMA was willing to say yes to almost anything in order to get OOXML passed as a standard. Second, that the things they pushed back on and said "no" to were exactly those modifications to the specification that would require a change to Microsoft's existing implementation of OOXML. There were many thousands of pages of comments, so it is possible I missed one, but I couldn't find any agreed change that would require Microsoft to release even a single service pack for Office. In fact, ECMA even cited the fact that a change would "break compatibility with existing implementations" as a reason for rejecting it.

An example will illustrate this. The date formats specified in OOXML are flawed. There is too much detail to go into here, but to summarize: different bugs in older Microsoft Excel and Lotus 1-2-3 implementations meant that the original OOXML specification defined two different ways to store a date, with different semantics. The obvious way to fix this in future documents is to specify a single standard date format (ISO 8601 is such a standard) and convert to that format when reading old documents in .XLS format. Funnily enough, this is exactly what the Free Software alternative OpenOffice does when converting to the existing ISO Open Document Format (ODF) standard. ECMA agreed, and so added the ISO 8601 format to the list of allowable date formats in OOXML. But they didn't remove the old buggy formats from the specification. They just added one more, with a note that the old format is "deprecated".

The "change" adopted by ECMA had exactly the properties required by their sponsor. It paid lip service to the principles of ISO standardization, and required no changes to any existing Microsoft code, which will simply ignore the new format. Maybe later they'll implement it, maybe not. Either way, Office still fits within the "standard". With standards this low, it's hard not to meet them. But this is a problem for interoperability. Because no single date format is mandated, every other implementation is forced to replicate the bugs of the past. You can't be sure your implementation reads OOXML files correctly without implementing the bug, and you have to write out the buggy dates too, because you can't be certain that any other implementation understands the ISO 8601 date format. The claimed deprecation is hollow; this bug will live forever. Highly inappropriate for a date bug, if you ask me.
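To make the date problem concrete, here is a minimal Python sketch of what a reader of such files is left doing. It assumes the two legacy conventions are the ones usually described as the "1900 date system" (serial 1 is 1900-01-01, including the phantom 29 February 1900 inherited from Lotus 1-2-3) and the "1904 date system" (serial 0 is 1904-01-01). The function name and the `date1904` flag are illustrative only, and fractional serials (times of day) are deliberately left out.

```python
from datetime import date, timedelta

# Illustrative sketch: convert a whole-day spreadsheet serial number, stored
# under one of two assumed legacy conventions, into an ISO 8601 date string.

def serial_to_iso8601(serial: int, date1904: bool = False) -> str:
    """Return 'YYYY-MM-DD' for a whole-day serial number."""
    if date1904:
        # 1904 system: serial 0 is 1904-01-01, no phantom days.
        if serial < 0:
            raise ValueError("negative serials are not handled in this sketch")
        return (date(1904, 1, 1) + timedelta(days=serial)).isoformat()

    # 1900 system: serial 1 is 1900-01-01.
    if serial < 1:
        raise ValueError("serials below 1 are not handled in this sketch")
    if serial == 60:
        # 1900 was not a leap year, so 1900-02-29 does not exist.
        raise ValueError("serial 60 is the phantom date 1900-02-29")
    if serial < 60:
        return (date(1899, 12, 31) + timedelta(days=serial)).isoformat()
    # From serial 61 on, the phantom leap day shifts every date by one.
    return (date(1899, 12, 30) + timedelta(days=serial)).isoformat()


print(serial_to_iso8601(39448))                 # 2008-01-01
print(serial_to_iso8601(39448, date1904=True))  # 2012-01-02
```

The same stored number means two dates more than four years apart depending on which convention the file declares, and the 1900 branch has to carry the Lotus leap-year bug along with it. An implementation that leaves out either branch cannot be sure it has read a date correctly.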

In its marketing around getting OOXML anointed as an International Standard, Microsoft claims that more standards mean greater consumer choice. But sometimes less is more. We only have to look at HD-DVD vs. BLU-RAY to see the consequences. Or, to finish with an old joke:

"How many Microsoft engineers does it take to change a light bulb?
None. They just declare darkness the new standard."

Jeremy Allison,
Samba Team.
San Jose, California.
February 2008.



The ultimate irony of this article is that I can't see the first few lines properly because a Google ad for DVD services covers them up.

Maybe I'll have to find a Windows/IE machine to read your article on ;-)



What ad for DVD services?

Re: DVDs

Ctrl-+ should fix it.

Please check your date

I think you have the wrong year on your signature.


Date corrected. Ed.


At least that bug was short-lived

MS Office compatibility

There's one thing I don't quite get. You say that the changes accepted for OOXML won't require any change to MS Office. Yet I've seen quite a few analyses on the net saying that what Office 2007 spews out isn't OOXML at all, that no implementation of OOXML exists, not even from MS, and that even MS won't be able to implement it fully. So which is it?

I suppose that even if MS gets its OOXML format accepted as a standard, it will backfire. The majority of computer users don't really care; they don't know the first thing about formats. I've seen any number of people who believe that all it takes to change a BMP file into a JPG is to edit the suffix.

So what will OOXML do? It will cause many problems for unwary user X, just as Office 2007 docx is already doing. Just google for words like 'docx frustration' and you'll find forum or blog posts by people who saved their work as docx on one machine, only to find that it doesn't even open in older versions of MS Office. They happily send docx files around by e-mail and are completely baffled when they learn that others can't open the format.

So what will happen if OOXML does get accepted? If what others have written, as paraphrased above, is true, then it will be a third MS format that doesn't work in older versions of Office. And if the standard gets cleaned up enough to actually be implemented, then what keeps competitors from sticking to the parts that are acceptable and not deprecated, as in your example, and letting only MS Office fail miserably at handling those documents? Our user X won't ever know where the file originated; he will just see it fail in his MS Office. Whom will he blame? He won't have a clue that the other party is not using MS Office.

Then there should be some way of independently checking validity, the way you can check HTML validity. A webpage for that would be great. This page could analyse OOXML code and point out everything that is proprietary, deprecated or outside the standard, and people would be able to see which programs can create valid and useful OOXML and which can't. If MS propagates its bugs and proprietary extensions, it won't create valid and useful OOXML. In the end maybe we'll have two standards, but if they are sanitized enough to actually work, that won't hurt anyone the way having to buy both an HD-DVD and a Blu-ray player would. It's not as if an additional file format parser gobbles up space, empties your fridge or makes your software terribly expensive.
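As a very small taste of what one check on such a page might look like, here is a Python sketch that inspects a single thing: whether an .xlsx workbook declares the 1904 date system through the `date1904` attribute of its `workbookPr` element, i.e. which of the two legacy date conventions its serial numbers use. It is not a validator. The function name is hypothetical, and it assumes the workbook part sits at the conventional `xl/workbook.xml` path instead of resolving the package relationships as a thorough checker would.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML "main" namespace used in .xlsx workbook parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def uses_1904_dates(source) -> bool:
    """Report whether a workbook declares the 1904 date system.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        # Assumption: conventional part name; a real checker would locate
        # the workbook part through the package relationships.
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    props = root.find(f"{{{MAIN_NS}}}workbookPr")
    if props is None:
        return False  # no workbookPr: the 1900 system is the default
    # xsd:boolean allows "1"/"0" as well as "true"/"false".
    return props.get("date1904", "false").strip().lower() in ("1", "true")
```

Calling `uses_1904_dates("report.xlsx")` (a hypothetical file) tells a reader which serial-number convention to apply before converting any cell date to ISO 8601; a fuller checker would then go on to flag each legacy-format date it finds.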

I'm hoping for an effect like the one in web design. Granted, there are still webpages programmed for IE only, but I get the impression there used to be many, many more. You still have to use the occasional CSS hack for legacy IEs, but you no longer have to script for different browsers on each and every page, as you used to. I get the impression that proprietary extensions to webpages are on the way out. If this is done correctly, I believe the same will happen to OOXML once its users see it fail once too often.

It will be a pretend standard so governments, etc. can accept it

The problem is that most government bodies tend to want open standards. It looks bad when you require someone to buy a specific CronyWare product to do things required by law, or that even affect the law. (And let's say the fill-in document has a flaw which means your deed or title is completely invalid, or that says you owe 1000x your taxes, or that says you are doing something criminal, so the SWAT team breaks down your door and shoots your dog.)

So they mandate "standards". ODF is an ISO standard and a reasonably good one (with the flaws being addressed, like getting all the mind-numbing details on precisely what result a spreadsheet formula will return, if that has not already been done).

So Microsoft has a problem. "Office" is not a de jure standard. They need some way to let governments pretend it is acceptable in legal terms without making Office truly free. (The publisher of an electrical standard used in building codes saw it become public domain instead of being able to charge high fees for the copyright: you cannot copyright the law itself, and that book had been declared by statute to be "the law".)

So here is where OOXML, ECMA, and ISO converge. Take whatever bilgeML a particular Office snapshot spat out, write up a very long but imprecise or best-guess description of the sputum, submit it with lots of bribes in various forms so ECMA would approve it, then take that rubber-stamped sputum to ISO and try to get it rammed through on the "fast track" so it becomes ISO standard sputum.

The last part isn't complete yet, but if it succeeds, Mr. Bureaucrat can just buy Microsoft Office and call it an international-standards-compliant computer program, even if the current snapshot shreds data produced by the old snapshot, or even data produced to the exact ISO OOXML specification.

Yet this may still backfire. Microsoft likes to disclaim liability in the EULA, but government bodies can't easily avoid liability: if the judge says to use Office, and you do, but it destroys evidence, the judge can't hold you in contempt. The government will have to either eat the cost or sue Microsoft.

Lest you think this impossible, consider the sub-prime bonds and auctions that are now failing, causing major financial headaches for cities and states. The big brokerages and banks that sold the "safe" mortgage paper are trying to claim they didn't do anything wrong.

If the governmental IT has "sub prime" technology, it might come back to haunt the vendor.

Microsoft standards.

Any programmer that has ever dealt with a Microsoft standard knows what a nightmare it is. They are poorly conceived, poorly documented, and shifting. Something as simple as a CSV file can be an adventure when trying to maintain compatibility with Microsoft products.

Microsoft should just adopt open standards. The time for benefiting from closed standards is past. There are too many options now, and they can't be squashed with FUD or by buying them out. People are no longer content to be locked into a little corner; they want to share everything without jumping through hoops. As always, the best way to make money is by giving customers what they want. Don't wage war on the hand that feeds you.

The Cause of Bluray/HD-DVD non-compatibility.

The root cause of the HD-DVD vs. Bluray battle is, of course, the fact that specifications and standards are allowed to be patented. If patents on specifications weren't allowed, then the Bluray vs. HD-DVD battle would never have occurred, because what caused it was that there was money to be made by forcing people to use one specification rather than the other. If there had been no money to be made, Sony and Toshiba would have sat down together and agreed to adopt a single format with the best aspects of both, in true open standards tradition.


Which is why the Free Software community should have come up with a physical-media-agnostic format for high definition media presentation.



Like OGG Theora?

Dates and Microsoft...

The funny thing is that the date issue also applies fully to the date functions built into the Windows OS (System Date and File Date). Microsoft has been playing date games for a long time. It's only natural that any standard they produce would have the same flaws...

And as noted, not even MS Office 2007 implements OOXML; rather, it implements a superset of OOXML. Microsoft has shown no sign that it is willing to make any version of Office implement OOXML correctly per the standard, i.e. the subset that is the real OOXML. They're simply not interested.

They're also not interested in making the mathematical functions work according to mathematics - and continue buggy implementations there too.

Pigs will fly and Hell will freeze over before Microsoft complies. But at their present rate, they'll likely disappear before then too. ;-)
