September 20, 2005

OpenDocument office suites lack formula compatibility

Author: Marco Fioretti

Since the OpenDocument standard for office files covers spreadsheets, the productivity suites based on it, such as OpenOffice.org (OOo) and KOffice, can share spreadsheets between applications -- theoretically, that is. In practice, there are still a couple of obstacles for this kind of interoperability among OpenDocument spreadsheet processors, regardless of their origin and license. One is the issue of macros, as I discussed earlier. A bigger one is that, while formatting of spreadsheets is covered by OpenDocument, the actual formulas that make them useful aren't.

Figure 1, a spreadsheet created using OpenOffice.org Calc 1.9.117 on Fedora Core 4, shows the problem. The cells from A1 to A4 contain some random numbers. The content of cell A5 is calculated according to this formula, also shown in the corresponding text entry window:


=SUM(A1;A3;CEILING(A4;3;2)) + LOG(A1+10;10) - SQRT(A4)

Figure 1. A spreadsheet created using OpenOffice.org Calc 1.9.117 - click to enlarge

While it's a dumb formula, it's a perfect one to illustrate the problem. Saving the spreadsheet in OpenDocument format (.ods extension) and opening it with KSpread from KOffice 1.4.1 yields the result shown in Figure 2. Even if KSpread opens the file without a hitch, the formula we typed in cell A5 is not recognized as a formula, but treated as normal text. To perform the same calculation in KSpread we have to rewrite the formula as shown in cell A6 of the same figure:


=SUM(A1;A3;CEILING(A4)) + LOG(A1+10) - SQRT(A4)

Figure 2. OpenOffice.org formula in KSpread - click to enlarge

Things get worse if you save the spreadsheet with the extra formula in KSpread and open it again with OOo Calc (Figure 3). The formula in cell A5 is not written back by KSpread in a format still recognizable by Calc, and the one in cell A6 (correct for KSpread) doesn't make sense to its OOo counterpart.

The problem we have here is that the formula language is not standard yet. Some, but not all, functions are called in different ways or, when the name is the same, expect different parameters. This is not the fault of the OOo or KOffice developers; it's simply that formulas are not covered by the OpenDocument standard. Luckily, unlike the aforementioned problem with macros, developers are already working on some solutions.

Figure 3. Opening the spreadsheet again in OpenOffice.org Calc - click to enlarge

Another standard? From where?

As far as spreadsheet formulas are concerned, OpenDocument simply states that they can only be written, in their own namespace, in the table:formula attribute of each cell. Nothing more.

The lack of a grammar for these objects in OpenDocument is not a mistake or an oversight. The OASIS Technical Committee in charge of this standard explicitly said last January that "while ... interoperability on that level would be of great benefit to users, we do not believe that this is in the scope of the current specification. Especially since it is not specifically related to the actual XML format the specification describes."

Even outside the Committee there is the same opinion: OpenDocument must only be about structure and how to represent content. It should define which standards are acceptable for each kind of data, not (re)create them all. This philosophy allows other specs to evolve at their own pace, without slowing down OpenDocument acceptance with users who don't need them.

What work must be done?

If we looked only at KOffice and OOo, part of the problem could be solved by sharing code between Calc and KSpread. This is possible and has already happened with some statistical functions. Unless a royalty free standard is also formally defined, however, this only moves the interoperability problem to the next application.

Making the same spreadsheet formulas work in the same way in all (present and future) OpenDocument processors means, at least:

  • Listing all function names.
  • Documenting what each function does and the parameters it accepts.
  • Creating test spreadsheets so the developers can check, without ambiguities, whether their code works.
  • Defining if and how localized function names (in the formulas, not in the user interface) should be handled.
  • Agreeing on, and adopting, all of the above.

Enter OpenFormula

Many developers last year were lamenting the lack of a formula specification in OpenDocument. After the Technical Committee confirmed that they would not add such a specification to the standard, one of the reviewers, David A. Wheeler, started to work on the issue. The result is OpenFormula, a format created to fill this hole. OpenFormula defines formulas in a way that is highly compatible with OpenDocument and OOo, and allows users of any application to exchange spreadsheets without formula interoperability problems.

Of course, nothing prevents the usage of OpenFormula in other areas besides OpenDocument. The specification uses a traditional mathematical notation instead of XML, for two reasons. The first is to keep formulas as easily readable (and writable!) by humans as possible. The other is to make interaction easier with other tools that expect formulas to be written in traditional notation.

Like many open source tools, OpenFormula aims to do one thing well, rather than trying to solve every possible problem in its field. For example, OpenFormula does not support formula display, since standards such as MathML already exist for this.

OpenFormula is still in the alpha stage, but it's already quite useful. Interested developers will find links to the draft specification, test spreadsheets, and Lex and Yacc utilities for testing the syntax of formulas on the OpenFormula Web site.

Even if you have a completely different idea on how to solve this problem, OpenFormula is still be an excellent starting point to get familiar with the dirty details of formula standardization. As a matter of fact, Wheeler tells me that he's already in discussions with others who are interested in taking his work further. What's more, inspired by my friendly nagging, Wheeler has opened the OpenFormula project to the world on SourceForge, so others can participate.

Conclusion

Sooner or later, a solution to the formula incompatibility problem will be found. Ideally, someone will solve it sooner, and without any intellectual property problems. The capability to exchange spreadsheets freely is essential for free desktops in the business and education markets, and it should not be limited by artificial restrictions.


Thanks to D. A. Wheeler and to the KOffice and OOo developers for their input for this article.

Click Here!