April 20, 2001

Open XML and the librarians

Author: JT Smith

- by Jack Bryar -
Open Source Business -

I've recently had a revelation. I've decided that
far too few librarians consider themselves geeks. You never see them
posting on Slashdot. They never get featured in Red Herring. And yet, the
corporate librarians of the world should step out and proudly proclaim their
geekiness. They are the center of one of the most important business developments
in information technology. The Open Source community should celebrate
them, too, because librarian culture is one that celebrates the free exchange
of information, and corporate librarians are likely to be the ones who
make sure this new development doesn't get tied up as someone's
proprietary intellectual property.

If you've been too busy coding to pay attention, you
may have missed an important trend in the one piece of I.T. that's still
hot. Product developers and venture finance companies still care about
business-to-business e-commerce, or at least the tools that could
simplify the processes associated with B2B.

Among the "big problems" of B2B: trying to conduct
ever more complex transactions in a more automated manner. At one time
all this was a pretty simple matter. Large volume B2B transaction data
was limited to little more than product numbers, prices and order

Today, e-commerce is far more complex and there have
been a lot of problems. A company like Yahoo or eBay can find itself
with business partners it hadn't expected, and find itself selling
adult videos without intending to. News producers have had their
content disassembled and copyrights violated. News and information posted on
public Web sites sometimes found few human eyeballs, but got lots of attention
from "bots" scraping a meaningful paragraph or hyperlink, and
depositing the results into a proprietary database. Electronic business customers
have found that the information they shared with their vendors tended
to drift elsewhere, without anyone intending it. B2B and other web
transaction businesses need to know as much about the business partner as possible,
in order to provide the best service. But such sharing is perilous.
With whom is the information shared? What control does a business partner or
a consumer have over that information once it's been shared for the
first time?

In order to safely conduct more complex transactions,
particularly those involving extensive information exchanges, companies
and their clients need to exercise greater levels of discretionary
control over content. At the same time the relative costs of these transactions
compel firms to try to manage this process in an increasingly automated
fashion. So who determines what needs to be shared, or what must be
firewalled away to preserve the rights of the corporation, or those of third
parties? Who sets the boundaries if the process is entirely automated? Who
determines who has what rights to what types of information, for how long, under
what circumstances? Who can be trusted? A corporate partner? A
third-party info-mediary?

These issues are becoming critical to every
element of e-commerce, particularly B2B transactions. Newer firms
like Savvion, Viquity and Netfish, and older electronic data exchange
pros like Sterling Commerce, think there's a market for automating and
managing business processes.

But how do business customers respond to such offers?
It seems to me that automated business transaction systems require
three things: limited interoperability, to ensure that systems can work
together but that confidential data is properly labeled and contained within the
relationship; open architecture, to allow participants to have full
confidence in the systems they are using; and referees, to make sure that systems
properly understand the nature of the data they are exposing and to set
rules about meaning.

The open architecture is coming, if reluctantly. Some
vendors are still pushing systems that are relatively closed models.
But most would-be customers are pushing back, and are championing a variety
of Open Source search-and-retrieval systems and are looking to generate
open architectures, based on extensions of XML. Open committees are
springing up everywhere, trying to extend XML to such applications as the mark-up
of financial reports, the labeling of medical records, and the
structuring of e-business transactions.

For e-merchants, the most important of these architectures
is probably ebXML. ebXML is a proposed B2B framework that enables
business-to-business transactions and other forms of collaboration
through automated sharing of Web-based business services. The framework
includes specifications for a SOAP-based message service, for developing
what the sponsors call a "collaboration protocol profile," for a common
articulation of a given business process
methodology, as well as registry and repository data. Full specs are to
be published next month.

For this system to work, however, there needs to be
common agreement on what terms mean, and how to categorize given sets
of information. It means developing a way of letting company A's computer
know that the part number or research article found in company B's
database is essentially the same thing as what it was looking for, and the
degree to which it isn't. It also means cooperative data mining. Not all
content fits easily inside a database. Much of the most important information
may be generated on the spot. Such systems can only work if everyone
agrees on what terms mean, and what categories a given data set belongs to.
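
To make the idea concrete, here is a minimal sketch of such an agreement, written in Python. It is an illustration only, not part of ebXML or any published specification: the part codes, the concept identifiers and the "exact"/"partial" labels are all hypothetical. Each company maps its own codes onto a shared vocabulary, and company A's computer uses that vocabulary to find what company B calls the same thing, along with the degree to which it isn't.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    concept: str  # concept identifier both partners agreed on (hypothetical)
    match: str    # "exact" or "partial": how well the local item fits it


# Hypothetical catalogs: local codes mapped onto the shared vocabulary.
COMPANY_A = {
    "A-1001": Mapping("fastener/bolt/m8", "exact"),
    "A-2002": Mapping("report/market-research", "exact"),
}
COMPANY_B = {
    "77-B": Mapping("fastener/bolt/m8", "exact"),
    "78-B": Mapping("fastener/bolt/m8", "partial"),  # e.g. a different coating
    "R-9": Mapping("report/market-research", "partial"),
}


def find_equivalents(a_code, a_catalog, b_catalog):
    """Return sorted (b_code, match) pairs in B that share A's concept."""
    wanted = a_catalog[a_code].concept
    return sorted(
        (b_code, m.match)
        for b_code, m in b_catalog.items()
        if m.concept == wanted
    )


if __name__ == "__main__":
    print(find_equivalents("A-1001", COMPANY_A, COMPANY_B))
```

The hard part is not the lookup, which is trivial. It is getting both companies to agree on the concept identifiers and on what counts as a partial match.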

Categorization and classification: this is the very
heart of library science. And as a result, librarians are hot.

Librarians understand how to cluster and classify
documents or databases into logical hierarchical groupings. Librarians
understand that such classification systems are critical to finding
stuff, whether it is a set of related books in a college library or a set of
related technical concepts found among the emails in a company's research
department. Locating content using typical search and retrieval technology doesn't
work that well. Critical content is missed, and inappropriate content
shows up with distressing regularity. By comparison, identifying content that
has been properly organized is a snap -- so simple a computer can do it.
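
A rough sketch of that difference, in Python, with made-up documents and category names. It assumes someone has already filed each document under a hierarchical category path. That filing is the librarian's work, and nothing below does it for you.

```python
# Hypothetical documents, each filed by hand under a category path.
DOCUMENTS = [
    {"title": "Bolt torque guidelines", "category": "engineering/fasteners/bolts"},
    {"title": "Rivet supplier review", "category": "engineering/fasteners/rivets"},
    {"title": "Q1 travel policy", "category": "administration/travel"},
]


def keyword_search(docs, word):
    """Naive retrieval: return titles that contain the word."""
    return [d["title"] for d in docs if word.lower() in d["title"].lower()]


def in_category(docs, category):
    """Return titles filed under `category` or any of its subcategories."""
    prefix = category.rstrip("/") + "/"
    return [d["title"] for d in docs if (d["category"] + "/").startswith(prefix)]


if __name__ == "__main__":
    # No title mentions "fastener", so the keyword search misses everything.
    print(keyword_search(DOCUMENTS, "fastener"))
    # The category lookup finds both fastener documents.
    print(in_category(DOCUMENTS, "engineering/fasteners"))
```

The keyword search comes back empty because no title happens to use the word, while the category lookup finds both documents with a plain prefix match.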

So the librarian's disciplines -- categorization, storage,
and retrieval -- are critical elements in the development of this next
generation of B2B tools. And in many instances, it is librarians who are driving
the development of these open XML-based tools.

This is fortunate. Librarians also come from the right
culture to be trusted with this kind of development; by and large they
believe in information exchange -- in sharing, not hoarding. They
believe in cooperation to get the job done right, and so far, at least, the drive
toward an open content management architecture hasn't been hijacked by the
usual corporate suspects. The librarians could use a little help from members of the
Open Source community. There are dozens of committees driving XML and its
extensions to a variety of content management applications ranging from news
to taxonomy. Most have relatively open memberships and would be
open to informed input from volunteers.


