POI – Java access to Microsoft Office-format files

69

Author: Daniel Rubio

POI is an Apache Foundation project designed to let programmers access Microsoft’s OLE 2 Compound Document Format from a Java environment. OLE 2 format is quite ubiquitous since it’s the one used in Microsoft Office files. In the following article we will be exploring the various components available in POI for accessing these documents through Java.

POI comprises a handful of projects, all of which have their own objectives. POIFS was the first sub-project created for POI and provides POI’s underpinnings, given that it’s used under the hood of the other POI APIs to deal with the OLE file format. Its use is generic in nature, since it does not define a specific Office format to deal with, like an xls (Excel) or doc (Word) file, but a global means of accessing OLE 2 formats.

HWPF is the set of Java APIs that allows you to read and write Microsoft Word 97(-XP) documents. HSSF is its counterpart for Excel 97(-XP), used for accessing and modifying spreadsheet information from a Java environment. HPSF is a complementary project to both the aforementioned POI components; it lets programs read and alter properties in an Office document — specifically, parameters like date, title, and author, all of which do not form an integral part of the actual document as defined by the OLE 2 format. Finally, POI-Utils is a general set of classes that do not fit with the overall scope of POI, but are often helpful used in combination with POIs core components.

To see how we can use POI, let’s create a simple spreadsheet like the one illustrated in the following graphic:

To do this we will use the HSSF API provided with POI. The following snippet could be placed inside a Servlet or any other Java code, so long as the POI libraries are accessible:

           // The following code is based on classes from the following libraries 
	   // org.apache.poi.hssf.usermodel.*;
           // java.io.*; 

	    HSSFWorkbook document = new HSSFWorkbook();
	    
	    HSSFSheet sheet = document.createSheet("CD-Catalog");
	    
	    // Create an initial row with cell 
	    HSSFRow row = sheet.createRow((short)0);
	    row.createCell((short)0).setCellValue("CD Catalog");
	    
	    // Create lines for the catalog 
	    HSSFRow row2 = sheet.createRow((short)1);
	    HSSFRow row3 = sheet.createRow((short)2);
	    HSSFRow row4 = sheet.createRow((short)3);

	    // Define rows and initial cells with product name 
	    row2.createCell((short)0).setCellValue("CD-R 80 Min");
	    row3.createCell((short)0).setCellValue("CD-R 21 Min (Mini)");
	    row4.createCell((short)0).setCellValue("Business Card CD");
	    
	    // Define cells with prices
	    row2.createCell((short)1).setCellValue(0.39);
	    row3.createCell((short)1).setCellValue(0.57);
	    row4.createCell((short)1).setCellValue(0.63);	    

	    // Write the output to a file system
	    FileOutputStream spreadsheet = new FileOutputStream("catalog.xls");
	    document.write(spreadsheet);
	    spreadsheet.close();

All of the classes used in the generation of the spreadsheet are located in the org.apache.poi.hssf.usermodel library included in POIs distribution. The first object definition in the form of a HSSFWorkbook instance is used to define the actual document. Later on, we associate a specific sheet through the HSSFSheet class to this same document instance.

The HSSFRow class along with its methods is used to execute the majority of the workload, which is defining the actual rows, cells, and values. Ending the sequence, we define a FileOutputStream, which is used to write the document to the file system.

Now that we have generated a spreadsheet from scratch, let’s use POI to modify an existing document. The following code modifies the previous spreadsheet, inserting the current date and placing it in a blue-colored cell:


           // The following code is based on classes from the following libraries 
           // org.apache.poi.hssf.usermodel.*;
           // org.apache.poi.hssf.util.*;
           // org.apache.poi.poifs.filesystem.*;
           // java.io.*;
           // java.util.*;

	    POIFSFileSystem poiStream = new POIFSFileSystem(new 
               FileInputStream("catalog.xls"));
	    HSSFWorkbook document = new HSSFWorkbook(poiStream);
	    
	    HSSFSheet sheet = document.getSheetAt(0);

	    HSSFRow row = sheet.getRow(0);
	    
            // Generate today's date and place it in a cell
	    HSSFCell cell = row.createCell((short)1);
	    cell.setCellValue(new Date());
	    
            // Define a style for both the date and cell background 
	    HSSFCellStyle style = document.createCellStyle();
	    style.setDataFormat(HSSFDataFormat.getBuiltinFormat("m/d/yy"));
	    style.setFillBackgroundColor(HSSFColor.BLUE.index);
	    cell.setCellStyle(style);

	    // Rewrite the spreadsheet to the File System 
	    FileOutputStream modSpreadsheet = new FileOutputStream("catalog.xls");
	    document.write(modSpreadsheet);
	    modSpreadsheet.close();

For reading an OLE-based document we use the POIFSFileSystem class, then access the document with the getter methods provided in the corresponding HSSFSheet and HSSFRow classes. We later generate a date value and place it inside an HSSFCell instance. To format this last cell, we create an HSSFCellStyle instance, and through its setDataFormat and setFillBackgroundColor methods give it a specific date format and blue background. Finally, we rewrite the spreadsheet back to the file system using a FileOutputStream class.

We leave you to explore POI’s more advanced capabilities, such as its powerful word processing API for Microsoft documents and its enhanced formatting features for Microsoft spreadsheets, all of which give you the ability to create or alter these common formats from a Java programming environment.

Daniel Rubio is the principal consultant at Osmosis Latina, a firm specializing in enterprise software development, training, and consulting based in Mexico.

Category:

  • Java