July 28, 2011

Chatting with Peter Tait of Lucid Imagination

Back in January I had the opportunity to test drive LucidWorks Enterprise, a search engine for internal networks. The cross-platform search engine was flexible, stable and easy to install, and it came backed by a friendly support staff. In short, it was a good experience, one which demonstrated how useful (and straightforward) running one's own search engine can be.

At the time I had hoped to do a question and answer session with one of the Lucid executives, but, schedules being what they are, it was difficult to set aside a time when everyone could get together. I'm happy to report that Peter Tait, Lucid Imagination's Chief Marketing Officer, and I managed to make contact, and he graciously gave his time to talk about the company, the technology and the future. Peter has over twenty years of experience in the industry and, prior to joining Lucid, he held management and executive management positions at BEA, Citrix, Documentum and EMC.

JS: First, for people who aren't familiar with Lucid Imagination and your products, could you give us a little background on the company and LucidWorks Enterprise?

PT: Lucid Imagination is the first commercial company exclusively dedicated to Apache Lucene and Solr open-source technology. The company was founded in August 2007 by Marc Krellenstein, Yonik Seeley, Grant Ingersoll, and Erik Hatcher.

In addition to these four, the founding team included several key contributors and committers to the Lucene project, as well as experts in enterprise search application development. So it was always a combination of open source and traditional enterprise search expertise. The company is headquartered in San Mateo, California.

As an active participant in the enormous community using Lucene and Solr, Lucid Imagination offered commercial-grade support, training, high-level consulting and certified distributions of Lucene and Solr. For any open source technology to be successful in the enterprise, there must be companies willing and able to provide enterprise-class support (and support contracts). We now have over 150 customers, representing many industries and many countries.

The company's website is meant to serve as a knowledge source for the Lucene community, with information and resources to help developers build and deploy Lucene- and Solr-based solutions in a more efficient and cost-effective manner. It's a key goal for us that we make the site as rich, and as useful, as possible for search developers, but we still have lots to do. Our long-term success depends first and foremost on the success of the community.

As another way to help support the Lucene and Solr community, we also host Lucene Revolution, the world's biggest open source search conference. The recent San Francisco event attracted almost 500 attendees, and the energy level at the conference was fantastic.

In late 2010, the company began shipping LucidWorks Enterprise, a new search platform designed to accelerate and simplify the development of highly accurate, scalable, and cost-effective search applications. LucidWorks is built on Apache Lucene and Solr.

Like any other search engine, it works by indexing several kinds of documents and providing ways for a user to search them. It uses Lucene and Solr to handle the core indexing and query processing tasks, and leverages the latest advancements in those projects. It is actually based on the 4.x code base that is ahead of the current 3.x releases. LucidWorks Enterprise also builds on the work of the open-source community by adding crawling features, a robust REST API, an easy-to-use administration interface, and other features.

A software diagram showing the relationship between Lucene, Solr, and LucidWorks Enterprise.

The Apache Solr/Lucene core provides the indexing and searching functionality on which LucidWorks Enterprise is built. As an application developer, you can access this functionality in the same way that you access a traditional Solr installation. This includes field definition, document analysis, faceting, and basic query interpretation. The Apache Solr/Lucene core can be used as a standalone installation, if you want to work with it directly.
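To make the point about accessing the core "in the same way that you access a traditional Solr installation" more concrete, here is a minimal sketch of building a faceted Solr query URL. The parameters `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters. The host, port, path and the field names (`title`, `author`, `mimeType`) are assumptions made for illustration, not details taken from LucidWorks Enterprise.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, rows=10):
    """Return a Solr select URL that requests facet counts for the given fields."""
    params = [
        ("q", query),        # the user's query
        ("rows", rows),      # number of documents to return
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # turn faceting on
    ]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "title:lucene",
    ["author", "mimeType"],
)
print(url)
```

The sketch only builds the request URL; sending it (for example with `urllib.request.urlopen`) requires a running Solr instance with matching fields in its schema.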

On top of the open source core is LucidWorks Enterprise, which has been designed to take the pain out of running an Apache Solr-based search engine by providing programmatic or user-interface-level access to features that are normally difficult to work with directly, such as field definition or data source creation and scheduling. It does this in several ways. The LucidWorks Enterprise Administration User Interface provides configuration and management tools for almost every feature of LWE, including document acquisition, security, and field definitions.

LucidWorks Enterprise provides a customizable Search User Interface that includes advanced features such as query completion, "find similar" searches, and integration with click scoring. Click Scoring enables LucidWorks Enterprise to adjust search results based on user actions: it automatically adjusts search results according to which ones users click most, and more so if the user's query is similar to the query for which the documents were selected before. The REST API provides programmatic access to almost all configuration and management functions within LucidWorks Enterprise.
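The click scoring idea described above (boost documents that users click most, and boost them further when the earlier query resembles the current one) can be illustrated with a toy re-ranking function. This is only a sketch of the general idea and not LucidWorks Enterprise's actual algorithm: the similarity measure (word overlap), the logarithmic boost, the `weight` parameter and all names are assumptions chosen for illustration.

```python
import math


def query_similarity(a, b):
    """Jaccard overlap between the word sets of two queries (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def click_boosted(results, click_log, query, weight=0.5):
    """Re-rank (doc_id, base_score) pairs using past clicks.

    click_log is a list of (past_query, clicked_doc_id) pairs. Every past
    click on a document counts as evidence, and counts for more when the
    past query is similar to the current one.
    """
    boosted = []
    for doc_id, base_score in results:
        evidence = sum(
            1.0 + query_similarity(query, past_query)
            for past_query, clicked in click_log
            if clicked == doc_id
        )
        # log1p keeps heavily clicked documents from swamping relevance
        boosted.append((doc_id, base_score * (1.0 + weight * math.log1p(evidence))))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


results = [("doc-a", 1.0), ("doc-b", 0.9)]
clicks = [("solr faceting", "doc-b"), ("solr faceting", "doc-b")]
ranked = click_boosted(results, clicks, "solr faceting")
print([doc_id for doc_id, _ in ranked])
```

With an empty click log the original ordering is preserved, since every document's score is multiplied by 1.0.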

LucidWorks Enterprise provides end-to-end SSL security, as well as the ability to limit access to specific results based on a user's identity or group affiliation. Enterprise Alerts enable the search application to notify a user when new results have been found for a query.

Most of the functionality provided by LucidWorks Enterprise comes from the LWE Core component, which manages all of these processes and features so administrators can concentrate on building and managing their own applications rather than the underlying search engine. The four main components (Core, Admin UI, Search UI and Alerts) can be run together on a single server or deployed on separate servers if desired.

JS: What sorts of things does LucidWorks index? Document files, obviously, but I understand you can also search through databases and SharePoint?

PT: LucidWorks Enterprise can index and search databases, file systems, websites, XML, and Microsoft SharePoint sites without additional programming. It can index and search files of many different types, including Microsoft Office, Adobe PDF, and other common file formats. The automated crawler can extract documents from local or remote disks, databases, and websites, or use native XML format. Additional options for file readers and document filters are also available.

LucidWorks Enterprise also includes a connector framework that will allow us to add new connectors as customers request them.

JS: Who makes use of LucidWorks? I believe you have some big-name clients in the tech sector?

PT: Lucid Imagination has over 150 customers, including household names like AT&T, Sears, Ford, Verizon, Cisco, Zappos, Raytheon, The Guardian, The Smithsonian Institution, Salesforce.com, The Motley Fool, Qualcomm, Taser, eHarmony, and many others around the world. These are the companies that have, over the last four years, looked to us for help and support in building and deploying search-based applications built on Lucene and Solr. LucidWorks Enterprise is a new product. We have customers in production, but we have not yet published any customer case studies or identified the companies that have the product deployed.

JS: Large organizations are usually compartmentalized. How do you ensure users can only find data they are supposed to find?

PT: Obviously, security and access control are important in any commercial environment, and adding functionality to support this need is a high priority for us. Generally, enterprise-level application designers must take into account four main security considerations for any search application: network access to the various components of the service, authentication of users, authorization to use various parts of the user interface and authorization to view certain documents. LucidWorks Enterprise implements security for each of these functions. For example, an administrator can configure document filters for different roles, limiting what documents appear in search results for users in those roles. You could, for instance, create a filter that enables users in the finance role to see only documents that satisfy a query of "department:finance". LucidWorks Enterprise is also LDAP-aware, so search application developers can create document-level security and admin function security, allowing enforcement of security policies. Lucid Imagination has also added enhancements for optional end-to-end secure communications encryption. Version 1.8, released in early July 2011, adds Windows security through integration with Active Directory for additional document and user-level security.
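The role-based document filter mentioned above can be pictured as a restriction attached to every search a user runs. The sketch below shows one way this might look against plain Solr, using the standard `fq` (filter query) parameter. The role-to-filter table, the endpoint URL, the `hr` role and the function names are hypothetical, and this is not a description of how LucidWorks Enterprise implements its filters internally.

```python
from urllib.parse import urlencode

# Hypothetical mapping from role name to the filter its members are limited to.
ROLE_FILTERS = {
    "finance": "department:finance",
    "hr": "department:hr",
}


def role_filter(roles):
    """Combine the filters of all known roles; refuse users with no known role."""
    clauses = [ROLE_FILTERS[role] for role in roles if role in ROLE_FILTERS]
    if not clauses:
        raise PermissionError("user has no role that grants access to documents")
    # A user with several roles may see the union of what each role allows.
    return " OR ".join("(%s)" % clause for clause in clauses)


def build_secured_query(base_url, user_query, roles):
    """Return a Solr select URL whose results are limited by the user's roles."""
    params = [("q", user_query), ("fq", role_filter(roles))]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
url = build_secured_query("http://localhost:8983/solr/select", "budget", ["finance"])
print(url)
```

Keeping the restriction in `fq` rather than appending it to `q` leaves the user's query untouched, so the filter limits which documents match without affecting how they are scored.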

JS: On which operating systems do you support running LucidWorks?

PT: LucidWorks Enterprise runs on any platform supporting Java, so you can use Linux, Solaris, Windows and Mac OS, and the indexes are portable across platforms. The product includes integrated client APIs for JavaScript, Ruby, Rails, PHP, Java, Python, Perl, Forrest/Cocoon, C#/.Net and ColdFusion.

JS: LucidWorks is free to try and free for developers. How much does a license cost to run in a production environment, and what do you license by (servers, CPUs, users, data size)?

PT: As you said, LucidWorks Enterprise is free for development and testing, and support is available via our support forums. Production use requires an active deployment subscription. We offer a number of different subscriptions that package support for 4-10 LucidWorks Enterprise servers with regular or 24x7 support and ExpertLink Advisory services for optimum design and tuning of search applications.

The package options are essentially identical to those we offer for supporting Solr-based applications, other than the cost of our product. LucidWorks Enterprise is priced on a per-server basis, with no limits on data sizes or users. Additional servers are $4,000 per server per year, so if you want to think about the cost of LucidWorks Enterprise, that is probably the simplest number to use.

JS: What can we look forward to in the next release? What new features are upcoming?

PT: Version 1.8, released early in July 2011, adds Windows security through integration with Active Directory for document and user-level security. We think that's a key security enhancement, and it was a high priority for some of our enterprise customers. Features of the next major release, version 2.0, will be discussed closer to the release date, later in 2011.

JS: LucidWorks is based on the open source project Solr. Is LucidWorks Enterprise also open source?

PT: LucidWorks Enterprise includes a number of open source components, including Apache Lucene and Solr, but it is not currently open source. Over time, we will look for ways to contribute technology and code from LucidWorks Enterprise back to the community, because we think that's important. In the meantime, we will continue to dedicate significant resources to the Apache Lucene project to ensure that Lucene and Solr remain the best technology for building the search applications of the future. That's critical.

Thank you, Mr Tait, for your time.
