Apache Big Data Preview: Q&A with Pivotal’s Roman Shaposhnik


Spun off from VMware and EMC in 2013, Pivotal Software, Inc. “represents the nexus between next-generation data-driven application architecture and approaches to transforming the enterprises into modern software companies,” says Roman Shaposhnik, Director of Open Source at Pivotal. “Work there feels like a unique opportunity, like, when I was at Sun Microsystems, I felt the creation of Java was a new way to develop software for the Internet.”

Shaposhnik will be participating in the keynote panel “ODP: Advancing Open Data for the Enterprise” at the upcoming Apache Big Data conference. As a preview to the event, we spoke with Shaposhnik about some of Pivotal’s products and the company’s support of open source.  

Companies like EMC and VMware tend to buy companies. Yet here, they spun one out. Why do you feel that was the right way to go?

For a company addressing next-generation data-driven application development, it has to be a standalone company.

Having Pivotal as an independent legal entity means we can have a very different relationship with open source software, and allows us to pursue our vision of a platform. For example, Cloud Foundry, which was developed originally at VMware, is one of the cornerstone technologies of Pivotal. VMware’s focus is on virtualization technology. EMC’s is on storage. So it works out very well.

Do Cloud Foundry and VMware’s products co-exist?

Yes. They’re very complementary.

Pivotal offers a PaaS solution that delivers a set of services. These services need to run on a server-side infrastructure, such as VMware or OpenStack, although we prefer VMware. And data needs to be stored somewhere, such as on EMC, although we can run on general-purpose storage.

What does Pivotal offer?

Pivotal Cloud Foundry, Pivotal Big Data Suite, and Pivotal Labs.

Pivotal Cloud Foundry is a comprehensive cloud-native platform on which you can develop and run the kind of user-facing, big data apps that many companies are moving to. It lets you develop and iterate fast, deploying new features, versions and apps hourly if you want to, not just once per month or quarter.

Pivotal Cloud Foundry is a product, including support, based on the Cloud Foundry open source project from the Cloud Foundry Foundation. Pivotal is one of the Foundation’s Platinum members, along with IBM, HP, and others.

Pivotal Labs helps companies move from traditional large, slow develop, test, deploy cycles to a new Agile model, to take advantage of a platform like Pivotal Cloud Foundry.

And Pivotal Big Data Suite lets companies handle and work with “big data,” in quantities and ways that legacy, siloed, expensive tools couldn’t, or at least not affordably or fast enough.

Who’s using these products and services, and why?

Companies like Uber, Netflix, and many “business-model-disruptive” startups are succeeding by creating user-facing apps that can be iterated from development to deployment quickly and frequently, can handle lots of users, and quickly extract information from large amounts of data. Large, established, traditional businesses like the Walmarts, Targets, taxi companies, and others are having to respond to this disruption by changing how they make and use IT. Like the disruptive startups, they are transforming themselves into software-defined companies.

Cloud Foundry is a tool for doing this, and Pivotal Labs helps companies change so they can take full advantage of these new tools and methods.

Can you tell us about Pivotal’s support of open source?

Almost all of what we have and do is open source. A few pieces are still proprietary — mostly where the software comes from partnerships and relationships, areas where there are patent implications, and other things we have no control over. We are trying to rectify this, and push everything we can into open source. But the core tech is all OSS.

Tell us about Pivotal’s HAWQ technology. What does HAWQ stand for?

It stands for real SQL is finally Hadoop-native (laughs). Or, you could say it stands for Hadoop with Query. That’s my opinion at least.

What is HAWQ?

Pivotal HAWQ is essentially a Hadoop-native advanced SQL analytics database.

“Hadoop-native” means that it is native to how Hadoop manages data… HAWQ is integrated with the Hadoop Distributed File System (HDFS)… and also with all other members of the Hadoop ecosystem, any way of representing data on Hadoop.

Being native means that HAWQ is well integrated with Hadoop’s resource management: YARN… YARN is how Hadoop partitions resources between different analytics frameworks, lets them co-exist on the same compute cluster, and lets them access data locked within the same cluster. HAWQ is fully YARN-aware.

Advanced SQL means we are fully SQL compliant. As a result, for example, HAWQ lets you hook up Tableau, which is one of the most popular Business Intelligence tools available today… this means we can help BI users migrate from their legacy systems to cloud-native environments.

Unlike other tools used by BI analysts, HAWQ is also useful for data scientists. HAWQ supports MADlib, so they can do things like run linear regressions on the data, do feature extraction on unstructured data, etc… with an environment that’s familiar not just to BI users but also to data scientists. This makes it easier to bring (sell) Hadoop into the enterprise, because the APIs stay familiar.

To whom is HAWQ useful and where?

HAWQ is good for anyone trying to expand a traditional massively parallel processing (MPP) database, in verticals like financial, oil and gas, big retail, and telco. It suits anybody who needs MPP power combined with the Hadoop ecosystem. And anybody already using Hadoop should be trying HAWQ on the same cluster, if analytics rather than transactions are your priority.

Tell us more about the Open Data Platform.

ODP is an exciting initiative, letting different players in the Hadoop space advance the state of Hadoop as a holistic platform.

It’s exciting because it allows vendors to solve a very important problem: make sure that Hadoop wins in the enterprise. Think back to the “Unix wars,” where the only winner was Microsoft.

ODP is a non-profit collaborative project so vendors can work together on the core platform, getting it to the level of enterprise readiness, so Hadoop can “leap the chasm” to be useful and ubiquitous in data centers, like Oracle is today. ODP invites all the vendors to grow the market, and then see who owns shares of that much bigger marketplace.

What’s your favorite open source tool?

I still play with the Plan9 operating system that came from Bell Labs. I love it because it’s small and agile enough to fit into weird hardware like my Raspberry Pi. It’s a tool for all the weird things I do in my spare time.

I use Mutt, which is one of the first OSS projects I contributed to, back at Sun.

Vi or Emacs?

Vi. But on Plan9, I’d rather use Acme.