Alluxio recently joined the Presto Foundation. We talked to Alluxio CEO Steven Mih to better understand the Presto Foundation community.
Swapnil Bhartiya: Let’s start with a bit about the Presto Foundation. What is it about? What does it do?
Steven Mih: The Presto Foundation is a project hosted under the Linux Foundation. It was created last year by companies like Facebook, Twitter, Alibaba and Uber. Alluxio is an open source project that is commonly used with Presto, the open source distributed SQL query engine, as well as other projects like Spark and TensorFlow. We support all these different frameworks. And since this was a foundation that was open to all, we decided to join it as one of the companies involved in that foundation.
Swapnil Bhartiya: If you look at the goals of the foundation, what value does Alluxio bring to it?
Steven Mih: Linux Foundation projects are all about open source and about helping grow the communities around these projects. With the Presto Foundation being hosted under the Linux Foundation, we work in an open source way to help develop the community and increase the adoption of the Presto project.
Alluxio is often used under Presto, so the value we bring is around accelerating data access for Presto. We recently developed a preview that allows users to transform data into the format that Presto is looking for. So we’re pretty excited about those things, and we’ll be talking about them at PrestoCon, which is coming up at the end of March (now cancelled due to Covid-19).
Swapnil Bhartiya: Can you also explain how people, companies, developers use Alluxio with Presto and also give examples of some of the major use cases?
Steven Mih: One of the big use cases comes from the fact that Presto is designed to query anything, anywhere. It has connectors to different data sources, which can be in remote places. That’s where Alluxio comes in: it is co-installed with the Presto workers, which makes that data available locally. The result is extremely high performance.
Customers today are often running multi-cloud or hybrid environments, and they have data in different sources. There could be data on-prem. They can’t necessarily get to the cloud yet, or vice versa. There may be S3 buckets somewhere that they need access to. Alluxio makes all of that seamless for Presto users.
Swapnil Bhartiya: Can you elaborate on that a bit?
Steven Mih: You can now have a much more local and higher-performing system because the data is cached locally to the Presto clusters. For data in remote places, that means the data infrastructure becomes a lot simpler. Without Alluxio alongside Presto, you’d have to copy that data and create different silos. Those copies need to be synchronized and maintained. Users end up with a pretty big data wrangling challenge.
We call it the PAS stack: Presto, Alluxio and S3. That stack is becoming much more common now. Users can add S3 to it, they can add HDFS in remote places, and it just operates at a much higher level, as if the data were local, with very high performance. On top of that, we’ve added even more in our developer preview: a catalog service as well as transform operations, and we are really excited about how that adds to the picture.