Steven J. Vaughan-Nichols writes on ZDNet:
Once more, at The Linux Foundation’s virtual Open Source Summit, VMware’s Chief Open Source Officer, Dirk Hohndel, and Linux’s creator, Linus Torvalds, had a wide-ranging conversation about Linux development.
The illustrious pair started with Hohndel asking about the large size of the recent Linux kernel 5.8 initial release. Hohndel wondered if it might have been so big because developers were staying home due to the coronavirus. Torvalds, who has always worked from home, said, “I suspect 5.8 might be [so large] because of people staying inside, but it might also just have happened that several different groups ended up coming in at roughly the same time with new features in 5.8.”
Harbor is an open source, cloud native registry project that stores, signs, and scans content. It was created by a team of engineers at VMware China and contributed to CNCF for wider adoption and contribution. The project recently announced its 2.0 release. Swapnil Bhartiya, the founder of TFiR.io, sat down with Michael Michael, Harbor maintainer and VMware’s Director of Product Management, to talk about Harbor, its community, and the latest release.
Here is a lightly edited transcript of the interview:
Swapnil Bhartiya: Let’s assume that you and I are stuck in an elevator and I suddenly ask you, “What is Harbor?” So please, explain what it is.
Michael Michael: Hopefully you’re not stuck in the elevator for long, but Harbor essentially is an open source, cloud native registry. Think of it as a repository where you can store and serve all of your cloud native assets: your container images, your Helm charts, and everything else you need to build cloud native applications. Then, on top of that, we put some very good policy engines that allow you to enforce compliance, make sure the images you’re serving are free from vulnerabilities, and make sure that you have all the guardrails in place so an operator can manage this registry and deliver it to their developers in a self-service way.
Swapnil Bhartiya: Harbor came out of VMware China, so I’m also curious: what was the problem the team saw at that point? There were a lot of projects doing something similar, so what did you see as unique that led to Harbor being created?
Michael Michael: Essentially, there wasn’t really a good way for an enterprise to have a hosted registry with all of the enterprise capabilities they were looking for, while at the same time having full control over the registry. A lot of the cloud providers have their own registry implementation, there’s Docker Hub out there, or you can go and purchase something at a very expensive price point. But say you’re looking for an open source solution that gives you end-to-end registry capabilities: your developers can push and pull images, and your operators can put a policy in place that says, Hey, I want to allow this development team to create a project, but not use more than a terabyte of storage. None of those solutions had that, so there was a business need here to develop a registry. And on top of that, we realized that it wasn’t just us who had this need; there were a lot of users and enterprises out there in the cloud native ecosystem with the same need.
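The per-team storage limit described above is, at its core, an admission check on each push. The following Python sketch illustrates the idea under stated assumptions: `ProjectQuota`, `can_accept`, and `record_push` are hypothetical names rather than Harbor’s API, “a terabyte” is taken here as 1024**4 bytes, and the sketch ignores that a real registry deduplicates layers shared between images.

```python
from dataclasses import dataclass

TB = 1024 ** 4  # assumption: "a terabyte" treated as 1024**4 bytes


@dataclass
class ProjectQuota:
    """Hypothetical per-project storage quota (not Harbor's actual API)."""
    name: str
    limit_bytes: int
    used_bytes: int = 0

    def can_accept(self, blob_size: int) -> bool:
        # A push is allowed only if it keeps usage within the limit.
        return self.used_bytes + blob_size <= self.limit_bytes

    def record_push(self, blob_size: int) -> None:
        # Check first, then update, so a rejected push consumes nothing.
        if not self.can_accept(blob_size):
            raise PermissionError(
                f"project {self.name!r} would exceed its quota of "
                f"{self.limit_bytes} bytes"
            )
        self.used_bytes += blob_size


team_a = ProjectQuota("team-a", limit_bytes=TB)
```

Checking before mutating keeps a rejected push from partially consuming the project’s quota.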
Swapnil Bhartiya: The project has been out for a while, and based on what you just told me, I’m curious what kind of community the project has built around itself and how the project has evolved. We will also talk about the new 2.0 release, but before that, I want to talk about the evolution of the project and the community around it.
Michael Michael: The project has evolved fairly well. Over the years we have increased our contributors, and the contribution statistics that CNCF is creating show that we’re growing our community. We now have maintainers in the project from multiple organizations, and there are actually three organizations that have more than one maintainer on the project. So it shows you that the ecosystem has picked up. We are adding more and more functionality into Harbor, and we’re also making Harbor pluggable. So there are areas of Harbor where we’re saying, Hey, here’s the default experience with Harbor, but if you want to extend the experience based on the needs of your users, go ahead and do that, and here’s an easy way to implement an interface to do it. That has really increased the popularity of Harbor. It means two things: we can give you a batteries-included version of Harbor from the community, and we also give you the option to extend it to fit the needs of your organization.
And more importantly, if you have made investments in other tooling, you can plug Harbor into that. When I say other tooling, I mean things like CI/CD systems; those systems are primarily driving the development life cycle. For example, you go from source code to a container image to something that’s stored in a registry like Harbor. The engine that drives that pipeline, that workflow, is in a lot of ways a CI/CD engine. So how do you integrate Harbor well with such systems? We’ve made that a reality now, and that has made Harbor easier to bring into an organization and get adopted alongside existing standards and existing investments.
Swapnil Bhartiya: Now let’s talk about the recently announced 2.0. Talk about some of the core features, functionalities that you are excited about in this release.
Michael Michael: Absolutely, there are three or four features that really, really excite me. A long time coming is the support for OCI. OCI is the Open Container Initiative, and essentially it’s creating a standardized way to describe what an image looks like. In Harbor 2.0 we are able to announce that we have full OCI support in Harbor. What does that mean for users? In previous releases of Harbor you could only put two types of artifacts into Harbor: a container image and a Helm chart. That satisfies a huge number of customer use cases, but it’s not enough in this new cloud native ecosystem. There are additional things that you, as a developer, as an operator, or as a Kubernetes administrator, might want to push into a repository like Harbor and have them also benefit from a lot of the policy engine that Harbor provides.
To give you a few examples: CNAB bundles, the Cloud Native Application Bundle. You could have OPA files, you could have Singularity and other OCI-compliant files. So now Harbor tells you, Hey, do you have any file type out there? If it’s OCI compliant, you can push it to Harbor and you can pull it from Harbor. And then you can add things like quotas, retention policies, immutability policies, and replication policies on top of that. The thing about that is, just by adding a few more types of supported artifacts into Harbor, those types immediately get the full benefit of Harbor in terms of our entire policy engine and the compliance that we offer to administrators of Harbor.
Swapnil Bhartiya: What does OCI compliance mean for users? By being compliant, you have to be more strict about what you can and cannot do. So can you talk about that? And how does it affect existing users: do they have to worry about something, or does it not really matter?
Michael Michael: Existing users shouldn’t have to worry about this; there’s full backward compatibility. They can still push their container images, which are OCI compliant. And if you were using a Helm chart before, you can still push it into ChartMuseum, which is a key component of Harbor, but you can now also push a Helm chart as an OCI artifact. So for existing users there’s not much difference: backward compatibility, we still support them. The users are brothers here; we’re not going to forget them. But what it means now is actually not more strict; it’s a lot more open. If you’re developing artifacts that are OCI compliant, they follow the standard way of describing an image and the standard way of actually executing an image at runtime, and Kubernetes is also OCI compliant at runtime. Then you’re getting the benefits of both worlds. You get Harbor as the repository where you can store your images, and you also get a runtime engine that’s OCI compliant that could potentially execute them. That’s a really great benefit for the users.
A couple of other features that Harbor 2.0 brings are super, super exciting. The first one is the introduction of Trivy by Aqua Security as the batteries-included, built-in scanner in Harbor. Previously, we used Clair as our built-in scanner, and with the release of Harbor 1.10 in December 2019, we introduced what we call a pluggable framework. Think of this as a way that security vendors like Aqua and Anchore can come in and create their own implementation of a security scanner to do static analysis on top of images that are deployed in Harbor.
So we still included Clair as a built-in scanner, and then we added additional extension points. We actually liked Trivy that much, and our community and our users love Trivy. Its ability to do static analysis on top of multiple operating systems and multiple application package managers is very well aligned with the vision that we have from a security standpoint in Harbor. So now we’ve added Trivy as the built-in scanner in Harbor; we ship with it. A great, great achievement, and kudos to the Aqua team for delivering Trivy as an open source project.
Swapnil Bhartiya: That’s the question I was going to ask, so I’ll ask it anyway: what does this mean for users who were using Clair?
Michael Michael: If you were using Clair before and you want to continue using Clair, by all means do; we’re going to continue updating Clair, and Clair is already included in Harbor. There are no changes in the experience. However, if you’re thinking that Trivy might be a better scanner for you, you can use them side by side and compare the scanning results from each scanner. And if Trivy is the better option for you, we’ve enabled you to make that choice. The way Harbor works is that you have a concept of multitenancy, and we isolate a lot of the settings and policies for organizing images on a per-project basis. So what does that mean? You can actually go into Harbor, define a project, and say, for this project I want Clair to be the built-in scanner.
And then Clair will scan all the files in that project. You can create a second project and say, well, now I want Trivy to be the scanner for this project, and then Trivy will scan your images. If you have the same set of images in both, you can compare them and see which scanner works best based on your needs as an organization and as a user. This is phenomenal, right? We give users choice and we give them all the data, but ultimately they have to decide which scanner is best for them based on their scenarios, the types of application images and containers that they use, and the types of libraries they use in those containers.
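Per-project scanner selection, and the side-by-side comparison it enables, can be modeled as a mapping from project to scanner. This Python sketch is hypothetical throughout: `ProjectScanners`, `compare`, and the two stand-in scanners with canned findings are illustrative only and do not reflect Harbor’s API or real Clair or Trivy output.

```python
from typing import Callable, Dict, Iterable, Set

# A scanner is modeled as a function from an image reference to the set of
# vulnerability IDs it reports. Real scanners return much richer reports.
Scanner = Callable[[str], Set[str]]


class ProjectScanners:
    """Hypothetical model: each project is bound to exactly one scanner."""

    def __init__(self, scanners: Dict[str, Scanner]) -> None:
        self.scanners = scanners
        self.assignment: Dict[str, str] = {}

    def set_scanner(self, project: str, scanner_name: str) -> None:
        if scanner_name not in self.scanners:
            raise KeyError(f"unknown scanner {scanner_name!r}")
        self.assignment[project] = scanner_name

    def scan(self, project: str, image: str) -> Set[str]:
        # Run whichever scanner this project is configured to use.
        return self.scanners[self.assignment[project]](image)


def compare(reg: ProjectScanners, project_a: str, project_b: str,
            images: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Scan the same images under two projects and split the findings."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for image in images:
        a, b = reg.scan(project_a, image), reg.scan(project_b, image)
        report[image] = {"both": a & b, "only_a": a - b, "only_b": b - a}
    return report


# Stand-in scanners with canned results, purely for illustration.
canned_a = {"app:1.0": {"VULN-1", "VULN-2"}}
canned_b = {"app:1.0": {"VULN-2", "VULN-3"}}
reg = ProjectScanners({
    "scanner-a": lambda img: canned_a.get(img, set()),
    "scanner-b": lambda img: canned_b.get(img, set()),
})
reg.set_scanner("project-a", "scanner-a")
reg.set_scanner("project-b", "scanner-b")
```

Calling `compare(reg, "project-a", "project-b", ["app:1.0"])` splits the findings into those both scanners report and those unique to each, which is the kind of data a user would weigh when choosing a scanner.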
Swapnil Bhartiya: Excellent. Before we wrap this up, what kind of roadmap do you have for Harbor? Of course, it’s an open source project, so there’s no such thing as a set date for when a release is coming out. But when we look at 2020, what are the major challenges that you want to address? What are the problems you want to solve, and what does the basic roadmap look like?
Michael Michael: Absolutely. One of the things that we’ve been trying to do as a maintainer team for Harbor is to create some themes around each release, to put a blueprint down in terms of what it is that we’re trying to achieve, and then identify the features that make sense in that theme. And we’re not coming up with this in a vacuum. We’re talking to users and to other companies; we’ve had KubeCon events in the past where we gave presentations and individuals came to us with sets of questions; and we have existing users who give us feedback. When we gathered all of that, one of the things that we came up with as the theme for our next release is what we call image distribution. We have three key features that we’re trying to tackle in that area.
The first one is: how can Harbor act as a proxy cache? This would enable organizations that are deploying Kubernetes environments at the edge, where networking is at a premium, to have a local Harbor instance proxy or mirror images from the mothership, like your main data center. Maybe some of the Kubernetes nodes are not even connected to the outside network, and they want to be able to pull images from the local Harbor, which in turn pulls the images from the upstream data center. A very, very important feature. Continuing down the path of image distribution, we’re integrating Harbor with both Dragonfly by Alibaba and Project Kraken by Uber to facilitate peer-to-peer distribution mechanisms for your container images. So how can we efficiently distribute images to the edge, to multiple data centers, and to branch offices that don’t have a good, thick network pipe between them? And how can Harbor make sure that the right images land in the right place? These are big, big features that we’re trying to work on with the community. And obviously we’re not doing this alone; we’re working with both the Kraken and the Dragonfly communities to achieve that.
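The proxy-cache behavior described above is essentially a pull-through cache keyed by content digest. Here is a minimal Python sketch, assuming an in-memory store and a caller-supplied `fetch_upstream` function standing in for the main data center; `PullThroughCache` is a hypothetical name, not Harbor’s implementation. Because OCI blobs are content-addressed, the cache can verify each fetched blob against its SHA-256 digest before storing it.

```python
import hashlib
from typing import Callable, Dict


def sha256_digest(blob: bytes) -> str:
    # OCI blobs are content-addressed with digests like "sha256:<hex>".
    return "sha256:" + hashlib.sha256(blob).hexdigest()


class PullThroughCache:
    """Hypothetical edge-side cache in front of an upstream registry."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}
        self.upstream_fetches = 0

    def pull(self, digest: str) -> bytes:
        if digest not in self._store:
            # Cache miss: fetch from the upstream data center once.
            blob = self._fetch_upstream(digest)
            self.upstream_fetches += 1
            # Verify the content matches its digest before caching it.
            if sha256_digest(blob) != digest:
                raise ValueError(f"digest mismatch for {digest}")
            self._store[digest] = blob
        # Serve locally; repeat pulls never touch the upstream link.
        return self._store[digest]
```

Only the first pull of a given digest crosses the constrained network link; every later pull on the edge is served from the local store.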
And lastly, the next feature that we have is what we call garbage collection without downtime. Garbage collection is the process where you get to reclaim some of the files and layers, basically of container images, that are no longer in use.
Think of an organization that pushes and pulls thousands of images every day; they re-tag them, they create new versions. Sometimes you end up with layers that are no longer used, and in order for those layers to be reclaimed in storage by the system, the registry needs to be locked down, as in nobody can be pulling or pushing images to it. In Harbor 2.0 we made a significant advancement: we track all the layers and the metadata of images in our database rather than depending on another tool or product to do it. This paves the road so that in the future we can do garbage collection with zero downtime, where Harbor identifies all the layers that are no longer in use and reclaims them with zero adverse impact or downtime for the users who are pushing and pulling content. Huge, huge features, and those are the things that we’re working on for the future.
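Finding reclaimable layers is a mark-and-sweep over manifest references. The Python sketch below is illustrative only: `gc_candidates` is a hypothetical function, and the grace period is an assumption added here as one possible way to avoid deleting layers from a push whose manifest has not arrived yet. It is not a description of how Harbor implements garbage collection.

```python
from typing import Dict, Iterable, Set


def gc_candidates(blobs: Dict[str, float],
                  manifests: Dict[str, Iterable[str]],
                  now: float,
                  grace_seconds: float) -> Set[str]:
    """Return layer digests that are deletable under this sketch's rules.

    blobs:     layer digest -> upload timestamp in seconds
    manifests: manifest digest -> layer digests it references
    """
    # Mark: every layer referenced by at least one manifest is live.
    referenced: Set[str] = set()
    for layers in manifests.values():
        referenced.update(layers)
    # Sweep candidates: unreferenced layers that are older than the grace
    # period, so a push still uploading its layers is not disturbed.
    return {
        digest
        for digest, uploaded_at in blobs.items()
        if digest not in referenced and now - uploaded_at >= grace_seconds
    }
```

Keeping layer and manifest metadata in a database, as the interview describes for Harbor 2.0, is what makes the “mark” step a query rather than a scan of the storage backend.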
Swapnil Bhartiya: Awesome, thank you Michael for explaining things in detail and talking about Harbor. I look forward to talking to you again. Thank you.
Michael Michael: Absolutely. Thank you so much for the opportunity.
Cloud Foundry Foundation recently announced the launch of Paketo Buildpacks for cloud native developers and operators. But what’s the difference between Cloud Foundry’s Paketo and the Buildpacks announced by CNCF? What does it mean for developers who are already using buildpacks? What kind of community is Cloud Foundry looking to build around Paketo? What does the roadmap look like?
To get answers to these questions and deep dive into Paketo Buildpacks, Swapnil Bhartiya, Founder of TFiR.io spoke with Chip Childers, Executive Director, Cloud Foundry Foundation and Kashyap Vedurmudi, Product Manager at VMware.
Here is a lightly edited transcript of the interview:
Swapnil Bhartiya: Today we have two guests, Kashyap Vedurmudi, product manager at VMware, and Chip Childers, executive director of the Cloud Foundry Foundation. Today we are going to talk about the recently announced Paketo Buildpacks. I don’t want to get into that old debate about Dockerfiles versus Buildpacks, but there are two things that I do want to talk about before we talk about Paketo specifically: compliance and security. How do Paketo Buildpacks solve these two problems?
Kashyap Vedurmudi: So, we have a couple of things. We are constantly shipping Buildpacks whenever an upstream security vulnerability comes out, a new language family version is released, things like that. So Buildpacks make it much easier, especially for enterprise users, to continuously make sure that their apps stay up to date, secure, and compliant. I think this is a huge value proposition of Buildpacks versus using Dockerfiles to build your apps and run them in production.
Chip Childers: The history of the Cloud Foundry project is that it’s been using Buildpacks nearly since its inception, originally at VMware, right, before it made the journey to Pivotal and then the CFF. So Buildpacks have demonstrated their value a few times when used with a platform that’s able to implement them effectively, right? In particular, I’m thinking about the OpenSSL Heartbleed vulnerability. I found that to be a great example: when languages and runtimes don’t statically embed too many things in their distribution, you’re able to use the Buildpack process to roll out security patches to these really important underlying libraries very quickly.
Chip Childers: As Kashyap said, the Buildpacks project team, now with Paketo Buildpacks, has always kept up to date with the critical or high vulnerabilities from all the languages and frameworks that get pulled together. We had the OpenSSL update rolled out to the whole ecosystem, and it percolated through all the platforms that had the CF Buildpacks embedded in them very quickly, in a matter of days. And it was really smooth. The only hiccup back then was that Node.js actually included the OpenSSL library in its own distribution. I think it was about a month or so after Heartbleed that they split that out, and then Buildpacks could be more effective at helping to support some of these underlying libraries.
Swapnil Bhartiya: Thanks for explaining that. If I’m not wrong, last year, CNCF also announced a Buildpack project. What is the difference between what CNCF is doing there versus what you guys are trying to do?
Kashyap Vedurmudi: That’s a great question, and probably the biggest question we’ve been getting asked with this whole launch. The CNCF Cloud Native Buildpacks project built the underlying specification and tooling needed to build a Cloud Native Buildpacks-compliant Buildpack. The Paketo Buildpacks project, on the other hand, is a set of language family implementations on top of the Cloud Native Buildpacks specification. When we launched the other day, we had Java, Node.js, PHP, .NET Core, and probably a couple of others that I’m missing, as Buildpack implementations on top of that spec.
Swapnil Bhartiya: And why do you call it Paketo Buildpacks? Is there a specific reason for this name?
Kashyap Vedurmudi: That’s a great question as well. To be completely honest with you, our whole engineering team went through about two different naming exercises just to generate different names for the Buildpacks. At a team lunch a couple of months ago, someone came up with Paketo, which translates to… sorry, it translates to “package” in Greek. What we really liked about it was that Kubernetes translates to “pilot” in Greek, and with Paketo translating to “package” in Greek, we could make the association that Paketo packages your apps as container images that any cloud native platform, such as Kubernetes, can orchestrate. So the name stuck in the end.
Swapnil Bhartiya: Talk a bit about the collaboration between Cloud Foundry and VMware for this project.
Chip Childers: I want to start by saying that the Paketo Buildpacks project is a Cloud Foundry Foundation project, right? What that means is that the same engineers and contributors who are working on the traditional Cloud Foundry Buildpacks are building the Paketo Buildpacks collection, right? So you get all their past experience as a community building, maintaining, and keeping up to date these new Cloud Native Buildpacks-compliant things. One of the goals of the project team, which I’m sure Kashyap could share a little bit more about as well, comes from the fact that traditionally the majority of the effort put into maintaining the Cloud Foundry Buildpack collection came from Pivotal.
There were certainly a lot of casual contributors, but it was something that Pivotal bore the full burden of. We think it’s incredibly important, now that the Cloud Native Buildpacks spec can be used in many different platforms, that a lot of participants rally around this, because it’s an opportunity to get really high-quality Buildpack code brought into whichever platform you’re using, whether it’s Tekton, Google Cloud Run, or the CF [inaudible 00:07:06] distribution of Cloud Foundry. There are going to be a lot of end users who should be able to amplify the feedback loop back to the project team. And we’re very open to new contributors there.
Swapnil Bhartiya: What kind of community are you planning to build around these Paketo Buildpacks and what will be the resources available for the community to build and consume these Buildpacks?
Kashyap Vedurmudi: Just to add on a little bit to what Chip said, the community is super important for us with this whole Paketo Buildpacks launch. What we’re looking for ideally is a mix of vendors helping us out, similar to what the Cloud Foundry Foundation has had in the past, as well as individual contributors. And what’s super exciting to see is that we just launched a couple of days ago and we’re already seeing a bunch of people reaching out, trying out Paketo Buildpacks, and interested in contributing. We’re seeing that some people might be interested in helping us develop a Python Paketo Buildpack, which is really cool to see. To answer the second part of your question, around a marketplace or some ecosystem, I think it would be super cool to have something like that in the future. In the short term, we have this concept of builder images, where a builder is effectively a set of Paketo Buildpacks packaged together. We ship our builders to a GCR registry that users can then use to consume our Buildpacks.
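A builder bundles Buildpacks together with an ordering. As a rough illustration of how a Cloud Native Buildpacks platform picks which Buildpacks from a builder apply to an app, here is a simplified Python sketch of detection: groups are tried in order, the first group whose required Buildpacks all detect the app wins, and failing optional Buildpacks are dropped. The Buildpack IDs and file-based detect rules are hypothetical; the real lifecycle runs each Buildpack’s `bin/detect` executable and handles more cases than this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Buildpack:
    id: str
    # Simplified detect rule that inspects the app's file names. Real
    # Buildpacks ship a bin/detect executable that the lifecycle runs.
    detect: Callable[[List[str]], bool]
    optional: bool = False


def select_group(order: List[List[Buildpack]],
                 app_files: List[str]) -> Optional[List[str]]:
    """Return the IDs of the first group that applies to the app, or None."""
    for group in order:
        passed: List[str] = []
        group_ok = True
        for bp in group:
            if bp.detect(app_files):
                passed.append(bp.id)
            elif not bp.optional:
                # A required Buildpack failed, so this whole group is rejected.
                group_ok = False
                break
        # Optional Buildpacks that failed are simply left out of the result.
        if group_ok and passed:
            return passed
    return None


# Hypothetical builder with two groups; IDs are illustrative only.
java = Buildpack("example/java", lambda files: "pom.xml" in files)
node = Buildpack("example/nodejs", lambda files: "package.json" in files)
yarn = Buildpack("example/yarn", lambda files: "yarn.lock" in files,
                 optional=True)
builder_order = [[java], [node, yarn]]
```

With this ordering, an app containing only `package.json` selects `["example/nodejs"]`, while adding `yarn.lock` also pulls in the optional Yarn Buildpack.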
Swapnil Bhartiya: Are there any specific Buildpacks that will be available, or that you’ll be focusing on to start with?
Kashyap Vedurmudi: Yeah. When we launched the other day, we officially had Java, Node.js, .NET Core, PHP, and NGINX Paketo Buildpacks available. We’re currently just getting started on a Ruby Paketo Buildpack, and we’re looking into publishing an official project-wide roadmap in the future to show what’s coming next.
Chip Childers: I think that’s another really good opportunity for people to get involved. As you said, there’s been organic interest in helping to add Python as a Buildpack. There’s a very long tail of different languages and frameworks that are used in the enterprise context. Paketo Buildpacks is going out the door with a set of Buildpacks that basically solves the majority of enterprise development use cases, right? Python is used very heavily, but a little bit less than Java, right? And so the tail starts to drop a little bit. But there’s a lot of opportunity in the languages and frameworks that the Paketo Buildpacks project team hasn’t covered on their own, and the same patterns can be followed for languages that might be less used.
As the community grows around not just the Cloud Native Buildpacks spec, right, because anyone can build a Buildpack to that spec, I think the practices of the Paketo Buildpacks project lend themselves to quality distribution of a Buildpack, right? If you search GitHub for Buildpacks, even if you’re just looking at the past version of the way Buildpacks work, you find thousands of them, right? But some of them are stale, and some of them only half work. And I think more important than exactly which Buildpacks are offered today is that the Paketo Buildpacks project is an opportunity for people to come together around the discipline of building quality Buildpacks and then maintaining them over time.
Kashyap Vedurmudi: Yeah, exactly. That’s a really good point. Over the coming weeks to months, we’re really focused on improving a lot of our documentation to help enable things like this. We have a couple of tutorials right now to help users create a Paketo-style Buildpack, and lots of tools and things like that out there. So my end goal, and I’m sure Chip agrees with this, is that I’d love to see a user come in with very little Buildpack experience and be able to build, say, a Rust Cloud Native Buildpack or something like that very simply and easily, and support it. That’s where we want to go in terms of enabling the community to build Buildpacks easily.
Swapnil Bhartiya: So what happens to the existing Buildpacks that people are already using?
Kashyap Vedurmudi: For Cloud Foundry Buildpacks, we’re going to continue providing support for CF workloads for the foreseeable future. What we did is build a compatibility layer on top of every one of our Paketo Buildpacks, which allows us to ship a Cloud Foundry-compatible Cloud Native Buildpack. That enables your CF workloads to continue to work with Paketo Buildpacks.
Chip Childers: I think this is one of the things to understand, and where it gets a little bit confusing, right? Buildpacks as a concept has a fairly long history. It started at Heroku. CF was emulating Heroku, right? It was the open source alternative to Heroku, and it implemented Buildpacks in order to have that support. And for a while, they were largely compatible, right? You could take a Heroku Buildpack and use it in a Cloud Foundry context, or you could do the reverse. So that worked for a while. But the two platforms, Cloud Foundry as an open source community and Heroku as a proprietary platform-as-a-service product, started to diverge, right? So the compatibility within the ecosystem started to break down.
When the CNCF Cloud Native Buildpacks project kicked off, to me that was actually one of the most important moments in the platform-as-a-service space in a number of years, because it represented a reconvergence of streams of work and sets of experiences with different end users that made a ton of sense for everyone. But what that means is that the CNB spec is a new way to build Buildpacks, right? So all that historical work by the CF community building that shim is important, but it’s really critical to understand that a Cloud Native Buildpacks-compliant Buildpack is different from a traditional Heroku or older-version Cloud Foundry Buildpack. They’re implemented differently; it’s a new generation of them. And that’s where a new ecosystem, because there are multiple platforms that now support their use, is really going to kick in.
Swapnil Bhartiya: Kashyap, you mentioned there’ll be a lot of resources and documentation coming up. What resources are available at this moment that people can read to learn more about the project, and how can they get involved with the project?
Kashyap Vedurmudi: Yeah. Right now we have a couple of tutorials out there on how to get started with Paketo Buildpacks, as well as how to create your own Paketo Buildpacks. In terms of helping out and getting involved, I think the best way to get started right now is to join us on Slack at slack.paketo.io, or visit our website, paketo.io, and go through the content.
Swapnil Bhartiya: Chip and Kashyap, thank you so much for taking time out of your schedules to talk to us today about this project. Good luck with the project, and thank you once again.