Harbor is an open-source cloud native registry project that stores, signs, and scans content. Harbor was created by a team of engineers at VMware China. The project was contributed to CNCF for wider adoption and contribution. Recently the project announced its 2.0 release. Swapnil Bhartiya, the founder of TFiR.io, sat down with Michael Michael, Harbor maintainer and VMware’s Director of Product Management, to talk about Harbor, community and the latest release.
Here is a lightly edited transcript of the interview:
Swapnil Bhartiya: Let’s assume that you and I are stuck in an elevator and I suddenly ask you, “What is Harbor?” So please, explain what it is.
Michael Michael: Hopefully you’re not stuck in the elevator for long; but Harbor essentially is an open source cloud-native registry. Think of this as a repository where you can store and serve all of your cloud-native assets, your container images, your Helm charts, and everything else you need to basically build cloud native applications. And then some putting posts on top of that, some very good policy engines that allow you to enforce compliance, make sure your images that you’re serving are free from vulnerabilities and making sure that you have all the guardrails in place so an operator can manage this registry and delivery it to his developers in a self-service way.
Swapnil Bhartiya: Harbor came out of VMware China. So I’m also curious that what was the problem that the team saw at that point? Because there were a lot of projects that were doing something similar, that you saw unique that Harbor was created?
Michael Michael: So essentially the need there was, there wasn’t really a good way for an enterprise to have a hosted registry that has all of the enterprise capabilities they were looking for, while at the same time being able to have full control over the registry. Like a lot of the cloud providers have their own registry implementation, there’s Docker Hub out there, or you can go and purchase something at a very expensive price point. But if you’re looking for an open source solution that gives you end to end registered capabilities, like your developers can push images and pull images, and then your operators can go and put a policy that says, Hey, I want to allow this development team to create a project, but not using more than a terabyte of storage. None of those solutions had that, so there was a need, a business need here to develop a registry. And on top of that, we realized that it wasn’t just us that had the same need, there was a lot of users and enterprises out there in the cloud native ecosystem.
Swapnil Bhartiya: The project has been out for a while and based on what you just told me, I’m curious what kind of community the product has built around itself and how the project has evolved? Because we will also talk about the new release version 2.0 but before that, I want to talk about the volitional project and the community around it.
Michael Michael: Project has evolved fairly well over the years we have increased our contributors. The contribution statistics are that CNCF is creating are showing that we’re growing our community. We now have maintainers in the project from multiple organizations and there are actually three organizations that have more than one maintainer on the project. So it’s kind of showing you that they’re, the ecosystem has picked up. We are adding more and more functionality into Harbor, and we’re also making Harbor pluggable. So there are areas of Harbor where we’re saying, Hey, here’s the default experience with Harbor, but if you want to extend the experience based on the needs of your users go ahead and do that and here’s an easy way to implement an interface and do that. That has really increased the popularity of Harbor. That means two things, we can give you a batteries-included version of Harbor from the community and then we’ll give you the option to extend that to fit the needs of your organization.
And more importantly, if you have made investments in other tooling, you can plug and play Harbor in that. When I say other tooling, I mean, things like CI/CD systems, those systems are primarily driving the development life cycle. So for example, you go from source code to container image to something that’s stored in a registry like Harbor. The engine that drives the pipeline, that workflow in a lot of ways is a CI/CD engine. So how do you integrate Harbor well with such systems? We’ve made that a reality now and that has made Harbor easier to put in an organization and get it adopted with existing standards and existing investments.
Swapnil Bhartiya: Now let’s talk about the recently announced 2.0. Talk about some of the core features, functionalities that you are excited about in this release.
Michael Michael: Absolutely, there’s like three or four features that really, really excite me. A long time coming is the support for OCI. The OCI is the Open Container Initiative and essentially it’s creating a standardized way to describe what an image looks like. And we in Harbor 2.0 we are able to announce that we have full OCI supporting Harbor. What does that mean for users? In previous releases of Harbor you could only put into Harbor two types of artifacts; a container image and a Helm chart. It satisfies a huge number of the use cases for customers, but it’s not enough in this new cloud native ecosystem, there are additional things that as a developer, as an operator, as a Kubernetes administrator, you might want to push into a repository like Harbor and have them also adopt a lot of the policy engine that Harbor provides.
Give you a few examples, single bundles, the cloud native application, a bundle. You could have OPA files, you could have singularity and other OCI compliant files. So now Harbor tells you that, Hey, you have any file type out there? If it’s OCI compliant, you can push it to Harbor, you can pull it from Harbor. And then you can add things like coders and retention policies and immutability policies and replication policies on top of that. The thing about that now, just by adding a few more types of supported artifacts into Harbor, those types immediately get to use the full benefit of Harbor in terms of our entire policy engine and the compliance that do offer to administrators of Harbor.
Swapnil Bhartiya: What does OCI compliance mean for users? Because by being compliant, you have to be more strict about what you can and cannot do. So can you talk about that? And also how does that also affect the existing users, should they have to worry about something or it doesn’t really matter?
Michael Michael: Existing users shouldn’t have to worry about this, there’s fully backward compatibility that can still push their container images, which are OCI compliant. And if you’re using a Helm Chart before, you can still push it into Charts Museum, which is a key component of Harbor, but you can now also put a Helm Chart as an OCI file. So for existing users, not much difference, backward compatibility, we still support them. The users are brothers here, we’re not going to forget them. But what it means now is actually, it’s not more strict this is a lot more open. If you’re developing artifacts that are OCI compliant and they’re following the standard way of describing an image and a standard way of actually executing an image at run time; now Kubernetes is also OCI compliant at the run time. Then you’re getting the benefits of both worlds. You get Harbor as the repository where you can store your images and you also get a run time engine that’s OCI compliant that could potentially execute them. The really great benefit here for the users.
A couple of other features that Harbor 2.0 Brings are super, super exciting. The first one is the introduction of Trivy by Aqua Security, as the batteries included built-in scanner in Harbor. Previously, we use Claire as our built-in scanner and with the release of Harbor called 1.10 that came out in December 2019, we introduced what we call a pluggable framework, think of this as a way that security vendors like Aqua and Encore can come in and create their own implementation of a security scanner to do static analysis on top of images that are deployed in Harbor.
So we still included Claire as a built-in scanner and then we added additional extension points. Now we actually liked Trivy that much our community and our users love Trivy it’s the ability to enforce and to study analysis on top of multiple operating systems on top of multiple application managers, it’s very well aligned with the vision that you have from a security standpoint in Harbor. And now we added Trivy as the built-in scanner in Harbor, we ship with it now. A great, great achievement and kudos to the Aqua team for delivering Trivy as an open source project.
Swapnil Bhartiya: That’s the question I was going to ask, but I, once again, I’ll ask the same thing again, that, what does it mean for users who were using Claire?
Michael Michael: If you’re using Claire before and you want to continue using Claire, by all means, we’re going to continue updating Claire, Claire is already included in Harbor. There’s no changes in the experience. However, if you’re thinking that Trivy is a better scanner for you, and by the way, you can use them side by side so you can compare the scanning results from each scanner. And if Trivy is a better option for you, we enabled you to make that choice. Now the way Harbor works is that you have a concept of multitenancy and we isolate a lot of the settings and the policy in the organization of images and on a per-project basis. So what does that mean? You can actually go into Harbor and you can define a project and you can say for this project I want Claire to be the built-in scanner.
And then Claire will scan all your projects in that, all the files in that project. And you can use a second project and say, well, I now want Trivy to be the scanner for this project. And then Trivy of you will scan your images. And if you have the same set of images, you can compare them and see which scanner works best based on your needs as an organization and as a user. This phenomenal, right? To give users choice and we give them all the data, but ultimately they have to make the decision on what is the best scanner for them to use based on their scenarios, the type of application images and containers that they use and the type of libraries in they use those containers.
Swapnil Bhartiya: Excellent. Before we wrap this up, what kind of roadmap you have for Harbor, of course, it’s an open source project. So there’s no such thing as when the 2.0 release is coming out. But when we look at 2020, what are the major challenges that you want to address? What are the problems you want to solve and what does the basic roadmap look like?
Michael Michael: Absolutely, I think that one of the things that we’ve been trying to do as a maintainer team for Harbor is to kind of create some themes around the release is kind of put a blueprint down in terms of what is it that we’re trying to achieve? And then identify the features that make sense in that theme. And we’re not coming up with this from a vacuum, we’re talking to users, we’re talking to other companies where we have KubeCon events in the past where we had presentations and individuals came to us asking us sets of questions. We have existing users that give us feedback. When we gather all of that, one of the things that we came up with as the next thing for our release is what you call image distribution. So we have three key features that we’re trying to tackle in that area.
The first one is how can Harbor act as a proxy cache? To enable organizations that are either deploying Kubernetes environments at the edge and they want a local Harbor instance to proxy or mirror images from the mothership like your main data center and where networking is at the premium. Maybe some of the Kubernetes nodes are not even connected to the network and they want to be a support to pull images from Harbor and then Harbor pulls the images from the upstream data center. Very, very important feature. Continuing down the path of image distribution. We’re integrating Harbor with both Dragonfly by Alibaba and Project Kraken by Uber to facilitate peer to peer distribution mechanisms for your container images. So how can we efficiently distribute images at the edge in multiple data centers in branch offices that don’t have a good network or thick network pipe between them? And how can Harbor make sure that the right images land at the right place? Big, big features that we’re trying to work with the community. And obviously we’re not doing this alone, we’re working with both Kraken and the Dragonfly communities to achieve that.
And last, the next feature that we have is what you call garbage collection without downtime. Traditionally, we do garbage collection and this is kind of the process where you get to reclaim some of the files and layers of, basically container images that are no longer in use.
Think of an organization that pushes and pulls thousands of images every day; they re-tag them, they create new versions. Sometimes you end up with layers that are no longer used, in order for those layers to be reclaimed at the storage and by the system, their registry in needs to be locked down as in nobody can be pulling or pushing images to it. In Harbor 2.0 we actually made a significant advancement where we track all the layers and the metadata of images in our database rather than depending on another tool or product to do it. So now this actually paves a road so that in the future, we could actually do garbage collection with zero downtime where Harbor can identify all the layers that are no longer in use, go reclaim them. And then that will have zero adverse impact or downtime to the users are pushing and pulling content. Huge, huge features and that’s the things that we’re working on in the future.
Swapnil Bhartiya: Awesome, thank you Michael for explaining things in detail and talking about Harbor. I look forward to talk to you again. Thank you.
Michael Michael: Absolutely. Thank you so much for the opportunity.