In part 1 of our package repositories series, we explained important terms like packages, metadata, dependencies, and upstreams. In part 2, we take it further, diving into trends within the software landscape that have changed what developers and organizations want from a package repository.
In recent years we’ve seen a push towards managed services in the cloud, automation, and supply chain security. These practices and challenges have influenced what package repositories and package management mean in 2021 and what they will mean for the future of software delivery.
Cloud-Native
The movement towards the cloud is one of the most significant changes in computing for organizations over the last ten years. At a minimum, cloud infrastructure and development have created new package formats such as Docker, Terraform, and Helm Charts.
Much more than that, developers and organizations don’t just want package management software; they want a ‘managed’ package management service. A managed package management service eliminates the cost of supporting an in-house system while providing reliable access to packages that scales as the organization grows.
Organizations and developers don’t want to worry about infrastructure, patching, upgrades, replication, or scaling. They want their package repository service to be highly available, and to be managed and accessed securely in the cloud. For a package management solution to exploit the flexibility, scalability, and resilience of cloud computing, it needs to be architected to be cloud-native.
Automation
Continuous integration and continuous delivery (CI/CD) is a method to frequently deliver builds by introducing automation into the stages of software development. The whole purpose of this is to release quality code faster.
CI achieves continuous flow for code. CD achieves continuous flow for delivery. But what glues them together? Continuous packaging (CP) is the term to describe maximizing process and flow in software packaging using automation. Without CP, CI/CD is missing continuous flow for the process of packaging (creating, fetching, inspecting, and managing packages). CP means that assets are always traceable, deployable, and built in the same way.
The process of packaging includes creating packages, assembling external packages, inspecting/managing artifacts, token creation, downloading, installing artifacts, event logging, and metadata extraction. For CP to work, developers, CI/CD systems, and scanning tools need to be able to interact with the process of packaging easily and programmatically using well-documented APIs, CLIs, and integrations.
Adding CP to your software process avoids the ad hoc construction or retrieval of assets, and gives a traceable and visible history of promotion from the source (developers and external) right through to delivery (whether internal or external).
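As a minimal sketch of what a traceable promotion history could look like, the Python below records a checksum when a package is created and appends an event each time it is promoted. The `PackageRecord` class, the stage names, and the package name are all hypothetical and only for illustration; they are not the API of any particular repository.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PackageRecord:
    """Hypothetical record a repository might keep for one package version."""
    name: str
    version: str
    sha256: str
    history: list = field(default_factory=list)  # ordered promotion events

    def promote(self, stage: str, actor: str) -> None:
        # Append-only log: every stage change is kept, never overwritten.
        self.history.append({
            "stage": stage,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def create_record(name: str, version: str, payload: bytes) -> PackageRecord:
    # The checksum ties the record to the exact bytes that were built.
    digest = hashlib.sha256(payload).hexdigest()
    return PackageRecord(name=name, version=version, sha256=digest)


record = create_record("example-lib", "1.2.0", b"fake package contents")
record.promote("build", actor="ci")
record.promote("staging", actor="ci")
record.promote("production", actor="release-manager")
print([event["stage"] for event in record.history])
```

Because the history is only ever appended to, anyone inspecting the record can see which stages a package passed through, in what order, and who moved it.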
Distributed Teams
Distributed teams have always been quite common in software development, but Covid has supercharged their adoption, even in small companies.
How does this affect package repositories? Before joining Cloudsmith, I worked in a few distributed teams where I experienced serious lag when pushing and pulling packages compared to my colleagues in other regions. A typical cause was having a limited number of licenses for our private repository: it might be deployed on servers in the US, but not in Europe. It was frustrating, affected collaboration, and slowed down testing and building.
It’s not acceptable for some teams to experience low latency while other geographically distributed teams have to put up with significant delays. Package repository tools in the past dealt with this by implementing global replications on servers, but this becomes difficult to manage and troubleshoot as the number of regions increases. Package repositories that are cloud-native deal with this problem more elegantly as they can use techniques such as PDNs with edge caching to store commonly used packages as close to the users as possible - anywhere in the world.
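To make the edge caching idea concrete, here is a rough sketch in Python. The `EdgeCache` class, the region names, and the latency figures are made up for illustration, and a dictionary stands in for the central store; a real service would route requests and expire caches in far more sophisticated ways.

```python
class EdgeCache:
    """Hypothetical edge node: serves cached packages, falls back to origin."""

    def __init__(self, region, origin):
        self.region = region
        self.origin = origin      # dict standing in for the central store
        self.cache = {}
        self.origin_fetches = 0

    def get(self, package):
        if package not in self.cache:
            # Cache miss: one slow trip to the origin, then keep a local copy.
            self.cache[package] = self.origin[package]
            self.origin_fetches += 1
        return self.cache[package]


def nearest_edge(edges, latency_ms):
    # Route the user to the edge with the lowest measured latency.
    return min(edges, key=lambda edge: latency_ms[edge.region])


origin = {"example-lib-1.2.0.tar.gz": b"package bytes"}
edges = [EdgeCache("us-east", origin), EdgeCache("eu-west", origin)]
dublin_latency = {"us-east": 95, "eu-west": 12}  # made-up numbers

edge = nearest_edge(edges, dublin_latency)
edge.get("example-lib-1.2.0.tar.gz")
edge.get("example-lib-1.2.0.tar.gz")  # second request is served from the edge
print(edge.region, edge.origin_fetches)
```

Only the first request for a package pays the cost of reaching the origin; later requests from the same region are served locally.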
Emphasis on Supply Chain Security
The software supply chain (SSC) is all of the steps that go into deploying or distributing your software, from initial development through testing, packaging, and deployment. It includes your code, scripts, environment variables, IDEs, plugins, source code repositories, CI/CD tools, scanning tools, and of course package repositories. The attack surface for the software supply chain is vast. Recent attacks such as SolarWinds and Codecov prompted efforts to improve the security of software supply chains. Where you push and pull your software artifacts is intrinsic to securing the supply chain, and this has highlighted the importance of package repositories.
Robust Security
First things first: package repositories need strong security features to prove they are trustworthy:
- Robust access control with 2FA for distribution and development
- Event logs
- High availability
- Encryption of all communication and storage, both in transit and at rest
A Single Source of Truth
Private repositories that support many formats provide one single place to track, manage, distribute, and understand all software pulled into your stack. A central trusted store forces you to apply processes and controls to the ingress and egress of software packages.
Provenance of Packages
Package repositories can secure your packages and interrogate their provenance. They should:
- Extract, store, and surface package metadata, which includes information on dependencies, licenses, versions, who wrote the code, results from vulnerability scans, and information from CI tools. All of this data is intrinsic to resolving the provenance of software packages.
- Attest (prove to outside parties) to the provenance of all software assets and their dependencies by signing and verifying every package uploaded.
- Provide event logs on package usage.
- Provide upstreams for outside packages hosted elsewhere, to protect against outages in third-party repositories.
- Provide all of the packages needed in a Software Bill of Materials (SBOM).
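The sign-and-verify step in the list above can be illustrated with a small Python sketch. Note the assumption: real repositories typically use asymmetric signatures (GPG, for example), whereas this sketch uses an HMAC with a shared demo key purely so that it runs with the standard library alone. The key and function names are hypothetical.

```python
import hashlib
import hmac

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical shared secret


def sign_package(payload: bytes) -> str:
    # Sign the SHA-256 digest of the package rather than the raw bytes.
    digest = hashlib.sha256(payload).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).hexdigest()


def verify_package(payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign_package(payload), signature)


original = b"package contents as uploaded"
signature = sign_package(original)

print(verify_package(original, signature))
print(verify_package(b"tampered contents", signature))
```

The point of the sketch is the workflow rather than the algorithm: a signature is produced at upload time, and any later change to the package bytes causes verification to fail.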
Automation
Package repositories should promote automation by applying Continuous Packaging (CP) techniques to integrate programmatically with CI, CD, and scanning tools. Automating as much of the software supply chain as possible, and making that automation easy, can significantly reduce the possibility of human error, improve quality and traceability, and help make builds more reproducible.
Your package repository can help you build trust in your software supply chain by giving you automated visibility and control over every single package in your software, acting as the single source of truth for all your software artifacts. Even in situations where the supply chain has been compromised, if you have visibility and control, you’ll be in a much better place to identify the who, how, where, why, and what of the compromise, and you’ll have a much greater chance of fixing the issue or minimizing its impact.
Languages with Community-Based Package Management
Before the adoption of community-based package managers, public language repositories, e.g., PEAR for PHP, were slow to include new packages and subject to a review board populated by a few of the language's elder statesmen. Languages with community-based package management, e.g., npm for JavaScript and PyPI for Python, make publishing and consuming packages easy. This ease of use has made them popular and accelerated both the development process and the use of OSS, but it has also introduced some security issues.
Popular package repositories such as npm, PyPI, RubyGems, Go, and others have been hit by malicious attacks such as dependency confusion and typosquatting. In addition, the public repositories that host these packages can’t guarantee uptime; private repositories with upstreams can protect against outages. These issues are related to the previous section on securing the supply chain.
Working with Node and npm was the first time I had used a community-hosted OSS package repository. When vetting new npm packages, I was always worried about adding an unmaintained package or code that could damage the wider project. Is it enough to check the git link, license type, date last updated, number of downloads, and listed dependencies? Not really. There needs to be a way to trust that a package and its dependencies are not malicious in an automated, reproducible way.
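As one small illustration of an automated check against typosquatting, the Python sketch below flags a package name that closely resembles, but does not exactly match, a name on a trusted list. The trusted list and the 0.8 similarity cutoff are assumptions chosen for the example; real tooling would need much richer signals than name similarity alone.

```python
import difflib

# Hypothetical allow-list of packages the team already trusts.
TRUSTED = ["requests", "numpy", "django", "flask"]


def possible_typosquat(name: str, cutoff: float = 0.8):
    """Return the trusted name this one resembles, or None."""
    if name in TRUSTED:
        return None  # exact match: this is the real package
    matches = difflib.get_close_matches(name, TRUSTED, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["requests", "reqeusts", "leftpad"]:
    print(candidate, "->", possible_typosquat(candidate))
```

A check like this could run before a new dependency is admitted into a private repository, prompting a human review whenever a near-miss name turns up.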
Signing can be used to build up trust in packages, but as we discussed in Part 1 of this blog series, signing OSS packages has its problems. Work is being done to sign OSS packages in a transparent way, but currently, community-hosted OSS packages are not commonly signed. In the absence of a trusted, signed OSS package, package repositories can scan OSS packages for known vulnerabilities and extract metadata such as the version, who wrote the code, scan results, or license information, all of which can provide insight into the provenance of the software package.
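To give a feel for metadata extraction, here is a minimal Python sketch that reads the header-style metadata used by Python packages, which can be parsed with the standard library's email parser. The sample metadata and the `extract_provenance` helper are hypothetical, and other package formats store their metadata differently.

```python
from email.parser import Parser

# Hypothetical metadata for an imaginary package, in Python's header-style format.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-lib
Version: 1.2.0
Author: Jane Doe
License: MIT
Requires-Dist: requests>=2.0
Requires-Dist: click
"""


def extract_provenance(text: str) -> dict:
    # Each metadata field is a header; repeated headers list the dependencies.
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "author": msg["Author"],
        "license": msg["License"],
        "dependencies": msg.get_all("Requires-Dist", []),
    }


info = extract_provenance(SAMPLE_METADATA)
print(info["name"], info["version"], info["license"])
print(info["dependencies"])
```

Once the fields are in a structured form like this, a repository can index them, surface them in search, or feed them into license and vulnerability checks.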
Design Patterns
Design patterns such as REST encouraged developing a strong interface for other programs to use over HTTP. RESTful services made using other web services easier and more reliable. Each web service could potentially be written in a different language, as long as the interface was maintained.
More recently, the microservices design pattern gave more teams or individual software developers the confidence to use new languages to develop new services within the same product. One of the possible downsides of the microservices design pattern is that it can produce many packages in different formats. Having many package formats is only a downside if your package repository doesn’t support your chosen package type and you need to manage another repository.
Modern package repositories need to be able to manage and host multiple package formats.
What do I want from a Package Repository?
Package repositories have had to evolve as software development has changed under the influence of cloud adoption, DevOps, OSS, changing software practices, new security threats, and the rise of new package formats.
So, what do I want from a package repository tool? I want it to:
- Store all formats of my packages for languages, OSs, and containers.
- Allow me to distribute packages to customers
- Be easy to automate and integrate with CI/CD and security tools
- Provide strong and intuitive security access controls
- Help me attest to the packages in the software supply chain
- Have no loss of speed no matter where my team is
- Oh, and be simple to use with great docs and support.
To do this I need a package management solution that:
- Is entirely cloud-native
- Is universal: it can host any package, anywhere in the world
- Can work with dependencies located in other repositories and help make what goes into your software more transparent
- Applies Continuous Packaging techniques to improve your CI/CD pipeline
- Is a central, trusted store that forces you to apply processes and controls to the ingress and egress of software packages
- Is built by a company that values support.