It’s a fairly common perception that a package repository is basically a file share or file storage, and for some of the simplest implementations, this is perhaps a reasonable analogy.
For Cloudsmith, however, this analogy misses many important details that make Cloudsmith package repositories rather unique.
Synchronization - what is it good for?
If you have used Cloudsmith, you may have noticed that when you publish/upload a package to Cloudsmith, or fetch a package from a public package repository (like Maven Central or PyPI) into Cloudsmith, the first thing that happens is that the package enters a “Synchronizing” state. What exactly is synchronizing? Put simply, synchronization is where Cloudsmith processes the uploaded or ingested package. But that still glosses over a lot of detail. What processing could a package possibly need? Well, quite a lot, actually!
Some of the steps involved in synchronization (package processing) are:
- Initialization - Setting up the initial state, tools and environment for synchronization.
- Retrieving - Getting the package files from the upload storage location.
- Assembling - Extracting the package files and assembling the package (for example, the layers and configs of a Docker image).
- Malware Scanning - Once we have the complete set of package files, we scan them for trojans and other malicious content.
- Parsing - Verifying and generating package checksums and signatures, as well as parsing and verifying package metadata and licenses.
- Final Synchronization - Synchronizing the package to local and distributed storage.
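To make the ordering and failure behavior of these steps more concrete, here is a minimal Python sketch of a staged pipeline in which the first failing stage halts processing. It is an illustration under stated assumptions, not Cloudsmith's implementation: the `Package` record, the stage functions and the “completed”/“failed” state names are hypothetical, and the malware stage is a toy that only looks for the EICAR antivirus test marker. Only the malware scanning and parsing steps are modeled. The duplicate check is also an assumption: synchronization can fail because a package already exists, but at which step that is detected isn't stated, so the sketch simply runs it last.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


# Hypothetical package record; the fields are illustrative, not Cloudsmith's data model.
@dataclass
class Package:
    name: str
    version: str
    content: bytes
    metadata: dict = field(default_factory=dict)
    state: str = "synchronizing"
    failure_reason: str = ""


class SyncError(Exception):
    """Raised by a stage to halt synchronization and mark the package as failed."""


# Toy stand-in for a real malware engine: it only looks for the EICAR test marker.
EICAR_MARKER = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"


def scan_for_malware(pkg: Package) -> None:
    if EICAR_MARKER in pkg.content:
        raise SyncError("malware detected")


def parse_metadata(pkg: Package) -> None:
    # Generate a checksum for the package contents and verify required metadata.
    pkg.metadata["sha256"] = hashlib.sha256(pkg.content).hexdigest()
    if not pkg.name or not pkg.version:
        raise SyncError("invalid or missing metadata")


def make_duplicate_check(existing: Set[Tuple[str, str]]) -> Callable[[Package], None]:
    # Hypothetical check: reject a name/version pair already in the repository.
    def check(pkg: Package) -> None:
        if (pkg.name, pkg.version) in existing:
            raise SyncError("package already exists")
    return check


def synchronize(pkg: Package, stages: List[Callable[[Package], None]]) -> Package:
    # Run each stage in order; the first SyncError stops processing.
    try:
        for stage in stages:
            stage(pkg)
    except SyncError as exc:
        pkg.state = "failed"
        pkg.failure_reason = str(exc)
    else:
        pkg.state = "completed"
    return pkg


existing = {("requests", "2.31.0")}
stages = [scan_for_malware, parse_metadata, make_duplicate_check(existing)]

ok = synchronize(Package("mylib", "1.0.0", b"print('hello')"), stages)
dup = synchronize(Package("requests", "2.31.0", b"..."), stages)
print(ok.state, dup.state, dup.failure_reason)
# completed failed package already exists
```

Stopping at the first failure mirrors the idea that a package which fails any check shouldn't be treated as ready for use; a real system would also record which stage failed.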
These processes are automatic, require no user interaction and run asynchronously on Cloudsmith's global infrastructure. They are a large part of what empowers Cloudsmith users to implement effective package controls. As they say, knowledge is power, and it’s through synchronization that we gain knowledge about packages.
How does this help me?
Once a package has been synchronized, we can use the metadata generated to apply Vulnerability, License and Package Deny policies, create a scoped access token, add tags to the package, or fire a webhook for specific packages/versions. The data generated during synchronization drives many of the subsequent actions and workflows you can perform.
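As a rough illustration of how synchronization metadata can feed policy decisions, the Python sketch below checks a package's metadata against three rules loosely modeled on the policy types named above. The `PackageMetadata` and `Policy` shapes, the CVSS threshold and the deny-list format are all hypothetical; this is not Cloudsmith's policy engine or API, just a sketch of the general idea that policies can only be evaluated once the metadata exists.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


# Hypothetical metadata that synchronization might produce; not Cloudsmith's schema.
@dataclass
class PackageMetadata:
    name: str
    version: str
    license: str
    max_cvss: float  # highest CVSS score among known vulnerabilities (assumed field)


# Hypothetical policy settings, loosely mirroring vulnerability, license and deny policies.
@dataclass
class Policy:
    allowed_licenses: Set[str]
    max_cvss: float
    # (name, version) pairs to deny; a version of None denies every version.
    denied: Set[Tuple[str, Optional[str]]]


def evaluate(meta: PackageMetadata, policy: Policy) -> List[str]:
    """Return the names of the rules this package violates (an empty list means allowed)."""
    violations = []
    if (meta.name, None) in policy.denied or (meta.name, meta.version) in policy.denied:
        violations.append("package deny policy")
    if meta.license not in policy.allowed_licenses:
        violations.append("license policy")
    if meta.max_cvss > policy.max_cvss:
        violations.append("vulnerability policy")
    return violations


policy = Policy(allowed_licenses={"MIT", "Apache-2.0"}, max_cvss=7.0, denied={("leftpad", None)})
print(evaluate(PackageMetadata("mylib", "1.0.0", "MIT", 3.1), policy))
# []
print(evaluate(PackageMetadata("leftpad", "0.0.3", "WTFPL", 9.8), policy))
# ['package deny policy', 'license policy', 'vulnerability policy']
```

Returning every violation rather than stopping at the first makes it easier to report all the reasons a package was blocked.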
You may also encounter occasions where a package fails synchronization.
This is typically a good thing (contrary to first impressions!) because it can alert you to a problem with the package itself: invalid, missing or incorrect metadata (for example, a package that does not meet the specification for its package type), the presence of malware in the package, or an attempt to upload/publish a package that already exists in the repository. Package synchronization is an essential step in verifying the “correctness” of a package, and it’s always better to catch problems earlier in your processes than later, because the cost of remediation rises dramatically the later issues are identified.
In summary:
Cloudsmith package repositories do far more than just store your packages, and they offer far more functionality than an AWS S3 bucket, Azure Blob Storage, or a simplistic instance of a package repository. Packages in a Cloudsmith repository are much more than just “bits on disk”, and treating them as such really does them a disservice!