
What happens when you upload a Package?

Oct 23 2024 / Cloudsmith / 2 min read
by Dan McKinney
Packages in a Cloudsmith repository are so much more than just “bits on disk”, and treating them as such really does them a disservice!

It’s a fairly common perception that a package repository is basically a file share or file storage, and for some of the simplest implementations, this is perhaps a reasonable analogy.

However, when thinking of Cloudsmith, this analogy misses a lot of important details that make Cloudsmith package repositories rather unique. 

Synchronization - what is it good for? 

If you have used Cloudsmith, you may have noticed that when you publish/upload a package to Cloudsmith, or fetch a package into Cloudsmith from a public package repository (like Maven Central or PyPI), the first thing that happens is that the package enters a “Synchronizing” state. What exactly is synchronizing? Put simply, synchronization is where Cloudsmith processes the uploaded/ingested package. But that still glosses over a lot of the detail. What possible processing could a package need? Well, quite a lot actually!

Some of the steps that synchronization/package processing involves are: 

  • Initialization - Setting up the initial state, tools and environment for synchronization.
  • Retrieving - Getting the package files from the upload storage location.
  • Assembling - Extracting the package files and assembling the package (for example, the layers and configs of a Docker image).
  • Malware Scanning - Once we have the complete set of package files, we scan them for trojans, malicious content, etc.
  • Parsing - Verifying and generating package checksums and signatures, as well as parsing and verifying package metadata and licenses.
  • Final Synchronization - Synchronizing the package to local and distributed storage.
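The stages above run in order, and a failure at any one of them stops the package from becoming available. The sketch below illustrates that shape under stated assumptions; it is not Cloudsmith's implementation. The `sync_package` function, the `SyncFailed` exception, the status labels other than “Synchronizing”, and the byte-signature "scanner" are all hypothetical stand-ins, and only the malware-scanning and parsing stages are modeled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class SyncFailed(Exception):
    """Hypothetical: raised when a stage rejects the package; the message is the reason."""


@dataclass
class Package:
    name: str
    version: str
    content: bytes
    expected_sha256: Optional[str] = None  # checksum supplied by the uploader, if any
    metadata: dict = field(default_factory=dict)
    status: str = "Awaiting Synchronization"  # hypothetical initial label


# Placeholder for a real malware scanner: a list of known-bad byte signatures.
KNOWN_BAD_SIGNATURES = [b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"]


def scan_for_malware(pkg: Package) -> None:
    for signature in KNOWN_BAD_SIGNATURES:
        if signature in pkg.content:
            raise SyncFailed("malware signature detected")


def parse_and_verify(pkg: Package) -> None:
    # Generate a checksum and compare it with the one the uploader supplied, if any.
    digest = hashlib.sha256(pkg.content).hexdigest()
    if pkg.expected_sha256 is not None and digest != pkg.expected_sha256:
        raise SyncFailed("checksum mismatch")
    if not pkg.name or not pkg.version:
        raise SyncFailed("missing required metadata")
    pkg.metadata["sha256"] = digest


# Retrieval, assembly and storage sync are omitted; they would slot into this list.
STAGES: List[Callable[[Package], None]] = [scan_for_malware, parse_and_verify]


def sync_package(pkg: Package) -> Package:
    pkg.status = "Synchronizing"
    try:
        for stage in STAGES:
            stage(pkg)
    except SyncFailed as exc:
        # The first failing stage stops the pipeline and records why.
        pkg.status = "Sync Failed"
        pkg.metadata["failure_reason"] = str(exc)
        return pkg
    pkg.status = "Completed"
    return pkg
```

For example, `sync_package(Package("example-lib", "1.0.0", b"hello"))` ends in the `Completed` state with a SHA-256 recorded in its metadata, while passing a wrong `expected_sha256` ends in `Sync Failed` with `"checksum mismatch"` as the reason. Having each stage raise on failure keeps the stages independent and small.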

These processes are automatic, require no user interaction, and run asynchronously on Cloudsmith's global infrastructure. They are a large part of what empowers Cloudsmith users to implement effective package controls. As they say, knowledge is power, and it’s through synchronization that we gain knowledge about packages.

How does this help me? 

Once a package has been synchronized, we can use the generated metadata to apply Vulnerability, License and Package Deny policies, create a scoped access token, add tags to the package, or fire a webhook for specific packages/versions. The data generated during synchronization drives many of the subsequent actions and workflows that you can perform.
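Because these policies act on the metadata produced during synchronization rather than on raw bytes, a deny policy can be thought of as a predicate over that metadata. The following is a minimal sketch under that assumption; the `DenyPolicy` type, the `evaluate` function and the metadata keys `license` and `max_cvss` are hypothetical, and Cloudsmith's actual policy configuration works differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass
class DenyPolicy:
    name: str
    matches: Callable[[Metadata], bool]  # True means this policy denies the package


def evaluate(policies: List[DenyPolicy], metadata: Metadata) -> List[str]:
    """Return the names of every policy that denies a package with this metadata."""
    return [policy.name for policy in policies if policy.matches(metadata)]


# Hypothetical policies keyed on hypothetical metadata fields.
POLICIES = [
    DenyPolicy("no-agpl", lambda m: m.get("license") == "AGPL-3.0-only"),
    DenyPolicy("no-critical-vulns", lambda m: float(m.get("max_cvss", 0.0)) >= 9.0),
]

if __name__ == "__main__":
    meta = {"name": "example-lib", "version": "1.0.0", "license": "MIT", "max_cvss": 9.8}
    print(evaluate(POLICIES, meta))  # ['no-critical-vulns']
```

Returning every matching policy, rather than stopping at the first, makes it easy to report all the reasons a package was denied in one go.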

Also, you may occasionally find that a package fails synchronization.

This is typically a good thing (contrary to first impressions!) because it can alert you to a problem with the package itself: invalid, missing or incorrect metadata (for example, the package not meeting the specification for its package type), the presence of malware in the package, or an attempt to upload/publish a package that already exists in the repository. Package synchronization is an essential step in verifying the “correctness” of a package, and it’s always better to catch problems earlier in your processes than later, as the cost of remediation rises dramatically the later issues are identified.
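The metadata and duplicate-package failures described above can be pictured as checks that run before a package is accepted. This is a minimal sketch assuming a hypothetical in-memory `Repository` keyed on `(name, version)`; it only illustrates the idea and does not reflect how Cloudsmith stores packages or detects duplicates.

```python
from typing import Dict, List, Tuple


class Repository:
    """Hypothetical in-memory repository keyed on (name, version)."""

    def __init__(self) -> None:
        self._packages: Dict[Tuple[str, str], dict] = {}

    def validate(self, metadata: dict) -> List[str]:
        """Return every reason this package would fail (an empty list means it passes)."""
        errors: List[str] = []
        name, version = metadata.get("name"), metadata.get("version")
        if not name:
            errors.append("missing package name")
        if not version:
            errors.append("missing package version")
        if name and version and (name, version) in self._packages:
            errors.append(f"{name} {version} already exists in this repository")
        return errors

    def publish(self, metadata: dict) -> List[str]:
        # Only store the package if every check passed.
        errors = self.validate(metadata)
        if not errors:
            self._packages[(metadata["name"], metadata["version"])] = metadata
        return errors


if __name__ == "__main__":
    repo = Repository()
    print(repo.publish({"name": "example-lib", "version": "1.0.0"}))  # []
    print(repo.publish({"name": "example-lib", "version": "1.0.0"}))
    # ['example-lib 1.0.0 already exists in this repository']
```

Collecting all errors instead of raising on the first one means a single failed upload tells you everything that needs fixing.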

In summary: 

Cloudsmith package repositories do far more than just store your packages: they offer much more functionality than keeping your packages in an AWS S3 bucket or Azure Blob Storage, or spinning up a simplistic package repository instance. Packages in a Cloudsmith repository are so much more than just “bits on disk”, and treating them as such really does them a disservice!
