PkgToSoftwareBOM.jl

Produces a Software Bill of Materials (SBOM) describing your Julia Pkg environment. SBOM is in the SPDX format
Author SamuraiAku
Popularity
16 Stars
Updated Last
6 Days Ago
Started In
March 2023

PkgToSoftwareBOM.jl

GitHub Actions codecov PkgToSoftwareBOM Downloads

This package produces a Software Bill of Materials (SBOM) describing your Julia environment. At this time, the SBOM produced is in the (SPDX) format. Contributions to support other SBOM formats are welcome.

I created PkgToSoftwareBOM.jl to help the Julia ecosystem get prepared for the emerging future of software supply chain security. If we want to see Julia adoption to continue to grow, then we need to be able to easily create SBOMs to supply to the organizations using Julia packages.

PkgToSoftwareBOM interfaces with the standard library Pkg to fill in the SBOM data fields. Information filled out today includes:

  • A complete package dependency list including
    • versions in use
    • where the package can be downloaded from
    • SPDX verification code
    • determines the declared license and scans all source files for additional licenses present
  • A complete artifact list
    • artifact version resolved to the target platform
      • target platform may be changed by an advanced user
    • where the artifact can be downloaded from
    • download checksum
    • determines the declared license and scans all source files for additional licenses present

Future versions may be able to fill in additional fields including copyright text.

PkgToSoftwareBOM defaults to using the General registry but can use other registries and even mutiple registries as the source(s) of package information.

What is an SBOM?

An SBOM is a formal, machine-readable inventory of software components and dependencies, information about those components, and their hierarchical relationships. These inventories should be comprehensive – or should explicitly state where they could not be. SBOMs may include open source or proprietary software and can be widely available or access-restricted.

For a further information about SBOMs, their importance and how they can be used please see the Software Bill of Materials website maintained by the National Telecommuications and Information Administration

Why do you care about SBOMs?

SBOMs are an important component of developing software security practices. US Presidential Executive Order EO 14028 established SBOMs as one method by which the federal government will establish the provenence of software in use. Commercial organizations are also using SBOMs for the same reason.

What does an SBOM look like?

The file PkgToSoftwareBOM.spdx.json at the root of this package is a Developer SBOM of this package.

See examples of User Environment SBOMs in the folder examples

Installation

Type ] add PkgToSoftwareBOM and then hit ⏎ Return at the REPL. You should see

pkg> add PkgToSoftwareBOM

How to I use PkgToSoftwareBOM.jl ?

To use this package, just type

using PkgToSoftwareBOM

PkgToSofwareBOM automatically exports the package SPDX which defines the SBOM datatypes and functions for reading and writing. Please see the SPDX documentation for full documentation.

There are two use cases envisioned:

  • Users: Create an SBOM of your current environment. Submit this file to your organization
  • Developers: Create an SBOM to be included with your package source code. This becomes your official declaration of what your package dependencies, copyright, license, and download location.

!!! Important note about stability and license scanning !!!

PkgToSoftwareBOM uses LicenseCheck.jl to scan package and artifact directories for license file information. LicenseCheck has been known to occasionally crash when run on Apple Silicon, see Issue #11. I have observed it happening every time when run within VSCode with the julia-vscode extension. There are some early indications this issue may be resolved in Julia 1.11 when it is released, but it is not certain yet.

If you wish to disable license scanning for stability reasons, use the keyword licenseScan when creating a spdxCreationData object (see examples below)

spdxCreationData(licenseScan= false)

User Environment SBOM

To create an SBOM of your entire environment type:

sbom= generateSPDX()

If you wish to not include PkgToSoftwareBOM and SPDX (or some other package) in your SBOM:

sbom= generateSPDX(spdxCreationData(rootpackages= filter(p-> !(p.first in ["PkgToSoftwareBOM", "SPDX"]), Pkg.project().dependencies)));

To write the SBOM to file:

writespdx(sbom, "myEnvironmentSBOM.spdx.json")

Developer SBOM

A developer SBOM will contain information that PkgToSoftwareBOM cannot determine on its own, such as the package's license and the developer's name. To add this information to the SBOM requires calling generateSPDX() with non-default parameters.

The first thing a developer must determine is the name of their SBOM file. The reason is that PkgToSoftwareBOM computes a checksum of your source code and saves it in the SBOM file. Since the SBOM is included in the package's source tree PkgToSoftwareBOM must know what the file's name is so that it (and others who wish to re-compute the checksum themselves) can skip the file during the calculation. A suggested format for the SBOM filename is MyPackageName.spdx.json

By default, PkgToSoftwareBOM will exclude the .git folder in your package development directory from the checksum calcuclation. PkgToSoftwareBOM does not process the directions of `.gitignore' files nor does it ignore untracked files. It is recommended that you make sure to commit all your code and restore the repo to a pristine state before running PkgToSoftwareBOM

% git clean -fdx
% git status
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Now we need to make sure that Pkg is aware that you have bumped the package version. Pkg does not detect the change to version in Project.toml automatically. While you are developing you generally don't care about this, but it is necessary to get the correct version information into the SBOM

pkg> update myPackage
    Updating registry at `~/.julia/registries/General.toml`
    Updating `~/JuliaWork/myDevArea/Project.toml`
  [6254a0f9] ~ myPackage v0.1.0 `~/.julia/dev/myPackage`  v0.1.1 `~/.julia/dev/myPackage`
    Updating `~/JuliaWork/myDevArea/Manifest.toml`
  [6254a0f9] ~ myPackage v0.1.0 `~/.julia/dev/myPackage`  v0.1.1 `~/.julia/dev/myPackage`

Once you have the filename chosen and your repository cleaned up, activate an environment that includes your under development package

julia> cd("path/to/dev_area")
(@v1.8) pkg> activate .

To create your SBOM, start by creating an spdxPackageInstructions object which contains SBOM data specific to the package

# Indicate who you wish to credit as the Originator of the package. For Julia developers, this is generally
# whoever controls the repository the released code is downloaded from. The originator may be a person or an organization
myName= SpdxCreatorV2("Person", "John Doe", "email@loopback.com")  # email may be an empty string if desired
myOrg= SpdxCreatorV2("Organization", "Open-Source Org", "email2@loopback.com")

# For the complete list of available license codes for an SBOM, see the official SPDX License List
#       https://spdx.org/licenses/
# Common Licenses are "MIT", "BSD-3-Clause", "GPL-2.0-or-later"
myLicense= SpdxLicenseExpressionV2("MIT")


myPackage_instr= spdxPackageInstructions(
       spdxfile_toexclude= ["MyPackageName.spdx.json"],
       originator= myName,  # Could be myOrg if appropriate
       declaredLicense= myLicense,
       copyright= "Copyright (c) 2022 John Doe <email@loopback.com> and contributors",
       name= "MyPackageName")

The next step is to create an spdxCreationData object which contains data for the top-level SBOM structure

using UUIDs
using Pkg
# Indicate who you wish to credit as creator of this SBOM, whether it is a single person
# or an organization or both. You may credit multiple people and organizations as necessary.
# Including emails in the creator declaration is optional
# Since PkgToSoftwareBOM is filling in most of the document, you can credit the tool as one of the creators as well
myName= SpdxCreatorV2("Person", "John Doe", "email@loopback.com")  # email may be an empty string if desired
myOrg= SpdxCreatorV2("Organization", "Open-Source Org", "email2@loopback.com")
myTool= SpdxCreatorV2("Tool", "PkgToSoftwareBOM.jl", "")

devRoot= filter(p-> p.first == "MyPackageName", Pkg.project().dependencies) # A developer SBOM has a single package at its root

# SPDX namespace provides a unique URI identifier for the SBOM. Best practice, which PkgToSoftwareBOM supports, is to
# provide a URL to this SBOM in the package repository or to a project homepage.
# PkgToSoftwareBOM will append a unique UUID so that the namespace is truly unique.
myNamespace= "https://github.com/myUserName/myPackage.jl/myPackage.spdx.json"

active_pkgs= Pkg.project().dependencies;
SPDX_docCreation= spdxCreationData(
       Name= "MyPackageName.jl Developer SBOM",
       Creators= [myName, myOrg, myTool],
       CreatorComment= "Optional field for general comments about the creation of the SPDX document",
       DocumentComment= "Optional field for to provide comments to the consumers of the SPDX document",
       NamespaceURL= myNamespace,
       rootpackages= devRoot,
       packageInstructions= Dict{UUID, spdxPackageInstructions}(active_pkgs[myPackage_instr.name] => myPackage_instr)  # Your package instructions created above go here
       );

Now you can create the SBOM and write it to your development directory.

sbom= generateSPDX(SPDX_docCreation)
writespdx(sbom, "path/to/package/source/MyPackageName.spdx.json")

One case that PkgToSoftwareBOM does not support properly today is when a previous version of the developer's package does not exist in the registry. In that case, the SBOM will list the path to the local copy of the package code, instead of the URL of the repository. This may be fixed in a later version.

Optional Modes

PkgToSoftwareBOM has keywords that can be invoked with spdxCreationData(). These keywords modify the contents of the SBOM in ways that are useful in particular situations

Use a package server as the DownloadLocation

The package developer's GitHub (or other) repository is the canonical source for the package code. By default, this repository is used to populate the field DownloadLocation in each package description.

But in everyday use, very few people actually download from there. Instead Pkg defaults to using the package server maintained by JuliaLang (https://pkg.julialang.org) or another package server specified by ENV["JULIA_PKG_SERVER"]. A package server maintains compressed tarballs of released source code for packages tracked by a registry. So you can argue that the SBOM should reflect that in the name of accuracy.

Also not every analyst would find it useful to be directed to the repo and then be expected to figure out how to use git to extract the correct version. A straight download location could be easier for them.

The user can change the DownloadLocation to the package server through the use of the keyword use_packageserver when creating a spdxCreationData object (see example below)

spdxCreationData(use_packageserver= true)

When this keyword is used, PkgToSoftwareBOM will determine if each package has a valid package server URL and use it if available. If the JuliaLang package server is used, then the package Supplier field will be updated to reflect that.

Organization:  JuliaLang  ()

If a valid package server URL cannot be determined, then the repository link will be used.

In all cases, the repository URL is documented in the HomePage field of the package description.

Find the source code repository for artifacts used in JLLs

In the general case, it is impossible to find the source code of an artifact solely from the Julia package it is used in.

However probably the majority of artifacts in use today are wrapped inside Julia Linked Library (JLL) packages generated by BinaryBuilder.jl. This tool builds the artifacts using Yggdrasil the Julia community build tree. BinaryBuilder then wraps the artifact inside an autogenerated Julia Linked Library (JLL) package. That registered package contains known sentence patterns and hyperlinks in its README back to the branch in Yggdrasil that generated the artifacts. PkgToSoftwareBOM can extract this information from the README and create an entry in the SBOM showing that the artifact was GENERATED_FROM Yggdrasil.

The user can optionally invoke this capability through the use of the keyword find_artifactsource when creating a spdxCreationData object (see example below)

spdxCreationData(find_artifactsource= true)

If PkgToSoftwareBOM cannot determine the source of an artifact, an entry will not be created.

How does PkgToSoftwareBOM support mulitple registries?

The majority of users and developers only ever use the General registry and that is what PkgToSoftwareBOM defaults to to find package information.

If you would like to use a different registry or search multiple registries, you just call generateSPDX with two arguments.

For example to create a User Environment SBOM using the General registry and another registry called "PrivateRegistry", type:

sbom= generateSPDX(spdxCreationData(), ["PrivateRegistry", "General"]);

The second argument is a list of all the registries you would like to use. If you have a package that exists in both registries (for example, you've cloned the respository to your local network and you want to list that as the download location), PkgToSoftwareBOM will use the information from the first registry in the list that has valid information and ignore all subsequent registries

How does PkgToSoftwareBOM determine what the license of the package or artifact is?

PkgToSoftwareBOM scans the entire julia package or artifact for license information. If the scanning locates a file containing a recognized software license, the license is recorded in the LicenseInfoFromFiles property of the SBOM package description but does not record which file(s) the license was found in. The license scan follows these rules (LicenseCheck.jl, version 0.2.2)

  • All plaintext files less than 450 KB are scanned

During that search PkgToSoftwareBOM looks for an overall package license in the following locations:

  • For Julia packages, in the package root directory
  • For artifacts, in the root directory and in the directory share/licenses

If files with a valid license are found in the expected location, PkgToSoftwareBOM declares the file where the license takes up the greatest percentage of the total file to be the package license, as you would expect a package license to contain only the license text and nothing else.

How does PkgToSoftwareBOM target hardware platforms other than the one it is running on?

Advanced users may wish to create an SBOM in which the artifacts are targeted to a different platform than the one that PkgToSoftwareBOM is running on. For example, create an SBOM for an x86 linux installation from an M1 Macbook.

To do this, the user must first create a platform object describing the target platform. For example, to create a platform object for the hardware you are currently running on:

using Base.BinaryPlatforms
myplatform= HostPlatform()

Creating a platform object for other hardware is left as an exercise for the advanced user.

To pass the platform object to PkgToSoftwareBOM, use the keyword TargetPlatform when creating an spdxCreationData object

SPDX_docCreation= spdxCreationData(TargetPlatform= myplatform)

Used By Packages

No packages found.