bdio

2.1 SPECIFICATION

Abstract

Black Duck I/O defines a vocabulary and mechanism by which information about software can be transferred between solutions inside and outside of the Black Duck ecosystem.

Introduction

TODO

Actors

The actors in a BDIO system are the “producers”, “publishers”, “consumers” and “processors”. Producers and consumers are concerned with the syntactical structure (e.g. “is the document well formatted according to the specification?”); while publishers and processors are concerned with the semantics of the data itself (e.g. “are all the file nodes connected to a project?”).

Model

Classes

Annotation
https://blackducksoftware.github.io/bdio#Annotation
A descriptor for a BDIO entity.
Component
https://blackducksoftware.github.io/bdio#Component
A component may also be known as a “dependency” or “artifact”. Essentially it is a single BOM entry. A component is the link between two projects (only one of which may be present in the current BDIO context). The link may not be fully defined: only partial information about linkage may be known given the evidence at hand. In addition to establishing a link between two projects, a component can contain additional metadata pertaining to the details of the link: for example, the specific licensing terms used or how a project is using another project (e.g. whether the linked project is used for building, only at runtime, or for testing).
A component is also a useful stand-in for a project when it is known the other project exists, but only limited details are available in the current context. For example, it may be useful to create a component for every GAV encountered during processing; those components may be used for linking vulnerabilities even if the full project for that GAV does not exist in the current context.
Container
https://blackducksoftware.github.io/bdio#Container
A container represents a stand-alone software package, including any system software needed for execution.
Dependency
https://blackducksoftware.github.io/bdio#Dependency
A dependency can be added to a project or a component to indicate that it depends on another component.
File
https://blackducksoftware.github.io/bdio#File
A file is used to represent the metadata pertaining to an entry in a (possibly virtual) file system. Files can be used to represent any type of file system entry, including regular files, symlinks and directories. The inclusion of directories is optional: you do not need to include a full directory structure, and a directory for which no metadata is captured does not need to be included. All sizes should be represented in bytes (not blocks).
FileCollection
https://blackducksoftware.github.io/bdio#FileCollection
A file collection is used to describe an arbitrary group of files that cannot be better described using another more appropriate construct (like a project).
License
https://blackducksoftware.github.io/bdio#License
A license represents the specific terms under which the use of a particular project (or component) is governed. A project may be linked to multiple licenses with complex relationships between them. A component may similarly be linked to multiple licenses, however none of the relationships may be disjunctive: this ensures that the component unambiguously references the selected license terms. Components which do not reference licenses are assumed to accept the default (and unambiguous) licensing terms of the version of the project they reference.
Note
https://blackducksoftware.github.io/bdio#Note
A note represents the outcome of a specific calculation on part of a file. Notes can be simple (such as inclusion of a content range), or more complex (such as the output of a processing algorithm).
Project
https://blackducksoftware.github.io/bdio#Project
A project represents a software package, typically in source form. For example, a BDIO project should be used to describe each Maven POM file or each Protex project. Projects convey different metadata from “components” (the latter of which is just a BOM entry for a project); for example, a project may declare multiple license terms to choose from whereas a component must specify exactly which terms were selected; a project may have many versions, but a component references exactly one. It is always true that a project and component can coexist for the same entity: for example, there can be only one “log4j” project while there can be many components describing the usage of “log4j” for other projects.
Repository
https://blackducksoftware.github.io/bdio#Repository
A repository is a collection of software metadata and possibly binary artifacts. Generally speaking a repository is a collection of projects, however it may be useful to enumerate contents using component objects.
Vulnerability
https://blackducksoftware.github.io/bdio#Vulnerability
A vulnerability represents a specific weakness in a project. It is often convenient to reference vulnerabilities from specific project versions or the components linked to those versions. Vulnerabilities may be found through simple lookups based on well-known project metadata (e.g. “this version of this project is known to have this vulnerability”); however they may also be discovered through means such as static analysis of source or object code.

Object Properties

affected
https://blackducksoftware.github.io/bdio#hasAffected
Indicates a component or project is affected by a particular vulnerability.
Domain: Vulnerability
Range: Component, Project
base
https://blackducksoftware.github.io/bdio#hasBase
Points to a project’s base directory.
Domain: Container, FileCollection, Project, Repository
Range: File
canonical
https://blackducksoftware.github.io/bdio#hasCanonical
Used to indicate two objects represent the same thing and directs you to the preferred representation.
Domain: Component, License, Vulnerability
Range: <any>
declaredBy
https://blackducksoftware.github.io/bdio#declaredBy
Indicates a component was declared by a specific file.
Domain: Dependency
Range: File
dependency
https://blackducksoftware.github.io/bdio#hasDependency
The list of dependencies.
Domain: Component, Container, FileCollection, Project, Repository
Range: Dependency
dependsOn
https://blackducksoftware.github.io/bdio#dependsOn
Indicates the dependent component.
Domain: Dependency
Range: Component
description
https://blackducksoftware.github.io/bdio#hasDescription
Allows association of arbitrary comments and descriptions.
Domain: Component, Container, Dependency, File, FileCollection, License, LicenseGroup, Project, Repository, Vulnerability
Range: Annotation
evidence
https://blackducksoftware.github.io/bdio#hasEvidence
Indicates a component was discovered using evidence from a specific file.
Domain: Dependency
Range: File
license
https://blackducksoftware.github.io/bdio#hasLicense
The license being used. This can be used with other license relationships to create complex license expressions.
For root objects, the license defines the terms under which the project may be licensed; for a component, the license defines the terms under which usage of the component is licensed.
Domain: Component, Container, Dependency, LicenseGroup, Project
Range: License, LicenseGroup
licenseConjunctive
https://blackducksoftware.github.io/bdio#hasLicenseConjunctive
A simultaneously required license being used. This can be used with other license relationships to create complex license expressions.
Domain: Component, Container, Dependency, LicenseGroup, Project
Range: License, LicenseGroup
licenseDisjunctive
https://blackducksoftware.github.io/bdio#hasLicenseDisjunctive
A choice of licenses being used. This can be used with other license relationships to create complex license expressions.
Domain: Component, Container, Dependency, LicenseGroup, Project
Range: License, LicenseGroup
licenseException
https://blackducksoftware.github.io/bdio#hasLicenseException
Identifies an exception to the terms of the license.
Domain: License
Range: License
licenseOrLater
https://blackducksoftware.github.io/bdio#hasLicenseOrLater
The minimal license being used. This can be used with other license relationships to create complex license expressions.
Domain: Component, Container, Dependency, LicenseGroup, Project
Range: License
note
https://blackducksoftware.github.io/bdio#hasNote
Lists the notes applicable to a file.
Domain: File
Range: Note
parent
https://blackducksoftware.github.io/bdio#hasParent
Points to a file’s parent. Typically this relationship is implicit; producers do not need to supply it.
Domain: File
Range: File
previousVersion
https://blackducksoftware.github.io/bdio#hasPreviousVersion
Links a project version to its previous version.
Domain: Project
Range: Project
subproject
https://blackducksoftware.github.io/bdio#hasSubproject
Establishes that a project has a subproject or module relationship to another project.
Domain: Project
Range: Project

Data Properties

archiveContext
https://blackducksoftware.github.io/bdio#hasArchiveContext
The archive context of the file or directory.
Domain: File
Range: Default
buildDetails
https://blackducksoftware.github.io/bdio#hasBuildDetails
The URL used to obtain additional details about the build environment.
Domain: @graph
Range: Default
buildNumber
https://blackducksoftware.github.io/bdio#hasBuildNumber
The build number captured from the build environment.
Domain: @graph
Range: Default
buildOptions
https://blackducksoftware.github.io/bdio#hasBuildOptions
The argument vector of the process that produced a file.
Domain: File
Range: Default
byteCount
https://blackducksoftware.github.io/bdio#hasByteCount
The size (in bytes) of a file.
Domain: File
Range: Long
captureInterval
https://blackducksoftware.github.io/bdio#hasCaptureInterval
The time interval (start and end instant) over which the published data was captured.
Note that due to its nature the capture interval may not be known when the named graph metadata is recorded; publishers may choose to include an additional final entry consisting entirely of metadata for this purpose.
Domain: @graph
Range: Default
captureOptions
https://blackducksoftware.github.io/bdio#hasCaptureOptions
The argument vector of the publisher process used to capture the data.
Domain: @graph
Range: Default
comment
https://blackducksoftware.github.io/bdio#hasComment
A comment used to annotate a BDIO entity.
Domain: Annotation
Range: Default
contentType
https://blackducksoftware.github.io/bdio#hasContentType
The content type of a file.
Domain: File
Range: ContentType
context
https://blackducksoftware.github.io/bdio#hasContext
The namespace specific base context used to resolve a locator. Typically this is just a URL, however any specification understood by the namespace specific resolver is acceptable.
Domain: Component, License, Project, Repository, Vulnerability
Range: Default
creationDateTime
https://blackducksoftware.github.io/bdio#hasCreationDateTime
The date and time at which an entity was created.
Domain: @graph, Annotation, File, Vulnerability
Range: DateTime
creator
https://blackducksoftware.github.io/bdio#hasCreator
The user and/or host who created the BDIO document. The host portion must be prefixed with an “@” sign.
Domain: @graph, Annotation
Range: Default
deepDirectoryCount
https://blackducksoftware.github.io/bdio#hasDeepDirectoryCount
The number of directories that are descendants of the given node.
Domain: File
Range: Long
deepFileCount
https://blackducksoftware.github.io/bdio#hasDeepFileCount
The number of descendant files for the given node.
Domain: File
Range: Long
distanceFromInnerRoot
https://blackducksoftware.github.io/bdio#hasDistanceFromInnerRoot
The distance from inner root to the given node.
Domain: File
Range: Long
distanceFromRoot
https://blackducksoftware.github.io/bdio#hasDistanceFromRoot
The distance from root to the given node.
Domain: File
Range: Long
encoding
https://blackducksoftware.github.io/bdio#hasEncoding
The character encoding of a file. It is required that producers store the encoding independent of the content type’s parameters.
Domain: File
Range: Default
fileSystemType
https://blackducksoftware.github.io/bdio#hasFileSystemType
The file system type of a file. Represented as a content-type-like string indicating the type of file.
Domain: File
Range: Default
fingerprint
https://blackducksoftware.github.io/bdio#hasFingerprint
The fingerprints of a file.
Domain: File
Range: Digest
homepage
https://blackducksoftware.github.io/bdio#hasHomepage
The homepage associated with the entity.
Domain: Component, License, Project, Vulnerability
Range: Default
identifier
https://blackducksoftware.github.io/bdio#hasIdentifier
The namespace specific locator for a component. Also known as an “external identifier”.
Domain: Component, License, Project, Vulnerability
Range: Default
lastModifiedDateTime
https://blackducksoftware.github.io/bdio#hasLastModifiedDateTime
The date and time a file was last modified.
Domain: File, Vulnerability
Range: DateTime
linkPath
https://blackducksoftware.github.io/bdio#hasLinkPath
The symbolic link target of a file.
Domain: File
Range: Default
name
https://blackducksoftware.github.io/bdio#hasName
The display name of the entity.
Domain: @graph, Component, License, Project, Vulnerability, Repository
Range: Default
namespace
https://blackducksoftware.github.io/bdio#hasNamespace
The namespace a component exists in. Also known as a “forge” or “system type”, this defines how many different fields should be interpreted (e.g. identifiers, versions and scopes are defined within a particular namespace).
Note that namespace values are not part of the BDIO specification. There are BDIO recommendations, however it is ultimately up to the producer and consumer of the BDIO data to handshake on the appropriate rules.
Domain: Component, Container, Dependency, License, Project, Repository, Vulnerability
Range: Default
nodeName
https://blackducksoftware.github.io/bdio#hasNodeName
The name of the node.
Domain: File
Range: Default
parentId
https://blackducksoftware.github.io/bdio#hasParentId
The parent id of the given node.
Domain: File
Range: Long
path
https://blackducksoftware.github.io/bdio#hasPath
The hierarchical path of a file relative to the base directory.
Domain: File
Range: Default
platform
https://blackducksoftware.github.io/bdio#hasPlatform
The platform (e.g. operating system) the data was captured for. This is generally lower level information than can be found in the resolver, e.g. while the resolver might contain tool specific specifiers, the platform would be used to describe the operating system running the tool.
Domain: Component, Container, File, Project, Repository, Vulnerability
Range: Products
project
https://blackducksoftware.github.io/bdio#hasProject
Name of the project this BDIO document is associated with.
Domain: @graph
Range: Default
projectGroup
https://blackducksoftware.github.io/bdio#hasProjectGroup
Name of the project group this BDIO document is associated with.
Domain: @graph
Range: Default
projectVersion
https://blackducksoftware.github.io/bdio#hasProjectVersion
Name of the project version this BDIO document is associated with.
Domain: @graph
Range: Default
publisher
https://blackducksoftware.github.io/bdio#hasPublisher
The tool which published the BDIO document.
Domain: @graph
Range: Products
range
https://blackducksoftware.github.io/bdio#hasRange
The ranges of file content a note applies to. Multiple ranges can be specified, however the units must be distinct (e.g. “bytes” and “chars”).
Domain: Dependency, Note
Range: ContentRange
requestedVersion
https://blackducksoftware.github.io/bdio#hasRequestedVersion
The namespace specific version range that resulted in a component being included.
Domain: Dependency
Range: Default
resolver
https://blackducksoftware.github.io/bdio#hasResolver
The tool which resolved the namespace specific locator.
Domain: Component, License, Project, Repository, Vulnerability
Range: Products
rights
https://blackducksoftware.github.io/bdio#hasRights
The statement of rights for a specific file. Generally this will be a copyright statement like “Copyright (C) 2016 Black Duck Software Inc.”.
Domain: Note
Range: Default
scope
https://blackducksoftware.github.io/bdio#hasScope
The namespace specific scope of a dependency as determined by the resolution tool used to define the dependency. For example, if a dependency came from an npm package’s “devDependencies” field, then the scope should be “devDependencies”.
Domain: Dependency
Range: Default
shallowDirectoryCount
https://blackducksoftware.github.io/bdio#hasShallowDirectoryCount
The number of directories that are direct children of the given node.
Domain: File
Range: Long
sourceBranch
https://blackducksoftware.github.io/bdio#hasSourceBranch
The SCM branch name from the build environment.
Domain: @graph
Range: Default
sourceRepository
https://blackducksoftware.github.io/bdio#hasSourceRepository
The URI representing the SCM location from the build environment.
Domain: @graph
Range: Default
sourceRevision
https://blackducksoftware.github.io/bdio#hasSourceRevision
The SCM revision identifier from the build environment.
Domain: @graph
Range: Default
sourceTag
https://blackducksoftware.github.io/bdio#hasSourceTag
The SCM tag name from the build environment.
Domain: @graph
Range: Default
uri
https://blackducksoftware.github.io/bdio#hasUri
The URI of the file or directory.
Domain: File
Range: Default
vendor
https://blackducksoftware.github.io/bdio#hasVendor
The name of the vendor who provides a project or component.
Domain: Component, Project
Range: Default
version
https://blackducksoftware.github.io/bdio#hasVersion
The display version of the entity. Must reference a single version.
Domain: Component, Project
Range: Default

Datatypes

ContentRange
https://blackducksoftware.github.io/bdio#ContentRange
An HTTP Content Range string.
ContentType
https://blackducksoftware.github.io/bdio#ContentType
An HTTP Content Type string.
DateTime
http://www.w3.org/2001/XMLSchema#dateTime
ISO date/time string.
Default
""
Unrestricted string value.
Digest
https://blackducksoftware.github.io/bdio#Digest
A string that encapsulates an algorithm name and an unrestricted digest value.
Long
http://www.w3.org/2001/XMLSchema#long
Natural number.
Products
https://blackducksoftware.github.io/bdio#Products
An HTTP User Agent string.

Semantic Rules

When processing the BDIO model, there are several defined behaviors that should uniformly apply to ensure interoperability. Often it is impractical for publishers to produce a fully normal BDIO model; therefore several relationships are expected to be “implicit”, and processors MUST handle BDIO data the same regardless of their presence. The published BDIO data in conjunction with any implicit relationships constitutes a connected graph.

The Root

The “root” of the BDIO data is a top-level object (one of: “Project”, “Container”, “Repository” or “FileCollection”). For a project, the root project cannot be claimed as a sub-project or previous version by any other project. Publishers MUST NOT produce BDIO data with multiple roots; processors MAY elect an arbitrary object as the root or fail outright when the root could be ambiguous.
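
The root selection rule can be sketched as follows. This is a minimal illustration, assuming the graph has already been parsed into a list of dictionaries keyed by term names ("@id", "@type", "subproject", "previousVersion"), with references stored as lists of identifier strings. That in-memory layout and the function name find_root are assumptions made for this example, not part of the specification.

```python
ROOT_TYPES = {"Project", "Container", "Repository", "FileCollection"}


def find_root(nodes):
    """Return the single root node, or raise ValueError if it is missing or ambiguous."""
    # Projects claimed as a subproject or previous version cannot be the root.
    claimed = set()
    for node in nodes:
        if node.get("@type") == "Project":
            claimed.update(node.get("subproject", []))
            claimed.update(node.get("previousVersion", []))

    candidates = [
        node for node in nodes
        if node.get("@type") in ROOT_TYPES and node["@id"] not in claimed
    ]
    if len(candidates) != 1:
        # A processor may instead elect an arbitrary root; this sketch fails outright.
        raise ValueError(f"expected exactly one root, found {len(candidates)}")
    return candidates[0]
```

Failing on ambiguity is only one of the two behaviors the text permits; a processor that prefers to elect a root could return the first candidate instead.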

Implicit Relationships

Missing File Parents

The relationship between a file node and its parent need not be explicitly defined provided that the resolved absolute hierarchical path can be used to unambiguously identify the parent file.
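
As a rough sketch of how a processor might derive the implicit parent, the function below drops the last segment of a hierarchical path. It only handles plain URIs such as "file" URIs; archive entry paths (which carry the entry name in the fragment) are deliberately left out. The function name implicit_parent_path is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def implicit_parent_path(path):
    """Return the parent URI of a hierarchical path, or None at the top of the hierarchy."""
    parts = urlsplit(path)
    if parts.fragment:
        raise ValueError("archive entry paths are not handled by this sketch")
    # Drop the last path segment; an empty path or "/" has no parent.
    head, _, tail = parts.path.rstrip("/").rpartition("/")
    if not tail:
        return None
    return urlunsplit((parts.scheme, parts.netloc, head or "/", "", ""))
```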

Missing Project Dependencies

The relationship between a component and the root object need not be explicitly defined: any component that is not associated with a dependency is assumed to be a dependency of the root object. Processors MUST NOT assume any default values for the implicit dependency node necessary to describe the connection between the component and the root object.
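
One way to find the components that fall under this rule is sketched below. It assumes, for illustration only, that every Dependency node is available as an entry in a flat list of dictionaries and that "dependsOn" holds a list of component identifiers; the function name is hypothetical.

```python
def implicit_root_components(nodes):
    """Return the identifiers of components that no dependency node points at."""
    referenced = set()
    for node in nodes:
        if node.get("@type") == "Dependency":
            referenced.update(node.get("dependsOn", []))
    return [
        node["@id"] for node in nodes
        if node.get("@type") == "Component" and node["@id"] not in referenced
    ]
```

The result only identifies which components are implicitly attached to the root; in line with the rule above, no properties are invented for the implied dependency node.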

Namespaced Properties

Interpretation

BDIO properties which are subject to namespacing must be interpreted using rules specific to the namespace itself. It is solely the responsibility of the producer and consumer to negotiate namespace tokens and the corresponding property interpretation.

Inheritance

When a BDIO node does not explicitly define a namespace it is inferred by following relationships back to a root object (providing the root object supports the “namespace” property); the first encountered explicit namespace definition becomes the effective namespace of the node.
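
The inheritance rule can be illustrated with a short walk towards the root. The sketch assumes two hypothetical lookup tables prepared by the processor: nodes_by_id (identifier to node dictionary) and referrer_of (identifier to the identifier of the node that references it). How a processor records those back-references is not defined here.

```python
def effective_namespace(node_id, nodes_by_id, referrer_of):
    """Walk from a node towards the root and return the first explicit namespace found."""
    seen = set()
    current = node_id
    while current is not None and current not in seen:
        seen.add(current)  # Guards against cycles in malformed data.
        namespace = nodes_by_id[current].get("namespace")
        if namespace is not None:
            return namespace
        current = referrer_of.get(current)
    return None
```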

File Data Properties

When describing files or resources in BDIO, publishers SHALL adhere to the guidelines in this section: this ensures data size is minimized and interpretation by processors can be consistent. BDIO files can be used to describe files on a file system (real or virtual), entries within an archive, resources from the web or any other entity which can be described as a hierarchical structure of named pieces of data.

File Paths

BDIO files must be represented as a tree-like hierarchy starting with the base file identified by the root object. The file’s “path” property is used to identify the location within the hierarchy. The path MUST be a valid absolute URI. Generally this will be an RFC 8089 “file” URI representing an absolute path to a file on the file system, however it may be any URI. Special considerations are necessary for the URI representing an archive entry: archive entry paths must be encoded using the pattern:

<scheme> ":" <archive-URI> "#" <entry-name>

It is possible to encode archive entry paths at arbitrary “nesting levels” (i.e. archives within archives) by recursively applying this pattern; however, when constructing archive entry path URIs it is important that the URI of the archive itself be encoded (including any “/” characters). As archive nesting levels increase, so will the encodings (e.g. a “/” will initially be encoded as “%2F”, at the next nesting level it will appear as “%252F”). All URI encodings MUST be performed on NFC normalized UTF-8 encoded byte sequences. Trailing slashes and files named “.” or “..” MUST be omitted. The scheme should be used to identify the archive format. Archive entry names SHOULD start with a “/”, however there are cases where the entry name can be used without the leading slash (refer to Appendix C for additional details).
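
The nesting behavior is easier to see with a small example. The sketch below builds archive entry paths with Python's urllib.parse.quote; the function name archive_entry_path and the sample file names are made up for illustration, and the choice to percent-encode the entry name while keeping its "/" separators is an assumption rather than something stated above.

```python
import unicodedata
from urllib.parse import quote


def archive_entry_path(scheme, archive_uri, entry_name):
    """Build <scheme> ":" <archive-URI> "#" <entry-name> with the archive URI fully encoded."""
    # safe="" forces "/" (and any "%" from an earlier nesting level) to be percent-encoded.
    encoded_archive = quote(unicodedata.normalize("NFC", archive_uri), safe="")
    # Assumption: entry names keep their "/" separators; other reserved characters are encoded.
    encoded_entry = quote(unicodedata.normalize("NFC", entry_name), safe="/")
    return f"{scheme}:{encoded_archive}#{encoded_entry}"


# A JAR stored inside a ZIP file, then an entry inside that JAR.
outer = archive_entry_path("zip", "file:///tmp/app.zip", "/lib/inner.jar")
nested = archive_entry_path("jar", outer, "/META-INF/MANIFEST.MF")
```

Here outer is "zip:file%3A%2F%2F%2Ftmp%2Fapp.zip#/lib/inner.jar", and in nested each "%2F" from the first level appears as "%252F" because the "%" itself is encoded again.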

File Types

Metadata regarding how the content of a file should be (or was) interpreted is split over several data properties in BDIO. The file system type defines the role a file plays in the structure of the file hierarchy; it should be obtained either directly from the file system or from archive metadata used to preserve the original file system structure. The content type describes how the contents of the file should be interpreted; often this information is inferred from the file name (e.g. extension matching), however producers may know the type as a consequence of processing the file.

Publishers MUST NOT include a content type obtained through simple file name matching as this work can be done by a processor later; if content type is represented in metadata using a different format (e.g. using extended file attributes), or if the content type was determined based on some specific processing performed on the actual contents of the file, it is appropriate to include the content type, even if the result would be the same as computed through standard name mapping.

Text based content SHOULD include the encoding (this information MUST NOT be included in the content type); it is possible that the encoding can be determined without a content type. Symbolic links can be recorded using the link path: the value uses the same format described for file paths.

Processors MAY imply the file system type according to the following rules; publishers MUST NOT generate conflicting data and consumers SHOULD reject data containing conflicts:

  1. If another file references the file as a parent…
    • A byte count or content type implies a file system type of directory/archive
    • Otherwise the implied file system type is directory
  2. A link path implies a file system type of symlink
  3. An encoding implies a file system type of regular/text
  4. Processors MAY apply implementation specific content type or file path heuristics to determine the file system type, otherwise the implied file system type is regular
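
The rules above can be applied in order, as in the following sketch. It assumes a file node is a dictionary keyed by the data property names from this specification and that the caller already knows whether another file references the node as a parent. The optional heuristics of rule 4 are omitted, and returning an explicit fileSystemType unchanged is a choice made for this example.

```python
def implied_file_system_type(file_node, referenced_as_parent):
    """Apply the implication rules in order and return a file system type string."""
    explicit = file_node.get("fileSystemType")
    if explicit is not None:
        return explicit
    # Rule 1: the file is a parent of another file.
    if referenced_as_parent:
        if "byteCount" in file_node or "contentType" in file_node:
            return "directory/archive"
        return "directory"
    # Rule 2: a link path implies a symbolic link.
    if "linkPath" in file_node:
        return "symlink"
    # Rule 3: an encoding implies text content.
    if "encoding" in file_node:
        return "regular/text"
    # Rule 4, without heuristics: fall back to a regular file.
    return "regular"
```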

License Properties

The terms under which a project is licensed and the terms under which a component is used are described using the simple license relationships license and licenseOrLater or the complex license relationships licenseDisjunctive, licenseConjunctive and licenseException. Additionally, an intermediate LicenseGroup node may be embedded to avoid ambiguity when using several licenses.

When using simple license relationships, they MUST specify a single license; multiple complex license relationships may be used, however if more than two are in use, they MUST specify the same relationship. Publishers MUST NOT mix simple and complex license relationships. If multiple complex relationships are used, exceptions are considered first and conjunction takes precedence over disjunction.
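
A publisher could check two of these constraints before emitting a node, as sketched below. The property names are taken from the Object Properties section; the dictionary layout (each relationship mapped to a list of license identifiers) and the reading of “a single license” as one value in total are assumptions of this example.

```python
SIMPLE = ("license", "licenseOrLater")
COMPLEX = ("licenseConjunctive", "licenseDisjunctive", "licenseException")


def check_license_relationships(node):
    """Return a list of problems found; an empty list means none of the checks failed."""
    problems = []
    simple = {name: node[name] for name in SIMPLE if name in node}
    complex_ = {name: node[name] for name in COMPLEX if name in node}
    if simple and complex_:
        problems.append("simple and complex license relationships are mixed")
    if sum(len(values) for values in simple.values()) > 1:
        problems.append("simple license relationships must specify a single license")
    return problems
```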

Preservation of Unknown Data

Publishers may choose to include data in a BDIO data set that is not part of the BDIO model; this data will come in the form of JSON-LD types or terms not defined by this specification. Processors MUST preserve this unknown data when handling BDIO; however, any data which is not reachable from the root project SHOULD be ignored and does not need to be preserved.
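
Deciding what is reachable from the root is an ordinary graph traversal. The sketch below uses a breadth-first search over a hypothetical adjacency mapping (node identifier to the identifiers it references); building that mapping from the JSON-LD data is left to the processor.

```python
from collections import deque


def reachable_ids(edges, root_id):
    """Return the set of node identifiers reachable from root_id, including the root."""
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Nodes outside the returned set are the ones a processor may ignore.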

Document Format

BDIO data can be transferred using one of four different formats depending on the capabilities of the parties involved and volume of data. Any JSON data being transferred MAY be pretty printed; this makes human consumption easier and has minimal impact on data size when compression is being used. BDIO data MUST be expressed as a named graph; the graph’s label is used to uniquely identify the source of the data.

For all supported formats, characters MUST be encoded using UTF-8. Consumers MUST NOT accept alternate character encodings.

JSON-LD

The default format for BDIO data is JSON-LD 1.0.

Producers MAY use remote JSON-LD contexts. Compliant consumers MAY operate in “offline mode”, restricting access to remote JSON-LD context documents. When operating in this mode, an offline cached copy of the following JSON-LD context documents MUST be made available during processing: https://blackducksoftware.github.io/bdio

Both compliant consumers and producers SHOULD use GZIP content encoding when transferring JSON-LD data. Note that content encoding support for HTTP requests is often not offered by default. Producers MUST create JSON-LD data whose content is strictly less than 16MB; consumers SHOULD fail when presented with a JSON-LD file in excess of this limit. The size limit is applied on presentation and MUST NOT account for effects of JSON-LD normalization through the application of compaction or expansion algorithms.
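
A producer-side sketch of the size check and GZIP step is shown below. It reads “16MB” as 16 × 1024 × 1024 bytes, which is an assumption since the unit is not spelled out, and the function name encode_jsonld is hypothetical.

```python
import gzip
import json

MAX_JSONLD_BYTES = 16 * 1024 * 1024  # Assumption: "16MB" is read as 16 MiB.


def encode_jsonld(document):
    """Serialize to UTF-8 JSON, enforce the size limit, and return GZIP-compressed bytes."""
    raw = json.dumps(document, ensure_ascii=False).encode("utf-8")
    # The limit applies to the content as presented, before compression.
    if len(raw) >= MAX_JSONLD_BYTES:
        raise ValueError("JSON-LD content must be strictly less than 16MB")
    return gzip.compress(raw)
```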

JSON-LD data should be given the file extension .jsonld and should be transferred using the content type application/ld+json (only in accordance with the JSON-LD Specification).

JSON

While all JSON-LD is technically JSON, it is often convenient to simplify the syntax using JSON-LD compaction. The default context for BDIO documents may be referenced as a remote context using the IRI https://blackducksoftware.github.io/bdio. Compliant consumers SHOULD include an offline copy of this context to avoid network traffic when processing JSON-LD data.

The Link header should be honored on both requests and responses per the JSON-LD specification.

As with JSON-LD, GZIP content encoding SHOULD be used when transferring JSON data. Furthermore, the same 16MB size limit applies to the JSON data as presented to the consumer.

JSON data should be given the file extension .json and should be transferred using either the content type application/json (preferred when using explicit links to context data) or application/vnd.blackducksoftware.bdio+json (when the use of the default BDIO context is implied).

BDIO Document

In addition to transferring raw JSON-LD or a JSON representation of linked data, there are some limitations for which a separate format makes sense. For example, server support for Content-Encoding and Link request headers cannot always be guaranteed. The BDIO Document format is intended to overcome some of these limitations in a non-invasive manner.

A “BDIO Document” is a Zip file consisting of any number of JSON-LD files, or “entries”. Entries are individually compressed per the Zip specification; BDIO Document producers SHOULD use the DEFLATE compression method. The uncompressed size of each entry MUST be strictly less than 16MB. Compliant consumers SHOULD process entries in “appearance order”, that is, the order in which they were added to the Zip archive. Entry names MUST have a “.jsonld” suffix and SHOULD NOT contain the “/” character. Additional non-JSON-LD files MAY be included in the archive.

Each JSON-LD entry MUST be self-contained; producers SHOULD use the “expanded” form but may use other forms so long as full context information is included. Additionally, the same named graph label MUST be used for every entry. Unlike the other formats, compliant consumers are not required to provide any online connectivity or cached context content when processing BDIO Documents. Linked data nodes which span multiple JSON-LD entries MUST have an @id and @type specification in every JSON-LD node.

To simplify BDIO Document processing, several restrictions are placed on the Zip format used as the primary container. BDIO Documents MUST NOT contain header or footer data; that is, the file should begin with a local file header and end with the Zip end of central directory record. Additionally, BDIO Documents MUST NOT contain entries which are not listed in the central directory; this restriction allows BDIO Documents to be processed using stream based tools.
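
A minimal sketch of writing such an archive with Python's standard zipfile module follows. The function name, the sample entry names used with it, and the 16 MiB reading of “16MB” are assumptions; the sketch is also stricter than the text in that it rejects entry names containing “/” rather than merely discouraging them.

```python
import io
import json
import zipfile

MAX_ENTRY_BYTES = 16 * 1024 * 1024  # Assumption: "16MB" is read as 16 MiB.


def write_bdio_document(entries):
    """Write (name, json_ld) pairs to an in-memory Zip archive in the given order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, json_ld in entries:
            if not name.endswith(".jsonld") or "/" in name:
                raise ValueError(f"unsuitable entry name: {name!r}")
            raw = json.dumps(json_ld, ensure_ascii=False).encode("utf-8")
            if len(raw) >= MAX_ENTRY_BYTES:
                raise ValueError(f"entry {name!r} is not strictly less than 16MB")
            # Entries are written in call order, which becomes the appearance order.
            archive.writestr(name, raw)
    return buffer.getvalue()
```

Writing into a fresh buffer means the output starts with a local file header and carries no leading or trailing data of its own.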

BDIO Documents should be given the file extension .bdio and should be transferred as binary data using the content type application/vnd.blackducksoftware.bdio+zip.

Appendix A: Namespace Recommendations

It is impractical for this specification to absolutely define all of the available namespaces and their rules: they constantly change as new tools are introduced and as people remember how to use old tools. The following non-normative recommendations for publishers and processors serve only as a guideline to what could be implemented. These recommendations are subject to change and ultimately it is the responsibility of the publishers and processors to agree on the namespace values and the interpretation of the field values.

TODO This section is under development in the Proposed Namespace Values area.

Appendix B: Identifier Guidelines

Selection of proper identifiers is imperative to the proper construction of a BDIO data set.

TODO Suggest identifiers to use in specific situations, include use of “mvn:” and “urn:uuid:” URIs; avoid “data:” and “about:”…

Appendix C: File Data

File System Types

This section is normative.

File System Type        Description
regular                 A regular file, typically of unknown content.
regular/binary          A regular file with executable content.
regular/text            A regular file with known text content. Should be accompanied by an encoding.
directory               A directory entry which may or may not contain children.
directory/archive       An archive which may or may not contain children. Can also be treated as a regular file.
symlink                 A symbolic link. Should be accompanied by a link target.
other/device/block      A block device, like a drive.
other/device/character  A character device, like a terminal.
other/door              A door used for interprocess communication.
other/pipe              A named pipe.
other/socket            A socket.
other/whiteout          A whiteout, or file removal in a layered file system.

NOTE: File system types are compared case-insensitively.

Algorithm  Expected Hex String Length  Description
md5        32                          MD5
sha1       40                          SHA-1
sha256     64                          SHA-2, SHA-256

NOTE: Algorithms are compared case-insensitively.
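
The table above lends itself to a simple sanity check on digest values. The sketch below validates only the hex string length for the listed algorithms and compares algorithm names case-insensitively; it does not parse the Digest string itself, since the separator between algorithm and value is not described here. The function name is hypothetical.

```python
import string

EXPECTED_HEX_LENGTH = {"md5": 32, "sha1": 40, "sha256": 64}


def has_expected_hex_length(algorithm, hex_value):
    """Check a digest value against the table; algorithms not listed are not rejected."""
    expected = EXPECTED_HEX_LENGTH.get(algorithm.lower())
    if expected is None:
        return True
    return len(hex_value) == expected and all(c in string.hexdigits for c in hex_value)
```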

Scheme   Slash Required  Description
zip      No              ZIP
jar      No              Java Archive
tar      No              Tape Archive
rpm      No              RPM Package Manager
ar       No              Unix archiver
arj      No              ARJ archives
cpio     No              Copy in and out
dump     No              Unix dump
sevenz   No              7-Zip
rar      Yes             Roshal Archive
xar      Yes             Extensible Archive
phar     Yes             PHP Archive
cab      Yes             Cabinet
unknown  Yes             Used when the actual scheme is not known

NOTE: File extensions and/or compression formats are not accounted for using the scheme, e.g. a file with the extension “.tgz” or “.tar.gz” still has a scheme of “tar”.

Appendix D: Content Types

application/ld+json
Extension: jsonld
The content type used when BDIO data is represented using JSON-LD. The context will either be referenced as a remote document or will be explicitly included in the content body. Note that only UTF-8 character encoding is allowed.
application/json
Extension: json
The content type used when BDIO data is represented using plain JSON and the context is specified externally (e.g. using the Link header). Note that only UTF-8 character encoding is allowed.
application/vnd.blackducksoftware.bdio+json
Extension: json
The content type used when BDIO data is represented using plain JSON that should be interpreted using the default BDIO context.
application/vnd.blackducksoftware.bdio+zip
Extension: bdio
The content type used when BDIO data is represented as self-contained JSON-LD stored in a ZIP archive.