Black Duck I/O defines a vocabulary and mechanism by which information about software can be transferred between solutions inside and outside of the Black Duck ecosystem.
TODO
The actors in a BDIO system are the “producers”, “publishers”, “consumers” and “processors”. Producers and consumers are concerned with the syntactical structure (e.g. “is the document well formatted according to the specification?”), while publishers and processors are concerned with the semantics of the data itself (e.g. “are all the file nodes connected to a project?”).
Class | IRI |
---|---|
Annotation | https://blackducksoftware.github.io/bdio#Annotation |
Component | https://blackducksoftware.github.io/bdio#Component |
Container | https://blackducksoftware.github.io/bdio#Container |
Dependency | https://blackducksoftware.github.io/bdio#Dependency |
File | https://blackducksoftware.github.io/bdio#File |
FileCollection | https://blackducksoftware.github.io/bdio#FileCollection |
License | https://blackducksoftware.github.io/bdio#License |
Note | https://blackducksoftware.github.io/bdio#Note |
Project | https://blackducksoftware.github.io/bdio#Project |
Repository | https://blackducksoftware.github.io/bdio#Repository |
Vulnerability | https://blackducksoftware.github.io/bdio#Vulnerability |
Object Property | IRI | Domain | Range |
---|---|---|---|
affected | https://blackducksoftware.github.io/bdio#hasAffected | Vulnerability | Component, Project |
base | https://blackducksoftware.github.io/bdio#hasBase | Container, FileCollection, Project, Repository | File |
canonical | https://blackducksoftware.github.io/bdio#hasCanonical | Component, License, Vulnerability | `<any>` |
declaredBy | https://blackducksoftware.github.io/bdio#declaredBy | Dependency | File |
dependency | https://blackducksoftware.github.io/bdio#hasDependency | Component, Container, FileCollection, Project, Repository | Dependency |
dependsOn | https://blackducksoftware.github.io/bdio#dependsOn | Dependency | Component |
description | https://blackducksoftware.github.io/bdio#hasDescription | Component, Container, Dependency, File, FileCollection, License, LicenseGroup, Project, Repository, Vulnerability | Annotation |
evidence | https://blackducksoftware.github.io/bdio#hasEvidence | Dependency | File |
license | https://blackducksoftware.github.io/bdio#hasLicense | Component, Container, Dependency, LicenseGroup, Project | License, LicenseGroup |
licenseConjunctive | https://blackducksoftware.github.io/bdio#hasLicenseConjunctive | Component, Container, Dependency, LicenseGroup, Project | License, LicenseGroup |
licenseDisjunctive | https://blackducksoftware.github.io/bdio#hasLicenseDisjunctive | Component, Container, Dependency, LicenseGroup, Project | License, LicenseGroup |
licenseException | https://blackducksoftware.github.io/bdio#hasLicenseException | License | License |
licenseOrLater | https://blackducksoftware.github.io/bdio#hasLicenseOrLater | Component, Container, Dependency, LicenseGroup, Project | License |
note | https://blackducksoftware.github.io/bdio#hasNote | File | Note |
parent | https://blackducksoftware.github.io/bdio#hasParent | File | File |
previousVersion | https://blackducksoftware.github.io/bdio#hasPreviousVersion | Project | Project |
subproject | https://blackducksoftware.github.io/bdio#hasSubproject | Project | Project |
Data Property | IRI | Domain | Datatype |
---|---|---|---|
archiveContext | https://blackducksoftware.github.io/bdio#hasArchiveContext | File | Default |
buildDetails | https://blackducksoftware.github.io/bdio#hasBuildDetails | @graph | Default |
buildNumber | https://blackducksoftware.github.io/bdio#hasBuildNumber | @graph | Default |
buildOptions | https://blackducksoftware.github.io/bdio#hasBuildOptions | File | Default |
byteCount | https://blackducksoftware.github.io/bdio#hasByteCount | File | Long |
captureInterval | https://blackducksoftware.github.io/bdio#hasCaptureInterval | @graph | Default |
captureOptions | https://blackducksoftware.github.io/bdio#hasCaptureOptions | @graph | Default |
comment | https://blackducksoftware.github.io/bdio#hasComment | Annotation | Default |
contentType | https://blackducksoftware.github.io/bdio#hasContentType | File | ContentType |
context | https://blackducksoftware.github.io/bdio#hasContext | Component, License, Project, Repository, Vulnerability | Default |
creationDateTime | https://blackducksoftware.github.io/bdio#hasCreationDateTime | @graph, Annotation, File, Vulnerability | DateTime |
creator | https://blackducksoftware.github.io/bdio#hasCreator | @graph, Annotation | Default |
deepDirectoryCount | https://blackducksoftware.github.io/bdio#hasDeepDirectoryCount | File | Long |
deepFileCount | https://blackducksoftware.github.io/bdio#hasDeepFileCount | File | Long |
distanceFromInnerRoot | https://blackducksoftware.github.io/bdio#hasDistanceFromInnerRoot | File | Long |
distanceFromRoot | https://blackducksoftware.github.io/bdio#hasDistanceFromRoot | File | Long |
encoding | https://blackducksoftware.github.io/bdio#hasEncoding | File | Default |
fileSystemType | https://blackducksoftware.github.io/bdio#hasFileSystemType | File | Default |
fingerprint | https://blackducksoftware.github.io/bdio#hasFingerprint | File | Digest |
homepage | https://blackducksoftware.github.io/bdio#hasHomepage | Component, License, Project, Vulnerability | Default |
identifier | https://blackducksoftware.github.io/bdio#hasIdentifier | Component, License, Project, Vulnerability | Default |
lastModifiedDateTime | https://blackducksoftware.github.io/bdio#hasLastModifiedDateTime | File, Vulnerability | DateTime |
linkPath | https://blackducksoftware.github.io/bdio#hasLinkPath | File | Default |
name | https://blackducksoftware.github.io/bdio#hasName | @graph, Component, License, Project, Vulnerability, Repository | Default |
namespace | https://blackducksoftware.github.io/bdio#hasNamespace | Component, Container, Dependency, License, Project, Repository, Vulnerability | Default |
nodeName | https://blackducksoftware.github.io/bdio#hasNodeName | File | Default |
parentId | https://blackducksoftware.github.io/bdio#hasParentId | File | Long |
path | https://blackducksoftware.github.io/bdio#hasPath | File | Default |
platform | https://blackducksoftware.github.io/bdio#hasPlatform | Component, Container, File, Project, Repository, Vulnerability | Products |
project | https://blackducksoftware.github.io/bdio#hasProject | @graph | Default |
projectGroup | https://blackducksoftware.github.io/bdio#hasProjectGroup | @graph | Default |
projectVersion | https://blackducksoftware.github.io/bdio#hasProjectVersion | @graph | Default |
publisher | https://blackducksoftware.github.io/bdio#hasPublisher | @graph | Products |
range | https://blackducksoftware.github.io/bdio#hasRange | Dependency, Note | ContentRange |
requestedVersion | https://blackducksoftware.github.io/bdio#hasRequestedVersion | Dependency | Default |
resolver | https://blackducksoftware.github.io/bdio#hasResolver | Component, License, Project, Repository, Vulnerability | Products |
rights | https://blackducksoftware.github.io/bdio#hasRights | Note | Default |
scope | https://blackducksoftware.github.io/bdio#hasScope | Dependency | Default |
shallowDirectoryCount | https://blackducksoftware.github.io/bdio#hasShallowDirectoryCount | File | Long |
sourceBranch | https://blackducksoftware.github.io/bdio#hasSourceBranch | @graph | Default |
sourceRepository | https://blackducksoftware.github.io/bdio#hasSourceRepository | @graph | Default |
sourceRevision | https://blackducksoftware.github.io/bdio#hasSourceRevision | @graph | Default |
sourceTag | https://blackducksoftware.github.io/bdio#hasSourceTag | @graph | Default |
uri | https://blackducksoftware.github.io/bdio#hasUri | File | Default |
vendor | https://blackducksoftware.github.io/bdio#hasVendor | Component, Project | Default |
version | https://blackducksoftware.github.io/bdio#hasVersion | Component, Project | Default |
Datatype | IRI |
---|---|
ContentRange | https://blackducksoftware.github.io/bdio#ContentRange |
ContentType | https://blackducksoftware.github.io/bdio#ContentType |
DateTime | http://www.w3.org/2001/XMLSchema#dateTime |
Default | "" |
Digest | https://blackducksoftware.github.io/bdio#Digest |
Long | http://www.w3.org/2001/XMLSchema#long |
Products | https://blackducksoftware.github.io/bdio#Products |
When processing the BDIO model, there are several defined behaviors that should uniformly apply to ensure interoperability. Often it is impractical for publishers to produce a fully normalized BDIO model; therefore several relationships are expected to be “implicit”, and processors MUST handle BDIO data the same way regardless of whether those relationships are explicitly present. The published BDIO data in conjunction with any implicit relationships constitutes a connected graph.
The “root” of the BDIO data is a top-level object (one of: “Project”, “Container”, “Repository” or “FileCollection”). For a project, the root project cannot be claimed as a sub-project or previous version by any other project. Publishers MUST NOT produce BDIO data with multiple roots; processors MAY elect an arbitrary object as the root or fail outright when the root could be ambiguous.
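The root rule above can be sketched as a small check. This is a minimal illustration rather than a reference implementation: it assumes nodes are plain dictionaries in compacted form, that `@type` holds a single compact class name, and that `subproject` and `previousVersion` hold node references. The function name `find_root` is hypothetical, and the sketch chooses to fail on ambiguity, which is one of the two behaviors the text permits.

```python
TOP_LEVEL_TYPES = {"Project", "Container", "Repository", "FileCollection"}


def _as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _ref_id(ref):
    """Accept either {"@id": "..."} node references or bare identifier strings."""
    return ref["@id"] if isinstance(ref, dict) else ref


def find_root(nodes):
    """Return the single top-level object that no project claims.

    Assumption: `nodes` is a list of compacted node dictionaries.
    Raises ValueError when zero or several candidates remain.
    """
    claimed = set()
    for node in nodes:
        if node.get("@type") == "Project":
            for term in ("subproject", "previousVersion"):
                claimed.update(_ref_id(r) for r in _as_list(node.get(term)))

    roots = [
        node for node in nodes
        if node.get("@type") in TOP_LEVEL_TYPES and node["@id"] not in claimed
    ]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]
```

A processor that prefers to elect an arbitrary root instead of failing could return `roots[0]` whenever the list is non-empty.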
The relationship between a file node and its parent need not be explicitly defined provided that the resolved absolute hierarchical path can be used to unambiguously identify the parent file.
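As an illustration of resolving an implicit parent, the sketch below derives the parent path of a plain hierarchical URI by dropping its last path segment. It is deliberately limited: the helper name `implicit_parent_path` is hypothetical, and paths that contain a fragment or query (which includes archive entry paths) are rejected rather than guessed at.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit


def implicit_parent_path(path: str) -> Optional[str]:
    """Return the path of the implied parent file, or None at the top of the hierarchy.

    Assumption: only plain hierarchical URIs are handled; archive entry
    paths (which carry a '#') are outside the scope of this sketch.
    """
    if "#" in path or "?" in path:
        raise ValueError("only plain hierarchical paths are handled here")

    hierarchical = urlsplit(path).path
    if hierarchical in ("", "/"):
        return None  # nothing above the top of the hierarchy

    parent = posixpath.dirname(hierarchical)
    # Keep the scheme and authority prefix untouched; swap only the path part.
    return path[: len(path) - len(hierarchical)] + parent
```

A processor could look the returned value up in an index of known file paths and treat a unique match as the implicit parent.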
The relationship between a component and the root object need not be explicitly defined: any component that is not associated with a dependency is assumed to be a dependency of the root object. Processors MUST NOT assume any default values for the implicit dependency node necessary to describe the connection between the component and the root object.
BDIO properties which are subject to namespacing must be interpreted using rules specific to the namespace itself. It is solely the responsibility of the producer and consumer to negotiate namespace tokens and the corresponding property interpretation.
When a BDIO node does not explicitly define a namespace, it is inferred by following relationships back to a root object (provided the root object supports the “namespace” property); the first encountered explicit namespace definition becomes the effective namespace of the node.
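The inference rule can be sketched as a walk toward the root that stops at the first explicit namespace. The data layout is an assumption made for illustration: `nodes` maps identifiers to compacted node dictionaries, and `toward_root` is a hypothetical precomputed mapping from each node to the node one relationship closer to the root.

```python
from typing import Dict, Optional


def effective_namespace(
    node_id: str,
    nodes: Dict[str, dict],
    toward_root: Dict[str, str],
) -> Optional[str]:
    """Return the first explicit namespace found between a node and the root.

    Assumption: `toward_root` has already flattened whichever relationships
    connect a node to the root into a single "next hop" per node.
    """
    seen = set()
    current: Optional[str] = node_id
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in malformed data
        namespace = nodes[current].get("namespace")
        if namespace is not None:
            return namespace
        current = toward_root.get(current)
    return None  # no explicit namespace anywhere on the way to the root
```

An explicit namespace on the node itself is found on the first iteration, so it always takes priority over anything defined closer to the root.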
When describing files or resources in BDIO, publishers SHALL adhere to the guidelines in this section: this ensures data size is minimized and interpretation by processors can be consistent. BDIO files can be used to describe files on a file system (real or virtual), entries within an archive, resources from the web or any other entity which can be described as a hierarchical structure of named pieces of data.
BDIO files must be represented as a tree-like hierarchy starting with the base file identified by the root object. The file’s “path” property is used to identify the location within the hierarchy. The path MUST be a valid absolute URI. Generally this will be an RFC 8089 “file” URI representing an absolute path to a file on the file system; however, it may be any URI. Special considerations are necessary for the URI representing an archive entry: archive entry paths must be encoded using the pattern:
<scheme> ":" <archive-URI> "#" <entry-name>
It is possible to encode archive entry paths at arbitrary “nesting levels” (i.e. archives within archives) by recursively applying this pattern; however, it is important when constructing archive entry path URIs that the URI of the archive itself be encoded (including any “/” characters). As archive nesting levels increase, so will the encodings (e.g. a “/” will initially be encoded as “%2F”, at the next nesting level it will appear as “%252F”). All URI encodings MUST be performed on NFC normalized UTF-8 encoded byte sequences. Trailing slashes and files named “.” or “..” MUST be omitted. The scheme should be used to identify the archive format. Archive entry names SHOULD start with a “/”; however, there are cases where the entry name can be used without the leading slash (refer to [Appendix C](#common-file-path-archive-schemes) for additional details).
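The recursive encoding can be illustrated with a short sketch. The helper name `archive_entry_path` is hypothetical. The sketch assumes that every reserved character in the archive URI is percent-encoded (the text above only explicitly calls out “/”), and that “/” is left unencoded in the entry name.

```python
import unicodedata
from urllib.parse import quote


def archive_entry_path(scheme: str, archive_uri: str, entry_name: str) -> str:
    """Build <scheme> ":" <archive-URI> "#" <entry-name> for one nesting level.

    Assumption: the whole archive URI is percent-encoded with no safe
    characters; `quote` encodes the NFC normalized text as UTF-8 bytes.
    """
    archive = quote(unicodedata.normalize("NFC", archive_uri), safe="")
    entry = quote(unicodedata.normalize("NFC", entry_name), safe="/")
    return f"{scheme}:{archive}#{entry}"


# One level: an entry inside a ZIP file on disk.
outer = archive_entry_path("zip", "file:///tmp/app.zip", "/lib/core.jar")

# Two levels: an entry inside the JAR that itself sits inside the ZIP.
# The "%2F" sequences of the first level are encoded again as "%252F".
inner = archive_entry_path("jar", outer, "/META-INF/MANIFEST.MF")
```

Applying the function to its own output is what produces the growing encodings described above; no separate bookkeeping of the nesting depth is needed.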
Metadata regarding how the content of a file should be (or was) interpreted is split over several data properties in BDIO. The file system type defines the role a file plays in the structure of the file hierarchy; it should be obtained either directly from the file system or from archive metadata used to preserve the original file system structure. The content type describes how the contents of the file should be interpreted. Often this information is inferred from the file name (e.g. extension matching); however, producers may know the type as a consequence of processing the file. Publishers MUST NOT include a content type obtained through simple file name matching, as this work can be done by a processor later. If the content type is represented in metadata using a different format (e.g. using extended file attributes), or if the content type was determined based on some specific processing performed on the actual contents of the file, it is appropriate to include the content type, even if the result would be the same as computed through standard name mapping. Text based content SHOULD include the encoding (this information MUST NOT be included in the content type); it is possible that the encoding can be determined without a content type. Symbolic links can be recorded using the link path: the value uses the same format described for file paths.
Processors MAY imply the file system type according to the following rules; publishers MUST NOT generate conflicting data and consumers SHOULD reject data containing conflicts:
directory/archive
directory
symlink
regular/text
regular
The terms under which a project is licensed and the terms under which a component is used are described using the simple license relationships `license` and `licenseOrLater` or the complex license relationships `licenseDisjunctive`, `licenseConjunctive` and `licenseException`. Additionally, an intermediate `LicenseGroup` node may be embedded to avoid ambiguity when using several licenses.
Simple license relationships MUST specify a single license. Multiple complex license relationships may be used; however, if more than two are in use, they MUST specify the same relationship. Publishers MUST NOT mix simple and complex license relationships. If multiple complex relationships are used, exceptions are considered first and conjunction takes precedence over disjunction.
Publishers may choose to include data in a BDIO data set that is not part of the BDIO model; this data will come in the form of JSON-LD types or terms not defined by this specification. Processors MUST preserve this unknown data when handling BDIO; however, any data which is not reachable from the root project SHOULD be ignored and does not need to be preserved.
BDIO data can be transferred using one of four different formats depending on the capabilities of the parties involved and the volume of data. Any JSON data being transferred MAY be pretty printed: it makes human consumption easier and has minimal impact on data size when compression is being used. BDIO data MUST be expressed as a named graph; the graph’s label is used to uniquely identify the source of the data.
For all supported formats, characters MUST be encoded using UTF-8. Consumers MUST NOT accept alternate character encodings.
The default format for BDIO data is JSON-LD 1.0.
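For orientation, a named graph carrying a single project node might be assembled as follows. This is a sketch under stated assumptions: the node identifier `urn:example:project`, the choice of a random `urn:uuid:` graph label, and the function name `minimal_bdio_graph` are illustrative only. Terms are written as full IRIs so that no context document is needed to interpret the data.

```python
import json
import uuid

BDIO = "https://blackducksoftware.github.io/bdio#"


def minimal_bdio_graph(project_name: str) -> dict:
    """Build a named graph whose label identifies the source of the data."""
    return {
        "@id": uuid.uuid4().urn,  # graph label (assumption: a random urn:uuid)
        "@graph": [
            {
                "@id": "urn:example:project",  # hypothetical node identifier
                "@type": [BDIO + "Project"],
                BDIO + "hasName": [{"@value": project_name}],
            }
        ],
    }


# Pretty printing is allowed; the bytes on the wire must be UTF-8.
document = json.dumps(minimal_bdio_graph("example"), indent=2).encode("utf-8")
```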
Producers MAY use remote JSON-LD contexts. Compliant consumers MAY operate in “offline mode”, restricting access to remote JSON-LD context documents. When operating in this mode, an offline cached copy of the following JSON-LD context documents MUST be made available during processing: https://blackducksoftware.github.io/bdio
Both compliant consumers and producers SHOULD use GZIP content encoding when transferring JSON-LD data. Note that content encoding support for HTTP requests is often not offered by default. Producers MUST create JSON-LD data whose content is strictly less than 16MB; consumers SHOULD fail when presented with a JSON-LD file in excess of this limit. The size limit is applied on presentation and MUST NOT account for effects of JSON-LD normalization through the application of compaction or expansion algorithms.
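The size limit and GZIP encoding could be applied along these lines. The sketch assumes that “16MB” means 16 × 1024 × 1024 bytes, which the text does not state, and the function names are hypothetical. The limit is checked against the serialized bytes as presented, before compression on the producing side and after decompression on the consuming side.

```python
import gzip
import json

# Assumption: "16MB" is read as 16 MiB; adjust if a decimal megabyte is intended.
MAX_JSONLD_BYTES = 16 * 1024 * 1024


def encode_for_transfer(graph: dict) -> bytes:
    """Serialize to UTF-8 JSON, enforce the size limit, then GZIP the result."""
    raw = json.dumps(graph).encode("utf-8")
    if len(raw) >= MAX_JSONLD_BYTES:
        raise ValueError("JSON-LD content must be strictly less than the size limit")
    return gzip.compress(raw)


def decode_from_transfer(payload: bytes) -> dict:
    """Reverse of encode_for_transfer, failing on oversized content."""
    # Note: this decompresses fully before checking; a hardened consumer
    # would stream the data and stop as soon as the limit is reached.
    raw = gzip.decompress(payload)
    if len(raw) >= MAX_JSONLD_BYTES:
        raise ValueError("JSON-LD content exceeds the size limit")
    return json.loads(raw.decode("utf-8"))
```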
JSON-LD data should be given the file extension `.jsonld` and should be transferred using the content type application/ld+json (only in accordance with the JSON-LD Specification).
While all JSON-LD is technically JSON, it is often convenient to simplify the syntax using JSON-LD compaction. The default context for BDIO documents may be referenced as a remote context using the IRI `https://blackducksoftware.github.io/bdio`. Compliant consumers SHOULD include an offline copy of this context to avoid network traffic when processing JSON-LD data.
The `Link` header should be honored on both requests and responses per the JSON-LD specification.
As with JSON-LD, GZIP content encoding SHOULD be used when transferring JSON data. Furthermore, the same 16MB size limit applies to the JSON data as presented to the consumer.
JSON data should be given the file extension `.json` and should be transferred using either the content type `application/json` (preferred when using explicit links to context data) or `application/vnd.blackducksoftware.bdio+json` (when the use of the default BDIO context is implied).
In addition to transferring raw JSON-LD or a JSON representation of linked data, there are some limitations for which a separate format makes sense. For example, server support for `Content-Encoding` and `Link` request headers cannot always be guaranteed. The BDIO Document format is intended to overcome some of these limitations in a non-invasive manner.
A “BDIO Document” is a Zip file consisting of any number of JSON-LD files, or “entries”. Entries are individually compressed per the Zip specification; BDIO Document producers SHOULD use the DEFLATE compression method. The uncompressed size of each entry MUST be strictly less than 16MB. Compliant consumers SHOULD process entries in “appearance order”, that is, the order in which they were added to the Zip archive. Entry names MUST have a “.jsonld” suffix and SHOULD NOT contain the “/” character. Additional non-JSON-LD files MAY be included in the archive.
Each JSON-LD entry MUST be self-contained; producers SHOULD use the “expanded” form but may use other forms so long as full context information is included. Additionally, the same named graph label MUST be used for every entry. Unlike the other formats, compliant consumers are not required to provide any online connectivity or cached context content when processing BDIO Documents. Linked data nodes which span multiple JSON-LD entries MUST have an `@id` and `@type` specification in every JSON-LD node.
To simplify BDIO Document processing, several restrictions are placed on the Zip format used as the primary container. BDIO Documents MUST NOT contain header or footer data; that is, the file should begin with a local file header and end with the Zip end of central directory record. Additionally, BDIO Documents MUST NOT contain entries which are not listed in the central directory; this restriction allows BDIO Documents to be processed using stream based tools.
BDIO Documents should be given the file extension .bdio and should be transferred as binary data using the content type `application/vnd.blackducksoftware.bdio+zip`.
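A BDIO Document writer and reader can be sketched with Python's standard `zipfile` module. Several choices here are assumptions rather than requirements of the text: “16MB” is read as 16 MiB, entry names containing “/” are rejected outright (the text only says SHOULD NOT), and the central directory order reported by `infolist()` is taken to match appearance order, which holds for archives written sequentially as below. The function names are hypothetical.

```python
import io
import json
import zipfile

MAX_ENTRY_BYTES = 16 * 1024 * 1024  # assumption: "16MB" read as 16 MiB


def write_bdio_document(target, entries):
    """Write (name, graph) pairs as DEFLATE-compressed JSON-LD entries.

    `target` is a path or a binary file object; each graph is a dict whose
    "@id" is the named graph label shared by every entry.
    """
    entries = list(entries)
    if len({graph.get("@id") for _, graph in entries}) > 1:
        raise ValueError("every entry must use the same named graph label")

    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, graph in entries:
            if not name.endswith(".jsonld") or "/" in name:
                raise ValueError(f"unsuitable entry name: {name!r}")
            data = json.dumps(graph).encode("utf-8")
            if len(data) >= MAX_ENTRY_BYTES:
                raise ValueError(f"entry {name!r} is too large")
            archive.writestr(name, data)


def read_bdio_document(source):
    """Yield the parsed JSON-LD entries in central directory order."""
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if not info.filename.endswith(".jsonld"):
                continue  # additional non-JSON-LD files are allowed; skip them
            if info.file_size >= MAX_ENTRY_BYTES:
                raise ValueError(f"entry {info.filename!r} is too large")
            yield json.loads(archive.read(info).decode("utf-8"))
```

Because `zipfile` writes nothing before the first local file header and nothing after the end of central directory record, the output stays within the container restrictions described above. Passing an `io.BytesIO()` as `target` keeps the whole document in memory, which is convenient for testing.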
It is impractical for this specification to absolutely define all of the available namespaces and their rules: they constantly change as new tools are introduced and as people remember how to use old tools. The following non-normative recommendations for publishers and processors serve only as a guideline to what could be implemented. These recommendations are subject to change and ultimately it is the responsibility of the publishers and processors to agree on the namespace values and the interpretation of the field values.
TODO This section is under development in the Proposed Namespace Values area.
Selection of proper identifiers is imperative to the proper construction of a BDIO data set.
TODO Suggest identifiers to use in specific situations, include use of “mvn:” and “urn:uuid:” URIs; avoid “data:” and “about:”…
This section is normative.
File System Type | Description |
---|---|
regular | A regular file, typically of unknown content. |
regular/binary | A regular file with executable content. |
regular/text | A regular file with known text content. Should be accompanied by an encoding. |
directory | A directory entry which may or may not contain children. |
directory/archive | An archive which may or may not contain children. Can also be treated as a regular file. |
symlink | A symbolic link. Should be accompanied by a link target. |
other/device/block | A block device, like a drive. |
other/device/character | A character device, like a terminal. |
other/door | A door used for interprocess communication. |
other/pipe | A named pipe. |
other/socket | A socket. |
other/whiteout | A whiteout, or file removal in a layered file system. |
NOTE: File system types are compared case-insensitively.
Algorithm | Expected Hex String Length | Description |
---|---|---|
md5 | 32 | MD5 |
sha1 | 40 | SHA-1 |
sha256 | 64 | SHA-2, SHA-256 |
NOTE: Algorithms are compared case-insensitively.
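The table above lends itself to a simple validity check. The sketch below takes the algorithm name and the hexadecimal string as separate arguments, because the textual layout of a fingerprint value is not described in this section; both function names are hypothetical.

```python
import hashlib
import string

# Expected hexadecimal string lengths, taken from the table above.
EXPECTED_HEX_LENGTH = {"md5": 32, "sha1": 40, "sha256": 64}


def is_valid_fingerprint(algorithm: str, hex_value: str) -> bool:
    """Check a digest against the expected length for its algorithm."""
    expected = EXPECTED_HEX_LENGTH.get(algorithm.lower())  # case-insensitive
    if expected is None:
        return False
    return len(hex_value) == expected and all(
        character in string.hexdigits for character in hex_value
    )


def fingerprint(algorithm: str, data: bytes) -> str:
    """Compute the hexadecimal digest of `data` with one of the listed algorithms."""
    return hashlib.new(algorithm.lower(), data).hexdigest()
```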
Scheme | Slash Required | Description |
---|---|---|
zip | No | ZIP |
jar | No | Java Archive |
tar | No | Tape Archive |
rpm | No | RPM Package Manager |
ar | No | Unix archiver |
arj | No | ARJ archives |
cpio | No | Copy in and out |
dump | No | Unix dump |
sevenz | No | 7-Zip |
rar | Yes | Roshal Archive |
xar | Yes | Extensible Archive |
phar | Yes | PHP Archive |
cab | Yes | Cabinet |
unknown | Yes | Used when the actual scheme is not known |
NOTE: File extensions and/or compression formats are not accounted for using the scheme, e.g. a file with the extension “.tgz” or “.tar.gz” still has a scheme of “tar”.
Content Type | Extension | Description |
---|---|---|
application/ld+json | jsonld | The content type used when BDIO data is represented using JSON-LD. The context will either be referenced as a remote document or will be explicitly included in the content body. Note that only UTF-8 character encoding is allowed. |
application/json | json | The content type used when BDIO data is represented using plain JSON and the context is specified externally (e.g. using the Link header). Note that only UTF-8 character encoding is allowed. |
application/vnd.blackducksoftware.bdio+json | json | The content type used when BDIO data is represented using plain JSON that should be interpreted using the default BDIO context. |
application/vnd.blackducksoftware.bdio+zip | bdio | The content type used when BDIO data is represented as self-contained JSON-LD stored in a ZIP archive. |