Ayan Sinha Mahapatra
FOSS Maintainer at AboutCode
Ayan is a core maintainer of ScanCode, the leading FOSS software composition analysis and license scanner. He also contributes to other AboutCode tools, a suite of FOSS tools, open data and public instances supported by the community. AboutCode helps you understand your dependencies (declared or hidden): the license and origin of the code you use, whether it has security vulnerabilities, whether the project is maintained, and so on. He is also a Google Summer of Code (GSoC) mentor for AboutCode.
SBOM Quality and Accuracy: we need more than a simple “SBOM button” for compliance
India, the US, the EU, and other governments have introduced cybersecurity regulations for anyone distributing software. Every software maintainer, contributor, and developer needs to be aware of their software dependencies and any associated risks, and of how to manage these software components efficiently. This is most often done, and is now regulated, with SBOMs.
Open source compliance, both licensing and security, can seem simple: generate SBOMs and tick the checkboxes in the compliance process. But correct compliance requires accuracy and quality. This can be challenging because of false positives, undeclared reuse of files and snippets, vulnerability reachability, binary scanning, new manifest and exchange formats, vulnerability disclosures, and other issues. All developers need to know how to resolve these challenges in compliance pipelines.
In this talk, Ayan will discuss best practices and share open source tools.
Improving license compliance at scale for everyone with ScanCode
ScanCode is the leading license detection tool, and ScanCode LicenseDB is the largest database of software licenses. ScanCode:
- Performs hash, automaton, and sequence matching based analysis
- Uses the largest community-maintained, curated database of software licenses and license obligations
- Reduces false positives by using `required phrases`: detecting the important parts of license statements
- Follows references such as `see license in LICENSE.txt`
- Supports license statements from all major package manifests and metadata collection from all major software ecosystems
- Summarizes licenses through important files, file type classification, and scanning of source/binary packages
- Offers open data and public instances to scan packages and provide license data by PackageURL
- Highlights unique license detections across the codebase and issues to review
- Analyzes which parts of the source are actually deployed
- Generates comprehensive SBOMs and template-based attribution documents
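To make the matching and `required phrases` ideas concrete, here is a minimal Python sketch. It is not ScanCode's implementation: the two rules, the `detect` function, and the 0.7 similarity threshold are hypothetical, chosen only to illustrate an exact hash match, a fallback sequence match, and a required-phrase check that rejects a likely false positive.

```python
import hashlib
import re
from difflib import SequenceMatcher

# Hypothetical toy rules; real rule sets are far larger and curated.
RULES = [
    {"license": "mit",
     "text": "Licensed under the MIT License",
     "required": "MIT License"},
    {"license": "apache-2.0",
     "text": "Licensed under the Apache License, Version 2.0",
     "required": "Apache License"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def digest(tokens):
    """Hash a token list so identical normalized texts compare equal."""
    return hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()


def contains_phrase(tokens, phrase):
    """Return True if the phrase tokens occur contiguously in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def detect(text, min_ratio=0.7):
    """Return (license, match_kind), or (None, None) if nothing matches."""
    query = tokenize(text)

    # 1. Exact match: the whole normalized text equals a rule text.
    for rule in RULES:
        if digest(query) == digest(tokenize(rule["text"])):
            return rule["license"], "hash"

    # 2. Approximate sequence match, kept only if the required phrase
    #    is present in the scanned text.
    best = None
    for rule in RULES:
        rule_tokens = tokenize(rule["text"])
        ratio = SequenceMatcher(None, query, rule_tokens, autojunk=False).ratio()
        if ratio < min_ratio:
            continue
        if not contains_phrase(query, tokenize(rule["required"])):
            continue  # similar wording, but the key phrase is missing
        if best is None or ratio > best[1]:
            best = (rule["license"], ratio)

    return (best[0], "sequence") if best else (None, None)
```

In this sketch, `detect("Licensed under the license")` is textually close to the MIT rule but returns no match because the required phrase is absent, which is the kind of false positive the required-phrase check is meant to filter out.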
ScanCode is also used to perform massive scans and improve license data and SBOM quality across ecosystems:
- by Software Heritage
- by HuggingFace to create a dataset of permissively licensed code for training LLMs
- by ClearlyDefined (used by GitHub)
License and Security compliance super-charged with PackageURL
PackageURL is the leading package identifier, now used by all leading SBOM standards, code scanners and major organizations. It was started in ScanCode to identify a package uniquely, to get licensing and vulnerability data about that package from other databases, and to communicate about the packages in use through SBOMs.
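As an illustration of how a PackageURL identifies a package, the following Python sketch splits a purl of the general form `pkg:type/namespace/name@version?qualifiers#subpath` into its components. It is a simplified, assumption-laden parser written for this example (the `parse_purl` function is hypothetical and skips the type-specific normalization and validation rules of the specification); it is not the reference implementation.

```python
from urllib.parse import unquote


def parse_purl(purl):
    """Split a PackageURL string into a dict of components (simplified)."""
    scheme, sep, remainder = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a PackageURL: {purl!r}")
    remainder = remainder.lstrip("/")

    # Optional subpath after '#', then optional qualifiers after '?'.
    remainder, _, subpath = remainder.partition("#")
    remainder, _, query = remainder.partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)

    # The package type is the first path segment.
    package_type, sep, remainder = remainder.partition("/")
    if not sep or not package_type:
        raise ValueError(f"missing type or name: {purl!r}")

    # The version follows the last '@'; a literal '@' elsewhere must be
    # percent-encoded (for example '%40angular').
    if "@" in remainder:
        path, _, version = remainder.rpartition("@")
    else:
        path, version = remainder, ""

    # The last path segment is the name; anything before it is the namespace.
    namespace, _, name = path.strip("/").rpartition("/")
    if not name:
        raise ValueError(f"missing name: {purl!r}")

    return {
        "type": package_type.lower(),
        "namespace": unquote(namespace) or None,
        "name": unquote(name),
        "version": unquote(version) or None,
        "qualifiers": qualifiers,
        "subpath": subpath or None,
    }
```

For example, `parse_purl("pkg:npm/%40angular/animation@12.3.1")` yields the type `npm`, the namespace `@angular`, the name `animation` and the version `12.3.1`. These components are what a tool would use as a lookup key for license or vulnerability data.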
AboutCode not only maintains the PackageURL and VERS (version range) specs, tooling and standardization efforts; it also provides a suite of FOSS tools, open and federated data, and public instances to:
- Validate PackageURLs and package existence
- Get origin data and download source/binary archives from PackageURLs
- Get license data for PackageURLs by collecting metadata and scanning source and binaries
- Get known vulnerabilities with exploitability/severity data
- Detect packages and PackageURLs in source code, binaries, containers and more
- Work with ecosystems and FOSS foundations to improve data about packages and vulnerabilities
- Import, validate and enrich SBOMs with license/vulnerability data
- Provide purl accuracy benchmarks for comparing code scanning tools to identify support for ecosystems and gaps
A short presentation of the AboutCode stack can show these capabilities.