
AI System Compliance and Open Source Management

Beyond code generated by AI coding tools, AI systems themselves also require open source management. AI frameworks, pre-trained models, and training datasets all make extensive use of open source software and open data.

If your organization already operates ISO/IEC 5230 (license compliance) and ISO/IEC 18974 (security assurance) programs, apply the same principles at each stage of AI system development.


Three areas where open source is used

```text
AI system
├── 1. AI frameworks and libraries
│   (PyTorch, TensorFlow, Hugging Face Transformers, LangChain, etc.)
│   → Apply the existing ISO/IEC 5230 process as-is
│
├── 2. Pre-trained models
│   (Llama, Mistral, Falcon, BERT, etc.)
│   → Check each model's custom license individually
│
└── 3. Training datasets
    (Common Crawl, Wikipedia, CC-BY datasets, etc.)
    → Fulfill open data license obligations
```

1. AI Frameworks and Libraries

Apply the ISO/IEC 5230 process just as you do for normal software dependencies. Scan AI code repositories with existing SBOM generation tools (such as syft, cdxgen, and FOSSLight).

| Framework | License | Commercial Use | Key Obligations |
|---|---|---|---|
| PyTorch | BSD 3-Clause | ✅ | Copyright notice |
| TensorFlow | Apache 2.0 | ✅ | Copyright notice, modification notice |
| Hugging Face Transformers | Apache 2.0 | ✅ | Copyright notice, modification notice |
| LangChain | MIT | ✅ | Copyright notice |
| scikit-learn | BSD 3-Clause | ✅ | Copyright notice |
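Once a scan produces an SBOM, a short script can flag dependencies whose licenses fall outside the permissive set in the table above. This is a minimal sketch under stated assumptions: it expects CycloneDX JSON (for example, syft's `cyclonedx-json` output format), and the `PERMISSIVE` allowlist, the function names, and the sample components are hypothetical values for illustration, not an organizational policy.

```python
import json

# Hypothetical allowlist: SPDX IDs of the framework licenses in the table above.
PERMISSIVE = {"BSD-3-Clause", "Apache-2.0", "MIT"}


def component_licenses(component: dict) -> list[str]:
    """Collect license IDs, names, or expressions from one CycloneDX component."""
    found = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            # Expressions such as "MIT OR Apache-2.0" are not parsed; they get flagged for review.
            found.append(entry["expression"])
        else:
            lic = entry.get("license", {})
            found.append(lic.get("id") or lic.get("name") or "UNKNOWN")
    # A component with no license data at all is reported as NOASSERTION.
    return found or ["NOASSERTION"]


def review_sbom(sbom_json: str) -> dict[str, list[str]]:
    """Return 'name@version' -> licenses for every component outside the allowlist."""
    sbom = json.loads(sbom_json)
    flagged = {}
    for comp in sbom.get("components", []):
        licenses = component_licenses(comp)
        if not all(lic in PERMISSIVE for lic in licenses):
            flagged[f"{comp.get('name')}@{comp.get('version')}"] = licenses
    return flagged


if __name__ == "__main__":
    # Hypothetical sample SBOM content.
    sample = json.dumps({
        "bomFormat": "CycloneDX",
        "components": [
            {"name": "torch", "version": "2.3.0",
             "licenses": [{"license": {"id": "BSD-3-Clause"}}]},
            {"name": "example-gpl-lib", "version": "1.0",
             "licenses": [{"license": {"id": "GPL-3.0-only"}}]},
            {"name": "example-unlicensed-lib", "version": "0.1"},
        ],
    })
    # {'example-gpl-lib@1.0': ['GPL-3.0-only'], 'example-unlicensed-lib@0.1': ['NOASSERTION']}
    print(review_sbom(sample))
```

Flagging unparsed license expressions and missing license data, rather than silently passing them, keeps the script conservative: anything it cannot confirm goes to the normal ISO/IEC 5230 review.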

2. Pre-trained Models

Unlike typical open source libraries, pre-trained models often use custom licenses. These licenses may restrict commercial use, impose conditions based on monthly active users (MAU), or require disclosure of derivative models, so review them carefully.

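| License Type | Representative Models | Commercial Use | Derivative Model Disclosure |
|---|---|---|---|
| Apache 2.0 | Falcon, Mistral 7B, GPT-J | ✅ | Not required |
| MIT | GPT-2 | ✅ | Not required |
| Llama Community License | Llama 3 | Conditional (MAU under 700M) | Attribution and model-naming conditions |
| CC-BY 4.0 | Some academic models | ✅ | Attribution required |
| CC-BY-NC | Some research models | ❌ Non-commercial only | Attribution required |

Model licenses must be checked individually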

AI model licenses are not standardized. Always read the model card and the full license text at the source, such as the model's page on the Hugging Face Hub, and confirm:

  • Whether commercial use is allowed
  • MAU- or revenue-based restriction conditions
  • Fine-tuned derivative model disclosure requirements
  • Requirements to disclose model use in AI systems built on the model
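Part of this checklist can be automated as a first-pass triage before legal review. The sketch below assumes the `license` value has already been read from the model card metadata and uses Hugging Face license identifiers such as `apache-2.0` or `llama3.1`. The category sets, status strings, and `review_model_license` function are hypothetical, and the Llama MAU check is simplified from the license text.

```python
from dataclasses import dataclass, field

# Hypothetical review policy keyed by Hugging Face model card license identifiers.
PERMISSIVE = {"apache-2.0", "mit"}
ATTRIBUTION_ONLY = {"cc-by-4.0"}
NON_COMMERCIAL = {"cc-by-nc-4.0"}
LLAMA = {"llama3", "llama3.1"}
# Simplified: the Llama license counts MAU as of the model's release date.
LLAMA_MAU_LIMIT = 700_000_000


@dataclass
class Review:
    # One of: "allowed", "conditional", "not allowed", "needs Meta license", "manual review"
    commercial_use: str
    notes: list[str] = field(default_factory=list)


def review_model_license(license_id: str, monthly_active_users: int) -> Review:
    """Triage a model license identifier; unknown licenses go to manual review."""
    lid = license_id.strip().lower()
    if lid in PERMISSIVE:
        return Review("allowed", ["Keep the license text and copyright notice."])
    if lid in ATTRIBUTION_ONLY:
        return Review("allowed", ["Credit the model in the model card or documentation."])
    if lid in NON_COMMERCIAL:
        return Review("not allowed", ["Non-commercial use only."])
    if lid in LLAMA:
        if monthly_active_users > LLAMA_MAU_LIMIT:
            return Review("needs Meta license", ["MAU above 700M: request a separate license."])
        return Review("conditional", ["Check attribution and naming terms for derivative models."])
    # Custom or unrecognized licenses (e.g. "other") always go to a person for review.
    return Review("manual review", [f"Unrecognized license {license_id!r}: read the full text."])


print(review_model_license("llama3.1", 5_000_000).commercial_use)  # conditional
```

Defaulting unknown identifiers to manual review is deliberate: custom model licenses vary too much for a lookup table to approve them safely.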

Include model information in AI SBOM

Build an AI SBOM by recording pre-trained models as SBOM components. A simplified example based on the SPDX 3.0 AI Profile:

```yaml
- name: 'meta-llama/Llama-3.1-8B'
  version: '3.1'
  license: 'Llama Community License Agreement'
  primaryPurpose: 'inference'
  modelCard: 'https://huggingface.co/meta-llama/Llama-3.1-8B'
```

Existing SBOM tools generally do not detect model files automatically, so add model and dataset entries manually.
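Because these entries are maintained by hand, a small check that each one carries the fields shown in the example can catch gaps before release. This sketch assumes the entries have already been loaded from YAML into Python dictionaries (for example with PyYAML's `yaml.safe_load`). The required field list is taken from the simplified example, not from the full SPDX 3.0 AI Profile, and the `internal-finetune` entry is hypothetical.

```python
# Fields used in the simplified AI SBOM example (an assumption, not the full SPDX 3.0 AI Profile).
REQUIRED_MODEL_FIELDS = ("name", "version", "license", "primaryPurpose", "modelCard")


def missing_fields(entries: list[dict]) -> dict[str, list[str]]:
    """Map each model entry to the required fields it lacks or leaves empty."""
    report = {}
    for index, entry in enumerate(entries):
        missing = [f for f in REQUIRED_MODEL_FIELDS
                   if not str(entry.get(f) or "").strip()]
        if missing:
            # Fall back to the entry's position when it has no name.
            report[entry.get("name") or f"entry #{index}"] = missing
    return report


entries = [
    {
        "name": "meta-llama/Llama-3.1-8B",
        "version": "3.1",
        "license": "Llama Community License Agreement",
        "primaryPurpose": "inference",
        "modelCard": "https://huggingface.co/meta-llama/Llama-3.1-8B",
    },
    # Hypothetical internal fine-tune with incomplete metadata.
    {"name": "internal-finetune", "version": "0.2", "license": ""},
]
print(missing_fields(entries))  # {'internal-finetune': ['license', 'primaryPurpose', 'modelCard']}
```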


3. Training Datasets

When datasets used to train an AI model are under open data licenses, you must fulfill the corresponding license obligations.

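| License | Attribution | Commercial Use | Share-Alike Required |
|---|---|---|---|
| CC0 | Not required | ✅ | No |
| CC-BY 4.0 | Required | ✅ | No |
| CC-BY-SA 4.0 | Required | ✅ | Yes |
| CC-BY-NC 4.0 | Required | ❌ Non-commercial only | No |

  • CC-BY family: Credit dataset sources in the model card or system documentation.
  • CC-BY-SA: Agree with your legal team in advance on how share-alike applies to derivative models.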
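The dataset obligations above can be turned into per-dataset action items. This is a sketch under assumptions: dataset licenses are recorded as SPDX-style identifiers (`CC0-1.0`, `CC-BY-4.0`, `CC-BY-SA-4.0`, `CC-BY-NC-4.0`), the credit line follows the Creative Commons "TASL" practice (Title, Author, Source, License), and the `Dataset` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    title: str
    author: str
    source: str   # URL or other location of the dataset
    license: str  # SPDX-style identifier, e.g. "CC-BY-4.0"


def attribution_line(ds: Dataset) -> str:
    """Build a TASL-style credit line (Title, Author, Source, License)."""
    return f'"{ds.title}" by {ds.author}, {ds.source}, licensed under {ds.license}'


def dataset_actions(ds: Dataset) -> list[str]:
    """List the compliance actions implied by a dataset's Creative Commons license."""
    lic = ds.license.upper()
    actions = []
    if lic.startswith("CC-BY"):
        # Every CC-BY variant requires attribution in the model card or documentation.
        actions.append("attribution: " + attribution_line(ds))
    if "-SA" in lic:
        actions.append("legal review: share-alike treatment of derivative models")
    if "-NC" in lic:
        actions.append("block: non-commercial only")
    # CC0 falls through every check and needs no action.
    return actions


# Hypothetical dataset used only for illustration.
corpus = Dataset("Example Corpus", "Example Org", "https://example.org/corpus", "CC-BY-SA-4.0")
for action in dataset_actions(corpus):
    print(action)
```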

ISO/IEC 42001 and the role of open source owners

When an organization prepares an ISO/IEC 42001 AI management system, the following clauses connect directly to open source management.

| ISO/IEC 42001 Clause | Open Source Owner Role |
|---|---|
| §5.2 AI policy | Include open source usage principles in the AI policy |
| §6.1.2 AI risk assessment | Identify and assess OSS license and vulnerability risks |
| §7.5 Documented information | Establish and maintain the AI SBOM |
| Annex A.6 AI system life cycle | Review OSS compliance at each development stage |
| Annex A.7 Data for AI systems | Manage dataset licenses |
| Annex A.10 Third-party and customer relationships | Verify supply chains for externally sourced open source models |

ISO 42001 certification is a separate process

ISO/IEC 42001 certification covers overall AI system governance and is conducted separately from ISO 5230/18974 conformance. Organizations that already established ISO 5230/18974 systems can reuse the cross-mapped items above to streamline ISO 42001 preparation.


Learn More