Scan retention
CI and webhook automation trigger a scan on every push, pull request, and merge request. Left unbounded, that produces thousands of near-identical snapshots per project. The portal keeps the history useful and the disk bounded with a retention model: only the latest successful scan per target stays live, older snapshots are superseded and reclaimed after a grace window, and scans you explicitly mark as a release are kept forever.
super_admin and team_admin operating a portal that receives automated scans. Familiarity with .env editing and docker-compose restart. For the per-scan lifecycle (queued → running → succeeded), see Scans.
The retention model
Every scan carries a retention target derived from its project and the normalized ref of the branch or PR it ran against. When a newer successful scan lands on the same target, the previous one is superseded.
target = (project_id, normalized_ref)
scan #1 on main ──► live
scan #2 on main ──► live, scan #1 becomes superseded ──► reclaimed after grace
scan #3 on main ──► live, scan #2 becomes superseded ──► reclaimed after grace
- Live — the most recent successful scan for a target. Always queryable; never reclaimed by age alone.
- Superseded — a previously-live scan replaced by a newer success on the same target. Kept for a grace window (
SCAN_RETENTION_SUPERSEDED_GRACE_DAYS, default 7 days) so you can diff or roll back, then reclaimed by the sweep. - Release — a scan whose
metadata.releaselabel is set. Immutable and permanent — the sweep never touches it, regardless of age or supersession. See Keep a scan forever. - Ref-less / failed — scans with no ref target (ad-hoc UI scans) and failed scans are not part of the supersession chain. They are protected by a per-project floor (
SCAN_RETENTION_KEEP_LAST, default 30) and an age ceiling (SCAN_RETENTION_MAX_AGE_DAYS, default 180).
Ref normalization
The retention target uses a normalized form of the ref so that the same logical branch or PR groups together regardless of how CI spells it. The portal normalizes the ref it receives in metadata.ref as follows:
| Incoming ref | Normalized | Notes |
|---|---|---|
refs/heads/main | main | Branch refs drop the refs/heads/ prefix. |
refs/pull/12/merge | pr-12 | GitHub PR merge refs become pr-<number>. |
refs/merge-requests/7/head | mr-7 | GitLab MR refs become mr-<iid>. |
main, release/2.0 | main, release/2.0 | A bare branch name is kept as-is. |
The GitHub Action forwards github.ref (or the PR number) and the GitLab CI template forwards CI_COMMIT_REF_NAME / the MR IID, so you get correct grouping without configuration. The Jenkinsfile snippet forwards BRANCH_NAME the same way.
Retention policy variables
All four keys are read at runtime via os.getenv — edit .env and restart the Celery worker and beat services to apply them. The service names differ by stack: on the production compose (docker-compose.yml) they are worker and beat; on the dev compose (docker-compose.dev.yml) they are celery-worker and celery-beat. See Environment variables → Scan retention for the canonical reference.
# In the portal's .env
SCAN_RETENTION_SUPERSEDED_GRACE_DAYS=7 # keep superseded snapshots this long before reclaim
SCAN_RETENTION_KEEP_LAST=30 # per-project floor for ref-less / failed scans
SCAN_RETENTION_MAX_AGE_DAYS=180 # age ceiling for ref-less / failed scans
| Key | Default | Effect |
|---|---|---|
SCAN_RETENTION_SUPERSEDED_GRACE_DAYS | 7 | Days a superseded snapshot survives before the sweep reclaims it. Raise it to keep more rollback history per branch. |
SCAN_RETENTION_KEEP_LAST | 30 | Minimum ref-less and failed scans kept per project, regardless of age. The sweep never trims below this floor. |
SCAN_RETENTION_MAX_AGE_DAYS | 180 | Among ref-less successful scans and failed/cancelled scans, those older than this (and beyond the keep-last floor) are reclaimed. The live snapshot of a ref and release-labelled scans are exempt — retire, not age, manages those. |
Lowering SCAN_RETENTION_SUPERSEDED_GRACE_DAYS or SCAN_RETENTION_MAX_AGE_DAYS means the next sweep reclaims more snapshots. The sweep is irreversible — reclaimed scans and their findings are gone. Tune up first, observe disk, then tune down.
The retention sweep
Reclamation runs as a Celery beat task every 6 hours, not synchronously on scan completion. Marking a scan superseded happens immediately when a newer success lands; the disk and database rows are reclaimed on the next sweep that finds the snapshot past its grace window.
Each sweep:
- Reclaims superseded snapshots older than
SCAN_RETENTION_SUPERSEDED_GRACE_DAYS. - Among ref-less successful scans and all failed/cancelled scans, keeps the newest
SCAN_RETENTION_KEEP_LASTper project and reclaims the rest that are older thanSCAN_RETENTION_MAX_AGE_DAYS. - Never touches the live snapshot of a ref (managed by retire, not the sweep) or a scan with a
metadata.releaselabel.
Reclaiming a scan deletes its workspace artefacts (source clone, cdxgen SBOM, scancode output) and its database rows (components, licenses, findings). The audit log records a scans delete event per reclaimed scan with the reason — superseded (step 1) or aged (step 2).
Keep a scan forever (release label)
To pin a scan so retention never reclaims it — for example the scan that backs a tagged release — set a metadata.release label when you trigger it. A release-labelled scan is immutable: it is exempt from the grace window, the age ceiling, and the keep-last trim.
curl -sS -X POST \
"https://trustedoss.example.com/v1/projects/${PROJECT_ID}/scans" \
-H "Authorization: Bearer ${TRUSTEDOSS_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"kind": "source", "metadata": {"ref": "refs/tags/v2.0.0", "release": "v2.0.0"}}' | jq .
The release value is a free-form label (a version string is conventional). From CI, set it only on the workflow that runs on a tag push, so day-to-day branch and PR scans stay reclaimable while your release scans accumulate as a permanent compliance record.
Because release scans are outside the supersession chain, two releases on the same branch both stay live. That is intentional — you want every shipped version's SBOM on record.
Delete a scan by hand
Use DELETE /v1/scans/{scan_id} to reclaim a single scan immediately rather than waiting for the sweep — for example a scan triggered against the wrong project, or a noisy snapshot you do not want in the history.
curl -sS -X DELETE \
"https://trustedoss.example.com/v1/scans/${SCAN_ID}" \
-H "Authorization: Bearer ${TRUSTEDOSS_API_KEY}" | jq .
To delete a scan that carries a metadata.release label, add ?force=true:
curl -sS -X DELETE \
"https://trustedoss.example.com/v1/scans/${SCAN_ID}?force=true" \
-H "Authorization: Bearer ${TRUSTEDOSS_API_KEY}" | jq .
Authorization and responses
| Condition | Requirement / response |
|---|---|
| Caller role | developer or higher on the owning team. force=true requires team_admin or higher. |
| Other team's scan | 404 Not Found — other teams' scans are existence-hidden. A 404 does not confirm the scan exists. |
Scan is queued or running | 409 Conflict — the RFC 7807 body carries scan_active: true. An active scan cannot be deleted; cancel it first, then delete. |
Scan has a release label, no force | 409 Conflict — the RFC 7807 body carries scan_release_protected: true. Re-issue with ?force=true as a team_admin. |
force=true by a non-team_admin | 403 Forbidden — forcing the delete of a release-protected scan needs team_admin. |
| Deleted | 204 No Content. Artefacts and rows are gone; the audit log records a scans delete event. |
Verify it worked
- Trigger two source scans against the same branch. Fetch the older scan with
GET /v1/scans/{id}— itssuperseded_atis now set. The project's Releases list (GET /v1/projects/{id}/releases) shows only the newer snapshot; the superseded one is hidden there.
- Trigger a scan with a
releaselabel and a second scan on the same branch. The release scan'ssuperseded_atstaysnulland it remains in the Releases list — it is not superseded.
- After a superseded scan passes its grace window, the next 6-hourly sweep removes it. Confirm with the audit log: a
scansdeleteevent with reasonsuperseded.
DELETE /v1/scans/{scan_id}on a non-release, terminal scan returns204and the scan disappears from the history.
Troubleshooting
docker-compose -f docker-compose.yml logs --tail=200 beat | grep scan_retention_done— the last sweep's verdict and per-reason counts (reclaimed_superseded/reclaimed_aged). The resolved policy is logged asscan_retention_policyat the start of each sweep. On the dev compose the service iscelery-beat, notbeat.- Audit log filtered to
scansdelete— what was reclaimed and why.
Two scans on the same branch both stay live
They are not on the same normalized target. Confirm both forwarded the same metadata.ref. A bare branch name (main) and a fully-qualified ref (refs/heads/main) normalize to the same target, but a PR merge ref (refs/pull/12/merge → pr-12) is a distinct target from the base branch — that is intentional.
A scan I expected to be reclaimed is still here
Check, in order:
- It is the live snapshot for its target — live scans are never reclaimed by age until they are superseded or exceed
SCAN_RETENTION_MAX_AGE_DAYS. - It carries a
metadata.releaselabel — release scans are permanent. Inspect the scan's metadata in the UI or API. - It is within the
SCAN_RETENTION_KEEP_LASTfloor — a ref-less or failed scan among the newest N per project is protected regardless of age. - The grace window has not elapsed — a superseded scan survives
SCAN_RETENTION_SUPERSEDED_GRACE_DAYSbefore the sweep takes it.
Disk is filling faster than the sweep reclaims
The sweep runs every 6 hours; a heavy CI day can outpace it. Lower SCAN_RETENTION_SUPERSEDED_GRACE_DAYS so superseded snapshots reclaim sooner, or delete the worst offenders by hand. For workspace-level cleanup of artefacts, see Disk & health → What to do when disk fills up.
409 when deleting a scan
Either the scan is still queued / running (cancel it first — see Cancel a scan) or it has a release label and you did not pass ?force=true. The RFC 7807 body's extension field tells you which: scan_active: true versus scan_release_protected: true.
See also
- Scans — the per-scan lifecycle and cancel flow
- GitHub Actions — forwarding the ref as a retention key
- GitLab CI — forwarding the ref as a retention key
- Disk & health — workspace artefact cleanup
- Audit log — reclamation and delete events
- Environment variables → Scan retention