Commit 2ef884f

docs: add concepts of modeling artifacts (#912)
Resolves: #815 Signed-off-by: Lixia (Sylvia) Lei <[email protected]>
1 parent 853e012 commit 2ef884f

1 file changed

+378
-0
lines changed

docs/Modeling-Artifacts.md

Lines changed: 378 additions & 0 deletions
@@ -0,0 +1,378 @@
1+
# Modeling Artifacts
2+
3+
In `oras-go` v2, artifacts are modeled as [Directed Acyclic Graphs (DAGs)](https://en.wikipedia.org/wiki/Directed_acyclic_graph) stored in [Content-Addressable Storages (CASs)](https://en.wikipedia.org/wiki/Content-addressable_storage).
4+
5+
In this model, an artifact is represented as a rooted DAG whose root node is an [OCI Manifest](https://github.com/opencontainers/image-spec/blob/v1.1.1/manifest.md). Artifacts may be grouped by an [OCI Index](https://github.com/opencontainers/image-spec/blob/v1.1.1/image-index.md), which is also a rooted DAG.
6+
7+
## Simple Artifact
8+
9+
The following example demonstrates an artifact manifest:
10+
11+
```json
12+
{
13+
"schemaVersion": 2,
14+
"mediaType": "application/vnd.oci.image.manifest.v1+json",
15+
"artifactType": "application/vnd.example+type",
16+
"config": {
17+
"mediaType": "application/vnd.oci.empty.v1+json",
18+
"digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
19+
"size": 2,
20+
"data": "e30="
21+
},
22+
"layers": [
23+
{
24+
"mediaType": "application/vnd.custom.type",
25+
"digest": "sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c",
26+
"size": 4,
27+
"annotations": {
28+
"org.opencontainers.image.title": "foo.txt"
29+
}
30+
},
31+
{
32+
"mediaType": "application/vnd.custom.type",
33+
"digest": "sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730",
34+
"size": 4,
35+
"annotations": {
36+
"org.opencontainers.image.title": "bar.txt"
37+
}
38+
}
39+
],
40+
"annotations": {
41+
"org.opencontainers.image.created": "2025-01-23T10:57:27Z"
42+
}
43+
}
44+
```
45+
46+
This manifest indicates that the artifact contains a config blob and two layer blobs. When stored in a CAS, a digest is computed from the manifest content. In this instance, the digest is:
47+
`sha256:314c7f20dd44ee1cca06af399a67f7c463a9f586830d630802d9e365933da9fb`.
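As a minimal sketch of how a CAS derives such a digest, the Go snippet below hashes content with SHA-256 and formats the result as `sha256:<hex>`. It is applied to the 2-byte config blob `{}` (the `"data": "e30="` field above) rather than to the manifest, since reproducing the manifest digest would require its exact serialized bytes. The helper name `digestOf` is hypothetical and not part of `oras-go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOf returns the digest of content in the "sha256:<hex>" form used by
// the descriptors above. The name is illustrative, not an oras-go API.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// "{}" is the empty JSON object referenced by the config descriptor.
	fmt.Println(digestOf([]byte("{}")))
}
```

Running it prints the config digest listed in the manifest, `sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a`.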
48+
49+
The artifact stored in CAS can be represented by the graph below:
50+
51+
```mermaid
52+
graph TD;
53+
54+
Manifest["Manifest<br>(sha256:314c7f...)"]--config-->Config["Config blob<br>(sha256:44136f...)"]
55+
Manifest--layers-->Layer0["Layer blob 0<br>(sha256:b5bb9d...)"]
56+
Manifest--layers-->Layer1["Layer blob 1<br>(sha256:7d865e...)"]
57+
58+
```
59+
60+
This graph is a [Merkle](https://en.wikipedia.org/wiki/Merkle_tree) Directed Acyclic Graph (DAG), where every object is a node uniquely identified by its digest. Because each digest is computed from the node's content, changing the content would change the digest, so every node in the graph is immutable.
61+
62+
In this graph, the manifest is the root, and the config and layer blobs are the leaf nodes referenced by the root.
63+
64+
## Artifact with Subject
65+
66+
When an artifact manifest is signed using tools such as [`notation`](https://github.com/notaryproject/notation), a signature manifest is created and attached to the artifact manifest being signed. The signature manifest references a signature blob and specifies a `subject` field that points to the target artifact manifest.
67+
68+
The following example demonstrates a signature manifest:
69+
70+
```json
71+
{
72+
"schemaVersion": 2,
73+
"mediaType": "application/vnd.oci.image.manifest.v1+json",
74+
"config": {
75+
"mediaType": "application/vnd.cncf.notary.signature",
76+
"digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
77+
"size": 2
78+
},
79+
"layers": [
80+
{
81+
"mediaType": "application/jose+json",
82+
"digest": "sha256:37f88486592fd90ace303ee38f8d1ff698193e76c76d3c1fef8627a39e677696",
83+
"size": 2090
84+
}
85+
],
86+
"subject": {
87+
"mediaType": "application/vnd.oci.image.manifest.v1+json",
88+
"digest": "sha256:314c7f20dd44ee1cca06af399a67f7c463a9f586830d630802d9e365933da9fb",
89+
"size": 762
90+
},
91+
"annotations": {
92+
"io.cncf.notary.x509chain.thumbprint#S256": "[\"a9c85558943f197f41fe7cf3caf691f7df8d0088be426a33d895560717893962\"]",
93+
"org.opencontainers.image.created": "2025-02-01T09:50:52Z"
94+
}
95+
}
96+
```
97+
98+
This signature manifest indicates that the signature artifact contains one config blob and one layer blob, and its `subject` refers to the digest of the artifact manifest in the [above example](#simple-artifact). This signature manifest is considered a `Referrer` of the artifact manifest.
99+
100+
When stored in the CAS, the digest computed from the signature manifest content is:
101+
`sha256:e5727bebbcbbd9996446c34622ca96af67a54219edd58d261112f1af06e2537c`.
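To illustrate how the edges of the graph can be read from a manifest, the following Go sketch decodes a trimmed copy of the signature manifest and lists the digests it points to via `subject`, `config`, and `layers`. The `descriptor` and `manifest` types and the `outgoingEdges` helper are simplified, hypothetical stand-ins written for this example; they are not the types shipped with `oras-go` or the image-spec Go module.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor and manifest model only the fields needed for this sketch.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type manifest struct {
	Config  descriptor   `json:"config"`
	Layers  []descriptor `json:"layers"`
	Subject *descriptor  `json:"subject,omitempty"`
}

// signatureManifest is a trimmed copy of the signature manifest shown above.
const signatureManifest = `{
  "schemaVersion": 2,
  "config": {"mediaType": "application/vnd.cncf.notary.signature", "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a", "size": 2},
  "layers": [{"mediaType": "application/jose+json", "digest": "sha256:37f88486592fd90ace303ee38f8d1ff698193e76c76d3c1fef8627a39e677696", "size": 2090}],
  "subject": {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:314c7f20dd44ee1cca06af399a67f7c463a9f586830d630802d9e365933da9fb", "size": 762}
}`

// outgoingEdges returns the digests referenced by a manifest, in the order
// subject (if present), config, layers.
func outgoingEdges(raw []byte) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	var out []string
	if m.Subject != nil {
		out = append(out, m.Subject.Digest)
	}
	out = append(out, m.Config.Digest)
	for _, layer := range m.Layers {
		out = append(out, layer.Digest)
	}
	return out, nil
}

func main() {
	digests, err := outgoingEdges([]byte(signatureManifest))
	if err != nil {
		panic(err)
	}
	for _, d := range digests {
		fmt.Println(d)
	}
}
```

The first digest printed is the `subject`, which is the edge that makes this manifest a referrer of the artifact manifest.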
102+
103+
The relationship between the artifact and its signature appears in the graph below:
104+
105+
```mermaid
106+
graph TD;
107+
108+
SignatureManifest["Signature Manifest<br>(sha256:e5727b...)"]--subject-->Manifest
109+
SignatureManifest--config-->Config
110+
SignatureManifest--layers-->SignatureBlob["Signature blob<br>(sha256:37f884...)"]
111+
112+
Manifest["Manifest<br>(sha256:314c7f...)"]--config-->Config["Config blob<br>(sha256:44136f...)"]
113+
Manifest--layers-->Layer0["Layer blob 0<br>(sha256:b5bb9d...)"]
114+
Manifest--layers-->Layer1["Layer blob 1<br>(sha256:7d865e...)"]
115+
```
116+
117+
In this model, the signature manifest acts as the root for the combined graph, while the artifact manifest is the root of its own subgraph.
118+
119+
Note that because the artifact and the signature share the same config blob, it is stored only once in the CAS and appears as a single node with two incoming edges. This is a common case, and it is why artifacts are modeled as graphs instead of trees.
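The deduplication described here follows directly from keying content by its digest. The Go sketch below uses a toy in-memory map as the CAS; the `store` type and its `put` method are hypothetical names chosen for this illustration and do not reflect how `oras-go` stores content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a toy in-memory CAS: every blob is keyed by its own digest.
type store map[string][]byte

// put stores content under its digest and returns that digest. Storing
// identical content again maps to the same key, so no second copy is kept.
func (s store) put(content []byte) string {
	sum := sha256.Sum256(content)
	digest := "sha256:" + hex.EncodeToString(sum[:])
	s[digest] = content
	return digest
}

func main() {
	cas := store{}
	artifactConfig := cas.put([]byte("{}"))  // config of the artifact manifest
	signatureConfig := cas.put([]byte("{}")) // config of the signature manifest
	fmt.Println(artifactConfig == signatureConfig, len(cas))
}
```

Both calls return the same digest and the store holds one entry, which corresponds to the single shared `Config` node in the graph.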
120+
121+
## Index of Artifacts
122+
123+
An [OCI Index](https://github.com/opencontainers/image-spec/blob/v1.1.1/image-index.md) can also be created for collecting multiple manifests.
124+
For example, an Index referencing two manifests would look like:
125+
126+
```json
127+
{
128+
"schemaVersion": 2,
129+
"mediaType": "application/vnd.oci.image.index.v1+json",
130+
"manifests": [
131+
{
132+
"mediaType": "application/vnd.oci.image.manifest.v1+json",
133+
"digest": "sha256:314c7f20dd44ee1cca06af399a67f7c463a9f586830d630802d9e365933da9fb",
134+
"size": 762
135+
},
136+
{
137+
"mediaType": "application/vnd.oci.image.manifest.v1+json",
138+
"digest": "sha256:eba50b7b7dfdf6294a375a3376b2b74e3b926c75119f7da04b1c671c7de662c9",
139+
"size": 588
140+
}
141+
]
142+
}
143+
```
144+
145+
When stored in a CAS, the digest computed for this Index is:
146+
`sha256:9c7c6bfa51dac3c9dfeffc7a0a795c30101f1f60afa64739767cedd92f574570`.
147+
148+
The relationship between the Index and the artifacts in the CAS can be modeled as the graph below:
149+
150+
```mermaid
151+
graph TD;
152+
153+
Index["Index<br>(sha256:9c7c6b...)"]--manifests-->Manifest
154+
Index--manifests-->AnotherManifest
155+
156+
Manifest--layers-->Layer0["Layer blob 0<br>(sha256:b5bb9d...)"]
157+
Manifest--layers-->Layer1["Layer blob 1<br>(sha256:7d865e...)"]
158+
Manifest["Manifest<br>(sha256:314c7f...)"]--config-->Config["Config blob<br>(sha256:44136f...)"]
159+
160+
AnotherManifest["Another Manifest<br>(sha256:eba50b...)"]--config-->Config
161+
AnotherManifest--layers-->Layer2["Layer blob 2<br>(sha256:a94890...)"]
162+
163+
```
164+
165+
In this graph, the Index serves as the root of the overall graph, with each manifest defining the root of its corresponding artifact subgraph.
166+
167+
## Graph Concepts
168+
169+
A complex DAG may integrate artifacts, their referrers, and the Indexes that reference them. The following example demonstrates such a graph:
170+
171+
```mermaid
172+
graph TD;
173+
174+
I0["Index i0"]--manifests-->M0
175+
I0--manifests-->M1
176+
177+
M2["Manifest m2"]--config-->Blob0
178+
M2--layers-->Blob5["Blob b5"]
179+
M2--subject-->M0
180+
181+
M1["Manifest m1"]--config-->Blob3["Blob b3"]
182+
M1--layers-->Blob4["Blob b4"]
183+
184+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
185+
M0--layers-->Blob1["Blob b1"]
186+
M0--layers-->Blob2["Blob b2"]
187+
```
188+
189+
For any node in the graph, the following definitions apply:
190+
191+
- **Successor:** Any node that a given node directly points to. For instance:
192+
- Blob `b0` is a successor of both `m0` and `m2`
193+
- Manifest `m0` is a successor of `m2` and `i0`
194+
195+
- **Predecessor:** Any non-leaf node that directly points to a given node. For instance:
196+
- Manifest `m0` is a predecessor of `b0`, `b1`, and `b2`
197+
- Manifest `m2` is a predecessor of `b0`, `b5`, and `m0`
198+
- Index `i0` is a predecessor of `m0` and `m1`
199+
200+
These definitions apply to nodes of any type—manifests, indexes, or arbitrary blobs.
201+
202+
However, the referrer relationship is different. A manifest (including [Image Manifest](https://github.com/opencontainers/image-spec/blob/v1.1.1/manifest.md) and [Index](https://github.com/opencontainers/image-spec/blob/v1.1.1/image-index.md)) with a `subject` field is considered a referrer of that subject manifest. According to [OCI image-spec v1.1.1](https://github.com/opencontainers/image-spec/blob/v1.1.1/manifest.md), both the `referrer` and the `subject` must be manifests.
203+
204+
So, it is worth noting that:
205+
206+
- `m0` is the `subject` of `m2`, and it is a successor of both `m2` and `i0`
207+
- `m2` is a referrer of `m0`, and it is a predecessor of `m0`, `b0`, and `b5`
208+
209+
With functions `Predecessors()`, `Successors()`, and `Referrers()` defined accordingly, the example graph yields the following results:
210+
211+
```
212+
Successors(m0) == [b0, b1, b2]
213+
Predecessors(m0) == [m2, i0]
214+
Referrers(m0) == [m2]
215+
216+
Successors(m2) == [m0, b0, b5]
217+
Predecessors(m2) == []
218+
Referrers(m2) == []
219+
220+
Successors(b0) == []
221+
Predecessors(b0) == [m0, m2]
222+
Referrers(b0) == []
223+
```
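The results above can be reproduced with a small Go sketch that encodes the example graph as a list of labeled edges. The `edge` type and the three lowercase functions are hypothetical helpers for this illustration only; they model the definitions in this section and are not the `oras-go` implementation.

```go
package main

import "fmt"

// edge is a labeled link from one node to another in the example graph.
type edge struct{ from, to, kind string }

// example encodes the graph from this section.
var example = []edge{
	{"m0", "b0", "config"}, {"m0", "b1", "layers"}, {"m0", "b2", "layers"},
	{"m1", "b3", "config"}, {"m1", "b4", "layers"},
	{"m2", "m0", "subject"}, {"m2", "b0", "config"}, {"m2", "b5", "layers"},
	{"i0", "m0", "manifests"}, {"i0", "m1", "manifests"},
}

// successors returns every node that node points to directly.
func successors(g []edge, node string) []string {
	var out []string
	for _, e := range g {
		if e.from == node {
			out = append(out, e.to)
		}
	}
	return out
}

// predecessors returns every node that points directly to node.
func predecessors(g []edge, node string) []string {
	var out []string
	for _, e := range g {
		if e.to == node {
			out = append(out, e.from)
		}
	}
	return out
}

// referrers returns only the predecessors linked through a subject edge.
func referrers(g []edge, node string) []string {
	var out []string
	for _, e := range g {
		if e.to == node && e.kind == "subject" {
			out = append(out, e.from)
		}
	}
	return out
}

func main() {
	for _, n := range []string{"m0", "m2", "b0"} {
		fmt.Println(n, successors(example, n), predecessors(example, n), referrers(example, n))
	}
}
```

In this sketch `referrers` is a filtered view of `predecessors`, which mirrors the point that every referrer is a predecessor but not every predecessor is a referrer.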
224+
225+
### Copy
226+
227+
Given the root node of a Directed Acyclic Graph (DAG), the `Copy` function replicates the graph reachable from that root node from one Content-Addressable Storage (CAS) to another. This is achieved by recursively invoking the `Successors()` function to traverse and copy all descendant nodes in a certain order.
228+
229+
Taking the [graph above](#graph-concepts) as an example:
230+
231+
`Copy(m0)` copies the graph rooted at the node `m0`, including `m0` itself and all of its successors `b0`, `b1`, and `b2`.
232+
233+
```mermaid
234+
graph TD;
235+
236+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
237+
M0--layers-->Blob1["Blob b1"]
238+
M0--layers-->Blob2["Blob b2"]
239+
```
240+
241+
`Copy(m2)` copies the graph rooted at the node `m2`, including `m2` itself, its successor `b5`, and the subgraph rooted at `m0`.
242+
243+
```mermaid
244+
graph TD;
245+
246+
M2["Manifest m2"]--config-->Blob0
247+
M2--layers-->Blob5["Blob b5"]
248+
M2--subject-->M0
249+
250+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
251+
M0--layers-->Blob1["Blob b1"]
252+
M0--layers-->Blob2["Blob b2"]
253+
```
254+
255+
`Copy(b0)` copies only `b0` itself, as it has no successors.
256+
257+
```mermaid
258+
graph TD;
259+
260+
Blob0["Blob b0"]
261+
```
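The traversal behind these three cases can be sketched in Go as a recursive walk over successors. The `successors` table and the `copyGraph` function below are hypothetical, and the destination is reduced to a set of node names. Copying successors before the node that references them is an assumption of this sketch, not a statement about the order `oras-go` uses.

```go
package main

import (
	"fmt"
	"sort"
)

// successors is a hand-written lookup table for the example graph.
var successors = map[string][]string{
	"i0": {"m0", "m1"},
	"m0": {"b0", "b1", "b2"},
	"m1": {"b3", "b4"},
	"m2": {"m0", "b0", "b5"},
}

// copyGraph records root and everything reachable from it in dst.
// Successors are visited first (an assumption of this sketch), so a node is
// only marked as copied once everything it references is already in dst.
func copyGraph(root string, dst map[string]bool) {
	if dst[root] {
		return // already copied, e.g. a blob shared by two manifests
	}
	for _, s := range successors[root] {
		copyGraph(s, dst)
	}
	dst[root] = true
}

// sortedKeys returns the copied node names in a stable order for printing.
func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	for _, root := range []string{"m0", "m2", "b0"} {
		dst := map[string]bool{}
		copyGraph(root, dst)
		fmt.Println("Copy("+root+"):", sortedKeys(dst))
	}
}
```

With this table, copying from `m2` reaches six nodes (`m2`, `m0`, `b0`, `b1`, `b2`, `b5`), and the shared blob `b0` is visited only once.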
262+
263+
### Extended Copy
264+
265+
As an extension to the `Copy` function, the `ExtendedCopy` function is designed to replicate the entire graph reachable from any given node in a DAG. This method requires that the source CAS supports predecessor finding (i.e., it indexes predecessor relationships when storing the graph).
266+
267+
The predecessor relationship for the [example graph](#graph-concepts) looks like this:
268+
269+
```mermaid
270+
graph TD;
271+
272+
Blob0["Blob b0"]--predecessor-->M0["Manifest m0"]
273+
Blob1["Blob b1"]--predecessor-->M0
274+
Blob2["Blob b2"]--predecessor-->M0
275+
276+
Blob3["Blob b3"]--predecessor-->M1["Manifest m1"]
277+
Blob4["Blob b4"]--predecessor-->M1
278+
279+
Blob5["Blob b5"]--predecessor-->M2
280+
M0--predecessor-->M2
281+
Blob0--predecessor-->M2["Manifest m2"]
282+
283+
M1--predecessor-->I0
284+
M0--predecessor-->I0["Index i0"]
285+
```
286+
287+
With the predecessor finding capability, `ExtendedCopy` recursively calls `Predecessors()` to discover root nodes, then applies `Copy` on each discovered root. For instance:
288+
289+
`ExtendedCopy(b5)` finds the root node `m2` starting from `b5`, and copies the graph rooted at `m2`:
290+
291+
```mermaid
292+
graph TD;
293+
294+
M2["Manifest m2"]--config-->Blob0
295+
M2--layers-->Blob5["Blob b5"]
296+
M2--subject-->M0
297+
298+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
299+
M0--layers-->Blob1["Blob b1"]
300+
M0--layers-->Blob2["Blob b2"]
301+
```
302+
303+
`ExtendedCopy(m1)` determines the root node `i0` from `m1`, and copies the graph rooted at `i0`:
304+
305+
```mermaid
306+
graph TD;
307+
308+
I0["Index i0"]--manifests-->M0
309+
I0--manifests-->M1
310+
311+
M1["Manifest m1"]--config-->Blob3["Blob b3"]
312+
M1--layers-->Blob4["Blob b4"]
313+
314+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
315+
M0--layers-->Blob1["Blob b1"]
316+
M0--layers-->Blob2["Blob b2"]
317+
```
318+
319+
`ExtendedCopy(b0)` finds multiple root nodes `m2` and `i0` starting from `b0`, then copies the combined graph rooted at `m2` and `i0`:
320+
321+
```mermaid
322+
graph TD;
323+
324+
I0["Index i0"]--manifests-->M0
325+
I0--manifests-->M1
326+
327+
M2["Manifest m2"]--config-->Blob0
328+
M2--layers-->Blob5["Blob b5"]
329+
M2--subject-->M0
330+
331+
M1["Manifest m1"]--config-->Blob3["Blob b3"]
332+
M1--layers-->Blob4["Blob b4"]
333+
334+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
335+
M0--layers-->Blob1["Blob b1"]
336+
M0--layers-->Blob2["Blob b2"]
337+
```
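The two phases described above, finding roots through predecessors and then copying from each root, can be sketched in Go as follows. The lookup tables and the functions `findRoots`, `copyGraph`, and `extendedCopy` are hypothetical names for this illustration; the real `oras-go` functions take contexts, storages, and descriptors and offer options that this sketch omits.

```go
package main

import "fmt"

// successors and predecessors are hand-written tables for the example graph.
var successors = map[string][]string{
	"i0": {"m0", "m1"},
	"m0": {"b0", "b1", "b2"},
	"m1": {"b3", "b4"},
	"m2": {"m0", "b0", "b5"},
}

var predecessors = map[string][]string{
	"b0": {"m0", "m2"}, "b1": {"m0"}, "b2": {"m0"},
	"b3": {"m1"}, "b4": {"m1"}, "b5": {"m2"},
	"m0": {"m2", "i0"}, "m1": {"i0"},
}

// findRoots walks predecessor links upward from node and returns every node
// that has no predecessor. seen prevents visiting a node twice.
func findRoots(node string, seen map[string]bool) []string {
	if seen[node] {
		return nil
	}
	seen[node] = true
	parents := predecessors[node]
	if len(parents) == 0 {
		return []string{node}
	}
	var roots []string
	for _, p := range parents {
		roots = append(roots, findRoots(p, seen)...)
	}
	return roots
}

// copyGraph records root and everything reachable from it in dst.
func copyGraph(root string, dst map[string]bool) {
	if dst[root] {
		return
	}
	for _, s := range successors[root] {
		copyGraph(s, dst)
	}
	dst[root] = true
}

// extendedCopy finds the roots above node, copies the graph under each root
// into dst, and returns the roots it discovered.
func extendedCopy(node string, dst map[string]bool) []string {
	roots := findRoots(node, map[string]bool{})
	for _, r := range roots {
		copyGraph(r, dst)
	}
	return roots
}

func main() {
	for _, n := range []string{"b5", "m1", "b0"} {
		dst := map[string]bool{}
		roots := extendedCopy(n, dst)
		fmt.Println("ExtendedCopy("+n+"): roots", roots, "nodes copied:", len(dst))
	}
}
```

Starting from `b0`, the sketch discovers both `m2` and `i0` and ends up copying all ten nodes of the example graph.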
338+
339+
#### Referrers API / Referrers Tag Schema
340+
341+
Many CAS implementations, such as artifact registries, support referrers discovery via the [Referrers API](https://github.com/opencontainers/distribution-spec/blob/v1.1.1/spec.md#listing-referrers) but do not support general predecessor finding.
342+
When interacting with artifact registries, if the Referrers API is not available, `oras-go` falls back to the [Referrers Tag Schema](https://github.com/opencontainers/distribution-spec/blob/v1.1.1/spec.md#referrers-tag-schema) approach, which simulates the behavior of the Referrers API on the client side.
343+
344+
In these systems, the `Predecessors` function essentially operates as `Referrers`.
345+
346+
The referrer/subject relationship for the [example graph](#graph-concepts) looks like this:
347+
348+
```mermaid
349+
graph TD;
350+
351+
M2["Manifest m2"]--subject-->M0["Manifest m0"]
352+
M0--referrer-->M2
353+
```
354+
355+
When replicating graphs from source artifact registries to another CAS, the limited predecessor finding functionality restricts the set of nodes that can be copied.
356+
357+
For example, `ExtendedCopy(m0)` can only find the root node `m2` starting from `m0` and will copy the graph rooted at `m2`. In this case, `i0` is not reachable from `m0` because there is no referrer/subject relationship between `i0` and `m0`: `i0` references `m0` through its `manifests` field, not through `subject`.
358+
359+
```mermaid
360+
graph TD;
361+
362+
M2["Manifest m2"]--config-->Blob0
363+
M2--layers-->Blob5["Blob b5"]
364+
M2--subject-->M0
365+
366+
M0["Manifest m0"]--config-->Blob0["Blob b0"]
367+
M0--layers-->Blob1["Blob b1"]
368+
M0--layers-->Blob2["Blob b2"]
369+
```
370+
371+
`ExtendedCopy(m1)` finds no referrer of `m1`, so it just copies the graph rooted at `m1`.
372+
373+
```mermaid
374+
graph TD;
375+
376+
M1["Manifest m1"]--config-->Blob3["Blob b3"]
377+
M1--layers-->Blob4["Blob b4"]
378+
```
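The effect of this restriction can be sketched in Go by running the same root-finding walk with a referrers-only lookup in place of full predecessor finding. All names below (`finder`, `referrersOnly`, `findRoots`, `copyGraph`) are hypothetical and exist only for this illustration; no registry is contacted.

```go
package main

import "fmt"

// successors is a hand-written table for the example graph.
var successors = map[string][]string{
	"i0": {"m0", "m1"},
	"m0": {"b0", "b1", "b2"},
	"m1": {"b3", "b4"},
	"m2": {"m0", "b0", "b5"},
}

// finder returns the nodes considered to be "above" a given node.
type finder func(node string) []string

// referrersOnly models a source that can only report subject relationships:
// m2 is the sole referrer in the example graph, and its subject is m0.
func referrersOnly(node string) []string {
	if node == "m0" {
		return []string{"m2"}
	}
	return nil
}

// findRoots walks upward using find and returns the nodes with nothing above.
func findRoots(node string, find finder, seen map[string]bool) []string {
	if seen[node] {
		return nil
	}
	seen[node] = true
	parents := find(node)
	if len(parents) == 0 {
		return []string{node}
	}
	var roots []string
	for _, p := range parents {
		roots = append(roots, findRoots(p, find, seen)...)
	}
	return roots
}

// copyGraph records root and everything reachable from it in dst.
func copyGraph(root string, dst map[string]bool) {
	if dst[root] {
		return
	}
	for _, s := range successors[root] {
		copyGraph(s, dst)
	}
	dst[root] = true
}

func main() {
	for _, n := range []string{"m0", "m1"} {
		dst := map[string]bool{}
		roots := findRoots(n, referrersOnly, map[string]bool{})
		for _, r := range roots {
			copyGraph(r, dst)
		}
		fmt.Println("ExtendedCopy("+n+"): roots", roots, "copied i0:", dst["i0"])
	}
}
```

With the referrers-only lookup, neither walk ever reaches `i0`, which matches the two diagrams above.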
