Releases: annotation/mondriaan
v0.8.14
v0.8.13
For debugging
v0.8.12
New TEI source data:
There is now additional back matter: a list of artworks.
The file folder structure of the data is
proeftuin
- letter
- letter
backmatter
- artwork (template: artworklist)
- biblio (template: bibliolist)
There are two new series of edge annotations: link_ref and link_target.
They link elements via the ref and target attributes to other elements identified by their id attributes. This can happen across files.
The ref and target attributes are still present as annotations;
the link_ref and link_target annotations have been generated from them.
The targets of these annotations are pairs: from and to.
The from corresponds to the element that contains the ref or target attribute,
the to corresponds to the element that contains the id attribute.
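The from/to pairing described above can be illustrated with a minimal sketch. It is not the converter's actual code: the Element structure, the file names, and the assumption that a ref or target value may carry a '#' or 'file#' prefix before the id are all hypothetical, as is the assumption that ids are unique across files.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """A simplified stand-in for a TEI element (hypothetical structure)."""
    node: int    # identifier of the element within the corpus
    file: str    # file the element comes from (hypothetical names below)
    attrs: dict  # attribute name -> value


def build_link_edges(elements, attr):
    """Return (from, to) pairs for every element whose `attr` value
    matches the id of some element, possibly in another file."""
    # Index all elements by their id attribute, across all files.
    # Assumption: ids are unique over the whole corpus.
    by_id = {e.attrs["id"]: e.node for e in elements if "id" in e.attrs}
    edges = []
    for e in elements:
        value = e.attrs.get(attr)
        if value is None:
            continue
        # Assumption: a leading '#' or 'file.xml#' prefix may precede the id.
        key = value.rsplit("#", 1)[-1]
        if key in by_id:
            edges.append((e.node, by_id[key]))
    return edges


elements = [
    Element(1, "letter/001.xml", {"ref": "biblio.xml#b12"}),
    Element(2, "letter/001.xml", {"target": "#n3"}),
    Element(3, "letter/001.xml", {"id": "n3"}),
    Element(4, "backmatter/biblio.xml", {"id": "b12"}),
]

link_ref = build_link_edges(elements, "ref")
link_target = build_link_edges(elements, "target")
```

Here the from side of each pair is the element carrying ref or target, and the to side is the element carrying the matching id, which may live in a different file.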
Nothing new has been done with the spacing and the newlines.
This release has been published with the command A.publish(), a function in Text-Fabric.
v0.8.11
There is now back matter: a bibliography.
The file folder structure of the data is
proeftuin
- letter
- letter
backmatter
- biblio
Several issues have been fixed, mainly concerning newlines and whitespace.
- <choice> element: this has pure content.
Normally, the children of such elements contain fields, and we add spaces or newlines between them to separate them.
But in running text they may be used to provide alternatives, without any intention of separating words.
We make a list of such elements, and do not generate spaces there.
So far the list only contains <choice>.
- Spacing after empty elements used to get lost. This happened during the ingest of the tokens coming from the NLP.
That was a bug with a clear fix.
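As a rough illustration of the first fix, the sketch below joins the text of an element's children with a separator, except for elements on a no-space list. The names NO_SPACE_ELEMENTS and child_texts are hypothetical and the example elements are made up; the actual conversion code may work quite differently.

```python
import xml.etree.ElementTree as ET

# Elements whose children are alternatives in running text, so that no
# separator is inserted between them. Per the notes above, the list so
# far only contains <choice>. (The constant name is hypothetical.)
NO_SPACE_ELEMENTS = {"choice"}


def child_texts(xml, sep=" "):
    """Join the text of an element's children, using `sep` unless the
    element itself is in NO_SPACE_ELEMENTS."""
    elem = ET.fromstring(xml)
    parts = ["".join(child.itertext()) for child in elem]
    joiner = "" if elem.tag in NO_SPACE_ELEMENTS else sep
    return joiner.join(parts)


# Field-like children get a separator between them.
fields = child_texts(
    "<address><street>Main St</street><city>Paris</city></address>"
)

# Children of <choice> are alternatives: no separator is generated.
alternatives = child_texts(
    "<choice><abbr>Dr</abbr><expan>Doctor</expan></choice>"
)
```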
This release has been published with the command A.publish(), a function in Text-Fabric.
v0.8.10
addrLine has been added to the set of NEWLINE ELEMENTS.
For Team Text: do not forget to change token to t in your code (an earlier change).
This release has been published with the command A.publish(), a function in Text-Fabric.
v0.8.9
There are now two types of token:
- Atomic node type t: not crossing element boundaries
- Full node type token: as detected by spaCy
When you render text, use the t type.
When you do NLP, use the token type.
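To make the difference between the two types concrete, here is a small sketch that splits full tokens into atomic pieces at element boundaries. The function name, the character-offset representation, and the example word are assumptions for illustration only, not the way the data is actually produced.

```python
def split_at_boundaries(tokens, boundaries):
    """Split (start, end) character spans at the given boundary offsets.

    tokens: list of (start, end) spans, end exclusive (the full tokens).
    boundaries: offsets where an element starts or ends.
    Returns one list of atomic (start, end) pieces per full token.
    """
    result = []
    for start, end in tokens:
        # Only boundaries strictly inside the token cause a split.
        cuts = sorted(b for b in set(boundaries) if start < b < end)
        edges = [start, *cuts, end]
        result.append(list(zip(edges, edges[1:])))
    return result


text = "Mondriaan"      # e.g. Mon<hi>dri</hi>aan in a TEI source
tokens = [(0, 9)]       # one full token, as an NLP tool would see it
boundaries = [3, 6]     # offsets where <hi> opens and closes

pieces = split_at_boundaries(tokens, boundaries)
atomic = [text[s:e] for s, e in pieces[0]]
```

In this picture, the single full token corresponds to the token type, and the three pieces correspond to nodes of type t, which is why rendering should follow t and NLP should follow token.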
For Team Text: you may need to change token to t in your code.
This release has been published with the command A.publish(), a function in Text-Fabric.
v0.8.8
Processing instructions needed some fixes, both in the annotations generated for them
and in the tests.
Processing instructions translate to TF in the form of nodes of type ?target,
and to WATM as annotations of kind pi with body target (without the ?).
Prior to this release, processing instructions also generated content: all the text in the pi except
the target was also treated as text.
Now we treat it as attributes and values only.
The documentation has also been updated to describe processing instructions.
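The treatment of processing instructions as a target plus attributes and values can be sketched as follows. This is an illustration under assumptions, not the converter's code: the parse_pi helper, the regular expressions, and the example instruction with target editem are all hypothetical.

```python
import re

# <?target data?> where data is a sequence of name="value" pairs.
PI_RE = re.compile(r"<\?(?P<target>[^\s?]+)\s*(?P<data>.*?)\?>", re.S)
ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pi(pi):
    """Return (target, attributes) for a processing instruction.

    Everything after the target is read as attributes and values only;
    nothing is kept as text content.
    """
    m = PI_RE.fullmatch(pi.strip())
    if m is None:
        raise ValueError(f"not a processing instruction: {pi!r}")
    attrs = {name: dq or sq for name, dq, sq in ATTR_RE.findall(m["data"])}
    return m["target"], attrs


# Hypothetical processing instruction, for illustration only.
target, attrs = parse_pi('<?editem template="letter" version="1"?>')

tf_node_type = "?" + target   # TF: node type is the target prefixed with ?
watm_body = target            # WATM: kind pi, body is the bare target
```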
This release has been published with the command A.publish(), a function in Text-Fabric.
All derived features from #18
All work is now based on a new TEI source version: 2023-05-24.
All derived features of #18 have been generated.
However, only the content of the ref/key attributes of rs elements is in the features.
Later we will make an indirection, and grab data from related files, like artwork.xml, bio.xml, and biblio.xml.
Four derived features
There are now four derived features in the data: country, institution, manid, letterid.
Annotations in anno.json have a modified data model: there is an extra field to store the namespace: tei, tf, nlp.
Data version 0.8.5
pb elements replaced by tags.
rs key attributes no longer contain the http part, just the digits of the identifier.