< Table of Contents How to Implement Enrichment Functions >

How to Implement DataSpecs

ArchiveSpark comes with a base class for Data Specifications, called DataSpec. It takes two type parameters. The first is Raw, the type of the raw metadata that the load method reads from disk or a remote source, e.g., String for raw text. The second is the type of the records in your dataset, which can be any custom class derived from EnrichRoot. Each loaded metadata item is passed to the parse method, which must implement the logic that transforms the raw data into such a record. These records store and provide access to the metadata and also include the logic to access the actual data records.
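The load/parse flow described above can be illustrated with a small, self-contained Scala sketch. It deliberately does not use the real ArchiveSpark or Spark classes: `SketchDataSpec`, `SketchRecord`, `TextFileRecord`, and `TabSeparatedSpec` are hypothetical stand-ins, and returning an `Option` from `parse` to drop unparsable lines is an assumption made for this example, not a statement about the actual DataSpec signature.

```scala
import scala.util.Try

// Simplified stand-ins, NOT the real ArchiveSpark API. They only mirror the
// idea of a spec that loads raw metadata and parses it into records.
trait SketchRecord {
  def meta: Map[String, String]
}

trait SketchDataSpec[Raw, Record <: SketchRecord] {
  // Reads the raw metadata (here from memory instead of disk or a remote source).
  def load(): Seq[Raw]

  // Turns one raw item into a record; None drops an item that cannot be parsed
  // (assumption made for this sketch).
  def parse(raw: Raw): Option[Record]

  // Every loaded raw item is passed through parse.
  def records: Seq[Record] = load().flatMap(r => parse(r))
}

// Hypothetical record type: holds the metadata and the logic to reach the data.
case class TextFileRecord(path: String, size: Long) extends SketchRecord {
  def meta: Map[String, String] = Map("path" -> path, "size" -> size.toString)

  // Placeholder for the real data access, e.g. opening the file at `path`.
  def open(): String = s"<contents of $path>"
}

// Raw = String: each raw metadata item is one tab-separated line "path<TAB>size".
class TabSeparatedSpec(lines: Seq[String])
    extends SketchDataSpec[String, TextFileRecord] {

  def load(): Seq[String] = lines

  def parse(raw: String): Option[TextFileRecord] = raw.split("\t") match {
    case Array(path, size) =>
      Try(size.trim.toLong).toOption.map(s => TextFileRecord(path, s))
    case _ => None
  }
}
```

With this sketch, `new TabSeparatedSpec(Seq("a.txt\t12", "broken")).records` would yield a single `TextFileRecord` for `a.txt`, since the second line does not match the expected format. A real DataSpec would extend ArchiveSpark's DataSpec and derive its record class from EnrichRoot instead of these stand-ins.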

For examples, please have a look at the included DataSpecs, such as HdfsFileSpec or the external IABooksOnArchiveSpark project. For more information on how to deploy and share your DataSpecs, please read Contribute.
