ArchiveSpark Documentation

ArchiveSpark is a Java/JVM library, written in Scala and based on Apache Spark. It provides an API for easy and efficient access to web archives and other supported datasets. You can use it as part of your own project or stand-alone, through Scala's interactive shell or notebook tools such as Jupyter.

To get familiar with ArchiveSpark, and for most common use cases, we recommend using it with Jupyter. To help you get started, we provide a pre-packaged, pre-configured Docker container with ArchiveSpark and Jupyter that is ready to run with a single command: https://github.com/helgeho/ArchiveSpark-docker

To learn more about ArchiveSpark, have a look at our GitHub repository.

Basics / Background

Getting Started

API Docs

Developer Documentation