ArchiveSpark is a Java/JVM library, written in Scala and based on Apache Spark, that provides an API for easy and efficient access to web archives and other supported datasets. It can be used as part of your own project or stand-alone, from Scala's interactive shell or from notebook tools such as Jupyter.
To get familiar with ArchiveSpark, and for most common use cases, we recommend using it with Jupyter. To help you get started, we provide a pre-packaged and pre-configured Docker container with ArchiveSpark and Jupyter ready to run, just one command away: https://github.com/helgeho/ArchiveSpark-docker
To learn more about ArchiveSpark, have a look at our GitHub repository.