Paul Rogers edited this page Nov 15, 2019 · 4 revisions

Create the Scan Operator for a Storage Plugin

We are now ready to actually read data for our storage plugin. Drill will pass our sub scan from the planner to the execution Drillbit(s) in the form of serialized JSON. Our job is to:

  • Tell Drill how to associate the sub scan with a factory that will create our scan operator.
  • Create the scan operator (which Drill calls a "scan batch").
  • Implement the reader for our data source.

This example uses the enhanced vector framework (EVF) to implement the scan operator. Most existing plugins use the older ScanBatch implementation.

Operator Factory

Drill uses an operator factory (which Drill calls a "batch creator") to create the actual execution operator. Drill looks for all classes that implement BatchCreator. Then, of all such classes, Drill looks for the one whose getBatch() method takes our sub scan class as its config argument (the second parameter in the code below).

public class ExampleScanBatchCreator implements BatchCreator<ExampleSubScan> {

  @Override
  public CloseableRecordBatch getBatch(ExecutorFragmentContext context,
      ExampleSubScan config, List<RecordBatch> children)
      throws ExecutionSetupException {
    // A scan is a leaf operator, so it must not have child operators.
    Preconditions.checkArgument(children.isEmpty());
    // Placeholder until the scan operator itself is built.
    return null;
  }
}
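
The lookup described above can be modeled in plain Java. This is only a sketch of the idea, not Drill's registry code: Creator, OperatorConfig, index() and createFor() are hypothetical stand-ins, and the sketch assumes the match is made by reflecting on the config parameter of getBatch(). Drill's real implementation may locate the sub scan class differently.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only; none of these types are Drill classes.
final class CreatorLookupSketch {

  /** Stand-in for an operator definition such as a sub scan. */
  interface OperatorConfig { }

  /** Stand-in for BatchCreator: one factory per config class. */
  interface Creator<T extends OperatorConfig> {
    String getBatch(Object context, T config);
  }

  static final class ExampleSubScan implements OperatorConfig { }

  static final class ExampleCreator implements Creator<ExampleSubScan> {
    @Override
    public String getBatch(Object context, ExampleSubScan config) {
      return "scan for " + config.getClass().getSimpleName();
    }
  }

  /** Indexes each creator by the class of its getBatch() config parameter. */
  static Map<Class<?>, Creator<?>> index(List<Creator<?>> creators) {
    Map<Class<?>, Creator<?>> byConfig = new HashMap<>();
    for (Creator<?> creator : creators) {
      for (Method m : creator.getClass().getDeclaredMethods()) {
        // Skip the compiler-generated bridge method, whose config
        // parameter is the erased type OperatorConfig.
        if (m.getName().equals("getBatch") && !m.isBridge()) {
          byConfig.put(m.getParameterTypes()[1], creator);
        }
      }
    }
    return byConfig;
  }

  /** Finds the creator registered for the config's class and invokes it. */
  @SuppressWarnings("unchecked")
  static <T extends OperatorConfig> String createFor(
      Map<Class<?>, Creator<?>> registry, T config) {
    Creator<T> creator = (Creator<T>) registry.get(config.getClass());
    if (creator == null) {
      throw new IllegalStateException("No creator for " + config.getClass());
    }
    return creator.getBatch(null, config);
  }

  public static void main(String[] args) {
    Map<Class<?>, Creator<?>> registry = index(List.of(new ExampleCreator()));
    System.out.println(createFor(registry, new ExampleSubScan()));
  }
}
```

The practical consequence is that the sub scan class acts as the lookup key, so each sub scan class should have exactly one batch creator.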

Test the Operator Factory

We can now verify that our sub scan is properly serialized, sent over the wire, deserialized, and used to match our operator factory. Just set a breakpoint in the getBatch() method and run the test case.

Scan Operator

Storage plugins do not provide their own scan operator. Instead, they use an existing operator and simply provide a record reader (old-style ScanBatch) or batch reader (newer EVF).
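
The division of labor can be pictured with a small, self-contained model. Everything here is hypothetical (RowReader, GenericScan and ExampleReader are not Drill's interfaces, and real readers fill vectors rather than lists of strings). It only illustrates that the shared operator owns the open/read/close loop while the plugin contributes just the reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only; Drill's real reader interfaces differ.
final class ScanReaderSketch {

  /** What a plugin supplies: a way to read its data batch by batch. */
  interface RowReader {
    void open();
    /** Returns the next batch of rows; an empty list signals end of data. */
    List<String> nextBatch();
    void close();
  }

  /** Stand-in for the shared scan operator that drives any reader. */
  static final class GenericScan {
    private final RowReader reader;

    GenericScan(RowReader reader) {
      this.reader = reader;
    }

    List<String> run() {
      List<String> rows = new ArrayList<>();
      reader.open();
      try {
        List<String> batch = reader.nextBatch();
        while (!batch.isEmpty()) {
          rows.addAll(batch);
          batch = reader.nextBatch();
        }
      } finally {
        // The operator, not the plugin, guarantees cleanup.
        reader.close();
      }
      return rows;
    }
  }

  /** Plugin-side reader that produces a fixed number of made-up rows. */
  static final class ExampleReader implements RowReader {
    private static final int BATCH_SIZE = 2;
    private final int rowCount;
    private int next;
    boolean closed;

    ExampleReader(int rowCount) {
      this.rowCount = rowCount;
    }

    @Override
    public void open() {
      next = 0;
    }

    @Override
    public List<String> nextBatch() {
      List<String> batch = new ArrayList<>();
      while (next < rowCount && batch.size() < BATCH_SIZE) {
        batch.add("row-" + next++);
      }
      return batch;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    System.out.println(new GenericScan(new ExampleReader(3)).run());
  }
}
```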

Batch Reader

Construction of an EVF-based batch reader is described elsewhere. The work is the same whether the reader is for a format plugin or a storage plugin.

Test

You can now run a full test:
