This repository was archived by the owner on Sep 20, 2024. It is now read-only.

Architecture

Norwin edited this page May 11, 2018 · 15 revisions

Depending on the crawler we choose, a different architecture will be required.

Stream-based (StormCrawler)

  • ๐Ÿ‘ less coupled components
  • ๐Ÿ‘ highly efficient + scalable
  • ๐Ÿ‘ results available as they come in
  • ๐Ÿ‘Ž setup of this architecture could be tough

Diagram Streaming
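
To make "results available as they come in" concrete, here is a minimal sketch of a stream-based pipeline. It is not StormCrawler's API: `crawl_stream`, `fake_fetch`, and `index_results` are hypothetical names, and the fetcher is a stand-in that does no network access. The point is only that the consumer depends on nothing but an iterator of results, so it can handle each page as soon as it is fetched.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Result = Tuple[str, str]  # (url, content)


def crawl_stream(urls: Iterable[str], fetch: Callable[[str], str]) -> Iterator[Result]:
    """Yield each (url, content) pair as soon as it has been fetched."""
    for url in urls:
        yield url, fetch(url)


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real fetcher; performs no network access.
    return f"content of {url}"


def index_results(results: Iterable[Result], sink: List[Result]) -> None:
    """Consume results one at a time; knows nothing about the crawler itself."""
    for result in results:
        sink.append(result)


if __name__ == "__main__":
    indexed: List[Result] = []
    index_results(crawl_stream(["http://a.example", "http://b.example"], fake_fetch), indexed)
    print(indexed)
```

Because the generator is lazy, the first result reaches the consumer before the second URL is even fetched, which is the property the streaming architecture relies on.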

Batch mode (Nutch)

  • ๐Ÿ‘ just works
  • ๐Ÿ‘Ž results available after a job comes in -> requires a notification mechanism

Diagram Batch
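
The need for a notification mechanism in batch mode can be sketched as follows. This is not Nutch's API: `run_batch_job` and the `on_complete` callback are hypothetical names chosen for illustration, and the fetcher is passed in as a plain function. Nothing is visible to downstream code until the whole job has finished, at which point the callback is invoked once with all results.

```python
from typing import Callable, Dict, Iterable


def run_batch_job(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    on_complete: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """Fetch every URL, then notify the listener once with the full result set."""
    results = {url: fetch(url) for url in urls}
    # Downstream consumers only learn about the results here, after the job ends.
    on_complete(results)
    return results


if __name__ == "__main__":
    def notify(results: Dict[str, str]) -> None:
        # Hypothetical notification hook; a real setup might post to a queue or webhook.
        print(f"job finished with {len(results)} results")

    run_batch_job(["http://a.example", "http://b.example"], lambda u: f"content of {u}", notify)
```

In a real deployment the callback would likely be replaced by something out-of-process, since the crawl job and its consumers usually do not share a Python interpreter; that choice is left open here.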
