EbayCrawler is a Spring Boot web crawler.
It crawls all the links found at a given URL and returns a tree-like data structure that includes each URL, its HTTP status, and its child links.
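
As a rough illustration of the tree described above, a node could look like the following minimal sketch. The class name `LinkNode`, its fields, and its methods are assumptions made for this example; the project's actual model may be named and shaped differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node type for the crawl result tree (names are illustrative only).
class LinkNode {
    private final String url;          // the crawled URL
    private final int httpStatus;      // HTTP status returned for that URL
    private final List<LinkNode> children = new ArrayList<>(); // links found on the page

    LinkNode(String url, int httpStatus) {
        this.url = url;
        this.httpStatus = httpStatus;
    }

    void addChild(LinkNode child) {
        children.add(child);
    }

    String getUrl() {
        return url;
    }

    int getHttpStatus() {
        return httpStatus;
    }

    // Read-only view so callers cannot modify the tree by accident.
    List<LinkNode> getChildren() {
        return Collections.unmodifiableList(children);
    }

    // Counts this node plus all of its descendants.
    int countNodes() {
        int total = 1;
        for (LinkNode child : children) {
            total += child.countNodes();
        }
        return total;
    }
}
```

Each node holds one URL, the status it returned, and one child node per link found on that page, so the whole crawl result can be serialized from the root.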
Before you begin, ensure you have met the following requirements:
- You have installed Java 8 or later.
To install EbayCrawler, follow these steps:
- Clone the project from GitHub.
- Clean and install it with Maven (`mvn clean install`).
To use EbayCrawler, follow these steps:
- Run the jar.
- Send a request with Postman.
- For Task 3 ("Improve the performance of your CrawlLinks API so it can support high crawlingDepth values (100 and more)"), I added an in-memory cache that stores every visited link, so that when a link is visited again its tree is retrieved from the cache instead of being crawled again.
- For a more scalable solution, we could add a Redis database to hold the cache of visited links.
- Another solution, which would require more time, is to make the crawler multi-threaded.
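
The in-memory cache idea for Task 3 can be sketched roughly as follows. This is not the project's actual code: `CachingCrawler`, `Node`, and the `linkFetcher` function (which stands in for the real HTTP request and link extraction) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a crawler that caches visited links in memory.
class CachingCrawler {

    // One crawled page: its URL plus the nodes for the links found on it.
    static class Node {
        final String url;
        final List<Node> children = new ArrayList<>();

        Node(String url) {
            this.url = url;
        }
    }

    // Cache of visited links: URL -> node already built for that URL.
    private final Map<String, Node> cache = new HashMap<>();
    // Assumed stand-in for "fetch the page and extract its links".
    private final Function<String, List<String>> linkFetcher;
    private int fetchCount = 0;

    CachingCrawler(Function<String, List<String>> linkFetcher) {
        this.linkFetcher = linkFetcher;
    }

    Node crawl(String url, int depth) {
        Node cached = cache.get(url);
        if (cached != null) {
            return cached; // already visited: reuse the node, no new fetch
        }
        Node node = new Node(url);
        cache.put(url, node); // register before recursing so link cycles terminate
        fetchCount++;
        List<String> links = linkFetcher.apply(url);
        if (depth > 0) {
            for (String link : links) {
                node.children.add(crawl(link, depth - 1));
            }
        }
        return node;
    }

    // Number of pages actually fetched (cache misses).
    int getFetchCount() {
        return fetchCount;
    }
}
```

With this approach each distinct URL is fetched at most once, however large `depth` is. Two simplifications are worth noting: a revisited URL returns the same node object, so pages that link back to each other produce shared nodes rather than a strict tree, and a node first cached near the depth limit is reused as-is even if a later visit would have allowed a deeper crawl. The sketch is single-threaded; a multi-threaded variant would need a thread-safe map such as `ConcurrentHashMap`.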