2020-12-20 11:11:49 -08:00

Daily Stormer Utilities

This is a little project to help me get out of my funk.

Goals

  • Crawl all Daily Stormer articles.
  • Save crawl results as JSON files.
  • Be able to run the crawler again to get new articles, which means keeping track of what has already been crawled.
  • Make it possible to do a full text search on those articles.
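
The bookkeeping behind the JSON and re-run goals could look roughly like the sketch below. It is only an illustration, not the project's actual code: `CrawlLog`, `saveArticle`, the file layout, and the shape of the `article` object are all assumptions made up for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: the set of already-crawled article URLs, persisted as a JSON array.
class CrawlLog {
  constructor(file) {
    this.file = file;
    // Load earlier state if the file exists, otherwise start empty.
    this.seen = new Set(
      fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : []
    );
  }
  has(url) { return this.seen.has(url); }
  add(url) { this.seen.add(url); }
  save() {
    fs.mkdirSync(path.dirname(this.file), { recursive: true });
    fs.writeFileSync(this.file, JSON.stringify([...this.seen], null, 2));
  }
}

// Hypothetical helper: write one article to its own JSON file unless it was
// already crawled. Returns true if a file was written, false if skipped.
function saveArticle(log, dir, article) {
  if (log.has(article.url)) return false;
  fs.mkdirSync(dir, { recursive: true });
  // Assumed naming scheme: the URL-encoded article URL is the file name.
  const name = encodeURIComponent(article.url) + '.json';
  fs.writeFileSync(path.join(dir, name), JSON.stringify(article, null, 2));
  log.add(article.url);
  return true;
}
```

Calling `log.save()` at the end of a run is what lets the next run skip everything that has already been stored.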

Questions

Crawling

The recursive crawl that fetches old articles and the crawl that fetches updates should be separate. The updater can assume that a full crawl has already been done and just pull new articles from the RSS feed. The recursive crawl doesn't have that luxury: it must find past articles and treat newly discovered tags and categories as new crawl targets.
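
To make the difference concrete, here is a rough sketch of the two crawl modes. Everything named here is an assumption for illustration: `fetchPage(url)` is a hypothetical async function that returns `{ article, links }` (with `article` set to `null` on tag and category pages), and `feedUrls` stands in for whatever list of article URLs has been read from the RSS feed.

```javascript
// Recursive crawl: breadth-first over a frontier of URLs. Every link found on
// a page (articles, tags, categories) becomes a new crawl target.
async function recursiveCrawl(startUrls, fetchPage, seen = new Set()) {
  const frontier = [...startUrls];
  const articles = [];
  while (frontier.length > 0) {
    const url = frontier.shift();
    if (seen.has(url)) continue; // already visited in this or an earlier run
    seen.add(url);
    const page = await fetchPage(url);
    if (page.article) articles.push(page.article);
    for (const link of page.links) {
      if (!seen.has(link)) frontier.push(link);
    }
  }
  return articles;
}

// Update crawl: no link discovery at all. It only fetches the feed's URLs
// that are not in `seen` yet.
async function updateCrawl(feedUrls, fetchPage, seen) {
  const articles = [];
  for (const url of feedUrls) {
    if (seen.has(url)) continue;
    seen.add(url);
    const page = await fetchPage(url);
    if (page.article) articles.push(page.article);
  }
  return articles;
}
```

Both functions share the same `seen` set, so an update run after a full crawl only touches articles that are actually new.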

Languages
JavaScript 66.3%
EJS 32.9%
Shell 0.8%