# Daily Stormer Utilities

This is a little project to help me get out of my funk.

## Goals

* Crawl all Daily Stormer articles.
* Save crawl results as JSON files.
* Be able to run the crawler again to get new articles.
  * This implies keeping track of what I've already crawled.
* Make it possible to run full-text searches over those articles.
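One way to cover the JSON and "already crawled" goals together is sketched below. It assumes one JSON file per article, named by a SHA-256 hash of the article's URL, so that a file's existence doubles as the record that the article was crawled. `ArticleStore`, its methods, and the `url`/`title`/`body` fields are hypothetical names for illustration, not part of the project.

```python
import hashlib
import json
from pathlib import Path


class ArticleStore:
    """Hypothetical store: one JSON file per article, named by a hash of its URL."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        # Hashing maps any URL to a fixed-length, filesystem-safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.json"

    def has(self, url: str) -> bool:
        # An article counts as "already crawled" if its JSON file exists.
        return self._path(url).exists()

    def save(self, article: dict) -> bool:
        """Write the article unless it is already stored; return True if written."""
        path = self._path(article["url"])
        if path.exists():
            return False
        # Write to a temporary file, then rename it into place, so an
        # interrupted run never leaves a partial file that looks finished.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(article, ensure_ascii=False, indent=2), encoding="utf-8")
        tmp.replace(path)
        return True

    def load_all(self):
        # Yield every stored article, e.g. to build a search index.
        for path in sorted(self.root.glob("*.json")):
            yield json.loads(path.read_text(encoding="utf-8"))
```

Because the files themselves are the state, re-running the crawler needs no separate database: calling `has(url)` before downloading is enough to skip known articles.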
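For the full-text search goal, a minimal in-memory inverted index (each term mapped to the set of article URLs containing it) is sketched below as a stand-in technique; SQLite's FTS5 extension would be a heavier-duty alternative. The class name, the lowercase word tokenizer, and the AND-only query semantics are all assumptions.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; no stemming or stop-word removal.
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Hypothetical minimal full-text index: term -> set of article URLs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, article: dict) -> None:
        # Index title and body together under the article's URL.
        text = f"{article.get('title', '')} {article.get('body', '')}"
        for term in tokenize(text):
            self.postings[term].add(article["url"])

    def search(self, query: str) -> set[str]:
        """Return URLs of articles containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Use .get so lookups of unknown terms do not insert empty entries.
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results
```

Feeding every saved article through `add()` once at startup rebuilds the index, so nothing beyond the JSON files needs to be persisted.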
## Questions

### Crawling

The recursive crawl that fetches old articles and the crawl that picks up updates should be separate. The updater can assume a full crawl has already been done and simply pull new articles from the RSS feed. The recursive crawl doesn't have that luxury: it must find past articles on its own and treat newly discovered tags and categories as new crawl targets.
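The recursive crawl described above amounts to a breadth-first traversal with a visited set. The sketch below assumes tag and category pages live under `/tag/` and `/category/` paths, and takes a hypothetical `fetch_links(url)` callback (download a page, return its absolute links) so the traversal logic can be shown without any HTTP code. Both are assumptions, not the site's confirmed layout.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse

# Hypothetical URL convention: tag and category listing pages live under these paths.
LISTING_PREFIXES = ("/tag/", "/category/")


def is_listing(url: str) -> bool:
    return urlparse(url).path.startswith(LISTING_PREFIXES)


def recursive_crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
) -> list[str]:
    """Breadth-first crawl returning article URLs in discovery order.

    fetch_links(url) is a hypothetical callback that downloads a page and
    returns the absolute URLs it links to.
    """
    frontier = deque(seeds)
    visited = set(frontier)
    hosts = {urlparse(url).netloc for url in visited}
    articles: list[str] = []
    while frontier:
        url = frontier.popleft()
        if not is_listing(url):
            articles.append(url)
        for link in fetch_links(url):
            # Stay on the seed host(s) and never queue a URL twice; this is
            # how newly discovered tag and category pages become crawl targets.
            if urlparse(link).netloc in hosts and link not in visited:
                visited.add(link)
                frontier.append(link)
    return articles
```

Every page is expanded, articles included, because an article may be the only place a given tag or category is linked from.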
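The updater side could look like the sketch below: parse the RSS feed and keep only items whose links have not been crawled yet. It assumes a standard RSS 2.0 feed (a `<channel>` containing `<item>` elements with `<link>`, `<title>` and `<pubDate>`) and a caller-supplied `already_crawled(url)` check; `fetch_feed` and `new_items_from_rss` are hypothetical names.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Union


def fetch_feed(url: str, timeout: float = 30.0) -> bytes:
    # Download the raw feed; ET.fromstring accepts bytes and honours the
    # encoding declared in the XML prolog.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def new_items_from_rss(
    feed_xml: Union[str, bytes],
    already_crawled: Callable[[str], bool],
) -> list[dict]:
    """Return feed items whose links have not been crawled yet, oldest first.

    Assumes a standard RSS 2.0 document (<rss><channel><item>...).
    """
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link or already_crawled(link):
            continue
        items.append({
            "url": link,
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    # Many feeds list newest first; reverse so articles are saved oldest first.
    items.reverse()
    return items
```

Since a feed typically carries only the most recent items, an updater that runs too rarely would miss articles that have already dropped off it; the recursive crawl remains the fallback in that case.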