Strategy for how to crawl/index frequently updated webpages?

This is actually a Stack Overflow question. Quote: Question: I'm trying to build a very small, niche search engine, using Nutch to crawl specific sites. Some of the sites are news/blog sites. If I cra