Nutch is an effort to build an open source search engine. It uses Lucene for the search and index component. The fetcher ("robot" or "web crawler") has been written from scratch solely for this project. Nutch has a highly modular...
See more »
Nutch is an effort to build an open source search engine. It uses Lucene for the search and index component. The fetcher ("robot" or "web crawler") has been written from scratch solely for this project. Nutch has a highly modular architecture allowing developers to create plugins for the following activities: media-type parsing, data retrieval, querying and clustering. Doug Cutting is the lead developer. As of June 2005, Nutch has graduated from the Apache Incubator, and is now a subproject of Lucene. It is coded completely in the Java programming language, but data is written in language-independent formats. In June 2003, there was a successful 100 million page demo system. To meet the multimachine processing needs of the crawl and index tasks, the Nutch project has also implemented a MapReduce facility and a distributed file system. These two facilities have been spun out into their own subproject called Hadoop.
See less »
Kaboodle will send you a newsletter and updates from your friends. You can unsubscribe at any time. Kaboodle does not sell or share your email address or personal information with anyone.
Kaboodle requires all users to provide their real date of birth as both a safety precaution and as a means
of preserving the integrity of the site. You will be able to hide this information from your profile if you wish.
Added by 1 people