Multiple-Goal Heuristic Search
Abstract
This paper presents a new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources. We show the inadequacy of traditional distance-estimation heuristics for tasks of this type and present alternative heuristics that are more appropriate for multiple-goal search. In particular, we introduce the marginal-utility heuristic, which estimates the cost and the benefit of exploring a subtree below a search node. We developed two methods for online learning of the marginal-utility heuristic. One is based on local similarity of the partial marginal utility of sibling nodes, and the other generalizes marginal-utility over the state feature space. We apply our adaptive and non-adaptive multiple-goal search algorithms to several problems, including focused crawling, and show their superiority over existing methods.
Introduction
Internet search engines build their indices using brute-force crawlers that attempt to scan large portions of the Web. Due to the size of the Web, these crawlers require several weeks to complete one scan, even when using very high computational power and bandwidth (Brin & Page, 1998; Douglis, Feldmann, Krishnamurthy, & Mogul, 1997), and they still leave a large part of the Web uncovered (Lawrence & Giles, 1998; Najork & Wiener, 1998). Many times, however, it is necessary to retrieve only a small portion of Web pages dealing with a specific topic or satisfying various user criteria. Using brute-force crawlers for such a task would require enormous resources, most of which would be wasted on irrelevant pages. Is it possible to design a focused crawler that would scan only relevant parts of the Web and retrieve the desired pages using far fewer resources than the exhaustive crawlers? Since the Web can be viewed as a large graph (Cooper & Frieze, 2002; Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, & Upfal, 2000; Pandurangan, Raghavan, & Upfal, 2002), where pages are nodes and links are arcs, we may look for a solution to the above problem in the field of heuristic graph-search algorithms. A quick analysis, however, reveals that the problem definition assumed by the designers of heuristic search algorithms is inappropriate for focused crawling, which uses an entirely different setup. The crucial difference between heuristic search and focused crawling is the success criterion. In both setups there is a set of goal states. In the heuristic search setup, however, the search is completed as soon as a single goal state is found, while in the focused crawling setup, the search continues to reach as many goal states as possible within the given resources.
Changing the success criterion of existing search algorithms is not enough. Most informed search algorithms are based on a heuristic function that estimates the distance from a node to the nearest goal node. Such heuristics are usually not appropriate for multiple-goal search.
Sponsored link:
dmitry@cs.technion.ac.il
Shaul Markovitch
shaulm@cs.technion.ac.il
Computer Science Department
Technion, Haifa 32000, Israel
Category: Artificial Intelligence
- 8-Valent Fuzzy Logic for Iris Recognition and Biometry
- A Heuristic Search Planner for First-Order MDPs
- A Visual Entity-Relationship Model for Constraint-Based
- Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processe
- Asynchronous Partial Overlay: A New Algorithm for Solving Distributed Constraint Satisfaction Problems

Artificial Intelligence