Main algorithm

  1. Input options and necessary data (from command line)
  2. Start recursive URL retrieval algorithm
       retrieveData(URL, maxDepth, Graph) {
         get current timestamp;
         store timestamp in Graph;
         retrieveData(URL, maxDepth, Graph, 0);
       }
       store Graph on disk (once the top-level call has returned);
       
       retrieveData(URL, maxDepth, Graph, currentDepth) {
         if (currentDepth < maxDepth) {
           retrieve URL;
           parse HTML, get list of sub-URLs;
           for each sub-URL in list {
             store arc (URL, sub-URL) in Graph;
             retrieveData(sub-URL, maxDepth, Graph, currentDepth + 1);
           }
         }
       }
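
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the reference implementation: the graph is assumed to be a plain dictionary with a "timestamp" entry and a list of "arcs", and the names LinkParser, fetch_html, retrieve_recursive and store_graph are hypothetical. The fetch parameter is an addition that lets the page download be replaced, for example by an in-memory table when testing.

```python
import json
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href targets of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Assumed default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def retrieve_recursive(url, max_depth, graph, current_depth, fetch):
    """Four-argument retrieveData: follow links while below max_depth."""
    if current_depth < max_depth:
        try:
            html = fetch(url)
        except (OSError, ValueError):
            return  # assumption: pages that cannot be retrieved are skipped
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            sub_url = urljoin(url, href)  # resolve relative links
            graph["arcs"].append((url, sub_url))
            retrieve_recursive(sub_url, max_depth, graph,
                               current_depth + 1, fetch)


def retrieve_data(url, max_depth, graph, fetch=fetch_html):
    """Three-argument retrieveData: stamp the graph, then start at depth 0."""
    graph["timestamp"] = time.time()
    retrieve_recursive(url, max_depth, graph, 0, fetch)


def store_graph(graph, path):
    """Store the graph on disk; JSON is an assumed format."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(graph, handle)
```

As in the pseudocode, no record of visited pages is kept, so a page reachable along several paths is retrieved more than once; the recursion still terminates because every call increases the depth by one.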
    



Leo Liberti 2008-01-12