This class can be used to crawl a site and retrieve the URLs of all its links. It retrieves a page of the site and follows its links recursively to collect all the site URLs. The class can restrict crawling to URLs with a given extension, and it avoids accessing pages disallowed in the site's robots.txt file or pages marked with the noindex or nofollow robots meta tags.
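
The behaviour described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the class's actual code. The names `LinkParser`, `crawl`, `fetch`, `robots_txt`, `extensions` and `agent` are hypothetical, chosen for illustration. The sketch assumes the crawl stays on the start URL's host, that a noindex page is left out of the results, and that links on a nofollow page are not followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects link targets and the directives of a robots meta tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.directives.update(d.strip() for d in content.split(","))


def crawl(start_url, fetch, robots_txt="", extensions=None, agent="*"):
    """Return the sorted URLs reachable from start_url.

    fetch(url) is a hypothetical callable that returns the page HTML as a
    string, or None when the page cannot be retrieved. Passing it in keeps
    the sketch independent of any particular HTTP client.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(start_url).netloc
    seen, found, queue = {start_url}, [], deque([start_url])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed by robots.txt: never requested
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        if "noindex" not in parser.directives:
            found.append(url)
        if "nofollow" in parser.directives:
            continue  # assumption: ignore every link on a nofollow page
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            parts = urlparse(link)
            if parts.netloc != host or link in seen:
                continue
            if extensions and not parts.path.endswith(tuple(extensions)):
                continue  # outside the requested extensions
            seen.add(link)
            queue.append(link)
    return sorted(found)
```

The sketch uses a queue rather than recursion, so deep sites cannot exhaust the call stack, and the `seen` set keeps pages that link to each other from being visited twice. To try it against a real site, `fetch` could wrap `urllib.request.urlopen`, and `robots_txt` could hold the text downloaded from the site's `/robots.txt`.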