OutWit Hub: A datafiend’s best friend.
I have an unnatural propensity towards stashing away data regardless of value. My data folders are much like your crazy relative’s attic: a disgusting mountain of unidentifiable stuff that will never be used. Some call this a problem, but Moore’s Law is exponentially proving them wrong. My attic is 750 GB large. With OutWit Hub I am going to ignore my “problem” and download the internet.
OutWit Hub is a data collection automator that works as a Firefox extension. The application has a multitude of tools that allow you to pick at the internet like it is an estate sale. Let’s say that you are interested in owning all the pictures in your friend’s Google Picasa account. Hub’s Images tool will allow you to crawl through subfolders and download images en masse. The same process works for documents: the tool automatically crawls through a predetermined list of pages and downloads the documents of interest.
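For the curious, here is roughly what the image-gathering step boils down to. This is a minimal Python sketch of the general idea, not how OutWit Hub itself is implemented; the class name, the function name, the sample page and the example.com URLs are all made up for illustration. It only collects the image addresses from a page's source and leaves the actual downloading out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects the address of every <img> tag found in a page's source."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Turn relative paths into full addresses.
                self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    """Return the absolute URLs of all images referenced in the HTML."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Hypothetical page source standing in for a friend's photo album.
SAMPLE_PAGE = (
    '<html><body>'
    '<img src="/albums/trip/beach.jpg">'
    '<img src="http://example.com/cat.png">'
    '<p>no image here</p>'
    '</body></html>'
)

urls = collect_image_urls(SAMPLE_PAGE, "http://example.com/albums/")
print(urls)
```

Run the same function over every page in a list and you have the "crawl" part; feeding each address to a downloader would be the "en masse" part.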
I believe OW Hub’s core power lies in the ability to easily build scripts that read a page’s source code and pull the information buried in it out into tabular format. Let’s say that you are using realtravel.com to find hotels. You are able to narrow a search to three-star hotels near downtown. The information that realtravel offers you is valuable but is only accessible through their website. You want a spreadsheet of these hotels for future use offline. Using the source filters you can search the page’s code for the markup that surrounds each piece of information you want. When you have built a “scraper” that identifies the correct information (name, rank, price, latitude and longitude, reviews), Hub will automate the process of exporting the extracted data to a tabular format (SQL, Excel, CSV).
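To make the scraper idea concrete, here is a small Python sketch of the same technique: for each field, name the bit of source code that comes right before the value and the bit that comes right after, then write whatever sits in between to a CSV. Everything specific here is an assumption for illustration: the HTML snippet, the class names, the hotel data and the function names are invented, and this is not realtravel's actual markup or OutWit Hub's own scraper format.

```python
import csv
import io
import re


def between(text, before, after):
    """Return the text found between two markers, or "" if they are absent."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else ""


# Hypothetical markers: field name -> (text just before, text just after).
FIELDS = {
    "name": ('<h2 class="name">', "</h2>"),
    "stars": ('<span class="stars">', "</span>"),
    "price": ('<span class="price">', "</span>"),
    "lat": ('data-lat="', '"'),
    "long": ('data-long="', '"'),
}


def scrape_hotels(html):
    """Split the source into one block per hotel and extract each field."""
    # Assumes hotel blocks contain no nested <div> elements.
    blocks = re.findall(r'<div class="hotel".*?</div>', html, re.DOTALL)
    return [
        {field: between(block, before, after)
         for field, (before, after) in FIELDS.items()}
        for block in blocks
    ]


def to_csv(rows):
    """Render the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented page source standing in for a hotel search result.
SAMPLE_SOURCE = """
<div class="hotel" data-lat="47.61" data-long="-122.33">
  <h2 class="name">Harbor Inn</h2>
  <span class="stars">3</span>
  <span class="price">$129</span>
</div>
<div class="hotel" data-lat="47.60" data-long="-122.34">
  <h2 class="name">Pioneer Lodge</h2>
  <span class="stars">3</span>
  <span class="price">$98</span>
</div>
"""

hotels = scrape_hotels(SAMPLE_SOURCE)
csv_text = to_csv(hotels)
print(csv_text)
```

The marker pairs are the whole "scraper": change the page and you only have to change the strings in FIELDS, not the code around them.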
Coding is not my forte, and honestly this tool was daunting at first. But after just a few hours of tinkering I was able to use it to grow my stash of data. Next time you have an insatiable desire for data locked behind some HTML code, I recommend giving this tool a try.