I am a big fan of Yahoo Pipes (when they are not silently deleting my code over bogus copyright complaints :) Anyway, for all the benefits of using Pipes there is one notably lacking area: scraping data from HTML pages. There are limits on data size, and it often takes plenty of regexps to get the input into usable form.
Dapp Factory is an online tool that excels at flexible scraping of web pages into numerous formats.
What it does
There are two parts to the process:
- Choosing and processing your input, thus creating a Dapp.
- Using the Dapp to convert the input into any of the supported output formats.
Choosing input is relatively easy: Dapper loads the target web page inside a frame and you include or exclude the areas you want simply by clicking. Creating rules for dynamic pages may require doing this for a few different pages and taking into account search forms and such.
- easy and visual setup of input;
- multiple output formats, from feeds and widgets to plain-text programming formats (XML, JSON, CSV) and email alerts;
- APIs for developers to add their own output formats and fetch data from Dapps (mostly PHP).
Working with Dapper is easier than with Yahoo Pipes, but there is also less control. There are basic filtering capabilities here and there, but mostly you are boxed in by what is on the page and what functions the specific output offers.
Another issue is reliability. Service to free accounts is provided on a best-effort basis. In practice this means getting output can be somewhat slow, and occasionally it fails altogether with an error message. This caused John’s Background Switcher to crash, for example.
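If you consume Dapp output from your own code, it probably pays to assume the request can be slow or fail. Below is a minimal Python sketch of that idea, not anything Dapper itself provides: `fetch_url` and `fetch_with_retries` are names I made up, and the retry count, delay and fallback value are arbitrary assumptions. It retries a few times with a growing pause and returns a fallback instead of crashing.

```python
import time
import urllib.request


def fetch_url(url, timeout=10):
    """Download the body of `url` as bytes; the timeout guards against slow responses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, fetch=fetch_url, attempts=3, delay=1.0, fallback=None):
    """Try `fetch(url)` up to `attempts` times, then give up and return `fallback`.

    Hypothetical helper for illustration. `fetch` is injectable so the
    retry logic can be exercised without touching the network.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # urllib's URLError and socket timeouts are both OSError subclasses.
            if attempt < attempts - 1:
                # Wait longer after each failure: delay, 2*delay, 4*delay, ...
                time.sleep(delay * (2 ** attempt))
    return fallback
```

You would call it with whatever feed URL your Dapp gives you (placeholder here), for example `fetch_with_retries("http://example.invalid/my-dapp-feed", fallback=b"")`, and treat an empty result as "keep what you had last time" rather than as a fatal error.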
A convenient and easy-to-use service, slightly spoiled by its lack of reliability. For me it fits nicely between the simplicity of Page2RSS and the complexity of Yahoo Pipes.