Thanks, and excellent points.
I just wanted to know whether anyone is working this way and whether it is a
common use case.
On Tue, Feb 19, 2013 at 7:39 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Good points, sir, especially the second one. How will the splits get
> generated?
>
> Warm Regards,
> Tariq
>
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
>
>> I don't know of any input format that will do this out of the box. But
>> it should not be that hard to write one. There are two big issues here.
>>
>>
>> 1. The data you are reading from the API really needs to be static,
>> or you could get some very odd inconsistencies. For example, a node dies
>> after a map task has finished but before all of the reducers have fetched
>> its output, so the map task is rerun, and some reducers end up with old
>> data while others have new data. This is the main reason to download the
>> data before processing it. You can work around this by using the input
>> format to run a map-only job that writes the data out to a file before
>> processing it the rest of the way.
>> 2. You need a good way to partition the data from the API. This can
>> be difficult unless the REST API provides a logical way to split this up.
>>
>> --Bobby
>>
>> From: Yaron Gonen <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Tuesday, February 19, 2013 4:49 AM
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: InputFormat for some REST api
>>
>> Hi,
>> Do you know of any InputFormat implemented for a REST API provider?
>> Usually, when one needs to process data that is accessible only via REST,
>> one should try to download the data first somehow, but what if you cannot
>> download it?
>>
>> thanks
>>
>
>
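
On the question of how the splits would get generated: the following is only a rough sketch, assuming the REST API exposes a total record count and accepts offset/limit paging parameters (neither is stated in this thread). The names RestSplitPlanner, PageRange and planSplits are made up for illustration and are not part of Hadoop. It shows one way to carve the record space into near-equal, non-overlapping ranges, one per map task.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans page ranges for a REST API that pages by offset/limit. */
public class RestSplitPlanner {

    /** One unit of work: records [offset, offset + limit) of the remote collection. */
    public static final class PageRange {
        public final long offset;
        public final long limit;

        public PageRange(long offset, long limit) {
            this.offset = offset;
            this.limit = limit;
        }

        /** Rendered as query parameters; the parameter names are an assumption. */
        @Override
        public String toString() {
            return "offset=" + offset + "&limit=" + limit;
        }
    }

    /**
     * Divides totalRecords into at most numSplits contiguous ranges whose
     * sizes differ by at most one record. Empty ranges are never emitted.
     */
    public static List<PageRange> planSplits(long totalRecords, int numSplits) {
        if (totalRecords < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and numSplits > 0");
        }
        List<PageRange> splits = new ArrayList<>();
        long base = totalRecords / numSplits;      // minimum size of each range
        long remainder = totalRecords % numSplits; // the first 'remainder' ranges get one extra
        long offset = 0;
        for (int i = 0; i < numSplits && offset < totalRecords; i++) {
            long limit = base + (i < remainder ? 1 : 0);
            splits.add(new PageRange(offset, limit));
            offset += limit;
        }
        return splits;
    }
}
```

In an actual job, each range would presumably be wrapped in a custom InputSplit and returned from the InputFormat's getSplits method, with a RecordReader issuing the paged requests. The ranges only stay meaningful if the data behind the API does not change while the job runs, which is the first issue Bobby raised.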