MapReduce, mail # user - InputFormat for some REST api


Re: InputFormat for some REST api
Yaron Gonen 2013-02-19, 19:28
Thanks, and excellent points.
I just wanted to know if someone is working this way and if it is a common
use-case.
On Tue, Feb 19, 2013 at 7:39 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Good points, sir, especially the second one. How will the splits get
> generated?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
>
>> I don't know of any input format that will do this out of the box.  But
>> it should not be that hard to write one.  There are two big issues here.
>>
>>
>>    1. The data you are reading from the API really needs to be static,
>>    or you could get some very odd inconsistencies. For example, if a node
>>    dies after a map task has finished but before all of the reducers have
>>    fetched its output, the map task is rerun, and some of the reducers end
>>    up with old data while others get new data.  This is the main reason to
>>    download the data before processing it.  You can work around this by
>>    using the input format to run a map-only job that writes the data out to
>>    a file before processing it the rest of the way.
>>    2. You need a good way to partition the data from the API.  This can
>>    be difficult unless the REST API provides a logical way to split the
>>    data up.
>>
>> --Bobby
>>
>> From: Yaron Gonen <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Tuesday, February 19, 2013 4:49 AM
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: InputFormat for some REST api
>>
>> Hi,
>> Do you know of any InputFormat implemented for some REST api provider?
>> Usually when one needs to process data that is accessible only by REST,
>> one should try to download the data first somehow, but what if you cannot
>> download it?
>>
>> thanks
>>
>
>
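
To make the question about split generation a bit more concrete, here is a minimal sketch of the partitioning step from point 2 of the quoted reply. It assumes the REST API is paginated with 1-based page numbers and reports its total page count up front; the thread names no specific API, so that is purely an assumption. The names RestPageSplits, PageRange and computeSplits are hypothetical. The sketch only computes the page ranges; in a real job a custom InputFormat's getSplits() would wrap each range in an InputSplit, and a RecordReader would fetch the pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides the pages of a paginated REST resource into
// contiguous ranges, one range per map task.
final class RestPageSplits {

    // An inclusive range of 1-based page numbers.
    static final class PageRange {
        final long firstPage;
        final long lastPage;

        PageRange(long firstPage, long lastPage) {
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }

        @Override
        public String toString() {
            return "pages " + firstPage + ".." + lastPage;
        }
    }

    // Spreads totalPages over at most desiredSplits ranges whose sizes
    // differ by at most one page. Returns an empty list when there are no pages.
    static List<PageRange> computeSplits(long totalPages, int desiredSplits) {
        if (totalPages < 0 || desiredSplits <= 0) {
            throw new IllegalArgumentException(
                "totalPages must be >= 0 and desiredSplits must be > 0");
        }
        List<PageRange> splits = new ArrayList<>();
        if (totalPages == 0) {
            return splits;
        }
        // Never create more splits than there are pages.
        long count = Math.min(desiredSplits, totalPages);
        long base = totalPages / count;   // minimum pages per split
        long extra = totalPages % count;  // the first 'extra' splits get one more page
        long start = 1;
        for (long i = 0; i < count; i++) {
            long size = base + (i < extra ? 1 : 0);
            splits.add(new PageRange(start, start + size - 1));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Example: 10 pages shared among 3 map tasks.
        for (PageRange range : computeSplits(10, 3)) {
            System.out.println(range);
        }
    }
}
```

Note that this addresses only the partitioning concern. The consistency concern from point 1 still applies: if the API's data can change between task attempts, the page ranges may not return the same records twice, which is why the reply suggests a map-only job that first writes the data out to a file.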