
MapReduce, mail # user - InputFormat for some REST api


Yaron Gonen 2013-02-19, 10:49
Robert Evans 2013-02-19, 17:34
Re: InputFormat for some REST api
Mohammad Tariq 2013-02-19, 17:39
Good points, sir, especially the second one. How will the splits be
generated?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <[EMAIL PROTECTED]> wrote:

> I don't know of any input format that will do this out of the box.  But it
> should not be that hard to write one.  There are two big issues here.
>
>
>    1. The data you are reading from the API really needs to be static, or
>    you could get some very odd inconsistencies. For example, if a node dies
>    after a map task has finished but before all of the reducers have fetched
>    its output, the map task is rerun, and some of the reducers end up with old
>    data while others get new data. This is the main reason to download the
>    data before processing it. You can work around this by using the input
>    format in a map-only job that writes the data out to a file, and then
>    processing that file the rest of the way.
>    2. You need a good way to partition the data from the API.  This can
>    be difficult unless the REST API provides a logical way to split this up.
>
> --Bobby
>
> From: Yaron Gonen <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Tuesday, February 19, 2013 4:49 AM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: InputFormat for some REST api
>
> Hi,
> Do you know of any InputFormat implemented for a REST API provider?
> Usually, when one needs to process data that is accessible only via REST,
> one should try to download the data first somehow, but what if you cannot
> download it?
>
> thanks
>
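Bobby's second point, and Tariq's question, come down to how the remote data gets carved into splits. Below is a minimal plain-Java sketch of that split computation. It deliberately avoids Hadoop types so it runs standalone. It assumes a REST API that reports a total record count, supports offset/limit paging and can serve a snapshot as of a given time.

The class `RestSplitPlanner`, the `RestPageSplit` type, the `https://api.example.com/records` URL and the query parameters `offset`, `limit` and `as_of` are all hypothetical. The fixed `asOf` value is aimed at Bobby's first point: if the API can pin a snapshot, a re-executed map task re-fetches exactly the same records.

In a real `org.apache.hadoop.mapreduce.InputFormat`, the following pieces would be needed:
- this logic would live in `getSplits(JobContext)`;
- each split would extend `InputSplit` and implement `Writable` so it can be shipped to tasks;
- a `RecordReader` would fetch each split's URL.

```java
import java.util.ArrayList;
import java.util.List;

public class RestSplitPlanner {

    /** One unit of work: a fixed, contiguous range of records from a hypothetical REST API. */
    static final class RestPageSplit {
        final long offset; // index of the first record covered by this split
        final long limit;  // number of records in this split
        final long asOf;   // hypothetical snapshot timestamp, identical for every split and every retry

        RestPageSplit(long offset, long limit, long asOf) {
            this.offset = offset;
            this.limit = limit;
            this.asOf = asOf;
        }

        /** Builds the URL a record reader would fetch; the query parameter names are assumptions. */
        String url(String baseUrl) {
            return baseUrl + "?offset=" + offset + "&limit=" + limit + "&as_of=" + asOf;
        }
    }

    /**
     * Divides totalRecords into at most numSplits contiguous ranges whose sizes differ by at most one.
     * This mirrors what InputFormat.getSplits() would compute, without the Hadoop types.
     */
    static List<RestPageSplit> planSplits(long totalRecords, int numSplits, long asOf) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<RestPageSplit> splits = new ArrayList<>();
        long base = totalRecords / numSplits;
        long remainder = totalRecords % numSplits;
        long offset = 0;
        // Stop early when there are fewer records than requested splits, so no split is empty.
        for (int i = 0; i < numSplits && offset < totalRecords; i++) {
            // The first `remainder` splits take one extra record to absorb the leftover.
            long size = base + (i < remainder ? 1 : 0);
            splits.add(new RestPageSplit(offset, size, asOf));
            offset += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        long asOf = 1361270940L; // example epoch-seconds value, fixed once at job submission
        for (RestPageSplit s : planSplits(10, 3, asOf)) {
            System.out.println(s.url("https://api.example.com/records"));
        }
    }
}
```

Running `main` prints three URLs, covering records 0 to 3, 4 to 6 and 7 to 9. Contiguous ranges of near-equal size keep the map tasks balanced. If the API cannot pin a snapshot, staging the data to a file first, as Bobby suggests, remains the safer route.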
Yaron Gonen 2013-02-19, 19:28
Alex Thieme 2013-02-19, 19:48