Yaron Gonen 2013-02-19, 10:49
Robert Evans 2013-02-19, 17:34
Re: InputFormat for some REST api
Mohammad Tariq 2013-02-19, 17:39
Good points, sir. Especially the second one. How will the splits get
generated?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <[EMAIL PROTECTED]> wrote:

> I don't know of any input format that will do this out of the box. But it
> should not be that hard to write one. There are two big issues here.
>
> 1. The data you are reading from the API really needs to be static, or
>    you could get some very odd inconsistencies. For example, a node dies
>    after a map task has finished and not all of the reducers got the data,
>    so the map task is rerun and some of the reducers have old data and
>    some of the reducers have new data. This is the main reason to download
>    the data before processing it. You can work around this by using the
>    input format to run a map-only job that writes the data out to a file
>    before processing it the rest of the way.
> 2. You need a good way to partition the data from the API. This can
>    be difficult unless the REST API provides a logical way to split it up.
>
> --Bobby
>
> From: Yaron Gonen <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Tuesday, February 19, 2013 4:49 AM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: InputFormat for some REST api
>
> Hi,
> Do you know of any InputFormat implemented for some REST api provider?
> Usually when one needs to process data that is accessible only by REST,
> one should try to download the data first somehow, but what if you cannot
> download it?
>
> thanks
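
On the question of how the splits could be generated, here is a rough sketch of one possible approach. It assumes the REST API is paginated and reports its total page count up front; the thread states neither, so treat both as assumptions. `PageSplit` and `plan_splits` are hypothetical names, and the sketch is in Python only for brevity. A real Hadoop InputFormat would do the equivalent in Java inside its `getSplits()` method, with each split read by its own record reader.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageSplit:
    """Hypothetical unit of work: a half-open range of API pages."""
    first_page: int  # inclusive
    end_page: int    # exclusive


def plan_splits(total_pages: int, num_splits: int) -> List[PageSplit]:
    """Divide pages [0, total_pages) into at most num_splits contiguous ranges.

    Assumes the (hypothetical) API can report total_pages before the job runs.
    """
    if total_pages < 0 or num_splits < 1:
        raise ValueError("total_pages must be >= 0 and num_splits >= 1")

    # Never create more splits than there are pages to read.
    num_splits = min(num_splits, total_pages)

    splits = []
    start = 0
    for i in range(num_splits):
        # Hand the remainder to the first few splits so sizes differ by at most 1.
        extra = 1 if i < total_pages % num_splits else 0
        size = total_pages // num_splits + extra
        splits.append(PageSplit(start, start + size))
        start += size
    return splits
```

For example, `plan_splits(10, 3)` yields page ranges [0, 4), [4, 7) and [7, 10), so each map task would fetch only its own pages. This only helps if the data behind the API stays static for the duration of the job, which is the first issue raised in the quoted reply.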
Yaron Gonen 2013-02-19, 19:28
Alex Thieme 2013-02-19, 19:48