Pig >> mail # user >> Question regarding a custom LoadFunc implementation


Re: Question regarding a custom LoadFunc implementation
Thanks Bill. Any ideas on how to hide the location of HDFS files from the
end user?
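
One possible direction, as a rough sketch only: keep a mapping from the logical table name to the real HDFS path outside the Pig script, and have the loader translate 'scheme://SampleTable' into that path before reading. The class name TableLocationResolver, the in-memory map, and the path /data/exports/SampleTable/current are all hypothetical; in practice the exporter might publish the mapping through a properties file or a naming convention. In a real loader this lookup could be called from LoadFunc.relativeToAbsolutePath(String, Path) or at the start of setLocation(String, Job).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: maps a logical location such as
 * "scheme://SampleTable" to the HDFS path where the exporter put the data.
 */
public class TableLocationResolver {
    private final String scheme;
    private final Map<String, String> tableToPath;

    public TableLocationResolver(String scheme, Map<String, String> tableToPath) {
        this.scheme = scheme;
        // Defensive copy so later changes by the caller do not leak in.
        this.tableToPath = new HashMap<>(tableToPath);
    }

    /** Returns the hidden HDFS path for a logical location, or throws. */
    public String resolve(String location) {
        URI uri = URI.create(location);
        if (!scheme.equals(uri.getScheme())) {
            throw new IllegalArgumentException("Unsupported location: " + location);
        }
        // For "scheme://SampleTable" the table name is the URI authority.
        String table = uri.getAuthority();
        String path = (table == null) ? null : tableToPath.get(table);
        if (path == null) {
            throw new IllegalArgumentException("Unknown table in: " + location);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        // Assumed to be written by the export job, never by the Pig user.
        mapping.put("SampleTable", "/data/exports/SampleTable/current");

        TableLocationResolver resolver = new TableLocationResolver("scheme", mapping);
        System.out.println(resolver.resolve("scheme://SampleTable"));
    }
}
```

This only keeps the path out of the script; it is not access control, so anyone who can read the job configuration or browse HDFS could still find the files.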

On Tue, Dec 11, 2012 at 9:42 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> I think the latter would be better. Since the LoadFunc would be decoupled
> from the data exporter, you could schedule the exporting independently of
> the loading. We do something similar, without the $query part.
>
>
> On Tue, Dec 11, 2012 at 1:10 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
> > I was working on a LoadFunc and needed some ideas/second opinion on the
> > best way to do this:
> >
> >
> >    1. We use an API to download data from a database as flat files.
> >       - A query is given with the table name and the fields required to
> >         extract data.
> >    2. Once step 1 is done, upload the data to HDFS.
> >    3. Upload the schema file to HDFS.
> >    4. A LoadFunc reads the schema file and parses the data.
> >
> > A strict requirement is to hide the details of the location of these HDFS
> > files from the user issuing the pig query. For a user it could look as
> > simple as:
> >
> > A = load 'scheme://SampleTable' using CustomLoader('$query');
> >
> > Here the user only issues the load statement on a table with a query, and
> > the API calls for importing from the database could happen in the background.
> >
> > What would be the best way to do this? Is it better to do the above as part
> > of the LoadFunc, or would it be more beneficial to do it separately and
> > somehow communicate the location from the API import to the LoadFunc?
> >
> > Thanks,
> >
> > Prashant
> >
>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [EMAIL PROTECTED] going forward.*
>