Pig >> mail # user >> Question regarding a custom LoadFunc implementation


I am working on a LoadFunc and would like some ideas or a second opinion on
the best way to do this:
   1. We use an API to download data from the database as flat files.
      - A query specifies the table name and the fields to extract.
   2. Once step 1 is done, upload the data to HDFS.
   3. Upload the schema file to HDFS.
   4. A LoadFunc reads the schema file and parses the data.
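For step 4, the parsing itself can be prototyped without Pig. The sketch below assumes a hypothetical one-line schema file format (`name:type` pairs separated by commas, e.g. `id:int,name:chararray`) and tab-delimited flat files; neither format comes from the message. Inside a real LoadFunc this logic would sit behind `getNext()` and build a Pig `Tuple` instead of a `List<Object>`.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaParsingSketch {

    // One column of the hypothetical schema: a name plus a Pig-style type name.
    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Parses a hypothetical one-line schema such as "id:int,name:chararray".
    static List<Field> parseSchema(String schemaLine) {
        List<Field> fields = new ArrayList<>();
        for (String part : schemaLine.trim().split(",")) {
            String[] nameAndType = part.trim().split(":", 2);
            if (nameAndType.length != 2) {
                throw new IllegalArgumentException("Bad schema entry: " + part);
            }
            fields.add(new Field(nameAndType[0], nameAndType[1]));
        }
        return fields;
    }

    // Splits one tab-delimited line and converts each value using the schema.
    static List<Object> parseLine(String line, List<Field> schema) {
        String[] values = line.split("\t", -1); // -1 keeps trailing empty fields
        if (values.length != schema.size()) {
            throw new IllegalArgumentException(
                    "Expected " + schema.size() + " fields but got " + values.length);
        }
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            row.add(convert(values[i], schema.get(i).type));
        }
        return row;
    }

    // Converts a raw string to a Java value; empty strings become null.
    static Object convert(String raw, String type) {
        if (raw.isEmpty()) {
            return null;
        }
        switch (type) {
            case "int":
                return Integer.parseInt(raw);
            case "long":
                return Long.parseLong(raw);
            case "double":
                return Double.parseDouble(raw);
            default:
                return raw; // chararray (and unknown types) stay as strings
        }
    }

    public static void main(String[] args) {
        List<Field> schema = parseSchema("id:int,name:chararray,score:double");
        System.out.println(parseLine("7\talice\t3.5", schema)); // [7, alice, 3.5]
    }
}
```

Keeping the schema parse separate from the per-line parse means the schema file is read once (e.g. when the loader is set up) and reused for every record.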

A strict requirement is to hide the location of these HDFS files from the
user issuing the Pig query. For the user, it could look as simple as:

A = load 'scheme://SampleTable' using CustomLoader('$query');
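One way to keep the HDFS location hidden is to derive it from the load location and the query. In Pig's LoadFunc API the location string (`'scheme://SampleTable'` here) is passed to `setLocation(String location, Job job)`, and `relativeToAbsolutePath(String, Path)` can be overridden if the default path resolution gets in the way of a custom scheme. The stdlib-only sketch below shows just the mapping step; the `scheme://` prefix handling, the `/tmp/pig-import` root and the hash-based directory naming are all hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LocationResolverSketch {

    // Hypothetical custom scheme and staging root; users never see the root.
    static final String PREFIX = "scheme://";
    static final String STAGING_ROOT = "/tmp/pig-import";

    // Maps "scheme://SampleTable" plus the user's query to a deterministic,
    // hidden directory such as /tmp/pig-import/SampleTable/<16 hex chars>.
    static String resolve(String location, String query) {
        if (!location.startsWith(PREFIX) || location.length() == PREFIX.length()) {
            throw new IllegalArgumentException("Unsupported location: " + location);
        }
        String table = location.substring(PREFIX.length());
        return STAGING_ROOT + "/" + table + "/" + sha256Hex(query).substring(0, 16);
    }

    // Hex-encodes the SHA-256 digest of the query text.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("scheme://SampleTable", "select id, name from SampleTable"));
    }
}
```

Because the directory name depends only on the table and the query, the same load statement always maps to the same directory, which also makes it easy to check whether an import already exists.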

Here the user only issues a load statement on a table with a query, and the
API calls that import from the database could happen in the background.
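If the import runs inside the loader, it has to be idempotent: Pig's LoadFunc documentation notes that `setLocation` may be called multiple times, on both the frontend and the backend. A minimal sketch, assuming a hypothetical `Importer` interface standing in for the in-house export API, that runs the import at most once per table and query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ImportOnceSketch {

    // Hypothetical stand-in for the export API: runs the query, copies the
    // flat file and schema file to targetDir, and returns targetDir.
    interface Importer {
        String importTable(String table, String query, String targetDir);
    }

    private final Importer importer;
    // Completed imports keyed by "table|query", so repeated calls reuse them.
    private final Map<String, String> done = new ConcurrentHashMap<>();

    ImportOnceSketch(Importer importer) {
        this.importer = importer;
    }

    // Safe to call repeatedly: for a given key the import runs at most once
    // within this instance.
    String ensureImported(String table, String query, String targetDir) {
        return done.computeIfAbsent(table + "|" + query,
                key -> importer.importTable(table, query, targetDir));
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        ImportOnceSketch coordinator = new ImportOnceSketch((t, q, dir) -> {
            calls.incrementAndGet();
            return dir;
        });
        coordinator.ensureImported("SampleTable", "q1", "/tmp/pig-import/SampleTable/a");
        coordinator.ensureImported("SampleTable", "q1", "/tmp/pig-import/SampleTable/a");
        System.out.println(calls.get()); // prints 1
    }
}
```

This only deduplicates within one instance in one JVM. Pig may create several loader instances, and backend tasks run in separate JVMs, so a real loader would more likely check whether the target HDFS directory already exists before importing.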

What would be the best way to do this? Is it better to do the above as part
of the LoadFunc, or would it be more beneficial to do it separately and
somehow communicate the location from the API import to the LoadFunc?
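For the second option, the import job and the loader only need to agree on where the data landed. One hypothetical way is a small manifest mapping table names to directories: the import step writes it, and the loader could read it in `setLocation`. The sketch below uses `java.util.Properties` as the manifest format and in-memory strings in place of HDFS files; both are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class ManifestSketch {

    // Import side: record where each table's data and schema ended up.
    static String writeManifest(Properties entries) throws IOException {
        StringWriter out = new StringWriter();
        entries.store(out, "table -> HDFS directory (hypothetical manifest)");
        return out.toString();
    }

    // Loader side: look up the directory for a table, failing loudly if the
    // import step has not recorded it yet.
    static String lookup(String manifestText, String table) throws IOException {
        Properties manifest = new Properties();
        manifest.load(new StringReader(manifestText));
        String dir = manifest.getProperty(table);
        if (dir == null) {
            throw new IllegalStateException("No import recorded for table " + table);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Properties entries = new Properties();
        entries.setProperty("SampleTable", "/tmp/pig-import/SampleTable/run-001");
        String manifest = writeManifest(entries);
        System.out.println(lookup(manifest, "SampleTable"));
    }
}
```

With this split the user still writes only the `load` statement, but the import has to have run first; a clear error for a missing table makes that ordering dependency visible.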

Thanks,

Prashant
Bill Graham 2012-12-11, 16:12
Prashant Kommireddi 2012-12-11, 16:20
Bill Graham 2012-12-11, 23:06