I think the latter would be better. Since the LoadFunc would be decoupled
from the data exporter, you could schedule the exporting independently of
the loading. We do something similar, without the $query part.
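A minimal sketch of that decoupled approach, assuming the exporter and the loader share a small catalog that maps each table name to the HDFS locations of its latest export. `ExportCatalog`, `publish`, `resolve`, and the example paths are hypothetical names, not part of Pig. Plain `java.util.Properties` stands in for whatever store you actually use; in practice the catalog file would live somewhere both jobs can read, such as HDFS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

// Hypothetical catalog shared by the exporter job and the custom LoadFunc.
// The exporter records where each table's latest export landed; the loader
// only looks entries up, so the two can run on independent schedules.
final class ExportCatalog {
    // Where one table's exported data and schema file live on HDFS.
    static final class Location {
        final String dataDir;
        final String schemaFile;

        Location(String dataDir, String schemaFile) {
            this.dataDir = dataDir;
            this.schemaFile = schemaFile;
        }
    }

    private final Properties entries = new Properties();

    // Called by the exporter after it has finished uploading data and schema.
    void publish(String table, String dataDir, String schemaFile) {
        entries.setProperty(table + ".data", dataDir);
        entries.setProperty(table + ".schema", schemaFile);
    }

    // Called by the loader; empty if the table has not been exported yet.
    Optional<Location> resolve(String table) {
        String data = entries.getProperty(table + ".data");
        String schema = entries.getProperty(table + ".schema");
        if (data == null || schema == null) {
            return Optional.empty();
        }
        return Optional.of(new Location(data, schema));
    }

    // Serializes the catalog so it can be stored as a small text file.
    String save() {
        StringWriter out = new StringWriter();
        try {
            entries.store(out, "export catalog");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Rebuilds a catalog from the text produced by save().
    static ExportCatalog load(String text) {
        ExportCatalog catalog = new ExportCatalog();
        try {
            catalog.entries.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return catalog;
    }
}
```

Because the loader only reads the catalog, the export can run on its own schedule and the loader simply picks up whatever entry was published last.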
On Tue, Dec 11, 2012 at 1:10 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> I was working on a LoadFunc and needed some ideas or a second opinion on
> the best way to do this:
> 1. We use an API to download data from the database as flat files.
> - A query specifies the table name and the fields to extract.
> 2. Once step 1 is done, upload the data to HDFS.
> 3. Upload the schema file to HDFS.
> 4. A LoadFunc reads the schema file and parses the data.
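Step 4 could be sketched as below, assuming (hypothetically) that the schema file holds a single line of the form `name:type,name:type,...` and that the flat files use a single-character delimiter such as a tab. Only plain Java is used so the parsing logic can be tested on its own. In an actual LoadFunc, `parseSchema` would back the schema reported to Pig (for example through the `LoadMetadata` interface), and each list returned by `parseLine` would be wrapped in a Pig `Tuple` inside `getNext()`. `FlatFileParser` and its members are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for step 4: read the exported schema, then turn each
// delimited line of the flat file into typed values in schema order.
final class FlatFileParser {
    enum FieldType { INT, LONG, DOUBLE, CHARARRAY }

    static final class Field {
        final String name;
        final FieldType type;

        Field(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    // Parses an assumed schema line such as "id:int,name:chararray,amount:double".
    static List<Field> parseSchema(String schemaLine) {
        List<Field> fields = new ArrayList<>();
        for (String part : schemaLine.trim().split(",")) {
            String[] nameAndType = part.trim().split(":");
            if (nameAndType.length != 2) {
                throw new IllegalArgumentException("Bad schema entry: " + part);
            }
            FieldType type = FieldType.valueOf(nameAndType[1].trim().toUpperCase(Locale.ROOT));
            fields.add(new Field(nameAndType[0].trim(), type));
        }
        return fields;
    }

    // Splits one data line on the delimiter (keeping trailing empty fields)
    // and converts each field to the type declared in the schema.
    static List<Object> parseLine(String line, char delimiter, List<Field> schema) {
        String[] raw = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        if (raw.length != schema.size()) {
            throw new IllegalArgumentException(
                "Expected " + schema.size() + " fields but got " + raw.length);
        }
        List<Object> values = new ArrayList<>(raw.length);
        for (int i = 0; i < raw.length; i++) {
            values.add(convert(raw[i], schema.get(i).type));
        }
        return values;
    }

    // Empty fields become null; everything else is parsed by declared type.
    private static Object convert(String value, FieldType type) {
        if (value.isEmpty()) {
            return null;
        }
        switch (type) {
            case INT:
                return Integer.valueOf(value);
            case LONG:
                return Long.valueOf(value);
            case DOUBLE:
                return Double.valueOf(value);
            default:
                return value;
        }
    }
}
```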
> A strict requirement is to hide the details of where these HDFS files live
> from the user issuing the Pig query. To the user, it could look as simple
> as:
> A = load 'scheme://SampleTable' using CustomLoader('$query');
> Here the user only issues a load statement on a table, with a query, and
> the API calls that import data from the database could happen in the
> background.
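To keep the HDFS locations hidden, the loader can translate the string it is given (`'scheme://SampleTable'`) into the real paths itself. The sketch below assumes a hypothetical fixed layout under `/data/exports`; `TableLocation`, `fromLoadLocation`, and the directory names are illustrative only. In Pig, this translation would naturally happen in the loader's `setLocation(String location, Job job)`.

```java
// Hypothetical mapping from the user-facing load location to hidden HDFS paths.
final class TableLocation {
    // Prefix used in the load statement, e.g. load 'scheme://SampleTable'.
    static final String SCHEME_PREFIX = "scheme://";
    // Assumed root directory known only to the exporter and the loader.
    static final String EXPORT_ROOT = "/data/exports";

    final String table;
    final String dataDir;
    final String schemaFile;

    private TableLocation(String table) {
        this.table = table;
        this.dataDir = EXPORT_ROOT + "/" + table + "/data";
        this.schemaFile = EXPORT_ROOT + "/" + table + "/schema";
    }

    // Turns "scheme://SampleTable" into the data and schema paths for that table,
    // rejecting anything that does not use the expected scheme.
    static TableLocation fromLoadLocation(String location) {
        if (location == null || !location.startsWith(SCHEME_PREFIX)) {
            throw new IllegalArgumentException(
                "Expected " + SCHEME_PREFIX + "<table>, got: " + location);
        }
        String table = location.substring(SCHEME_PREFIX.length());
        if (table.isEmpty() || table.contains("/")) {
            throw new IllegalArgumentException("Invalid table name in: " + location);
        }
        return new TableLocation(table);
    }
}
```

A fixed convention like this needs no extra lookup, but any change to the exporter's directory layout has to be mirrored in the loader.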
> What would be the best way to do this? Is it better to do the above as part
> of the LoadFunc, or would it be better to do it separately and somehow
> communicate the location from the API import to the LoadFunc?