
Re: Input and output path

I am suggesting setting up a whole Hive warehouse. This way your
folders will look like
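All the partitions' metadata will be kept in an RDBMS, so when you
query them with Hive it will look like
select * from yourdataset where date = '2012-09-11'
and it will be fast, because Hive reads only the folder of the
matching partition instead of scanning the whole dataset.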

HCatalog is a layer that exposes this Hive functionality to Pig and
MapReduce, so in Pig you can FILTER by those dates.
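
For the original question (turning the input path into the output path),
here is a rough sketch of a small Python wrapper around the parameter
substitution approach mentioned earlier in this thread. It assumes the
YYYY/MM/DD/HH/input layout from the question. The helper names
(output_path_for, pig_command), the parameter names INPUT and OUTPUT, and
the script name myscript.pig are made up for illustration and do not come
from Pig itself; only the -param command-line option does.

```python
import re

# Matches an optional prefix followed by YYYY/MM/DD/HH/input (layout from the thread).
_INPUT_RE = re.compile(r"^(?P<prefix>.*?\d{4}/\d{2}/\d{2}/\d{2})/input/?$")


def output_path_for(input_path):
    """Map .../YYYY/MM/DD/HH/input to .../YYYY/MM/DD/HH/output."""
    match = _INPUT_RE.match(input_path)
    if match is None:
        raise ValueError("unexpected input path: %r" % input_path)
    return match.group("prefix") + "/output"


def pig_command(script, input_path):
    """Build an argv list that passes both paths to Pig via -param."""
    return [
        "pig",
        "-param", "INPUT=" + input_path,
        "-param", "OUTPUT=" + output_path_for(input_path),
        script,
    ]


if __name__ == "__main__":
    # The (hypothetical) Pig script would then refer to the parameters as:
    #   data = LOAD '$INPUT';
    #   STORE data INTO '$OUTPUT';
    print(" ".join(pig_command("myscript.pig", "2012/09/11/03/input")))
```

The list returned by pig_command can be handed to subprocess.call as-is.
Keeping the path logic outside the Pig script means the script itself only
ever sees two plain parameters.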

Best Regards

On Tue, Sep 11, 2012 at 3:29 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 10, 2012 at 4:17 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>> Mohit,
>> I guess you could use parameter substitution here
>> http://wiki.apache.org/pig/ParameterSubstitution
> Thanks, this works.
>> Also, a note about your architecture:
> Are you suggesting a change to the path names, or is your suggestion to use
> HCatalog with Pig?
>> You can consider using Hive partitions to efficiently select the
>> appropriate dates in the folder names. But as your tool is Pig, not
>> Hive, you can use HCatalog as a layer.
>> Best Regards
>> On Tue, Sep 11, 2012 at 3:11 AM, Mohit Anchlia <[EMAIL PROTECTED]>
>> wrote:
>> > Our input path is something like YYYY/MM/DD/HH/input and we would like to
>> > write to YYYY/MM/DD/HH/output. Is it possible to get the input path as a
>> > String and convert it to YYYY/MM/DD/HH/output so that I can use it in a
>> > "store into" clause?