Pig >> mail # user >> Input and output path


Mohit Anchlia 2012-09-10, 23:11
Ruslan Al-Fakikh 2012-09-10, 23:17
Mohit Anchlia 2012-09-10, 23:29
Re: Input and output path
Mohit,

I am suggesting setting up a whole Hive warehouse. This way your
folders will look like
/user/hive/warehouse/yourdataset/date=2012-09-11
/user/hive/warehouse/yourdataset/date=2012-09-12
...
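All the partitions' metadata will be kept in an RDBMS (the Hive
metastore), so when you query them with Hive it will look like
select * from yourdataset where date = '2012-09-11'
and it will be fast, because Hive reads only the matching partition
folder instead of scanning the whole dataset.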

HCatalog is a layer that exposes this Hive functionality to Pig and
MapReduce, so in Pig you can FILTER on the date partition.
http://incubator.apache.org/hcatalog/docs/r0.4.0/loadstore.html#Load+Examples
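
To make the mapping between a partition value and its folder concrete, here is a minimal Python sketch. It assumes the warehouse root and the `date=` partition key shown in the paths above; the helper names `partition_dir` and `partition_predicate` are hypothetical and only illustrate the layout, they are not part of Hive or HCatalog.

```python
from datetime import date

# Assumed warehouse root, taken from the example paths in this message.
WAREHOUSE = "/user/hive/warehouse"


def partition_dir(table, day, warehouse=WAREHOUSE):
    """Return the folder that would hold the partition for one day."""
    return "{}/{}/date={}".format(warehouse, table, day.isoformat())


def partition_predicate(day):
    """Return the WHERE/FILTER condition selecting that partition.

    The value is quoted: an unquoted 2012-09-11 would be parsed as
    integer arithmetic rather than as a date string.
    """
    return "date = '{}'".format(day.isoformat())


if __name__ == "__main__":
    day = date(2012, 9, 11)
    print(partition_dir("yourdataset", day))
    print("select * from yourdataset where " + partition_predicate(day))
```

A query restricted with that predicate only needs to touch the one folder returned by `partition_dir`, which is why filtering on the partition column is cheap.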

Best Regards

On Tue, Sep 11, 2012 at 3:29 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 10, 2012 at 4:17 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
>> Mohit,
>>
>> I guess you could use parameter substitution here
>> http://wiki.apache.org/pig/ParameterSubstitution
>>
> Thanks, this works.
>
>
>> Also, a note about your architecture:
>>
>
> Are you suggesting a change to the path names, or is your suggestion to
> use HCatalog with Pig?
>
>
>> You can consider using Hive partitions to effectively select
>> appropriate dates in the folder names. But as your tool is Pig, not
>> Hive, you can use HCatalog as a layer
>>
>> Best Regards
>>
>> On Tue, Sep 11, 2012 at 3:11 AM, Mohit Anchlia <[EMAIL PROTECTED]>
>> wrote:
>> > Our input path is something like YYYY/MM/DD/HH/input and we would like
>> > to write to YYYY/MM/DD/HH/output. Is it possible to get the input path
>> > as a String and convert it to YYYY/MM/DD/HH/output so that I can use it
>> > in the "store into" clause?
>>
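
For the parameter-substitution approach quoted above, a minimal Python sketch of a wrapper could look like the following. It assumes the input path always ends in an `input` segment; the function names, the parameter names `INPUT` and `OUTPUT`, and the script name `process.pig` are hypothetical. Only the `pig -param name=value` command-line form and the `$name` references inside the script come from Pig itself.

```python
import posixpath


def output_path_for(input_path):
    """Turn .../YYYY/MM/DD/HH/input into .../YYYY/MM/DD/HH/output."""
    parent, leaf = posixpath.split(input_path.rstrip("/"))
    if leaf != "input":
        raise ValueError("expected a path ending in 'input': " + input_path)
    return posixpath.join(parent, "output")


def pig_command(script, input_path):
    """Build the argument list that passes both paths to Pig as parameters."""
    return [
        "pig",
        "-param", "INPUT=" + input_path,
        "-param", "OUTPUT=" + output_path_for(input_path),
        script,
    ]


# The Pig script would then refer to the parameters, for example:
#   A = LOAD '$INPUT';
#   STORE A INTO '$OUTPUT';
if __name__ == "__main__":
    print(" ".join(pig_command("process.pig", "2012/09/10/23/input")))
```

Deriving the output path outside the script keeps the Pig code free of string manipulation; the script only sees two already-resolved parameters.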
MiaoMiao 2012-09-13, 02:31
Ruslan Al-Fakikh 2012-09-13, 21:04
Aniket Mokashi 2012-09-15, 00:01
MiaoMiao 2012-09-17, 05:11