Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Specifying multiple input paths


Copy link to this message
-
Specifying multiple input paths
I;m having a problem loading data from multiple paths in Pig. What I'm
trying to do is to load data from a range of dates, so I would like to
specify an input of two globbed paths:

x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}'

Pig doesn't seem to like this though as it's trying to interpret it as
a single path. The best I can do it to use UNION:

x1 = LOAD '2008/05/{26,27,28,29,30,31}'
x2 = LOAD '2008/06/{1,2}'
x = UNION x1, x2

The downside to this is that I want to parameterize my paths, and
having separate script for each number of paths in the input is
cumbersome.

Is there a better way of doing this? Are there any plans to support
multiple paths, and/or PathFilters?

Thanks,

Tom
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB