Re: LOAD multiple files with glob
Hi,

I don't have a system to test this on right now, but I have been passing the
path in as a parameter with -p, and it works.

Change the load line to accept a parameter, like this:

avro = load '$INPUT' USING AvroStorage();

Then pass the glob on the command line:

bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>

I think that if you leave out the double quotes, the expansion is done by the
OS (the shell) instead of by Pig.
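
To illustrate the quoting point in general terms, here is a minimal sketch that needs no Pig or HDFS. It assumes a POSIX-style shell and a hypothetical scratch directory containing two local files named like the ones in this thread. It only shows ordinary shell behaviour; whether an unquoted INPUT=... argument would really be expanded depends on the shell in use and on whether matching local names exist, so quoting is simply the safe choice.

```shell
# Hypothetical scratch directory holding two local files named like the ones in this thread.
workdir=$(mktemp -d)
touch "$workdir/2012-01-01.avro" "$workdir/2012-01-02.avro"
cd "$workdir" || exit 1

# Unquoted: the shell replaces the pattern with the matching local file names.
unquoted=$(echo 2012-01-0[12].avro)

# Double-quoted: the pattern reaches the command unchanged.
quoted=$(echo "2012-01-0[12].avro")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first line printed lists both file names, while the second prints the pattern itself, which is the form that has to reach Pig for the glob to be resolved against HDFS.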

Please let us know if it doesn't work.

On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have the following files on HDFS:
>
> -rw-r--r--   3 hdfs supergroup   22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
> -rw-r--r--   3 hdfs supergroup  240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
> -rw-r--r--   3 hdfs supergroup  324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
> -rw-r--r--   3 hdfs supergroup  345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
> -rw-r--r--   3 hdfs supergroup  351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
> -rw-r--r--   3 hdfs supergroup  325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
> -rw-r--r--   3 hdfs supergroup  107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
> -rw-r--r--   3 hdfs supergroup   46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
> -rw-r--r--   3 hdfs supergroup  361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
> -rw-r--r--   3 hdfs supergroup  398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
> -rw-r--r--   3 hdfs supergroup  400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
> -rw-r--r--   3 hdfs supergroup  400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
>
> Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe them.
>
> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
> REGISTER 'hdfs:///lib/piggybank.jar';
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING AvroStorage();
>
> describe avro;
>
>
> This works, and so does 2012-01-02.avro.
>
> However, as soon as I try to include multiple files, no dice.
>
> avro = load '/data/2012/trace_ejb3/2012-01-{01,02}.avro' USING AvroStorage();
> gives me:
> 2012-11-23 21:41:07,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2999: Unexpected internal error. null
> Caused by: java.net.URISyntaxException: Illegal character in path at index
> 30: /data/2012/trace_ejb3/2012-01-{01,02}.avro
>
> avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
> gives me:
> Schema for avro unknown.
>
> avro = load '/data/2012/trace_ejb3/2012-01-0[12].avro' USING AvroStorage();
> also gives me:
> Caused by: java.net.URISyntaxException: Illegal character in path at index
> 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>
> What am I doing wrong here? According to
> http://hadoop.apache.org/docs/r0.21.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus%28org.apache.hadoop.fs.Path%29
> this should all be acceptable input?
>
> Thanks in advance!
>
> Kind regards,
>
> Bart
>