NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> LOAD multiple files with glob


Bart Verwilst 2012-11-23, 20:45
Deepak Tiwari 2012-11-23, 23:41
Re: LOAD multiple files with glob
Hello,

Thanks for your suggestion!
I switched my avro variable to: avro = load '$INPUT' USING AvroStorage();

However, I get the same results this way:

$ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
which: no hbase in
(:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
avro: {id: long,timestamp: long,latitude: int,longitude: int,speed:
int,heading: int,terminalid: int,customerid: chararray,mileage:
int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id:
long,value: chararray,pkey: chararray)}}
$ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
which: no hbase in
(:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2999: Unexpected internal error. null
Caused by: java.net.URISyntaxException: Illegal character in path at
index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
$ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
which: no hbase in
(:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2999: Unexpected internal error. null
Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
Caused by: java.net.URISyntaxException: Illegal character in path at
index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
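The error message itself narrows things down. Java counts string indices from 0, so "index 31" should be the opening bracket of the [12] character class. A minimal POSIX shell sketch that checks this offset using plain parameter expansion (nothing here touches Pig or HDFS; the variable names are illustrative only):

```shell
#!/bin/sh
# Find the 0-based offset of the first '[' in the path from the error message.
set -eu

path='/data/2012/trace_ejb3/2012-01-0[12].avro'

# Remove everything from the first '[' to the end; what remains is the prefix before it.
prefix=${path%%\[*}

# The prefix length equals the 0-based index of '[', the convention Java's message uses.
bracket_index=${#prefix}

printf 'first "[" is at index %s\n' "$bracket_index"
```

It prints index 31, matching the log. Since '[' is not a legal unescaped character in a URI path, this suggests the glob is handed to java.net.URI unchanged somewhere in the load path instead of being expanded first; the log excerpt does not show which component does that.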
Deepak Tiwari wrote on 24.11.2012 00:41:
> Hi,
>
> I don't have a system to test it right now, but I have been passing it
> as a parameter with -p and it works.
>
> Change the line to accept parameters, like: avro = load '$INPUT'
> USING AvroStorage();
>
> bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro"
> <scriptName>
>
> I think if you don't use double quotes, the expansion is done by the
> OS.
>
> Please let us know if it doesn't work...
>
>
>
> On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[EMAIL PROTECTED]>
> wrote:
>
>> Hello,
>>
>> I have the following files on HDFS:
>>
>> -rw-r--r--   3 hdfs supergroup   22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
>> -rw-r--r--   3 hdfs supergroup  240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
>> -rw-r--r--   3 hdfs supergroup  324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
>> -rw-r--r--   3 hdfs supergroup  345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
>> -rw-r--r--   3 hdfs supergroup  351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
>> -rw-r--r--   3 hdfs supergroup  325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
>> -rw-r--r--   3 hdfs supergroup  107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
>> -rw-r--r--   3 hdfs supergroup   46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
>> -rw-r--r--   3 hdfs supergroup  361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
>> -rw-r--r--   3 hdfs supergroup  398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
>> -rw-r--r--   3 hdfs supergroup  400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
>> -rw-r--r--   3 hdfs supergroup  400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
>>
>> Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe
>> them.
>>
>> REGISTER 'hdfs:///lib/avro-1.7.2.jar';
>> REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
>> REGISTER 'hdfs:///lib/piggybank.jar';
>>
>> DEFINE AvroStorage
>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING
>> AvroStorage();
>>
>> describe avro;
>>
>>
>> This works, and the same goes for 2012-01-02.avro.
>>
>> However, as soon as I try to include multiple files, no dice.
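Deepak's suggestion above is that quoting decides whether the shell or Pig expands the glob. That can be checked locally, assuming a POSIX shell with default globbing settings (for example, bash without nullglob or failglob). The sketch below uses a hypothetical show_args helper as a stand-in for pig and prints exactly which arguments it receives under the three quoting styles used in this thread:

```shell
#!/bin/sh
# Show the arguments a command receives for each quoting style used in the thread.
# show_args is a hypothetical stand-in for pig; nothing here talks to Pig or HDFS.
set -eu

# Work in an empty scratch directory so the unquoted pattern cannot match local files.
scratch=$(mktemp -d)
cd "$scratch"

# Print every received argument on its own line.
show_args() { printf '%s\n' "$@"; }

unquoted=$(show_args -p INPUT=/data/2012/trace_ejb3/2012-01-0[12].avro)
double_quoted=$(show_args -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro")
single_quoted=$(show_args -p 'INPUT=/data/2012/trace_ejb3/2012-01-0[12].avro')

printf '%s\n' "--- unquoted" "$unquoted" \
  "--- double quoted" "$double_quoted" \
  "--- single quoted" "$single_quoted"

if [ "$unquoted" = "$double_quoted" ] && [ "$double_quoted" = "$single_quoted" ]; then
  echo "all three styles pass the same literal pattern"
fi

# Remove the (still empty) scratch directory.
cd /
rmdir "$scratch"
```

Because the word starts with INPUT=, the unquoted pattern could only expand if a local directory literally named INPUT= existed; otherwise a POSIX shell leaves a non-matching pattern untouched. So all three spellings should hand Pig the same literal string, which fits the identical URISyntaxException in the double- and single-quoted runs. Quoting still matters when a glob is a standalone argument that happens to match local files.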
Russell Jurney 2012-11-24, 19:23
Bart Verwilst 2012-11-25, 11:02
Cheolsoo Park 2012-11-25, 14:33
Bart Verwilst 2012-11-25, 20:25
Cheolsoo Park 2012-11-26, 09:45
Bart Verwilst 2012-11-26, 13:19
Bart Verwilst 2012-11-26, 14:33
Bart Verwilst 2012-11-26, 15:50
Bart Verwilst 2012-11-26, 12:48
Bart Verwilst 2012-11-25, 20:14