Re: LOAD multiple files with glob
Hi Bart,

avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING AvroStorage();
gives me:
Schema for avro unknown.

This should work. The error you're getting comes from PigServer, not from
AvroStorage.

grep -r "Schema for .* unknown" *
src/org/apache/pig/PigServer.java:
 System.out.println("Schema for " + alias + " unknown.");
...

It looks like you have an error in your Pig script. Can you please
provide the Pig script and the schema of your avro files that reproduce
the error?

Thanks,
Cheolsoo
On Sun, Nov 25, 2012 at 1:02 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I've tried loading a csv with PigStorage(), getting this:
>
>
> txt = load '/import.mysql/trace_ejb3_2011/part-m-00000' USING
> PigStorage(',');
> describe txt;
>
> Schema for txt unknown.
>
> Maybe this is because of it being a csv, so a schema is hard to figure
> out..
>
> Any other suggestions? Our whole hadoop setup is built around being able
> to selectively load avro files to run our jobs on, if this doesn't work
> then we're pretty much screwed.. :)
>
> Thanks in advance!
>
> Bart
>
> Russell Jurney wrote on 24.11.2012 20:23:
>
>> I suspect the problem is AvroStorage, not globbing. Try this with
>> PigStorage.
>>
>> Russell Jurney twitter.com/rjurney
>>
>>
>> On Nov 24, 2012, at 5:15 AM, Bart Verwilst <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> Thanks for your suggestion!
>>> I switched my avro variable to avro = load '$INPUT' USING AvroStorage();
>>>
>>> However I get the same results this way:
>>>
>>> $ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>> <snip>
>>> avro: {id: long,timestamp: long,latitude: int,longitude: int,speed:
>>> int,heading: int,terminalid: int,customerid: chararray,mileage:
>>> int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id:
>>> long,value: chararray,pkey: chararray)}}
>>>
>>>
>>> $ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>> <snip>
>>> 2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>
>>>
>>> $ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
>>> which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
>>> <snip>
>>> 2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
>>> Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
>>> Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
>>>
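A quick way to see which character the URISyntaxException is pointing at: the sketch below, assuming a POSIX shell with cut available, prints the character at zero-based index 31 of the path from the error message. The helper name char_at is made up for this illustration and is not part of Pig or Hadoop.

```shell
# Path as reported in the "Illegal character in path" error.
path='/data/2012/trace_ejb3/2012-01-0[12].avro'

# char_at STRING INDEX: print the character at a zero-based index.
# (Hypothetical helper; cut counts from 1, so add one to the index.)
char_at() {
  printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))"
}

char_at "$path" 31   # prints: [
```

The character at index 31 is the opening bracket of the [12] glob, which suggests the bracket itself is what the URI parsing rejects, rather than anything the shell did to the argument.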
>>>
>>> Deepak Tiwari wrote on 24.11.2012 00:41:
>>>
>>>> Hi,
>>>>
>>>> I don't have a system to test it on right now, but I have been passing it
>>>> with the -p parameter and it works.
>>>>
>>>> Change the line to accept a parameter, like:
>>>> avro = load '$INPUT' USING AvroStorage();
>>>>
>>>> bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>
>>>>
>>>> I think if you don't use double quotes, the expansion is done by the OS.
>>>>
>>>> Please let us know if it doesn't work...
>>>>
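On the quoting question, here is a minimal sketch, assuming bash or another POSIX shell with default globbing options. show_args is a hypothetical stand-in for any command that receives arguments (it is not pig), and the files are created in a local temporary directory purely for illustration.

```shell
# show_args: print each received argument on its own line.
# (Hypothetical helper, standing in for the command being invoked.)
show_args() {
  for a in "$@"; do
    printf '%s\n' "$a"
  done
}

tmp=$(mktemp -d)
touch "$tmp/2012-01-01.avro" "$tmp/2012-01-02.avro"

# Unquoted pattern with local matches: the shell expands it to two arguments.
unquoted=$(show_args "$tmp"/2012-01-0[12].avro)

# Quoted pattern: passed through to the command unchanged.
quoted=$(show_args "$tmp/2012-01-0[12].avro")

# Unquoted pattern with no local match: also passed through unchanged
# (default behaviour; options such as bash's nullglob or failglob change this).
unmatched=$(show_args "$tmp"/nomatch-0[12].avro)

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
printf 'unmatched:\n%s\n' "$unmatched"

rm -rf "$tmp"
```

Since the files in this thread live on HDFS rather than on the local filesystem, an unquoted pattern would most likely find no local match and reach Pig unchanged as well, so quoting mainly matters when a local path happens to match.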
>>>>
>>>>
>>>> On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have the following files on HDFS:
>>>>>
>>>>> -rw-r--r--   3 hdfs supergroup   22989179 2012-11-22 11:17
>>>>> /data/2012/trace_ejb3/2012-01-01.avro
>>>>> -rw-r--r--   3 hdfs supergroup  240551819 2012-11-22 14:27