Pig >> mail # user >> Pig/Avro Question


Something Something 2012-02-03, 06:07
Russell Jurney 2012-02-03, 06:22
Philipp 2012-02-03, 10:23
Russell Jurney 2012-02-03, 18:55
Russell Jurney 2012-02-03, 18:58
Re: Pig/Avro Question
Check the code in PigAvroInputFormat; it overrides 'listStatus' from
FileInputFormat so that files not ending in .avro are filtered out.

stan
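
For readers following along, here is a rough sketch of the kind of suffix filtering described above. It is not the actual PigAvroInputFormat source: the class name AvroSuffixFilter, the string-based listStatus signature, and the example path /user/xyz/part-00000 are all made up for illustration, and it uses plain Java collections instead of Hadoop's FileStatus so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input format that filters its file listing.
public class AvroSuffixFilter {
    // Assumption: the filter is a simple check on the ".avro" file-name suffix.
    static final String AVRO_SUFFIX = ".avro";

    // Returns only the paths that end in ".avro"; everything else is dropped,
    // which is why a file without that suffix would never reach the loader.
    static List<String> listStatus(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String path : candidates) {
            if (path.endsWith(AVRO_SUFFIX)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "/user/xyz/file1.avro",
                "/user/xyz/part-00000",   // hypothetical file without the suffix
                "/user/xyz/file2.avro");
        // prints [/user/xyz/file1.avro, /user/xyz/file2.avro]
        System.out.println(listStatus(input));
    }
}
```

If the real filter behaves like this, a file without the .avro suffix is silently skipped rather than rejected with an error, which would match the behavior reported further down the thread.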

On Fri, Feb 3, 2012 at 1:58 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> btw - the weird thing is... I've read the code.  There isn't a filter for
> .avro in there.  Does Hadoop, or Avro itself (not that I can see it is
> involved) do so?
>
> On Fri, Feb 3, 2012 at 10:55 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Hmmm I applied it, but I still can't open files that don't end in .avro
>>
>> On Fri, Feb 3, 2012 at 2:23 AM, Philipp <[EMAIL PROTECTED]> wrote:
>>
>>> This patch fixes this issue:
>>>
>>> https://issues.apache.org/jira/browse/PIG-2492
>>>
>>>
>>>
>>> On 02/03/2012 07:22 AM, Russell Jurney wrote:
>>>
>>>> I have the same bug. I read the code... there is no obvious fix.  Arg.
>>>>
>>>> On Feb 2, 2012, at 10:07 PM, Something Something <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> In my Pig script I have something like this...
>>>>>
>>>>> %default MY_SCHEMA '/user/xyz/my-schema.json';
>>>>>
>>>>> %default MY_AVRO 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'$MY_SCHEMA\')';
>>>>>
>>>>> my_files = LOAD '$MY_FILES' USING $MY_AVRO;
>>>>>
>>>>>
>>>>>
>>>>> What I have noticed is that when MY_FILES contains only one file, it
>>>>> works fine.
>>>>>
>>>>> %default MY_FILES '/user/xyz/file1.avro'
>>>>>
>>>>>
>>>>> But when I use a comma separated list it doesn't work. e.g.
>>>>>
>>>>> %default MY_FILES '/user/xyz/file1.avro, /user/xyz/file2.avro'
>>>>>
>>>>> Basically, I get a message saying something like 'Schema cannot be
>>>>> found'.
>>>>>
>>>>> Is there a way to make it work with multiple files?  Please let me
>>>>> know.  Thanks.
>>>>>
>>>>>
>>>
>>
>>
>> --
>> Russell Jurney
>> twitter.com/rjurney
>> [EMAIL PROTECTED]
>> datasyndrome.com
>>
>
>
>
> --
> Russell Jurney
> twitter.com/rjurney
> [EMAIL PROTECTED]
> datasyndrome.com
Russell Jurney 2012-02-03, 21:50