
Pig >> mail # user >> Multiple files with AvroStorage and comma separated lists


Re: Multiple files with AvroStorage and comma separated lists
Hi Guys,

Patch finally submitted: https://issues.apache.org/jira/browse/PIG-2492

Best,

stan

P.S. I classified it as an "improvement" rather than a "bug" since I
don't know what the original author(s) intended.

On Tue, Jan 24, 2012 at 9:45 PM, Russell Jurney
<[EMAIL PROTECTED]> wrote:
> Please submit.
>
> Russell Jurney
> twitter.com/rjurney
> [EMAIL PROTECTED]
> datasyndrome.com
>
> On Jan 24, 2012, at 8:22 AM, Stan Rosenberg
> <[EMAIL PROTECTED]> wrote:
>
>> Philipp,
>>
>> I would say it is a bug. I ran into the same problem some time
>> ago. Essentially, AvroStorage recognizes neither globs nor
>> comma-separated path lists, both of which are supported by Hadoop's
>> FileInputFormat. I ended up patching AvroStorage to make it compatible
>> with Hadoop's input-path semantics, but I haven't submitted the patch.
>> If there is some interest, I'd be more than glad to submit it.
>>
>> Best,
>>
>> stan
>>
>>
>> On Tue, Jan 24, 2012 at 4:26 AM, Philipp <[EMAIL PROTECTED]> wrote:
>>> Dear Pig users,
>>>
>>> I tried to load several files with AvroStorage by passing a
>>> comma-separated list of paths. The statement I used is:
>>>
>>> test_data = LOAD 'repo_1/part-r-00000.avro,repo_2/part-r-00000.avro'
>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>
>>> Pig reports that no input paths were specified in the job; the
>>> stack trace is below.
>>> I tried Pig versions 0.8.1-cdh3u2 and 0.9.1.
>>>
>>> Has anyone else observed the same behavior? Is it a bug or a feature?
>>>
>>> Thanks, Philipp
>>>
>>> Stack trace:
>>>
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: No input paths specified in job
>>>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
>>>    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>>>    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>>>    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>>>    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>>>    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>>>    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>>>    at java.lang.Thread.run(Thread.java:679)
>>> Caused by: java.io.IOException: No input paths specified in job
>>>    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
>>>    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>>>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
>>>    ... 7 more
>>>
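To make Stan's point concrete, here is a minimal Java sketch of the two input-path features he says AvroStorage was missing: splitting a comma-separated path string only on top-level commas (so a comma inside a `{a,b}` glob alternation is left alone) and expanding globs. It is an illustration under stated assumptions, not the PIG-2492 patch and not Hadoop's actual FileInputFormat code. The class and method names (`InputPathSketch`, `splitInputPaths`, `expandGlobs`) are hypothetical. The glob step uses Java's built-in `PathMatcher` with `glob:` syntax against an in-memory file list as a stand-in for Hadoop's own glob expansion; the two syntaxes are similar but not identical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class InputPathSketch {

    // Hypothetical helper: split on commas at brace depth 0 only.
    // Commas inside a {a,b} glob alternation belong to the glob and are kept.
    static List<String> splitInputPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int braceDepth = 0;
        int start = 0;
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char ch = commaSeparatedPaths.charAt(i);
            if (ch == '{') {
                braceDepth++;
            } else if (ch == '}' && braceDepth > 0) {
                braceDepth--;
            } else if (ch == ',' && braceDepth == 0) {
                paths.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        paths.add(commaSeparatedPaths.substring(start));
        return paths;
    }

    // Hypothetical helper: match each pattern against a list of known files
    // (standing in for a real file-system listing), using Java's "glob:" syntax.
    // Each matching file is returned once, in the order it is first matched.
    static List<String> expandGlobs(List<String> patterns, List<String> existingFiles) {
        List<String> matched = new ArrayList<>();
        for (String pattern : patterns) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            for (String file : existingFiles) {
                if (matcher.matches(Paths.get(file)) && !matched.contains(file)) {
                    matched.add(file);
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "repo_1/part-r-00000.avro",
                "repo_2/part-r-00000.avro",
                "repo_3/part-r-00000.avro");

        // The LOAD string from the original question: two plain paths.
        List<String> plain = splitInputPaths("repo_1/part-r-00000.avro,repo_2/part-r-00000.avro");
        System.out.println(plain);
        // prints [repo_1/part-r-00000.avro, repo_2/part-r-00000.avro]

        // A glob alternation whose comma must not be treated as a separator.
        List<String> glob = splitInputPaths("repo_{1,3}/part-r-00000.avro");
        System.out.println(glob);
        // prints [repo_{1,3}/part-r-00000.avro]

        System.out.println(expandGlobs(glob, files));
        // prints [repo_1/part-r-00000.avro, repo_3/part-r-00000.avro]
    }
}
```

The brace-depth counter is the key design choice: a naive `split(",")` would break `repo_{1,3}/...` into two meaningless fragments, which is one way a loader can end up with no valid input paths.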