Pig >> mail # user >> Multiple files with AvroStorage and comma separated lists


Re: Multiple files with AvroStorage and comma separated lists
Philipp,

I would say that it is a bug. I ran into the same problem some time
ago. Essentially, AvroStorage recognizes neither globs nor
comma-separated lists of paths, both of which are supported by
Hadoop's FileInputFormat. I ended up patching AvroStorage to make it
compatible with Hadoop's input path semantics, but I haven't submitted
the patch. If there is interest, I'd be more than glad to submit it.
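
To illustrate the path semantics in question, here is a minimal,
self-contained sketch of splitting a comma-separated input string into
individual paths while leaving commas inside a {a,b} glob group alone.
This is not the patch mentioned above and not Hadoop's or AvroStorage's
actual code; the class name InputPathSplitter and the method splitPaths
are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig or Hadoop.
public class InputPathSplitter {

    // Splits on commas that are outside {...} glob groups, so that
    // "a/{x,y}/*.avro,b" yields two paths rather than three.
    public static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;

        for (char c : commaSeparatedPaths.toCharArray()) {
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }

            if (c == ',' && braceDepth == 0) {
                // A top-level comma ends the current path.
                addIfNonEmpty(paths, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNonEmpty(paths, current);
        return paths;
    }

    // Skips empty segments such as the one produced by a trailing comma.
    private static void addIfNonEmpty(List<String> paths, StringBuilder segment) {
        String path = segment.toString().trim();
        if (!path.isEmpty()) {
            paths.add(path);
        }
    }

    public static void main(String[] args) {
        // The location string from the LOAD statement in the original question.
        String location = "repo_1/part-r-00000.avro,repo_2/part-r-00000.avro";
        for (String path : splitPaths(location)) {
            System.out.println(path);
        }
    }
}
```

Each resulting path could still contain glob characters such as * or
{a,b}; expanding those against the file system is a separate step that
this sketch does not attempt.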

Best,

stan
On Tue, Jan 24, 2012 at 4:26 AM, Philipp <[EMAIL PROTECTED]> wrote:
> Dear Pig users,
>
> I tried to load several files with AvroStorage by using a comma separated
> list. The statement I used is:
>
> test_data= LOAD 'repo_1/part-r-00000.avro,repo_2/part-r-00000.avro' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> Pig states that no input paths were specified in job. Please see the
> stacktrace below.
> I tried Pig versions 0.8.1-cdh3u2 and 0.9.1.
>
> Does anyone observe the same behavior? Is it a bug or a feature?
>
> Thanks, Philipp
>
>
>
>
>
> /Stacktrace:/
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: No input
> paths specified in job
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
>    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>    at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>    at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>    at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>    at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.IOException: No input paths specified in job
>    at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:186)
>    at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
>    ... 7 more
>