Pig >> mail # user >> Question on custom store function


Re: Question on custom store function
You need to set output path to '/Users/felix/Documents/pig/multi_store_output'
in your setStoreLocation().
Alternatively, for clarity, you could modify your store UDF to be more like:
store load_log INTO '/Users/felix/Documents/pig/multi_store_output' using
MyMultiStorage('ns_{0}/site_{1}', '2,1', '1,2');

The reason FileOutputFormat needs a real path is that, at run time, Hadoop
actually writes to a temporary path and then moves the output to the correct
path if the job succeeds.
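A minimal sketch of the placeholder substitution being discussed, using java.text.MessageFormat purely as an illustration — the `expand` helper, class name, and field values here are hypothetical, and MyMultiStorage's actual implementation may differ:

```java
import java.text.MessageFormat;

public class PathSubstitution {
    // Expand a pattern like "ns_{0}/site_{1}" with values taken from
    // tuple fields, then join the result onto the base output path.
    static String expand(String basePath, String pattern, String... fields) {
        return basePath + "/" + MessageFormat.format(pattern, (Object[]) fields);
    }

    public static void main(String[] args) {
        String base = "/Users/felix/Documents/pig/multi_store_output";
        // Fields "news" and "42" stand in for the values of the tuple
        // fields at index 0 and index 1.
        String path = expand(base, "ns_{0}/site_{1}", "news", "42");
        System.out.println(path);
        // -> /Users/felix/Documents/pig/multi_store_output/ns_news/site_42
    }
}
```

With the pattern passed as a separate constructor argument, setStoreLocation() only ever sees the real base directory, which is what FileOutputFormat expects.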

Raghu.

On Thu, Nov 3, 2011 at 9:45 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Don't use FileOutputFormat? Or rather, use something that extends it and
> overrides the validation.
>
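What Dmitriy describes — extending FileOutputFormat and overriding the validation — might look roughly like the following. This is a hypothetical, untested sketch against Hadoop's mapreduce API: the class name LenientTextOutputFormat is made up, and TextOutputFormat is used only as a convenient FileOutputFormat subclass to extend.

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical subclass that skips FileOutputFormat's output-directory
// validation, since the store func computes the real paths per record.
public class LenientTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        // Intentionally a no-op: FileOutputFormat.checkOutputSpecs() is
        // what throws InvalidJobConfException ("Output directory not set")
        // mentioned later in this thread.
    }
}
```

The store func's getOutputFormat() would then return an instance of this class instead of the stock output format.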
> On Wed, Nov 2, 2011 at 3:19 PM, felix gao <[EMAIL PROTECTED]> wrote:
>
> > If you don't call that function, Hadoop is going to throw an exception
> > for not having an output set for the job.
> > something like:
> > Caused by: org.apache.hadoop.mapred.InvalidJobConfException: Output
> > directory not set.
> >   at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
> >   at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:87)
> >
> > So I have to set it and then somehow delete it after Pig completes.
> >
> >
> >
> >
> > On Wed, Nov 2, 2011 at 3:00 PM, Ashutosh Chauhan <[EMAIL PROTECTED]
> > >wrote:
> >
> > > Then, don't call FileOutputFormat.setOutputPath(job, new Path(location));
> > > Looks like I am missing something here.
> > >
> > > Ashutosh
> > > On Wed, Nov 2, 2011 at 14:10, felix gao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ashutosh,
> > > >
> > > > My problem is I don't want to use that location at all, since I am
> > > > constructing the output location based on the tuple input. The
> > > > location is just a dummy placeholder for me to substitute the right
> > > > parameters into.
> > > >
> > > > Felix
> > > >
> > > > On Wed, Nov 2, 2011 at 10:47 AM, Ashutosh Chauhan <
> > [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > Hey Felix,
> > > > >
> > > > > >> The only problem is that in the setStoreLocation function we
> > > > > >> have to call FileOutputFormat.setOutputPath(job, new Path(location));
> > > > >
> > > > > Can't you massage the location into the appropriate string you want?
> > > > >
> > > > > Ashutosh
> > > > >
> > > > > On Tue, Nov 1, 2011 at 18:07, felix gao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I have written a custom store function that is primarily based
> > > > > > on the multi-storage store function.  The way I use it is:
> > > > > >
> > > > > >
> > > > > > store load_log INTO
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}'
> > > > > > using MyMultiStorage('2,1', '1,2');
> > > > > > where {0} and {1} will be substituted with the tuple fields at
> > > > > > index 0 and index 1.  Everything is fine and all the data is
> > > > > > written to the correct place.  The only problem is that in the
> > > > > > setStoreLocation function we have to call
> > > > > > FileOutputFormat.setOutputPath(job, new Path(location)); I have
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}'
> > > > > > as my output location, so there is actually a folder created in
> > > > > > my fs with ns_{0} and site_{1}.  Is there a way to tell Hadoop
> > > > > > not to create those output directories?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Felix
> > > > > >
> > > > >
> > > >
> > >
> >
>