Pig, mail # user - Question on custom store function


Re: Question on custom store function
Raghu Angadi 2011-11-04, 22:31
You need to set the output path to '/Users/felix/Documents/pig/multi_store_output'
in your setStoreLocation().
Alternatively, for clarity, you could modify your store UDF to be more like:
store load_log INTO '/Users/felix/Documents/pig/multi_store_output' using
MyMultiStorage('ns_{0}/site_{1}', '2,1', '1,2');

The reason FileOutputFormat needs a real path is that at run time Hadoop
actually writes to a temporary path and then moves the output to the correct
path if the job succeeds.

Raghu.
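[Editor's note: the '{0}'/'{1}' placeholder scheme in the UDF discussed above behaves like java.text.MessageFormat patterns. As a minimal sketch of the expansion step only (the class and method names here are hypothetical, not part of the poster's actual UDF), a store function could expand the subdirectory pattern from tuple field values like this:]

```java
import java.text.MessageFormat;

public class PathTemplateDemo {
    // Expand a subdirectory template such as "ns_{0}/site_{1}" using values
    // taken from tuple fields. MessageFormat substitutes {0}, {1}, ... with
    // the arguments in order.
    static String expand(String template, Object... fieldValues) {
        return MessageFormat.format(template, fieldValues);
    }

    public static void main(String[] args) {
        // A record whose first field is a namespace and second is a site id.
        String subDir = expand("ns_{0}/site_{1}", "web", "42");
        // Per Raghu's suggestion, the job's output path stays a real
        // directory ('/Users/felix/Documents/pig/multi_store_output') and the
        // UDF writes this record under <output path>/ns_web/site_42, rather
        // than baking the raw pattern into the job's output path.
        System.out.println(subDir); // prints "ns_web/site_42"
    }
}
```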

On Thu, Nov 3, 2011 at 9:45 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Don't use FileOutputFormat? Or rather, use something that extends it and
> overrides the validation.
>
> On Wed, Nov 2, 2011 at 3:19 PM, felix gao <[EMAIL PROTECTED]> wrote:
>
> > If you don't call that function, Hadoop is going to throw an exception
> > for not having the output set for the job.
> > something like
> > Caused by: org.apache.hadoop.mapred.InvalidJobConfException: Output
> > directory not set.
> > at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
> > at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:87)
> >
> > So I have to set it and then somehow delete it after Pig completes.
> >
> >
> >
> >
> > On Wed, Nov 2, 2011 at 3:00 PM, Ashutosh Chauhan <[EMAIL PROTECTED]
> > >wrote:
> >
> > > Then, don't call FileOutputFormat.setOutputPath(job, new Path(location));
> > > Looks like I am missing something here.
> > >
> > > Ashutosh
> > > On Wed, Nov 2, 2011 at 14:10, felix gao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ashutosh,
> > > >
> > > > My problem is I don't want to use that location at all, since I am
> > > > constructing the output location based on the tuple input. The location
> > > > is just a dummy placeholder for me to substitute the right parameters
> > > > into.
> > > >
> > > > Felix
> > > >
> > > > On Wed, Nov 2, 2011 at 10:47 AM, Ashutosh Chauhan <
> > [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > Hey Felix,
> > > > >
> > > > > >> The only problem is that in the setStoreLocation function we
> > > > > >> have to call
> > > > > >> FileOutputFormat.setOutputPath(job, new Path(location));
> > > > >
> > > > > Can't you massage the location into the appropriate string you want?
> > > > >
> > > > > Ashutosh
> > > > >
> > > > > On Tue, Nov 1, 2011 at 18:07, felix gao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I have written a custom store function that is primarily based on
> > > > > > the MultiStorage store function.  The way I use it is
> > > > > >
> > > > > > store load_log INTO
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}' using
> > > > > > MyMultiStorage('2,1', '1,2');
> > > > > >
> > > > > > where {0} and {1} will be substituted with the tuple fields at index
> > > > > > 0 and index 1.  Everything is fine and all the data is written to
> > > > > > the correct place.  The only problem is that in the setStoreLocation
> > > > > > function we have to call
> > > > > > FileOutputFormat.setOutputPath(job, new Path(location)); I have
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}' as
> > > > > > my output location, so there is actually a folder created in my fs
> > > > > > with ns_{0} and site_{1}.  Is there a way to tell Hadoop not to
> > > > > > create those output directories?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Felix
> > > > > >
> > > > >
> > > >
> > >
> >
>
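[Editor's note: Dmitriy's suggestion in the thread is to use an output format that extends FileOutputFormat and overrides the validation; in Hadoop terms that means overriding checkOutputSpecs, which is where the "Output directory not set" exception in the stack trace originates. The sketch below is a toy, Hadoop-free analogue of that pattern — the class names are hypothetical stand-ins, not real Hadoop classes — showing only the override-the-validation idea:]

```java
// Toy analogue of "extend FileOutputFormat and override the validation":
// a base "output format" whose spec check demands an output directory,
// and a subclass that overrides the check away.
public class ValidationOverrideDemo {
    static class BaseOutputFormat {
        // Stand-in for the job's configured output directory.
        String outputDir;

        // Analogous to FileOutputFormat.checkOutputSpecs, which throws
        // InvalidJobConfException when no output directory is set.
        void checkOutputSpecs() {
            if (outputDir == null) {
                throw new IllegalStateException("Output directory not set.");
            }
        }
    }

    // Analogous to the subclass Dmitriy suggests: keep the parent's
    // writing machinery but skip its output-spec validation, since the
    // store func computes the real paths per record.
    static class LenientOutputFormat extends BaseOutputFormat {
        @Override
        void checkOutputSpecs() {
            // Deliberately no-op.
        }
    }

    public static void main(String[] args) {
        new LenientOutputFormat().checkOutputSpecs(); // no exception thrown
        System.out.println("validation skipped");
    }
}
```

Note that Raghu's eventual answer (set a real output path and put the pattern in the constructor arguments) avoids the need for this override entirely, which also preserves Hadoop's temporary-path commit behavior.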