Pig, mail # user - Use Filename in Tuple


Re: Use Filename in Tuple
Dexin Wang 2011-02-04, 04:32
Similarly, is it possible to insert literal values into a tuple stream?

For example, when I invoke my Pig script, I already know what the data source is
(say, filename_2011-02-03), so I can pass it to Pig with -param. I want to
insert this known file name into the tuple stream. How can I do that?

Here is what I have:

grunt> A = LOAD 'aa' AS (f1, f2);
grunt> DUMP A;
(aa,bb)
(cc,dd)

I want to do something like:

grunt> B = FOREACH A GENERATE f1, 'filename_2011-02-03';

Thanks.
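A minimal sketch of one way to do this, assuming Pig's parameter substitution (the -param option mentioned above). The script name add_literal.pig and the parameter name filename are illustrative, not from the thread. Pig replaces the parameter reference textually before parsing, so the quoted value becomes a constant chararray in every output tuple. As far as we know, substitution applies to script files run from the command line rather than to statements typed at the grunt prompt, so the statements go in a file:

```pig
-- add_literal.pig (hypothetical file name)
-- Run with: pig -param filename=filename_2011-02-03 add_literal.pig
-- The parameter reference in the GENERATE line is substituted textually
-- before parsing, so it becomes a plain string constant.
A = LOAD 'aa' AS (f1, f2);
B = FOREACH A GENERATE f1, '$filename' AS filename;
DUMP B;
-- With the input shown above this should print:
-- (aa,filename_2011-02-03)
-- (cc,filename_2011-02-03)
```

Because the value is fixed per invocation, every tuple gets the same label. This does not tell files apart when a whole directory is loaded, which is the case the reply below addresses.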

On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> In Pig 0.6, you can hook into bindTo() and save the file name.
>
> In Pig 0.8 you have to get to the underlying InputSplit via
> PigSplit.getWrappedSplit(), cast it to FileSplit, and call getPath()
> on it, I think. I haven't done this myself.
>
> Of course, this will totally break if you have splitCombination turned on,
> as Pig can silently move to a different file under you, so you'd have
> to turn that off.
>
> D
>
> On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <[EMAIL PROTECTED]> wrote:
> > Hey,
> >
> > I have a bunch of files where the filename is significant. I'm loading the
> > files by supplying the top-level directory that contains them. Is there a
> > way to capture the name of each file and append it to the tuples of data
> > in that file?
> >
> > -Kim
> >
>