Pig >> mail # user >> Use Filename in Tuple


Re: Use Filename in Tuple
Similarly, is it possible to insert literal values into a tuple stream?

For example, when I invoke my Pig script, I already know what the data
source is (say, it's from filename_2011-02-03), so I can just pass it to Pig
using -param, and I want to insert this known file name into the tuple
stream. How can I do that?

For example, I have:

grunt> A = LOAD 'aa' AS (f1, f2);
grunt> DUMP A;
(aa,bb)
(cc,dd)

I want to do something like:

grunt> B = FOREACH A GENERATE f1, "filename-2011-02-03";
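
One way this might be written, as a sketch only: Pig chararray constants take single quotes rather than double quotes, and a value passed with -param is substituted textually wherever $name appears, including inside a quoted constant. The parameter name SRC, the field alias src, and the script name script.pig are hypothetical, chosen just for illustration.

```pig
-- Invoked as (SRC and script.pig are hypothetical names):
--   pig -param SRC=filename-2011-02-03 script.pig

A = LOAD 'aa' AS (f1, f2);

-- '$SRC' is replaced with the parameter value before the script is parsed,
-- so every output tuple carries the same chararray constant.
B = FOREACH A GENERATE f1, '$SRC' AS src;

DUMP B;
```

Since the substitution happens before parsing, the same constant is attached to every tuple; this only helps when the whole run reads from one known source.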

Thanks.

On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> In Pig 0.6, you can hook into bindTo() and save the file name.
>
> In Pig 0.8 you have to find your way to the underlying InputSplit via
> PigSplit.getWrappedSplit(), cast it to FileSplit, and call getPath()
> on it, I think. I haven't done this.
>
> This will totally break if you have splitCombination turned on, of
> course, as Pig can silently move to a different file under you, so
> you'd have to turn that off.
>
> D
>
> On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <[EMAIL PROTECTED]> wrote:
> > Hey,
> >
> > I have a bunch of files where the filename is significant.  I'm loading
> > the files by supplying the top level directory that contains the files.
> > Is there a way to capture the filename of the file and append to the
> > tuple of data that's in that file?
> >
> > -Kim
> >
>