
Pig >> mail # user >> Use Filename in Tuple


Re: Use Filename in Tuple
Wow, I almost got it right. Double quotes fail; single quotes work.

Thanks.

On Thu, Feb 3, 2011 at 9:40 PM, Kim Vogt <[EMAIL PROTECTED]> wrote:

> This should work:
>
> grunt> B = FOREACH A GENERATE f1, 'filename-2011-02-03';
>
> or
>
> grunt> B = FOREACH A GENERATE f1, '$paramName';
>
> -Kim
>
> On Thu, Feb 3, 2011 at 8:32 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>
> > Similarly, is it possible to insert some literal values into a tuple
> > stream?
> >
> > For example, when I invoke my Pig script, I already know what the data
> > source is (say, it's from filename_2011-02-03), so I can just pass it to
> > Pig using -param, and I want to insert this known file name into the
> > tuple stream. How can I do that?
> >
> > Example, I have:
> >
> > grunt> A = LOAD 'aa' AS (f1, f2);
> > grunt> DUMP A;
> > (aa,bb)
> > (cc,dd)
> >
> > I want to do something like:
> >
> > grunt> B = FOREACH A GENERATE f1, "filename-2011-02-03";
> >
> > Thanks.
> >
> > On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> >
> > > In Pig 0.6, you can hook into bindTo() and save the file name.
> > >
> > > In Pig 0.8 you have to find your way to the underlying InputSplit via
> > > PigSplit.getWrappedSplit(), cast it to FileSplit, and call getPath()
> > > on it. I think; I haven't done this.
> > >
> > > This will totally break if you have splitCombination turned on, of
> > > course, since Pig can silently move to a different file under you, so
> > > you'd have to turn that off.
> > >
> > > D
> > >
> > > On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <[EMAIL PROTECTED]> wrote:
> > > > Hey,
> > > >
> > > > I have a bunch of files where the filename is significant. I'm loading
> > > > the files by supplying the top-level directory that contains them.
> > > > Is there a way to capture the filename of each file and append it to
> > > > the tuple of data that's in that file?
> > > >
> > > > -Kim
> > > >
> > >
> >
>
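Putting the thread's answer together: Pig Latin string constants take single quotes, and a `-param` value can be substituted inside one. Below is a minimal sketch under stated assumptions. The parameter name `srcName`, the script name `script.pig`, and the output field name `source` are hypothetical. The input `'aa'` and its two fields come from the example above.

```pig
-- Hypothetical invocation: pig -param srcName=filename-2011-02-03 script.pig
A = LOAD 'aa' AS (f1, f2);
-- String constants must be single-quoted; the double-quoted form from the
-- question is rejected. $srcName is replaced with the -param value before
-- the script runs, so every output tuple carries the same literal.
B = FOREACH A GENERATE f1, '$srcName' AS source;
DUMP B;
```

With the two-row input shown earlier, `DUMP B` should print `(aa,filename-2011-02-03)` and `(cc,filename-2011-02-03)`. This only works when the name is known up front, as in the follow-up question. Tagging each row with its real source file when loading a whole directory needs a custom loader along the lines Dmitriy describes.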