Pig >> mail # user >> index of the lines in file

Re: index of the lines in file
You can use RANK to append the line number to each record:

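-- Load one id per line; declare chararray so the string comparison is typed.
A = LOAD 'input.txt' AS (id:chararray);
-- RANK with no BY clause prepends a sequential row number as field rank_A.
B = RANK A;
C = FILTER B BY id == 'item17';
D = FOREACH C GENERATE rank_A;
DUMP D;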

This returns the row number of the id 'item17'. RANK was added in Pig 0.11,
which hasn't been released yet. If you want to use it before the release,
you will have to build Pig from the 0.11 branch yourself. Here is the JIRA:
https://issues.apache.org/jira/browse/PIG-2353
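
For a small file, the result can be cross-checked outside Pig. Below is a
minimal Python sketch, assuming one id per line and 1-based numbering as in
the example in the question. The function name line_index is made up for
illustration and is not part of Pig.

```python
def line_index(lines, target):
    """Return the 1-based position of the first line equal to target, or None."""
    for number, line in enumerate(lines, start=1):
        # strip() drops the trailing newline when lines come from a file object
        if line.strip() == target:
            return number
    return None


if __name__ == "__main__":
    sample = ["item1", "item4", "item17", "item8"]
    print(line_index(sample, "item17"))  # prints 3
```

Any iterable of lines works, so an open file can be passed directly, e.g.
line_index(open('input.txt'), 'item17'). Unlike the FILTER above, which keeps
every matching row, this sketch stops at the first match.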

Thanks,
Cheolsoo
On Tue, Feb 5, 2013 at 11:04 AM, Johnny Zhang <[EMAIL PROTECTED]> wrote:

> Please correct me if I am wrong. I don't think you can get a direct index for
> each row in Pig. HBase is a better candidate for doing this.
>
> But you can write your own UDF to add row numbers to the tuples in a bag.
>
> Johnny
>
>
> On Tue, Feb 5, 2013 at 10:54 AM, Dan Yi <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have a file with one id (string) on each line.
> > Is there an easy way, given any id, to find its
> > index in the file, e.g. the line #?
> >
> > For example, the file looks like this:
> > item1
> > item4
> > item17
> > item8
> > ..
> >
> > Given item17, its index in the file is 3
> >
> > thanks
> >
> >
>