Dan Yi 2013-02-05, 18:54
Johnny Zhang 2013-02-05, 19:04
Re: index of the lines in file
Cheolsoo Park 2013-02-06, 01:13
You can use RANK to append the line number to each record:
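A = LOAD 'input.txt' AS (id:chararray);   -- one id per line
B = RANK A;                               -- prepends a sequential rank_A field (1, 2, 3, ...)
C = FILTER B BY id == 'item17';           -- keep only the record we are looking for
D = FOREACH C GENERATE rank_A;            -- project just its line number
DUMP D;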
This returns the row number of id 'item17'. RANK is new in Pig 0.11,
which hasn't been released yet. If you want to use it before the release,
you will have to build Pig from the 0.11 branch yourself. Here is the JIRA:
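For readers without Pig 0.11 at hand, the following is a minimal single-process Python sketch of what the RANK / FILTER / GENERATE pipeline computes. It assumes the ids are read in file order and numbered from 1; the function name `line_indexes` is hypothetical, and this is not how Pig itself executes the script.

```python
def line_indexes(lines, target):
    """Return the 1-based line numbers whose content equals target.

    Hypothetical helper mirroring the Pig script: number every line
    (like RANK A -> rank_A), keep the matches (FILTER), and return
    only their numbers (GENERATE rank_A).
    """
    ranked = [(rank, line.rstrip("\n")) for rank, line in enumerate(lines, start=1)]
    return [rank for rank, line in ranked if line == target]


ids = ["item1", "item4", "item17", "item8"]
print(line_indexes(ids, "item17"))
```

A list is returned rather than a single number because an id may appear on several lines, just as the Pig FILTER can emit several records.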
On Tue, Feb 5, 2013 at 11:04 AM, Johnny Zhang <[EMAIL PROTECTED]> wrote:
> Please correct me if I am wrong, but I don't think you can get a direct index
> for each row in Pig. HBase is a better candidate for this.
> However, you can write your own UDF to add row numbers to the tuples in a bag.
> On Tue, Feb 5, 2013 at 10:54 AM, Dan Yi <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I have a file with one id (a string) on each line.
> > Is there an easy way, given any id, to find its
> > index in the file, e.g. the line number?
> > For example, the file looks like this:
> > item1
> > item4
> > item17
> > item8
> > ..
> > Given item17, its index in the file is 3.
> > Thanks
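Johnny's alternative, a custom UDF that adds row numbers to the tuples in a bag, could look roughly like the Python sketch below. The function name `number_bag` is hypothetical. Registering it with Pig as a Python (Jython) UDF, declaring its output schema, and producing the input bag (for example with a GROUP ... ALL) are not shown and are assumptions. The numbers follow the bag's iteration order, which Pig may not guarantee to match the original line order.

```python
def number_bag(bag):
    """Return a new bag with a 1-based row number prepended to each tuple.

    Hypothetical UDF body: numbering follows the order in which the
    bag is iterated, not necessarily the order of lines in the file.
    """
    return [(row_number,) + tuple(t) for row_number, t in enumerate(bag, start=1)]


bag = [("item1",), ("item4",), ("item17",), ("item8",)]
print(number_bag(bag))
```

Prepending the number (rather than appending it) matches the layout RANK produces, where rank_A is the first field.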