DIPESH KUMAR SINGH
2012-05-16, 17:41
Thejas Nair
2012-05-18, 01:20
DIPESH KUMAR SINGH
2012-05-18, 18:34
Rajesh Balamohan
2012-05-20, 02:32
Rajesh Balamohan
2012-05-20, 02:36
DIPESH KUMAR SINGH
2012-05-20, 02:45
Rajesh Balamohan
2012-05-21, 01:26
Russell Jurney
2012-05-24, 05:12
Subir S
2012-05-24, 05:19
Russell Jurney
2012-05-24, 05:46
Gianmarco De Francisci Mo...
2012-05-24, 05:55
Subir S
2012-05-27, 09:33
Russell Jurney
2012-05-27, 20:56
Dragan Nedeljkovic
2012-05-27, 21:39
Dmitriy Ryaboy
2012-05-27, 23:01
Russell Jurney
2012-05-28, 00:06
Subir S
2012-05-28, 05:51
Rajesh Balamohan
2012-05-28, 22:51
-
Create rdbms like sequence in Pig on Pig Relation
DIPESH KUMAR SINGH 2012-05-16, 17:41
I want to create an RDBMS-like sequence on a Pig relation.
Is there an existing UDF that could do this? I am a bit new to Pig; kindly suggest how to proceed.
Thanks & Regards,
-- Dipesh Kr. Singh
-
Re: Create rdbms like sequence in Pig on Pig Relation
Thejas Nair 2012-05-18, 01:20
What do you mean by 'rdbms like sequence'?
Thanks,
Thejas
-
Re: Create rdbms like sequence in Pig on Pig Relation
DIPESH KUMAR SINGH 2012-05-18, 18:34
Sorry if my point was not clear.
I wish to create a sequence on a Pig relation. Say, for example, I have a relation with the data:
(John, A-1)
(Jack, B-2)
(Jim, C-1)
I want to create a sequence, i.e. add one more column to the relation that acts like a counter and keeps increasing for each record read. The expected output should be something like this (if 200 is the start of the sequence):
(John, A-1, 201)
(Jack, B-2, 202)
(Jim, C-1, 203)
Could you please suggest how to proceed?
Thanks,
Dipesh
-
Re: Create rdbms like sequence in Pig on Pig Relation
Rajesh Balamohan 2012-05-20, 02:32
Pig doesn't have that facility yet. Moreover, it's not very efficient to do this in Pig/MR, as it requires synchronization. However, if this is unavoidable in your situation, the following options can be considered:
1. Maintaining the sequence number details in ZooKeeper.
2. Having a simple structure in an HBase table (seqNumber --> Value). You can get a bucket of values (e.g. 1000-2000) from it and use it in your UDF. When the range is depleted, you have to query/update the HBase table again (e.g. 3000-4000). There are corner cases which need to be handled.
~Rajesh.B
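Option 2 hands out numbers in blocks, so each task contacts the central store once per block rather than once per record. A minimal sketch of that idea, assuming an in-memory AtomicLong as a stand-in for the HBase row or ZooKeeper node (CentralCounter and LocalAllocator are hypothetical names; a real version would replace reserve() with an atomic increment against the shared store). It reproduces the pattern in the mail: if task A holds the block starting at 1000 and task B has already reserved the one starting at 2000, A's next block starts at 3000.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BlockSequence {
    // Hypothetical stand-in for the central counter (HBase row / ZooKeeper node).
    // Holds the next value that has not yet been handed to any task.
    static final class CentralCounter {
        private final AtomicLong next;

        CentralCounter(long start) { this.next = new AtomicLong(start); }

        // Atomically reserves [first, first + size) and returns first.
        long reserve(long size) { return next.getAndAdd(size); }
    }

    // Per-task allocator: one round trip to the central counter per block.
    static final class LocalAllocator {
        private final CentralCounter central;
        private final long blockSize;
        private long current = 0; // next value to hand out
        private long limit = 0;   // exclusive end of the current block (empty at start)

        LocalAllocator(CentralCounter central, long blockSize) {
            this.central = central;
            this.blockSize = blockSize;
        }

        long nextValue() {
            if (current >= limit) { // block depleted: reserve a fresh one
                current = central.reserve(blockSize);
                limit = current + blockSize;
            }
            return current++;
        }
    }

    public static void main(String[] args) {
        CentralCounter central = new CentralCounter(1000);
        LocalAllocator taskA = new LocalAllocator(central, 1000);
        LocalAllocator taskB = new LocalAllocator(central, 1000);
        System.out.println(taskA.nextValue()); // 1000 (A reserves 1000..1999)
        System.out.println(taskB.nextValue()); // 2000 (B reserves 2000..2999)
        System.out.println(taskA.nextValue()); // 1001
    }
}
```

The numbers come out unique but neither gap-free nor in input order, and a task that fails mid-block leaves the rest of its block unused; these are some of the corner cases a real implementation has to handle.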
-
Re: Create rdbms like sequence in Pig on Pig Relation
Rajesh Balamohan 2012-05-20, 02:36
If you don't care about sequence numbers and the intention is just to create a unique key, you can use a GUID, which doesn't require any synchronization at all (all mappers can run in parallel). The approaches I suggested in my earlier mail come into the picture mainly for sequence numbers.
~Rajesh.B
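To illustrate why GUIDs need no synchronization, in this sketch several threads each generate keys with java.util.UUID.randomUUID() and never consult a shared counter. The threads stand in for mappers; that is an assumption for illustration, since real map tasks run in separate JVMs. The concurrent set exists only to count distinct results afterwards and plays no part in generating them.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelUuidDemo {
    // Runs `mappers` workers, each generating `rowsPerMapper` keys independently,
    // and returns all keys so uniqueness can be checked afterwards.
    static Set<String> generate(int mappers, int rowsPerMapper) {
        Set<String> ids = ConcurrentHashMap.newKeySet(); // verification only
        ExecutorService pool = Executors.newFixedThreadPool(mappers);
        for (int m = 0; m < mappers; m++) {
            pool.submit(() -> {
                // No shared counter or lock: each key is generated locally.
                for (int i = 0; i < rowsPerMapper; i++) {
                    ids.add(UUID.randomUUID().toString());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        int mappers = 4;
        int rows = 10_000;
        Set<String> ids = generate(mappers, rows);
        System.out.println("generated " + (mappers * rows) + ", distinct " + ids.size());
    }
}
```

A collision between random UUIDs is possible in principle but vanishingly unlikely at these volumes, which is why no coordination between tasks is needed.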
-
Re: Create rdbms like sequence in Pig on Pig Relation
DIPESH KUMAR SINGH 2012-05-20, 02:45
Thanks Rajesh.
Is GUID a built-in UDF?
-- Dipesh
-
Re: Create rdbms like sequence in Pig on Pig Relation
Rajesh Balamohan 2012-05-21, 01:26
I don't think so. However, it's a single line of Java, so you can create a custom UDF for it and use it in your code:
java.util.UUID.randomUUID();
~Rajesh.B
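Note that the one-liner returns a java.util.UUID object, not text. Assuming the key is meant to become a Pig chararray, which a Java UDF returns as a String, a toString() call is needed as well. A minimal sketch of such a helper (UuidKey and newKey are hypothetical names), which also shows the shape of the result: 36 characters (32 hex digits plus 4 hyphens) and version 4, i.e. randomly generated.

```java
import java.util.UUID;

public class UuidKey {
    // The one-liner from the mail, plus toString() so the result is a plain String.
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String key = newKey();
        System.out.println(key);                              // random on every run
        System.out.println(key.length());                     // 36
        System.out.println(UUID.fromString(key).version());   // 4
    }
}
```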
-
Re: Create rdbms like sequence in Pig on Pig Relation
Russell Jurney 2012-05-24, 05:12
How do you invoke java.util.UUID.randomUUID? Is there no invoker that doesn't take an arg?
-- Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: Create rdbms like sequence in Pig on Pig Relation
Subir S 2012-05-24, 05:19
Hope this helps -> http://www.javapractices.com/topic/TopicAction.do?Id=56
and this -> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID%28%29
Thanks
-
Re: Create rdbms like sequence in Pig on Pig Relation
Russell Jurney 2012-05-24, 05:46
Thanks. I mean, how do you invoke it directly from Pig, in grunt>?
I've been messing it up for the last 30 minutes. Should I check the settings on my pacemaker? I feel like Fabio on NyQuil messing with this.
-
Re: Create rdbms like sequence in Pig on Pig Relation
Gianmarco De Francisci Mo... 2012-05-24, 05:55
Hi,
Pig will have this functionality as soon as we finish PIG-2353, which is part of this year's GSoC.
Cheers,
-- Gianmarco
-
Re: Create rdbms like sequence in Pig on Pig Relation
Subir S 2012-05-27, 09:33
I hope this helps: the DynamicInvoker feature in Pig, added in 0.8.0.
http://squarecog.wordpress.com/2010/08/20/upcoming-features-in-pig-0-8-dynamic-invokers/
Thanks
-
Re: Create rdbms like sequence in Pig on Pig Relation
Russell Jurney 2012-05-27, 20:56
It helps, but I am not able to invoke java.util.UUID.toString, maybe because it doesn't take an argument. This is from the docs:
    DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
    encoded_strings = LOAD 'encoded_strings.txt' as (encoded:chararray);
    decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
Maybe I forgot something, but is this how I do it?
    DEFINE UUID InvokeForString('java.util.UUID.toString');
    with_uuid = FOREACH my_stuff generate UUID(), *;
Sorry, I only understand example code, not APIs. My Java is quite weak.
http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html#toString()
-
Re: Create rdbms like sequence in Pig on Pig RelationDragan Nedeljkovic 2012-05-27, 21:39
You have to call UUID.randomUUID() to get a UUID, but you cannot use DEFINE
to do that, since DEFINE does not support methods that return arbitrary classes.

Wrapping it in a UDF works just fine:

package piggybank;

import java.io.IOException;
import java.util.UUID;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Returns a new random UUID string for every input tuple; the input itself is ignored.
public class CreateUUID
    extends EvalFunc<String>
{
    @Override
    public String exec(Tuple input)
        throws IOException
    {
        try
        {
            return UUID.randomUUID().toString();
        }
        catch (Exception e)
        {
            // Throwing an exception will cause the task to fail.
            throw new IOException("Something bad happened!", e);
        }
    }
}
// eof

register 'mypiggybank.jar';
define CreateUUID piggybank.CreateUUID();

input_lines = LOAD 'test_CreateUUID.in' AS (line:chararray);
describe input_lines;
dump input_lines;

new_list = FOREACH input_lines GENERATE line, CreateUUID();
describe new_list;
dump new_list;

-- eof
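The core of the CreateUUID UDF above is the single call UUID.randomUUID().toString(). A minimal sketch, outside Pig, of what that call produces; the class name UuidDemo and its newId() helper are hypothetical and exist only for illustration. Each value is a random (version 4) UUID in the standard 36-character text form, generated independently in each JVM, which is why no coordination between mappers is needed.

```java
import java.util.UUID;

// Hypothetical demo class: exercises the same call the CreateUUID UDF makes,
// without needing Pig on the classpath.
public class UuidDemo {

    // Same expression as the body of CreateUUID.exec().
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            String s = newId();
            // Parse the string back to report its version (4 = randomly generated).
            int version = UUID.fromString(s).version();
            System.out.println(s + " version=" + version + " length=" + s.length());
        }
    }
}
```

The values differ on every run, so the output is not shown here.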
-
Re: Create rdbms like sequence in Pig on Pig RelationDmitriy Ryaboy 2012-05-27, 23:01
Right. Russell, the reason the DynamicInvokers weren't working is that
InvokeForString expects a function that returns a String. randomUUID returns a UUID, not a String. You could, of course, call this trivially using JRuby UDFs (less work than the Java version).

D
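Dmitriy's explanation can be checked outside Pig with plain Java reflection. A minimal sketch; the names UuidReflection and returnTypeOf are hypothetical, and the code only inspects the JDK's java.util.UUID, it does not call any Pig API.

```java
import java.lang.reflect.Method;
import java.util.UUID;

// Hypothetical helper (not part of Pig): reports the declared return type of a
// public no-argument method on java.util.UUID.
public class UuidReflection {

    static Class<?> returnTypeOf(String methodName) {
        try {
            Method m = UUID.class.getMethod(methodName);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("UUID has no public no-arg method " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // randomUUID() hands back a UUID object, not a String.
        System.out.println("randomUUID -> " + returnTypeOf("randomUUID").getName()); // java.util.UUID
        // toString() does return a String, but it needs a UUID instance to be called on.
        System.out.println("toString   -> " + returnTypeOf("toString").getName());   // java.lang.String
    }
}
```

So the only UUID method that yields a String is an instance method, which is why a String-returning invoker has nothing suitable to bind to here.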
-
Re: Create rdbms like sequence in Pig on Pig RelationRussell Jurney 2012-05-28, 00:06
Are there examples of JRuby UDFs? I couldn't figure it out.
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: Create rdbms like sequence in Pig on Pig RelationSubir S 2012-05-28, 05:51
Dmitriy, just wondering, based on your blog:
If Russell invoked the method this way, would it not work?

DEFINE UUID InvokeForString('java.util.UUID.randomUUID.toString');
with_uuid = FOREACH my_stuff generate UUID(), *;

What he did earlier was call toString, an instance method of the UUID class, which should not work. From your blog, I understand that the method must be static. Can you please clarify this?
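Subir's reading of the blog post (that the invoked method must be static) and the chained-name idea can both be probed with reflection. A minimal sketch, assuming (not confirmed here from Pig's source) that a dynamic invoker splits its argument at the last dot into a class name and a single method name; StaticCheck, isStatic and classPartLoads are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.UUID;

// Hypothetical helper: checks which UUID methods are static, and whether the
// "class part" of an invoker-style name actually names a loadable class.
public class StaticCheck {

    // True if the public no-arg UUID method with this name is static.
    static boolean isStatic(String methodName) {
        try {
            Method m = UUID.class.getMethod(methodName);
            return Modifier.isStatic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("UUID has no public no-arg method " + methodName, e);
        }
    }

    // ASSUMPTION: splits "pkg.Class.method" at the last dot, the way we assume
    // an invoker resolves its argument, then tries to load the class part.
    static boolean classPartLoads(String fullName) {
        String className = fullName.substring(0, fullName.lastIndexOf('.'));
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("randomUUID static? " + isStatic("randomUUID")); // true
        System.out.println("toString static?   " + isStatic("toString"));   // false
        // Class part is java.util.UUID, which exists.
        System.out.println(classPartLoads("java.util.UUID.randomUUID"));          // true
        // Class part is java.util.UUID.randomUUID, which is not a class.
        System.out.println(classPartLoads("java.util.UUID.randomUUID.toString")); // false
    }
}
```

Under that assumption, the chained form fails before any method lookup, and toString alone fails because it is not static.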
-
Re: Create rdbms like sequence in Pig on Pig RelationRajesh Balamohan 2012-05-28, 22:51
Hi Gianmarco,
PIG-2353 would work well for smaller bags. But for larger data sets, the requirement would be to generate the SEQUENCE_NUMBER in the mapper stage itself (and this number has to be unique across mappers).

~Rajesh.B
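One common way to meet that requirement is to hand each task a disjoint block of numbers, which is also the range-reservation idea from Rajesh's earlier HBase/ZooKeeper suggestion. A minimal single-JVM sketch: the shared AtomicLong is a hypothetical stand-in for the central store, and BlockSequence, CentralStore and TaskAllocator are illustrative names, not Pig or HBase APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of block-based sequence allocation. The AtomicLong below is a
// HYPOTHETICAL stand-in for a central store (e.g. a ZooKeeper node or an HBase
// counter); in a real job every mapper would talk to that store instead.
public class BlockSequence {

    // Central counter: the next value not yet handed out to any task.
    static final class CentralStore {
        private final AtomicLong next;

        CentralStore(long start) { next = new AtomicLong(start); }

        // Atomically reserves [first, first + size) and returns first.
        long reserve(long size) { return next.getAndAdd(size); }
    }

    // Per-mapper allocator: hands out ids locally and only contacts the
    // central store when its current block is used up.
    static final class TaskAllocator {
        private final CentralStore store;
        private final long blockSize;
        private long current = 0; // next id to hand out
        private long limit = 0;   // first id past the current block (empty at start)

        TaskAllocator(CentralStore store, long blockSize) {
            this.store = store;
            this.blockSize = blockSize;
        }

        long nextId() {
            if (current == limit) {           // block exhausted: reserve a new one
                current = store.reserve(blockSize);
                limit = current + blockSize;
            }
            return current++;
        }
    }

    public static void main(String[] args) {
        CentralStore store = new CentralStore(201);
        TaskAllocator mapperA = new TaskAllocator(store, 1000);
        TaskAllocator mapperB = new TaskAllocator(store, 1000);
        System.out.println(mapperA.nextId()); // 201
        System.out.println(mapperA.nextId()); // 202
        System.out.println(mapperB.nextId()); // 1201
        System.out.println(mapperA.nextId()); // 203
    }
}
```

The ids are unique across tasks and each task only talks to the store once per block, but they are not gap-free or in input order, so this does not by itself reproduce the contiguous 201, 202, 203 numbering from the original question once more than one mapper is involved.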