praveenesh kumar 2012-02-02, 09:05
Hi,
I am trying to learn how to store records in tuples.
Suppose I have a txt file:
$ cat tmp.txt
1,2,3,4
2,3,4,5
4,5,5,6
I am doing this:
$ pig > A = Load 'tmp.txt' using PigStorage(',') AS (t:tuple(int:a,int:b,int:c,int:d));
$ pig > Dump A;
I am getting nothing in the output:
( )
( )
( )
Can anyone help me understand why this is happening? Even if I don't use PigStorage, nothing comes out.
Thanks, Praveenesh
-
Re: How to use tuples ?
Daniel Dai 2012-02-02, 09:19
Hi, Praveenesh,
Your tmp.txt should be:
(1,2,3,4)
(2,3,4,5)
(4,5,5,6)
And you cannot use "," as the delimiter for PigStorage; otherwise PigStorage will split the line on the commas first and only then try to parse the tuple.
Daniel
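A minimal sketch of the load described above, assuming tmp.txt has been rewritten with one parenthesized tuple per line. It leaves out the USING clause so that the default PigStorage delimiter (tab) applies and the commas stay inside the tuple literal. The field names a, b, c, d are only illustrative; note that Pig schemas are written as name:type.

```pig
-- tmp.txt, one tuple literal per line:
-- (1,2,3,4)
-- (2,3,4,5)
-- (4,5,5,6)

-- No USING clause: the default PigStorage splits fields on tab,
-- so each whole line arrives as one field and is parsed as a tuple.
A = LOAD 'tmp.txt' AS (t:tuple(a:int, b:int, c:int, d:int));

-- Inner fields can then be addressed by name through the tuple alias.
B = FOREACH A GENERATE t.a, t.d;
DUMP B;
```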
-
Re: How to use tuples ?
praveenesh kumar 2012-02-02, 09:40
Thanks Daniel. So it means that for all the other complex data types, the file contents need to be in the matching format: tuples in ( ), bags in { }, maps in [ ]?
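As a rough illustration of those three notations side by side, here is a sketch that loads one field of each complex type. The file name complex.txt, its contents, and all field names are hypothetical; the sketch assumes the three fields on each line are separated by tabs, which is the default PigStorage delimiter.

```pig
-- complex.txt (hypothetical), three tab-separated fields per line:
--   (1,2)    {(3),(4)}    [k#v]
-- tuple in ( ), bag of tuples in { }, map as [key#value]

A = LOAD 'complex.txt' AS (
        t:tuple(x:int, y:int),    -- tuple field
        bg:bag{r:tuple(n:int)},   -- bag field holding single-column tuples
        m:map[]                   -- map field with untyped values
    );

-- Print the schema Pig has attached to A.
DESCRIBE A;
```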
-
Re: How to use tuples ?
praveenesh kumar 2012-02-02, 09:58
One more thing. Suppose I have data in tmp.txt like:
(1,2,3) (2,4,5)
(2,3,4) (2,3,5)
So if I use
Z1 = Load 'tmp.txt'
the data will get stored in a bag (right?):
( (1,2,3), (2,4,5) )
( (2,3,4), (2,3,5) )
Now, can I refer to the fields in this case (without a schema)?
B = Foreach Z1 generate Z1.$0;
This generates an error. How can I do it correctly?
Thanks, Praveenesh
And if so, how can I refer to the variables inside?
Thanks, Praveenesh
-
Re: How to use tuples ?
praveenesh kumar 2012-02-02, 10:09
Okay, got it. Thanks for guiding. Without a schema, we can refer to the fields through $0.$0 or $1.$0 and so on, based on their positions.
Thanks, Praveenesh
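A small sketch of positional access into nested tuples. It assumes each line of tmp.txt holds two tuple literals separated by a tab, and it declares a schema rather than loading without one, because fields loaded without a schema default to bytearray; declaring the tuple types is one way to make references such as $0.$0 well-defined. All field names here are made up for illustration.

```pig
-- Assumed layout of tmp.txt: two tuple literals per line, tab-separated:
--   (1,2,3)    (2,4,5)
--   (2,3,4)    (2,3,5)

z = LOAD 'tmp.txt' AS (t1:tuple(a1:int, b1:int, c1:int),
                       t2:tuple(a2:int, b2:int, c2:int));

-- $0 is the first field (t1) and $1 the second (t2);
-- $0.$0 is the first element inside t1, $1.$0 the first inside t2.
x = FOREACH z GENERATE $0.$0 AS first_of_t1, $1.$0 AS first_of_t2;
DUMP x;
```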
-
Re: How to use tuples ?
praveenesh kumar 2012-02-02, 10:43
Okay, so it's weird.
I was able to run a Pig query using $0.$0.
The Pig script I wrote for the data (tmp.txt):
(1,2,3) (2,4,5)
(2,3,4) (2,3,5)
z = load 'tmp.txt';
x = foreach z generate $0.$0;
dump x;
It ran fine the first time, but now it's giving me an error:
ERROR 1066: Unable to open iterator for alias x
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
    at org.apache.pig.PigServer.openIterator(PigServer.java:858)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:523)
    at org.apache.pig.Main.main(Main.java:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:850)
    ... 12 more
============================
-
Re: How to use tuples ?
Daniel Dai 2012-02-06, 07:40
I guess you mean to load a bag. Your input file should be:
{(1,2,3),(2,4,5)}
{(2,3,4),(2,3,5)}
And the load statement should be:
z = load 'tmp.txt' as (b:{(a0:int,a1:int,a2:int)});
Daniel
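Building on that load statement, the following sketch shows two common ways to reach the values inside the bag. It assumes tmp.txt is in the bag format given above; the relation names x and y are arbitrary.

```pig
-- tmp.txt, one bag literal per line:
-- {(1,2,3),(2,4,5)}
-- {(2,3,4),(2,3,5)}

z = LOAD 'tmp.txt' AS (b:{(a0:int, a1:int, a2:int)});

-- Project a single column out of the bag; each output row is
-- itself a bag that contains only the a0 values.
x = FOREACH z GENERATE b.a0;

-- Un-nest the bag so that every inner tuple becomes its own row.
y = FOREACH z GENERATE FLATTEN(b);

DUMP x;
DUMP y;
```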