Pig, mail # user - Are records tuple


Re: Are records tuple
Mohit Anchlia 2012-04-23, 01:27
Thanks for the response, that helps. I was thinking the same but, not
knowing too much about Pig, I wanted to clarify. I'll look at how to use
PigStorage in my unit test.
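
For reference, the idea that a loader turns one text line into the fields of one tuple can be sketched in plain Java. This is only a toy stand-in, not PigStorage itself: the class and method names are hypothetical, the sample line is a shortened version of the record from this thread, and it assumes the fields are tab-separated (tab being PigStorage's default delimiter). The real loader also applies a schema and parses bags and nested tuples, which this sketch leaves as raw strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: NOT part of Pig, only illustrates "one line -> one record".
class DelimitedLineSketch {

    // Splits one text line into its fields on a literal delimiter.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String> splitFields(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        // Assumed layout: two scalar fields followed by two bags, tab-separated.
        String line = "a\tb\t{(ST:NC),(ZIP:28613)}\t{(AGE:55),(MARITAL:Married)}";
        List<String> fields = splitFields(line, "\t");
        // Each element here would become one field of the resulting tuple.
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

In a real unit test the bag fields would still need to be parsed, which is the work PigStorage (or building the tuple by hand) takes care of.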

On Sun, Apr 22, 2012 at 3:47 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> You are trying to read a string that represents a tuple using binary
> deserialization.
>
> Pig has an abstraction called LoadFunc that knows how to read data off
> disk and turn it into tuples (yes, records are tuples).  PigStorage is
> one such LoadFunc, and it reads data represented as strings such as
> what you are trying to feed in.  There are other load funcs that know
> how to read other serializations and interpret the data in very
> different ways (json, avro, thrift, records from a database, xml...).
> There is no way for Tuple.readFields to know what format you are
> trying to feed into it. Tuple serialization is used for intermediate
> serialization between MR jobs and is not intended for the end-user.
>
> You should be using the appropriate LoadFunc to create tuples
> (PigStorage in this case?), or create them in code as I demonstrated
> earlier.
>
> You might find ReadToEndLoader, which wraps a real loadfunc and helps
> with some details of instantiating input formats, getting splits, etc,
> helpful:
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html
>
> But really, you should just create the tuples you want in code rather
> than involve all of this machinery.
>
> D
>
>
> On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > Could someone help me answer this question if records (each line) => tuples?
> >
> > On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> >> I am writing a unit test but I have a doubt. My understanding is that
> >> a complete record is a tuple. So record "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}"
> >> which is one line in a file is a tuple? But I somehow feel it's not right.
> >> Could someone please clarify?
> >>
> >> Below is the code, my test is incomplete but just pasting it to show how I
> >> am constructing this tuple.
> >>
> >>
> >>   TupleFactory mTupleFactory = TupleFactory.getInstance();
> >>  BagFactory mBagFactory = BagFactory.getInstance();
> >>
> >>  @Test
> >>  public void evalFuncTest() throws IOException{
> >>   String record = "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}";
> >>   Tuple t = mTupleFactory.newTuple();
> >>   DataInput in = new DataInputStream(new ByteArrayInputStream(record.getBytes()));
> >>   t.readFields(in);
> >>  }
> >>
>
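
The failure mode described in the reply above (handing the bytes of a text record to a binary deserializer) can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions: it uses `DataOutputStream.writeUTF` and `DataInputStream.readUTF` as a stand-in for a binary format, which is not Pig's actual tuple serialization, and the class and method names are hypothetical. The point is only that a binary reader expects the layout its matching writer produced, so plain text bytes are misread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; readUTF/writeUTF stand in for a binary record format.
class TextVsBinaryDemo {

    // Serializes the value in DataOutput's binary layout:
    // a 2-byte length prefix followed by the (modified UTF-8) bytes.
    static byte[] writeBinary(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns true only if the bytes can be decoded by the binary reader.
    static boolean readsAsBinary(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return true;
        } catch (IOException e) {
            // Text bytes are misinterpreted as a length prefix, so decoding fails.
            return false;
        }
    }

    public static void main(String[] args) {
        String record = "a\tb";
        byte[] binary = writeBinary(record);                     // produced by the matching writer
        byte[] text = record.getBytes(StandardCharsets.UTF_8);  // what the test above feeds in
        System.out.println("binary bytes readable: " + readsAsBinary(binary));
        System.out.println("text bytes readable:   " + readsAsBinary(text));
    }
}
```

Here the first two text bytes are taken as a length far larger than the data that follows, so the read runs off the end of the stream. By the same reasoning, `Tuple.readFields` only makes sense on bytes written by the corresponding tuple `write`, which is why the reply recommends a LoadFunc or building the tuple in code.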