NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Are records tuple


Re: Are records tuple
Thanks for the response, that helps. I was thinking the same, but not
knowing too much about Pig I wanted to clarify. I'll look at how to use
PigStorage in my unit test.

On Sun, Apr 22, 2012 at 3:47 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> You are trying to read a string that represents a tuple using binary
> deserialization.
>
> Pig has an abstraction called LoadFunc that knows how to read data off
> disk and turn it into tuples (yes, records are tuples).  PigStorage is
> one such LoadFunc, and it reads data represented as strings such as
> what you are trying to feed in.  There are other load funcs that know
> how to read other serializations and interpret the data in very
> different ways (json, avro, thrift, records from a database, xml...).
> There is no way for Tuple.readFields to know what format you are
> trying to feed into it. Tuple serialization is used for intermediate
> serialization between MR jobs and is not intended for the end user.
>
> You should be using the appropriate LoadFunc to create tuples
> (PigStorage in this case?), or create them in code as I demonstrated
> earlier.
>
> You might find ReadToEndLoader, which wraps a real LoadFunc and helps
> with some details of instantiating input formats, getting splits, etc.,
> helpful:
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html
>
> But really, you should just create the tuples you want in code rather
> than involve all of this machinery.
>
> D
>
>
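To make the point about binary deserialization concrete, here is a minimal Java sketch that uses only the standard library. It does not reproduce Pig's actual wire format: the toy layout (an int field count followed by writeUTF strings) and the names BinaryVsText, writeBinary, readBinary, and MAX_FIELDS are assumptions made up for illustration. It shows that bytes produced by a matching binary writer read back fine, while the bytes of a plain text line get misread as a length header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BinaryVsText {

    // Sanity limit for this sketch only (hypothetical, not a Pig setting).
    static final int MAX_FIELDS = 1_000;

    // Toy binary format (NOT Pig's): a 4-byte field count, then each field via writeUTF.
    static byte[] writeBinary(List<String> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.size());
            for (String field : fields) {
                out.writeUTF(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the toy format back; fails if the bytes were not produced by writeBinary.
    static List<String> readBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt(); // the first four bytes are taken as the field count
            if (count < 0 || count > MAX_FIELDS) {
                throw new IOException("implausible field count: " + count);
            }
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                fields.add(in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Round trip through the matching binary writer works.
        System.out.println(readBinary(writeBinary(Arrays.asList("a", "b"))));

        // Text bytes are not in the binary format: "a\tb\t" is read as one large int.
        byte[] text = "a\tb\t{(ST:NC)}".getBytes(StandardCharsets.UTF_8);
        try {
            readBinary(text);
        } catch (UncheckedIOException e) {
            System.out.println("text input rejected: " + e.getCause().getMessage());
        }
    }
}
```

The same mismatch is what happens when a text record is handed to Tuple.readFields: the reader expects bytes laid out by the corresponding writer, not characters.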
> On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > Could someone help me answer this question: are records (each line) tuples?
> >
> > On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> >> I am writing a unit test but I had a doubt. My understanding is that a
> >> complete record is a tuple. So record "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}"
> >> which is one line in a file is a tuple? But I somehow feel it's not
> >> right. Could someone please clarify?
> >>
> >> Below is the code. My test is incomplete; I am just pasting it to show
> >> how I am constructing this tuple.
> >>
> >>
> >>   TupleFactory mTupleFactory = TupleFactory.getInstance();
> >>  BagFactory mBagFactory = BagFactory.getInstance();
> >>
> >>  @Test
> >>  public void evalFuncTest() throws IOException{
> >>   String record = "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)}        {(OCCUP:xxxxxxx xxxxx),(AGE:55    ),(MARITAL:Married)}";
> >>   Tuple t = mTupleFactory.newTuple();
> >>   DataInput in = new DataInputStream(new
> >> ByteArrayInputStream(record.getBytes()));
> >>   t.readFields(in);
> >>  }
> >>
>
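As a rough illustration of what a text-oriented loader does with a line like the one in the question, here is a small standard-library Java sketch. It is not PigStorage and does not use any Pig class; DelimitedLineSketch and parseLine are hypothetical names, and the tab delimiter and the shortened sample line are assumptions. The idea is only that a loader for a text format splits characters on a delimiter instead of decoding a binary layout.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedLineSketch {

    // Hypothetical helper (not a Pig API): split one text line into fields on a delimiter.
    static List<String> parseLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // last field (or the whole line if no delimiter)
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, assumed tab-separated version of the record from the question.
        String line = "a\tb\t{(ST:NC),(ZIP:28613)}";
        for (String field : parseLine(line, '\t')) {
            System.out.println(field);
        }
    }
}
```

The sketch leaves the third field as a raw string. Interpreting it as a bag of tuples is a further step that it does not attempt, and in a unit test it is usually simpler to build the tuple and bags directly in code, as suggested in the reply above.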