Pig, mail # user - Storing tuple into HBaseStorage


Re: Storing tuple into HBaseStorage
Jerry Lam 2013-11-28, 01:49
Hi Shawn,

I see your point now. Thank you for your help!

Jerry
On Wed, Nov 27, 2013 at 6:14 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:

> That is a very good question. I am not sure there is an easy way to use
> the field alias as the map key. I looked at the Tuple class definition
> (http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/data/Tuple.html), and
> it does not appear to offer a way to get the name associated with a
> particular tuple field.
>
> One possible workaround is to define a simple UDF. I put a quick, untested
> Jython UDF at https://gist.github.com/shawnhermans/7684660 as an example.
> It hard-codes the field names, but you could add a second argument to the
> function so that the field names are passed in.
>
> -Shawn
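A minimal sketch of the kind of UDF Shawn describes, with the field names passed in as an argument instead of hard-coded. This is not the code from the gist: the function name `fields_to_map`, the comma-separated-names calling convention and the `fields:map[]` output schema are assumptions for illustration. It also assumes Pig supplies the `outputSchema` decorator when the file is registered `USING jython`; the fallback only exists so the file can run as plain Python.

```python
# Hypothetical Jython UDF sketch, untested inside Pig.
# Assumption: Pig injects `outputSchema` for Jython UDFs; outside Pig we
# define a no-op stand-in so this module still runs as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("fields:map[]")
def fields_to_map(field_names, *values):
    """Pair a comma-separated list of names with the remaining arguments.

    field_names: e.g. "c1,c2,c3" (hypothetical calling convention)
    values:      the field values, in the same order as the names
    Returns a dict, which Pig converts to a map keyed by field name.
    """
    names = [name.strip() for name in field_names.split(",")]
    if len(names) != len(values):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


if __name__ == "__main__":
    print(fields_to_map("c1,c2,c3", "a", "b", "c"))
```

From Pig this might be called as `fields_to_map('c1,c2,c3', c1, c2, c3)` and the result stored as `(row_key, map)` with a family wildcard such as `'f1:*'`, since HBaseStorage takes the first field of each tuple as the row key.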
>
>
> On Wed, Nov 27, 2013 at 12:27 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
> > Hi Shawn,
> >
> > Thanks for the advice.
> >
> > Can TOMAP generate a map from a tuple, using the alias of each field in the
> > tuple as the map key and the field value as the map value?
> > From the documentation, the TOMAP syntax is:
> >
> > TOMAP(key-expression, value-expression [, key-expression, value-expression ...])
> >
> > It does not look like it can use the alias of the field as the key... Any
> > further advice? Thanks!
> >
> > Jerry
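Jerry's reading of the syntax can be illustrated with a small plain-Python model of TOMAP's documented signature. This is not Pig's implementation, and `tomap` is a hypothetical name; the point is that each key is just another expression evaluated per record, so the field alias has to be written out as a string literal next to each field.

```python
def tomap(*args):
    """Plain-Python model of TOMAP(key, value [, key, value ...]).

    Arguments alternate key, value. Pig map keys are chararrays, so this
    model rejects non-string keys. It is not Pig's actual implementation.
    """
    if len(args) % 2 != 0:
        raise ValueError("expected key/value pairs, got %d arguments" % len(args))
    result = {}
    for i in range(0, len(args), 2):
        key, value = args[i], args[i + 1]
        if not isinstance(key, str):
            raise TypeError("map key must be a string, got %r" % (key,))
        result[key] = value
    return result


# A record with fields c1 and c2: the keys are literals written by hand,
# mirroring TOMAP('c1', c1, 'c2', c2) in Pig Latin.
c1, c2 = "alice", "42"
print(tomap("c1", c1, "c2", c2))
```

With 100 fields this still means 100 literal/field pairs, which is why a UDF comes up as the workaround.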
> >
> >
> > On Wed, Nov 27, 2013 at 12:09 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
> >
> > > You should be able to use a Pig map to do this. Use the column name as
> > > the key in the map and the value as the value. You should be able to use
> > > the built-in TOMAP function to generate the map
> > > (http://pig.apache.org/docs/r0.11.0/func.html#tomap). The HBaseStorage
> > > documentation gives an example of storing a map, using friends:* and
> > > info:* as the column families:
> > >
> > > http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> > >
> > > copy = STORE raw INTO 'hbase://SampleTableCopy'
> > >        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> > >        'info:first_name info:last_name friends:* info:*');
> > >
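To see why a map fits this use case: with a family wildcard such as `info:*`, HBaseStorage writes each map entry as its own column in that family, with the map key as the column qualifier. The toy model below (plain Python, hypothetical `map_to_cells` name, not HBaseStorage code) assumes a stored tuple of the form (row key, map) and lists the cells that would result.

```python
def map_to_cells(row_key, family, values):
    """Toy model of storing (row_key, map) under 'family:*'.

    Returns one (row key, 'family:qualifier', value) triple per map entry,
    sorted by qualifier so the output is deterministic.
    """
    return [(row_key, "%s:%s" % (family, qualifier), value)
            for qualifier, value in sorted(values.items())]


for cell in map_to_cells("row1", "f1", {"c2": "b", "c1": "a"}):
    print(cell)
```

Because the qualifiers come from the map keys at run time, the STORE spec only needs the single wildcard entry for the family rather than one entry per column.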
> > >
> > >
> > >
> > > On Wed, Nov 27, 2013 at 10:57 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hello Pig users,
> > > >
> > > > I want to store an entire tuple into HBase from Pig using HBaseStorage.
> > > > I know that I can do something like:
> > > >
> > > > out = .... as (c1:bytearray, c2:bytearray, .... cN:bytearray);
> > > > STORE out INTO 'hbase://outputtable' USING
> > > >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('f1:c1 f1:c2 ..... f1:cN');
> > > >
> > > > Since each output tuple has 100 fields, I don't want to write the columns
> > > > out manually. I also want to use each field's alias as its HBase column
> > > > name. Since the entire tuple goes into the same column family, is there
> > > > an easy way to express this in Pig?
> > > >
> > > > Thank you,
> > > >
> > > > Jerry
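A different, low-tech workaround (not one proposed in the thread) is to generate the long column list instead of typing it. The sketch below uses a hypothetical `store_statement` helper to emit the STORE line for HBaseStorage; it assumes the relation's first field is the row key, so only the remaining field names are listed as columns, and that those names are the same aliases used in the `AS` clause.

```python
def store_statement(relation, table, family, columns):
    """Build a Pig STORE statement that writes `columns` into one family.

    HBaseStorage uses the first field of each tuple as the row key, so
    `columns` should name the remaining fields, in order.
    """
    spec = " ".join("%s:%s" % (family, name) for name in columns)
    return ("STORE %s INTO 'hbase://%s' USING "
            "org.apache.pig.backend.hadoop.hbase.HBaseStorage('%s');"
            % (relation, table, spec))


# Hypothetical schema: c1 is the row key, c2..c100 become columns in f1.
print(store_statement("out", "outputtable", "f1",
                      ["c%d" % i for i in range(2, 101)]))
```

The printed line can be pasted into the script; the column names still come from the Python list, not from Pig's schema, so the two have to be kept in sync by hand.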
> > > >
> > >
> >
>