Pig, mail # user - Storing tuple into HBaseStorage


Re: Storing tuple into HBaseStorage
Shawn Hermans 2013-11-27, 23:14
That is a very good question.  I am not sure there is an easy way to use
the alias of a field as the key. I looked at the Tuple class definition
(http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/data/Tuple.html) and
it does not appear to offer a way to get the name associated with a
particular tuple field.

One potential workaround is to define a simple UDF.  I provided a quick,
untested Jython UDF as an example at
https://gist.github.com/shawnhermans/7684660.  It hard-codes the field
names as part of the UDF, but you could add a second argument to the
function so that the caller can pass in the field names.

-Shawn
On Wed, Nov 27, 2013 at 12:27 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:

> Hi Shawn,
>
> Thanks for the advice.
>
> Can TOMAP generate a map from tuple using the alias of the field in the
> tuple as the key of the map and the field value as the value of the map?
> From the documentation, TOMAP syntax is:
>
> TOMAP(key-expression, value-expression [, key-expression, value-expression
> ...])
>
> It does not look like it can use the alias of the field as the key... Any
> further advice? Thanks!
>
> Jerry
>
>
> On Wed, Nov 27, 2013 at 12:09 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
>
> > You should be able to use a Pig map to do this.  Use the column name as
> > the key in the map and the value as the value.  You should be able to use
> > the builtin TOMAP function to generate the map (
> > http://pig.apache.org/docs/r0.11.0/func.html#tomap).  The HBaseStorage
> > documentation gives an example of storing a map using friends:* and
> > info:* as the column families.
> >
> > http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> >
> > STORE raw INTO 'hbase://SampleTableCopy'
> >        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> >        'info:first_name info:last_name friends:* info:*');
> >
> >
> >
> >
> > On Wed, Nov 27, 2013 at 10:57 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Pig users,
> > >
> > > I want to store the entire tuple into hbase from Pig using
> > > HBaseStorage.
> > > I know that I can do something like:
> > >
> > > output = .... as (c1:bytearray, c2:bytearray, .... cN:bytearray);
> > > STORE output INTO 'hbase://outputtable' USING
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('f1:c1 f1:c2 .....
> > > f1:cN');
> > >
> > > Since the output contains tuples of 100 fields, I don't want to write
> > > them manually. Additionally, I want to use the alias name of the field
> > > as the column name for hbase. Since the entire tuple goes into the same
> > > column family, I wonder if there is an easy way to express this in Pig?
> > >
> > > Thank you,
> > >
> > > Jerry
> > >
> >
>