Pig >> mail # user >> Storing tuple into HBaseStorage


Re: Storing tuple into HBaseStorage
That is a very good question.  I am not sure there is an easy way to use
the alias of the field as the key. I looked at the Tuple class definition (
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/data/Tuple.html), and
it does not appear to provide a way to get the name associated with a
particular tuple field.

One potential workaround is to define a simple UDF.  I
provided a quick, untested Jython UDF as an example at
https://gist.github.com/shawnhermans/7684660.  It hard-codes the field
names in the UDF, but you could add a second argument to the
function so that the field names can be passed in.

-Shawn
On Wed, Nov 27, 2013 at 12:27 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:

> Hi Shawn,
>
> Thanks for the advice.
>
> Can TOMAP generate a map from a tuple using the alias of each field in the
> tuple as the key of the map and the field value as the value of the map?
> From the documentation, the TOMAP syntax is:
>
> TOMAP(key-expression, value-expression [, key-expression, value-expression ...])
>
> It does not look like it can use the alias of the field as the key... Any
> further advice? Thanks!
>
> Jerry
>
>
> On Wed, Nov 27, 2013 at 12:09 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
>
> > You should be able to use a Pig map to do this.  Use the column name as the
> > key in the map and the value as the value.  You should be able to use the
> > built-in TOMAP function to generate the map (
> > http://pig.apache.org/docs/r0.11.0/func.html#tomap).  The HBaseStorage
> > documentation gives an example of storing a map using friends:* and info:*
> > as the column families.
> >
> > http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> >
> > copy = STORE raw INTO 'hbase://SampleTableCopy'
> >        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> >        'info:first_name info:last_name friends:* info:*');
> >
> > On Wed, Nov 27, 2013 at 10:57 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Pig users,
> > >
> > > I want to store the entire tuple into HBase from Pig using HBaseStorage.
> > > I know that I can do something like:
> > >
> > > output = .... as (c1:bytearray, c2:bytearray, .... cN:bytearray);
> > > STORE output INTO 'hbase://outputtable' USING
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> > > 'f1:c1 f1:c2 ..... f1:cN');
> > >
> > > Since the output contains tuples of 100 fields, I don't want to write
> > > them out manually. Additionally, I want to use the alias of each field as
> > > the column name in HBase. Since the entire tuple goes into the same column
> > > family, is there an easy way to express this in Pig?
> > >
> > > Thank you,
> > >
> > > Jerry
> > >
> >
>
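For the original problem of not typing 100 column names by hand, the HBaseStorage column list, or a TOMAP call that keys each value by its own alias, could be generated outside Pig from the list of field names (for example, copied from DESCRIBE output). A minimal sketch; the helper names and the c1..c100 schema are hypothetical. HBaseStorage takes the first field of the stored relation as the row key, so the generated list should cover only the remaining fields.

```python
# Hypothetical helpers that generate Pig script text from field aliases.


def hbase_column_spec(family, names):
    """Column list for HBaseStorage, e.g. 'f1:c2 f1:c3'."""
    return " ".join("%s:%s" % (family, name) for name in names)


def tomap_expression(names):
    """A TOMAP call whose keys are the aliases themselves,
    e.g. TOMAP('c2', c2, 'c3', c3)."""
    pairs = ", ".join("'%s', %s" % (name, name) for name in names)
    return "TOMAP(%s)" % pairs


if __name__ == "__main__":
    # Assumed schema: c1 is the row key, c2..c100 are the value fields.
    value_fields = ["c%d" % i for i in range(2, 101)]
    print(hbase_column_spec("f1", value_fields))
    print(tomap_expression(value_fields))
```

The first string would stand in for the hand-written column list passed to HBaseStorage; the second, generated next to the row key in a FOREACH ... GENERATE, yields a map that HBaseStorage('f1:*') would write as one column per alias.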