Pig user mailing list: Syntax and HBaseStorage questions

Re: Syntax and HBaseStorage questions
Hi Dmitriy,

Issue filed: https://issues.apache.org/jira/browse/PIG-1782

I meant to say columns, not tuples, in my previous message. It should read:
"Make an alteration to a column in a bag without specifying the other
columns in the same bag".

Let's assume PIG-1782 is addressed and that CpuMetrics from the PIG-1782
example contains 250 columns.
The next line I would write looks like this:

ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
$1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;

It would be nice if the statement could be written like this:

ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
$1) as rowId, MIRROR($2..$250);

Is there something like this among Pig's built-in functions?
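To make the requested behaviour concrete, here is a minimal Python sketch, assuming a row is just a list of non-null string/number fields: it builds `rowId` from `$0` and `$1` and passes every remaining field through untouched, the way the hypothetical `MIRROR($2..$250)` would. `project_with_row_id` is a hypothetical name for illustration, not a Pig API. (For reference, later Pig releases added project-range expressions such as `$2 .. $250`, or `$2 ..` for "through the last column"; check the documentation for your Pig version.)

```python
from typing import Any, List, Optional, Sequence


def project_with_row_id(row: Sequence[Any], start: int = 2,
                        end: Optional[int] = None) -> List[Any]:
    """Mimic: generate CONCAT(CONCAT($0, '-'), $1) AS rowId, $start..$end.

    Hypothetical helper. `end` is inclusive (like $x .. $y); None means
    "through the last field". Assumes $0 and $1 are non-null.
    """
    row_id = f"{row[0]}-{row[1]}"                 # CONCAT(CONCAT($0, '-'), $1)
    stop = len(row) if end is None else end + 1   # make the upper bound inclusive
    return [row_id, *row[start:stop]]             # remaining fields pass through unchanged


if __name__ == "__main__":
    print(project_with_row_id(["1293580800", "host01", 0.5, 0.7, 0.1]))
```

The point of the range form is that the script no longer has to enumerate all 250 positions; only the column being altered (`rowId`) is spelled out.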


On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Hi Eric,
> Yes, we can certainly add the convention that a string without a ":" refers
> to a complete column family.
> It should be fairly straightforward. Step 1 is to open a ticket on the
> Jira, step 2 is to do it :).
> I am not sure what you mean by "make alteration of a tuple in a bag, but not
> specifying other tuples in the same bag" -- can you provide an example that
> illustrates what you want to do?
> Thanks,
> -Dmitriy
> On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>> Hi,
>> Consider this use case:
>> There is a program that stores CPU usage metrics in an HBase table. This
>> HBase table has a column family called cpu, and individual CPU core
>> usage is stored in columns like cpu:user.0, cpu:user.1, etc. The
>> suffix number represents the unique CPU core id in the system.
>> While it is possible to write a query like:
>> SystemMetrics = load 'hbase://SystemMetrics' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
>> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
>> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
>> cpuCombined1:float ... LoadAverage:float);
>> to load a long list of columns, and then to specify the same list again
>> in the foreach and group by commands, like:
>> CleanseBuffer = foreach SystemMetrics generate
>> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
>> cpuCombined1, ..., LoadAverage;
>> The syntax works fine, but it would be nice to load all columns of a
>> given column family without specifying individual columns.
>> For example: SystemMetrics = load 'hbase://SystemMetrics' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
>> system');
>> Would it be possible to implement this syntax in Pig?
>> Second question: is it possible to make alteration of a tuple in a
>> bag, but not specifying other tuples in the same bag?
>> For tables with many columns, it would be nice to have a shorthand
>> syntax that makes Pig scripts shorter to write.
>> Any tips on making foreach and group by statements shorter? Thanks,
>> regards,
>> Eric
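The convention Dmitriy proposes (a column-list token without a ":" selects an entire column family) can be modelled with a minimal Python sketch. It assumes an HBase row is a plain dict keyed by "family:qualifier", and it returns a whole family as a {qualifier: value} dict, an assumption loosely comparable to a Pig map. `select_columns` is a hypothetical helper, not part of HBaseStorage. (Later HBaseStorage releases accept a `family:*` column spec that loads a whole family as a map; check your version's documentation.)

```python
from typing import Any, Dict, List


def select_columns(spec: str, row: Dict[str, Any]) -> List[Any]:
    """Pick values from `row` (keys like 'cpu:combined.0') according to `spec`.

    Hypothetical helper. Tokens are whitespace-separated, as in the
    HBaseStorage column list: 'family:qualifier' selects one cell (None if
    absent); a bare 'family' selects every cell in that family.
    """
    result: List[Any] = []
    for token in spec.split():
        if ":" in token:
            result.append(row.get(token))  # a single named column
        else:
            prefix = token + ":"
            # Collect every qualifier in the family, stripped of the prefix.
            family = {
                key[len(prefix):]: value
                for key, value in sorted(row.items())
                if key.startswith(prefix)
            }
            result.append(family)  # the whole column family
    return result


if __name__ == "__main__":
    row = {"tags:cluster": "c1", "cpu:combined.0": 0.5,
           "cpu:combined.1": 0.7, "system:LoadAverage.1": 1.2}
    print(select_columns("tags:cluster cpu system", row))
```

Returning each family as one nested value keeps the output arity fixed even when the number of CPU cores (and therefore columns) varies from host to host.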