-
Syntax and HBaseStorage questions
Eric Yang 2010-12-29, 07:10
Hi,
Consider this use case:
There is a program that stores CPU usage metrics in an HBase table. This HBase table has a column family called cpu, and individual CPU core usage is stored in columns like cpu:user.0, cpu:user.1, etc. The numeric suffix represents a unique CPU core id in the system.
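Because the core id is encoded in the qualifier name (`user.0`, `user.1`, ...), any code that handles these columns generically has to parse it back out. A minimal Python sketch of that step, assuming every qualifier follows the `metric.coreId` pattern described above; `parse_qualifier`, `group_by_core`, and the sample values are hypothetical and not part of Pig or HBaseStorage:

```python
from collections import defaultdict


def parse_qualifier(qualifier):
    """Split a qualifier such as 'user.0' into ('user', 0).

    Assumes the 'metric.coreId' naming described above.
    """
    metric, _, core = qualifier.rpartition(".")
    if not metric or not core.isdigit():
        raise ValueError(f"unexpected qualifier: {qualifier!r}")
    return metric, int(core)


def group_by_core(family_cells):
    """Regroup {qualifier: value} cells from one column family by core id."""
    per_core = defaultdict(dict)
    for qualifier, value in family_cells.items():
        metric, core = parse_qualifier(qualifier)
        per_core[core][metric] = value
    return dict(per_core)


# Hypothetical cells from one row of the cpu column family.
cells = {"user.0": 12.5, "user.1": 7.0, "combined.0": 20.1}
print(group_by_core(cells))
```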
It is possible to write a query like:
SystemMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS (rowKey: chararray, cluster: chararray, cpuCombined0:float, cpuCombined1:float ... LoadAverage:float);
This requires listing a long series of columns to load, and then repeating the same list in later foreach and group by statements, for example:
CleanseBuffer = foreach SystemMetrics generate REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0, cpuCombined1, ..., LoadAverage;
The syntax works fine, but it would be nice to load all columns of a given column family without specifying individual columns.
e.g. SystemMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu system');
Would it be possible to implement this syntax in Pig?
Second question: is it possible to alter a tuple in a bag without specifying the other tuples in the same bag?
For tables with many columns, it would be nice to have a shorthand syntax that makes Pig scripts shorter to write. Any tips on making foreach and group by statements shorter? Thanks.
regards, Eric
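Pending a dedicated syntax, one workaround for the repetitive column lists in the message above is to generate the script text outside Pig and paste in (or write out) the result. A minimal Python sketch, assuming a known, fixed core count and the `cpu:combined.N` / `cpuCombinedN` naming used in the example query; all helper names and the `cores` parameter are hypothetical:

```python
def hbase_columns(cores, metric="combined", family="cpu"):
    """Column-list entries such as 'cpu:combined.0 cpu:combined.1'."""
    return " ".join(f"{family}:{metric}.{i}" for i in range(cores))


def schema_fields(cores, prefix="cpuCombined", pig_type="float"):
    """Matching AS-clause fields such as 'cpuCombined0:float, ...'."""
    return ", ".join(f"{prefix}{i}:{pig_type}" for i in range(cores))


def field_names(cores, prefix="cpuCombined"):
    """Bare field names for reuse in a foreach ... generate list."""
    return ", ".join(f"{prefix}{i}" for i in range(cores))


def load_statement(cores):
    """Rebuild the SystemMetrics load statement for `cores` CPU cores."""
    columns = f"tags:cluster {hbase_columns(cores)} system:LoadAverage.1"
    schema = (f"rowKey: chararray, cluster: chararray, "
              f"{schema_fields(cores)}, LoadAverage:float")
    return ("SystemMetrics = load 'hbase://SystemMetrics' USING "
            "org.apache.pig.backend.hadoop.hbase.HBaseStorage("
            f"'{columns}','-loadKey') AS ({schema});")


def cleanse_statement(cores):
    """Rebuild the CleanseBuffer foreach; '\\\\d' emits Pig's '\\d'."""
    return ("CleanseBuffer = foreach SystemMetrics generate "
            "REGEX_EXTRACT($0,'^\\\\d+',0) as time, cluster, "
            f"{field_names(cores)}, LoadAverage;")


print(load_statement(2))
print(cleanse_statement(2))
```

This only saves typing; the generated script still has to be regenerated whenever the number of cores changes.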
-
Re: Syntax and HBaseStorage questions
Dmitriy Ryaboy 2010-12-30, 02:09
Hi Eric, Yes, we can certainly add the convention that a string without a ":" refers to a complete column family. It should be fairly straightforward: step 1 is to open a ticket on the Jira, step 2 is to do it :).
I am not sure what you mean by "make alteration of a tuple in a bag, but not specifying other tuples in the same bag" -- can you provide an example that illustrates what you want to do?
Thanks, -Dmitriy
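The proposed convention amounts to a small change in how the column list is tokenized. A Python sketch of the rule, not the actual HBaseStorage code (which is Java); `parse_column_spec` is a hypothetical name, and returning `None` as the qualifier to mean "whole family" is an assumption made for illustration:

```python
def parse_column_spec(spec):
    """Split a space-separated column list into (family, qualifier) pairs.

    Hypothetical rule from this thread: a token without ':' names a whole
    column family, represented here as qualifier None.
    """
    columns = []
    for token in spec.split():
        family, sep, qualifier = token.partition(":")
        # Reject ':qual' (no family) and 'family:' (empty qualifier).
        if not family or (sep and not qualifier):
            raise ValueError(f"malformed column: {token!r}")
        columns.append((family, qualifier if sep else None))
    return columns


print(parse_column_spec("tags:cluster cpu system"))
```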
-
Re: Syntax and HBaseStorage questions
Eric Yang 2010-12-30, 05:12
Hi Dmitriy,
Issue filed: https://issues.apache.org/jira/browse/PIG-1782
I meant to say columns in my previous message. It should read: "make alteration of a column in a bag, but not specifying other columns in the same bag".
Let's assume PIG-1782 is addressed and that CpuMetrics from the PIG-1782 example contains 250 columns. The next line that I write would look like this:
ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
It would be nice if the statement could be written like this:
ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowID, MIRROR($2..$250);
Is there something like this among Pig's built-in functions?
regards, Eric
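Until something like MIRROR exists, one workaround is to expand the range textually before handing the script to Pig. A minimal Python pre-processing sketch, assuming the script template uses a bare `$a..$b` placeholder (hypothetical shorthand, not accepted by the Pig version discussed here); `expand_ranges` is a hypothetical helper:

```python
import re

# Matches positional ranges such as '$2..$250'.
_RANGE = re.compile(r"\$(\d+)\.\.\$(\d+)")


def expand_ranges(statement):
    """Rewrite every '$a..$b' as '$a, $a+1, ..., $b' (inclusive)."""
    def expand(match):
        start, end = int(match.group(1)), int(match.group(2))
        if end < start:
            raise ValueError(f"descending range: {match.group(0)}")
        return ", ".join(f"${i}" for i in range(start, end + 1))
    return _RANGE.sub(expand, statement)


line = ("ConcatBuffer = foreach CpuMetrics generate "
        "CONCAT(CONCAT($0, '-'), $1) as rowId, $2..$250;")
print(expand_ranges(line))
```

Because it works on raw text rather than parsing Pig Latin, it would also rewrite a `$a..$b` that happens to appear inside a string literal; proper support would belong in the Pig parser itself.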
-
Re: Syntax and HBaseStorage questions
Dmitriy Ryaboy 2010-12-30, 10:15
Ah, I see. There is no such function available right now.
There is some discussion of such a feature here: https://issues.apache.org/jira/browse/PIG-1693
As you can see, there isn't yet a consensus on how such syntax would work. Feel free to weigh in.
-Dmitriy
-
Re: Syntax and HBaseStorage questions
Eric Yang 2010-12-30, 19:32
Thanks for the pointer. :)
regards, Eric