Pig, mail # user - Syntax and HBaseStorage questions


Re: Syntax and HBaseStorage questions
Eric Yang 2010-12-30, 19:32
Thanks for the pointer. :)

regards,
Eric

On Thu, Dec 30, 2010 at 2:15 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Ah, I see. There is no such function available right now.
> There is some discussion of such a feature here:
> https://issues.apache.org/jira/browse/PIG-1693
> As you can see, there isn't yet a consensus on how such syntax would work.
> Feel free to weigh in.
>
> -Dmitriy
>
> On Wed, Dec 29, 2010 at 9:12 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>
>> Hi Dmitriy,
>>
>> Issue filed: https://issues.apache.org/jira/browse/PIG-1782
>>
>> I meant to say columns in my previous message. It should read as
>> "Make alteration of a column in a bag, but not specifying other
>> columns in the same bag".
>>
>> Let's assume PIG-1782 is addressed, and that CpuMetrics from the
>> PIG-1782 example contains 250 columns.
>> The next line I would write looks like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
>>
>> It would be nice if the statement could be written like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowId, MIRROR($2..$250);
>>
>> Is there something like this among Pig's built-in functions?
>>
>> regards,
>> Eric
>>
>> On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> > Hi Eric,
>> > Yes, we can certainly add the convention that a string without a ":" refers
>> > to a complete column family.
>> > It should be fairly straightforward: step 1 is to open a ticket on the
>> > JIRA, step 2 is to do it :).
>> >
>> > I am not sure what you mean by "make alteration of a tuple in a bag, but
>> > not specifying other tuples in the same bag" -- can you provide an example
>> > that illustrates what you want to do?
>> >
>> > Thanks,
>> > -Dmitriy
>> >
>> > On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>
>> >> Consider this use case:
>> >>
>> >> A program stores CPU usage metrics in an HBase table. The table has
>> >> a column family called cpu, and per-core CPU usage is stored in
>> >> columns like cpu:user.0, cpu:user.1, etc. The numeric suffix is the
>> >> unique ID of a CPU core in the system.
>> >>
>> >> It is possible to write a query like:
>> >>
>> >> SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
>> >> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
>> >> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
>> >> cpuCombined1:float ... LoadAverage:float);
>> >>
>> >> to load a long list of columns, and then specify the same list again
>> >> in a later foreach statement like:
>> >>
>> >> CleanseBuffer = foreach SystemMetrics generate
>> >> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
>> >> cpuCombined1, ..., LoadAverage;
>> >>
>> >> The syntax works fine, but it would be nice to load all columns of a
>> >> given column family without specifying individual columns.
>> >>
>> >> For example:
>> >> SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu system');
>> >>
>> >> Would it be possible to implement this syntax in Pig?
>> >>
>> >> Second question: is it possible to make alteration of a tuple in a
>> >> bag, but not specifying other tuples in the same bag?
>> >>
>> >> For tables with many columns, it would be nice to have a shorthand
>> >> syntax that makes Pig scripts shorter to write.
>> >> Any tips on making foreach and group by statements shorter? Thanks
>> >>
>> >> regards,
>> >> Eric
>> >>
>> >
>>
>
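Neither shorthand was available in Pig at the time of this thread. One stopgap, sketched here under stated assumptions, is to generate the repetitive column lists with a small script instead of typing them by hand. The helper names `hbase_load_statement` and `concat_generate_statement` are hypothetical. The column names (`tags:cluster`, `cpu:combined.N`, `system:LoadAverage.1`), the relation names and the `-loadKey` option are copied from the examples in the thread. The script only emits Pig Latin text and never talks to HBase.

```python
def hbase_load_statement(table: str, cores: int) -> str:
    """Build a Pig LOAD statement that lists every cpu:combined.N column.

    Hypothetical helper: assumes columns cpu:combined.0 .. cpu:combined.<cores-1>,
    plus tags:cluster and system:LoadAverage.1, as in the thread's example.
    """
    # HBaseStorage takes a single space-separated string of family:qualifier names.
    cpu_cols = [f"cpu:combined.{i}" for i in range(cores)]
    columns = " ".join(["tags:cluster"] + cpu_cols + ["system:LoadAverage.1"])
    # The AS schema must line up one-to-one with the loaded values
    # (row key first, because '-loadKey' is passed).
    fields = (
        ["rowKey:chararray", "cluster:chararray"]
        + [f"cpuCombined{i}:float" for i in range(cores)]
        + ["LoadAverage:float"]
    )
    return (
        f"SystemMetrics = load 'hbase://{table}' USING "
        f"org.apache.pig.backend.hadoop.hbase.HBaseStorage("
        f"'{columns}', '-loadKey') AS ({', '.join(fields)});"
    )


def concat_generate_statement(relation: str, alias: str, last: int) -> str:
    """Build the foreach that joins $0 and $1 into a row id and passes $2..$last through."""
    passthrough = ", ".join(f"${i}" for i in range(2, last + 1))
    return (
        f"{alias} = foreach {relation} generate "
        f"CONCAT(CONCAT($0, '-'), $1) as rowId, {passthrough};"
    )


if __name__ == "__main__":
    # Small sizes keep the printed output readable; use 250 for the real case.
    print(hbase_load_statement("SystemMetrics", 2))
    print(concat_generate_statement("CpuMetrics", "ConcatBuffer", 5))
```

Pasting the printed statements into the Pig script reproduces the long hand-written lists. The trade-off is that the generated schema has to be kept in sync with the columns actually stored in the table.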