Pig >> mail # user >> Syntax and HBaseStorage questions


Re: Syntax and HBaseStorage questions
Thanks for the pointer. :)

regards,
Eric

On Thu, Dec 30, 2010 at 2:15 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Ah, I see. There is no such function available right now.
> There is some discussion of such a feature here:
> https://issues.apache.org/jira/browse/PIG-1693
> As you can see, there isn't yet a consensus on how such syntax would work.
> Feel free to weigh in.
>
> -Dmitriy
>
> On Wed, Dec 29, 2010 at 9:12 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>
>> Hi Dmitriy,
>>
>> Issue filed: https://issues.apache.org/jira/browse/PIG-1782
>>
>> I meant to say columns in my previous message.  It should read as
>> "Make alteration of a column in a bag, but not specifying other
>> columns in the same bag".
>>
>> Let's assume PIG-1782 is addressed and CpuMetrics from the PIG-1782
>> example contains 250 columns.
>> The next line that I write, would look like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
>> $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
>>
>> It would be nice if the statement could be written like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
>> $1) as rowId, MIRROR($2..$250);
>>
>> Is there something like this among Pig's built-in functions?
>>
>> regards,
>> Eric
>>
>> On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> > Hi Eric,
>> > Yes, we can certainly add the convention that a string without a ":" refers
>> > to a complete column family.
>> > It should be fairly straightforward. Step 1 is to open a ticket on the
>> > Jira, step 2 is to do it :).
>> >
>> > I am not sure what you mean by "make alteration of a tuple in a bag, but not
>> > specifying other tuples in the same bag" -- can you provide an example that
>> > illustrates what you want to do?
>> >
>> > Thanks,
>> > -Dmitriy
>> >
>> > On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>
>> >> Consider this use case:
>> >>
>> >> A program stores CPU usage metrics in an HBase table.  This
>> >> HBase table has a column family called cpu, and individual CPU core
>> >> usage is stored in columns such as cpu:user.0, cpu:user.1, etc.  The
>> >> numeric suffix is the unique ID of the CPU core in the system.
>> >>
>> >> While it is possible to write a query like:
>> >>
>> >> SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
>> >> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
>> >> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
>> >> cpuCombined1:float ... LoadAverage:float);
>> >>
>> >> it requires a long list of columns to load, and the same list has
>> >> to be specified again in subsequent commands like:
>> >>
>> >> CleanseBuffer = foreach SystemMetrics generate
>> >> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
>> >> cpuCombined1, ..., LoadAverage;
>> >>
>> >> The syntax works fine, but it would be nice to load all columns of a
>> >> given column family without specifying individual columns.
>> >>
>> >> i.e. SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
>> >> system');
>> >>
>> >> Is this syntax possible to implement in pig?
>> >>
>> >> Second question, is it possible to make alteration of a tuple in a
>> >> bag, but not specifying other tuples in the same bag?
>> >>
>> >> For tables with many columns, it would be nice to have shorthand
>> >> syntax that makes Pig scripts shorter to write.
>> >> Any tips on making foreach and group by shorter?  Thanks
>> >>
>> >> regards,
>> >> Eric
>> >>
>> >
>>
>
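
Since the thread concludes that Pig had no built-in shorthand at the time, one possible stopgap is to generate the repetitive parts of the script outside Pig. The following is only a sketch in Python, not something proposed in the thread: the helper names (hbase_column_spec, pig_schema, positional_range, load_statement) are hypothetical, and it assumes the per-core columns are numbered consecutively from 0, as in the cpu:combined.0, cpu:combined.1 example above.

```python
def hbase_column_spec(family, qualifier, cores):
    """Space-separated 'family:qualifier.N' names for N = 0 .. cores-1."""
    return " ".join(f"{family}:{qualifier}.{i}" for i in range(cores))


def pig_schema(alias, cores, pig_type="float"):
    """Comma-separated 'aliasN:type' fields matching hbase_column_spec."""
    return ", ".join(f"{alias}{i}:{pig_type}" for i in range(cores))


def positional_range(start, end):
    """Positional references '$start, ..., $end' (both ends inclusive)."""
    return ", ".join(f"${i}" for i in range(start, end + 1))


def load_statement(table, cores):
    """Build a load statement shaped like the one in the original question.

    Only tags:cluster and the per-core cpu:combined.N columns are emitted;
    any other columns would have to be appended by hand.
    """
    columns = "tags:cluster " + hbase_column_spec("cpu", "combined", cores)
    schema = ("rowKey:chararray, cluster:chararray, "
              + pig_schema("cpuCombined", cores))
    return (
        f"SystemMetrics = load 'hbase://{table}' USING "
        f"org.apache.pig.backend.hadoop.hbase.HBaseStorage("
        f"'{columns}', '-loadKey') AS ({schema});"
    )


if __name__ == "__main__":
    # Print the generated fragments so they can be pasted into a Pig script.
    print(load_statement("SystemMetrics", 4))
    print(positional_range(2, 10))
```

The generated text can be pasted into a script or written to a file before running Pig. This keeps the column list in one place, but it does not change what Pig itself accepts.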