Re: Usage of 'limit' with Pig for Hbase
Is this a better way to limit than using Pig's LIMIT (e.g. fields = LIMIT
fields 5;), since the filtering is already done while loading?

Thanks,
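
For reference, a minimal sketch combining the two approaches discussed below,
using the table and column names from the original script
('hbase://documents', 'field:fields_j'); the load-time -limit prunes rows per
region at scan time, while the LIMIT operator enforces an exact overall cap:

-- scan-time pruning: at most 5 rows come back from each region
fields = load 'hbase://documents' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'field:fields_j', '-loadKey true -limit 5')
    as (rowkey, fields:map[]);

-- exact global cap: trim the per-region results to 5 total
fields = LIMIT fields 5;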
On Thu, Mar 14, 2013 at 9:50 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> To explain what's going on:
> -limit for HBaseStorage limits the number of rows returned from *each
> region* in the HBase table. It's an optimization -- there is no way for the
> LIMIT operator to be pushed down to the loader, so you can do it explicitly
> if you know you only need a few rows and don't want to pull the rest from
> HBase just to drop them on the floor once they've been extracted and sent
> to your mappers.
>
>
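
A quick check against the numbers reported later in the thread: with -limit 5
applied per region, a table with about 50 regions can return up to
5 x 50 = 250 rows, which matches the "Successfully read 250 records" in the
quoted logs (the "51 input paths" reported there is consistent with roughly
one map task per region).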
> On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi
> <[EMAIL PROTECTED]> wrote:
>
> > Thank you. This cleared my doubt.
> >
> >
> > On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham <[EMAIL PROTECTED]>
> > wrote:
> >
> > > The -limit passed to HBaseStorage is the limit per mapper reading from
> > > HBase. If you want to limit overall records, also use LIMIT:
> > >
> > > fields = LIMIT fields 5;
> > >
> > >
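
One way to verify the cap, assuming the load statement quoted below (this
COUNT_STAR step is illustrative, not from the thread):

-- group everything into a single bag and count the tuples
grouped = GROUP fields ALL;
cnt = FOREACH grouped GENERATE COUNT_STAR(fields);
dump cnt;  -- expect at most (5) once LIMIT has been applied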
> > > On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi
> > > <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi!
> > > >
> > > > I am using Pig 0.10.0 with HBase in distributed mode to read the
> > > > records, and I have used the command below.
> > > >
> > > > fields = load 'hbase://documents' using
> > > >     org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> > > >         'field:fields_j', '-loadKey true -limit 5')
> > > >     as (rowkey, fields:map[]);
> > > >
> > > > I want Pig to limit the records to only 5, but the result is quite
> > > > different. Please see the logs below.
> > > >
> > > > Input(s):
> > > > Successfully read 250 records (16520 bytes) from: "hbase://documents"
> > > >
> > > > Output(s):
> > > > Successfully stored 250 records (19051 bytes) in:
> > > > "hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"
> > > >
> > > > Counters:
> > > > Total records written : 250
> > > > Total bytes written : 19051
> > > > Spillable Memory Manager spill count : 0
> > > > Total bags proactively spilled: 0
> > > > Total records proactively spilled: 0
> > > > Job DAG:
> > > > job_201303121846_0056
> > > >
> > > > 2013-03-13 14:43:10,186 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
> > > > 2013-03-13 14:43:10,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> > > > 2013-03-13 14:43:10,210 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 51
> > > > 2013-03-13 14:43:10,211 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 51
> > > >
> > > >
> > > > Am I using the 'limit' keyword the wrong way?
> > > >
> > > > Please let me know your suggestions.
> > > >
> > > > Thanks,
> > > > --
> > > > Kiran Chitturi
> > > >
> > > > <http://www.linkedin.com/in/kiranchitturi>
> > > >
> > >
> > >
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email me
> > > at [EMAIL PROTECTED] going forward.*
> > >
> >
> >
> >
> > --
> > Kiran Chitturi
> >
> > <http://www.linkedin.com/in/kiranchitturi>
> >
>

--
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>