Pig, mail # user - Usage of 'limit' with Pig for Hbase


Re: Usage of 'limit' with Pig for Hbase
kiran chitturi 2013-03-15, 03:16
Is this a better way to limit than using Pig's LIMIT operator (e.g. fields =
LIMIT fields 5;), since the filtering is already done while loading?

Thanks,
On Thu, Mar 14, 2013 at 9:50 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> To explain what's going on:
> -limit for HBaseStorage limits the number of rows returned from *each
> region* in the hbase table. It's an optimization -- there is no way for the
> LIMIT operator to be pushed down to the loader, so you can do it explicitly
> if you know you only need a few rows and don't want to pull the rest from
> HBase just to drop them on the floor once they've been extracted and sent
> to your mappers.
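Dmitriy's point can be sketched with a small Python simulation. This is an illustration only (the region data and row counts are hypothetical, and this is not how Pig actually executes): `-limit` caps rows per region, while LIMIT caps the total.

```python
# Minimal simulation of HBaseStorage's -limit versus Pig's LIMIT operator.
# Region contents and counts are hypothetical, for illustration only.

def load_with_region_limit(regions, limit):
    """-limit N returns at most N rows from *each* region."""
    rows = []
    for region in regions:
        rows.extend(region[:limit])
    return rows

def global_limit(rows, n):
    """Pig's LIMIT operator caps the total number of rows."""
    return rows[:n]

# 50 regions of 100 rows each, roughly mirroring the table in this thread.
regions = [[f"r{r}-{i}" for i in range(100)] for r in range(50)]

loaded = load_with_region_limit(regions, 5)   # 5 rows kept per region
print(len(loaded))                            # 250, not 5

final = global_limit(loaded, 5)               # add LIMIT for a true cap
print(len(final))                             # 5
```

So `-limit` cuts down what is pulled out of HBase, and a LIMIT on top of it enforces the overall bound.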
>
>
> On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi
> <[EMAIL PROTECTED]>wrote:
>
> > Thank you. This cleared my doubt.
> >
> >
> > On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham <[EMAIL PROTECTED]>
> > wrote:
> >
> > > The -limit passed to HBaseStorage is the limit per mapper reading from
> > > HBase. If you want to limit overall records, also use LIMIT:
> > >
> > > fields = LIMIT fields 5;
> > >
> > >
> > > On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi
> > > <[EMAIL PROTECTED]>wrote:
> > >
> > > > Hi!
> > > >
> > > > I am using Pig 0.10.0 with Hbase in distributed mode to read the
> > records
> > > > and I have used this command below.
> > > >
> > > > fields = load 'hbase://documents' using
> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j', '-loadKey true -limit 5')
> > > > as (rowkey, fields:map[]);
> > > >
> > > > I want Pig to limit the records to only 5, but the result is quite
> > > > different. Please see the logs below.
> > > >
> > > > Input(s):
> > > > Successfully read 250 records (16520 bytes) from: "hbase://documents"
> > > >
> > > > Output(s):
> > > > Successfully stored 250 records (19051 bytes) in:
> > > > "hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"
> > > >
> > > > Counters:
> > > > Total records written : 250
> > > > Total bytes written : 19051
> > > > Spillable Memory Manager spill count : 0
> > > > Total bags proactively spilled: 0
> > > > Total records proactively spilled: 0
> > > > Job DAG:
> > > > job_201303121846_0056
> > > >
> > > > 2013-03-13 14:43:10,186 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
> > > > 2013-03-13 14:43:10,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> > > > 2013-03-13 14:43:10,210 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 51
> > > > 2013-03-13 14:43:10,211 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 51
> > > >
> > > >
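For what it's worth, the 250 records in the log above are consistent with per-region limiting. A quick back-of-the-envelope check (the 50-region figure is inferred from the counts, not stated anywhere in the thread):

```python
# If -limit 5 is applied per region rather than globally, reading 250
# rows in total implies the table spans about 250 / 5 = 50 regions.
rows_read = 250
limit_per_region = 5
inferred_regions = rows_read // limit_per_region
print(inferred_regions)  # 50
```

That also lines up roughly with the 51 input paths Hadoop reports, since the loader creates one input split per region.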
> > > > Am I using the 'limit' keyword the wrong way?
> > > >
> > > > Please let me know your suggestions.
> > > >
> > > > Thanks,
> > > > --
> > > > Kiran Chitturi
> > > >
> > > > <http://www.linkedin.com/in/kiranchitturi>
> > > >
> > >
> > >
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email me
> > at
> > > [EMAIL PROTECTED] going forward.*
> > >
> >
> >
> >
> > --
> > Kiran Chitturi
> >
> > <http://www.linkedin.com/in/kiranchitturi>
> >
>

--
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>