Re: Usage of 'limit' with Pig for Hbase
To explain what's going on:
-limit for HBaseStorage limits the number of rows returned from *each
region* of the HBase table. It's an optimization -- there is no way for the
LIMIT operator to be pushed down to the loader, so you can apply the limit
explicitly in the loader if you know you only need a few rows and don't
want to pull the rest from HBase just to drop them on the floor once
they've been extracted and sent to your mappers.
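Putting the two together (a minimal sketch using the table name, column
spec, and loader options from the script quoted below in this thread):

```pig
-- -limit 5 caps rows per region scan; LIMIT caps the overall result
fields = LOAD 'hbase://documents'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'field:fields_j', '-loadKey true -limit 5')
    AS (rowkey, fields:map[]);

-- without this, you get up to 5 rows *per region*, not 5 total
fields_capped = LIMIT fields 5;
DUMP fields_capped;
```

The loader-side -limit just reduces how much data each mapper scans; the
LIMIT operator is what actually bounds the final record count.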
On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi
<[EMAIL PROTECTED]>wrote:

> Thank you. This cleared my doubt.
>
>
> On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham <[EMAIL PROTECTED]>
> wrote:
>
> > The -limit passed to HBaseStorage is the limit per mapper reading from
> > HBase. If you want to limit overall records, also use LIMIT:
> >
> > fields = LIMIT fields 5;
> >
> >
> > On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi
> > <[EMAIL PROTECTED]>wrote:
> >
> > > Hi!
> > >
> > > I am using Pig 0.10.0 with Hbase in distributed mode to read the
> records
> > > and I have used this command below.
> > >
> > > fields = load 'hbase://documents' using
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j',
> > > '-loadKey true -limit 5') as (rowkey, fields:map[]);
> > >
> > > I want Pig to limit the records to only 5, but the result is quite
> > > different. Please see the logs below.
> > >
> > > Input(s):
> > > Successfully read 250 records (16520 bytes) from: "hbase://documents"
> > >
> > > Output(s):
> > > Successfully stored 250 records (19051 bytes) in:
> > > "hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"
> > >
> > > Counters:
> > > > Total records written : 250
> > > > Total bytes written : 19051
> > > > Spillable Memory Manager spill count : 0
> > > > Total bags proactively spilled: 0
> > > > Total records proactively spilled: 0
> > > > Job DAG:
> > > > job_201303121846_0056
> > > >
> > > > 2013-03-13 14:43:10,186 [main] WARN
> > > >  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
> > > > 2013-03-13 14:43:10,186 [main] INFO
> > > >  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > - Success!
> > > > 2013-03-13 14:43:10,210 [main] INFO
> > > >  org.apache.hadoop.mapreduce.lib.input.FileInputFormat
> > > > - Total input paths to process : 51
> > > > 2013-03-13 14:43:10,211 [main] INFO
> > > >  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil
> > > > - Total input paths to process : 51
> > >
> > >
> > > Am I using the 'limit' option the wrong way?
> > >
> > > Please let me know your suggestions.
> > >
> > > Thanks,
> > > --
> > > Kiran Chitturi
> > >
> > > <http://www.linkedin.com/in/kiranchitturi>
> > >
> >
> >
> >
> > --
> > *Note that I'm no longer using my Yahoo! email address. Please email me
> at
> > [EMAIL PROTECTED] going forward.*
> >
>
>
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>
>