Usage of 'limit' with Pig for Hbase
Hi!

I am using Pig 0.10.0 with HBase in distributed mode to read records,
using the command below.

fields = load 'hbase://documents' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j', '-loadKey true -limit 5')
    as (rowkey, fields:map[]);

I want Pig to limit the result to only 5 records, but the job reads and
stores 250 records instead. Please see the logs below.

Input(s):
Successfully read 250 records (16520 bytes) from: "hbase://documents"

Output(s):
Successfully stored 250 records (19051 bytes) in:
"hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"

Counters:
Total records written : 250
Total bytes written : 19051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201303121846_0056

2013-03-13 14:43:10,186 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
2013-03-13 14:43:10,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-03-13 14:43:10,210 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 51
2013-03-13 14:43:10,211 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 51

Am I using the '-limit' option incorrectly?

Please let me know your suggestions.
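
One possibility worth checking, offered as a sketch rather than a confirmed answer: the HBaseStorage '-limit' option is documented as a per-region cap, so a table spread over many regions can return several times that number of rows. Applying Pig's LIMIT operator to the loaded relation caps the total instead. The alias first_five below is a hypothetical name; the loader arguments are copied from the command above.

```pig
-- Load as before; '-limit 5' is kept on the assumption that it bounds
-- each region scan rather than the relation as a whole.
fields = LOAD 'hbase://documents' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j', '-loadKey true -limit 5')
    AS (rowkey, fields:map[]);

-- LIMIT applies to the whole relation, so at most 5 tuples are kept.
-- first_five is a hypothetical alias chosen for this example.
first_five = LIMIT fields 5;
DUMP first_five;
```

If the per-region assumption holds, keeping '-limit' in the loader still reduces how much each scan reads, while LIMIT enforces the overall count.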

Thanks,
--
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>