HBase user mailing list: HBase Writes With Large Number of Columns


Messages in this thread:
Pankaj Misra 2013-03-25, 16:55
Ted Yu 2013-03-25, 16:59
Pankaj Misra 2013-03-25, 17:18
Ted Yu 2013-03-25, 17:45
Pankaj Misra 2013-03-25, 18:03
Ted Yu 2013-03-25, 18:24
Jean-Marc Spaggiari 2013-03-25, 18:27
Pankaj Misra 2013-03-25, 18:40
Ted Yu 2013-03-25, 19:39
Pankaj Misra 2013-03-25, 20:54
Jean-Marc Spaggiari 2013-03-25, 23:49
ramkrishna vasudevan 2013-03-26, 06:19
Asaf Mesika 2013-03-27, 21:52
Re: HBase Writes With Large Number of Columns
For 0.95 and beyond, HBaseClient is able to specify codec classes that
encode / compress CellBlocks.
See the following in HBaseClient#Connection:

      builder.setCellBlockCodecClass(this.codec.getClass().getCanonicalName());
      if (this.compressor != null) {
        builder.setCellBlockCompressorClass(this.compressor.getClass().getCanonicalName());
      }
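
The codec and compressor that reach those builder calls come from the client
configuration. A minimal sketch of enabling them, assuming the
hbase.client.rpc.codec and hbase.client.rpc.compressor keys and the stock
KeyValueCodec / Hadoop GzipCodec classes apply to your 0.95+ client (verify
the key names against your version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CellBlockCodecConfig {
  // Returns a client Configuration that asks for encoded + compressed CellBlocks.
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Codec used to encode Cells into a CellBlock on the RPC channel.
    conf.set("hbase.client.rpc.codec",
        "org.apache.hadoop.hbase.codec.KeyValueCodec");
    // Optional Hadoop compression codec applied to the encoded CellBlock.
    conf.set("hbase.client.rpc.compressor",
        "org.apache.hadoop.io.compress.GzipCodec");
    return conf;
  }
}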
Cheers

On Wed, Mar 27, 2013 at 2:52 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Correct me if I'm wrong, but the drop is expected, according to the
> following math:
>
> If you have a Put for a specific rowkey, and that rowkey weighs 100 bytes,
> then with 20 columns you should add the following size to the combined
> size of the columns:
> 20 x (100 bytes) = 2000 bytes
> So the size of the Put sent to HBase should be:
> 1500 bytes (sum of all column qualifier sizes) + 20 x 100 bytes (size of row key).
>
> I add this 20 x 100 since, for each column qualifier, the Put object is
> adding another KeyValue member object, which duplicates the RowKey.
> See here (taken from Put.java, v0.94.3 I think):
>
>   public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
>     List<KeyValue> list = getKeyValueList(family);
>     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
>     list.add(kv);
>     familyMap.put(kv.getFamily(), list);
>     return this;
>   }
>
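
A minimal sketch (hypothetical row, family and qualifier names, 0.94-era
client API) showing that every add() call stores another KeyValue carrying
its own copy of the row key:

import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDuplicationDemo {
  public static void main(String[] args) {
    Put put = new Put(Bytes.toBytes("a-100-byte-row-key"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q1"), Bytes.toBytes("v1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q2"), Bytes.toBytes("v2"));
    // Both KeyValues under family "cf" embed the full row key bytes,
    // so the row key is carried once per column, not once per Put.
    List<KeyValue> kvs = put.getFamilyMap().get(Bytes.toBytes("cf"));
    for (KeyValue kv : kvs) {
      System.out.println(Bytes.toString(kv.getRow()) + " / "
          + Bytes.toString(kv.getQualifier()));
    }
  }
}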
> Each KeyValue also adds more information which should be taken into
> account per column qualifier:
> * KeyValue overhead - I think 2 longs
> * Column Family length
> * Timestamp - 1 long
>
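
Putting those numbers together, a rough per-Put estimate for the 20-column
example looks like this (the row key, qualifier/value and family sizes are
the thread's example figures; the per-KeyValue overhead is an assumption,
not a measured value):

public class PutSizeEstimate {
  public static void main(String[] args) {
    long rowKeyLen    = 100;    // example row key size from the thread
    int  numColumns   = 20;
    long qualsAndVals = 1500;   // combined qualifier + value bytes (example)
    long familyLen    = 2;      // e.g. a short family name like "cf"
    long timestampLen = 8;      // one long per KeyValue
    long kvOverhead   = 16;     // assumed ~2 longs of KeyValue bookkeeping

    long perKeyValue = rowKeyLen + familyLen + timestampLen + kvOverhead;
    long estimate    = qualsAndVals + numColumns * perKeyValue;
    // 1500 + 20 * (100 + 2 + 8 + 16) = 1500 + 2520 = 4020 bytes
    System.out.println("estimated Put size: " + estimate + " bytes");
  }
}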
> I wrote a class to calculate a rough size of the List<Put> sent to HBase,
> so I can calculate the throughput:
>
> import java.util.List;
> import java.util.Map;
> import java.util.NavigableMap;
>
> import org.apache.hadoop.hbase.KeyValue;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Row;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class HBaseUtils {
>
>     public static long getSize(List<? extends Row> actions) {
>         long size = 0;
>         for (Row row : actions) {
>             size += getSize(row);
>         }
>         return size;
>     }
>
>     public static long getSize(Row row) {
>         if (row instanceof Increment) {
>             return calcSizeIncrement((Increment) row);
>         } else if (row instanceof Put) {
>             return calcSizePut((Put) row);
>         } else {
>             throw new IllegalArgumentException(
>                 "Can't calculate size for Row type " + row.getClass());
>         }
>     }
>
>     // Row key + family names + full KeyValue lengths (each KeyValue already
>     // includes its own copy of the row key, qualifier, timestamp and value).
>     private static long calcSizePut(Put put) {
>         long size = 0;
>         size += put.getRow().length;
>
>         Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
>         for (byte[] family : familyMap.keySet()) {
>             size += family.length;
>             List<KeyValue> kvs = familyMap.get(family);
>             for (KeyValue kv : kvs) {
>                 size += kv.getLength();
>             }
>         }
>         return size;
>     }
>
>     // Row key + family names + qualifier bytes + one long per counter.
>     private static long calcSizeIncrement(Increment row) {
>         long size = 0;
>
>         size += row.getRow().length;
>
>         Map<byte[], NavigableMap<byte[], Long>> familyMap = row.getFamilyMap();
>         for (byte[] family : familyMap.keySet()) {
>             size += family.length;
>             NavigableMap<byte[], Long> qualifiersMap = familyMap.get(family);
>             for (byte[] qualifier : qualifiersMap.keySet()) {
>                 size += qualifier.length;
>                 size += Bytes.SIZEOF_LONG;
>             }
>         }
>
>         return size;
>     }
> }
>
> Feel free to use it.
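
A rough sketch of how the class above could be used to turn a timed batch
into a MB/s figure. The table name, row/column names and the Math.max guard
are illustrative assumptions, not part of the original message:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class ThroughputExample {
  // Estimate the payload size with HBaseUtils, time the batch call,
  // and report an approximate MB/s figure for this batch.
  static double writeAndMeasure(HTable table, List<Row> batch) throws Exception {
    long bytes = HBaseUtils.getSize(batch);
    long start = System.currentTimeMillis();
    table.batch(batch);
    long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
    return (bytes / (1024.0 * 1024.0)) / (elapsedMs / 1000.0);
  }

  public static void main(String[] args) throws Exception {
    HTable table = new HTable("t1");   // assumes default client configuration
    List<Row> batch = new ArrayList<Row>();
    batch.add(new Put(Bytes.toBytes("row-1"))
        .add(Bytes.toBytes("cf"), Bytes.toBytes("q1"), Bytes.toBytes("v1")));
    System.out.println("~" + writeAndMeasure(table, batch) + " MB/s");
  }
}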
>
>
> On Tue, Mar 26, 2013 at 1:49 AM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
> > For a total of 1.5kb with 4 columns = 384 bytes/column
> > bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
> > -num_keys 1000000
> > 13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
> > cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
> > Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
> >
> > For a total of 1.5kb with 100 columns = 15 bytes/column
> > bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
Other replies in this thread:
Asaf Mesika 2013-03-27, 22:28
Ted Yu 2013-03-27, 22:33
Mohammad Tariq 2013-03-25, 19:30