Re: HBase Writes With Large Number of Columns
CellBlock == KeyValue?
On Thu, Mar 28, 2013 at 12:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> For 0.95 and beyond, HBaseClient is able to specify codec classes that
> encode / compress CellBlock.
> See the following in HBaseClient#Connection:
>
>       builder.setCellBlockCodecClass(this.codec.getClass().getCanonicalName());
>       if (this.compressor != null) {
>         builder.setCellBlockCompressorClass(this.compressor.getClass().getCanonicalName());
>       }
>
> Cheers
>
> On Wed, Mar 27, 2013 at 2:52 PM, Asaf Mesika <[EMAIL PROTECTED]>
> wrote:
>
> > Correct me if I'm wrong, but the drop is expected, according to the
> > following math:
> >
> > If you have a Put for a specific rowkey, and that rowkey weighs 100 bytes,
> > then with 20 columns you should add the following to the combined size
> > of the columns:
> > 20 x (100 bytes) = 2000 bytes
> > So the size of the Put sent to HBase should be:
> > 1500 bytes (sum of all column qualifier sizes) + 20 x 100 bytes (row key
> > copies) = 3500 bytes.
> >
> > I add this 20x100 since, for each column qualifier, the Put object is
> > adding another KeyValue member object, which duplicates the RowKey.
> > See here (taken from Put.java, v0.94.3 I think):
> >
> >   public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
> >     List<KeyValue> list = getKeyValueList(family);
> >     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> >     list.add(kv);
> >     familyMap.put(kv.getFamily(), list);
> >     return this;
> >   }
> >
> > Each KeyValue also adds more information, which should be taken into
> > account per column qualifier:
> > * KeyValue overhead - I think 2 longs
> > * Column Family length
> > * Timestamp - 1 long
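A minimal sketch of the arithmetic above, assuming the 0.94-era KeyValue layout: two 4-byte length prefixes, then a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte type, wrapped around the row, family, qualifier and value bytes. The class name, the 2-byte family and the 25/50 split between qualifier and value are made up for illustration. Only the 100-byte row key, the 20 columns and the 1500 bytes of column data come from the numbers above.

```java
// Sketch only: estimates the serialized KeyValue bytes carried by one Put.
// Assumes the 0.94-era KeyValue layout; all names here are hypothetical.
class PutSizeEstimate {
    // 4-byte key length + 4-byte value length
    static final int LENGTH_FIELDS = 4 + 4;
    // 2-byte row length + 1-byte family length + 8-byte timestamp + 1-byte type
    static final int KEY_FIELDS = 2 + 1 + 8 + 1;

    // Bytes for a single KeyValue; note the row key is counted once per KeyValue.
    static long keyValueLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return LENGTH_FIELDS + KEY_FIELDS + rowLen + familyLen + qualifierLen + valueLen;
    }

    // Bytes for a Put whose columns all have the same sizes.
    static long putLength(int rowLen, int familyLen, int qualifierLen, int valueLen, int columns) {
        return (long) columns * keyValueLength(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        int rowLen = 100;      // from the example above
        int columns = 20;      // from the example above
        int familyLen = 2;     // assumed
        int qualifierLen = 25; // assumed: 1500 bytes / 20 columns = 75 bytes,
        int valueLen = 50;     // split arbitrarily into 25 + 50

        long total = putLength(rowLen, familyLen, qualifierLen, valueLen, columns);
        long rowKeyShare = (long) columns * rowLen;
        System.out.println("estimated Put bytes: " + total);
        System.out.println("of which repeated row key bytes: " + rowKeyShare);
    }
}
```

Under these assumptions the fixed fields add 20 bytes per column on top of the repeated row key, so the row key is still the dominant per-column cost when it is 100 bytes long.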
> >
> > I wrote a class to calculate the rough size of the List<Put> sent to
> > HBase, so I can calculate the throughput:
> >
> > import java.util.List;
> > import java.util.Map;
> > import java.util.NavigableMap;
> >
> > import org.apache.hadoop.hbase.KeyValue;
> > import org.apache.hadoop.hbase.client.Increment;
> > import org.apache.hadoop.hbase.client.Put;
> > import org.apache.hadoop.hbase.client.Row;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class HBaseUtils {
> >
> >     public static long getSize(List<? extends Row> actions) {
> >         long size = 0;
> >         for (Row row : actions) {
> >             size += getSize(row);
> >         }
> >         return size;
> >     }
> >
> >     public static long getSize(Row row) {
> >         if (row instanceof Increment) {
> >             return calcSizeIncrement( (Increment) row);
> >         } else if (row instanceof Put) {
> >             return calcSizePut((Put) row);
> >         } else {
> >             throw new IllegalArgumentException(
> >                     "Can't calculate size for Row type " + row.getClass());
> >         }
> >     }
> >
> >     private static long calcSizePut(Put put) {
> >         long size = 0;
> >         size += put.getRow().length;
> >
> >         Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
> >         for (byte[] family : familyMap.keySet()) {
> >             size += family.length;
> >             List<KeyValue> kvs = familyMap.get(family);
> >             for (KeyValue kv : kvs) {
> >                 size += kv.getLength();
> >             }
> >         }
> >         return size;
> >
> >     }
> >
> >     private static long calcSizeIncrement(Increment row) {
> >         long size = 0;
> >
> >         size += row.getRow().length;
> >
> >         Map<byte[], NavigableMap<byte[], Long>> familyMap =
> >                 row.getFamilyMap();
> >         for (byte[] family : familyMap.keySet()) {
> >             size += family.length;
> >             NavigableMap<byte[], Long> qualifiersMap =
> >                     familyMap.get(family);
> >             for (byte[] qualifier : qualifiersMap.keySet()) {
> >                 size += qualifier.length;
> >                 size += Bytes.SIZEOF_LONG;
> >             }
> >         }
> >
> >         return size;
> >     }
> > }
> >
> > Feel free to use it.
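One way the size estimate could feed a throughput number, as a rough sketch. The class and method names are hypothetical, and the byte count and elapsed time in `main` are placeholder values standing in for the result of `HBaseUtils.getSize(puts)` and a `System.nanoTime()` measurement around the actual write call.

```java
// Sketch only: turns a byte count and an elapsed time into MB/s.
// Names are hypothetical; nothing here calls HBase.
class WriteThroughput {
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double seconds = elapsedNanos / 1_000_000_000.0;
        double megabytes = bytes / (1024.0 * 1024.0);
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        // Placeholder for HBaseUtils.getSize(puts) on a real batch.
        long batchBytes = 4L * 1024 * 1024;
        // Placeholder for (System.nanoTime() - start) around the write.
        long elapsedNanos = 2_000_000_000L;
        System.out.println(megabytesPerSecond(batchBytes, elapsedNanos) + " MB/s");
    }
}
```

Since the size is only an estimate of client-side payload, the resulting figure is best used to compare runs against each other rather than as an absolute network rate.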
> >
> >
> >
> >
> > On Tue, Mar 26, 2013 at 1:49 AM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]> wrote:
> >
> > > For a total of 1.5kb with 4 columns = 384 bytes/column
> > > bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100