HBase, mail # user - HBase Writes With Large Number of Columns


Re: HBase Writes With Large Number of Columns
Ted Yu 2013-03-27, 22:33
From http://hbase.apache.org/book.html#hbase.rpc :

Optionally, Cells(KeyValues) can be passed outside of protobufs in
follow-behind Cell blocks (because we can’t protobuf megabytes of KeyValues
<https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#>
or Cells). These CellBlocks are encoded and optionally compressed.

From IPCUtil, you should find this:

  ByteBuffer buildCellBlock(final Codec codec, final CompressionCodec compressor,
      final CellScanner cells)

Cheers

On Wed, Mar 27, 2013 at 3:28 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> CellBlock == KeyValue?
>
>
> On Thu, Mar 28, 2013 at 12:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > For 0.95 and beyond, HBaseClient is able to specify codec classes that
> > encode / compress CellBlock.
> > See the following in HBaseClient#Connection :
> >
> >       builder.setCellBlockCodecClass(this.codec.getClass().getCanonicalName());
> >       if (this.compressor != null) {
> >         builder.setCellBlockCompressorClass(this.compressor.getClass().getCanonicalName());
> >       }
> >
> > Cheers
> >
> > On Wed, Mar 27, 2013 at 2:52 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
> >
> > > Correct me if I'm wrong, but the drop is expected, according to the
> > > following math:
> > >
> > > If you have a Put for a specific row key, and that row key weighs 100 bytes,
> > > then with 20 columns you should add the following to the
> > > combined size of the columns:
> > > 20 x (100 bytes) = 2000 bytes
> > > So the size of the Put sent to HBase should be:
> > > 1500 bytes (sum of all column qualifier sizes) + 20 x 100 bytes (size of
> > > the row key, repeated once per column).
> > >
> > > I add this 20x100 since, for each column qualifier, the Put object is
> > > adding another KeyValue member object, which duplicates the RowKey.
> > > See here (taken from Put.java, v0.94.3 I think):
> > >
> > >   public Put add(byte[] family, byte[] qualifier, long ts, byte[] value) {
> > >     List<KeyValue> list = getKeyValueList(family);
> > >     // Each new KeyValue carries its own copy of the row key.
> > >     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> > >     list.add(kv);
> > >     familyMap.put(kv.getFamily(), list);
> > >     return this;
> > >   }
> > >
> > > Each KeyValue also adds more information, which should also be taken into
> > > account per column qualifier:
> > > * KeyValue overhead - I think 2 longs
> > > * Column Family length
> > > * Timestamp - 1 long
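
As a rough illustration of the arithmetic above, here is a minimal, self-contained sketch (no HBase dependency) that estimates the bytes carried by a Put when every column becomes its own KeyValue. The names PutSizeEstimate, keyValueSize and putSize are hypothetical. The 20-byte fixed overhead is an assumption based on the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type). The 75 bytes per column and the 1-byte family name are made-up figures, chosen so the columns total the 1500 bytes used in the example.

```java
import java.util.Arrays;

public class PutSizeEstimate {

    // Assumed fixed bytes per KeyValue: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + type (1).
    public static final int KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Rough size of one KeyValue: fixed overhead plus the variable-length parts. */
    public static long keyValueSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return KEYVALUE_FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Rough size of a Put: one KeyValue per column, each repeating row key and family. */
    public static long putSize(int rowLen, int familyLen, int[] qualifierLens, int[] valueLens) {
        long size = 0;
        for (int i = 0; i < qualifierLens.length; i++) {
            size += keyValueSize(rowLen, familyLen, qualifierLens[i], valueLens[i]);
        }
        return size;
    }

    public static void main(String[] args) {
        int columns = 20;
        int rowLen = 100;   // row key size from the example
        int familyLen = 1;  // hypothetical one-byte column family name

        // Hypothetical split: 75 bytes of qualifier per column (20 x 75 = 1500),
        // with the value length folded into that figure and set to 0 here.
        int[] qualifierLens = new int[columns];
        int[] valueLens = new int[columns];
        Arrays.fill(qualifierLens, 75);

        long total = putSize(rowLen, familyLen, qualifierLens, valueLens);
        long rowKeyBytes = (long) rowLen * columns;

        System.out.println("Estimated Put size: " + total + " bytes");
        System.out.println("Of which row key copies: " + rowKeyBytes + " bytes");
    }
}
```

With these assumed numbers the repeated row key accounts for 2000 of the estimated 3920 bytes, so roughly half of the payload is row key duplication rather than column data.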
> > >
> > > I wrote a class to calculate the rough size of the List<Put> sent
> > > to HBase, so I can calculate the throughput:
> > >
> > > public class HBaseUtils {
> > >
> > >     public static long getSize(List<? extends Row> actions) {
> > >         long size = 0;
> > >         for (Row row : actions) {
> > >             size += getSize(row);
> > >         }
> > >         return size;
> > >     }
> > >
> > >     public static long getSize(Row row) {
> > >         if (row instanceof Increment) {
> > >             return calcSizeIncrement( (Increment) row);
> > >         } else if (row instanceof Put) {
> > >             return calcSizePut((Put) row);
> > >         } else {
> > >             throw new IllegalArgumentException(
> > >                 "Can't calculate size for Row type " + row.getClass());
> > >         }
> > >     }
> > >
> > >     private static long calcSizePut(Put put) {
> > >         long size = 0;
> > >         size += put.getRow().length;
> > >
> > >         Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
> > >         for (byte[] family : familyMap.keySet()) {
> > >             size += family.length;
> > >             List<KeyValue> kvs = familyMap.get(family);
> > >             for (KeyValue kv : kvs) {
> > >                 size += kv.getLength();
> > >             }
> > >         }
> > >         return size;
> > >
> > >     }
> > >
> > >     private static long calcSizeIncrement(Increment row) {
> > >         long size = 0;
> > >
> > >         size += row.getRow().length;