Accumulo, mail # user - Is there a method in the accumulo api to get the total bytes used and/or total key/value pairs for each tablet?


Re: Is there a method in the accumulo api to get the total bytes used and/or total key/value pairs for each tablet?
Keith Turner 2013-04-11, 13:50
On Thu, Apr 11, 2013 at 7:15 AM, Jeff Kubina <[EMAIL PROTECTED]> wrote:
> Is there a method in the accumulo api to get the total bytes used and/or
> total key/value pairs for each tablet? I believe I can get the total bytes
> used per tablet using HDFS file size calls on the tables directory, but what
> about the total key/value pairs for each tablet?
>

Jeff,

You can scan the metadata table to get this info. A few pointers:

 * call Connector.tableOperations().tableIdMap() to convert your table
name to a table id
 * do "new org.apache.accumulo.core.data.KeyExtent(Text, Text, Text)"
(table id, end row, previous end row) to create a KeyExtent that
represents the tablet you are interested in
 * call KeyExtent.toMetadataRange() to get a range to scan the metadata table
 * add the column family
org.apache.accumulo.core.Constants.METADATA_DATAFILE_COLUMN_FAMILY to
the metadata table scanner
 * take each value from this scan and create an
org.apache.accumulo.core.util.MetadataTable.DataFileValue object, which
holds the file size and number of entries for that file
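
Once the scan returns one value per data file, the per-tablet totals are
just the sums over those files. Below is a minimal, standalone sketch of
that last step. It assumes each value is the comma-separated text
"size,numEntries" (optionally followed by a time field), which is my
understanding of what DataFileValue decodes; the class and method names
(TabletStats, Totals, sum) and the sample values are hypothetical. In
real code you would let DataFileValue do the parsing rather than
splitting the bytes yourself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TabletStats {
    /** Totals summed over every data file listed for one tablet. */
    public static final class Totals {
        public final long bytes;
        public final long entries;

        Totals(long bytes, long entries) {
            this.bytes = bytes;
            this.entries = entries;
        }
    }

    /**
     * Sums size and entry count over the raw metadata values of one tablet.
     * Assumed encoding (not confirmed by the post): "size,numEntries" or
     * "size,numEntries,time".
     */
    public static Totals sum(List<byte[]> encodedValues) {
        long bytes = 0;
        long entries = 0;
        for (byte[] value : encodedValues) {
            String[] parts = new String(value, StandardCharsets.UTF_8).split(",");
            bytes += Long.parseLong(parts[0]);   // file size in bytes
            entries += Long.parseLong(parts[1]); // key/value pairs in the file
        }
        return new Totals(bytes, entries);
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for two data files of one tablet.
        List<byte[]> values = Arrays.asList(
            "1048576,20000".getBytes(StandardCharsets.UTF_8),
            "524288,9500,1365688200000".getBytes(StandardCharsets.UTF_8));
        Totals t = sum(values);
        System.out.println(t.bytes + " bytes, " + t.entries + " entries");
    }
}
```

The totals are only as accurate as the metadata entries themselves, so
the caveats about splits and bulk imports apply to the sums as well.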

The file data in the metadata table may be an estimate or may not be
present. In the case of a split, the children of the split have
estimated file sizes. The sum of the children's info is correct until
one of them compacts. For bulk imported files, there is no info about
file size or number of entries. After a tablet is compacted, all of this
info will be correct. You could call Connector.tableOperations().compact(),
passing in a row range that compacts just the tablet you want stats
about.

Keith