Accumulo >> mail # dev >> Check split points of a given table


Re: Check split points of a given table
The !METADATA table isn't well documented, nor is it part of the public
API, and it will change in future releases.

The row id consists of the table id, a semicolon, and the end-row.  When
there is no end-row, the row id ends with "<".

7;endrow
7;lastrow
7;zzzzzz
7<
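
As a rough illustration of the row id layout above, here is a minimal Python
sketch that splits a row id into its table id and end-row. The function name
parse_metadata_row is made up for this example, and it assumes row ids have
already been decoded to strings and that table ids never contain ";" or "<".

```python
def parse_metadata_row(row):
    """Split a !METADATA row id into (table_id, end_row).

    end_row is None for the last tablet, whose row id ends with "<".
    """
    # Split on the first semicolon only, so an end-row may itself contain ";".
    table_id, sep, end_row = row.partition(";")
    if sep:
        return table_id, end_row
    if row.endswith("<"):
        return row[:-1], None
    raise ValueError("not a tablet row id: %r" % (row,))
```

Because ";" (0x3B) sorts just before "<" (0x3C), the "7<" row lands after
every "7;..." row, which is why the last tablet shows up last in a scan.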

Every tablet has a "prev row" entry, which points to the previous tablet's
end row.  The value of this entry starts with \x01, followed by the previous
end row, if the tablet has a previous row, or is \x00 if this is the first
tablet in the table.

7;endrow    ~tab:~pr \x00
7;lastrow   ~tab:~pr \x01endrow
7;zzzzzz    ~tab:~pr \x01lastrow
7<          ~tab:~pr \x01zzzzzz
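
The prev-row value encoding described above could be decoded along these
lines. This is only a sketch: decode_prev_row is a hypothetical helper, and
it assumes the value has been read as a Python string whose first character
is the \x00 or \x01 flag.

```python
def decode_prev_row(value):
    """Return the previous tablet's end row, or None for the first tablet."""
    flag, rest = value[:1], value[1:]
    if flag == "\x00":
        return None   # first tablet in the table: no previous row
    if flag == "\x01":
        return rest   # everything after the flag is the previous end row
    raise ValueError("unexpected prev-row value: %r" % (value,))
```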

BTW, the tilde (~) is used to make sure that this entry sorts last among the
entries for its tablet.  The !METADATA table should always have complete
chains of end-row/prev-row entries, except during splits and merges.
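
To make the end-row/prev-row chain concrete, here is a small sketch that
walks the prev-row entries of one table in scan order and reports any tablet
whose prev row does not match the end-row of the tablet before it. The name
check_chain and the (row id, value) input shape are assumptions for the
example, not anything Accumulo provides.

```python
def check_chain(entries):
    """entries: list of (row_id, prev_row_value) for ONE table, in scan order.

    Returns a list of (row_id, expected_prev, actual_prev) for broken links.
    """
    problems = []
    expected_prev = None  # the first tablet should have no previous row
    for row, value in entries:
        actual_prev = value[1:] if value.startswith("\x01") else None
        if actual_prev != expected_prev:
            problems.append((row, expected_prev, actual_prev))
        # This tablet's end-row is what the next tablet should point back to.
        _, sep, end_row = row.partition(";")
        expected_prev = end_row if sep else None
    return problems
```

An empty result means the chain is intact. A non-empty result is not
necessarily an error, since a split or merge may be in progress.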

Each tablet has file references, whose values hold the file size and an
estimated key count.  Because of splits and bulk imports, the number of keys
in a file that actually belong to a given tablet is not precise.  An entry
looks like this:

7;endrow file:/t-000000/F000000j.rf 9999,123

This tablet points to
hdfs://namenode/accumulo/tables/7/t-000000/F000000j.rf.  The file is 9999
bytes long (compressed) and contains 123 key/value entries.
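
A file entry like the one above can be picked apart roughly as follows. The
helper name parse_file_entry and the default tables_dir are assumptions
taken from the example path. The sketch only handles the plain
"/t-.../file.rf" form shown here, and it ignores any comma-separated fields
after the first two.

```python
def parse_file_entry(row, qualifier, value,
                     tables_dir="hdfs://namenode/accumulo/tables"):
    """Return (path, size_in_bytes, estimated_entries) for a file entry."""
    table_id = row.split(";", 1)[0].rstrip("<")
    fields = value.split(",")
    size, count = int(fields[0]), int(fields[1])
    # The qualifier starts with "/", so it is appended to the table directory.
    path = tables_dir + "/" + table_id + qualifier
    return path, size, count
```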

Since 1.4, file names (F000000.rf) should be universally unique.  The file
naming scheme is:

F- Result of a Flush, a minor compaction
C- Result of a major Compaction, but not over all files
A- Major compaction of All files, in which delete entries were removed.
M- Result of a Merging minor compaction.  A flush that was combined with
the smallest file because there were already too many files in the tablet.
B- A file that was bulk imported
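
The prefix scheme above maps naturally onto a lookup table. A minimal
sketch, with file_kind being a name invented for this example:

```python
FILE_KINDS = {
    "F": "flush (minor compaction)",
    "C": "major compaction, not over all files",
    "A": "major compaction of all files",
    "M": "merging minor compaction",
    "B": "bulk import",
}

def file_kind(path):
    """Classify a file reference by the first letter of its file name."""
    name = path.rsplit("/", 1)[-1]  # drop any directory part
    return FILE_KINDS.get(name[:1], "unknown")
```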

So, if you scan the !METADATA table looking for prev-row entries and file
entries, you can get a reasonable estimate of the size of each tablet,
including those that are empty.
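
The estimate described above might be computed like this once the entries
have been read. The sketch assumes the scan results are already available as
(row, family, qualifier, value) tuples of strings. How they are fetched from
Accumulo is left out, and tablet_sizes is a made-up name.

```python
def tablet_sizes(entries):
    """Sum file sizes per tablet row id; tablets with no files get 0."""
    sizes = {}
    for row, family, qualifier, value in entries:
        if family == "~tab" and qualifier == "~pr":
            # Every tablet has a prev-row entry, so empty tablets show up too.
            sizes.setdefault(row, 0)
        elif family == "file":
            # The first number in the value is the file size in bytes.
            sizes[row] = sizes.get(row, 0) + int(value.split(",")[0])
    return sizes
```

The keys of the result are the tablet row ids, so the split points fall out
of the same pass: each "table;endrow" key names one split.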

When tables are cloned, the filenames are relative:

7;endrow file:../5/t-1234567/C00000f.rf 1234,56

In 1.6, the filenames will be absolute:

7;endrow file:hdfs://namenode:port/accumulo/tables/7/t-1234567/C00000f.rf 1234,56
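
The three forms of file reference seen so far (plain, relative for clones,
absolute in 1.6) could be normalized with something like the sketch below.
It assumes that a "../" reference is resolved against the table's own
directory, which is my reading of the clone example rather than a documented
rule. resolve_file_ref and the default tables_dir are hypothetical.

```python
import posixpath

def resolve_file_ref(table_id, qualifier,
                     tables_dir="hdfs://namenode/accumulo/tables"):
    """Turn a file column qualifier into a full path."""
    if "://" in qualifier:
        return qualifier  # already absolute (1.6 style)
    if qualifier.startswith("../"):
        # Relative (cloned table): resolve against this table's directory.
        relative = posixpath.normpath(table_id + "/" + qualifier)
        return tables_dir + "/" + relative
    # Plain form: "/t-.../file.rf" under this table's directory.
    return tables_dir + "/" + table_id + qualifier
```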

tl;dr - use the file entries: the first number in the value is the file
size.

-Eric
On Mon, Sep 23, 2013 at 5:15 PM, Mastergeek <[EMAIL PROTECTED]> wrote:

> So I basically blanket scanned the !METADATA table, but I'm having trouble
> interpreting the information. I can't seem to find a clear definition of
> what is in that table so I'm having issues reading the data. A link, if you
> have one, or any kind of elaboration would be greatly appreciated.
>
> Thanks,
> Jeff