Accumulo, mail # user - schema examples


Re: schema examples
Kepner, Jeremy - 0553 - M... 2013-12-28, 18:36
Hi Arshak,
  Here is how you might do it.  We implement everything with batch writers and batch scanners.  Note: at high ingest rates the degree table can be tricky and usually requires pre-summing prior to ingestion, to reduce the pressure on the accumulator inside of Accumulo.  Feel free to ask further questions, as I would imagine there are details that still won't be clear, in particular why we do it this way.

Regards.  -Jeremy

Original data:

Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090

Tedge table:
rowKey,columnQualifier,value

0005791918831-neptune,Machine|neptune,1
0005791918831-neptune,Pool|west,1
0005791918831-neptune,Load|5,1
0005791918831-neptune,ReadingTimestamp|1388191975000,1
0105791918831-neptune,Machine|neptune,1
0105791918831-neptune,Pool|west,1
0105791918831-neptune,Load|9,1
0105791918831-neptune,ReadingTimestamp|1388191975010,1
0905791918831-pluto,Machine|pluto,1
0905791918831-pluto,Pool|east,1
0905791918831-pluto,Load|13,1
0905791918831-pluto,ReadingTimestamp|1388191975090,1
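
The mapping from the original records to the Tedge entries above can be sketched in a few lines. This is only an illustration in plain Python, not Accumulo client code: the helper names `row_key` and `explode` are made up for this example, and the row key rule (reading timestamp with its digits reversed, a dash, then the machine name) is inferred from the listing rather than stated in the message.

```python
import csv
import io

# The three sample records from the original data.
RAW = """Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090
"""


def row_key(record):
    # Assumption inferred from the listing: reverse the timestamp digits
    # and append the machine name (1388191975000 -> 0005791918831).
    return record["ReadingTimestamp"][::-1] + "-" + record["Machine"]


def explode(record):
    """Return one (rowKey, columnQualifier, value) triple per field."""
    key = row_key(record)
    return [(key, f"{field}|{value}", "1") for field, value in record.items()]


records = list(csv.DictReader(io.StringIO(RAW)))
tedge = [triple for record in records for triple in explode(record)]

for triple in tedge:
    print(",".join(triple))
```

Each field|value pair becomes its own column qualifier with a value of 1, so every record turns into as many entries as it has fields.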

TedgeTranspose table:
rowKey,columnQualifier,value

Machine|neptune,0005791918831-neptune,1
Pool|west,0005791918831-neptune,1
Load|5,0005791918831-neptune,1
ReadingTimestamp|1388191975000,0005791918831-neptune,1
Machine|neptune,0105791918831-neptune,1
Pool|west,0105791918831-neptune,1
Load|9,0105791918831-neptune,1
ReadingTimestamp|1388191975010,0105791918831-neptune,1
Machine|pluto,0905791918831-pluto,1
Pool|east,0905791918831-pluto,1
Load|13,0905791918831-pluto,1
ReadingTimestamp|1388191975090,0905791918831-pluto,1
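
Since everything is done with batch writers, one way to read this is that the ingest code writes the Tedge entry and its transposed twin in the same pass. The sketch below assumes exactly that; `mutations_for` is a hypothetical helper, and the in-memory `tables` dictionary merely stands in for one batch writer per table.

```python
from collections import defaultdict


def mutations_for(row_key, fields):
    """Yield (table, rowKey, columnQualifier, value) for one record."""
    for name, value in fields.items():
        col = f"{name}|{value}"
        # Forward entry: row key -> field|value.
        yield ("Tedge", row_key, col, "1")
        # Transposed entry: field|value -> row key.
        yield ("TedgeTranspose", col, row_key, "1")


record = {
    "Machine": "neptune",
    "Pool": "west",
    "Load": "5",
    "ReadingTimestamp": "1388191975000",
}

# Stand-in for one batch writer per table (assumption for illustration).
tables = defaultdict(list)
for table, row, col, val in mutations_for("0005791918831-neptune", record):
    tables[table].append((row, col, val))

for row, col, val in tables["TedgeTranspose"]:
    print(",".join((row, col, val)))
```

The transpose table holds the same entries with row key and column qualifier swapped, which is what makes lookups by field value a row lookup.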

TedgeDegree table:
rowKey,columnQualifier,value

Machine|neptune,Degree,2
Pool|west,Degree,2
Load|5,Degree,1
ReadingTimestamp|1388191975000,Degree,1
Load|9,Degree,1
ReadingTimestamp|1388191975010,Degree,1
Machine|pluto,Degree,1
Pool|east,Degree,1
Load|13,Degree,1
ReadingTimestamp|1388191975090,Degree,1
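
The pre-summing mentioned at the top of this message can be illustrated as follows: count each field|value within a batch on the client, then send one degree update per distinct key instead of one per occurrence. This is a minimal sketch in plain Python; `presum_degrees` is a hypothetical name, and the server-side summing of updates across batches is assumed, not shown.

```python
from collections import Counter


def presum_degrees(column_qualifiers):
    """Count occurrences of each field|value within one ingest batch."""
    return Counter(column_qualifiers)


# Column qualifiers of the three sample records, in ingest order.
batch = [
    "Machine|neptune", "Pool|west", "Load|5", "ReadingTimestamp|1388191975000",
    "Machine|neptune", "Pool|west", "Load|9", "ReadingTimestamp|1388191975010",
    "Machine|pluto", "Pool|east", "Load|13", "ReadingTimestamp|1388191975090",
]

# One (rowKey, columnQualifier, value) update per distinct key.
degree_updates = [
    (col, "Degree", str(count)) for col, count in presum_degrees(batch).items()
]

for update in degree_updates:
    print(",".join(update))
```

For this tiny batch the saving is small (10 updates instead of 12), but for values that repeat often, such as a machine or pool name, the reduction grows with the batch size.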

TedgeText table:
rowKey,columnQualifier,value

0005791918831-neptune,Text,< ... raw text of original log ...>
0105791918831-neptune,Text,< ... raw text of original log ...>
0905791918831-pluto,Text,< ... raw text of original log ...>
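
As a hedged illustration of how the tables might work together at query time: look up each search term in the degree table first, start with the rarest term, and intersect the row keys found in the transpose table. This is an assumption about how the degree table helps query planning, not something spelled out above; `find_rows` is a hypothetical helper and the dictionaries stand in for batch scanner lookups.

```python
# In-memory stand-ins for TedgeTranspose and TedgeDegree lookups.
tedge_transpose = {
    "Machine|neptune": ["0005791918831-neptune", "0105791918831-neptune"],
    "Pool|west": ["0005791918831-neptune", "0105791918831-neptune"],
    "Load|9": ["0105791918831-neptune"],
}
tedge_degree = {"Machine|neptune": 2, "Pool|west": 2, "Load|9": 1}


def find_rows(terms, degree, transpose):
    """Return row keys that contain every term, rarest term first."""
    # A missing degree means the term never occurs, so it sorts first
    # and the search stops after a single lookup.
    ordered = sorted(terms, key=lambda term: degree.get(term, 0))
    result = None
    for term in ordered:
        rows = set(transpose.get(term, ()))
        result = rows if result is None else result & rows
        if not result:
            break
    return sorted(result or [])


print(find_rows(["Machine|neptune", "Load|9"], tedge_degree, tedge_transpose))
```

The resulting row keys could then be used to fetch the full entries from Tedge or the raw log line from TedgeText.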

On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:

> Jeremy,
>
> Wow, didn't expect to get help from the author :)
>
> How about something simple like this:
>
> Machine    Pool    Load    ReadingTimestamp
> neptune    west    5       1388191975000
> neptune    west    9       1388191975010
> pluto      east    13      1388191975090
>
> These are the areas I am unclear on:
>
> 1.  Should the transpose table be built as part of the ingest code or as an Accumulo combiner?
> 2.  What does the degree table do in this example?  The paper mentions it's useful for query optimization.  How?
> 3.  Does D4M accommodate "repurposing" the row_id to a partition key?  The wikisearch shows how the partition id is important for parallel scans of the index.  But since Accumulo is a row store, how can you do fast lookups by row if you've used the row_id as a partition key?
>
> Thank you,
>
> Arshak
>
> On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[EMAIL PROTECTED]> wrote:
> Hi Arshak,
>   Maybe you can send a few (~3) records of data that you are familiar with
> and we can walk you through how the D4M schema would be applied to those records.
>
> Regards.  -Jeremy
>
> On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> >    Hello,
> >    I am trying to get my head around Accumulo schema designs.  I went through
> >    a lot of trouble to get the wikisearch example running, but since the data
> >    is in protobuf lists, it's not that illustrative (for a newbie).
> >    Would love to find another example that is a little simpler to understand.
> >    In particular, I am interested in Java/Scala code that mimics the D4M
> >    schema design (not a Matlab guy).
> >    Thanks,
> >    Arshak
>