Accumulo >> mail # user >> schema examples


Hi Arshak,
  Here is how you might do it.  We implement everything with batch writers and batch scanners.  Note: if you are sustaining high ingest rates, the degree table can be tricky and usually requires pre-summing prior to ingestion to reduce the pressure on the accumulator inside of Accumulo.  Feel free to ask further questions, as I imagine there are details that still won't be clear, in particular why we do it this way.

Regards.  -Jeremy
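The pre-summing mentioned above could look something like this minimal Python sketch (illustrative only, the function name is mine): increments for each (rowKey, "Degree") cell are summed locally for a whole batch, so the ingester writes one mutation per distinct cell instead of one +1 mutation per record, and the combiner inside Accumulo has far less work to do.

```python
from collections import Counter

def presum_degrees(records):
    """Locally sum degree increments for a batch of
    (machine, pool, load, timestamp) records before writing.
    A real ingester would then flush one mutation per entry
    through an Accumulo BatchWriter instead of one +1 per record."""
    degrees = Counter()
    for machine, pool, load, ts in records:
        for col, val in (("Machine", machine), ("Pool", pool),
                         ("Load", load), ("ReadingTimestamp", ts)):
            degrees[col + "|" + val] += 1
    return degrees

records = [
    ("neptune", "west", "5", "1388191975000"),
    ("neptune", "west", "9", "1388191975010"),
    ("pluto", "east", "13", "1388191975090"),
]
degrees = presum_degrees(records)
```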

Original data:

Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090
Tedge table:
rowKey,columnQualifier,value

0005791918831-neptune,Machine|neptune,1
0005791918831-neptune,Pool|west,1
0005791918831-neptune,Load|5,1
0005791918831-neptune,ReadingTimestamp|1388191975000,1
0105791918831-neptune,Machine|neptune,1
0105791918831-neptune,Pool|west,1
0105791918831-neptune,Load|9,1
0105791918831-neptune,ReadingTimestamp|1388191975010,1
0905791918831-pluto,Machine|pluto,1
0905791918831-pluto,Pool|east,1
0905791918831-pluto,Load|13,1
0905791918831-pluto,ReadingTimestamp|1388191975090,1
TedgeTranspose table:
rowKey,columnQualifier,value

Machine|neptune,0005791918831-neptune,1
Pool|west,0005791918831-neptune,1
Load|5,0005791918831-neptune,1
ReadingTimestamp|1388191975000,0005791918831-neptune,1
Machine|neptune,0105791918831-neptune,1
Pool|west,0105791918831-neptune,1
Load|9,0105791918831-neptune,1
ReadingTimestamp|1388191975010,0105791918831-neptune,1
Machine|pluto,0905791918831-pluto,1
Pool|east,0905791918831-pluto,1
Load|13,0905791918831-pluto,1
ReadingTimestamp|1388191975090,0905791918831-pluto,1
TedgeDegree table:
rowKey,columnQualifier,value

Machine|neptune,Degree,2
Pool|west,Degree,2
Load|5,Degree,1
ReadingTimestamp|1388191975000,Degree,1
Load|9,Degree,1
ReadingTimestamp|1388191975010,Degree,1
Machine|pluto,Degree,1
Pool|east,Degree,1
Load|13,Degree,1
ReadingTimestamp|1388191975090,Degree,1
TedgeText table:
rowKey,columnQualifier,value

0005791918831-neptune,Text,< ... raw text of original log ...>
0105791918831-neptune,Text,< ... raw text of original log ...>
0905791918831-pluto,Text,< ... raw text of original log ...>
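For the record, the three exploded tables above can be reproduced mechanically. A minimal Python sketch (the function name is mine, not part of D4M); the row key appears to be the reading timestamp with its digits reversed, joined to the machine name, which spreads sequential timestamps across the keyspace:

```python
def build_tables(records):
    """Derive Tedge, TedgeTranspose, and TedgeDegree entries from
    (machine, pool, load, timestamp) records, following the worked
    example above.  Row key = timestamp digits reversed + '-' + machine
    (inferred from the sample data)."""
    tedge, transpose, degree = [], [], {}
    for machine, pool, load, ts in records:
        row = ts[::-1] + "-" + machine  # e.g. "0005791918831-neptune"
        for col, val in (("Machine", machine), ("Pool", pool),
                         ("Load", load), ("ReadingTimestamp", ts)):
            cq = col + "|" + val        # e.g. "Pool|west"
            tedge.append((row, cq, "1"))
            transpose.append((cq, row, "1"))
            degree[cq] = degree.get(cq, 0) + 1
    return tedge, transpose, degree

records = [
    ("neptune", "west", "5", "1388191975000"),
    ("neptune", "west", "9", "1388191975010"),
    ("pluto", "east", "13", "1388191975090"),
]
tedge, transpose, degree = build_tables(records)
```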

On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:

> Jeremy,
>
> Wow, didn't expect to get help from the author :)
>
> How about something simple like this:
>
> Machine   Pool   Load   ReadingTimestamp
> neptune   west   5      1388191975000
> neptune   west   9      1388191975010
> pluto     east   13     1388191975090
>
> These are the areas I am unclear on:
>
> 1.  Should the transpose table be built as part of the ingest code or by an Accumulo combiner?
> 2.  What does the degree table do in this example?  The paper mentions it's useful for query optimization.  How?
> 3.  Does D4M accommodate "repurposing" the row_id as a partition key?  The wikisearch example shows how the partition id is important for parallel scans of the index.  But since Accumulo is a row store, how can you do fast lookups by row if you've used the row_id as a partition key?
>
> Thank you,
>
> Arshak
>
>
>
>
>
>
> On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[EMAIL PROTECTED]> wrote:
> Hi Arshak,
>   Maybe you can send a few (~3) records of data that you are familiar with
> and we can walk you through how the D4M schema would be applied to those records.
>
> Regards.  -Jeremy
>
> On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> >    Hello,
> >    I am trying to get my head around Accumulo schema designs.  I went through
> >    a lot of trouble to get the wikisearch example running, but since the data
> >    is in protobuf lists, it's not that illustrative (for a newbie).
> >    I would love to find another example that is a little simpler to understand.
> >    In particular, I am interested in java/scala code that mimics the D4M
> >    schema design (I'm not a Matlab guy).
> >    Thanks,
> >    Arshak
>
