I guess a little of both. In the Enron email set I have a bunch of folders
representing people. Each folder has subfolders that equate to mailboxes
(inbox, sent_mail, etc...). Each mailbox simply contains text files named
1, 2, 3, 4 that equate to an individual email.
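For illustration, a minimal sketch of walking that layout, assuming exactly the person/mailbox/file nesting described above. The function name iter_emails and the root path argument are hypothetical, not part of the dataset or of Accumulo.

```python
from pathlib import Path


def iter_emails(root):
    """Yield (owner, mailbox, path) for every email file under root.

    Assumes the layout root/<person>/<mailbox>/<file>, as described above.
    Deeper nesting, if it exists in the data, is not handled here.
    """
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file():
            # The two directory names above the file are the owner and the mailbox.
            owner, mailbox = path.parts[-3], path.parts[-2]
            yield owner, mailbox, path
```

Each yielded tuple carries the two pieces of context (owner and mailbox) that are not stored inside the email file itself.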
Each email is a text file that is easy to parse into specific fields. I
want to place those emails in accumulo and run some simple MapReduce for
the demo, similar to what I saw in some Cloudbase training last year.
What I didn't remember is how the tables were arranged.
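As a rough sketch of the parsing step, assuming the files are plain RFC 822 style messages with headers such as Message-ID, From, To and Cc followed by a blank line and the body, the Python standard library can split one into fields. The function name parse_email and the dictionary keys are made up for this example.

```python
from email import message_from_string
from email.utils import getaddresses


def parse_email(raw_text):
    """Split one raw message into a dict of header fields plus the body."""
    msg = message_from_string(raw_text)

    def addresses(header_name):
        # To and Cc may each hold a comma-separated list of recipients.
        pairs = getaddresses(msg.get_all(header_name, []))
        return [addr for _name, addr in pairs if addr]

    return {
        "message_id": (msg.get("Message-ID") or "").strip(),
        "from": (msg.get("From") or "").strip(),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "subject": msg.get("Subject") or "",
        "body": msg.get_payload(),  # a plain string for non-multipart messages
    }
```

Returning To and Cc as lists keeps the multi-valued fields explicit, which matters for the table layout question that follows.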
I was just going to make each email, regardless of mailbox, a row in
accumulo and make the mailbox and owner separate columns (or column
qualifiers, to be more specific). My issue is the To and CC fields. Each can
be a list. I was thinking of making the column family "to" and the column
qualifier a counter (1, 2, 3, ...). I could also make the column qualifier for the "to"
family the actual value "[EMAIL PROTECTED]". I wasn't exactly sure of the best
Each email has a Message_ID and so far I think they are unique. If not I
can generate a unique ID.
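To make the two options concrete, here is a minimal sketch that models the entries for one email as plain (row, family, qualifier, value) tuples, with the Message_ID as the row. It does not call the Accumulo API; the function name email_to_entries, the "meta" family and the address_as_qualifier flag are all assumptions made for this illustration, and neither layout is presented as the recommended one.

```python
def email_to_entries(message_id, owner, mailbox, sender, to, cc,
                     address_as_qualifier=True):
    """Return sorted (row, family, qualifier, value) tuples for one email."""
    entries = [
        (message_id, "meta", "owner", owner),
        (message_id, "meta", "mailbox", mailbox),
        (message_id, "meta", "from", sender),
    ]
    for family, addrs in (("to", to), ("cc", cc)):
        for i, addr in enumerate(addrs, start=1):
            if address_as_qualifier:
                # Option 2: the address itself is the qualifier, value left empty.
                entries.append((message_id, family, addr, ""))
            else:
                # Option 1: a running counter is the qualifier, address is the value.
                entries.append((message_id, family, str(i), addr))
    # Accumulo keeps keys sorted by row, then family, then qualifier;
    # sorting the tuples only approximates that ordering.
    return sorted(entries)
```

With the address in the qualifier, a recipient can be matched on the key alone; with the counter, the recipient has to be read from the value.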
Again this will be for a simple demo where people may want to search from
some person, to some person and maybe for specific terms in the body of the
Hope this gives a good idea of what I am trying to do. Feel free to ask any
other questions you may have if I wasn't clear enough. Again I have more
experience working with existing structures. I am trying to use this
experience to learn a little about how to organize the data.
thanks in advance,
On Thu, Jan 3, 2013 at 12:28 PM, John Vines <[EMAIL PROTECTED]> wrote:
> Are you looking for generic pointers for it or do you have specific
> questions? Feel free to ask away and someone will be able to help.
> On Thu, Jan 3, 2013 at 12:23 PM, Tim Piety <[EMAIL PROTECTED]> wrote:
>> No I hadn't. Thank you, that was it. I took another look at the install doc
>> and didn't see this step in there. I then looked at the README file on the
>> ACCUMULO website and it is in there.
>> I was able to start accumulo and then start an accumulo shell and execute
>> the tables command and it listed !METADATA. I presume that this means I am
>> up and running.
>> I am going to use the enron dataset for my demo. I do have a few
>> questions regarding how to structure it if you don't mind a few more
>> thanks again.
>> On Thu, Jan 3, 2013 at 12:07 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>> Did you initialize accumulo by running bin/accumulo init?
>>> On Thu, Jan 3, 2013 at 12:02 PM, Tim Piety <[EMAIL PROTECTED]> wrote:
>>>> I posted a message to the dev list before Xmas and got no response. I
>>>> figured I'd try this list. If this is not the correct forum can someone
>>>> please let me know what the correct forum is. I am trying to install
>>>> accumulo for a simple demo. I have hadoop installed and running. I verified
>>>> by testing a mapreduce program and I can look at the HDFS system.
>>>> When I try to start accumulo I get an INFO message saying attempting to
>>>> talk to zookeeper. I verified zookeeper is running and I can access it
>>>> using the zkCli.sh. The next line to display is INFO :Waiting for accumulo
>>>> to be initialized. That line repeats infinitely.
>>>> I looked at the logs and get a message in the tserver_localhost.out
>>>> saying unable to obtain instance id at /accumulo/instance_id. A quick web
>>>> search found a message (
>>>> saying I needed to put the HADOOP/conf directory in my CLASSPATH. I tried
>>>> that, but that did not work.
>>>> I have looked and didn't find any other groups where I could post a