Accumulo >> mail # user >> problem installing accumulo


Re: problem installing accumulo
John,

I guess a little of both. In the Enron email set I have a bunch of folders
representing people. Each folder has subfolders that equate to mailboxes
(inbox, sent_mail, etc.). Each mailbox simply contains text files named
1, 2, 3, 4, and so on, each of which is an individual email.
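
A rough sketch of how the owner and mailbox could be recovered from a file's location, assuming a layout of <root>/<owner>/<mailbox>/<number>. The root name "maildir", the owner name in the usage note, and the function name describe_email_path are made up for illustration and are not taken from the dataset:

```python
from pathlib import PurePosixPath


def describe_email_path(path, root):
    """Split <root>/<owner>/<mailbox>/<file> into its parts.

    Assumption: the first folder under root is the owner, the last
    component is the email file, and everything between is the mailbox.
    """
    rel = PurePosixPath(path).relative_to(root)
    parts = rel.parts
    if len(parts) < 3:
        raise ValueError(f"expected owner/mailbox/file under root, got {rel}")
    return {
        "owner": parts[0],
        "mailbox": "/".join(parts[1:-1]),  # keeps nested mailboxes intact
        "file": parts[-1],
    }
```

For example, describe_email_path("maildir/some-owner/inbox/1", "maildir") would give owner "some-owner", mailbox "inbox" and file "1".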

Each email is a text file that is easy to parse into specific fields. I
want to place those emails in accumulo and run some simple MapReduce jobs for
the demo, similar to what I saw in some Cloudbase training last year.
What I didn't remember is how the tables were arranged.

I was just going to make each email, regardless of mailbox, a row in
accumulo and make the mailbox and owner separate columns (or column
qualifiers, to be more specific). My issue is the To and CC fields. Each can
be a list. I was thinking of making the column family "to" and the column
qualifier 1, 2, 3, and so on. I could also make the column qualifier for the
"to" family the actual value "[EMAIL PROTECTED]". I wasn't exactly sure of the
best way.
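
To make the second option concrete, here is a minimal sketch that turns one parsed email into (row, column family, column qualifier, value) tuples, with each address used as the qualifier. It only uses Python's standard email module and does not talk to accumulo. The family names "meta", "from", "to", "cc" and "body", the function name email_to_cells, and the assumption that the ID header is spelled "Message-ID" are all illustrative choices, not something settled in this thread:

```python
from email import message_from_string
from email.utils import getaddresses


def email_to_cells(raw_text, owner, mailbox):
    """Map one email to (row, family, qualifier, value) tuples.

    Assumption: the ID lives in a "Message-ID" header. Addresses become
    column qualifiers, so a To or Cc list of any length needs no counter.
    """
    msg = message_from_string(raw_text)
    message_id = msg["Message-ID"]
    if message_id is None:
        raise ValueError("email has no Message-ID header")
    row = message_id.strip()

    cells = [
        (row, "meta", "owner", owner),
        (row, "meta", "mailbox", mailbox),
    ]
    for header, family in (("From", "from"), ("To", "to"), ("Cc", "cc")):
        # get_all returns every occurrence of the header, or [] if absent
        for _name, addr in getaddresses(msg.get_all(header, [])):
            if addr:
                cells.append((row, family, addr, ""))
    cells.append((row, "body", "text", msg.get_payload()))
    return cells
```

With the address as the qualifier, asking "was this email sent to X" becomes a lookup of one column rather than a scan over numbered qualifiers. The numbered variant would instead preserve the order of recipients, if that matters for the demo.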

Each email has a Message_ID and so far I think they are unique. If not I
can generate a unique ID.
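
One possible way to handle that fallback, sketched under the assumption that the load runs in a single process so an in-memory set of already-used IDs is enough. The function name unique_row_id and the "generated-" prefix are hypothetical:

```python
import uuid


def unique_row_id(message_id, seen):
    """Return message_id if it is present and unused, else a generated ID.

    `seen` is a set of row IDs handed out so far; it is updated in place.
    """
    candidate = (message_id or "").strip()
    if not candidate or candidate in seen:
        # Missing or duplicate ID: fall back to a random UUID
        candidate = "generated-" + uuid.uuid4().hex
    seen.add(candidate)
    return candidate
```

If the same email appears in more than one mailbox with the same Message_ID, this would store it under two different rows, which may or may not be what the demo wants.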

Again, this will be for a simple demo where people may want to search by
sender, by recipient, and maybe for specific terms in the body of the
email.
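
As an in-memory stand-in for those searches, the sketch below filters a list of (row, family, qualifier, value) tuples by sender, recipient and body term. It is not accumulo code: the tuple layout, the family names "from", "to" and "body", and the function name find_rows are assumptions made for illustration, and a substring match over every body is only reasonable at demo scale:

```python
from collections import defaultdict


def find_rows(cells, sender=None, recipient=None, term=None):
    """Return sorted row IDs that match every filter that was given."""
    by_row = defaultdict(lambda: {"from": set(), "to": set(), "body": ""})
    for row, family, qualifier, value in cells:
        if family in ("from", "to"):
            by_row[row][family].add(qualifier)  # address is the qualifier
        elif family == "body":
            by_row[row]["body"] = value

    hits = []
    for row, rec in by_row.items():
        if sender and sender not in rec["from"]:
            continue
        if recipient and recipient not in rec["to"]:
            continue
        if term and term.lower() not in rec["body"].lower():
            continue
        hits.append(row)
    return sorted(hits)
```

For example, find_rows(cells, sender="a@example.com", term="budget") would return the rows sent by that address whose body mentions "budget".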

Hope this gives a good idea of what I am trying to do. Feel free to ask any
other questions you may have if I wasn't clear enough. Again, I have more
experience working with existing structures. I am trying to use this
experience to learn a little about how to organize the data.

thanks in advance,

Tim

On Thu, Jan 3, 2013 at 12:28 PM, John Vines <[EMAIL PROTECTED]> wrote:

> Are you looking for generic pointers for it or do you have specific
> questions? Feel free to ask away and someone will be able to help.
>
> John
>
>
> On Thu, Jan 3, 2013 at 12:23 PM, Tim Piety <[EMAIL PROTECTED]> wrote:
>
>> John,
>>
>> No, I hadn't. Thank you, that was it. I took another look at the install doc
>> and didn't see this step in there. I then looked at the README file on the
>> ACCUMULO website and it is in there.
>> I was able to start accumulo and then start an accumulo shell and execute
>> the tables command and it listed !METADATA. I presume that this means I am
>> up and running.
>>
>> I am going to use the enron dataset for my demo. I do have a few
>> questions regarding how to structure it if you don't mind a few more
>> questions.
>>
>>
>> thanks again.
>>
>> Tim
>>
>>
>> On Thu, Jan 3, 2013 at 12:07 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>
>>> Did you initialize accumulo by running bin/accumulo init?
>>>
>>>
>>> On Thu, Jan 3, 2013 at 12:02 PM, Tim Piety <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I posted a message to the dev list before Xmas and got no response. I
>>>> figured I'd try this list. If this is not the correct forum, can someone
>>>> please let me know what the correct forum is? I am trying to install
>>>> accumulo for a simple demo. I have hadoop installed and running. I verified
>>>> this by testing a mapreduce program, and I can look at the HDFS system.
>>>>
>>>> When I try to start accumulo I get an INFO message saying it is attempting
>>>> to talk to zookeeper. I verified zookeeper is running and I can access it
>>>> using zkCli.sh. The next line to display is INFO :Waiting for accumulo
>>>> to be initialized. That line repeats indefinitely.
>>>>
>>>> I looked at the logs and found a message in tserver_localhost.out
>>>> saying unable to obtain instance id at /accumulo/instance_id. A quick web
>>>> search found a message (
>>>> http://affy.blogspot.com/2012/06/accumulo-where-is-my-instance-id.html)
>>>> saying I needed to put the HADOOP/conf directory in my CLASSPATH. I tried
>>>> that, but it did not work.
>>>>
>>>> I have looked and didn't find any other groups where I could post a
>>>> question.
>>>>
>>>> thanks,
>>>>
>>>> Tim
>>>
>>>
>>>
>>
>