Hadoop, mail # user - some guidance needed


Re: some guidance needed
Mark Kerzner 2011-05-18, 23:12
Ioan,

I second what Todd said. Even with FuseHDFS, which mounts HDFS as a regular
file system, you won't get the immediate response about file status that
you need. I believe Google implemented Gmail with HBase. Here is an example
of implementing a mail store with Cassandra:
http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf

Mark

On Wed, May 18, 2011 at 5:05 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> Hi Ioan,
>
> I would encourage you to look at a system like HBase for your mail
> backend. HDFS doesn't work well with lots of little files, and also
> doesn't support random update, so existing formats like Maildir
> wouldn't be a good fit.
>
> -Todd
>
> On Wed, May 18, 2011 at 4:02 PM, Ioan Eugen Stan <[EMAIL PROTECTED]>
> wrote:
> > Hello everybody,
> >
> > I'm a GSoC student for this year and I will be working on James [1].
> > My project is to implement email storage over HDFS. I am quite new to
> > Hadoop and associates and I am looking for some hints on how to get
> > started on the right track.
> >
> > I have installed a single node Hadoop instance on my machine and
> > played around with it (ran some examples) but I am interested in
> > what you (more experienced people) think is the best way to approach
> > my problem.
> >
> > I am a little puzzled by the fact that I read Hadoop is best
> > used for large files and emails aren't that large from what I know.
> > Another thing that crossed my mind is that since HDFS is a file
> > system, wouldn't it be possible to set it as a back-end for the
> > (existing) maildir and mailbox storage formats? (I think this question
> > is better suited to the James mailing list, but if you have some ideas
> > please speak your mind).
> >
> > Also, any development resources to get me started are welcome.
> >
> >
> > [1] http://james.apache.org/mailbox/
> > [2] https://issues.apache.org/jira/browse/MAILBOX-44
> >
> > Regards,
> > --
> > Ioan Eugen Stan
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
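
Todd's point that HDFS doesn't work well with lots of little files can be
made concrete with a back-of-the-envelope estimate. The sketch below is not
from the thread: it assumes the commonly quoted rule of thumb of roughly 150
bytes of NameNode heap per namespace object (one per file, one per block),
and the function name and constant are hypothetical, chosen for illustration.

```python
# Rough NameNode heap estimate when every email is stored as its own HDFS file.
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file or block).
# This is a widely repeated rule of thumb, not a measured figure.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap used by the file entries plus their block entries."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # A small email fits in a single block, so blocks_per_file stays at 1.
    for n in (1_000_000, 100_000_000, 1_000_000_000):
        gib = namenode_heap_bytes(n) / 2**30
        print(f"{n:>13,} one-email files -> ~{gib:.1f} GiB of NameNode heap")
```

Under this assumption, one million emails stored as one file each cost about
300 MB of NameNode heap, and the cost grows linearly with the message count
regardless of how small each message is. That is one reason a store such as
HBase, which packs many small records into a few large HDFS files, tends to
be suggested for this kind of workload.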