

Re: Storage file format
There should be two separate topics here:
1) storage file format
2) DFS

They are separate because we should support loading map/reduce output data
into Drill (maybe this is the only way for Drill to load data).

For the second topic, as I mentioned in this thread, I prefer MapR DFS, which
is really HA.

As for the first topic, we should try to find a mature open source project
and modify it to fit our needs.

On Sun, Sep 16, 2012 at 5:11 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> There is no project-wide roadmap in a real open source project.
>
> There are vision documents that various people use to try to motivate
> consensus.
>
> There are also individual roadmaps that describe what the individual
> contributors plan to do.
>
> PowerDrill-style in-memory data is definitely intriguing, and once Drill
> works and works fast on simpler structures, I would expect that somebody
> would be interested in implementing it.
>
> Perhaps that would be you?
>
> On Sat, Sep 15, 2012 at 10:16 AM, Tsuyoshi OZAWA
> <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > Is there a roadmap to support in-memory index and storage like
> > PowerDrill? It's one kind of storage, though its format is different
> > from the columnar storage format in the Dremel paper, as you mentioned.
> >
> > IMO, the in-memory index and storage are very useful for analysis with
> > a small cluster.
> >
> > Thanks,
> > - Tsuyoshi
> >
> > On Sun, Sep 16, 2012 at 2:02 AM, Dharm Raj <[EMAIL PROTECTED]>
> > wrote:
> > > You are right, Camuel. While thinking about the storage format I was
> > > thinking about append. Misplaced update.
> > >
> > > On Sat, Sep 15, 2012 at 9:49 PM, Camuel Gilyadov <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > >> Drill doesn't support updates. It is an append-only data store, and an
> > >> append is usually expected to be a nice data chunk, not a single row.
> > >>
> > >> On Sat, Sep 15, 2012 at 8:09 AM, Dharm Raj <[EMAIL PROTECTED]>
> > >> wrote:
> > >>
> > >> > For columnar storage, IMO each column can be managed in a separate
> > >> > file. Dremel also seems to have each column in a separate file. This
> > >> > should be easy to manage, and updates are possible. Please see
> > >> > https://issues.apache.org/jira/browse/AVRO-806
> > >> >
> > >> > The Drill architecture slides show AVRO-806 and Trevni in the column
> > >> > storage box. Are we looking at them as candidates for the storage
> > >> > format for Drill?
> > >> >
> > >> > If we have a lot of data with high sparsity, and the major use case
> > >> > is read-only access once the data is written, another way could be to
> > >> > store it in a column-major sparse matrix format. It looks easy to
> > >> > implement, but updates may be problematic. Just a thought.
> > >> >
> > >> > Regards,
> > >> > Dharm
> > >> >
> > >> > On Sat, Sep 15, 2012 at 7:24 PM, NAVEEN MAANJU <
> > >> > [EMAIL PROTECTED]> wrote:
> > >> >
> > >> > > Makes sense.
> > >> > >
> > >> > > On Sat, Sep 15, 2012 at 6:44 AM, Ted Dunning <[EMAIL PROTECTED]>
> > >> > > wrote:
> > >> > >
> > >> > > > The key goal here is to get something simple working quickly in a
> > >> > > > way that allows additional, more advanced implementations.
> > >> > > >
> > >> > > > On Sat, Sep 15, 2012 at 5:47 AM, moon soo Lee <[EMAIL PROTECTED]>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > For column storage, how about leveraging HBase or Accumulo?
> > >> > > > >
> > >> > > > > They would also give us a chance to support data updates (future work?)
> > >> > > > >
> > >> > > > >
> > >> > > > > On Sat, Sep 15, 2012 at 9:30 PM, Azuryy Yu <[EMAIL PROTECTED]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi All,
> > >> > > > > >
> > >> > > > > > I am interested in working on the storage format. (sign up?)
> > >> > > > > >
> > >> > > > > > I wrote an HDFS file format, which is similar to SequenceFile
> > >> > > > > > (row storage, block management, compression). I provide
> > >> > > > > > InputFormat and OutputFormat,
> > >> > >