Drill, mail # user - How to load data in Drill


Re: How to load data in Drill
Tom Seddon 2013-12-04, 15:02
Hi,

Jinfeng, do you want a copy of my parquet file too?  If so, I can send it
later tonight.

Cheers,

Tom

On 3 December 2013 07:38, Madhu Borkar <[EMAIL PROTECTED]> wrote:

> Hi Jason and Jinfeng,
> Thank you guys for taking the time to debug the problem. I have sent my
> data to Jinfeng.
> Other than a parquet file, can I put my data in hbase (or any other data
> source) and query it through drill?
> Please let me know.
>
>
>
> On Mon, Dec 2, 2013 at 10:21 PM, Jinfeng Ni <[EMAIL PROTECTED]> wrote:
>
> > Hi Jason,
> >
> > Thanks for offering your help to look at this issue.
> >
> > I did try to see if the file PageReadStatus.java has been changed
> > recently.  The output of git log for that file shows the latest change is
> > Sep 9 for "DRILL-221 Add license header to all files".  I thought the
> > binary distribution was made after the license header was added.  But you
> > are right, there might have been changes after the binary distribution.
> >
> > Thanks,
> >
> > Jinfeng
> >
> >
> >
> > On Mon, Dec 2, 2013 at 10:03 PM, Jason Altekruse <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Madhu,
> > >
> > > I would be happy to take a look at this as well. I wrote most of the code
> > > we are using to read parquet files, so I should be able to figure out why
> > > we are getting an NPE with the files you are reading. I took a look back at
> > > the previous thread where this issue was being discussed and noticed that
> > > you reported having installed Drill from binaries. Have you tried compiling
> > > Drill with a more recent version of the source from our repository?
> > >
> > > We ended up learning that Apache does not consider binary releases
> > > official. While we will obviously be providing them for users in future
> > > releases, we gave up on the binaries before we reached the end of the
> > > Apache approval process. As such, several bugs were fixed (not
> > > necessarily in the parquet reader) between this binary and our final m1
> > > source release. Since the release, there have also been code changes
> > > that may solve the issue you are having, so we can test it against the
> > > latest development code to see if changes still need to be made to solve
> > > the problem.
> > >
> > > Jinfeng,
> > > This also could mean that line 92 that you found in the source does not
> > > match what line 92 was at the time of building this release. Just
> > > something to keep in mind if you look at this again.
> > >
> > > Thanks,
> > > Jason Altekruse
> > >
> > >
> > > On Mon, Dec 2, 2013 at 11:38 PM, Jinfeng Ni <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Madhu,
> > > >
> > > > Yes, the log is helpful; I can see the NPE is raised in the storage
> > > > engine component ParquetRecordReader, not in the query execution
> > > > component.
> > > >
> > > > Unfortunately, I cannot reproduce this parquet reader NPE problem using
> > > > either sample data (nation.parquet, region.parquet) or other TPCH
> > > > parquet files. From the log, I could see the NPE is raised in the
> > > > following code:
> > > >
> > > >     currentPage = new Page(
> > > >         bytesIn,
> > > >         pageHeader.data_page_header.num_values,
> > > >         pageHeader.uncompressed_page_size,
> > > >         ParquetStorageEngine.parquetMetadataConverter.getEncoding(pageHeader.data_page_header.repetition_level_encoding),
> > > >         ParquetStorageEngine.parquetMetadataConverter.getEncoding(pageHeader.data_page_header.definition_level_encoding),
> > > >         ParquetStorageEngine.parquetMetadataConverter.getEncoding(pageHeader.data_page_header.encoding)
> > > >     );
> > > >
> > > > My guess is that either pageHeader or its member data_page_header is
> > > > NULL. But without the parquet file to recreate this NPE, I do not have
> > > > a way to verify.
> > > > Is it possible for you to share your parquet file (after removing any
> > > > sensitive
> > >