Hive, mail # user - Creating external table pointing to s3 folder with files not loading data


Re: Creating external table pointing to s3 folder with files not loading data
Dean Wampler 2012-12-14, 15:22
A couple of clarifying questions and suggestions. First, keep in mind that
Hive won't complain if your external LOCATION has a typo ;) Use DESCRIBE
FORMATTED to verify that the path is right. For an external partitioned
table, use DESCRIBE FORMATTED table PARTITION(col1=val1,col2=val2,...).
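If you want to script that check, one option is to capture the output of DESCRIBE FORMATTED (e.g., from the Hive CLI) and compare its Location row with the path you intended. The Python below is a minimal sketch, assuming the output contains a row that starts with "Location:" followed by whitespace and the URI; the bucket and paths in the example are hypothetical.

```python
from typing import Optional


def extract_location(describe_output: str) -> Optional[str]:
    """Return the URI on the 'Location:' row of DESCRIBE FORMATTED output, or None."""
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Location:"):
            # Assumed layout: the 'Location:' label, whitespace/tabs, then the URI.
            parts = stripped.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else None
    return None


def location_matches(describe_output: str, expected: str) -> bool:
    """Compare the reported location with the expected one, ignoring a trailing slash."""
    actual = extract_location(describe_output)
    if actual is None:
        return False
    return actual.rstrip("/") == expected.rstrip("/")


if __name__ == "__main__":
    # Hypothetical excerpt of DESCRIBE FORMATTED output; real output has many more rows.
    sample = (
        "# Detailed Table Information\n"
        "Database:           \tdefault\n"
        "Location:           \ts3n://my-bucket/logs/2012\n"
        "Table Type:         \tEXTERNAL_TABLE\n"
    )
    print(extract_location(sample))                                 # s3n://my-bucket/logs/2012
    print(location_matches(sample, "s3n://my-bucket/logs/2012/"))   # True
    print(location_matches(sample, "s3n://my-bucket/log/2012"))     # False (typo)
```

A one-character typo in the path is exactly the kind of mistake that leaves the table silently empty, so an exact comparison (modulo a trailing slash) is the useful check here.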

A dumb mistake I've often made is to use a variable in a script, e.g., "...
LOCATION '${DATA}/foo/bar/baz';", and forget to define DATA when invoking
the script.
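One way to catch that before running anything is to scan the script for variable references and compare them with the names you intend to define at invocation time. This is a minimal Python sketch, not a Hive tool: it only checks bare ${name} and ${hivevar:name} references and skips other namespaces (hiveconf, system, env), on the assumption that their values may come from elsewhere. The script text in the example is hypothetical.

```python
import re
from typing import List, Set

# Matches ${name} or ${namespace:name}; namespace handling is simplified.
VAR_REF = re.compile(r"\$\{(?:(\w+):)?([\w.]+)\}")


def undefined_vars(script_text: str, defined: Set[str]) -> List[str]:
    """List variable names referenced in a Hive script but absent from `defined`.

    Only bare ${name} and ${hivevar:name} references are checked (an
    assumption of this sketch); other namespaces are skipped.
    """
    missing: List[str] = []
    for namespace, name in VAR_REF.findall(script_text):
        if namespace not in ("", "hivevar"):
            continue
        if name not in defined and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical script fragment using the variable from the example above.
    script = "CREATE EXTERNAL TABLE t (line STRING)\nLOCATION '${DATA}/foo/bar/baz';\n"
    print(undefined_vars(script, set()))     # ['DATA']
    print(undefined_vars(script, {"DATA"}))  # []
```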

When you said "load a file", did you mean using the LOAD DATA ... INPATH
's3n://...' command? I've read that s3n is not supported for these
statements, but I'm not sure that's actually true.

If everything looks correct, you should be able to run hadoop fs -ls
s3n://... successfully. Actually, since your Hive environment could have
different settings for some filesystem properties, a better check might be
to run dfs -ls ... at the Hive CLI prompt.

Otherwise, it's probably the SerDe, as Mark suggested. If possible, I would
try reading the data through a temporary external table that uses a built-in
SerDe, like the default, just to confirm that it's not a filesystem issue
and that the SerDe is the likely culprit.
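When reading such a temporary table, it may help to know roughly how the default text SerDe (LazySimpleSerDe) maps a line to columns: fields are split on Ctrl-A (\001), missing trailing fields come back as NULL, extra fields are ignored, and \N denotes NULL. The Python below is a simplified emulation for reasoning about your file, not Hive's actual code; it ignores type conversion, escaping, and custom delimiters.

```python
from typing import List, Optional

CTRL_A = "\x01"      # Hive's default field delimiter for text tables
NULL_MARKER = "\\N"  # default serialization of NULL in text files


def parse_default_row(line: str, num_columns: int) -> List[Optional[str]]:
    """Roughly emulate how the default text SerDe maps one line to columns.

    Simplified: no type conversion, escaping, or collection/map delimiters.
    """
    fields = line.rstrip("\n").split(CTRL_A)
    row: List[Optional[str]] = []
    for i in range(num_columns):
        if i < len(fields) and fields[i] != NULL_MARKER:
            row.append(fields[i])
        else:
            # Missing trailing fields and \N both surface as NULL (None here).
            row.append(None)
    # Fields beyond num_columns are silently dropped.
    return row


if __name__ == "__main__":
    # A comma-separated line has no Ctrl-A, so it lands whole in the first column.
    print(parse_default_row("a,b,c\n", 3))            # ['a,b,c', None, None]
    print(parse_default_row("a\x01b\x01c\x01d", 3))   # ['a', 'b', 'c']
    print(parse_default_row("a\x01\\N", 3))           # ['a', None, None]
```

So even if your file isn't Ctrl-A delimited, a temporary table with the default SerDe should still show each line in its first column. If that table also returns nothing, suspect the path or filesystem rather than the custom SerDe.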

Hope that helps.
dean

On Tue, Dec 11, 2012 at 8:05 AM, Fernando Andrés Doglio Turissini <
[EMAIL PROTECTED]> wrote:

> Long subject, I know. Let me explain the problem a bit more:
>
> I'm trying to load a file into a Hive table (this is on an EMR instance).
> For that, I create an external table and set its location to the folder
> in an S3 bucket where the file resides.
> The problem is that even though the table is created correctly, when I do
> a "select * from table" it returns nothing. I'm not seeing errors in the
> logs either, so I don't know what could be happening...
>
> Also, probably important: I'm using a custom SerDe that I did not
> write, but I do have the code for it.
>
> I'm quite new to hive, so I appreciate any kind of pointers you can throw
> at me.
>
> Thanks!
> Fernando Doglio
>

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330