-
hive multiple inserts
wd 2010-01-05, 08:51
In the Hive wiki <http://wiki.apache.org/hadoop/Hive/LanguageManual/DML>:

Hive extension (multiple inserts):

    FROM from_statement
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
    [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

I'm trying to use Hive multiple inserts to extract data from Hive to local disk. Here is the HQL:

    FROM test_tbl
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/0' SELECT * WHERE id%10=0
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1' SELECT * WHERE id%10=1
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/2' SELECT * WHERE id%10=2

This HQL executes, but only /tmp/out/0 has a data file in it; the other directories are empty. Why does this happen? Is it a bug?
-
Re: hive multiple inserts
Zheng Shao 2010-01-05, 09:10
Looks like a bug.
What is the svn revision of your Hive build?

Did you verify that a single insert into '/tmp/out/1' produces non-empty files?

Zheng
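The single-insert check asked about here could look like the following. This is only a minimal sketch that reuses the table name, target directory, and predicate from the original post; nothing else is assumed.

```sql
-- Single-destination export using the same predicate as the second
-- branch of the multi-insert. test_tbl and id come from the original post.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1'
SELECT * FROM test_tbl WHERE id % 10 = 1;
```

If this writes non-empty files while the multi-insert leaves /tmp/out/1 empty, the problem is specific to the multi-insert path.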
-
Re: hive multiple inserts
wd 2010-01-06, 01:33
Hi,

A single insert does extract data into '/tmp/out/1'. With the multi-insert I can even see "xxx rows loaded to '/tmp/out/0', xxx rows loaded to '/tmp/out/1'", etc., but in fact there is no data.

I haven't tried the svn revision yet; I will try it today. Thanks.
-
Re: hive multiple inserts
wd 2010-01-08, 06:23
Hi,

I've tried the Hive svn version, and the bug still seems to exist.

svn st -v

    896805   896744 namit   .
    896805   894292 namit   eclipse-templates
    896805   894292 namit   eclipse-templates/.classpath
    896805   765509 zshao   eclipse-templates/TestHive.launchtemplate
    896805   765509 zshao   eclipse-templates/TestMTQueries.l
    ..........

So the svn revision is 896805?

Here is the execution log:

    hive> from test
        > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/0' select * where a = 1
        > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/1' select * where a = 3;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201001071716_4691, Tracking URL http://abc.com:50030/jobdetails.jsp?jobid=job_201001071716_4691
    Kill Command = hadoop job -Dmapred.job.tracker=abc.com:9001 -kill job_201001071716_4691
    2010-01-08 14:14:55,442 Stage-2 map = 0%, reduce = 0%
    2010-01-08 14:15:00,643 Stage-2 map = 100%, reduce = 0%
    Ended Job = job_201001071716_4691
    Copying data to local directory /home/stefdong/tmp/0
    Copying data to local directory /home/stefdong/tmp/0
    13 Rows loaded to /home/stefdong/tmp/0
    9 Rows loaded to /home/stefdong/tmp/1
    OK
    Time taken: 9.409 seconds

Thanks.
-
Re: hive multiple inserts
Anty 2010-01-11, 16:31
Hi,

I came across the same problem: there is no data in the other directories. I have one more question: can I specify the field delimiter for the output file, rather than just the default Ctrl-A field delimiter?

Best Regards,
Anty Rao
-
Re: hive multiple inserts
Zheng Shao 2010-01-11, 19:05
For your second question, currently we can do it with a little extra work:

1. Create an external table on the target directory with the field delimiter you want;
2. Run the query and insert overwrite into the target external table.

For the first question we can do a similar thing (create a bunch of external tables and then insert into them), but I think we should fix the problem.

Zheng
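The two-step workaround above can be sketched as follows. The table names (out_0, out_1), the column list, the tab delimiter, and the LOCATION paths are assumptions made for illustration; only test_tbl and the id%10 predicates come from the thread. Note that an external table's location is normally a path on HDFS, so copying the files to local disk would be a separate step.

```sql
-- Hypothetical schema and paths; adjust the columns to match test_tbl.
CREATE EXTERNAL TABLE out_0 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/tmp/out/0';

CREATE EXTERNAL TABLE out_1 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/tmp/out/1';

-- Multi-insert into the external tables instead of into local directories.
FROM test_tbl
INSERT OVERWRITE TABLE out_0 SELECT id, name WHERE id % 10 = 0
INSERT OVERWRITE TABLE out_1 SELECT id, name WHERE id % 10 = 1;
```

This covers both questions at once: the delimiter is chosen in the table definition, and the multi-insert writes to tables rather than to local directories.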
-
Re: hive multiple inserts
Ning Zhang 2010-01-11, 19:45
HIVE-1039 has been created for the bug when inserting into multiple local directories.

Thanks,
Ning
-
Re: hive multiple inserts
Anty 2010-01-12, 01:54
Thanks Zheng.
It does work.

I have another question: if the field delimiter is a string, e.g. "<>", it looks like LazySimpleSerDe does not work. Does LazySimpleSerDe not support string field delimiters, only single-byte control characters?

Best Regards,
Anty Rao
-
Re: hive multiple inserts
Zheng Shao 2010-01-12, 06:28
Yes, we only support one-byte delimiters, for performance reasons.

You can use the RegexSerDe in the contrib package for any row format that can be described by a regular expression (including your case, "<>"), but it will be slower.

Zheng
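A table using the contrib RegexSerDe for a "<>" delimiter might be declared as below. This is a sketch under assumptions: the table name, the two columns, and the location are hypothetical, and the contrib jar has to be available to the session. RegexSerDe expects every column to be a STRING.

```sql
-- Hypothetical table, columns, and location.
-- input.regex must match the whole line; each capture group is one column.
-- output.format.string controls how rows are written back out.
CREATE EXTERNAL TABLE angle_delimited (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(.*?)<>(.*)",
  "output.format.string" = "%1$s<>%2$s"
)
STORED AS TEXTFILE
LOCATION '/tmp/out/angle_delimited';
```

The trade-off is the one described above: every row goes through a regular-expression match or a format call, which costs more than splitting on a single byte.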
-
Re: hive multiple inserts
Anty 2010-01-12, 11:27
Thanks Zheng.

We have used RegexSerDe in some use cases, but it is indeed slower, so we don't want to use regular expressions when they are not necessary.

I found that HIVE-634 <https://issues.apache.org/jira/browse/HIVE-634> is what I need: it allows the user to specify the field delimiter and output format directly in the insert statement.

    INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines'
    [ ROW FORMAT DELIMITED | SERDE ... ]
    [ FILE FORMAT ...]
    SELECT * FROM daily_timelines;

Is somebody still working on this feature?

Best Regards,
Anty Rao
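With the optional clauses of the HIVE-634 proposal filled in, a statement might read as follows. This is only an illustration of the proposed syntax: the comma delimiter and the TEXTFILE format are assumptions, and whether the statement is accepted depends on the Hive version in use.

```sql
-- Sketch of the proposed syntax with concrete choices for the
-- optional clauses. The delimiter and file format are assumptions.
INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
SELECT * FROM daily_timelines;
```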
-
Re: hive multiple inserts
Zheng Shao 2010-01-13, 09:01
https://issues.apache.org/jira/browse/HIVE-634
As far as I know, nobody is working on that right now. If you are interested, we can work on it together. Let's move the discussion to the JIRA.

Zheng
-
Re: hive multiple inserts Min Zhou 2010-01-21, 08:39
This looks like a bug in Hive. See below:
hive> set hive.merge.mapfiles=true;
hive> explain from netflix insert overwrite table t1 select movie_id insert overwrite table t2 select user_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-5 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-5
  Stage-8 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-8

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        netflix
          TableScan
            alias: netflix
            Select Operator
              expressions:
                    expr: movie_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t1
            Select Operator
              expressions:
                    expr: user_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 2
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t2

  Stage: Stage-5
    Conditional Operator
      list of dependent Tasks:
          Move Operator
            files:
                hdfs directory: true
                destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10000
          Map Reduce
            Alias -> Map Operator Tree:
              hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10004
                  Reduce Output Operator
                    sort order:
                    Map-reduce partition columns:
                          expr: rand()
                          type: double
                    tag: -1
                    value expressions:
                          expr: foo
                          type: string
            Reduce Operator Tree:
              Extract
                File Output Operator
                  compressed: true
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: t1

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: t1

  Stage: Stage-8
    Conditional Operator
      list of dependent Tasks:
          Move Operator
            files:
                hdfs directory: true
                destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10002
          Map Reduce
            Alias -> Map Operator Tree:
              hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10005
                  Reduce Output Operator
                    sort order:
                    Map-reduce partition columns:
                          expr: rand()
                          type: double
                    tag: -1
                    value expressions:
                          expr: bar
                          type: string
            Reduce Operator Tree:
              Extract
                File Output Operator
                  compressed: true
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: t2

  Stage: Stage-1
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: t2

hive> set hive.merge.mapfiles=false;
hive> explain from netflix insert overwrite table t1 select movie_id insert overwrite table t2 select user_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        netflix
          TableScan
            alias: netflix
            Select Operator
              expressions:
                    expr: movie_id
                    type: string
              outputColumnNames: _col0
              File Output Operator
                compressed: true
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.map
-
RE: hive multiple inserts Namit Jain 2010-01-21, 17:56
Which version are you using?
The bug mentioned was fixed by: https://issues.apache.org/jira/browse/HIVE-1039
Thanks,
-namit
-
Re: hive multiple inserts Ning Zhang 2010-01-21, 18:17
BTW, JIRA HIVE-1047 subsumes HIVE-1039 in trunk. So if you are using branch 0.5.0, HIVE-1039 is already there. If you are using 0.4 or previous releases, you can either apply HIVE-1039 or HIVE-1047. Both of them are very simple changes.
-
Re: hive multiple inserts wd 2010-01-26, 10:07
hi,
I've tested branch 0.5 rev 903141. The SQL runs as expected. Thanks. I can't compile the trunk version, so I haven't tested it.
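Editor's sketch of the multi-insert statement this thread is about, written without the doubled `select select` that appears in the originally posted query (which was probably a paste slip). The table name test_tbl, the id column, and the /tmp/out/N directories are taken from the original question. Whether every directory actually receives data depends on running a build that includes the HIVE-1039 fix, as reported above for branch 0.5.

```sql
-- One scan of test_tbl feeding three local output directories.
-- Each INSERT clause carries its own SELECT and WHERE filter.
FROM test_tbl
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/0'
  SELECT * WHERE id % 10 = 0
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1'
  SELECT * WHERE id % 10 = 1
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/2'
  SELECT * WHERE id % 10 = 2;
```

A quick sanity check after running it is to list each output directory and confirm that it contains a non-empty data file, since on affected builds the job log reported rows loaded even when the later directories stayed empty.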