Saurabh Nanda 2009-12-30, 05:57
Zheng Shao 2009-12-30, 06:09
Saurabh Nanda 2009-12-30, 06:18
Saurabh Nanda 2009-12-30, 06:22
Zheng Shao 2009-12-30, 06:29
Saurabh Nanda 2009-12-30, 06:35
Zheng Shao 2009-12-30, 06:46
Saurabh Nanda 2009-12-30, 06:47
Saurabh Nanda 2009-12-30, 06:50
Saurabh Nanda 2009-12-30, 06:54
Saurabh Nanda 2009-12-30, 07:02
Vijay 2009-12-30, 19:45
Saurabh Nanda 2010-01-04, 12:18
Carl Steinbach 2010-01-04, 16:57
Ashish Thusoo 2010-01-04, 18:37
Ashish Thusoo 2010-01-04, 20:25
-
First import into new partition disappears
Saurabh Nanda 2009-12-30, 05:57
Hi,

I'm revisiting Hive after a long hiatus, so I may not be aware of any new developments. I had written a script some time back to import webserver logs for a day into a new partition. The same script, now running on the latest version of Hive (r894548 compiled off trunk), seems to be misbehaving.

I'm importing about 6 files into each partition. However, after the script ends, only 5 files show up in each partition. Do I need to explicitly issue the ADD PARTITION command before loading data? Isn't the partition implicitly created?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: First import into new partition disappears
Zheng Shao 2009-12-30, 06:09
Can you list the HDFS directories? Are the files in the corresponding directories yet?

Zheng

--
Yours,
Zheng
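Instead of counting files in the web interface, one could list the partition directory with `hadoop fs -ls` and count the entries. A minimal Python sketch, assuming the Hadoop 0.20-era listing format in which each entry line begins with a Unix-style permission string (`-` for a regular file, `d` for a directory); the warehouse path and file names in the sample are hypothetical.

```python
def count_files(ls_output: str) -> int:
    """Count regular-file entries in `hadoop fs -ls` output.

    Assumes each entry line starts with a 10-character permission string:
    '-' marks a regular file, 'd' a directory. Header lines such as
    'Found N items' are skipped because they do not match that shape.
    """
    count = 0
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        perms = fields[0]
        if len(perms) == 10 and perms[0] == "-":
            count += 1
    return count


# Hypothetical listing of one partition directory.
sample = """Found 3 items
-rw-r--r--   3 hive supergroup   1048576 2009-12-30 05:57 /user/hive/warehouse/raw/dt=2009-04-01/access.log.1
-rw-r--r--   3 hive supergroup   2097152 2009-12-30 05:57 /user/hive/warehouse/raw/dt=2009-04-01/access.log.2
drwxr-xr-x   - hive supergroup         0 2009-12-30 05:57 /user/hive/warehouse/raw/dt=2009-04-01/_logs
"""
print(count_files(sample))  # 2
```

Comparing this count with the number of files the import script loaded would show directly whether one went missing.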
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:18
I'm taking a look at the HDFS directories through the web interface and I can see only 5 files there, not 6. I tried creating the partition using the ADD PARTITION command. After that, all 6 files were imported successfully.

Saurabh.
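The workaround above (create the partition explicitly, then load) can be built into the import script. A minimal Python sketch that emits the statements in that order; it assumes the script loads each file with `LOAD DATA INPATH ... INTO TABLE ... PARTITION (...)`, and the table name, partition value and file paths are hypothetical. Re-running it against a partition that already exists would likely make the ADD PARTITION statement fail, so it assumes a fresh partition.

```python
from typing import List


def partition_import_script(table: str, dt: str, paths: List[str]) -> str:
    """Build a HiveQL script that creates the partition before loading files.

    ALTER TABLE ... ADD PARTITION comes first, mirroring the workaround from
    this thread. Each file is loaded with INTO (not OVERWRITE) so that later
    loads append to, rather than replace, earlier ones.
    """
    spec = f"dt='{dt}'"
    stmts = [f"ALTER TABLE {table} ADD PARTITION ({spec});"]
    for path in paths:
        stmts.append(f"LOAD DATA INPATH '{path}' INTO TABLE {table} PARTITION ({spec});")
    return "\n".join(stmts)


print(partition_import_script("raw", "2009-04-01", ["/logs/a.log", "/logs/b.log"]))
```

The generated text could then be passed to the Hive CLI as a script file.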
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:22
Also, has something changed drastically in Hive over the last 2-3 months? A simple import query seems to be taking forever now!

Saurabh.
-
Re: First import into new partition disappears
Zheng Shao 2009-12-30, 06:29
What is the import query? Do you mean "load data"? Can you give an example?

Zheng
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:35
Picking up data from the 'raw' table, filtering the unwanted lines and inserting into the 'raw_compressed' table, which is stored as a sequencefile:

    insert overwrite table raw_compressed partition(dt='2009-04-01')
    select line from raw
    where dt='2009-04-01'
      and lower(line) rlike '.*get .*/confirmation.*http.*'
      and not lower(line) rlike '(/images.*?|/styles.*?|/javascripts.*?|/adserver.*?|.*?favicon.*?|/includes/thwarte-logo.html.*)';

Saurabh.
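Before blaming the build, the WHERE clause of that query can be sanity-checked offline. A Python sketch, assuming RLIKE behaves like Java's `Matcher.find()` (a match anywhere in the string), which Python's `re.search` approximates for these simple patterns; the sample log lines are hypothetical.

```python
import re

# Patterns copied from the query; lower() is applied to the line before matching.
KEEP = re.compile(r".*get .*/confirmation.*http.*")
DROP = re.compile(
    r"(/images.*?|/styles.*?|/javascripts.*?|/adserver.*?|.*?favicon.*?"
    r"|/includes/thwarte-logo.html.*)"
)


def keep_line(line: str) -> bool:
    """Mirror the WHERE clause: keep confirmation GETs, drop static assets."""
    lowered = line.lower()
    return KEEP.search(lowered) is not None and DROP.search(lowered) is None


lines = [
    '1.2.3.4 - - [01/Apr/2009:10:00:00] "GET /booking/confirmation?id=7 HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Apr/2009:10:00:01] "GET /images/confirmation.png HTTP/1.1" 200 99',
    '1.2.3.4 - - [01/Apr/2009:10:00:02] "POST /booking/confirmation HTTP/1.1" 200 10',
]
print([keep_line(l) for l in lines])  # [True, False, False]
```

The second line matches the keep pattern but is removed by `/images`; the third is dropped because it is not a GET.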
-
Re: First import into new partition disappears
Zheng Shao 2009-12-30, 06:46
This should be compiled into a single map-only job. Can you take a look at the progress and the task logs of the job?

We are not aware of any changes that might cause this problem.

Zheng
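A map-only job means each mapper filters its own input split and writes its output directly, with no shuffle or reduce phase. A toy Python model of that data flow, not Hive's actual code; the split contents and the predicate are hypothetical.

```python
from typing import Callable, List


def run_map_only(splits: List[List[str]], predicate: Callable[[str], bool]) -> List[List[str]]:
    """Toy model of a map-only job.

    Each split is processed independently, as one map task would be, and
    yields its own output list (one output file per mapper in this model).
    Nothing is shuffled or merged between tasks.
    """
    outputs = []
    for split in splits:
        outputs.append([line for line in split if predicate(line)])
    return outputs


splits = [
    ["get /a/confirmation http", "get /images/x http"],
    ["get /b/confirmation http"],
]
result = run_map_only(splits, lambda l: "confirmation" in l and "/images" not in l)
print(result)  # [['get /a/confirmation http'], ['get /b/confirmation http']]
```

Because the tasks are independent, a slowdown in such a job usually shows up per mapper, which is why the task logs are the place to look.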
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:47
I've attached the plan file for the query from my previous message. I had executed the same query yesterday on an older Hive version. I updated my Hive source code from SVN today, rebuilt Hive, and now the query is crawling! What could be going wrong? Is there anything else I can provide to help troubleshoot this?

Saurabh.
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:50
Attached are the task logs of one of the tasks.

Saurabh.
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 06:54
The rate at which "Map input bytes" and "Map input records" are growing is extremely slow. Is something wrong with the HDFS configuration? But it was working perfectly fine with the previous Hive version.

Saurabh.
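To make "extremely slow" comparable between the old and new builds, one could read the "Map input bytes" counter twice from the job's web page and compute an average rate. A simple arithmetic sketch; the readings below are hypothetical, not measurements from this thread.

```python
def rate_per_sec(first: int, second: int, seconds: float) -> float:
    """Average growth rate of a job counter between two readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (second - first) / seconds


# Hypothetical "Map input bytes" readings taken 60 seconds apart.
old_build = rate_per_sec(0, 600_000_000, 60)  # 10,000,000 bytes/s
new_build = rate_per_sec(0, 6_000_000, 60)    # 100,000 bytes/s
print(old_build / new_build)  # 100.0
```

Posting such a ratio for the same query on both builds gives the list a concrete number to work with.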
-
Re: First import into new partition disappears
Saurabh Nanda 2009-12-30, 07:02
I reverted to the old build and the same query is working fine now. How do I find out the SVN revision of the old build?

Saurabh.
-
Re: First import into new partition disappears
Vijay 2009-12-30, 19:45
You can try "svn info" in the directory to get detailed information.
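If several checkouts need checking, the revision can be pulled out of `svn info` output programmatically. A minimal Python sketch that parses the `Revision:` line; the sample output is illustrative (it reuses r894548, the trunk revision mentioned at the start of the thread). Note that `svn info` reports the working copy's revision, which matches a build only if the tree was not updated after building.

```python
import re
from typing import Optional


def svn_revision(info_output: str) -> Optional[int]:
    """Extract the 'Revision:' value from `svn info` output, if present.

    Only the line that starts with 'Revision:' is matched, so
    'Last Changed Rev:' is ignored.
    """
    match = re.search(r"^Revision:\s*(\d+)\s*$", info_output, re.MULTILINE)
    return int(match.group(1)) if match else None


sample = """Path: .
Revision: 894548
Node Kind: directory
Last Changed Rev: 894500
"""
print(svn_revision(sample))  # 894548
```

In practice the input would be the captured stdout of running `svn info` inside the checkout.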
-
Re: First import into new partition disappears - Saurabh Nanda 2010-01-04, 12:18
For that I'll have to do an 'svn revert'.
I have a directory with Hive built from the old svn revision. Is there a file in the build directory which stores the svn revision? Saurabh.
-
Re: First import into new partition disappears - Carl Steinbach 2010-01-04, 16:57
Hi Saurabh,
I don't think there is currently a way of determining the SVN revision number by looking at the contents of the Hive build directory, but it is possible to include the SVN revision number in JAR manifests, and we should probably start doing this. I filed HIVE-1025 to track this improvement request. Thanks. Carl
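Carl's suggestion could be consumed with the standard `java.util.jar` APIs. The following is a minimal sketch, not Hive code: it assumes a build that writes the revision into the main section of the JAR manifest under a hypothetical attribute named `SVN-Revision` (at the time of this thread HIVE-1025 only tracked the idea, so no real attribute name is implied). `Implementation-Version` is a standard manifest attribute, but whether a given Hive JAR sets it depends on the build; every lookup returns null when the attribute is absent.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestRevision {
    // Hypothetical attribute name: assumed for illustration, not defined by Hive or HIVE-1025.
    static final String REVISION_ATTR = "SVN-Revision";

    /** Parses manifest text (e.g. META-INF/MANIFEST.MF contents) and returns a main attribute, or null. */
    static String parseAttribute(String manifestText, String name) {
        try {
            Manifest mf = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return mf.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens a JAR on disk and returns the requested main attribute, or null if missing. */
    static String readFromJar(String jarPath, String name) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest(); // null when the JAR has no manifest at all
            return mf == null ? null : mf.getMainAttributes().getValue(name);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ManifestRevision <path-to-jar>");
            return;
        }
        System.out.println("Implementation-Version: " + readFromJar(args[0], "Implementation-Version"));
        System.out.println(REVISION_ATTR + ": " + readFromJar(args[0], REVISION_ATTR));
    }
}
```

Running `java ManifestRevision <path-to-jar>` against a JAR from the build directory prints both values. Keeping the revision in the manifest, rather than in a separate file, ties it to the artifact itself, so it should survive the JAR being copied out of the build tree.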
-
RE: First import into new partition disappears - Ashish Thusoo 2010-01-04, 18:37
This should already be there.
http://issues.apache.org/jira/browse/HIVE-760 Ashish
-
RE: First import into new partition disappears - Ashish Thusoo 2010-01-04, 20:25
That is actually the software version number, not the SVN revision number. So +1 to this.
Ashish