Keith Wiley
2012-03-09, 23:44
Steven Wong
2012-03-10, 00:46
王锋
2012-03-10, 01:12
Keith Wiley
2012-03-10, 03:42
Keith Wiley
2012-03-12, 16:51
Balaji Rao
2012-03-12, 17:07
Keith Wiley
2012-03-12, 17:25
-
Basic statement problems
Keith Wiley 2012-03-09, 23:44
I successfully installed Hive and used it to create basic tables (on one of my two machines; another discussion describes the problems I'm having with the other machine). However, basic queries aren't working. I brought a typical CSV file into a Hive table and it seemed fine. Here's how I did it:

CREATE EXTERNAL TABLE stringmap (ObjectTypeCode INT, AttributeName STRING, AttributeValue INT, LangId INT, Value STRING, DisplayOrder INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/crm/records/all/hiveTest/StringMap.csv';

"show tables" and "describe stringmap" return correct results. However, if I run a really simple query, it returns incorrect results. For example, a row count query returns 0. Observe:

hive> select count(*) from stringmap;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201202221500_0103, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201202221500_0103
Kill Command = /media/sdb1/kwiley/hadoop/hadoop-0.20.2-cdh3u3/bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201202221500_0103
2012-03-09 16:20:26,243 Stage-1 map = 0%, reduce = 0%
2012-03-09 16:20:29,258 Stage-1 map = 0%, reduce = 100%
2012-03-09 16:20:32,278 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201202221500_0103
OK
0
Time taken: 15.969 seconds

The Hadoop job runs without error, but it returns 0. The job tracker indicates that Hive created a job with 0 mappers and 1 reducer. I don't see any useful output in the reducer task log, however. The following is from the Hive log:

2012-03-09 16:20:17,057 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-03-09 16:20:17,057 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-03-09 16:20:17,059 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-03-09 16:20:17,059 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-03-09 16:20:17,060 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-03-09 16:20:17,060 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-03-09 16:20:22,781 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-03-09 16:20:22,946 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(36)) - Snappy native library is available

I admit, that does look rather erroneous, but I'm not sure what to make of it. I looked those errors up online but didn't find much that seemed to suggest a cause or solution.

Any ideas?

________________________________________________________________________________
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't scratch. All together this implies: He scratched the itch from the scratch that itched but would never itch the scratch from the itch that scratched."
-- Keith Wiley
________________________________________________________________________________
-
RE: Basic statement problems
Steven Wong 2012-03-10, 00:46
The LOCATION clause has to specify the directory that contains (only) your data files.
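To make the point above concrete, here is a minimal Python sketch that inspects a CREATE TABLE statement and flags a LOCATION that looks like a file rather than a directory. The helper name `location_warning` is hypothetical and not part of Hive, the DDL strings are shortened versions of the statement in this thread, and the test (a file extension on the last path component) is only a heuristic; the authoritative check would be to list the path on HDFS.

```python
import posixpath
import re

# Hypothetical helper, not part of Hive: find the quoted path after LOCATION.
_LOCATION_RE = re.compile(r"\bLOCATION\s+'([^']+)'", re.IGNORECASE)


def location_warning(ddl):
    """Return a warning string if LOCATION looks like a file, else None."""
    match = _LOCATION_RE.search(ddl)
    if match is None:
        return None  # no LOCATION clause to inspect
    path = match.group(1).rstrip("/")
    # Heuristic only: an extension on the last component suggests a file.
    _, extension = posixpath.splitext(posixpath.basename(path))
    if extension:
        return "LOCATION %s looks like a file; try its parent directory %s" % (
            path, posixpath.dirname(path))
    return None


# Shortened versions of the statement from the thread (column list abbreviated).
file_ddl = ("CREATE EXTERNAL TABLE stringmap (Value STRING) "
            "LOCATION '/crm/records/all/hiveTest/StringMap.csv';")
dir_ddl = ("CREATE EXTERNAL TABLE stringmap (Value STRING) "
           "LOCATION '/crm/records/all/hiveTest';")

print(location_warning(file_ddl))
print(location_warning(dir_ddl))
```

A directory whose name happens to contain a dot would be flagged incorrectly, so the result is best treated as a hint to double-check, not as a verdict.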
-
Re: RE: Basic statement problems
王锋 2012-03-10, 01:12
I agree: the LOCATION must be your data directory.
-
Re: Basic statement problems
Keith Wiley 2012-03-10, 03:42
So a directory, not a specific file. I thought I tried it both ways, but I'll switch it back the other way and try again.
Thanks.

On Mar 9, 2012, at 16:46, Steven Wong wrote:

> The LOCATION clause has to specify the directory that contains (only) your data files.

________________________________________________________________________________
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com

"And what if we picked the wrong religion? Every week, we're just making God madder and madder!"
-- Homer Simpson
________________________________________________________________________________
-
Re: Basic statement problems
Keith Wiley 2012-03-12, 16:51
On Mar 9, 2012, at 16:46 , Steven Wong wrote:
> The LOCATION clause has to specify the directory that contains (only) your data files.

I've tried it both ways:

CREATE EXTERNAL TABLE stringmap (ObjectTypeCode INT, AttributeName STRING, AttributeValue INT, LangId INT, Value STRING, DisplayOrder INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/crm/records/all/hiveTest';

CREATE EXTERNAL TABLE stringmap (ObjectTypeCode INT, AttributeName STRING, AttributeValue INT, LangId INT, Value STRING, DisplayOrder INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/crm/records/all/hiveTest/StringMap.csv';

In both cases, "show tables" lists stringmap and "describe stringmap" describes the columns shown above, but a basic query doesn't return any results.

What else should I try here?

________________________________________________________________________________
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com

"I used to be with it, but then they changed what it was. Now, what I'm with
isn't it, and what's it seems weird and scary to me."
-- Abe (Grandpa) Simpson
________________________________________________________________________________
-
Re: Basic statement problems
Balaji Rao 2012-03-12, 17:07
What does a simple "select * from stringmap limit 10" return?
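As background for that question, the Python sketch below approximates what such a query would show for a few text lines under the stringmap schema. It rests on my understanding of Hive's default delimited text handling: a field that cannot be read as INT comes back as NULL, missing trailing fields are NULL, extra fields are dropped, and quotes get no special treatment. The function `parse_row` and all three sample lines are made up for illustration, and Python's `int()` is only a rough stand-in for Hive's integer parsing.

```python
# Column names and types copied from the CREATE EXTERNAL TABLE statement.
SCHEMA = [
    ("ObjectTypeCode", "INT"), ("AttributeName", "STRING"),
    ("AttributeValue", "INT"), ("LangId", "INT"),
    ("Value", "STRING"), ("DisplayOrder", "INT"),
]


def parse_row(line, delimiter=","):
    """Approximate how one text line maps onto the declared columns."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for index, (_, col_type) in enumerate(SCHEMA):
        if index >= len(fields):
            row.append(None)  # missing trailing field -> NULL
        elif col_type == "INT":
            try:
                row.append(int(fields[index]))
            except ValueError:
                row.append(None)  # not an integer -> NULL
        else:
            row.append(fields[index])
    return row  # fields beyond the declared columns are dropped


# Made-up sample lines: a clean row, a header row, a quoted comma.
clean = parse_row("1,statecode,0,1033,Active,1")
header = parse_row(
    "ObjectTypeCode,AttributeName,AttributeValue,LangId,Value,DisplayOrder")
quoted = parse_row('2,name,1,1033,"Smith, John",3')

for row in (clean, header, quoted):
    print(row)
```

Rows full of NULLs in the INT columns would point at a header line or a delimiter mismatch, whereas no rows at all would point back at the LOCATION itself.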
-
Re: Basic statement problems
Keith Wiley 2012-03-12, 17:25
It has started working now. I don't know what I changed. I dropped every single table from Hive, explicitly created a new directory on HDFS and moved the .csv file to that directory, ran Hive again and created the table. This time it worked. I can perform queries against the directory. Maybe Hadoop and Hive confused each other about the other directory (it got corrupted or something)...or maybe I screwed something up, I dunno.

I would have expected better error detection. Instead of simply returning empty query results, it would be nice if Hive would actually produce an error message if I create the table in an invalid or incorrect fashion...but perhaps it couldn't tell; maybe the database just looked empty at Hive's level of abstraction. I mean, even if I did screw something up (an option I am entirely open to), I never really got an error about it. Hive gladly wrapped a "hive" table around the directory, and the csv file in question, without an error of any kind. Hive could see the table, it listed and could describe it, but would then return empty results for queries against the table. When I moved the exact same csv file to a brand new HDFS directory and tried again from scratch, everything started working.

I probably did something wrong, but some sort of error message would have been very helpful. Anyway, these tools are still pretty young. I understand that they will continue to evolve. The ability to detect and report errors will almost certainly improve with time.

Thanks for the concerted efforts to help.

Cheers!

________________________________________________________________________________
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know. Consequently, I left grad school with a higher ignorance to knowledge ratio than when I entered."
-- Keith Wiley
________________________________________________________________________________
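The kind of error detection wished for above could be approximated outside Hive with a small pre-flight check on the table directory. The Python sketch below is only an illustration under stated assumptions: `preflight` is a hypothetical helper, the listing is passed in as (name, size) pairs that one would gather from an HDFS directory listing, the file names and sizes are made up, and the rule that names starting with "_" or "." are skipped reflects Hadoop's usual hidden-file filtering for file-based input. It does not establish what actually went wrong in this thread.

```python
def preflight(entries, expected_suffix=".csv"):
    """Warn about a table directory that would yield no rows or stray rows.

    entries: list of (file_name, size_in_bytes) pairs for the LOCATION
    directory, gathered by the caller from an HDFS listing.
    """
    warnings = []
    # Hadoop's file input formats normally skip names starting with _ or .
    visible = [(name, size) for name, size in entries
               if not name.startswith(("_", "."))]
    if not any(size > 0 for _, size in visible):
        warnings.append("no non-empty data files: queries would return zero rows")
    for name, _ in visible:
        if not name.endswith(expected_suffix):
            warnings.append("%s would also be read as table data" % name)
    return warnings


# Made-up listings for illustration.
empty_dir = preflight([])
good_dir = preflight([("StringMap.csv", 2048)])
mixed_dir = preflight([("StringMap.csv", 2048), ("notes.txt", 10)])

print(empty_dir)
print(good_dir)
print(mixed_dir)
```

A directory with no readable input is consistent with the job tracker reporting 0 mappers and a count of 0, which is why the empty case gets its own warning here.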