shaik ahamed
2012-06-26, 07:42
Bejoy KS
2012-06-26, 08:14
shaik ahamed
2012-07-06, 11:39
Bejoy KS
2012-07-06, 11:52
shaik ahamed
2012-07-06, 12:47
Nitin Pawar
2012-07-06, 12:57
shaik ahamed
2012-07-11, 14:39
Mohammad Tariq
2012-07-11, 14:45
Bejoy KS
2012-07-11, 14:47
Mapred Learn
2012-07-11, 14:48
-
hi all
shaik ahamed 2012-06-26, 07:42
Hi Users,

I created a Hive table with the following syntax:

CREATE EXTERNAL TABLE vender_part(vender string, supplier string, quantity int)
PARTITIONED BY (order_date string)
row format delimited fields terminated by ','
stored as textfile;

I then inserted 100 GB of data with the following command:

INSERT OVERWRITE TABLE vender_part PARTITION (order_date)
SELECT vender,supplier,order_date,quantity FROM vender;

After that I get the following output:

Vendor_1 Supplier_111 2012-03-07 4240 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_112 2012-03-07 1237 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_113 2012-03-07 2970 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_114 2012-03-07 4652 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_115 2012-03-07 7414 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_116 2012-03-07 2334 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_117 2012-03-07 10522 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_118 2012-03-07 1776 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_119 2012-03-07 8344 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_120 2012-03-07 10362 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_121 2012-03-07 4579 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_122 2012-03-07 8020 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_123 2012-03-07 3520 NULL NULL __HIVE_DEFAULT_PARTITION__
Vendor_1 Supplier_124 2012-03-07 9124 NULL NULL __HIVE_DEFAULT_PARTITION__

My questions:

1. Is the above output correct? Why are two columns NULL, and why is there a column containing __HIVE_DEFAULT_PARTITION__?

2. Selecting from the partitioned table should take less time than it did before partitioning, right? That is not happening for me. Time taken for 100 GB of data: 2192.416 seconds.

3. If I filter the partitioned table on order_date, I get no data:

hive> select * from vender_part where order_date='2012-03-07';
OK
Time taken: 2.801 seconds

Please reply to the above questions and help me understand what the output should look like when a Hive table is partitioned, and why I get no data from the partitioned table when I filter on order_date.

Thanks in advance,
shaik.
-
Re: hi all
Bejoy KS 2012-06-26, 08:14
Hi Shaik

At first look: since you are using a Dynamic Partition Insert, the partition column should be the last column in the select query used in the Insert Overwrite. Modify your insert as follows:

INSERT OVERWRITE TABLE vender_part PARTITION (order_date) SELECT vender,supplier,quantity,order_date FROM vender;

Once the insert job is complete, verify your partitions. You can view the partitions of any table using

SHOW PARTITIONS <TableName>;

Regards
Bejoy KS
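
For readers following along, the corrected insert suggested in this reply can be sketched end to end in HiveQL. This is only a sketch under assumptions: the two SET lines use the standard Hive properties for dynamic partitioning and may already be configured on the cluster in question; the table and column names are the ones from the thread.

```sql
-- Assumption: dynamic partitioning may need to be enabled for the session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- SELECT columns are matched to the target table by position, not by name:
-- regular columns first (vender, supplier, quantity), partition column last.
INSERT OVERWRITE TABLE vender_part PARTITION (order_date)
SELECT vender, supplier, quantity, order_date
FROM vender;

-- Each distinct order_date value should now be listed as its own partition.
SHOW PARTITIONS vender_part;

-- A filter on the partition column only has to read the matching partition.
SELECT * FROM vender_part WHERE order_date = '2012-03-07';
```

In the original statement the last SELECT column was quantity, so Hive would have taken the partition value from that position rather than from order_date. That is consistent with the filter on order_date='2012-03-07' finding no rows.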
-
hi all
shaik ahamed 2012-07-06, 11:39
Hi users,

I am selecting a distinct column from the vender Hive table and getting the error below. Please help me with this.

hive> select distinct supplier from vender_sample;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=md-trngpoc1:54311 -kill job_201207061535_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-07-06 17:03:13,978 Stage-1 map = 0%, reduce = 0%
2012-07-06 17:03:20,001 Stage-1 map = 100%, reduce = 0%
2012-07-06 17:04:20,248 Stage-1 map = 100%, reduce = 0%
2012-07-06 17:04:23,262 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201207061535_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207061535_0005_m_000002 (and more) from job job_201207061535_0005

Task with the most failures(4):
-----
Task ID:
  task_201207061535_0005_r_000000

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 99143041 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

Regards
shaik.
-
Re: hi all
Bejoy KS 2012-07-06, 11:52
Hi Shaik

An error occurred while the MapReduce job was running. To get to the root cause, please post the error log from the failed task.

You can browse the Job Tracker web UI, choose the right job ID, and drill down to the failed tasks to get the error logs.

Regards
Bejoy KS
-
Re: hi all
shaik ahamed 2012-07-06, 12:47
Hi,

Below is the error I found in the Job Tracker log file:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Please help me with this.

Thanks in advance,
Shaik.
-
Re: hi all
Nitin Pawar 2012-07-06, 12:57
Can you tell us:

1) How many nodes are there in the cluster?
2) Are there any connectivity problems, if the number of nodes is > 3?
3) If you have just one slave, do you have a higher replication factor?
4) What compression are you using for the tables?
5) If you have a DHCP-based network, did your slave machines change their IP addresses?

Thanks,
Nitin
-
hi all
shaik ahamed 2012-07-11, 14:39
Hi All,

I have 100 GB of data in HDFS, and I want to move or copy this 100 GB file to the Hive directory or path. How can I achieve this?

Is there any command for this?

Please provide a solution that lets me load the data fast.

Thanks in advance,
Shaik
-
Re: hi all
Mohammad Tariq 2012-07-11, 14:45
Try it out using the "distcp" command.

Regards,
Mohammad Tariq
-
Re: hi all
Bejoy KS 2012-07-11, 14:47
Hi Shaik

If you already have the data in HDFS, just create an External Table pointing at that HDFS location. You'll have the data in your Hive table.

Or, if you want a managed table, it is good to use a LOAD DATA statement. It'd be fast as well, since under the hood it is an HDFS move operation that requires only a change in HDFS metadata.

Regards
Bejoy KS
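
The two options from this reply might look roughly as follows in HiveQL. This is a sketch, not a tested recipe: the path /data/vender and the table names vender_ext and vender_managed are hypothetical, and the column list assumes the source files are comma-delimited text with the four fields used earlier in the thread, in this order.

```sql
-- Option 1: external table over data that already sits in HDFS.
-- No data is moved; Hive only records the location.
-- Assumption: /data/vender is a hypothetical HDFS directory (not a single
-- file) that holds the comma-delimited data files.
CREATE EXTERNAL TABLE vender_ext (
  vender     STRING,
  supplier   STRING,
  order_date STRING,
  quantity   INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/vender';

-- Option 2: managed table filled with LOAD DATA.
CREATE TABLE vender_managed (
  vender     STRING,
  supplier   STRING,
  order_date STRING,
  quantity   INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Without the LOCAL keyword the source is an HDFS path, and the files are
-- moved (not copied) into the table's warehouse directory.
LOAD DATA INPATH '/data/vender' INTO TABLE vender_managed;
```

The two options are alternatives: running both against the same path would leave the external table empty once LOAD DATA has moved the files. Dropping an external table leaves the underlying files in place, while dropping a managed table deletes them, which is often the deciding factor between the two.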
-
Re: hi all
Mapred Learn 2012-07-11, 14:48
You can create an external table to make your data visible in Hive.