

Re: Bucketing broken in hive 0.9.0?
I confirmed this on Hive 0.7.0 and Hive 0.9.0. In non-local mode the
query creates three output files, as it should.

I opened:
https://issues.apache.org/jira/browse/HIVE-3083

Because the unit tests run in local mode, they are likely returning
false positives.
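
(Not from the thread, just a sketch for anyone trying to reproduce this: the settings below should force a real MapReduce run instead of local execution. They assume a reachable JobTracker; the address is a placeholder for whatever your cluster uses.)

-- do not let Hive silently fall back to local execution
set hive.exec.mode.local.auto=false;
-- placeholder JobTracker address; a value of "local" is what triggers local mode
set mapred.job.tracker=localhost:8021;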

Edward
On 6/4/12, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> How come only a single output file is being produced here? Shouldn't
> this bucketing produce 3 files? This is in LOCAL MODE, BTW.
>
> [edward@tablitha hive-0.9.0-bin]$ bin/hive
> hive> create table numbersflat(number int);
> hive> load data local inpath '/home/edward/numbers' into table numbersflat;
> Copying data from file:/home/edward/numbers
> Copying file: file:/home/edward/numbers
> Loading data to table default.numbersflat
> OK
> Time taken: 0.288 seconds
> hive> select * from numbersflat;
> OK
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> Time taken: 0.274 seconds
> hive> CREATE TABLE numbers_bucketed(number int,number1 int) CLUSTERED
> BY (number) INTO 3 BUCKETS;
> OK
> Time taken: 0.082 seconds
> hive> set hive.enforce.bucketing = true;
> hive> set hive.exec.reducers.max = 200;
> hive> set hive.merge.mapfiles=false;
> hive>
>     > insert OVERWRITE table numbers_bucketed select number,number+1
> as number1 from numbersflat;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 3
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> 12/06/04 00:50:35 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
> Execution log at:
> /tmp/edward/edward_20120604005050_e17eb952-af76-4cf3-aee1-93bd59e74517.log
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers:
> 0
> 2012-06-04 00:50:47,938 null map = 0%,  reduce = 0%
> 2012-06-04 00:50:48,940 null map = 100%,  reduce = 0%
> 2012-06-04 00:50:49,942 null map = 100%,  reduce = 100%
> Ended Job = job_local_0001
> Execution completed successfully
> Mapred Local Task Succeeded . Convert the Join into MapJoin
> Loading data to table default.numbers_bucketed
> Deleted file:/user/hive/warehouse/numbers_bucketed
> Table default.numbers_bucketed stats: [num_partitions: 0, num_files:
> 1, num_rows: 10, total_size: 43, raw_data_size: 33]
> OK
> Time taken: 16.722 seconds
> hive> dfs -ls /user/hive/warehouse/numbers_bucketed;
> Found 1 items
> -rwxrwxrwx   1 edward edward         43 2012-06-04 00:50
> /user/hive/warehouse/numbers_bucketed/000000_0
> hive> dfs -ls /user/hive/warehouse/numbers_bucketed/000000_0;
> Found 1 items
> -rwxrwxrwx   1 edward edward         43 2012-06-04 00:50
> /user/hive/warehouse/numbers_bucketed/000000_0
> hive> cat /user/hive/warehouse/numbers_bucketed/000000_0;
> FAILED: Parse Error: line 1:0 cannot recognize input near 'cat' '/' 'user'
>
> hive> dfs -cat /user/hive/warehouse/numbers_bucketed/000000_0;
> 1 2
> 2 3
> 3 4
> 4 5
> 5 6
> 6 7
> 7 8
> 8 9
> 9 10
> 10 11
> hive>
>
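
To the question at the top of the quoted message: with hive.enforce.bucketing=true, Hive adds a reduce stage and assigns each row to bucket (hash(column) & Integer.MAX_VALUE) % numBuckets, and for an int column the hash is simply the value itself. So CLUSTERED BY (number) INTO 3 BUCKETS should leave three files (000000_0, 000001_0, 000002_0) under the table directory, not one. A quick way to preview the expected assignment, assuming the built-in hash() UDF matches the bucketing hash (it should for primitive types):

hive> select number, (hash(number) & 2147483647) % 3 as expected_bucket from numbersflat;

For the numbers 1 through 10 that puts 3, 6, 9 in bucket 0; 1, 4, 7, 10 in bucket 1; and 2, 5, 8 in bucket 2. A single 000000_0 file holding all ten rows means the buckets were never enforced, which is exactly the bug reported above.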