

Restrictions on Tables in Map side join in Hive
Hi,

 I have 2 tables:

hive> describe extended idtablerc;
 

id    string  from deserializer

                

Detailed Table Information      Table(tableName:idtablerc,
dbName:default, owner:viraj, createTime:1277418576, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:yuid,
type:string, comment:null)], location:hdfs://nn1/projects/idtablerc,
inputFormat:org.apache.hadoop.hive.ql.io.RCFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.RCFileOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
parameters:{}), partitionKeys:[],
parameters:{transient_lastDdlTime=1277418576}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)

Time taken: 0.414 seconds

 

 

hive> describe extended t2;        

OK

c1      int

c2      string

                

Detailed Table Information      Table(tableName:t2, dbName:default,
owner:viraj, createTime:1277334757, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:c1, type:int, comment:null),
FieldSchema(name:c2, type:string, comment:null)],
location:hdfs://nn1/user/viraj/t2table,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
parameters:{}), partitionKeys:[],
parameters:{EXTERNAL=TRUE,transient_lastDdlTime=1277334757},
viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)

Time taken: 0.244 seconds
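
For anyone who wants to reproduce this, the two tables could be recreated roughly as follows. This is only a sketch inferred from the describe output above, not the original DDL: the column name id for idtablerc is taken from the column listing (the detailed table information shows yuid instead), and default field delimiters are assumed for t2.

```sql
-- Sketch only: reconstructed from the "describe extended" output, not the original DDL.

-- Managed table stored as RCFile with the columnar SerDe.
-- Column name "id" is an assumption (the detailed information lists "yuid").
CREATE TABLE idtablerc (id STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE
LOCATION 'hdfs://nn1/projects/idtablerc';

-- External text table; default delimiters are assumed.
CREATE EXTERNAL TABLE t2 (c1 INT, c2 STRING)
STORED AS TEXTFILE
LOCATION 'hdfs://nn1/user/viraj/t2table';
```

The main difference between the two is the storage format: idtablerc uses RCFileInputFormat with ColumnarSerDe, while t2 uses TextInputFormat with LazySimpleSerDe.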

 

 

I try to join these tables as follows:

 

This works:

 

select /*+ MAPJOIN(t2) */ t2.c1, t2.c2 from t2 join idtablerc on (t2.c2
= idtablerc.id);

 

 

This fails:

select /*+ MAPJOIN(idtablerc) */ t2.c1, t2.c2 from t2 join idtablerc on
(t2.c2 = idtablerc.id);

with the following exception:

Caused by: java.io.IOException: java.io.EOFException
            at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)

 

Both tables are less than 1 MB in size.

Are there any restrictions on the type of table (for example, an RCFile table) that can be named in the MAPJOIN hint?

 

 

Viraj
