Hive >> mail # user >> Accessing Table Properties from InputFormat


Re: Accessing Table Properties from InputFormat
On Tue, May 28, 2013 at 9:27 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> The question we are diving into is how much of hive is going to be
> designed around edge cases? Hive really was not made for columnar formats,
> or self describing data-types. For the most part it handles them fairly
> well.
>

I don't view columnar or self-describing data types as an edge case. I
think in a couple years, the various columnar stores (ORC, Parquet, or new
ones) and text will be the primary formats. Given the performance advantage
of binary formats, text should only be used for staging tables.

> I am not sure what I believe about refactoring all of hive's guts. How
> much refactoring and complexity are we going to add to support special
> cases? I do not think we can justify sweeping API changes for the sake of
> one new input format, or something that can be done in some other way.
>

The problem is actually much bigger. We have a wide range of nested
abstractions for input/output that all interact in various ways.

org.apache.hadoop.mapred.InputFormat
org.apache.hadoop.hive.ql.io.HiveInputFormat
org.apache.hadoop.hive.ql.meta.HiveStorageHandler
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
org.apache.hadoop.hive.serde2.SerDe

I would suggest that there is a lot of confusion about the current state of
what is allowed and what will break things. Furthermore, because critical
functionality like accessing table properties, partition properties,
columnar projection, and predicate pushdown has been added incrementally,
it isn't at all clear to users what is available or how to take
advantage of it.

-- Owen