Thanks for your response.
My table structure is:
CREATE EXTERNAL TABLE test_textfile (
) PARTITIONED BY (
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3://test/textfile/';
I am using block-level compression with BZip2Codec for the output.
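For context, these are the session settings I would expect to enable block-level bzip2 output in Hive at that time (property names are the pre-YARN `mapred.*` ones and may differ by Hadoop version):

```sql
-- Compress the output of the job that writes the text table
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
```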
b) With the same set of columns, I changed only STORED AS ORC to create the ORC table. I am not using any compression option.
c) Inserted 7,256,852 records into both tables.
d) Space occupied in S3:
ORC (3 files): 153.4 MB * 3 = 460.2 MB
TEXT (single file in bz2 format): 306 MB
I need to check ORC with compression enabled.
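As a sketch of what I plan to try: in Hive 0.11, ORC compression can be set explicitly through table properties (ZLIB is, as far as I know, the default; the table name and location below are placeholders):

```sql
CREATE EXTERNAL TABLE test_orc (
) PARTITIONED BY (
) STORED AS ORC
LOCATION 's3://test/orcfile/'
TBLPROPERTIES ("orc.compress" = "ZLIB");  -- or "SNAPPY" / "NONE"
```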
Please let me know if I have missed anything.
On Mon, Aug 12, 2013 at 8:50 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> I've never seen a table that was larger with ORC than with text. Can you
> share your text file's schema with us? Is the table very small? How many
> rows and GB are the tables? The overhead for ORC is typically small, but as
> Ed says it is possible for rare cases for the overhead to dominate the data
> size itself.
> -- Owen
> On Mon, Aug 12, 2013 at 6:52 AM, pandees waran <[EMAIL PROTECTED]> wrote:
>> Thanks Edward. I shall try compression with ORC and let you know. It
>> also looks like the CPU usage is lower when querying ORC rather than
>> the text file.
>> But the total time taken by the query is slightly higher for ORC than
>> for the text file. Could you please explain the difference between
>> cumulative CPU time and the total time taken (usually shown in the last
>> line, in seconds)? Which one should we give preference to?
>> On Aug 12, 2013 7:01 PM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:
>>> Columnar formats do not always beat row-wise storage. Many times gzip
>>> plus block storage will compress something better than columnar storage,
>>> especially when you have repeated data in different columns.
>>> Based on what you are saying, it could be possible that you missed a
>>> setting and the ORC files are not compressed.
>>> On Monday, August 12, 2013, pandees waran <[EMAIL PROTECTED]> wrote:
>>> > Hi,
>>> > Currently, we use the TEXTFILE format in Hive 0.8 when creating
>>> > external tables for intermediate processing.
>>> > I have read about ORC in 0.11, and I have created the same table in
>>> > 0.11 with the ORC format.
>>> > Without any compression, the ORC files (3 in total) occupied about
>>> > twice the space of the TEXTFILE (a single file).
>>> > Even when I query the data from ORC:
>>> > SELECT COUNT(*) FROM orc_table
>>> > it took more time than the same query against the text file.