Hadoop user mailing list: Re: tar or hadoop archive


Manhee Jo 2011-07-07, 01:52
Rita 2011-06-27, 10:06
Joey Echeverria 2011-06-27, 14:10
Rita 2011-06-27, 23:36
Re: tar or hadoop archive
Yes, you can see a picture describing HAR files in this old blog post:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/
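Rita's question is whether a HAR keeps an index. It does: per the Hadoop Archives documentation, a `.har` directory holds metadata files (`_masterindex`, `_index`) alongside `part-*` data files that contain the archived bytes. The Python below is only a toy, in-memory model of that idea, written under our own assumptions. The helpers `pack` and `read_member` are hypothetical and do not reproduce Hadoop's actual index format or the `har://` filesystem.

```python
from typing import Dict, Tuple

# Hypothetical toy model (not Hadoop's on-disk format): one "part" blob holding
# all file bytes back to back, plus an index mapping name -> (offset, length).
Index = Dict[str, Tuple[int, int]]


def pack(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate file contents and record where each one starts."""
    part = bytearray()
    index: Index = {}
    for name, data in files.items():
        index[name] = (len(part), len(data))
        part.extend(data)
    return bytes(part), index


def read_member(part: bytes, index: Index, name: str) -> bytes:
    """Read one file directly via the index, without scanning the others."""
    offset, length = index[name]
    return part[offset:offset + length]


if __name__ == "__main__":
    part, index = pack({"a.txt": b"alpha\n", "b.txt": b"bravo\n", "c.txt": b"charlie\n"})
    print(index)  # {'a.txt': (0, 6), 'b.txt': (6, 6), 'c.txt': (12, 8)}
    print(read_member(part, index, "b.txt"))  # b'bravo\n'
```

The point of the sketch is that a lookup touches only one index entry and the byte range it names, which is roughly why a single member of a HAR can be read without unpacking the whole archive.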

-Joey

On Mon, Jun 27, 2011 at 4:36 PM, Rita <[EMAIL PROTECTED]> wrote:
> So, it keeps an index of the files?
>
> On Mon, Jun 27, 2011 at 10:10 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> The advantage of a Hadoop archive file is that it lets you access the
>> files stored in it directly. For example, if you archived three files
>> (a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
>> of the three files using the hadoop command line:
>>
>> hadoop fs -cat har:///user/joey/out/foo.har/a.txt
>>
>> You can also copy files out of the archive or use files in the archive
>> as input to MapReduce jobs.
>>
>> -Joey
>>
>> On Mon, Jun 27, 2011 at 3:06 AM, Rita <[EMAIL PROTECTED]> wrote:
>> > We use Hadoop/HDFS to archive data. I archive a lot of files by creating
>> > one large tar file and then placing it in HDFS. Is it better to use a
>> > Hadoop archive for this, or is it essentially the same thing?
>> >
>> > --
>> > --- Get your facts first, then you can distort them as you please.--
>> >
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
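For contrast with the single-tar approach in Rita's original question: a tar file has no central index, only a header in front of each member. A reader that cannot seek (Python's `tarfile` stream mode `"r|"`) must therefore walk the headers in order until it reaches the member it wants. The sketch below uses only the Python standard library to build a small tar in memory and count the headers visited. `build_tar` and `find_in_tar_stream` are hypothetical helpers, and the example says nothing about how HDFS itself stores or reads a tar.

```python
import io
import tarfile
from typing import Dict, Optional, Tuple


def build_tar(files: Dict[str, bytes]) -> bytes:
    """Create an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_in_tar_stream(blob: bytes, name: str) -> Tuple[Optional[bytes], int]:
    """Walk a tar sequentially; return (contents or None, headers visited)."""
    visited = 0
    # "r|" is stream mode: members can only be read in order, with no seeking back.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|") as tar:
        for member in tar:
            visited += 1
            if member.name == name:
                extracted = tar.extractfile(member)
                return (extracted.read() if extracted else None), visited
    return None, visited


if __name__ == "__main__":
    blob = build_tar({"a.txt": b"alpha\n", "b.txt": b"bravo\n", "c.txt": b"charlie\n"})
    print(find_in_tar_stream(blob, "c.txt"))  # (b'charlie\n', 3)
```

Finding the last member costs one header read per member in front of it; the index in a HAR is what removes that scan.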