HBase, mail # user - Large Files in Column Qualifier


RE: Large Files in Column Qualifier
Vladimir Rodionov 2013-09-21, 16:38
HBase is not a file store, and it was not designed to be one. Depending on your usage pattern, I would suggest another approach:

Store your files in large "upload bundles" on HDFS. You will need one or more collector processes for that. Store references (upload file name, offset, and size)
in HBase.
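
The bundle-plus-reference idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested HDFS/HBase implementation: a local file stands in for the bundle on HDFS, a plain dict stands in for the HBase table, and the names BundleRef, BundleWriter, read_ref and demo are hypothetical.

```python
import os
import tempfile
from typing import Dict, NamedTuple


class BundleRef(NamedTuple):
    """Reference to one file inside a bundle (what would be stored in HBase)."""
    bundle: str   # bundle file name
    offset: int   # byte offset where the payload starts
    size: int     # payload length in bytes


class BundleWriter:
    """Collector sketch: appends payloads to one bundle file.

    Assumption: a local file stands in for a bundle file on HDFS.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._fh = open(path, "ab")

    def append(self, payload: bytes) -> BundleRef:
        # Record the current end of the bundle, then append the payload there.
        self._fh.seek(0, os.SEEK_END)
        offset = self._fh.tell()
        self._fh.write(payload)
        self._fh.flush()
        return BundleRef(self.path, offset, len(payload))

    def close(self) -> None:
        self._fh.close()


def read_ref(ref: BundleRef) -> bytes:
    """Fetch one file back using only its (bundle, offset, size) reference."""
    with open(ref.bundle, "rb") as fh:
        fh.seek(ref.offset)
        return fh.read(ref.size)


def demo() -> Dict[str, bytes]:
    # Assumption: this dict stands in for an HBase table keyed by row key.
    index: Dict[str, BundleRef] = {}
    with tempfile.TemporaryDirectory() as tmp:
        writer = BundleWriter(os.path.join(tmp, "bundle-0001.bin"))
        index["client-a/small.csv"] = writer.append(b"id,value\n1,42\n")
        index["client-b/large.raw"] = writer.append(b"\x00" * 4096)
        writer.close()
        # Read every file back through its stored reference.
        return {key: read_ref(ref) for key, ref in index.items()}
```

With this layout, many small files share one large bundle file, and only the small fixed-size references live in the table.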

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Geovanie Marquez [[EMAIL PROTECTED]]
Sent: Saturday, September 21, 2013 6:05 AM
To: [EMAIL PROTECTED]
Subject: Large Files in Column Qualifier

I am evaluating an HBase design that would, on rare occasions, need to
house a 1GB file in the column qualifier. Files range from 1KB to 1GB.
These are raw files ingested from clients, to be kept for some period of
time (several years) for quality control purposes. The application does not
depend on these files being in HBase; the files would be used by QA
personnel for data forensics to find out why data behaved unexpectedly in
the app or in our QC processes. That being said, a lot of the reasons I've
read for not keeping the data in HBase don't apply (compaction storms,
performance degradation), since we can throttle how we place the data in
here.

I'd like to use HBase rather than a plain HDFS solution because it offers
potential for indexing the data later and for analysis over the total data
population. It also suits the case where we receive tiny KB-sized files
more often than not, which on HDFS would strain the NameNode's memory.
I could HAR these in HDFS, but then indexing and more flexible options for
data analysis go out the window.

Does anyone see some glaring oversight I may be making in this design
consideration?

Thanks for your time.
