Hi all Kudu developers,
I have been using Kudu in projects, and it has been amazing. A few projects have
recently raised requirements for using Kudu to store large binary files
(images, documents, etc.). We used to propose Kudu + HDFS (or another file
system) as a workaround, but it is not really a good solution. The main
scenarios are:
1). Use Kudu as the only storage layer. As we store larger amounts of
data and grow the Kudu cluster, the cluster should support both
structured and unstructured data to avoid managing another storage tier for
images or documents.
2). It would be great to simplify the architecture from the business
application's point of view by having a single data access layer (either at
the Impala/Spark SQL level or at the Kudu API level) to manage a business
data object or entity and its related images/documents.
We are thinking about ways to extend Kudu to support large files: either
through the current Binary data type, which has a size limitation (64K) due
to known issues, or by introducing a new data type like BLOB for storing
images or documents ranging from a few hundred KBs to a few MBs, or by
extending the Kudu API to store the files in a file system (which might be
more suitable for even larger files). Many relational and NoSQL databases,
such as HBase, Cassandra, and MapR-DB, offer different levels of support or
different designs.
I'd like to ask for your feedback or opinions:
1). Do you have a need to store larger content (like images or documents)
in Kudu (at the MB level)?
2). Do you have any opinions on storing large content inside the
database versus in a file system?
Your comments are much appreciated. Thanks!