HBase user mailing list


Re: Problems while exporting from Hbase to CSV file
Phoenix, Hive, Pig, or plain Java would all work.
But regarding Azuryy Yu's post...

The OP is doing a simple scan() to get rows.
If the OP is hitting an OOM exception, then it's a code issue on the OP's side.
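A minimal sketch of a full-table export that does not run the client out of memory, assuming the OOM comes from collecting rows (or the whole CSV) in memory before writing them out. The `export` helper and the `fakeScanner` stand-in are hypothetical. With HBase, `rows` would be the `ResultScanner` returned by `table.getScanner(scan)`, which is an `Iterable<Result>` that pulls `setCaching(n)` rows per RPC, so only roughly one cache-full of rows is held on the client at a time. Note that `setBatch(n)` caps the number of cells per `Result`, so a 150-column row may come back as several partial `Result`s; an exporter that writes one CSV line per `Result` would then split rows, so tuning caching alone is probably the simpler option here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class StreamingCsvExport {

    // Writes one CSV line per row as soon as the row is read; nothing is
    // collected into an intermediate list, so memory does not grow with the
    // number of rows. With HBase, `rows` would be a ResultScanner.
    static <T> long export(Iterable<T> rows, Function<T, String> toCsvLine, Writer out) {
        long count = 0;
        try {
            for (T row : rows) {
                out.write(toCsvLine.apply(row));
                out.write('\n');
                count++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a scanner: rows are produced lazily, one at a time.
        Iterable<Integer> fakeScanner = () -> new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < 5;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };

        StringWriter sw = new StringWriter();
        long n = export(fakeScanner, i -> "row" + i + "," + (i * i), sw);
        System.out.print(sw);
        System.out.println("rows written: " + n);

        // Hypothetical HBase wiring (needs hbase-client; not compiled here).
        // `table` and `toCsv` are assumed to exist; 500 is an assumed caching value.
        //   Scan scan = new Scan();
        //   scan.setCaching(500);
        //   try (ResultScanner scanner = table.getScanner(scan);
        //        Writer w = new BufferedWriter(new FileWriter("export.csv"))) {
        //       export(scanner, result -> toCsv(result), w);
        //   }
    }
}
```

Because each line is written as soon as its row arrives, memory use stays independent of the table's 1.5 million rows, and startRow/stopRow are not needed for a full-table export.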
On Jun 27, 2013, at 2:22 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:

> Sorry, maybe Phoenix is not suitable for you.
>
>
> On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>
>> 1) Use Scan.setCaching() to specify the number of rows the scanner fetches
>> per RPC and caches on the client.
>>    And what's your block cache size?
>>
>>    But if the OOM is on the client rather than the server side, I don't
>> think it is Scan related; please check your client code.
>>
>> 2) We cannot add default values in HBase, but you can add them in your
>> client code when iterating over the Result.
>>
>> Also, you could use Phoenix; it is well suited to your scenario.
>> https://github.com/forcedotcom/phoenix
>>
>>
>>
>> On Thu, Jun 27, 2013 at 3:11 PM, Vimal Jain <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>> I am trying to export from HBase to a CSV file.
>>> I am using the "Scan" class to scan all data in the table.
>>> But I am facing some problems while doing it.
>>>
>>> 1) My table has around 1.5 million rows and around 150 columns per
>>> row, so I cannot use the default scan() constructor, as it will scan the
>>> whole table in one go, which results in an OutOfMemory error in the client
>>> process. I have heard of using setCaching() and setBatch(), but I do not
>>> understand how they would solve the OOM error.
>>>
>>> I thought of providing startRow and stopRow in the Scan object, but I want
>>> to scan the whole table, so how would this help?
>>>
>>> 2) As HBase stores data for a row only when we explicitly provide it, and
>>> there is no concept of a default value as found in an RDBMS, I want to have
>>> each and every column in the CSV file I generate for every user. In case
>>> column values are not present in HBase, I want to use default values for
>>> them (I have a list of default values for each column). Is there any method
>>> in the Result class or any other class to accomplish this?
>>>
>>>
>>> Please help here.
>>>
>>> --
>>> Thanks and Regards,
>>> Vimal Jain
>>>
>>
>>
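For point 2, a minimal sketch of filling in defaults on the client side while iterating, as suggested in the thread, assuming all exported columns live in one column family and that qualifiers and values are stored as UTF-8 text. The `defaults` map, the column names, and the family name `cf` are hypothetical. The helper takes a map shaped like the one `Result.getFamilyMap(family)` returns (qualifier bytes to latest value bytes) and emits every configured column in a fixed order, falling back to the default when a qualifier is absent.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvWithDefaults {

    // Quotes a field that contains a comma, double quote, or line break,
    // doubling any embedded quotes (RFC 4180 style).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Builds one CSV line: the row key, then every column in the order of
    // `defaults`, using the stored value if present and the default otherwise.
    // `familyMap` (qualifier bytes -> value bytes) may be null, meaning no cells.
    // Assumes qualifiers and values are UTF-8 text.
    static String toCsvLine(String rowKey, Map<byte[], byte[]> familyMap,
                            LinkedHashMap<String, String> defaults) {
        // byte[] keys compare by identity, so re-key stored cells by qualifier name.
        Map<String, String> present = new HashMap<>();
        if (familyMap != null) {
            for (Map.Entry<byte[], byte[]> cell : familyMap.entrySet()) {
                present.put(new String(cell.getKey(), StandardCharsets.UTF_8),
                            new String(cell.getValue(), StandardCharsets.UTF_8));
            }
        }
        List<String> fields = new ArrayList<>();
        fields.add(escape(rowKey));
        for (Map.Entry<String, String> column : defaults.entrySet()) {
            fields.add(escape(present.getOrDefault(column.getKey(), column.getValue())));
        }
        return String.join(",", fields);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical column list with per-column defaults, in CSV column order.
        LinkedHashMap<String, String> defaults = new LinkedHashMap<>();
        defaults.put("name", "unknown");
        defaults.put("age", "0");
        defaults.put("city", "n/a");

        // A sparse row: "age" was never written for this user.
        Map<byte[], byte[]> familyMap = new HashMap<>();
        familyMap.put(utf8("name"), utf8("alice"));
        familyMap.put(utf8("city"), utf8("Springfield, IL"));

        System.out.println(toCsvLine("user1", familyMap, defaults));
        // prints: user1,alice,0,"Springfield, IL"

        // Hypothetical HBase wiring (needs hbase-client; family "cf" is an assumption):
        //   String line = toCsvLine(Bytes.toString(result.getRow()),
        //                           result.getFamilyMap(Bytes.toBytes("cf")), defaults);
    }
}
```

Because the column order comes from the `LinkedHashMap`, every CSV line has the same columns in the same order, even for rows where most cells were never written.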