Re: Problems while exporting from Hbase to CSV file
Phoenix, Hive, Pig, Java would all work.
But to Azuryy Yu's post...

The OP is doing a simple scan() to get rows.
If the OP is hitting an OOM exception, then it's a code issue on the part of the OP.
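
To make that concrete, here is a minimal sketch of a streaming export in the plain-Java style the OP is already using, assuming the 0.94-era HBase client API that was current when this thread ran. The table name, output path, class name, and the caching/batch values are illustrative placeholders, not anything taken from the thread. The point is that a scanner streams rows, so the client only needs memory for one fetch at a time. A second sketch after the quoted thread below addresses the default-value question.

import java.io.BufferedWriter;
import java.io.FileWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCsvExport {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");   // placeholder table name

        Scan scan = new Scan();          // no startRow/stopRow: we do want the whole table
        scan.setCaching(500);            // rows fetched per RPC; caps client memory per round trip
        scan.setBatch(150);              // max cells per Result; useful when rows are wide
        scan.setCacheBlocks(false);      // a one-off full scan would just churn the server block cache

        ResultScanner scanner = table.getScanner(scan);
        BufferedWriter out = new BufferedWriter(new FileWriter("export.csv"));
        try {
            // Rows stream through here; only about 'caching' rows exist in
            // client memory at once. OOM happens when code collects every
            // Result in a list instead of writing each one out immediately.
            for (Result r : scanner) {
                out.write(Bytes.toString(r.getRow()));  // column formatting elided; see the next sketch
                out.newLine();
            }
        } finally {
            scanner.close();
            out.close();
            table.close();
        }
    }
}

Note that startRow and stopRow are beside the point here: they only restrict the key range, and a default Scan already covers the whole table. What keeps memory flat is writing each row out as it arrives rather than accumulating 1.5 million Results.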
On Jun 27, 2013, at 2:22 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:

> Sorry, maybe Phoenix is not suitable for you.
>
>
> On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>
>> 1) Use Scan.setCaching() to specify the number of rows per fetch that
>> will be passed to scanners.
>>    Also, what's your block cache size?
>>
>>    But if the OOM is on the client, not the server side, then I don't think it is
>> Scan related; please check your client code.
>>
>> 2) We cannot add default values from HBase, but you can add them on your
>> client when iterating the Result.
>>
>> Also, you can use Phoenix, which is a good fit for your scenario:
>> https://github.com/forcedotcom/phoenix
>>
>>
>>
>> On Thu, Jun 27, 2013 at 3:11 PM, Vimal Jain <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>> I am trying to export from HBase to a CSV file.
>>> I am using the "Scan" class to scan all the data in the table,
>>> but I am facing some problems while doing it.
>>>
>>> 1) My table has around 1.5 million rows and around 150 columns per
>>> row, so I cannot use the default scan() constructor, as it will scan the
>>> whole table in one go, which results in an OutOfMemory error in the client
>>> process. I have heard of using setCaching() and setBatch(), but I am not
>>> able to understand how they will solve the OOM error.
>>>
>>> I thought of providing startRow and stopRow in the Scan object, but I
>>> want to scan the whole table, so how will this help?
>>>
>>> 2) HBase stores data for a row only when we explicitly provide it, and
>>> there is no concept of a default value as found in an RDBMS. I want to
>>> have each and every column in the CSV file I generate for every user.
>>> Where column values are not present in HBase, I want to use default
>>> values for them (I have a list of default values for each column). Is
>>> there any method in the Result class or any other class to accomplish this?
>>>
>>>
>>> Please help here.
>>>
>>> --
>>> Thanks and Regards,
>>> Vimal Jain
>>>
>>
>>
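
On the second question, Result really does have no built-in default-value mechanism, so the substitution has to happen in client code, as Azuryy describes above. Below is a minimal sketch under the same 0.94-era API assumption; the column family, the qualifier names ("age", "city"), and their defaults are invented placeholders standing in for the OP's roughly 150 columns.

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvRowFormatter {
    private final byte[] family;
    // Insertion-ordered so every CSV line has the same column order.
    private final Map<String, String> defaults = new LinkedHashMap<String, String>();

    public CsvRowFormatter(String familyName) {
        this.family = Bytes.toBytes(familyName);
        defaults.put("age", "0");          // hypothetical column and default
        defaults.put("city", "unknown");   // hypothetical column and default
        // ... one entry per column, ~150 of them in the OP's case
    }

    /** One CSV line for one row, substituting the default wherever a cell is absent. */
    public String format(Result r) {
        StringBuilder sb = new StringBuilder(Bytes.toString(r.getRow()));
        for (Map.Entry<String, String> col : defaults.entrySet()) {
            byte[] cell = r.getValue(family, Bytes.toBytes(col.getKey()));
            // getValue() returns null for a missing cell; fall back by hand.
            sb.append(',').append(cell != null ? Bytes.toString(cell) : col.getValue());
        }
        return sb.toString();
    }
}

Wired into the scan loop from the first sketch as out.write(formatter.format(r)), this emits every column for every user, defaults included, while still holding only one row at a time. Proper CSV quoting of values containing commas is left out for brevity.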