HBase, mail # user - Problems while exporting from Hbase to CSV file


Vimal Jain 2013-06-27, 07:11
Azuryy Yu 2013-06-27, 07:21
Azuryy Yu 2013-06-27, 07:22
Michael Segel 2013-06-27, 22:32
Anoop John 2013-06-28, 12:23
Re: Problems while exporting from Hbase to CSV file
Michael Segel 2013-06-28, 12:45
Yeah, that's the point.

You fetch, you iterate through the returned set, you get the next batch.
The only way he could get OOM is in his code.
On Jun 28, 2013, at 7:23 AM, Anoop John <[EMAIL PROTECTED]> wrote:

>> so i can not use default scan() constructor as it will scan whole
> table in one go which results in OutOfMemory error in client process
>
> Not getting what you mean by this. The client calls next() on the Scanner
> and gets the rows. setCaching() and setBatch() determine how much data
> (rows, cells) will be retrieved from the RS to the client in one next() call
> to the server. So if caching is set to 100, you will have 100 rows in the
> ClientScanner cache. Which version are you using? In older versions the
> caching default was only 1; later it was changed to 100.
>
>
> -Anoop-
>
> On Fri, Jun 28, 2013 at 4:02 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Phoenix, Hive, Pig, Java would all work.
>> But to Azuryy Yu's post...
>>
>> The OP is doing a simple scan() to get rows.
>> If the OP is hitting an OOM exception then it's a code issue on the part of
>> the OP.
>>
>>
>> On Jun 27, 2013, at 2:22 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>>
>>> Sorry, maybe Phoenix is not suitable for you.
>>>
>>>
>>> On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> 1) Use Scan.setCaching() to specify the number of rows for caching that
>>>> will be passed to scanners.
>>>>   And what's your block cache size?
>>>>
>>>>   But if the OOM is on the client, not the server side, then I don't think
>>>> this is Scan related; please check your client code.
>>>>
>>>> 2) We cannot add a default value from HBase, but you can add it on your
>>>> client when iterating over the Result.
>>>>
>>>> Also, you can use Phoenix, which is cool for your scenario.
>>>> https://github.com/forcedotcom/phoenix
>>>>
>>>>
>>>>
>>>> On Thu, Jun 27, 2013 at 3:11 PM, Vimal Jain <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to export from HBase to a CSV file.
>>>>> I am using the "Scan" class to scan all data in the table,
>>>>> but I am facing some problems while doing it.
>>>>>
>>>>> 1) My table has around 1.5 million rows and around 150 columns for each
>>>>> row, so I cannot use the default Scan() constructor, as it will scan the
>>>>> whole table in one go, which results in an OutOfMemory error in the client
>>>>> process. I heard of using setCaching() and setBatch(), but I am not able
>>>>> to understand how they will solve the OOM error.
>>>>>
>>>>> I thought of providing startRow and stopRow in the Scan object, but I want
>>>>> to scan the whole table, so how will this help?
>>>>>
>>>>> 2) As HBase stores data for a row only when we explicitly provide it, and
>>>>> there is no concept of a default value as found in an RDBMS, I want to have
>>>>> each and every column in the CSV file I generate for every user. In case
>>>>> column values are not there in HBase, I want to use default values for
>>>>> them (I have a list of default values for each column). Is there any method
>>>>> in the Result class or any other class to accomplish this?
>>>>>
>>>>>
>>>>> Please help here.
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Vimal Jain
>>>>>
>>>>
>>>>
>>
>>
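
The two suggestions in this thread (iterate the scanner and write each row out as you go, and fill in default values on the client) can be sketched roughly as below. This is only a minimal illustration, not anyone's actual code from the thread. To keep it self-contained it does not use the HBase client at all: each row is modelled as a Map from column name to value, and the class name CsvExportSketch and the methods escape, toCsvLine and export are hypothetical names made up for the sketch. In a real client the iterator would presumably come from the ResultScanner returned by HTable.getScanner(scan), after calling scan.setCaching(...) on the Scan, with each Result converted to such a map.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rows are plain maps standing in for HBase Result objects.
class CsvExportSketch {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Build one CSV line. Every column listed in `defaults` is emitted, in the
    // iteration order of `defaults`; a column absent from `row` gets its default.
    static String toCsvLine(Map<String, String> row, Map<String, String> defaults) {
        StringBuilder line = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, String> column : defaults.entrySet()) {
            if (!first) {
                line.append(',');
            }
            first = false;
            String value = row.get(column.getKey());
            line.append(escape(value != null ? value : column.getValue()));
        }
        return line.toString();
    }

    // Write rows one at a time as they are iterated, so only the current row
    // (plus whatever the underlying scanner caches) is held in client memory.
    static long export(Iterator<Map<String, String>> rows,
                       Map<String, String> defaults, Writer out) {
        long written = 0;
        try {
            while (rows.hasNext()) {
                out.write(toCsvLine(rows.next(), defaults));
                out.write('\n');
                written++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // Column order and default values (made-up example columns).
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("name", "unknown");
        defaults.put("city", "n/a");

        Map<String, String> complete = new HashMap<>();
        complete.put("name", "Vimal");
        complete.put("city", "Pune, IN");

        Map<String, String> sparse = new HashMap<>();
        sparse.put("name", "Anoop"); // no "city" cell stored for this row

        StringWriter out = new StringWriter();
        export(Arrays.asList(complete, sparse).iterator(), defaults, out);
        System.out.print(out);
    }
}
```

The point of the design is that nothing accumulates on the client: rows are never collected into a list before writing, which is the usual way a full-table export runs out of memory. Using a LinkedHashMap for the defaults fixes the column order of the CSV file.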