Re: sqoop job


[Including [EMAIL PROTECTED]]
On Tue, Apr 23, 2013 at 12:21 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:

> Hi Jay, this use-case seems to be beyond the scope of Sqoop, which is
> meant to just transfer data between a structured datastore and
> Hadoop. Including [EMAIL PROTECTED] to solicit more opinions.
>
> Regards, Kate
>
>
> On Mon, Apr 22, 2013 at 11:04 PM, jaikumar krishna <[EMAIL PROTECTED]> wrote:
>
>> Thanks Kate,
>>
>> Here is my use case and what I am trying to do.
>> I have two input tables, Table1 and Table2. Table1 (the master) has
>> *"25 lakhs"* records with the fields *company name, address, city, state,
>> zip, phone number, fax, mail ID, company website URL*.
>>
>> Table2 has *"5 lakhs"* records with the same fields as Table1 (*company
>> name, address, city, state, zip, phone number, fax, mail ID, company
>> website URL*). I want to check whether the Table2 records match Table1, to
>> verify whether they are correct.
>>
>> Before matching, I have to apply normalizations like the ones below.
>>
>> *Company name                =>   Normalized company name*
>> Century Tool & Gage          =>   Century Tool and Gage
>> News-Gazette Printing Co     =>   News Gazette Printing
>> Punch Networks Inc           =>   Punch Networks
>> Omni Print Inc               =>   Omni Print
>>
>> For the Address_1 column:
>> *Address_1                   =>   Address_1_Normalized*
>> 15 Sproat St                 =>   15 Sproat Street
>> 1 Preble Rd                  =>   1 Preble Road
>> 90 Everett Ave               =>   90 Everett Avenue
>>
>> Kindly check the attached Excel sheet for *normalization of the remaining
>> fields*. (Both tables are normalized before verifying.)
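The normalization examples above could be sketched in Python roughly as follows. This is only an illustration: the replacement tables contain just the cases shown in the message, the function names are hypothetical, and the rules for the remaining fields (in the attached sheet) are not covered.

```python
# Assumed lookup tables, built only from the examples quoted in the message.
COMPANY_SUFFIXES = {"inc", "co"}
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def normalize_company(name):
    """Replace '&' with 'and', split hyphenated words, drop a trailing Inc/Co."""
    name = name.replace("&", " and ").replace("-", " ")
    words = name.split()
    # Remove trailing legal suffixes such as "Inc" or "Co." (case-insensitive).
    while words and words[-1].rstrip(".").lower() in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_address(address):
    """Expand a trailing street abbreviation, e.g. 'St' -> 'Street'."""
    words = address.split()
    if words:
        key = words[-1].rstrip(".").lower()
        if key in STREET_ABBREVIATIONS:
            words[-1] = STREET_ABBREVIATIONS[key]
    return " ".join(words)
```

For instance, normalize_company("News-Gazette Printing Co") gives "News Gazette Printing" and normalize_address("1 Preble Rd") gives "1 Preble Road". The same functions would be applied to both tables before any comparison.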
>>
>> Then I have some conditions for result accuracy, based on scoring the
>> entities by how well they match:
>>
>> *1. (company name == 100 and (address == 100 or phone number == 100))*
>> *2. (company name >= 75 and address >= 75 and city == 100 and state == 100)*
>>
>> If either condition is satisfied, I can mark the record as verified.
>>
>> In the other case,
>> *if the company name and phone number do not match Table1, I can add the
>> record as a new entity (meaning it is not in Table1).*
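The two conditions and the new-entity case above might be written as one small decision function. This is a sketch under assumptions: scores are taken to be integers from 0 to 100 per field, the dictionary keys and the function name are hypothetical, and the message does not say what score counts as "did not match", so the `no_match_below` cutoff is a placeholder to adjust.

```python
def classify(scores, no_match_below=75):
    """Classify one Table2 record from its per-field scores against Table1.

    `scores` maps "company", "address", "city", "state", "phone" to 0..100.
    `no_match_below` is an assumed cutoff for "did not match".
    """
    company = scores["company"]
    address = scores["address"]
    phone = scores["phone"]

    # Condition 1: exact company name plus exact address or exact phone number.
    rule1 = company == 100 and (address == 100 or phone == 100)
    # Condition 2: close company name and address, exact city and state.
    rule2 = (company >= 75 and address >= 75
             and scores["city"] == 100 and scores["state"] == 100)

    if rule1 or rule2:
        return "verified"
    # Neither company name nor phone number matches: treat as a new entity.
    if company < no_match_below and phone < no_match_below:
        return "new"
    # Anything else is left for manual review (an assumed third outcome).
    return "unresolved"
```

The "unresolved" outcome is not in the message; it only makes explicit that some records satisfy neither the verification conditions nor the new-entity case.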
>>
>> I have attached sample records of Table1 and Table2 and my current
>> output (which includes the scores from my current process; without Hadoop
>> it takes more and more time).
>>
>>
>> I hope you understand my use case.
>>
>> The main problem is: how can I compare each row, which has 6 fields
>> (company name, city, street, state, phone, mail ID), with the other table,
>> get a score, and finally get the max? I am totally frustrated.
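One way to picture "compare each row, score it, take the max" is the sketch below. It is not a Sqoop feature and not the poster's current process. It assumes rows are dictionaries with the six field names shown, uses Python's difflib ratio as a stand-in similarity score, and groups Table1 rows by state so that each Table2 row is compared only against rows from the same state instead of the whole table. The grouping key and all function names are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Assumed field names for the six compared columns.
FIELDS = ("company", "street", "city", "state", "phone", "mailid")


def similarity(a, b):
    """Stand-in score from 0 to 100 based on difflib's similarity ratio."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def build_index(master_rows, key="state"):
    """Group Table1 rows by a key so only plausible candidates are compared."""
    index = defaultdict(list)
    for row in master_rows:
        index[row[key].lower()].append(row)
    return index


def best_match(row, index, key="state"):
    """Return (best Table1 row, its per-field scores), or (None, None)."""
    best_row, best_scores, best_total = None, None, -1
    for candidate in index.get(row[key].lower(), []):
        scores = {f: similarity(row[f], candidate[f]) for f in FIELDS}
        total = sum(scores.values())
        if total > best_total:
            best_row, best_scores, best_total = candidate, scores, total
    return best_row, best_scores
```

Because every Table2 row is scored independently once the index is built, the same per-row logic could in principle run in parallel (for example as a map step over Table2), which is where Hadoop would come in after the data has been imported.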
>>
>> Thanks,
>> Jay'
>>
>>
>> On Tue, Apr 23, 2013 at 4:49 AM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jay, can you share your use-case behind verifying the table in
>>> Sqoop rather than in HDFS? Generally speaking, you can verify if the
>>> table transferred successfully by inspecting the file's contents via
>>> issuing $ hadoop fs -cat <tablename>/part-m-00000
>>>
>>> You can also verify the return value from the Sqoop command ($ echo
>>> $?), which should be 0.
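If the check is scripted rather than typed at a shell, the exit-status test described above could look like this minimal Python sketch. The helper name is hypothetical, and the command list is whatever Sqoop or Hadoop command is already being run; no particular flags are assumed.

```python
import subprocess


def command_succeeded(argv):
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(argv)
    # Exit status 0 is the success value that `echo $?` would show.
    return completed.returncode == 0
```

For example, command_succeeded(["hadoop", "fs", "-cat", "<tablename>/part-m-00000"]) would print the file contents and return True only if the command exited with 0; the same helper can wrap the Sqoop command itself.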
>>>
>>> Regards, Kate
>>>
>>>
>>> On Monday, April 22, 2013 10:19:20 AM UTC-7, jaikumar krishna wrote:
>>>>
>>>> Hi,
>>>>     How can I find out whether the table was moved successfully or not
>>>> in Sqoop (not in HDFS)?
>>>>
>>>> Thanks,
>>>> Jay'
>>>>