Sqoop >> mail # user >> Re: sqoop job


Hi Jay, this use-case seems to be beyond the scope of Sqoop, which is meant
to just transfer data between a structured datastore and Hadoop. Including
[EMAIL PROTECTED] to solicit more opinions.

Regards, Kate

On Mon, Apr 22, 2013 at 11:04 PM, jaikumar krishna <[EMAIL PROTECTED]> wrote:

> Thanks Kate,
>
> My use case is as follows.
> I have two input tables, Table1 and Table2. Table1 (the master) has
> 25 lakh (2.5 million) records with these fields: company name, address,
> city, state, zip, phone number, fax, mail ID, company website URL.
>
> Table2 has 5 lakh (500,000) records with the same fields as Table1.
> I want to check whether each Table2 record matches a Table1 record, to
> verify whether it is correct.
>
> Before matching, I have to apply normalizations like the ones below.
>
> Company name                 =>   Normalized company name
> Century Tool & Gage          =>   Century Tool and Gage
> News-Gazette Printing Co     =>   News Gazette Printing
> Punch Networks Inc           =>   Punch Networks
> Omni Print Inc               =>   Omni Print
>
> For the Address_1 column:
> Address_1                    =>   Address_1_Normalized
> 15 Sproat St                 =>   15 Sproat Street
> 1 Preble Rd                  =>   1 Preble Road
> 90 Everett Ave               =>   90 Everett Avenue
>
> Please see the attached Excel sheet for the normalization of the remaining
> fields. (Both tables are normalized before verifying.)
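
The normalization examples quoted above could be sketched in Python roughly as follows. The rule tables here are hypothetical and cover only the rows shown in the message; the full rules are in the attached spreadsheet, which is not reproduced in this thread, so treat the suffix and abbreviation lists as assumptions.

```python
# Hypothetical rule tables built only from the examples in the message.
# The real rules are in the attached spreadsheet and may differ.
COMPANY_SUFFIXES = {"inc", "co"}
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def normalize_company(name: str) -> str:
    """Expand '&' to 'and', turn hyphens into spaces, drop a trailing suffix."""
    name = name.replace("&", " and ").replace("-", " ")
    words = name.split()  # split() also collapses repeated spaces
    if words and words[-1].rstrip(".").lower() in COMPANY_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def normalize_address(address: str) -> str:
    """Expand a trailing street-type abbreviation such as 'St' or 'Rd'."""
    words = address.split()
    if words:
        key = words[-1].rstrip(".").lower()
        if key in STREET_ABBREVIATIONS:
            words[-1] = STREET_ABBREVIATIONS[key]
    return " ".join(words)
```

Both tables would be passed through the same functions before any comparison, so that matching sees only normalized values.
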
>
> Then I have some conditions for result accuracy, based on scoring how well
> the fields of each entity match:
>
> 1. company name == 100 and (address == 100 or phone number == 100)
> 2. company name >= 75 and address >= 75 and city == 100 and state >= 100
>
> If either condition is satisfied, I can mark the record as verified.
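
The two verification conditions quoted above can be written as a small predicate. This is only a sketch: the `FieldScores` class and `is_verified` function are hypothetical names, and it assumes each field score is a number from 0 to 100, reading the state comparison in the second condition as "at least 100".

```python
from dataclasses import dataclass


@dataclass
class FieldScores:
    """Per-field match scores, assumed to range from 0 to 100."""
    company: float
    address: float
    phone: float
    city: float
    state: float


def is_verified(s: FieldScores) -> bool:
    """Return True if either of the two quoted conditions holds."""
    # Condition 1: exact company match plus exact address or exact phone.
    rule_1 = s.company == 100 and (s.address == 100 or s.phone == 100)
    # Condition 2: close company and address, full city and state match.
    rule_2 = (s.company >= 75 and s.address >= 75
              and s.city == 100 and s.state >= 100)
    return rule_1 or rule_2
```
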
>
> In the other case, if the company name and phone number do not match
> anything in Table1, I can add the record as a new entity (meaning it is
> not in Table1).
>
> I have attached sample records of Table1 and Table2 and my current output
> (which includes the scores). My current process, without Hadoop, takes
> more and more time.
>
>
> I hope you understand my use case.
>
> The main problem is: how can I compare each row, which has 6 fields (company
> name, city, street, state, phone, mail ID), with another table, get a score,
> and finally take the max? I am totally frustrated.
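
As a rough illustration of the row-by-row comparison asked about above, the sketch below scores one Table2 row against every Table1 row and keeps the highest-scoring pair. The message does not say how scores are computed, so the use of `difflib.SequenceMatcher` from the Python standard library is an assumption, as are the dictionary keys in `FIELDS` and the choice to rank by the sum of field scores.

```python
from difflib import SequenceMatcher

# Hypothetical keys for the six fields named in the message.
FIELDS = ("company", "street", "city", "state", "phone", "mailid")


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score; 100 means identical, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_row(candidate: dict, master: dict) -> dict:
    """Score every field of a Table2 row against one Table1 row."""
    return {f: similarity(candidate[f], master[f]) for f in FIELDS}


def best_match(candidate: dict, master_rows: list) -> tuple:
    """Return (master_row, field_scores) with the highest total score.

    Raises ValueError if master_rows is empty.
    """
    scored = [(m, score_row(candidate, m)) for m in master_rows]
    return max(scored, key=lambda pair: sum(pair[1].values()))
```

This is a brute-force comparison of one row against the whole master table, so it shows the logic only and does nothing about the running time at 25 lakh by 5 lakh records.
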
>
> Thanks,
> Jay'
>
>
> On Tue, Apr 23, 2013 at 4:49 AM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>
>> Hi Jay, can you share your use-case behind verifying the table in
>> Sqoop rather than in HDFS? Generally speaking, you can verify if the
>> table transferred successfully by inspecting the file's contents via
>> issuing $ hadoop fs -cat <tablename>/part-m-00000
>>
>> You can also verify the return value from the Sqoop command ($ echo
>> $?), which should be 0.
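
The exit-status check described above can also be done from a script instead of `$ echo $?`. A minimal Python sketch, where `command_succeeded` is a hypothetical helper and the argument list is whatever Sqoop command you already run:

```python
import subprocess


def command_succeeded(argv: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0
```

Passing the same arguments you would type at the shell, as a list of strings, returns True only when the command's exit status is 0.
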
>>
>> Regards, Kate
>>
>>
>> On Monday, April 22, 2013 10:19:20 AM UTC-7, jaikumar krishna wrote:
>>>
>>> Hi,
>>>     how can I find out whether a table was moved successfully in Sqoop
>>> (not in HDFS)?
>>>
>>> Thanks,
>>> Jay'
>>>