Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # dev >> Review Request 16812: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


Copy link to this message
-
Re: Review Request 16812: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> >

Would it be ok to have an interface and an abstract class representing the intermediate data format? This seems like a developer experience thing. Interfaces seem to define an API where as an abstract class provides some implemention. An example of this is the Collection interface and the AbstractCollection class.
> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java, line 111
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line111>
> >
> >     Won't work with: 1,"Hi, here is Jarcec"
> >    
> >     Potentially use apache commons csv?
>
> Jarek Cecho wrote:
>     I'm afraid that Apache Commons CSV do not have any official build that we can use. Is there any light other CSV parsing library?

Yikes. I didn't catch that... folks on the interwebz seem open to using opencsv.
> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java, line 113
> > <https://reviews.apache.org/r/16812/diff/2/?file=421141#file421141line113>
> >
> >     "Can we make this a Class instead of String?"
> >    
> >     It seems we want a string representation of the classname so that we can pass it to different contexts during the execution of the job?
>
> Jarek Cecho wrote:
>     We are using "Class" for this purpose on other places and it seems more natural then string. Having class instead of string will also ensure that the class is indeed available in the JVM (e.g. fail sooner rather then later).

Hmm I think I'm missing something. What do you mean make it a class? It's included in the form of a jar and the full class name (org.apache...) is the string we're passing right? That way we can build an instance of this class when the ETL framework is running.

Is there another way to do it?
> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java, lines 313-315
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line313>
> >
> >     The conditional escaped argument seems quite dangerous to me - what if the user has saved the data on disk with escaped = true and is using this class to read the data from disk? The default value is false, so this will remove some actual data, right?
>
> Abraham Elmahrek wrote:
>     In the case of CSV, do you feel that it should always be escaped? That does make sense so that we don't run into any issues, but it also would require the consumer of the imported data to always unescape.
>
> Jarek Cecho wrote:
>     The intermediate data format should be very strict without much configuration options to make it as fast as possible. Having said that serialization format on disk is a different question.

It seems if we remove "escaped", we can escape all data without fear. The output format should be able to unescape later.
- Abraham
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16812/#review31600
-----------------------------------------------------------
On Jan. 13, 2014, 1:17 a.m., Abraham Elmahrek wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16812/
> -----------------------------------------------------------
>
> (Updated Jan. 13, 2014, 1:17 a.m.)
>
>
> Review request for Sqoop.
>
>
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> Previous review from hari is in first iteration of this review only.
>
> commit 1c5122611fbd4a46a629421f7c55746cfc14f136
> Author: Hari Shreedharan <[EMAIL PROTECTED]>
> Date:   Fri Jan 10 15:07:08 2014 -0800