|
Johnny Zhang
2012-11-03, 02:56
Mark Grover
2012-11-04, 04:59
Johnny Zhang
2012-11-09, 23:08
Johnny Zhang
2012-11-09, 23:10
Mark Grover
2012-11-10, 04:18
|
-
Review Request: implement a udf to keep hive session alive for certain amount of timeJohnny Zhang 2012-11-03, 02:56
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7848/ ----------------------------------------------------------- Review request for hive. Description ------- To make testing issues like HIVE-3590 convenient, we can implement a UDF to keep hive session alive for a given time. The patch introduce a new UDF sleep() which does this without introducing any data/load to cluster. This addresses bug HIVE-3666. https://issues.apache.org/jira/browse/HIVE-3666 Diffs ----- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1405251 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java PRE-CREATION Diff: https://reviews.apache.org/r/7848/diff/ Testing ------- have tested it with Hive CLI and Hive Server session, and it can keep them alive by the given seconds Thanks, Johnny Zhang
-
Re: Review Request: implement a udf to keep hive session alive for certain amount of timeMark Grover 2012-11-04, 04:59
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7848/#review13082 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28160> I don't have a strong opinion about this but is sleep the right name for this UDF? Sleep is how this UDF keeps the Hive session alive but it might not convey to a user what this UDF does. How about something like session_keep_alive? I am open to other suggestions as well. Again, not a deal-breaker:-) However, if you do decide to change the name, don't forget to change all references of "sleep" in the code (log statements, exception messages, etc.). http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28166> I am not too picky about this but is sleep the right name for this UDF? Sleep is how are keeping the session alive but would it be clear to a user using it by just seeing the name what this UDF does? How about something like session_keep_alive? If you do decide to change the name, make sure all references in the code (exception messages, etc.) to "sleep" or similar are updated http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28161> Specify in the explain statement what the units of the duration being specified are (seconds?) http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28167> Please specify the units for the duration being passed as the argument here (seconds?) http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28162> Better to use GenericUDFSleep.class as argument http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28168> Consider passing GenericUDFSleep.class as the argument instead. Given the present code, if you happen to change the name of class to something and forgot to change this String, this code will compile and logger would still emit out the old name. If you use *.class argument, the code wouldn't compile until you change the * part of the argument. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28163> I am being nitpicky here but a better exception to throw here would be: UDFArgumentTypeException. Also, when seeing an error message as a user, it's always nice to contrast the actual vs. expected. Here is the expected type is int but it will nice to print out the type of the argument that the UDF received. You can retrieve by arguments[i].getTypeName() http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28169> 1. A better exception to throw here is UDFArgumentLengthException 2. It's always nice to see as a user what was the expected and the actual value when something goes wrong. Consider printing out the type of the argument received in the exception message. This type can be retrieved by arguments[0].getTypeName() http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28159> The UDF is returning a Map<Int, Int> even though you don't really want to return anything. I think you should use a void object inspector. For details, look at http://svn.apache.org/viewvc/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java?view=markup http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28170> The UDF is returning a Map<Int, Int> when it really shouldn't be returning anything. I think void object inspector is what you are looking for. For details, see http://svn.apache.org/viewvc/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java?view=markup http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28165> Consider using an ObjectInspectorConverter for reading int instead of paying the string parsing overhead. For reference, look at http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java?view=markup http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28171> Better to use ObjectInspectorConverter to avoid the string parsing penalty. For reference, take a look at how this UDF reads an integer argument: http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java?view=markup http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28172> Since the argument is being divided by 2 in line 62, does this mean we are only sleeping for received_argument/2 seconds? Am I missing something? http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUD
-
Re: Review Request: implement a udf to keep hive session alive for certain amount of timeJohnny Zhang 2012-11-09, 23:08
> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 36 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line36> > > > > I don't have a strong opinion about this but is sleep the right name for this UDF? Sleep is how this UDF keeps the Hive session alive but it might not convey to a user what this UDF does. How about something like session_keep_alive? I am open to other suggestions as well. > > > > Again, not a deal-breaker:-) However, if you do decide to change the name, don't forget to change all references of "sleep" in the code (log statements, exception messages, etc.). the reason give a name 'sleep' is because Hadoop used to have a similar example job http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-examples/0.20.2-737/org/apache/hadoop/examples/SleepJob.java which does nothing but keeping running a MR job, which does nothing. Let's see what's other people's opinion > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 37 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line37> > > > > Specify in the explain statement what the units of the duration being specified are (seconds?) agree, will fix it > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 41 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line41> > > > > Better to use GenericUDFSleep.class as argument agree, will fix it > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 52 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line52> > > > > 1. A better exception to throw here is UDFArgumentLengthException > > 2. It's always nice to see as a user what was the expected and the actual value when something goes wrong. Consider printing out the type of the argument received in the exception message. This type can be retrieved by arguments[0].getTypeName() agree that need to use 'arguments[0].getTypeName()' to print out what's the argument type of user input, will fix it. > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 52 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line52> > > > > I am being nitpicky here but a better exception to throw here would be: UDFArgumentTypeException. Also, when seeing an error message as a user, it's always nice to contrast the actual vs. expected. Here is the expected type is int but it will nice to print out the type of the argument that the UDF received. You can retrieve by arguments[i].getTypeName() agree, will fix it > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 55 > > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line55> > > > > The UDF is returning a Map<Int, Int> even though you don't really want to return anything. I think you should use a void object inspector. For details, look at http://svn.apache.org/viewvc/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java?view=markup agree, will change it to return PrimitiveObjectInspectorFactory.writableStringObjectInspector since will print message in the end > On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote: > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java, line 62 agree, will fix it. yes, that's the reason. Just want to print something so that people know what's going on, especially when it sleeps for a while. agree, will fix it the reason I divide it by 2 is because I found it the evaluate() function run before and during the MR job. As you suggested offline, after adding annotation @UDFType(deterministic = false), I am be able to make it run only during the MR job, so divide by 2 is not needed anymore. Thanks. - Johnny This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7848/#review13082 On Nov. 3, 2012, 2:56 a.m., Johnny Zhang wrote:
-
Re: Review Request: implement a udf to keep hive session alive for certain amount of timeJohnny Zhang 2012-11-09, 23:10
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7848/ ----------------------------------------------------------- (Updated Nov. 9, 2012, 11:10 p.m.) Review request for hive. Changes ------- I update the diff to address Mark's comments. Thanks, Mark. Description ------- To make testing issues like HIVE-3590 convenient, we can implement a UDF to keep hive session alive for a given time. The patch introduce a new UDF sleep() which does this without introducing any data/load to cluster. This addresses bug HIVE-3666. https://issues.apache.org/jira/browse/HIVE-3666 Diffs (updated) ----- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1405251 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java PRE-CREATION Diff: https://reviews.apache.org/r/7848/diff/ Testing ------- have tested it with Hive CLI and Hive Server session, and it can keep them alive by the given seconds Thanks, Johnny Zhang
-
Re: Review Request: implement a udf to keep hive session alive for certain amount of timeMark Grover 2012-11-10, 04:18
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7848/#review13318 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java <https://reviews.apache.org/r/7848/#comment28583> This might be confusing to some users. The method may call Thread.sleep but the UDF's argument is in seconds. So perhaps get rid "long millis" and mention that the UDF argument is in seconds - Mark Grover On Nov. 9, 2012, 11:10 p.m., Johnny Zhang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/7848/ > ----------------------------------------------------------- > > (Updated Nov. 9, 2012, 11:10 p.m.) > > > Review request for hive. > > > Description > ------- > > To make testing issues like HIVE-3590 convenient, we can implement a UDF to keep hive session alive for a given time. The patch introduce a new UDF sleep() which does this without introducing any data/load to cluster. > > > This addresses bug HIVE-3666. > https://issues.apache.org/jira/browse/HIVE-3666 > > > Diffs > ----- > > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1405251 > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java PRE-CREATION > > Diff: https://reviews.apache.org/r/7848/diff/ > > > Testing > ------- > > have tested it with Hive CLI and Hive Server session, and it can keep them alive by the given seconds > > > Thanks, > > Johnny Zhang > > |