BTW, we need to consider the case where the result is large at design
time. In my experience, if we implement this feature, users will use it
with large data.
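
To make the concern concrete, here is a minimal Python sketch. It is not Zeppelin code: SharedResultStore, its methods, and the byte threshold are all hypothetical names invented for illustration. It assumes a store that keeps small results in memory and spills anything larger to a temporary file, which is one possible way to avoid holding large results in memory.

```python
import os
import pickle
import tempfile


class SharedResultStore:
    """Hypothetical store for paragraph results, keyed by a user-chosen name."""

    def __init__(self, max_in_memory_bytes=1024):
        self.max_in_memory_bytes = max_in_memory_bytes
        self._memory = {}   # name -> serialized bytes kept in memory
        self._spilled = {}  # name -> path of a temp file on disk

    def put(self, name, rows):
        payload = pickle.dumps(rows)
        self.remove(name)  # drop any previous result saved under this name
        if len(payload) <= self.max_in_memory_bytes:
            self._memory[name] = payload
        else:
            # Too large for memory: spill to a temp file instead.
            fd, path = tempfile.mkstemp(suffix=".result")
            with os.fdopen(fd, "wb") as f:
                f.write(payload)
            self._spilled[name] = path

    def get(self, name):
        if name in self._memory:
            return pickle.loads(self._memory[name])
        # Raises KeyError if the name was never stored.
        with open(self._spilled[name], "rb") as f:
            return pickle.load(f)

    def is_spilled(self, name):
        return name in self._spilled

    def remove(self, name):
        self._memory.pop(name, None)
        path = self._spilled.pop(name, None)
        if path is not None:
            os.remove(path)
```

A real design would also need eviction, cleanup on note removal, and a format that other interpreters can read, so this only shows where the size check would sit.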

On Fri, Jul 13, 2018 at 11:51 AM, Sanjay Dasgupta <[EMAIL PROTECTED]> wrote:

> I prefer 2.b also. Could we use (save*Result*AsTable=people) instead?
>
> There are a few typos in the example note shared:
>
> 1) The line val peopleDF = spark.read.format("zeppelin").load() should
> mention the table name (possibly as an argument to load?)
> 2) The Python line val peopleDF = z.getTable("people").toPandas() should
> not have the val
>
>
> The z.getTable(<table-name>) method could be a very good tool to judge
> which use-cases are important in the community. It is easy to implement for
> the in-memory data case, and could be very useful for many situations where
> a small amount of data is being transferred across interpreters (like the
> jdbc -> matplotlib case mentioned).
>
> Thanks,
> Sanjay
>
> On Fri, Jul 13, 2018 at 8:07 AM, Jongyoul Lee <[EMAIL PROTECTED]> wrote:
>
>> Yes, it's similar to 2.b.
>>
>> Basically, my concern is handling all kinds of data. Your case seems to
>> focus on table data. That is useful too, but it would be better to handle
>> all data, including tables and plain text.
>> WDYT?
>>
>> About storage, we could discuss it later.
>>
>> On Fri, Jul 13, 2018 at 11:25 AM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> I think your use case is the same as 2.b.  Personally I don't recommend
>>> using z.get(noteId, paragraphId) to get the shared data, for 2 reasons:
>>> 1. noteId and paragraphId carry no meaning, so the code is not readable.
>>> 2. The note will break if we clone it, because the noteId changes.
>>> That's why I suggest using a paragraph property to save the paragraph's result.
>>>
>>> Regarding the intermediate storage, I also thought about it and agree
>>> that in the long term we should provide such a layer to support large data.
>>> Currently we put the shared data in memory, which is not a scalable
>>> solution.  One candidate in my mind is Alluxio [1], and regarding the data
>>> format I think Apache Arrow [2] is another good option for Zeppelin to
>>> share table data across interpreter processes and different languages. But
>>> these are all implementation details, and I think we can talk about them in
>>> another thread. In this thread, I think we should focus on the user-facing
>>> API.
>>>
>>>
>>> [1] http://www.alluxio.org/
>>> [2] https://arrow.apache.org/
>>>
>>>
>>>
>>> On Fri, Jul 13, 2018 at 10:11 AM, Jongyoul Lee <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have a slightly different idea for sharing data.
>>>>
>>>> In my case,
>>>>
>>>> It would be very useful to get a paragraph's result as an input of
>>>> other paragraphs.
>>>>
>>>> e.g.
>>>>
>>>> -- Paragraph 1
>>>> %jdbc
>>>> select * from some_table;
>>>>
>>>> -- Paragraph 2
>>>> %spark
>>>> val rdd = z.get("noteId", "paragraphId").parse.makeRddByMyself
>>>> spark.read(table).select....
>>>>
>>>> If Paragraph 1's result is too big to show on the FE, it would be saved on
>>>> the Zeppelin server in a suitable way and passed to SparkInterpreter when
>>>> Paragraph 2 is executed.
>>>>
>>>> Basically, I think we need intermediate storage to store paragraphs'
>>>> results so they can be shared. We can introduce another layer or extend
>>>> NotebookRepo. In some cases, we might change notebook repos as well.
>>>>
>>>> JL
>>>>
>>>>
>>>>
>>>> On Fri, Jul 13, 2018 at 10:39 AM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> Recently, there have been several tickets [1][2][3] about sharing data in
>>>>> Zeppelin. Zeppelin's goal is to be a unified data analysis platform that
>>>>> integrates most of the big data tools and helps users switch between tools
>>>>> and share data between them easily. So sharing data is a critical, killer
>>>>> feature of Zeppelin IMHO.
>>>>>
>>>>> I am raising this ticket to discuss the scenarios for sharing data and how
>>>>> to do that. Although Zeppelin already provides tools and APIs to share
이종열, Jongyoul Lee, 李宗烈
http://madeng.net