Re: [DISCUSS] Long-term goal of making flink-table Scala-free


Hi all,

I have also been thinking about this problem these days; here are my thoughts.

1) We must admit that it’s really a tough task to interoperate between Java and Scala. E.g., they have different collection types (Scala collections vs. java.util.*), and in Java it is hard to implement a method that takes Scala functions as parameters. Considering that the major part of the code base is implemented in Java, +1 for this goal from a long-term view.

2) The ideal solution would be to expose just a Scala API and make all the other parts Scala-free. However, I am not sure whether that can be achieved even in the long term. Thus, as Timo suggested, keeping the Scala code in "flink-table-core" would be a compromise solution.

3) If the community makes this the final decision, maybe all new features should be implemented in Java (regardless of the module), in order to prevent the Scala code from growing.
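To make point 1 a bit more concrete, here is a small sketch (my own illustration, not Flink code; it assumes Scala 2.13+ where `scala.jdk.CollectionConverters` is available) of the explicit conversions a Java-facing boundary has to perform, since a Scala `List` is not a `java.util.List` and vice versa:

```scala
import scala.jdk.CollectionConverters._

object InteropSketch {
  // Hypothetical boundary helpers: a Java module calling into a Scala API
  // cannot consume Scala's immutable collections directly, so explicit
  // conversions are needed in both directions at the module boundary.
  def toJava(xs: List[String]): java.util.List[String] = xs.asJava

  def toScala(xs: java.util.List[String]): List[String] = xs.asScala.toList

  def main(args: Array[String]): Unit = {
    val j = toJava(List("a", "b"))
    println(j.size()) // prints 2

    val s = toScala(java.util.Arrays.asList("x", "y"))
    println(s.mkString(",")) // prints x,y
  }
}
```

Every such boundary either leaks Scala types into Java signatures or needs wrapper code like the above, which is exactly the friction that Java-implemented connectors and formats run into today.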

Best,
Xingcan


> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx> wrote:
> 
> Bumping the topic.
> 
> If we want to do this, the sooner we decide, the less code we will have to rewrite. I have some objections/counter-proposals to Fabian's suggestion of doing it module-wise, one module at a time.
> 
> First, I do not see a problem with having Java/Scala code even within one module, especially not if there are clean boundaries. For example, we could have the API in Scala and optimizer rules/logical nodes written in Java in the same module. However, I have not maintained mixed Scala/Java code bases before, so I might be missing something here.
> 
> Secondly, this whole migration might, and most likely will, take longer than expected, which creates a problem for the new code we will be writing in the meantime. After the decision to migrate to Java is made, almost any new line of Scala code immediately becomes technical debt that we will have to rewrite in Java later.
> 
> Thus I would propose first to state our end goal: the module structure and which parts of the modules we eventually want to be Scala-free. Secondly, we should take all the steps necessary to allow us to write new code compliant with that end goal. Only after that should we focus on incrementally rewriting the old code. Otherwise we could be stuck/blocked for years writing new code in Scala (and increasing the technical debt), because nobody has found the time to rewrite some unimportant and not actively developed part of some module.
> 
> Piotrek
> 
>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhueske@xxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> In general, I think this is a good effort. However, it won't be easy and I
>> think we have to plan this well.
>> I don't like the idea of having the whole code base fragmented into Java
>> and Scala code for too long.
>> 
>> I think we should do this one step at a time and focus on migrating one
>> module at a time.
>> IMO, the easiest start would be to port the runtime to Java.
>> Extracting the API classes into their own module, porting them to Java,
>> and removing the Scala dependency won't be possible without breaking the
>> API, since a few classes depend on the Scala Table API.
>> 
>> Best, Fabian
>> 
>> 
>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrmann@xxxxxxxxxx>:
>> 
>>> I think that is a noble and honorable goal and we should strive for it.
>>> This, however, must be an iterative process given the sheer size of the
>>> code base. I like the approach of defining common Java modules that are
>>> used by more specific Scala modules while slowly moving classes from
>>> Scala to Java. Thus +1 for the proposal.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <piotr@xxxxxxxxxxxxxxxxx>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I do not have experience with how Scala and Java interact with each
>>>> other, so I cannot fully validate your proposal, but generally speaking
>>>> +1 from me.
>>>> 
>>>> Does it also mean that we should slowly migrate `flink-table-core` to
>>>> Java? How would you envision it? It would be nice to be able to add new
>>>> classes/features written in Java so that they can coexist with the old
>>>> Scala code until we gradually switch from Scala to Java.
>>>> 
>>>> Piotrek
>>>> 
>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twalthr@xxxxxxxxxx> wrote:
>>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> as you all know, the Table & SQL API is currently implemented in Scala.
>>>> This decision was made a long time ago, when the initial code base was
>>>> created as part of a master's thesis. The community kept Scala because of
>>>> the nice language features that enable a fluent Table API like
>>>> table.select('field.trim()) and because Scala allows for quick
>>>> prototyping (e.g. multi-line strings for code generation). The committers
>>>> enforced not splitting the code base into two programming languages.
>>>>> 
>>>>> However, nowadays the flink-table module is becoming a more and more
>>>> important part of the Flink ecosystem. Connectors, formats, and the SQL
>>>> client are actually implemented in Java but need to interoperate with
>>>> flink-table, which makes these modules dependent on Scala. As mentioned
>>>> in an earlier mail thread, using Scala for API classes also exposes
>>>> member variables and methods in Java that should not be exposed to
>>>> users [1]. Java is still the most important API language, and right now
>>>> we treat it as a second-class citizen. I just noticed that you even need
>>>> to add Scala if you just want to implement a ScalarFunction, because of
>>>> method clashes between `public String toString()` and
>>>> `public scala.Predef.String toString()`.
>>>>> 
>>>>> Given the size of the current code base, reimplementing the entire
>>>> flink-table code in Java is a goal that we might never reach. However,
>>>> we should at least treat the symptoms and keep this as a long-term goal
>>>> in mind. My suggestion would be to convert the user-facing and runtime
>>>> classes and split the code base into multiple modules:
>>>>> 
>>>>>> flink-table-java {depends on flink-table-core}
>>>>> Implemented in Java. Java users can use this. This would require
>>>> converting classes like TableEnvironment and Table.
>>>>> 
>>>>>> flink-table-scala {depends on flink-table-core}
>>>>> Implemented in Scala. Scala users can use this.
>>>>> 
>>>>>> flink-table-common
>>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
>>>> contains interface classes such as descriptors, table sinks, and table
>>>> sources.
>>>>> 
>>>>>> flink-table-core {depends on flink-table-common and
>>>> flink-table-runtime}
>>>>> Implemented in Scala. Contains the current main code base.
>>>>> 
>>>>>> flink-table-runtime
>>>>> Implemented in Java. This would require converting classes in
>>>> o.a.f.table.runtime, but could potentially improve the runtime.
>>>>> 
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Timo
>>>>> 
>>>>> [1]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
>>>>> 
>>>> 
>>>> 
>>> 
>