Analytics in Rust [was Re: Timeline for Arrow 0.12.0 release]


Changing the subject, as we've veered off topic.

On Mon, Dec 10, 2018 at 8:04 AM Andy Grove <andygrove73@xxxxxxxxx> wrote:
>
> Cool. I will continue to add primitive operations but I am now adding this
> in a separate source file to keep it separate from the core array code.
>
> I'm not sure how important it will be to support Rust data sources with
> Gandiva. I can see that each language should be able to construct the
> logical query plan to submit to Gandiva and let Gandiva handle execution.

Note: Gandiva isn't an execution engine. It generates compiled
function kernels given an expression tree. It depends on an execution
engine to invoke the kernels in a database runtime-type environment --
Dremio is doing so in production already IIUC.

It might be that Rust developers would choose someday to develop a
Rust-native query runtime, in which case Gandiva's JIT compilation
could be used to generate custom kernels, similar to how Dremio uses
it from Java.

> I think the more interesting part is how do we support language-specific
> lambda functions as part of that logical query plan. Maybe it is possible
> to compile the lambda down to LLVM (I haven't started learning about LLVM
> in detail yet so this is wild speculation on my part).

Generally, database systems define operator nodes for each type of
user-defined function, and the user code is invoked dynamically, much
as in an interpreted language. Compiling arbitrary user code to LLVM
isn't possible in general.
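
To make "invoked dynamically" concrete, here is a minimal sketch in
Rust. It is only an illustration: the `Expr` enum, its `ScalarUdf`
variant and the `eval` function are hypothetical names, not anything
from Arrow or Gandiva. The UDF is an opaque boxed closure held in its
own operator node, and the evaluator calls through it at runtime:

```rust
/// Hypothetical expression tree; none of these names come from Arrow or Gandiva.
pub enum Expr {
    /// Reference to a column of the input row, by index.
    Column(usize),
    /// Constant value.
    Literal(f64),
    /// Operator node wrapping an opaque user-defined scalar function.
    ScalarUdf {
        func: Box<dyn Fn(&[f64]) -> f64>,
        args: Vec<Expr>,
    },
}

/// Evaluate an expression against one row of f64 values.
pub fn eval(expr: &Expr, row: &[f64]) -> f64 {
    match expr {
        Expr::Column(i) => row[*i],
        Expr::Literal(v) => *v,
        Expr::ScalarUdf { func, args } => {
            // Evaluate the argument expressions first, then call the
            // user code through dynamic dispatch.
            let values: Vec<f64> = args.iter().map(|a| eval(a, row)).collect();
            func(&values)
        }
    }
}
```

For example, a `ScalarUdf` node wrapping the closure
`|a: &[f64]| a[0] * a[1] + 1.0` with arguments `Column(0)` and
`Literal(2.0)` evaluates to 7.0 for the row `[3.0]`. The evaluator
never looks inside the closure; it only supplies the inputs and takes
the result.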

> Another option is for Gandiva to support calling into shared libraries, which
> may be simpler for languages that support building C-native shared libraries
> (Rust supports this with zero overhead).

These would be C UDFs. I'm familiar with Impala's UDF system, for example:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_udf.html

There you can declare a new function that is looked up in a shared
library using dlopen / dlsym.
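
For Rust specifically, a function exposed with the C ABI is all that
such a lookup needs. A rough sketch, with the assumptions stated up
front: `ScalarUdf`, `add_one` and `UdfRegistry` are hypothetical
names, the registry is an in-process stand-in for dlopen / dlsym
rather than a real dynamic loader, and none of this is Impala's or
Gandiva's actual interface:

```rust
use std::collections::HashMap;

/// C-ABI signature that every UDF in this sketch must have (an assumption).
pub type ScalarUdf = extern "C" fn(i64) -> i64;

/// Example UDF with the C calling convention. In a real shared library it
/// would also be exported under an unmangled symbol name.
pub extern "C" fn add_one(x: i64) -> i64 {
    x.wrapping_add(1)
}

/// In-process stand-in for dlopen / dlsym: maps symbol names to function
/// pointers. A real engine would resolve these from a shared library.
pub struct UdfRegistry {
    symbols: HashMap<String, ScalarUdf>,
}

impl UdfRegistry {
    pub fn new() -> Self {
        UdfRegistry {
            symbols: HashMap::new(),
        }
    }

    /// Record a function pointer under a symbol name.
    pub fn register(&mut self, name: &str, f: ScalarUdf) {
        self.symbols.insert(name.to_string(), f);
    }

    /// Look a function up by name, as dlsym would; None if it is unknown.
    pub fn lookup(&self, name: &str) -> Option<ScalarUdf> {
        self.symbols.get(name).copied()
    }
}
```

After `register("add_one", add_one)`, `lookup("add_one")` returns a
plain function pointer that can be called like any C function, and an
unknown name yields `None`. In a real setup the crate would be built
as a `cdylib` so that dlsym can find the symbol by its string name.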

- Wes

>
> Andy.
>
>
>
>
> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
> > hi Andy,
> >
> > I can see an argument for having some basic native function kernel
> > support in Rust. One of the things that Gandiva has begun is a
> > Protobuf-based serialized representation of projection
> > and filter expressions. In the long run I would like to see a more
> > complete relational algebra / logical query plan that can be submitted
> > for execution. There are complexities, though, such as bridging
> > iteration of data sources written in Rust, say, with a query engine
> > written in C++. You would need to provide some kind of a callback
> > mechanism for the query engine to request the next chunk of a dataset
> > to be materialized.
> >
> > It will be interesting to see what contributors will be motivated
> > enough to build over the next few years. At the end of the day, Apache
> > projects are do-ocracies.
> >
> > - Wes
> > On Fri, Dec 7, 2018 at 6:22 AM Andy Grove <andygrove73@xxxxxxxxx> wrote:
> > >
> > > I've added one PR to the list (https://github.com/apache/arrow/pull/3119)
> > > to update the project to use Rust 2018 Edition.
> > >
> > > I'm also considering removing one PR from the list and would like to get
> > > opinions here.
> > >
> > > I have a PR (https://github.com/apache/arrow/pull/3033) to add some basic
> > > math and comparison operators to primitive arrays. These are baby steps
> > > towards implementing more query execution capabilities such as projection,
> > > selection, etc., but Chao made a good point that other Rust implementations
> > > don't have these kinds of capabilities and I am now wondering if this is a
> > > distraction. We already have Gandiva and the new efforts in Ursa Labs and
> > > it would probably make more sense to look at having Rust bindings for the
> > > query execution capabilities there rather than having a competing (and less
> > > capable) implementation in Rust.
> > >
> > > Thoughts?
> > >
> > > Andy.
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 6, 2018 at 8:42 PM paddy horan <paddyhoran@xxxxxxxxxxx> wrote:
> > >
> > > > Other than Andy’s PR below I’m going to try and find time to work on
> > > > ARROW-3827; I’ll bump it to 0.13 if I can’t find the time early next week.
> > > > There is nothing else in the 0.12 backlog for Rust.  It would be nice to
> > > > get the parquet merge in though.
> > > >
> > > >
> > > >
> > > > Paddy
> > > >
> > > >
> > > >
> > > > ________________________________
> > > > From: Andy Grove <andygrove73@xxxxxxxxx>
> > > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > > To: dev@xxxxxxxxxxxxxxxx
> > > > Subject: Re: Timeline for Arrow 0.12.0 release
> > > >
> > > > I have PRs pending for all the Rust issues that I want to get into 0.12.0
> > > > and would appreciate some reviews so I can go ahead and merge:
> > > >
> > > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and
> > > > ARROW-3881 - add math and comparison operations to primitive arrays)
> > > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release process)
> > > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > > >
> > > > With these in place I plan on writing a tutorial for reading a CSV file,
> > > > performing some operations on primitive arrays and writing the output to a
> > > > new CSV file.
> > > >
> > > > I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > > On Tue, Dec 4, 2018 at 7:57 PM Andy Grove <andygrove73@xxxxxxxxx> wrote:
> > > >
> > > > > I'd love to tackle the three related issues for supporting simple
> > > > > math/comparison operations on primitive arrays and casting primitive
> > > > > arrays, but since the change to use Rust's specialization feature I'm a
> > > > > bit stuck and need some assistance applying the math operations to the
> > > > > numeric types and not the boolean primitives. I have added a comment to
> > > > > https://github.com/apache/arrow/pull/3033 ... if I can get help solving
> > > > > for this PR then I should be able to handle the others. I'll also do some
> > > > > research and try and figure this out myself.
> > > > >
> > > > > Andy.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > > > >
> > > > >> Andy, Paddy, or other Rust developers -- could you review the 6 issues
> > > > >> in TODO in the 0.12 backlog and either assign them or move them to the
> > > > >> next release if they aren't going to be completed this week or next?
> > > > >>
> > > > >>
> > > > >> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > > > >> >
> > > > >> > hi folks,
> > > > >> >
> > > > >> > Tomorrow is December 1. The last major Arrow release (0.11.0) took
> > > > >> > place on October 8. Given how much work has happened in the project in
> > > > >> > the last ~2 months, I think it would be great to complete the next
> > > > >> > major release before the end-of-year holidays set in.
> > > > >> >
> > > > >> > I've been curating the JIRA backlog the last couple of weeks, and have
> > > > >> > just created a 0.12.0 release wiki page to help us stay organized:
> > > > >> >
> > > > >> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
> > > > >> >
> > > > >> > Given that there are only 3 full working weeks between now and
> > > > >> > Christmas, I think we should be in position to cut a release by the
> > > > >> > end of the week of December 10, i.e. by Friday December 14. Not all of
> > > > >> > the TODO issues have to be completed to make the release, but it would
> > > > >> > be good to push to complete as much as possible. Please help by
> > > > >> > reviewing the backlog, and if possible, assigning issues to yourself
> > > > >> > that you'd like to pursue in the next 2 weeks.
> > > > >> >
> > > > >> > Let me know if this sounds reasonable, or if you have any concerns.
> > > > >> >
> > > > >> > Thanks
> > > > >> > Wes
> > > > >>
> > > > >
> > > >
> >