Polars.jl

🐻 Julia wrapper around the polars library
Author: Pangoraw
Started in: August 2023

Polars.jl is a thin Julia wrapper around polars, a dataframe manipulation library written in Rust.

Example

julia> using Polars

julia> customers = read_parquet("NONE_pandas_pyarrow_customer.parquet") |> lazy;

julia> nations = read_parquet("NONE_pandas_pyarrow_nation.parquet") |> lazy;

julia> customers_nations = innerjoin(customers, nations, col("nation_key"));

julia> gb = groupby(customers_nations, [col("nation_key")]);

julia> gbagg = agg(gb,
           col("name") |> alias("customer_names"),
           col("name_right") |> first |> Strings.lowercase,
           col("acctbal") |> mean,
       );

julia> gbagg_sorted = sort(gbagg, "name_right");

julia> select(gbagg_sorted,
           col("name_right") |> alias("nation_name"),
           col("customer_names"),
           col("acctbal"),
       ) |> collect
25×3 DataFrame
 nation_name  customer_names                    acctbal 
 String       Series{Union{Missing, String}}    Float64 
────────────────────────────────────────────────────────
     algeria  ["Customer#000000029", "Custome…   4442.7
   argentina  ["Customer#000000003", "Custome…   4485.0
      brazil  ["Customer#000000017", "Custome…  4471.02
      canada  ["Customer#000000005", "Custome…  4489.26
       china  ["Customer#000000007", "Custome…  4438.95
       egypt  ["Customer#000000004", "Custome…  4520.49
    ethiopia  ["Customer#000000010", "Custome…  4467.37
      france  ["Customer#000000018", "Custome…  4436.01
                                                 
                                         17 rows omitted

References

Julia already has a very good dataframe story with DataFrames.jl, which provides a more Julian experience since any type of collection can be used as a column. Polars, on the other hand, works through the Arrow data format and therefore only supports certain physical vectors (materialized in memory), such as Vector{Int}. Polars.jl focuses on wrapping operations on lazy frames, since lazy evaluation is one of the main factors that differentiate it from DataFrames.jl. Indeed, eager operations are implemented as collect∘op∘lazy: the frame is converted to a lazy frame, the operation is applied, and the result is collected. Consider DataFrames.jl instead if your problem involves a lot of interoperability with the rest of the Julia ecosystem, where Polars would not offer the same level of interoperability.
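To make the collect∘op∘lazy idea concrete, here is a self-contained toy model in plain Julia, with no Polars involved. The types `EagerFrame` and `LazyPlan` and the functions `to_lazy`, `lazy_filter`, `materialize` and `eager` are hypothetical stand-ins invented for this sketch, not Polars.jl's API: a lazy plan only records pending operations, `to_lazy` plays the role of `lazy`, and `materialize` plays the role of `collect`.

```julia
# Toy model of "eager = collect ∘ op ∘ lazy".
# All names here are hypothetical stand-ins, NOT part of Polars.jl.

# An "eager" frame holds materialized data.
struct EagerFrame
    rows::Vector{Int}
end

# A "lazy" frame holds a source plus a list of pending operations;
# nothing is computed until the plan is materialized.
struct LazyPlan
    source::EagerFrame
    ops::Vector{Function}
end

# Plays the role of `lazy`: wrap an eager frame in an empty plan.
to_lazy(df::EagerFrame) = LazyPlan(df, Function[])

# A lazy operation: returns a function that appends a filter step to a plan
# without touching any data.
function lazy_filter(pred)
    return plan -> begin
        ops = copy(plan.ops)
        push!(ops, rows -> filter(pred, rows))
        LazyPlan(plan.source, ops)
    end
end

# Plays the role of `collect`: run all recorded steps in order.
function materialize(plan::LazyPlan)
    rows = plan.source.rows
    for op in plan.ops
        rows = op(rows)
    end
    return EagerFrame(rows)
end

# Any lazy operation gets an eager counterpart by composition.
eager(lazy_op) = materialize ∘ lazy_op ∘ to_lazy

df = EagerFrame([1, 2, 3, 4, 5])

# Eager: a one-step plan that is materialized immediately.
println(eager(lazy_filter(iseven))(df).rows)  # [2, 4]

# Lazy: steps are only recorded until materialize is called.
plan = lazy_filter(iseven)(lazy_filter(x -> x > 1)(to_lazy(df)))
println(length(plan.ops))        # 2
println(materialize(plan).rows)  # [2, 4]
```

In this toy, the eager path builds a single-step plan and materializes it right away, while the lazy path lets several steps accumulate before one final materialization, which is the shape of pipeline the example above builds with `lazy` and `collect`.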

Polars C-API

To build the polars C API, run the following commands:

cd c-polars
cargo build # --release

This is mostly useful during development, to test changes to the C API against the Julia package. A header file is also included for those who want to use the API directly from C.