ParquetFiles.jl

FileIO.jl integration for Parquet files

ParquetFiles

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Overview

This package provides load support for Parquet files under the FileIO.jl package.

Installation

In the Julia REPL, run ] add ParquetFiles to install ParquetFiles and its dependencies.

Usage

Load a Parquet file

To read a Parquet file into a DataFrame, use the following Julia code:

using ParquetFiles, DataFrames

df = DataFrame(load("data.parquet"))

The call to load returns a struct that is an IterableTables.jl source, so it can be passed to any function that accepts iterable tables, i.e. any of the sinks in IterableTables.jl. Here are some examples of materializing a Parquet file into data structures other than a DataFrame:

using ParquetFiles, IndexedTables, TimeSeries, Temporal, VegaLite

# Load into an IndexedTable
it = IndexedTable(load("data.parquet"))

# Load into a TimeArray
ta = TimeArray(load("data.parquet"))

# Load into a TS
ts = TS(load("data.parquet"))

# Plot directly with VegaLite
@vlplot(:point, data=load("data.parquet"), x=:a, y=:b)

Using the pipe syntax

load also supports the pipe syntax. For example, to load a Parquet file into a DataFrame, one can use the following code:

using ParquetFiles, DataFrames

df = load("data.parquet") |> DataFrame

The pipe syntax is especially useful in combination with Query.jl queries. For example, one can load a Parquet file, pipe it into a query, and then pipe the result to the save function to store it in a new file.
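
As a minimal sketch of such a pipeline, the code below filters rows and writes the result to a CSV file. It rests on several assumptions not stated above: the file has a numeric column named a (borrowed from the plotting example), the output is written with CSVFiles.jl because this package only provides load support, and the helper names keep_large_a and parquet_to_filtered_csv are hypothetical.

```julia
using ParquetFiles, Query, CSVFiles, DataFrames

# Hypothetical helper: keep only rows whose column `a` is greater than 1.
# It accepts any iterable table, including the value returned by `load`.
keep_large_a(source) = source |> @filter(_.a > 1) |> DataFrame

# Hypothetical helper: load a Parquet file, filter it, and save the result.
# `save(output)` with a single argument returns a function that writes
# whatever table is piped into it; the CSV output format is an assumption.
function parquet_to_filtered_csv(input, output)
    load(input) |> @filter(_.a > 1) |> save(output)
end
```

With a file at hand, parquet_to_filtered_csv("data.parquet", "filtered.csv") would run the whole pipeline, while load("data.parquet") |> keep_large_a returns the filtered rows as a DataFrame.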