Standardising geospatial data in the Julia language
Author rafaqz
70 Stars
Updated Last
2 Years Ago
Started In
June 2019


Build Status Codecov Aqua.jl Quality Assurance

GeoData.jl defines common types and methods for reading, writing and manipulating spatial data, currently raster arrays like GeoTIFF and NetCDF, multi-array stacks, and series of stacks or geoarrays spread over multiple files. It provides a standardised interface that allows many source data types to be used with identical syntax.

Data loaded with GeoData.jl has some special properties:

  • Plots are always oriented the right way. Even if you reverse or permute a GeoArray it will still plot the right way!
  • Regions and points selected with Between and Contains select the right points or whole intervals no matter the order of the index or it's position in the cell.
  • For Projected mode GRDarray and GDALarray You can index in any projection you want to by setting the mappedcrs keyword on construction. You don't even need to know the underlying projection, the conversion is handled automatically. This means lat/lon EPSG(4326) can be used across all sources seamlessly if you need that.
  • Packages building on GeoData.jl can treat AbstractGeoSeries, AbstractGeoStack, and AbstrackGeoArray as black boxes:
    • The data could hold tiff or netcdf files, Arrays in memory or CuArrays on the GPU - they will all behave in the same way.
    • AbstractGeoStack can be a Netcdf or HDF5 file, or a NamedTuple of GDALarray holding .tif files, or all GeoArray in memory, but be treated as if they are all the same thing.
    • Modelling packages do not have to deal with the specifics of spatial file types directly.

GeoData.jl extends DimensionalData.jl so that spatial data can be indexed using named dimensions like Lat and Lon, Ti (time), which can also be used in most Base and Statistics methods like mean and reduce where dims arguments are required. Much of the behaviour is covered in the DimensionalData docs.

GeoData.jl provides general types for holding spatial data: GeoArray, GeoStack, and GeoSeries, and types specific to various backends for loading disk-based data. All can be loaded using the functions geoarray, stack and series, that will guess the backend from the file type. R .grd files can be loaded natively, GDAL when ArchGDAL.jl (v0.5 or higher) is imported, and NetCDF can be loaded when NCDatasets.jl is imported.

When HDF5.jl is imported, files from the Soil Moisture Active Passive (SMAP) dataset can be loaded with stack or series. This is both useful for users of SMAP, and a demonstration of the potential to build standardised interfaces for custom spatial dataset formats like those used in SMAP.

Files can be written to disk in all formats using write, and can (with some caveats) be written to to different formats providing file-type conversion for spatial data.

Warning: this is an MVP.

It works quite well but spatial data is very complicated. Some things may break. Currently saving a Netcdf to a GDAL tif, or the reverse, projections are not totally accurate.

Eventually they will be, but converting projections and index conventions between formats is difficult. with many edge case problems. For now, assume the index is not exactly correct. Between, Contains and bounds are close approximations, but may contain errors.


We'll load a file from disk, and do some manipulations and plotting.

Load GeoData, and NCDatasets, download file and load it to an array. This netcdf file only has one layer, if it has more we could use NCDstack instead.

julia> using GeoData, NCDatasets

julia> url = "";

julia> filename = download(url, "");

julia> A = geoarray(filename)
NCDarray (named tos) with dimensions:
 Longitude (type Lon): Float64[1.0, 3.0, …, 357.0, 359.0] (Converted: Ordered Regular Intervals)
 Latitude (type Lat): Float64[-79.5, -78.5, …, 88.5, 89.5] (Converted: Ordered Regular Intervals)
 Time (type Ti): DateTime360Day[DateTime360Day(2001-01-16T00:00:00), DateTime360Day(2001-02-16T00:00:00), …, DateTime360Day(2002-11-16T00:00:00), DateTime360Day(2002-12-16T00:00:00)] (Sampled: Ordered Irregular Points)
and data: 180×170×24 Array{Union{Missing, Float32},3}
[:, :, 1]
 missing  missing     missing271.437  271.445  271.459
 missing  missing     missing     271.438  271.445  271.459

Now plot every third month in the first year, just using the regular index:

julia> using Plots

juila> A[Ti(1:3:12)] |> plot

Global ocean surface temperatures

Now plot Australia in the first month of 2001. Notice we are using lat/lon coordinates and date/time instead of regular indexes:

julia> A[Ti(Near(DateTime360Day(2001, 01, 17))), 
         Lat(Between(0.0, -50.0)), 
         Lon(Between(100.0, 160.0))] |> plot

Australia regional ocean surface temperature

Now get the mean over the timespan, then save it to disk, and plot it :

julia> using Statistics
julia> mean_tos = mean(A; dims=Ti)
GeoArray (named tos) with dimensions:
 Longitude (type Lon): Float64[1.0, 3.0, …, 357.0, 359.0] (Converted: Ordered Regular Intervals)
 Latitude (type Lat): Float64[-79.5, -78.5, …, 88.5, 89.5] (Converted: Ordered Regular Intervals)
 Time (type Ti): DateTime360Day[2001-01-16T00:00:00] (Sampled: Ordered Irregular Points)
and data: 180×170×1 Array{Union{Missing, Float32},3}
[:, :, 1]
 missing  missing     missing     missing271.434  271.443  271.454
 missing  missing     missing     missing     271.434  271.443  271.454

julia> write("mean.ncd", NCDarray, mean_tos)
    Writing netcdf...
        key: "longitude" of type: Float64
        key: "latitude" of type: Float64
        key: "time" of type: DateTime360Day
        key: "tos" of type: Union{Missing, Float32}

julia> plot(mean_tos; color=:viridis) 

Mean temperatures

Plotting recipes in DimensionalData.jl are the fallback for GedData.jl when the object doesn't have both Lat and Lon dimensions. So (as a random example) we could plot a transect of ocean surface temperature at 20 degree latitude :

A[Lat(Near(20.0)), Ti(1)] |> plot

Temperatures at lattitude 20-21

GeoData.jl provides a range of other methods that are being added to over time.

  • agregate: aggregate data by the same or different amounts for each axis.
  • disaggregate: disaggregate data by the same or different amounts for each axis
  • resample can resample data to a different size and projection, and snap to an existing AbstractGeoArray.

For example, aggregate:

julia> aggregate(mean, A, (Ti(12), Lat(20), Lon(20))

GeoArray (named tos) with dimensions:
 Longitude (type Lon): Float64[21.0, 61.0, …, 301.0, 341.0] (Converted: Ordered Regular Intervals)
 Latitude (type Lat): Float64[-69.5, -49.5, …, 50.5, 70.5] (Converted: Ordered Regular Intervals)
 Time (type Ti): DateTime360Day[2001-01-16T00:00:00, 2002-01-16T00:00:00] (Sampled: Ordered Irregular Points)
and data: 9×8×2 Array{Union{Missing, Float32},3}
[:, :, 1]
 missing  277.139        missing     missing     missing     missing  missing  missing
 missing  277.126        missing     missing     missing     missing  missing  missing

These methods will also work for entire GeoStacks and GeoSeries using the same syntax.

Works in progress

  • Integration with Vector/DataFrame spatial types and point/line/polygon data types. It should be possible to select polygons of data, and convert between linear datasets and array formats.
  • Standardised handling and conversion of spatial metadata between data formats
  • Handling complex projections: Affine transformation of dimensions to indices. AffineMaps will be stored as a wrapper dimension in dims.
  • Load and write the NetCDF projection format.

Used By Packages