GeoData
GeoData.jl defines common types and methods for reading, writing and manipulating spatial data, currently raster arrays like GeoTIFF and NetCDF, multi-array stacks, and series of stacks or geoarrays spread over multiple files. It provides a standardised interface that allows many source data types to be used with identical syntax.
Data loaded with GeoData.jl has some special properties:
- Plots are always oriented the right way. Even if you reverse or permute a
GeoArray
it will still plot the right way! - Regions and points selected with
Between
andContains
select the right points or whole intervals no matter the order of the index or it's position in the cell. - For
Projected
modeGRDarray
andGDALarray
You can index in any projection you want to by setting themappedcrs
keyword on construction. You don't even need to know the underlying projection, the conversion is handled automatically. This means lat/lonEPSG(4326)
can be used across all sources seamlessly if you need that. - Packages building on GeoData.jl can treat
AbstractGeoSeries
,AbstractGeoStack
, andAbstrackGeoArray
as black boxes:- The data could hold tiff or netcdf files,
Array
s in memory orCuArray
s on the GPU - they will all behave in the same way. AbstractGeoStack
can be a Netcdf or HDF5 file, or aNamedTuple
ofGDALarray
holding.tif
files, or allGeoArray
in memory, but be treated as if they are all the same thing.- Modelling packages do not have to deal with the specifics of spatial file types directly.
- The data could hold tiff or netcdf files,
GeoData.jl extends
DimensionalData.jl so that
spatial data can be indexed using named dimensions like Lat
and Lon
, Ti
(time), which can also be used in most Base
and Statistics
methods like
mean
and reduce
where dims
arguments are required. Much of the behaviour
is covered in the DimensionalData
docs.
GeoData.jl provides general types for holding spatial data: GeoArray
,
GeoStack
, and GeoSeries
, and types specific to various backends for loading
disk-based data. All can be loaded using the functions geoarray
, stack
and
series
, that will guess the backend from the file type. R .grd
files can be
loaded natively, GDAL when ArchGDAL.jl
(v0.5 or higher) is imported, and NetCDF can be loaded when
NCDatasets.jl is imported.
When HDF5.jl is imported, files from the Soil Moisture Active Passive
(SMAP) dataset can be loaded with stack
or
series
. This is both useful for users of SMAP, and a demonstration of the
potential to build standardised interfaces for custom spatial dataset formats
like those used in SMAP.
Files can be written to disk in all formats using write
, and can (with some caveats)
be written to to different formats providing file-type conversion for spatial data.
Warning: this is an MVP.
It works quite well but spatial data is very complicated. Some things may break. Currently saving a Netcdf to a GDAL tif, or the reverse, projections are not totally accurate.
Eventually they will be, but converting projections and index conventions between formats
is difficult. with many edge case problems. For now, assume the index is not exactly correct.
Between
, Contains
and bounds
are close approximations, but may contain errors.
Examples
We'll load a file from disk, and do some manipulations and plotting.
Load GeoData, and NCDatasets, download file and load it to
an array. This netcdf file only has one layer, if it has more we
could use NCDstack
instead.
julia> using GeoData, NCDatasets
julia> url = "https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc";
julia> filename = download(url, "tos_O1_2001-2002.nc");
julia> A = geoarray(filename)
NCDarray (named tos) with dimensions:
Longitude (type Lon): Float64[1.0, 3.0, …, 357.0, 359.0] (Converted: Ordered Regular Intervals)
Latitude (type Lat): Float64[-79.5, -78.5, …, 88.5, 89.5] (Converted: Ordered Regular Intervals)
Time (type Ti): DateTime360Day[DateTime360Day(2001-01-16T00:00:00), DateTime360Day(2001-02-16T00:00:00), …, DateTime360Day(2002-11-16T00:00:00), DateTime360Day(2002-12-16T00:00:00)] (Sampled: Ordered Irregular Points)
and data: 180×170×24 Array{Union{Missing, Float32},3}
[:, :, 1]
missing missing missing … 271.437 271.445 271.459
missing missing missing 271.438 271.445 271.459
...
Now plot every third month in the first year, just using the regular index:
julia> using Plots
juila> A[Ti(1:3:12)] |> plot
Now plot Australia in the first month of 2001. Notice we are using lat/lon coordinates and date/time instead of regular indexes:
julia> A[Ti(Near(DateTime360Day(2001, 01, 17))),
Lat(Between(0.0, -50.0)),
Lon(Between(100.0, 160.0))] |> plot
Now get the mean over the timespan, then save it to disk, and plot it :
julia> using Statistics
julia> mean_tos = mean(A; dims=Ti)
GeoArray (named tos) with dimensions:
Longitude (type Lon): Float64[1.0, 3.0, …, 357.0, 359.0] (Converted: Ordered Regular Intervals)
Latitude (type Lat): Float64[-79.5, -78.5, …, 88.5, 89.5] (Converted: Ordered Regular Intervals)
Time (type Ti): DateTime360Day[2001-01-16T00:00:00] (Sampled: Ordered Irregular Points)
and data: 180×170×1 Array{Union{Missing, Float32},3}
[:, :, 1]
missing missing missing missing … 271.434 271.443 271.454
missing missing missing missing 271.434 271.443 271.454
...
julia> write("mean.ncd", NCDarray, mean_tos)
Writing netcdf...
key: "longitude" of type: Float64
key: "latitude" of type: Float64
key: "time" of type: DateTime360Day
key: "tos" of type: Union{Missing, Float32}
"mean.ncd"
julia> plot(mean_tos; color=:viridis)
Plotting recipes in DimensionalData.jl are the fallback for GedData.jl when
the object doesn't have both Lat
and Lon
dimensions. So (as a random example) we
could plot a transect of ocean surface temperature at 20 degree latitude :
A[Lat(Near(20.0)), Ti(1)] |> plot
GeoData.jl provides a range of other methods that are being added to over time.
agregate
: aggregate data by the same or different amounts for each axis.disaggregate
: disaggregate data by the same or different amounts for each axisresample
can resample data to a different size and projection, and snap to an existingAbstractGeoArray
.
For example, aggregate
:
julia> aggregate(mean, A, (Ti(12), Lat(20), Lon(20))
GeoArray (named tos) with dimensions:
Longitude (type Lon): Float64[21.0, 61.0, …, 301.0, 341.0] (Converted: Ordered Regular Intervals)
Latitude (type Lat): Float64[-69.5, -49.5, …, 50.5, 70.5] (Converted: Ordered Regular Intervals)
Time (type Ti): DateTime360Day[2001-01-16T00:00:00, 2002-01-16T00:00:00] (Sampled: Ordered Irregular Points)
and data: 9×8×2 Array{Union{Missing, Float32},3}
[:, :, 1]
missing 277.139 missing missing missing missing missing missing
missing 277.126 missing missing missing missing missing missing
These methods will also work for entire GeoStacks
and GeoSeries
using the same syntax.
Works in progress
- Integration with Vector/DataFrame spatial types and point/line/polygon data types. It should be possible to select polygons of data, and convert between linear datasets and array formats.
- Standardised handling and conversion of spatial metadata between data formats
- Handling complex projections: Affine transformation of dimensions to indices.
AffineMaps will be stored as a wrapper dimension in
dims
. - Load and write the NetCDF projection format.