9 Bridges to GIS software
- This chapter requires the following packages:
An important feature of R is that users must use the command-line interface (CLI) to interact with the computer:
you type commands and hit
Enter to execute them interactively.
The most popular open-source GIS software packages, by contrast, feature a prominent graphical user interface (GUI):
you can interact with QGIS, SAGA and gvSIG by typing into (dockable) command-lines, but most users interact with such programs by ‘pointing and clicking their way through life’ as Gary Sherman puts it below (Sherman 2008):40
With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.
Gary Sherman is well-qualified to comment on this matter as the creator of QGIS, the world’s premier open source GUI-based GIS!
The ‘CLI vs GUI’ debate is often adversarial and polarized but it does not have to be: both options are great if well chosen in accordance with your needs and tasks. The advantages of a good CLI such as that provided by R are numerous. Among others, a CLI:
- Facilitates the automation of repetitive tasks.
- Ensures transparency and reproducibility (which also is the backbone of good scientific practice), and therefore is the preferred option for doing geographic data science.
- Encourages extending existing software by making it easy to modify, extend and implement new functions.
- Professional and advanced technical skills will certainly enhance your career prospects, and are in dire need across a wide range of disciplines.
- Is fun, but admittedly that is a subjective argument.
On the other hand, GUI-based GIS systems (particularly QGIS) are also advantageous. For instance, think of:
- The really user-friendly graphical interface which spares the user from programming and facilitates interactive geographic data exploration and interactive map production.
- Digitizing and all related tools (trace, snap, topological rules, etc.).41
- Georeferencing with ground control points and orthorectification. The process of locating control points, choosing the right transformation model, and evaluating the registration error are inherently interactive, and thus unsuitable to be done with R, or any other programming language.
- Stereoscopic mapping (e.g., LiDAR and structure from motion).
- The built-in geodatabase management system often integrated in Desktop GIS (ArcMap, GRASS GIS) and all related advantages such as object-oriented relational data modeling, topology, fast (spatial) querying, among others.
Another advantage of GIS software is that they provide access to hundreds of ‘geoalgorithms’ (computational recipes to solve geographic problems — see Chapter 10), many of which are unavailable from the R command line. The good news is that such geoalgorithms can be accessed from the R command via ‘GIS bridges’, hence the title of (and motivation for) this chapter.42
The R language was originally designed as an interface to and extension of other languages, especially C and FORTRAN, to enable access to statistical algorithms in a user-friendly and intuitive read-evaluate-print loop (REPL) (Chambers 2016). R was not originally intended to be a GIS. This makes the breadth of R’s geospatial capabilities astonishing to many who are unaware of its ability to replicate established GIS software for many operations on spatial data. There are some domains where R can now outperform desktop GIS including spatial statistical modeling, online interactive visualization and the generation of animated or faceted maps.
Instead of implementing existing GIS algorithms in R, it makes sense to avoid ‘reinventing the wheel’ by taking advantage of R’s ability to interface with other languages (especially C++, which is used for much low-level and high-performance GIS work). Using compiled code for new geoprocessing functionality (particularly with the help of the excellent Rcpp package) could form the basis of new R packages, building on the success of sf (Pebesma 2018). However, there is already a wide range of algorithms that can be accessed via R’s interfaces to dedicated GIS software. It makes sense to understand these before moving to develop your own optimized algorithms. For this reason this chapter focuses on ‘bridges’ to the mature GIS products QGIS (via the package RQGIS), SAGA (RSAGA) and GRASS (rgrass7) from within R (Table 9.1). Obviously, we here focus on open-source software solutions, however, there is also a bridge to the commercial GIS leader ArcGIS through the RPyGeo package. And the so-called R-ArcGIS bridge allows to use R from within ArcGIS. As a final note, we would like to point out that aside from interfaces to desktop GIS there are also interfaces to geospatial libraries such as GDAL (gdalUtils, rgdal, sf) and GEOS (rgeos, sf). By the end of the chapter you should have a working knowledge of the functionality such packages open up, and a more nuanced understanding of the ‘CLI vs GUI’ debate. As mentioned in Chapter 1, doing GIS at the command-line makes it more reproducible, in-line with the principles of Geographic Data Science.
|GIS||first release||no. functions||support|
Note: a Comparing downloads of different providers is rather difficult (see http://spatialgalaxy.net/2011/12/19/qgis-users-around-the-world/), and here also useless since every Windows QGIS download automatically also downloads SAGA and GRASS.
QGIS is one of the most popular open-source GIS (Table 9.1; ???).
Its main advantage lies in the fact that it provides a unified interface to several other open-source GIS.
This means that you have access to GDAL/OGR, GRASS and SAGA through QGIS (???).
To run all these geoalgorithms (frequently more than 1000 depending on your set up) outside of the QGIS GUI, QGIS provides a Python API.
RQGIS establishes a tunnel to this Python API through the reticulate package.
open_app() are doing this.
Note that it is optional to run
open_app() since all functions depending on their output will run them automatically if needed.
Before running RQGIS you have to make sure to have installed QGIS and all its (third-party) dependencies such as SAGA and GRASS.
To install RQGIS a number of dependencies are required, as described in the
install_guide vignette, which covers installation on Windows, Linux and Mac.
Please install the long-term release of QGIS, i.e. 2.18, since RQGIS so far does not support QGIS 3.
set_env() unspecified will search the computer for a QGIS installation.
Hence, it is faster to specify explicitly the path to your QGIS installation.
open_app sets all paths necessary to run QGIS from within R, and finally creates a so-called QGIS custom application (see http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/intro.html#using-pyqgis-in-custom-applications).
We are now ready for some QGIS geoprocessing from within R!
Unioning polygons often produces so-called sliver polygons. This is the case when the borders of the polygons to union do not overlap completely which is frequently the case when combining data from different sources. Here, we will reuse the incongruent polygons we have already encountered in section 4.2.5. Both polygon datasets are available in the spData package, and for both we would like to use a geographic CRS (see also Chapter 6).
First, we will need a QGIS geoalgorithm that unions polygons.
find_algorithms() searches all QGIS geoalgorithms with the help of regular expressions.
Assuming that the short description of the function contains the word “union”, we can run:
If you also want to have a short description for each geoalgorithm, set the
If one has no clue at all what the name of a geoalgorithm might be, one can leave the
search_term-argument empty which will return a list of all available QGIS geoalgorithms.
One can find the algorithms also in the QGIS online documentation.
The next step is to find out how
qgis:union can be used.
open_help() opens the online help of the geoalgorithm in question.
get_usage() returns all function parameters and default values.
Finally, we can let QGIS do the work.
Note that the workhorse function
run_qgis() accepts R named arguments, i.e., you can specify the parameter names as returned by
run_qgis() as you would do in any other regular R function.
Note also that
run_qgis() accepts spatial objects residing in R’s global environment as input (here:
But of course, you could also specify paths to spatial vector files stored on disk.
TRUE automatically loads the QGIS output as an sf-object into R.
Note that the QGIS union operation merges the two input layers into one layer by using the intersection and the symmetrical difference of the two input layers (which by the way is also the default when doing a union operation in GRASS and SAGA).
This is not the same as
st_union(incongruent, aggregating_zones) (see Exercises)!
The QGIS output contains empty geometries and multipart polygons.
Empty geometries might lead to problems in subsequent geoprocessing tasks which is why they will be deleted.
NA if a geometry is empty, and can therefore be used as a filter.
Next we convert multipart polygons into single part polygons (also known as explode geometries or casting). This is necessary for the deletion of sliver polygons later on.
One way to identify slivers is to find polygons with comparatively very small areas, here, e.g., 25000 m2 (see left panel of Figure 9.1).
The next step is to find a function that makes the slivers disappear. Assuming the function or its short description contains the word “sliver”, we can run:
This returns only one geoalgorithm whose parameters can be accessed with the help of
alg = "qgis:eliminatesliverpolygons" get_usage(alg) #>ALGORITHM: Eliminate sliver polygons #> INPUT <ParameterVector> #> KEEPSELECTION <ParameterBoolean> #> ATTRIBUTE <parameters from INPUT> #> COMPARISON <ParameterSelection> #> COMPARISONVALUE <ParameterString> #> MODE <ParameterSelection> #> OUTPUT <OutputVector> #> ...
Conveniently, the user has not to specify each single parameter.
In case a parameter is left unspecified,
run_qgis() will automatically use the corresponding default value as argument if available.
To find out about the default values, run
To finally get rid off the slivers, we specify that all polygons with an area less or equal to 25000 m2 should be joined to that neighboring polygon with the largest area (see right panel of Figure 9.1).
- Leaving the output parameter(s) unspecified, saves the resulting QGIS output to a temporary folder created by QGIS.
run_qgisprints these paths to the console after successfully running the QGIS engine.
- If the output consists of multiple files and you have set
run_qgiswill return a list with each element corresponding to one output file.
To learn more about RQGIS please refer to Muenchow, Schratz, and Brenning (2017).
The System for Automated Geoscientific Analyses (SAGA; Table 9.1) provides the possibility to execute SAGA modules via the command line interface (saga_cmd.exe under Windows and just saga_cmd under Linux) (see also [https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/(https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/)) In addition, there is a Python interface (SAGA Python API). RSAGA uses the former to run SAGA from within R.
Though SAGA is a hybrid GIS, its main focus has been on raster processing, and here particularly on digital elevation models (soil properties, terrain attributes, climate parameters).
Hence, SAGA is especially good at the fast processing of large (high-resolution) raster datasets (Conrad et al. 2015).
Therefore, we will introduce RSAGA with a raster use case from Muenchow, Brenning, and Richter (2012).
Specifically, we would like to compute the SAGA wetness index from a digital elevation model.
First of all, we need to make sure that RSAGA will find SAGA on the computer when called.
For this, all RSAGA functions using SAGA in the background make use of
rsaga.env() will detect SAGA automatically by searching several likely directories (see its help for more information).
However, it is possible to have ‘hidden’ SAGA in a location
rsaga.env() does not search automatically.
linkSAGA searches your computer for a valid SAGA installation.
If it finds one, it adds the newest version to the PATH environment variable thereby making sure that
rsaga.env() runs successfully.
It is only necessary to run the next code chunk if
rsaga.env() was unsuccessful (see previous code chunk).
Secondly, we need to write the digital elevation model to a SAGA-format.
Note that calling
data(landslides) attaches two object to the global environment -
dem, a digital elevation model in the form of a
data.frame containing observations representing the presence or absence of a landslide:
The organization of SAGA is modular. Libraries contain so-called modules, i.e., geoalgorithms. To find out which libraries are available, run:
We choose the library
ta is the abbreviation for terrain analysis).
Subsequently, we can access the available modules of a specific library (here:
ta_hydrology) as follows:
rsaga.get.usage() prints the function parameters of a specific geoalgorithm, e.g., the
SAGA Wetness Index, to the console.
Finally, you can run SAGA from within R using RSAGA’s geoprocessing workhorse function
The function expects a parameter-argument list in which you have specified all necessary parameters.
To facilitate the access to the SAGA interface, RSAGA frequently provides user-friendly wrapper-functions with meaningful default values (see RSAGA documentation for examples, e.g.,
So the function call for calculating the ‘SAGA Wetness Index’ becomes as simple as:
Of course, we would like to inspect our result visually (Figure 9.2). To load and plot the SAGA output file, we use the raster package.
You can find a much more extended version of the presented example in the RSAGA vignette
This example includes statistical geocomputing, i.e., it uses a GIS to derive terrain attributes as predictors for a non-linear Generalized Additive Model (GAM) to predict spatially landslide susceptibility (Muenchow, Brenning, and Richter 2012).
The term statistical geocomputation emphasizes the strength of combining R’s data science power with the geoprocessing power of a GIS which is at the very heart of building a bridge from R to GIS.
9.4 GRASS through rgrass7
The U.S. Army - Construction Engineering Research Laboratory (USA-CERL) created the core of the Geographical Resources Analysis Support System (GRASS) (Table 9.1; Neteler and Mitasova 2008) from 1982 to 1995. Academia continued this work since 1997. Similar to SAGA, GRASS focused on raster processing in the beginning while only later, since GRASS 6.0, adding advanced vector functionality (???).
We will introduce rgrass7 with one of the most interesting problems in GIScience - the traveling salesman problem.
Suppose a traveling salesman would like to visit 24 customers.
Additionally, he would like to start and finish his journey at home which makes a total of 25 locations while covering the shortest distance possible.
There is a single best solution to this problem, however, to find it, is even for modern computers (mostly) impossible (Longley 2015).
In our case, the number of possible solutions correspond to
(25 - 1)! / 2, i.e., the factorial of 24 divided by 2 (since we do not differentiate between forward or backward direction).
Even if one iteration can be done in a nanosecond this still corresponds to 9837145 years.
Luckily, there are clever, almost optimal solutions which run in a tiny fraction of this inconceivable amount of time.
GRASS GIS provides one of these solutions (for more details see v.net.salesman).
In our use case, we would like to find the shortest path between the first 25 bicycle stations (instead of customers) on London’s streets (and we simply assume that the first bike station corresponds to the home of our traveling salesman).
Aside from the cycle hire points data, we will need the OpenStreetMap data of London.
We download it with the help of the osmdata package (see also section 7.2).
We constrain the download of the street network (in OSM language called “highway”) to the bounding box of the cycle hire data, and attach the corresponding data as an
osmdata_sf returns a list with several spatial objects (points, lines, polygons, etc.).
Here, we will only keep the line objects.
OpenStreetMap objects come with a lot of columns,
streets features almost 500.
In fact, we are only interested the geometry column.
Nevertheless, we are keeping one attribute column, otherwise we later on run into trouble when trying to provide
writeVECT() only with a geometry object (see further below and
?writeVECT for more details).
Remember that the geometry column is sticky, hence, even though we are just selecting one attribute, the geometry column will be also returned (see section 2.1.1).
As a convenience to the reader, one can attach
london_streets to the global environment using
data("london_streets", package = "spDataLarge").
Now that we have the data, we can go on and initiate a GRASS session, i.e., we have to create a GRASS geodatabase. The GRASS geodatabase system is based on SQLite. Consequently, different users can easily work on the same project, possibly with different read/write permissions. However, one has to set up this geodatabase (also from within R), and users used to a GIS GUI popping up by one click might find this process a bit intimidating in the beginning. First of all, the GRASS database requires its own directory, and contains a location (see the GRASS GIS Database help pages at grass.osgeo.org for further information). The location in turn simply contains the geodata for one project. Within one location several mapsets can exist, and typically refer to different users. PERMANENT is a mandatory mapset, and created automatically. It stores the projection, the spatial extent and the default resolution for raster data. In order to share spatial data with all users of a project, the database owner can add spatial data to the PERMANENT mapset. Please refer to Neteler and Mitasova (2008) and the GRASS GIS quick start for more information on the GRASS geodatabase system.
You have to set up a location and a mapset if you want to use GRASS from within R. First of all, we need to find out if and where GRASS7 is installed on the computer.
link is a
data.frame which contains in its rows the GRASS 7 installations on your computer.
Here, we will use a GRASS7 installation.
If you have not installed GRASS7 on your computer, we recommend that you do so now.
Assuming that we have found a working installation on your computer, we use the corresponding path in
Additionally, we specify where to store the geodatabase (gisDbase), name the location
london, and use the PERMANENT mapset.
library(rgrass7) # find a GRASS7 installation, and use the first one ind = grep("7", link$version) # next line of code only necessary if we want to use GRASS as installed by # OSGeo4W. Among others, this adds some paths to PATH, which are also needed # for running GRASS. link2GI::paramGRASSw(link[ind, ]) grass_path = ifelse(test = !is.null(link$installation_type) && link$installation_type[ind] == "osgeo4W", yes = file.path(link$instDir[ind], "apps/grass", link$version[ind]), no = link$instDir) initGRASS(gisBase = grass_path, # home parameter necessary under UNIX-based systems home = tempdir(), gisDbase = tempdir(), location = "london", mapset = "PERMANENT", override = TRUE)
Subsequently, we define the projection, the extent and the resolution.
execGRASS("g.proj", flags = c("c", "quiet"), proj4 = sf::st_crs(london_streets)$proj4string) b_box = sf::st_bbox(london_streets) execGRASS("g.region", flags = c("quiet"), n = as.character(b_box["ymax"]), s = as.character(b_box["ymin"]), e = as.character(b_box["xmax"]), w = as.character(b_box["xmin"]), res = "1")
Once you are familiar how to set up the GRASS environment, it becomes tedious to do so over and over again.
linkGRASS7() of the link2GI packages lets you do it with one line of code.
The only thing you need to provide is a spatial object which determines the projection and the extent of the geodatabase.
linkGRASS7() finds all GRASS installations on your computer.
Since we have set
TRUE, we can interactively choose one of the found GRASS-installations.
If there is just one installation, the
linkGRASS7() automatically chooses this one.
linkGRASS7() establishes a connection to GRASS7.
Before we can use GRASS geoalgorithms, we need to add data to GRASS’s spatial database.
Luckily, the convenience function
writeVECT() does this for us.
writeRast() in the case of raster data.)
In our case we add the street and cycle hire point data while using only the first attribute column, and name them also
Note that we are converting the sf-objects into objects of class
In time, rgrass7 will also work with sf-objects.
To perform our network analysis, we need a topological clean street network.
v.clean takes care of the removal of duplicates, small angles and dangles, among others.
Here, we break lines at each intersection to ensure that the subsequent routing algorithm can actually turn right or left at an intersection, and save the output in a GRASS object named
Probably a few of our cycling station points do not exactly lie on a street segment.
However, to find the shortest route between them, we need to connect them to the nearest streets segment.
v.net’s connect-operator does exactly this.
We save its output in
The resulting clean dataset serves as input for the
v.net.salesman-algorithm, which finally finds the shortest route between all cycle hire stations.
center_cats requires a numeric range as input.
This range represents the points for which a shortest route should be calculated.
Since we would like to calculate the route for all cycle stations, we set it to
To access the GRASS help page of the traveling salesman algorithm, run
execGRASS("g.manual", entry = "v.net.salesman").
To visualize our result, we import the output layer into R, convert it into an sf-object , just keep the geometry and visualize it with the help of the mapview package (Figure 9.3).
- Please note that we have used GRASS’s geodatabase (based on SQLite) which allows faster processing.
That means we have only exported spatial data in the beginning.
Then we created new objects but only imported the final result back into R.
To find out which datasets are currently available, run
execGRASS("g.list", type = "vector,raster", flags = "p").
- Of course, we could have also accessed an already existing GRASS geodatabase from within R.
Prior to importing data into R, you might want to perform some (spatial) subsetting.
v.extractfor vector data.
db.selectlets you select subsets of the attribute table of a vector layer without returning the corresponding geometry.
- You can start R also from within a running GRASS session (for more information please refer to ??? and this wiki).
- Refer to the excellent GRASS online help or
execGRASS("g.manual", flags = "i")for more information on each available GRASS geoalgorithm.
- If you would like to use GRASS 6 from within R, use the R package spgrass6.
9.5 When to use what?
To recommend a single R-GIS interface is hard since the usage depends on personal preferences, the tasks at hand and your familiarity with different GIS software packages which in turn probably depends on your field of study. As mentioned previously, SAGA is especially good at the fast processing of large (high-resolution) raster datasets, and frequently used by hydrologists, climatologists and soil scientists (Conrad et al. 2015). GRASS GIS, on the other hand, is the only GIS presented here supporting a topologically based spatial database which is especially useful for network analyses but also simulation studies (see further below). QGIS is much more user-friendly compared to GRASS- and SAGA-GIS, especially for first-time GIS users, and probably the most popular open-source GIS. Therefore, RQGIS is an appropriate choice for most use cases. Its main advantages are:
- A unified access to several GIS, and therefore the provision of >1000 geoalgorithms (Table 9.1). Of course, this includes duplicated functionality, e.g., you can perform overlay-operations using QGIS-, SAGA- or GRASS-geoalgorithms.
- The automatic data format conversions.
For instance, SAGA uses
.sdatgrid files and GRASS uses its own database format but QGIS will handle the corresponding conversions for you on the fly.
- RQGIS can also handle spatial objects residing in R as input for geoalgorithms, and loads QGIS output automatically back into R if desired.
- Its convenience functions to support the access of the online help, R named arguments and automatic default value retrieval. Please note that rgrass7 inspired the latter two features.
By all means, there are use cases when you certainly should use one of the other R-GIS bridges.
Though QGIS is the only GIS providing a unified interface to several GIS software packages, it only provides access to a subset of the corresponding third-party geoalgorithms (for more information please refer to Muenchow, Schratz, and Brenning (2017)).
Therefore, to use the complete set of SAGA and GRASS functions, stick with RSAGA and rgrass7.
When doing so, make advantage of RSAGA’s numerous user-friendly functions.
Note also, that RSAGA offers native R functions for geocomputation such as
pick.from.grid and many more.
RSAGA supports much more SAGA versions than (R)QGIS.
Finally, if you need topological correct data and/or geodatabase-management functionality such as multi-user access, we recommend the usage of GRASS.
In addition, if you would like to run simulations with the help of a geodatabase (Krug, Roura-Pascual, and Richardson 2010), use rgrass7 directly since RQGIS always starts a new GRASS session for each call.
Please note that there are a number of further GIS software packages that have a scripting interface but for which there is no dedicated R package that accesses these: gvSig, OpenJump, Orfeo Toolbox and TauDEM.
9.6 A short note on further spatial APIs
There are many further interfaces to other spatial software packages aside from the interfaces to Desktop GIS software presented in detail in this chapter. This includes interfaces to spatial libraries (section 9.6.1), spatial databases (section 9.6.2) and web mapping services (see section 8). In this section we will only have a short glance at two selected examples. Hopefully, this will give an impression of all the spatial power and flexibility that is readily available on our fingertips.
9.6.1 Spatial library API - GDAL
GDAL/OGR is especially known for its support for numerous spatial drivers (data formats).
In fact, there is barely a geospatial software that is not using GDAL in the background for importing and exporting geographic data.
But GDAL offers more than I/O functions, e.g., geoprocessing tools for vector and raster data (for an overview have a look at http://www.gdal.org/pages.html).
One can easily access these tools via the command line.
In the code chunk below
linkGDAL() searches the computer for a working GDAL installation and adds the path of the executable files to the PATH variable.
This allows us to easily run GDAL/OGR via system calls.
ogrinfo gives back meta information of vector data.
link2GI::linkGDAL() cmd = paste("ogrinfo -ro -so -al", system.file("shape/nc.shp", package = "sf")) system(cmd) #>INFO: Open of `C:/Users/pi37pat/Documents/R/win-library/3.5/sf/shape/nc.shp' #> using driver `ESRI Shapefile' successful. #> #>Layer name: nc #>Metadata: #> DBF_DATE_LAST_UPDATE=2016-10-26 #>Geometry: Polygon #>Feature Count: 100 #>Extent: (-84.323853, 33.881992) - (-75.456978, 36.589649) #>Layer SRS WKT: #>GEOGCS["GCS_North_American_1927", #> DATUM["North_American_Datum_1927", #> SPHEROID["Clarke_1866",6378206.4,294.9786982]], #> PRIMEM["Greenwich",0], #> UNIT["Degree",0.017453292519943295], #> AUTHORITY["EPSG","4267"]] #>AREA: Real (24.15) #>PERIMETER: Real (24.15) #>CNTY_: Real (24.15) #>CNTY_ID: Real (24.15) #>NAME: String (80.0) #>FIPS: String (80.0) #>FIPSNO: Real (24.15) #>CRESS_ID: Integer (9.0) #>BIR74: Real (24.15) #>SID74: Real (24.15) #>NWBIR74: Real (24.15) #>BIR79: Real (24.15) #>SID79: Real (24.15) #>NWBIR79: Real (24.15)
This is of course a very simple example, and we could have achieved the same using
Still, it shows how we can use GDAL via the command-line from within R independently of other packages, and nothing stops us from using more advanced and/or not already implemented GDAL/OGR utilities in R.
Note also that the RSAGA package uses the command line interface to use SAGA geoalgorithms from within R (see section 9.3).
TauDEM (http://hydrology.usu.edu/taudem/taudem5/index.html) and the Orfeo Toolbox (https://www.orfeo-toolbox.org/) are further examples for offering a command line interface.
At the time of writing it appears that there is only a developer version of an R/TauDEM interface on R-Forge (https://r-forge.r-project.org/R/?group_id=956)).
In any case, you now have the tools to build an interface yourself.
Please keep in mind that the sf package already brings most of the power provided by GDAL, GEOS and proj.4 to R. Instead of using the command-line interface to access the functionality of these external dependencies, sf uses an Rcpp interface. This is an even better approach since it circumvents the command-line and allows a direct access to GDAL/GEOS/Proj.4 all of which are written in either C++ or C.
9.6.2 Spatial database API - PostgreSQL/PostGIS
Spatial database management systems (spatial DBMS) allow to store spatial and non-spatial data in a structured way (as opposed to loose collections of data somewhere stored on disk) and to relate tables (entities) to each other explicitly via unique identifiers (primary and foreign keys) and implicitly via space (think for instance of a spatial join). Geographic data tends to become big quite quickly. Databases are especially useful for large and big data and querying it efficiently spatially and non-spatially. Additionally, spatial DBMS provide multi-user access and topology support. The most important open source spatial database is probably the combination of PostgreSQL with its spatial extension PostGIS (Obe and Hsu 2015).43
From within R, one can easily query data from a (spatial) DBMS which is needed for a specific analysis. Hence, it is not necessary to attach several gigabyte of geographic data to R’s global environment which most likely would crash the R session anyway. To give you an idea how to work with R and PostGIS in tandem, we present SQL queries from “1.4 Hello real word” of the book PostGIS in Action, Second Edition published with Manning (Obe and Hsu 2015).44
The first thing to find out is which tables can be found in the database.
We are only interested in the
restaurants and the
The former represents the locations of fast-food restaurants in the US and the latter are principal US highways.
To find out about attributes available in a table, we can run:
The first query will select
US Route 1 in Maryland (
st_read() allows to read spatial data from a database if it is provided with an open connection to a database and a query.
st_read() needs to know which column represents the geometry (here:
This results in an sf-object named
us_route of type
The next step is to add a 20-mile buffer around the selected highway (Figure 9.4).
Note that this was a spatial query using functions (
ST_Buffer()) you should be already familiar with since you find them also in the sf-package though here they are written in lowercase characters (
In fact, function names of the sf package largely follow the PostGIS naming conventions.46
The last query will find all Hardee restaurants (
HDE) within the buffer zone (Figure 9.4).
Please refer to Obe and Hsu (2015) for a detailed explanation of the spatial SQL query. Finally, it is good practice to close the database connection. Here, it is even more important since the free plan of QGIS Cloud allows only ten users to access the database at the same time.
Unlike PostGIS, sf only supports spatial vector data.
To query and manipulate raster data stored in a PostGIS database, use the rpostgis package (Bucklin and Basille 2018) and/or use command-line tools such as
rastertopgsql which comes as part of the PostGIS installation.
Of course, this subsection was only a very brief introduction to PostgreSQL/PostGIS. Nevertheless, we would like to encourage the practice of storing geographic and non-geographic data in a spatial DBMS while only attaching those subsets to R’s global environment which are needed for further (geo-)statistical analysis. Please refer to Obe and Hsu (2015) for a much more detailed description of the here presented SQL queries and a much more comprehensive introduction to PostgreSQL/PostGIS in general. PostgreSQL/PostGIS is a formidable choice as an open source spatial database. But the same is true for the lightweight SQLite/SpatiaLite database engine and GRASS which uses SQLite in the background (see section ??).
As a final note, if your data is even getting to big for PostgreSQL/PostGIS and you require massive spatial data management and query performance, then the logical next step is to use large-scale geographic querying on distributed computing systems, as for example, provided by GeoMesa (http://www.geomesa.org/) or GeoSpark (http://geospark.datasyslab.org/; Huang et al. 2017).
Create two overlapping polygons (
poly_2) with the help of the sf-package (see Chapter 2).
qgis:union. What is the difference between the two union operations? How can we use the sf package to obtain the same result as QGIS?
Calculate the intersection of
- RQGIS, RSAGA and rgrass7
data(dem, package = "RQGIS")and
data(random_points, package = "RQGIS"). Select randomly a point from
random_pointsand find all
dempixels that can be seen from this point (hint: viewshed). Visualize your result. For example, plot a hillshade, and on top of it the digital elevation model, your viewshed output and the point. Additionally, give
gdalinfovia a system call for a raster file stored on disk of your choice.
Query all Californian highways from the PostgreSQL/PostGIS database living in the QGIS Cloud introduced in this chapter.
Sherman, Gary. 2008. Desktop GIS: Mapping the Planet with Open Source Tools. Pragmatic Bookshelf.
Chambers, John M. 2016. Extending R. CRC Press.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal.
Muenchow, Jannes, Patrick Schratz, and Alexander Brenning. 2017. “RQGIS: Integrating R with QGIS for Statistical Geocomputing.” The R Journal 9 (2): 409–28.
Conrad, O., B. Bechtel, M. Bock, H. Dietrich, E. Fischer, L. Gerlitz, J. Wehberg, V. Wichmann, and J. Böhner. 2015. “System for Automated Geoscientific Analyses (SAGA) V. 2.1.4.” Geosci. Model Dev. 8 (7): 1991–2007. https://doi.org/10.5194/gmd-8-1991-2015.
Muenchow, Jannes, Alexander Brenning, and Michael Richter. 2012. “Geomorphic Process Rates of Landslides Along a Humidity Gradient in the Tropical Andes.” Geomorphology 139-140 (February): 271–84. https://doi.org/10.1016/j.geomorph.2011.10.029.
Neteler, Markus, and Helena Mitasova. 2008. Open Source GIS: A GRASS GIS Approach. 3. ed. New York, NY: Springer.
Longley, Paul. 2015. Geographic Information Science & Systems. Fourth edition. Hoboken, NJ: Wiley.
Krug, Rainer M., Núria Roura-Pascual, and David M. Richardson. 2010. “Clearing of Invasive Alien Plants Under Different Budget Scenarios: Using a Simulation Model to Test Efficiency.” Biological Invasions 12 (12): 4099–4112.
Obe, Regina O., and Leo S. Hsu. 2015. PostGIS in Action. Second edition. Shelter Island, NY: Manning.
Bucklin, David, and Mathieu Basille. 2018. “Rpostgis: Linking R with a PostGIS Spatial Database.” The R Journal.
Huang, Zhou, Yiran Chen, Lin Wan, and Xia Peng. 2017. “GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark.” ISPRS International Journal of Geo-Information 6 (9): 285. https://doi.org/10.3390/ijgi6090285.
GRASS GIS and PostGIS are popular in academia and industry and can be seen as products which buck this trend as they are built around the command-line. ↩
Note that there is also the mapedit package but its intention is to allow the quick editing of a few spatial features, and not professional, large-scale cartographic digitizing.↩
An early use of the term ‘bridge’ referred the coupling of R with GRASS (Neteler and Mitasova 2008). Roger Bivand elaborated on this in his talk “Bridges between GIS and R”, delivered at the 2016 GEOSTAT summer school (see slides at http://spatial.nhh.no/misc/). ↩
We are thankful to Manning Publications and Regina Obe as well as Leo Hsu for giving us permission to reproduce their example here.↩
QGIS Cloud lets you store geographic data and maps in the cloud. In the background it uses QGIS Server and PostgreSQL/PostGIS. This way, the reader can follow the PostGIS example without the need to have PostgreSQL/PostGIS installed on a local machine. We are thankful to the QGIS Cloud team for hosting our PostGIS example.↩
By the way, the prefix
ststands for space/time.↩