9 Bridges to GIS software

9.1 Introduction

An important feature of R is that users must use the command-line interface (CLI) to interact with the computer: you type commands and hit Enter to execute them interactively. The most popular open-source GIS software packages, by contrast, feature a prominent graphical user interface (GUI): you can interact with QGIS, SAGA and gvSIG by typing into (dockable) command-lines, but most users interact with such programs by ‘pointing and clicking their way through life’ as Gary Sherman puts it below (Sherman 2008):39

With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.

Gary Sherman is well-qualified to comment on this matter as the creator of QGIS, the world’s premier open source GUI-based GIS!

The ‘CLI vs GUI’ debate is often adversarial and polarized but it does not have to be: both options are great if well chosen in accordance with your needs and tasks. The advantages of a good CLI such as that provided by R are numerous. Among others, a CLI:

  • Facilitates the automation of repetitive tasks. We strongly dislike the manual execution of iterations, and would recommend to always think about how to solve such tasks programmatically. This way, you most likely discover new programming features and concepts, i.e., you learn and enhance your skill set. By contrast, what are you learning from executing the same tasks a 1000 times? As a nice side-effect, automation is surely more effective and less error-prone.
  • Ensures transparency and reproducibility (which also is the backbone of good scientific practice). In short, it is easier to share your R script than explain a sequence of 10+ ‘points and clicks’.
  • Encourages extending existing software by making it easy to modify and extend functions. If you are missing an algorithm you need, the CLI gives you the freedom to write your own!
  • Is the way to (geo-)data science. Professional and advanced technical skills will certainly enhance your career prospects, and are in dire need across a wide range of disciplines.
  • Is fun, but admittedly that is a subjective argument.

On the other hand, GUI-based GIS systems (particularly QGIS) are also advantageous. For instance, think of:

  • The GUI. The really user-friendly graphical interface spares the user from programming. Though you probably wouldn’t read the book if this were your main objective.
  • Digitizing and all related tools (trace, snap, topological rules, etc.). Note that there is also the new mapedit package but its intention is to allow the quick editing of a few spatial features, and not professional, large-scale cartographic digitizing.
  • Georeferencing with ground control points and orthorectification. The process of locating control points, choosing the right transformation model, and evaluating the registration error are inherently interactive, and thus unsuitable to be done with R, or any other programming language for that matter
  • Stereoscopic mapping (e.g., LiDAR and structure from motion).
  • The built-in geodatabase management system often integrated in Desktop GIS (ArcGIS, GRASS GIS) and all related advantages such as object relational modeling, topology, fast (spatial) querying, etc.
  • Map production, in case you only want to create a beautiful map once. If you have to produce it over and over again, then maybe the CLI approach is better.
  • Zooming and dragging on WMS (though this is also possible with mapview and leaflet).

This general overview already points out the differences between R’s CLI and desktop GIS. However, there is more: dedicated GIS software provides hundreds of geoalgorithms that are simply missing in R. The good news is that ‘GIS bridges’ enable the access to these with the comfort of the R command line.40

The R language was originally designed as an interface to and extension of other languages, especially C and FORTRAN, to enable access to statistical algorithms in a user-friendly and intuitive read-evaluate-print loop (REPL) (Chambers 2016). R was not originally intended to be a GIS. This makes the breadth of R’s geospatial capabilities astonishing to many who are unaware of its ability to replicate established GIS software for many operations on spatial data. There are some domains where R can now outperform desktop GIS including spatial statistical modeling, online interactive visualization and the generation of animated or faceted maps.

Instead of implementing existing GIS algorithms in R, it makes sense to avoid ‘reinventing the wheel’ by taking advantage of R’s ability to interface with other languages (especially C++, which is used for much low-level and high-performance GIS work). Using compiled code for new geoprocessing functionality (particularly with the help of the excellent Rcpp package) could form the basis of new R packages, building on the success of sf (E. Pebesma 2018). However, there is already a wide range of algorithms that can be accessed via R’s interfaces to dedicated GIS software. It makes sense to understand these before moving to develop your own optimized algorithms. For this reason this chapter focuses on ‘bridges’ to the mature GIS products QGIS (via the package RQGIS), SAGA (RSAGA) and GRASS (rgrass7) from within R (Table 9.1). Obviously, we here focus on open-source software solutions, however, there is also a bridge to the commercial GIS leader ArcGIS through the RPyGeo package. And the so-called R-ArcGIS bridge allows to use R from within ArcGIS. As a final note, we would like to point out that aside from interfaces to desktop GIS there are also interfaces to geospatial libraries such as GDAL (gdalUtils, rgdal, sf) and GEOS (rgeos, sf). By the end of the chapter you should have a working knowledge of the functionality such packages open up, and a more nuanced understanding of the ‘CLI vs GUI’ debate. As mentioned in Chapter 1, doing GIS at the command-line makes it more reproducible, in-line with the principles of Geographic Data Science.

Table 9.1: Comparison between three open-source GIS. Hybrid refers to the support of vector and raster operations.
GIS first release no. functions support
GRASS 1984 >500 hybrid
QGIS 2002 >1000 hybrid
SAGA 2004 >600 hybrid

Note: a Comparing downloads of different providers is rather difficult (see http://spatialgalaxy.net/2011/12/19/qgis-users-around-the-world/), and here also useless since every Windows QGIS download automatically also downloads SAGA and GRASS.

9.2 (R)QGIS

QGIS is one of the most popular open-source GIS (Table 9.1; Graser and Olaya 2015). Its main advantage lies in the fact that it provides a unified interface to several other open-source GIS. This means that you have access to GDAL/OGR, GRASS and SAGA through QGIS (Graser and Olaya 2015). To run all these geoalgorithms (frequently more than 1000 depending on your set up) outside of the QGIS GUI, QGIS provides a Python API. RQGIS establishes a tunnel to this Python API through the reticulate package. Basically, functions set_env() and open_app() are doing this. Note that it is optional to run set_env() and open_app() since all functions depending on their output will run them automatically if needed. Before running RQGIS you have to make sure to have installed QGIS and all its (third-party) dependencies such as SAGA and GRASS. To install RQGIS a number of dependencies are required, as described in the install_guide vignette, which covers installation on Windows, Linux and Mac. Please install the long-term release of QGIS, i.e. 2.18, since RQGIS so far does not support QGIS 3.

Leaving the path-argument of set_env() unspecified will search the computer for a QGIS installation. Hence, it is faster to specify explicitly the path to your QGIS installation. Subsequently, open_app sets all paths necessary to run QGIS from within R, and finally creates a so-called QGIS custom application (see http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/intro.html#using-pyqgis-in-custom-applications).

We are now ready for some QGIS geoprocessing from within R!

Unioning polygons often produces so-called sliver polygons. This is the case when the borders of the polygons to union do not overlap completely which is frequently the case when combining data from different sources. Here, we will reuse the incongruent polygons we have already encountered in section 4.2.5. Both polygon datasets are available in the spData package, and for both we would like to use a geographic CRS (see also Chapter 6).

First, we will need a QGIS geoalgorithm that unions polygons. find_algorithms() searches all QGIS geoalgorithms with the help of regular expressions. Assuming that the short description of the function contains the word “union”, we can run:

If you also want to have a short description for each geoalgorithm, set the name_only-parameter to FALSE. If one has no clue at all what the name of a geoalgorithm might be, one can leave the search_term-argument empty which will return a list of all available QGIS geoalgorithms.

The next step is to find out how qgis:union can be used. open_help() opens the online help of the geoalgorithm in question. get_usage() returns all function parameters and default values.

Finally, we can let QGIS do the work. Note that the workhorse function run_qgis() accepts R named arguments, i.e., you can specify the parameter names as returned by get_usage() in run_qgis() as you would do in any other regular R function. Note also that run_qgis() accepts spatial objects residing in R’s global environment as input (here: aggregating_zones and incongruent). But of course, you could also specify paths to spatial vector files stored on disk. Setting the load_output to TRUE automatically loads the QGIS output as an sf-object into R.

Note that the QGIS union operation merges the two input layers into one layer by using the intersection and the symmetrical difference of the two input layers (which by the way is also the default when doing a union operation in GRASS and SAGA). This is not the same as st_union(incongruent, aggregating_zones) (see Exercises)! The QGIS output contains empty geometries and multipart polygons. Empty geometries might lead to problems in subsequent geoprocessing tasks which is why they will be deleted. st_dimension() returns NA if a geometry is empty, and can therefore be used as a filter.

Next we convert multipart polygons into single part polygons (also known as explode geometries or casting). This is necessary for the deletion of sliver polygons later on.

One way to identify slivers is to find polygons with comparatively very small areas, here, e.g., 25000 m2 (see left panel of Figure 9.1).

The next step is to find a function that makes the slivers disappear. Assuming the function or its short description contains the word “sliver”, we can run:

This returns only one geoalgorithm whose parameters can be accessed with the help of get_usage() again.

Conveniently, the user has not to specify each single parameter. In case a parameter is left unspecified, run_qgis() will automatically use the corresponding default value as argument if available. To find out about the default values, run get_args_man().

To finally get rid off the slivers, we specify that all polygons with an area less or equal to 25000 m2 should be joined to that neighboring polygon with the largest area (see right panel of Figure 9.1).

Sliver polygons colored in blue (left panel). Cleaned polygons (right panel).

Figure 9.1: Sliver polygons colored in blue (left panel). Cleaned polygons (right panel).

Other notes:

  • Leaving the output parameter(s) unspecified, saves the resulting QGIS output to a temporary folder created by QGIS. run_qgis prints these paths to the console after successfully running the QGIS engine.
  • If the output consists of multiple files and you have set load_output to TRUE, run_qgis will return a list with each element corresponding to one output file.

To learn more about RQGIS please refer to Muenchow, Schratz, and Brenning (2017).

9.3 (R)SAGA

The System for Automated Geoscientific Analyses (SAGA; Table 9.1) provides the possibility to execute SAGA modules via the command line interface (saga_cmd.exe) (see also https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/) In addition, there is a Python interface (SAGA Python API). RSAGA uses the former to run SAGA from within R.

Though SAGA is a hybrid GIS, its main focus has been on raster processing, and here particularly on digital elevation models (soil properties, terrain attributes, climate parameters). Hence, SAGA is especially good at the fast processing of large (high-resolution) raster datasets (Conrad et al. 2015). Therefore, we will introduce RSAGA with a raster use case from Muenchow, Brenning, and Richter (2012). Specifically, we would like to compute the SAGA wetness index from a digital elevation model. First of all, we need to make sure that RSAGA will find SAGA on the computer when called. For this, all RSAGA functions using SAGA in the background make use of rsaga.env(). Usually, rsaga.env() will detect SAGA automatically by searching several likely directories (see its help for more information).

However, it is possible to have ‘hidden’ SAGA in a location rsaga.env() does not search automatically. linkSAGA searches your computer for a valid SAGA installation. If it finds one, it adds the newest version to the PATH environment variable thereby making sure that rsaga.env() runs successfully. It is only necessary to run the next code chunk if rsaga.env() was unsuccessful (see previous code chunk).

Secondly, we need to write the digital elevation model to a SAGA-format. Note that calling data(landslides) attaches two object to the global environment - dem, a digital elevation model in the form of a list, and landslides, a data.frame containing observations representing the presence or absence of a landslide:

The organization of SAGA is modular. Libraries contain so-called modules, i.e., geoalgorithms. To find out which libraries are available, run:

We choose the library ta_hydrology (ta is the abbreviation for terrain analysis). Subsequently, we can access the available modules of a specific library (here: ta_hydrology) as follows:

rsaga.get.usage() prints the function parameters of a specific geoalgorithm, e.g., the SAGA Wetness Index, to the console.

Finally, you can run SAGA from within R using RSAGA’s geoprocessing workhorse function rsaga.geoprocessor(). The function expects a parameter-argument list in which you have specified all necessary parameters.

To facilitate the access to the SAGA interface, RSAGA frequently provides user-friendly wrapper-functions with meaningful default values (see RSAGA documentation for examples, e.g., ?rsaga.wetness.index). So the function call for calculating the ‘SAGA Wetness Index’ becomes as simple as:

Of course, we would like to inspect our result visually (Figure 9.2). To load and plot the SAGA output file, we use the raster package.

SAGA wetness index of Mount Mongón, Peru.

Figure 9.2: SAGA wetness index of Mount Mongón, Peru.

You can find a much more extended version of the presented example in the RSAGA vignette vignette("RSAGA-landslides"). This example includes statistical geocomputing, i.e., it uses a GIS to derive terrain attributes as predictors for a non-linear Generalized Additive Model (GAM) to predict spatially landslide susceptibility (Muenchow, Brenning, and Richter 2012). The term statistical geocomputation emphasizes the strength of combining R’s data science power with the geoprocessing power of a GIS which is at the very heart of building a bridge from R to GIS.

9.4 GRASS through rgrass7

The U.S. Army - Construction Engineering Research Laboratory (USA-CERL) created the core of the Geographical Resources Analysis Support System (GRASS) (Table 9.1; Neteler and Mitasova 2008) from 1982 to 1995. Academia continued this work since 1997. Similar to SAGA, GRASS focused on raster processing in the beginning while only later, since GRASS 6.0, adding advanced vector functionality (Bivand, Pebesma, and Gómez-Rubio 2013).

We will introduce rgrass7 with one of the most interesting problems in GIScience - the traveling salesman problem. Suppose a traveling salesman would like to visit 24 customers while covering the shortest distance possible. Additionally, the salesman would like to set out his journey at home, and come back to it after having finished all customer visits. There is a single best solution to this problem, however, to find it, is even for modern computers (mostly) impossible (Longley 2015). In our case, the number of possible solutions correspond to (25 - 1)! / 2, i.e., the factorial of 24 divided by 2 (since we do not differentiate between forward or backward direction). Even if one iteration can be done in a nanosecond this still corresponds to 9837145 years. Luckily, there are clever, almost optimal solutions which run in a tiny fraction of this inconceivable amount of time. GRASS GIS provides one of these solutions (for more details see v.net.salesman). In our use case, we would like to find the shortest path between the first 25 bicycle stations (instead of customers) on London’s streets.

Aside from the cycle hire points data, we will need the OpenStreetMap data of London. We download it with the help of the osmdata package (see also section 7.2). We constrain the download of the street network (in OSM language called “highway”) to the bounding box of the cycle hire data, and attach the corresponding data as an sf-object. osmdata_sf returns a list with several spatial objects (points, lines, polygons, etc.). Here, we will only keep the line objects. OpenStreetMap objects come with a lot of columns, streets features almost 500. In fact, we are only interested the geometry column. Nevertheless, we are keeping one attribute column, otherwise we later on run into trouble when trying to provide writeVECT() only with a geometry object (see further below and ?writeVECT for more details). Remember that the geometry column is sticky, hence, even though we are just selecting one attribute, the geometry column will be also returned (see section 2.1.1).

Now that we have the data, we can go on and initiate a GRASS session, i.e., we have to create a GRASS geodatabase. The GRASS geodatabase system is based on SQLite. Consequently, different users can easily work on the same project, possibly with different read/write permissions. However, one has to set up this geodatabase (also from within R), and users used to a GIS GUI popping up by one click might find this process a bit intimidating in the beginning. First of all, the GRASS database requires its own directory, and contains a location (see the GRASS GIS Database help pages at grass.osgeo.org for further information). The location in turn simply contains the geodata for one project. Within one location several mapsets can exist, and typically refer to different users. PERMANENT is a mandatory mapset, and created automatically. It stores the projection, the spatial extent and the default resolution for raster data. In order to share spatial data with all users of a project, the database owner can add spatial data to the PERMANENT mapset. Please refer to Neteler and Mitasova (2008) and the GRASS GIS quick start for more information on the GRASS geodatabase system.

You have to set up a location and a mapset if you want to use GRASS from within R. First of all, we need to find out if and where GRASS7 is installed on the computer.

link is a data.frame which contains in its rows the GRASS 7 installations on your computer. Here, we will use a GRASS7 installation. If you have not installed GRASS7 on your computer, we recommend that you do so now. Assuming that we have found a working installation on your computer, we use the corresponding path in initGRASS. Additionally, we specify where to store the geodatabase (gisDbase), name the location london, and use the PERMANENT mapset.

Subsequently, we define the projection, the extent and the resolution.

Once you are familiar how to set up the GRASS environment, it becomes tedious to do so over and over again. Luckily, linkGRASS7() of the link2GI packages lets you do it with one line of code. The only thing you need to provide is a spatial object which determines the projection and the extent of the geodatabase. First, linkGRASS7() finds all GRASS installations on your computer. Since we have set ver_select to TRUE, we can interactively choose one of the found GRASS-installations. If there is just one installation, the linkGRASS7() automatically chooses this one. Secondly, linkGRASS7() establishes a connection to GRASS7.

Before we can use GRASS geoalgorithms, we need to add data to GRASS’s spatial database. Luckily, the convenience function writeVECT() does this for us. (Use writeRast() in the case of raster data.) In our case we add the street and cycle hire point data while using only the first attribute column, and name them also london_streets and points. Note that we are converting the sf-objects into objects of class Spatial*. In time, rgrass7 will also work with sf-objects.

To perform our network analysis, we need a topological clean street network. GRASS’s v.clean takes care of the removal of duplicates, small angles and dangles, among others. Here, we break lines at each intersection to ensure that the subsequent routing algorithm can actually turn right or left at an intersection, and save the output in a GRASS object named streets_clean. Probably a few of our cycling station points do not exactly lie on a street segment. However, to find the shortest route between them, we need to connect them to the nearest streets segment. v.net’s connect-operator does exactly this. We save its output in streets_points_con.

The resulting clean dataset serves as input for the v.net.salesman-algorithm, which finally finds the shortest route between all cycle hire stations. center_cats requires a numeric range as input. This range represents the points for which a shortest route should be calculated. Since we would like to calculate the route for all cycle stations, we set it to 1-25. To access the GRASS help page of the traveling salesman algorithm, run execGRASS("g.manual", entry = "v.net.salesman").

To visualize our result, we import the output layer into R, convert it into an sf-object , just keep the geometry and visualize it with the help of the mapview package (Figure 9.3).

Shortest route between 25 cycle hire station on the OSM street network of London.

Figure 9.3: Shortest route between 25 cycle hire station on the OSM street network of London.

Further notes:

  • Please note that we have used GRASS’s geodatabase (based on SQLite) which allows faster processing. That means we have only exported spatial data in the beginning. Then we created new objects but only imported the final result back into R. To find out which datasets are currently available, run execGRASS("g.list", type = "vector,raster", flags = "p").
  • Of course, we could have also accessed an already existing GRASS geodatabase from within R. Prior to importing data into R, you might want to perform some (spatial) subsetting. Use v.select and v.extract for vector data. db.select lets you select subsets of the attribute table of a vector layer without returning the corresponding geometry.
  • You can start R also from within a running GRASS session (for more information please refer to Bivand, Pebesma, and Gómez-Rubio 2013 and this wiki).
  • Refer to the excellent GRASS online help or execGRASS("g.manual", flags = "i") for more information on each available GRASS geoalgorithm.
  • If you would like to use GRASS 6 from within R, use the R package spgrass6.

9.5 When to use what?

To recommend a single R-GIS interface is hard since the usage depends on personal preferences, the tasks at hand and your familiarity with different GIS. The latter means if you have already preferred the GUI of a certain GIS, you are quite likely to use the corresponding interface. That being said, RQGIS is an appropriate choice for most use cases. Its main advantages are:

  • An unified access to several GIS, and therefore the provision of >1000 geoalgorithms. Of course, this includes duplicated functionality, e.g., you can perform overlay-operations using QGIS-, SAGA- or GRASS-geoalgorithms.
  • The automatic data format conversions. For instance, SAGA uses .sdat grid files and GRASS uses its own database format but QGIS will handle the corresponding conversions for you on the fly.
  • RQGIS can also handle spatial objects residing in R as input for geoalgorithms, and loads QGIS output automatically back into R if desired.
  • Its convenience functions to support the access of the online help, R named arguments and automatic default value retrieval. Please note that rgrass7 inspired the latter two features.
  • Currently (but this might change), RQGIS supports newer SAGA (2.3.1) versions than RSAGA (2.2.3).

However, there are use cases when you certainly should use one of the other R-GIS bridges. QGIS only provides access to a subset of GRASS and SAGA functionality. Therefore, to use the complete set of SAGA and GRASS functions, stick with RSAGA and rgrass7. When doing so, make advantage of RSAGA’s numerous user-friendly functions. Note also, that RSAGA offers native R functions for geocomputation such as multi.local.function, pick.from.grid and many more. Finally, if you need topological correct data and/or geodatabase-management functionality, we recommend the usage of GRASS. In addition, if you would like to run simulations with the help of a geodatabase (Krug, Roura-Pascual, and Richardson 2010), use rgrass7 directly since RQGIS always starts a new GRASS session for each call.

9.6 Exercises

  1. Create two overlapping polygons (poly_1 and poly_2) with the help of the sf-package (see chapter 2). Calculate the intersection using:

    • RQGIS, RSAGA and rgrass7
    • sf
  2. Run data(dem, package = "RQGIS") and data(random_points, package = "RQGIS"). Select randomly a point from random_points and find all dem pixels that can be seen from this point (hint: viewshed). Visualize your result. For example, plot a hillshade, and on top of it the digital elevation model, your viewshed output and the point. Additionally, give mapview a try.

References

Sherman, Gary. 2008. Desktop GIS: Mapping the Planet with Open Source Tools. Pragmatic Bookshelf.

Chambers, John M. 2016. Extending R. CRC Press.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal.

Graser, Anita, and Victor Olaya. 2015. “Processing: A Python Framework for the Seamless Integration of Geoprocessing Tools in QGIS.” ISPRS International Journal of Geo-Information 4 (4): 2219–45. https://doi.org/10.3390/ijgi4042219.

Muenchow, Jannes, Patrick Schratz, and Alexander Brenning. 2017. “RQGIS: Integrating R with QGIS for Statistical Geocomputing.” The R Journal 9 (2): 409–28.

Conrad, O., B. Bechtel, M. Bock, H. Dietrich, E. Fischer, L. Gerlitz, J. Wehberg, V. Wichmann, and J. Böhner. 2015. “System for Automated Geoscientific Analyses (SAGA) V. 2.1.4.” Geosci. Model Dev. 8 (7): 1991–2007. https://doi.org/10.5194/gmd-8-1991-2015.

Muenchow, Jannes, Alexander Brenning, and Michael Richter. 2012. “Geomorphic Process Rates of Landslides Along a Humidity Gradient in the Tropical Andes.” Geomorphology 139-140 (February): 271–84. https://doi.org/10.1016/j.geomorph.2011.10.029.

Neteler, Markus, and Helena Mitasova. 2008. Open Source GIS: A GRASS GIS Approach. 3. ed. New York, NY: Springer.

Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. 2nd ed. Springer Science & Business Media.

Longley, Paul. 2015. Geographic Information Science & Systems. Fourth edition. Hoboken, NJ: Wiley.

Krug, Rainer M., Núria Roura-Pascual, and David M. Richardson. 2010. “Clearing of Invasive Alien Plants Under Different Budget Scenarios: Using a Simulation Model to Test Efficiency.” Biological Invasions 12 (12): 4099–4112. https://doi.org/10.1007/s10530-010-9827-3.


  1. GRASS GIS and PostGIS are popular in academia and industry and can be seen as products which buck this trend as they are built around the command-line.

  2. An early use of the term ‘bridge’ referred the coupling of R with GRASS (Neteler and Mitasova 2008). Roger Bivand elaborated on this in his talk “Bridges between GIS and R”, delivered at the 2016 GEOSTAT summer school (see http://spatial.nhh.no/misc/).