10 Bridges to GIS software

An important feature of R is that users must use the command-line interface (CLI) to interact with the computer: you type commands and hit Enter to execute them interactively. The most popular open-source GIS software packages, by contrast, feature a prominent graphical user interface (GUI): you can interact with QGIS, SAGA and gvSIG by typing into (dockable) command-lines, but most users interact with such programs by ‘pointing and clicking their way through life’ as Gary Sherman puts it below (Sherman 2008):62

With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.

Gary Sherman is well-qualified to comment on this matter as the creator of QGIS, the world’s premier open source GUI-based GIS!

The ‘CLI vs GUI’ debate is often adversarial and polarized but it does not have to be: both options are great if well chosen in accordance with your needs and tasks. The advantages of a good CLI such as that provided by R are numerous. Among others, a CLI:

  • Facilitates the automation of repetitive tasks. We strongly dislike the manual execution of iterations, and would recommend to always think about how to solve such tasks programmatically. This way, you most likely discover new programming features and concepts, i.e., you learn and enhance your skill set. By contrast, what are you learning from executing the same tasks a 1000 times? As a nice side-effect, automation is surely more effective and less error-prone.
  • Ensures transparency and reproducibility (which also is the backbone of good scientific practice). In short, it is easier to share your R script than explain a sequence of 10+ ‘points and clicks’.
  • Encourages extending existing software by making it easy to modify and extend functions. If you are missing an algorithm you need, the CLI gives you the freedom to write your own!
  • Is the way to (geo-)data science. Professional and advanced technical skills will certainly enhance your career prospects, and are in dire need across a wide range of disciplines.
  • Is fun, but admittedly that is a subjective argument.

On the other hand, GUI-based GIS systems (particularly QGIS) are also advantageous. For instance, think of:

  • The GUI. The really user-friendly graphical interface spares the user from programming. Though you probably wouldn’t read the book if this were your main objective.
  • Digitizing and all related tools (trace, snap, topological rules, etc.). Note that there is also the new mapedit package but its intention is to allow the quick editing of a few spatial features, and not professional, large-scale cartographic digitizing.
  • Georeferencing.
  • Stereoscopic mapping (e.g., LiDAR and structure from motion).
  • The built-in geodatabase management system often integrated in Desktop GIS (ArcGIS, GRASS GIS) and all related advantages such as object relational modeling, topology, fast (spatial) querying, etc.
  • Map production, in case you only want to create a beautiful map once. If you have to produce it over and over again, then maybe the CLI approach is better.
  • Zooming and dragging on WMS (though this is also possible with mapview and leaflet).

This general overview already points out the differences between R’s CLI and desktop GIS. However, there is more: dedicated GIS software provides hundreds of geoalgorithms that are simply missing in R. The good news is that ‘GIS bridges’ enable the access to these with the comfort of the R command line.63

The R language was originally designed as an interface to and extension of other languages, especially C and FORTRAN, to enable access to statistical algorithms in a user-friendly and intuitive read-evaluate-print loop (REPL) (Chambers 2016). R was not originally intended to be a GIS. This makes the breadth of R’s geospatial capabilities astonishing to many who are unaware of its ability to replicate established GIS software for many operations on spatial data. There are some domains where R can now outperform desktop GIS including spatial statistical modeling, online interactive visualization and the generation of animated or faceted maps.

Instead of implementing existing GIS algorithms in R, it makes sense to avoid ‘reinventing the wheel’ by taking advantage of R’s ability to interface with other languages (especially C++, which is used for much low-level and high-performance GIS work). Using compiled code for new geoprocessing functionality (particularly with the help of the excellent Rcpp package) could form the basis of new R packages, building on the success of sf. However, there is already a wide range of algorithms that can be accessed via R’s interfaces to dedicated GIS software. It makes sense to understand these before moving to develop your own optimized algorithms. For this reason this chapter focuses on ‘bridges’ to the mature GIS products QGIS (via the package RQGIS), SAGA (RSAGA) and GRASS (rgrass7) from within R. Obviously, we here focus on open-source software solutions, however, there is also a bridge to the commercial GIS leader ArcGIS through the RPyGeo package. And the so-called R-ArcGIS bridge allows to use R from within ArcGIS. As a final note, we would like to point out that aside from interfaces to desktop GIS there are also interfaces to geospatial libraries such as GDAL (gdalUtils, rgdal, sf) and GEOS (rgeos, sf). By the end of the chapter you should have a working knowledge of the functionality such packages open up, and a more nuanced understanding of the ‘CLI vs GUI’ debate. As mentioned in chapter 1, doing GIS at the command-line makes it more reproducible, in-line with the principles of Geographic Data Science.

10.1 (R)QGIS

QGIS is one of the most popular open-source GIS (Table 10.1; Graser and Olaya 2015). Its main advantage lies in the fact that it provides a unified interface to several other open-source GIS.

Table 10.1: Comparison between three open-source GIS. Hybrid refers to the support of vector and raster operations.
GIS first release no. functions support
GRASS 1984 >500 hybrid
QGIS 2002 >1000 hybrid
SAGA 2004 >600 hybrid

Note: a Comparing downloads of different providers is rather difficult (see http://spatialgalaxy.net/2011/12/19/qgis-users-around-the-world/), and here also useless since every Windows QGIS download automatically also downloads SAGA and GRASS.

First and foremost, this means that you have access to GDAL/OGR, GRASS and SAGA through QGIS but also to other third-party providers such as TauDEM, Orfeo Toolbox and Lastools (tools for LiDAR data) (Graser and Olaya 2015). To run all these geoalgorithms (frequently more than 1000 depending on your set up) outside of the QGIS GUI, QGIS provides a Python API. RQGIS establishes a tunnel to this Python API through the reticulate package. Basically, functions set_env and open_app are doing this. Note that it is optional to run set_env and open_app since all functions depending on their output will run them automatically if needed. Before running RQGIS you have to make sure to have installed QGIS and all its (third-party) dependencies such as SAGA and GRASS. To help you with the installation process, please follow the steps as detailed in vignette("install_guide", package = "RQGIS") for several platforms (Windows, Linux, MacOS).

Leaving the path-argument of set_env unspecified will search the computer for a QGIS installation. Hence, it is faster to specify explicitly the path to your QGIS installation. Subsequently, open_app sets all paths necessary to run QGIS from within R, and finally creates a so-called QGIS custom application http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/intro.html#using-pyqgis-in-custom-applications. We are now ready for some QGIS geoprocessing from within R! First of all, we load some data from the spData-package, namely the boroughs of London (lnd) and cycle hire points in London (cycle_hire).

In chapter 2, we already learned how to do a spatial overlay using the sf-package. Of course, any GIS is also able to perform spatial overlays. Here, we would like to know how many cycle points we can find per borough. First of all, we need to come up with the name of the function in QGIS. find_algorithms lets you search all QGIS geoalgorithms with the help of regular expressions. Here, we assume that the short description of the function contains first the word “point” and secondly somewhere later also the word “poly”. If you have no clue at all what to look for you can leave the search_term-argument empty which will return a list of all available QGIS geoalgorithms. If you also want to have a short description for each geoalgorithm, set the name_only-parameter to FALSE.

Now that we know the name of the function (qgis:countpointsinpolygon), we wonder how we can use it. get_usage returns all function parameters and default values. open_help lets you access the corresponding online help.

Finally, we can let QGIS do the work. Note that the workhorse function run_qgis accepts R named arguments, i.e., you can specify the parameter names as returned by get_usage as you would do in any other regular R function. Note also that run_qgis accepts spatial objects residing in R’s global environment as input (here: lnd and cycle_hire). But of course, you could also specify paths to shapefiles stored on disk. Setting the load_output to TRUE automatically loads the QGIS output into R. Since we only indicated the name of the output (“cycle.shp”), run_qgis saves the output to a temporary folder as returned by tempdir(), and loads it into R as an sf-object.

In case you leave some parameters of a geoalgorithm unspecified, run_qgis will automatically use the default values as arguments if available. To find out about the default values, run get_args_man.

In this case the output tells us, had we left the FIELDS-parameter unspecified, our output (attribute) field would have been named “NUMPOINTS” (instead of “no_bikes”).

Other notes:

  • Leaving the output parameter(s) unspecified, saves the resulting QGIS output to a temporary folder created by QGIS. run_qgis prints these paths to the console after successfully running the QGIS engine.
  • If the output consists of multiple files and you have set load_output to TRUE, run_qgis will return a list with each element corresponding to one output file.

To learn more about RQGIS please refer to Muenchow, Schratz, and Brenning (2017).

10.2 (R)SAGA

Similar to QGIS, the System for Automated Geoscientific Analyses (SAGA; Table 10.1) provides the possibility to run SAGA modules from Python (SAGA Python API). In addition, there is also a command line interface (saga_cmd.exe) to execute SAGA modules (see also https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/). RSAGA uses the latter to run SAGA from within R.

Though SAGA is a hybrid GIS, its main focus has been on raster processing, and here particularly on digital elevation models (soil properties, terrain attributes, climate parameters). Hence, SAGA is especially good at the fast processing of large (high-resolution) rasters datasets (Conrad et al. 2015). Therefore, we will introduce RSAGA with a raster use case from Muenchow, Brenning, and Richter (2012). Specifically, we would like to compute the SAGA wetness index from a digital elevation model. First of all, we need to make sure that RSAGA will find SAGA on the computer when called. For this, all RSAGA functions using SAGA in the background make use of rsaga.env(). Usually, rsaga.env() will detect SAGA automatically by searching several likely directories (see its help for more information). However, it is possible to have ‘hidden’ SAGA in the OSGEO4W-installation, a location rsaga.env() does not search automatically. linkSAGA searches your computer for a valid SAGA installation. If it finds one, it adds the newest version to the PATH environment variable thereby making sure that rsaga.env() runs successfully.

Secondly, we need to write the digital elevation model to a SAGA-format. Note that calling data(landslides) attaches two object to the global environment - dem, a digital elevation model in the form of a list, and landslides, a data.frame containing observations representing the presence or absence of a landslide:

The organization of SAGA is modular. Libraries contain so-called modules, i.e., geoalgorithms. To find out which libraries are available, run:

We choose the library ta_hydrology (ta is the abbreviation for terrain analysis). Subsequently, we can access the available modules of a specific library (here: ta_hydrology) as follows:

Similarly to RQGIS::get_usage(), rsaga.get.usage() prints the function parameters of a specific geoalgorithm, e.g., the SAGA Wetness Index, to the console.

Finally, you can run SAGA from within R using RSAGA’s geoprocessing workhorse function rsaga.geoprocessor(). The function expects a parameter-argument list in which you have specified all necessary parameters.

To facilitate the access to the SAGA interface, RSAGA frequently provides user-friendly wrapper-functions with meaningful default values (see RSAGA documentation for examples, e.g., ?rsaga.wetness.index). So the function call for calculating the ‘SAGA Wetness Index’ becomes as simple as:

Of course, we would like to inspect our result visually (Figure 10.1). To load and plot the SAGA output file, we use the raster package.

SAGA wetness index of Mount Mongón, Peru.

Figure 10.1: SAGA wetness index of Mount Mongón, Peru.

You can find a much more extended version of the presented example in the RSAGA vignette vignette("RSAGA-landslides"). This example includes statistical geocomputing, i.e., it uses a GIS to derive terrain attributes as predictors for a non-linear Generalized Additive Model (GAM) to predict spatially landslide susceptibility (Muenchow, Brenning, and Richter 2012). The term statistical geocomputation emphasizes the strength of combining R’s data science power with the geoprocessing power of a GIS which is at the very heart of building a bridge from R to GIS.

10.3 GRASS through rgrass7

The U.S. Army - Construction Engineering Research Laboratory (USA-CERL) created the core of the Geographical Resources Analysis Support System (GRASS) (Table 10.1; Neteler and Mitasova 2008) from 1982 to 1995. Academia continued this work since 1997. Similar to SAGA, GRASS focused on raster processing in the beginning while only later, since GRASS 6.0, adding advanced vector functionality (Bivand, Pebesma, and Gómez-Rubio 2013).

We will introduce rgrass7 with one of the most interesting problems in GIScience - the traveling salesman problem. Suppose a traveling salesman would like to visit 24 customers while covering the shortest distance possible. Additionally, the salesman would like to set out his journey at home, and come back to it after having finished all customer visits. There is a single best solution to this problem, however, to find it, is even for modern computers (mostly) impossible (Longley et al. 2015). In our case, the number of possible solution correspond to (25 - 1)! / 2 possible solutions, i.e., the factorial of 24 divided by 2 (since we do not differentiate between forward or backward direction). Even if one iteration can be done in a nanosecond this still corresponds to 9837145 years. Luckily, there are clever, almost optimal solutions which run in a tiny fraction of this inconceivable amount of time. GRASS GIS provides one of these solutions (for more details see v.net.salesman). In our use case, we would like to find the shortest path between the first 25 bicycle stations (instead of customers) on London’s streets.

Aside from the cycle hire points data, we will need the OpenStreetMap data of London. We download it with the help of the osmdata package (see also section 6.2). We constrain the download of the street network (in OSM language called “highway”) to the bounding box of the cycle hire data, and attach the corresponding data as an sf-object. osmdata_sf returns a list with several spatial objects (points, lines, polygons, etc.). Here, we will only keep the line objects.

Now that we have the data, we can go on and initiate a GRASS session, i.e., we have to create a GRASS geodatabase. The GRASS geodatabase system is based on SQLite. Consequently, different users can easily work on the same project, possibly with different read/write permissions. However, one has to set up this geodatabase (also from within R), and users used to a GIS GUI popping up by one click might find this process a bit intimidating in the beginning. First of all, the GRASS database requires its own directory, and contains a location (see the GRASS GIS Database help pages at grass.osgeo.org for further information). The location in turn simply contains the geodata for one project. Within one location several mapsets can exist, and typically refer to different users. PERMANENT is a mandatory mapset, and created automatically. It stores the projection, the spatial extent and the default resolution for raster data. In order to share spatial data with all users of a project, the database owner can add spatial data to the PERMANENT mapset. Please refer to Neteler and Mitasova (2008) and the GRASS GIS quick start for more information on the GRASS geodatabase system.

You have to set up a location and a mapset if you want to use GRASS from within R. First of all, we need to find out if and where GRASS7 is installed on the computer.

link is a data.frame which contains in its rows the GRASS 7 installations on your computer. Here, we will use a GRASS7 installation. If you have not installed GRASS7 on your computer, we recommend that you do so now. Assuming that we have found a working installation on your computer, we use the corresponding path in initGRASS. Additionally, we specify where to store the geodatabase (gisDbase), name the location london, and use the PERMANENT mapset.

Subsequently, we define the projection, the extent and the resolution.

Once you are familiar how to set up the GRASS environment, it becomes tedious to do so over and over again. Luckily, linkGRASS7 of the link2GI packages lets you do it with one line of code. The only thing you need to provide is a spatial object which determines the projection and the extent of the geodatabase. First, linkGRASS7 finds all GRASS installations on your computer. Since we have set ver_select to TRUE, we can interactively choose one of the found GRASS-installations. If there is just one installation, the linkGRASS7 automatically chooses this one. Secondly, linkGRASS7 establishes a connection to GRASS7.

Before we can use GRASS geoalgorithms, we need to add data to GRASS’s spatial database. Luckily, the convenience function writeVECT does this for us. (Use writeRast in the case of raster data.) In our case we add the street and cycle hire point data while using only the first attribute column, and name them also streets and points.

To perform our network analysis, we need a topological clean street network. GRASS’s v.clean takes care of the removal of duplicates, small angles and dangles, among others. Here, we break lines at each intersection to ensure that the subsequent routing algorithm can actually turn right or left at an intersection, and save the output in a GRASS object named streets_clean. Probably a few of our cycling station points do not exactly lie on a street segment. However, to find the shortest route between them, we need to connect them to the nearest streets segment. v.net’s connect-operator does exactly this. We save its output in streets_points_con.

The resulting clean dataset serves as input for the v.net.salesman-algorithm, which finally finds the shortest route between all cycle hire stations. center_cats requires a numeric range as input. This range represents the points for which a shortest route should be calculated. Since we would like to calculate the route for all cycle stations, we set it to 1-25. To access the GRASS help page of the traveling salesman algorithm, run execGRASS("g.manual", entry = "v.net.salesman").

To visualize our result, we import the output layer into R, and visualize it with the help of the mapview package (Figure 10.2).

Shortest route between 25 cycle hire station on the OSM street network of London.

Figure 10.2: Shortest route between 25 cycle hire station on the OSM street network of London.

Further notes:

  • Please note that we have used GRASS’s geodatabase (based on SQLite) which allows faster processing. That means we have only exported spatial data in the beginning. Then we created new objects but only imported the final result back into R. To find out which datasets are currently available, run execGRASS("g.list", type = "vector,raster", flags = "p").
  • Of course, we could have also accessed an already existing GRASS geodatabase from within R. Prior to importing data into R, you might want to perform some (spatial) subsetting. Use v.select and v.extract for vector data. db.select lets you select subsets of the attribute table of a vector layer without returning the corresponding geometry.
  • You can start R also from within a running GRASS session (for more information please refer to Bivand, Pebesma, and Gómez-Rubio 2013 and this wiki).
  • Refer to the excellent GRASS online help or execGRASS("g.manual", flags = "i") for more information on each available GRASS geoalgorithm.
  • If you would like to use GRASS 6 from within R, use the R package spgrass6.

10.4 When to use what?

To recommend a single R-GIS interface is hard since the usage depends on personal preferences, the tasks at hand and your familiarity with different GIS. The latter means if you have already preferred the GUI of a certain GIS, you are quite likely to use the corresponding interface. That being said, RQGIS is an appropriate choice for most use cases. Its main advantages are:

  • An unified access to several GIS, and therefore the provision of >1000 geoalgorithms. Of course, this includes duplicated functionality, e.g., you can perform overlay-operations using QGIS-, SAGA- or GRASS-geoalgorithms.
  • The automatic data format conversions. For instance, SAGA uses .sdat grid files and GRASS uses its own database format but QGIS will handle the corresponding conversions for you on the fly.
  • RQGIS can also handle spatial objects residing in R as input for geoalgorithms, and loads QGIS output automatically back into R if desired.
  • Its convenience functions to support the access of the online help, R named arguments and automatic default value retrieval. Please note that rgrass7 inspired the latter two features.
  • Currently (but this might change), RQGIS supports newer SAGA (2.3.1) versions than RSAGA (2.2.3).

However, there are use cases when you certainly should use one of the other R-GIS bridges. QGIS only provides access to a subset of GRASS and SAGA functionality. Therefore, to use the complete set of SAGA and GRASS functions, stick with RSAGA and rgrass7. When doing so, make advantage of RSAGA’s numerous user-friendly functions. Note also, that RSAGA offers native R functions for geocomputation such as multi.local.function, pick.from.grid and many more. Finally, if you need topological correct data and/or geodatabase-management functionality, we recommend the usage of GRASS. In addition, if you would like to run simulations with the help of a geodatabase (Krug, Roura-Pascual, and Richardson 2010), use rgrass7 directly since RQGIS always starts a new GRASS session for each call.

10.5 Exercises

  1. Create two overlapping polygons (poly_1 and poly_2) with the help of the sf-package (see chapter 2). Calculate the intersection using:

    • RQGIS, RSAGA and rgrass7
    • sf
  2. Run data(dem, package = "RQGIS") and data(random_points, package = "RQGIS"). Select randomly a point from random_points and find all dem pixels that can be seen from this point (hint: viewshed). Visualize your result. For example, plot a hillshade, and on top of it the digital elevation model, your viewshed output and the point. Additionally, give mapview a try.

References

Sherman, Gary. 2008. Desktop GIS: Mapping the Planet with Open Source Tools. Pragmatic Bookshelf.

Chambers, John M. 2016. Extending R. CRC Press.

Graser, Anita, and Victor Olaya. 2015. “Processing: A Python Framework for the Seamless Integration of Geoprocessing Tools in QGIS.” ISPRS International Journal of Geo-Information 4 (4): 2219–45. https://doi.org/10.3390/ijgi4042219.

Muenchow, Jannes, Patrick Schratz, and Alexander Brenning. 2017. “RQGIS: Integrating R with QGIS for Statistical Geocomputing.” The R Journal 9 (2): 409–28.

Conrad, O., B. Bechtel, M. Bock, H. Dietrich, E. Fischer, L. Gerlitz, J. Wehberg, V. Wichmann, and J. Böhner. 2015. “System for Automated Geoscientific Analyses (SAGA) V. 2.1.4.” Geosci. Model Dev. 8 (7): 1991–2007. https://doi.org/10.5194/gmd-8-1991-2015.

Muenchow, Jannes, Alexander Brenning, and Michael Richter. 2012. “Geomorphic Process Rates of Landslides Along a Humidity Gradient in the Tropical Andes.” Geomorphology 139-140 (February): 271–84. https://doi.org/10.1016/j.geomorph.2011.10.029.

Neteler, Markus, and Helena Mitasova. 2008. Open Source GIS: A GRASS GIS Approach. 3. ed. New York, NY: Springer.

Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. 2nd ed. 2013 edition. New York: Springer.

Longley, Paul, Michael Goodchild, David Maguire, and David Rhind. 2015. Geographic Information Science & Systems. Fourth edition. Hoboken, NJ: Wiley.

Krug, Rainer M., Núria Roura-Pascual, and David M. Richardson. 2010. “Clearing of Invasive Alien Plants Under Different Budget Scenarios: Using a Simulation Model to Test Efficiency.” Biological Invasions 12 (12): 4099–4112. https://doi.org/10.1007/s10530-010-9827-3.


  1. The mature GRASS GIS software package and PostGIS are quite popular in academia and industry, and could be seen as products which buck this trend as they are built around the command-line. In 2008 GRASS developers added a sophisticated GUI, shifting the emphasis slightly away from its CLI. However, GRASS lacks a sophisticated and feature-rich IDE such as RStudio that supports ‘CLI newbies’. On the other hand, PostGIS is the spatial extension of the popular PostgreSQL open source database, and therefore not really a dedicated GIS. This is also highlighted by the fact that PostGIS lacks any geovisualization capabilities and its description of ‘legacy GIS’ on its website. Similar to GRASS, PostgreSQGL provides a (partial) GUI called pgAdmin to facilitate the editing and administration of the database. To make it clear, though PostGIS provides spatial functions, its main purpose is the handling of spatial objects in a relational database management system. Therefore, frequently users store their spatial data in PostGIS, and interact with it through a dedicated GIS software such as QGIS. Of course, you can also use R to access data from PostGIS (sf, rgdal, rpostgis). In summary, a typical workflow would be: perform a large spatial query with PostGIS, then load the result into a further application (QGIS, R) for further geoprocessing.

  2. The term ‘bridge’ was probably first used in the R-spatial world for the coupling of R with GRASS (Neteler and Mitasova 2008). Roger Bivand elaborates on this in his talk “Bridges between GIS and R”, delivered at the 2016 GEOSTAT summer school. The resulting slides can be found on Roger’s personal website at http://spatial.nhh.no/misc in the file geostat_talk16.zip.