# 15 Conclusion

## Prerequisites

Like the introduction, this concluding chapter contains few code chunks. But its prerequisites are demanding. It assumes that you have:

• Read through and attempted the exercises in all the chapters of Part 1 (Foundations).
• Grasped the diversity of methods that build on these foundations, by following the code and prose in Part 2 (Extensions).
• Considered how you can use geocomputation to solve real-world problems, at work and beyond, after engaging with Part 3 (Applications).

The aim is to consolidate knowledge and skills for geocomputation and inspire future directions of application and development. The next section (15.1) discusses the wide range of packages for geographic data in R and how to decide between them. Section 15.2 describes gaps in the book’s contents and explains why some areas of research were deliberately omitted while others were emphasized. This discussion leads to the question (which is answered in section 15.3): having read this book, where next? The final section (15.4) returns to the wider issues raised in Chapter 1 and considers how geocomputation can be used for social benefit.

## 15.1 Package choice

A characteristic of R is that there are often multiple ways to achieve the same result. Geocomputation with R is no exception. The code chunk below illustrates this by using three functions covered in Chapters 3 and 5 based on the sf package to combine the 16 regions of New Zealand into a single geometry:

library(spData)
nz_u1 = sf::st_union(nz)
nz_u2 = aggregate(nz["Population"], list(rep(1, nrow(nz))), sum)
nz_u3 = dplyr::summarise(nz, t = sum(Population))
identical(nz_u1, nz_u2$geometry)
#> [1] TRUE
identical(nz_u1, nz_u3$geom)
#> [1] TRUE

Although the classes, attributes and column names of the resulting objects nz_u1 to nz_u3 differ, their geometries are identical. This is verified using the base R function identical() (see note 1 at the end of this chapter). Which to use? It depends. The first only processes the geometry data contained in nz, so it is faster. The other two options also perform attribute operations, which may be useful for subsequent steps.
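The difference between the results can be inspected directly. The following sketch assumes that the sf and spData packages are installed. It repeats two of the calls above to show that st_union() returns a bare geometry column while aggregate() returns a full sf object, and then times each call. Timings depend on hardware and package versions, so no particular values are implied.

```r
library(sf)
library(spData) # provides the nz dataset

# Geometry-only union: returns a simple feature column (class sfc)
nz_u1 = st_union(nz)
# Attribute aggregation plus union: returns an sf object
nz_u2 = aggregate(nz["Population"], list(rep(1, nrow(nz))), sum)

inherits(nz_u1, "sfc")              # a geometry column on its own
inherits(nz_u2, "sf")               # a data frame with a geometry column
inherits(st_geometry(nz_u2), "sfc") # its geometry is also an sfc

# Rough timing comparison; results vary between machines
system.time(st_union(nz))
system.time(aggregate(nz["Population"], list(rep(1, nrow(nz))), sum))
```

If only the outline of the country is needed, the geometry-only route avoids the extra work. If the total population is also needed later, the aggregated object already carries it.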

The wider point is that there are often multiple options to choose from when working with geographic data in R, even within a single package. The range of options grows further when other R packages are considered: you could achieve the same result using the older sp package, for example. We recommend using sf and the other packages showcased in this book, for reasons outlined in Chapter 2, but it’s worth being aware of alternatives and being able to justify your choice of software.

We deliberately covered both tidyverse and base R approaches to attribute data operations. Chapter 3 showed how both nz[, "Name"] and nz %>% select(Name) can be used to achieve the same result, for example. The overlap is highlighted because each approach has advantages: the pipe syntax is popular and appealing to some, while base R is more stable, and is well-known to others. Choosing between them is therefore largely a matter of preference, but beware of pitfalls when using tidyverse functions to handle spatial data (see the spatial-tidyverse vignette at geocompr.github.io).
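The two subsetting styles can be compared side by side. This is a minimal sketch, assuming the sf, dplyr and spData packages are installed. It also shows one behaviour that can surprise newcomers whichever style they use: the geometry column of an sf object is kept even when it is not selected. This is offered as an illustration only and is not a summary of the vignette mentioned above.

```r
library(sf)
library(dplyr)
library(spData) # provides the nz dataset

nz_base = nz[, "Name"]        # base R subsetting
nz_tidy = nz %>% select(Name) # tidyverse equivalent

# Both approaches return sf objects with the same columns
identical(names(nz_base), names(nz_tidy))

# The geometry column is 'sticky': it is kept although only Name was selected
ncol(nz_tidy)

# Drop the geometry explicitly when a plain data frame is needed
nz_df = st_drop_geometry(nz_tidy)
class(nz_df)
```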

While commonly needed subsetting functions from two packages were covered in depth, many others were omitted. Chapter 1 mentions 20+ influential spatial packages that have been developed over the years. Although each package covered in this book has a different emphasis, there are overlaps between them, and it is important to remember that there are dozens of packages for geographic data not covered in this book. There are 176 packages in the Spatial Task View alone (as of summer 2018); more packages and countless functions for geographic data are developed each year. It would be impossible to do justice to all of them in a single book.

The volume and rate of evolution in R’s spatial ecosystem may seem overwhelming. Our advice in this context applies equally to other domains of knowledge: learn how to use one thing in depth but have a general understanding of the breadth of options available to solve problems in your domain with other R packages (section 15.3 covers developments in other languages). Of course, some packages perform much better than others, making package selection an important decision. From this diversity, we have focused on packages that are future-proof (they will work long into the future), high performance (relative to other R packages) and complementary. But there is still overlap in the packages we have used, as illustrated by the diversity of packages for making maps, for example (see Chapter 8).

Package overlap is not necessarily a bad thing. It can increase resilience, performance (partly driven by friendly competition and mutual learning between developers) and choice, a key feature of open source software. In this context, the decision to use a particular ‘stack’, such as the sf/tidyverse/raster ecosystem advocated in this book, should be made with knowledge of alternatives. The sp/rgdal/rgeos ‘stack’, for example, can do many of the things covered in this book and is used in more than 300 packages (see note 2 at the end of this chapter). It is also worth being aware of promising alternatives that are under development. The package stars, for example, provides a new class system for working with spatiotemporal data. If you are interested in this topic, you can check for updates on the package’s source code and the broader SpatioTemporal Task View. The same principle applies to other domains: it is important to justify software choices and review software decisions based on up-to-date information.

## 15.2 Gaps and overlaps

There are a number of gaps in, and some overlaps between, the topics covered in this book. We have been selective, emphasizing some topics while omitting others. We have tried to emphasize topics that are most commonly needed in real-world applications such as geographic data operations, projections, data read/write and visualization. These topics appear repeatedly in the chapters, a substantial area of overlap designed to consolidate these essential skills for geocomputation.

On the other hand, we have omitted topics that are less commonly used, or which are well catered for in other resources. Point pattern analysis, spatial interpolation (kriging) and spatial epidemiological modeling, for example, are important topics. But there is already excellent material on these topics, such as Baddeley, Rubak, and Turner (2015) and Bivand, Pebesma, and Gómez-Rubio (2013). Another topic that we barely touched on is remote sensing, although the material on raster analysis in this book is a good introduction to remote sensing with R. If you want to know more, you might find Wegmann, Leutner, and Dech (2016) interesting.

Instead of covering spatial statistical modeling and inference, we mainly chose to present machine-learning algorithms (see Chapters 11 and 14). Again, the reason was that there are already great books covering these topics, especially with ecological use cases (among others, Zuur et al. 2009, 2017). If you are more interested in spatial statistics using Bayesian modeling, see also Blangiardo and Cameletti (2015).

Finally, we have largely omitted big data analytics. This might seem surprising, since geographic data in particular can become big very quickly. But the prerequisite for doing big data analytics is to know how to solve a problem on a small dataset. Once you have learned that, you can apply the same techniques to big data questions, though of course you need to expand your toolbox. The first thing to learn is how to handle spatial data queries. This is because big data analytics often boils down to extracting a small amount of data from a database for a specific statistical analysis. For this, we have provided an introduction to spatial databases and how to use a GIS from within R in Chapter 9. If you really have to do the analysis on a big or even the complete dataset, hopefully the problem you are trying to solve is embarrassingly parallel. For this, you need to learn a system that is able to do this parallelization efficiently, such as Hadoop, GeoMesa (http://www.geomesa.org/) or GeoSpark (http://geospark.datasyslab.org/; Huang et al. 2017). Even then, you are applying the same techniques and concepts you have used on small datasets to answer a big data question. The only difference is that you do it in a big data setting.
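The idea of an embarrassingly parallel problem can be illustrated on a small scale. The sketch below uses only the parallel package that ships with R, not any of the big data systems named above, and the point coordinates and reference location are hypothetical. It solves a problem on the whole dataset first, then splits the data into independent chunks, applies the same function to each chunk, and checks that the recombined result is unchanged.

```r
library(parallel) # ships with R

# Hypothetical data: 10,000 random point coordinates and one reference point
set.seed(1)
pts = data.frame(x = runif(1e4), y = runif(1e4))
ref = c(x = 0.5, y = 0.5)

# The 'small data' solution: Euclidean distance from each point to ref
dist_to_ref = function(d) {
  sqrt((d$x - ref[["x"]])^2 + (d$y - ref[["y"]])^2)
}

# Split the rows into 4 contiguous, independent chunks
chunk_id = cut(seq_len(nrow(pts)), breaks = 4, labels = FALSE)
chunks = split(pts, chunk_id)

# Forking is unavailable on Windows, so fall back to one core there
n_cores = if (.Platform$OS.type == "unix") 2L else 1L
res_chunks = mclapply(chunks, dist_to_ref, mc.cores = n_cores)

# Recombine and compare with the sequential result
res_parallel = unlist(res_chunks, use.names = FALSE)
res_sequential = dist_to_ref(pts)
all.equal(res_parallel, res_sequential)
```

Because no chunk depends on any other, the same function could in principle be handed to a cluster framework. Only the machinery for distributing the chunks would change.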

## 15.4 Geo* for social benefit

This is a technical book, so it makes sense for the next steps to also be technical. But there are many non-technical issues to consider, now that you understand what is possible with geographic data in R. This section returns to the definition of geocomputation and wider issues covered in Chapter 1. It argues for the methods to be used to tackle some of the planet’s most pressing problems. The use of geo* rather than geocomputation is deliberate. Many terms, including geographic data science, geographic information systems and geoinformatics, capture the range of possibilities opened up by geospatial software and knowledge of data. But geocomputation has advantages: it is a concise term that defines a field with three main ingredients:

• The creative use of geographic data.
• Application to real-world problems for social benefit.
• Building tools using a ‘scientific’ approach (Openshaw and Abrahart 2000).

Only one of these ingredients is technical. We believe the broader non-technical aims are what make geospatial work so rewarding, and this is an asset of geocomputation: its application to solve important problems.

Reproducibility is an additional ingredient that can ensure geo* work is socially beneficial, or at least benign. It supports creativity, encouraging the focus of methods to shift away from the basics (which are readily available through shared code, avoiding many people ‘reinventing the wheel’) and towards applications. Reproducibility encourages geocomputation for social benefit because it makes geographic data analysis publicly accessible and transparent.

The benefits of reproducibility can be illustrated with the example of using geocomputation to increase sales of perfume. If the methods are hidden and cannot be reproduced, few people can benefit (except for the perfume company who commissioned the work!). If the underlying code is made open and reproducible, by contrast, the methods can be re-purposed or improved (which would also benefit the perfume company). Reproducibility encourages socially but also economically beneficial collaboration (see note 4 at the end of this chapter). The importance of reproducibility, and of other non-technical ingredients in the field of geocomputation, is further discussed in an open access article celebrating ‘21+ years of geocomputation’ (Harris et al. 2017).

### References

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. CRC Press.

Bivand, Roger, Edzer J Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Springer.

Wegmann, Martin, Benjamin Leutner, and Stefan Dech, eds. 2016. Remote Sensing and GIS for Ecologists: Using Open Source Software. Data in the Wild. Exeter: Pelagic Publishing.

Zuur, Alain, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Statistics for Biology and Health. New York: Springer-Verlag.

Zuur, Alain F., Elena N. Ieno, and Anatoly A. Saveliev. 2017. Beginner’s Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. Volume 1: Using GLM and GLMM. Newburgh, United Kingdom: Highland Statistics Ltd.

Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with R-INLA. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118950203.

Huang, Zhou, Yiran Chen, Lin Wan, and Xia Peng. 2017. “GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark.” ISPRS International Journal of Geo-Information 6 (9): 285. https://doi.org/10.3390/ijgi6090285.

Chambers, John M. 2016. Extending R. CRC Press.

Garrard, Chris. 2016. Geoprocessing with Python. Shelter Island, NY: Manning Publications.

Brus, D. J. 2018. “Sampling for Digital Soil Mapping: A Tutorial Supported by R Scripts.” Geoderma, August. https://doi.org/10.1016/j.geoderma.2018.07.036.

Openshaw, Stan, and Robert J. Abrahart, eds. 2000. Geocomputation. 1st edition. London; New York: CRC Press.

Harris, Richard, David O’Sullivan, Mark Gahegan, Martin Charlton, Lex Comber, Paul Longley, Chris Brunsdon, et al. 2017. “More Bark Than Bytes? Reflections on 21+ Years of Geocomputation.” Environment and Planning B: Urban Analytics and City Science, July. https://doi.org/10.1177/2399808317710132.

1. The first operation, undertaken by the function st_union(), creates an object of class sfc (a simple feature column). The latter two operations create sf objects, each of which contains a simple feature column. Therefore it is the geometries contained in simple feature columns, not the objects themselves, that are identical.

2. At the time of writing, 450 packages Depend on or Import sp, showing that its data structures are widely used and have been extended in many directions. The equivalent number for sf was 60 in summer 2018; with the growing popularity of sf, this is set to grow.

3. R’s strengths relevant to our definition of geocomputation include its emphasis on scientific reproducibility, widespread use in academic research and unparalleled support for statistical modeling of geographic data. Furthermore, we advocate learning one language (R) for geocomputation in depth before delving into other languages/frameworks because of the costs associated with context switching. It is preferable to have expertise in one language than basic knowledge of many.

4. One accessible way to contribute upstream is creating a reprex (reproducible example) to highlight a bug in the package’s issue tracker, as outlined in section 10.2.