This book is aimed at people who want to do spatial data analysis, visualization and modeling using open source software and reproducible workflows. It is based on R, a flexible language for ‘data science’ with powerful geospatial capabilities and a strong ecosystem of add-on packages dedicated to spatial data (see the ‘Spatial Task View’ at cran.r-project.org/web/views).
R enables reproducibility through its command-line interface and ensures accessibility because it is freely available and works on most modern operating systems (including Linux, Windows and Mac). The book will therefore be of interest to a wide range of people worldwide, although we expect it to be especially useful for:
- People who have learned spatial analysis skills using a desktop Geographic Information System (GIS) such as QGIS, ArcMap, GRASS or SAGA, who want access to a powerful (geo)statistical and visualization programming language and the benefits of a command-line approach (Sherman 2008):
“With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line.”
- Graduate students and researchers from fields specializing in geographic data including Geography, Remote Sensing, Planning, GIS and Geographic Data Science
- Academics and post-graduate students working on projects in fields including Geology, Regional Science, Biology and Ecology, Agricultural Sciences (precision farming), Archaeology, Epidemiology, Transport Modeling, and broadly defined Data Science which require the power and flexibility of R for their research
- Applied researchers and analysts in public, private or third-sector organizations who need the reproducibility, speed and flexibility of a command-line language such as R in applications dealing with spatial data as diverse as Urban and Transport Planning, Logistics, Geo-marketing (store location analysis) and Emergency Planning
The book is designed for intermediate-to-advanced R users interested in geocomputation and for R beginners who have prior experience with geographic data. If you are new to both R and geographic data, do not be discouraged: we describe the nature of spatial data from a beginner’s perspective in Chapter 2 and provide links to further materials below.
We aim to make R’s famously steep learning curve more mellow and less rollercoaster: the chapters increase in difficulty as the book progresses; each chapter starts relatively easy and covers the most important topics first to make the book as accessible as possible. Exercises can be found at the end of each chapter. Completing these encourages using R interactively to solve geospatial problems, ensuring you can operationalize the concepts and code in each chapter.
Impatient readers are welcome to dive straight into the practical examples, starting in Chapter 2. However, we recommend reading about the wider context of Geocomputation with R in Chapter 1 first. If you are new to R, we also recommend learning more about the language before attempting to run the code chunks provided in each chapter (unless you’re reading the book for an understanding of the concepts). Fortunately for R beginners, R has a supportive community that has developed a wealth of resources that can help. We particularly recommend three tutorials: R for Data Science (Grolemund and Wickham 2016); Efficient R Programming (Gillespie and Lovelace 2016), especially Chapter 2 (on installing and setting up R/RStudio) and Chapter 10 (on learning to learn); and An Introduction to R (Venables, Smith, and R Core Team 2017). A good interactive tutorial is DataCamp’s Introduction to R.
Although R has a steep learning curve, the command-line approach advocated in this book can quickly pay off. As you’ll learn in subsequent chapters, R is an effective tool for tackling a wide range of geographic data challenges. We expect that, with practice, R will become the program of choice in your geospatial toolbox for many applications. Typing and executing commands at the command line is, in many cases, faster than pointing and clicking around the graphical user interface (GUI) of a desktop GIS. For some applications, such as Spatial Statistics and modeling, R may be the only realistic way to get the work done.
As outlined in section 1.2, there are many reasons for using R for geocomputation:
Compared with other languages, R is well suited to the interactive use required in many geographic data analysis workflows.
R excels in the rapidly growing fields of Data Science (which includes data carpentry, statistical learning techniques and data visualization) and Big Data (via efficient interfaces to databases and distributed computing systems).
Furthermore, R enables a reproducible workflow: sharing the scripts underlying your analysis allows others to build on your work.
To ensure reproducibility in this book we have made its source code available at github.com/Robinlovelace/geocompr. There you will find script files in the code/ folder that generate figures: when code generating a figure is not provided in the main text of the book, the name of the script file that generated it is provided in the caption (see for example the caption for Figure 12.2).
Other languages such as Python, Java and C++ can be used for geocomputation, and there are excellent resources for learning geocomputation without R, as discussed in section 1.3. None of these provides the unique combination of package ecosystem, statistical capabilities, visualization options and powerful IDEs offered by the R community. Furthermore, by teaching how to use one language (R) in depth, this book will equip you with the concepts and confidence needed to do geocomputation in other languages.
Geocomputation with R will equip you with knowledge and skills to tackle a wide range of issues, including those with scientific, societal and environmental implications, manifested in geographic data. As described in section 1.1, geocomputation is not only about using computers to process geographic data: it is also about real-world impact. If you are interested in the wider context and motivations behind this book, read on: these are covered in Chapter 1.
Many thanks to all the people who contributed to the book via GitHub: katygregg, erstearns, eyesofbambi, rsbivand, pat-s, gisma, ateucher, gavinsimpson, Himanshuteli, yutannihilation, katiejolly, layik, mvl22, nickbearman, richfitz, wdearden, yihui, chihinl, gregor-d, p-kono, pokyah, tim-salabim.
We thank Patrick Schratz (University of Jena) for fruitful discussions on mlr and for providing code input (Chapters 11 & 14). We also thank Dr. Alexander Brenning (University of Jena) for providing detailed feedback on Chapter 11. We also would like to thank numerous anonymous reviewers who provided detailed feedback which helped to substantially improve the book in terms of structure, content and programming.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Sherman, Gary. 2008. Desktop GIS: Mapping the Planet with Open Source Tools. Pragmatic Bookshelf.
Grolemund, Garrett, and Hadley Wickham. 2016. R for Data Science. 1st edition. O’Reilly Media.
Gillespie, Colin, and Robin Lovelace. 2016. Efficient R Programming: A Practical Guide to Smarter Programming. O’Reilly Media.
Venables, W.N., D.M. Smith, and R Core Team. 2017. An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics.