Who this book is for
This book is for people who want to analyze, visualize and model geographic data with open source software. It is based on R, a statistical programming language that has powerful data processing, visualization and geospatial capabilities. The book covers a wide range of topics and will be of interest to people from many different backgrounds, especially:
- People who have learned spatial analysis skills using a desktop Geographic Information System (GIS) such as QGIS, ArcMap, GRASS or SAGA, who want access to a powerful (geo)statistical and visualization programming language and the benefits of a command-line approach (Sherman 2008):
With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line.
- Graduate students and researchers from fields specializing in geographic data including Geography, Remote Sensing, Planning, GIS and Geographic Data Science
- Academics and post-graduate students working on projects in fields including Geology, Regional Science, Biology and Ecology, Agricultural Sciences (precision farming), Archaeology, Epidemiology, Transport Modeling, and broadly defined Data Science which require the power and flexibility of R for their research
- Applied researchers and analysts in public, private or third-sector organizations who need the reproducibility, speed and flexibility of a command-line language such as R in applications dealing with spatial data as diverse as Urban and Transport Planning, Logistics, Geo-marketing (store location analysis) and Emergency Planning
The book is designed for intermediate-to-advanced R users interested in geocomputation and R beginners who have prior experience with geographic data. If you are new to both R and geographic data, do not be discouraged: we provide links to further materials below and describe the nature of spatial data from a beginner’s perspective in Chapter 2.
How to read this book
The book is divided into three parts:
- Foundations, aimed at getting you up-to-speed with geographic data in R.
- Extensions, which covers advanced techniques.
- Applications, to real-world problems.
The chapters get progressively harder within each part, so we recommend reading the book in order. A major barrier to geographical analysis in R is its steep learning curve. The chapters in Part 1 aim to address this by providing reproducible code on simple datasets that should ease the process of getting started.
An important aspect of the book from a teaching/learning perspective is the exercises at the end of each chapter. Completing these will develop your skills and equip you with the confidence needed to tackle a range of geospatial problems. Solutions to the exercises, and a number of extended examples, are provided on the book’s supporting website, at geocompr.github.io.
Impatient readers are welcome to dive straight into the practical examples, starting in Chapter 2. However, we recommend reading about the wider context of Geocomputation with R in Chapter 1 first. If you are new to R, we also recommend learning more about the language before attempting to run the code chunks provided in each chapter (unless you’re reading the book for an understanding of the concepts). Fortunately for R beginners, R has a supportive community that has developed a wealth of resources that can help. We particularly recommend three tutorials: R for Data Science (Grolemund and Wickham 2016); Efficient R Programming (Gillespie and Lovelace 2016), especially Chapter 2 (on installing and setting up R/RStudio) and Chapter 10 (on learning to learn); and An Introduction to R (Venables, Smith, and R Core Team 2017). A good interactive tutorial is DataCamp’s Introduction to R.
Although R has a steep learning curve, the command-line approach advocated in this book can quickly pay off. As you’ll learn in subsequent chapters, R is an effective tool for tackling a wide range of geographic data challenges. We expect that, with practice, R will become the program of choice in your geospatial toolbox for many applications. Typing and executing commands at the command line is, in many cases, faster than pointing and clicking around the graphical user interface (GUI) of a desktop GIS. For some applications, such as spatial statistics and modeling, R may be the only realistic way to get the work done.
As outlined in Section 1.2, there are many reasons for using R for geocomputation. Compared with other languages, R is well suited to the interactive use required in many geographic data analysis workflows. R excels in the rapidly growing fields of Data Science (which includes data carpentry, statistical learning techniques and data visualization) and Big Data (via efficient interfaces to databases and distributed computing systems). Furthermore, R enables a reproducible workflow: sharing the scripts underlying your analysis allows others to build on your work.
To ensure reproducibility in this book we have made its source code available at github.com/Robinlovelace/geocompr. There you will find script files in the code/ folder that generate figures: when code generating a figure is not provided in the main text of the book, the name of the script file that generated it is provided in the caption (see for example the caption for Figure 12.2).
Other languages such as Python, Java and C++ can be used for geocomputation, and there are excellent resources for learning geocomputation without R, as discussed in Section 1.3. None of these provides the unique combination of package ecosystem, statistical capabilities, visualization options and powerful IDEs offered by the R community. Furthermore, by teaching how to use one language (R) in depth, this book will equip you with the concepts and confidence needed to do geocomputation in other languages.
Geocomputation with R will equip you with knowledge and skills to tackle a wide range of issues, including those with scientific, societal and environmental implications, manifested in geographic data. As described in section 1.1, geocomputation is not only about using computers to process geographic data: it is also about real-world impact. If you are interested in the wider context and motivations behind this book, read on: these are covered in Chapter 1.
Many thanks to all the people who contributed to the book via GitHub: katygregg, erstearns, eyesofbambi, rsbivand, pat-s, gisma, ateucher, annakrystalli, gavinsimpson, Himanshuteli, yutannihilation, katiejolly, layik, mvl22, nickbearman, richfitz, SymbolixAU, wdearden, yihui, chihinl, gregor-d, p-kono, pokyah, schuetzingit, tim-salabim.
We thank Patrick Schratz (University of Jena) for fruitful discussions on mlr and for providing code input (Chapters 11 & 14). We also thank Dr. Alexander Brenning (University of Jena) for providing detailed feedback on Chapter 11. We also would like to thank numerous anonymous reviewers who provided detailed feedback which helped to substantially improve the book in terms of structure, content and programming.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Sherman, Gary. 2008. Desktop GIS: Mapping the Planet with Open Source Tools. Pragmatic Bookshelf.
Grolemund, Garrett, and Hadley Wickham. 2016. R for Data Science. 1st edition. O’Reilly Media.
Gillespie, Colin, and Robin Lovelace. 2016. Efficient R Programming: A Practical Guide to Smarter Programming. O’Reilly Media.
Venables, W.N., D.M. Smith, and R Core Team. 2017. An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics.