library(reprex)
Pre-lecture materials
Read ahead
Acknowledgements
Material for this lecture was borrowed and adopted from
Learning objectives
Debugging R Code
Overall approach
Finding the root cause of a problem is always challenging.
Most bugs are subtle and hard to find because if they were obvious, you would have avoided them in the first place.
A good strategy helps. Below I outline a four step process that I have found useful:
1. Google!
Whenever you see an error message, start by googling it.
If you are lucky, you will discover that it’s a common error with a known solution.
2. Make it repeatable
To find the root cause of an error, you are going to need to execute the code many times as you consider and reject hypotheses.
To make that iteration as quick possible, it’s worth some upfront investment to make the problem both easy and fast to reproduce.
Start by creating a reproducible example (reprex).
- This will help others help you, and often leads to a solution without asking others, because in the course of making the problem reproducible you often figure out the root cause.
Make the example minimal by removing code and simplifying data.
- As you do this, you may discover inputs that do not trigger the error. - Make note of them: they will be helpful when diagnosing the root cause.
In RStudio, you can access reprex from the addins menu, which makes it even easier to point out your code and select the output format.
3. Figure out where it is
It’s a great idea to adopt the scientific method here.
- Generate hypotheses
- Design experiments to test them
- Record your results
This may seem like a lot of work, but a systematic approach will end up saving you time.
Often a lot of time can be wasted relying on my intuition to solve a bug (“oh, it must be an off-by-one error, so I’ll just subtract 1 here”), when I would have been better off taking a systematic approach.
If this fails, you might need to ask help from someone else.
If you have followed the previous step, you will have a small example that is easy to share with others. That makes it much easier for other people to look at the problem, and more likely to help you find a solution.
4. Fix it and test it
Once you have found the bug, you need to figure out how to fix it and to check that the fix actually worked.
Again, it is very useful to have automated tests in place.
- Not only does this help to ensure that you have actually fixed the bug, it also helps to ensure you have not introduced any new bugs in the process.
- In the absence of automated tests, make sure to carefully record the correct output, and check against the inputs that previously failed.
Something’s Wrong!
Once you have made the error repeatable, the next step is to figure out where it comes from.
R has a number of ways to indicate to you that something is not right.
There are different levels of indication that can be used, ranging from mere notification to fatal error. Executing any function in R may result in the following conditions.
message
: A generic notification/diagnostic message produced by themessage()
function; execution of the function continueswarning
: An indication that something is wrong but not necessarily fatal; execution of the function continues. Warnings are generated by thewarning()
functionerror
: An indication that a fatal problem has occurred and execution of the function stops. Errors are produced by thestop()
function.condition
: A generic concept for indicating that something unexpected has occurred; programmers can create their own custom conditions if they want.
Nevertheless, R doesn’t give an error, because it has a useful value that it can return, the NaN
value.
The warning is just there to let you know that something unexpected happen.
Depending on what you are programming, you may have intentionally taken the log of a negative number in order to move on to another section of code.
What happened?
- Well, the first thing the function does is test if
x > 0
. - But you can’t do that test if
x
is aNA
orNaN
value. - R doesn’t know what to do in this case so it stops with a fatal error.
We can fix this problem by anticipating the possibility of NA
values and checking to see if the input is NA
with the is.na()
function.
<- function(x) {
print_message2 if(is.na(x))
print("x is a missing value!")
else if(x > 0)
print("x is greater than zero")
else
print("x is less than or equal to zero")
invisible(x)
}
Now we can run the following.
print_message2(NA)
[1] "x is a missing value!"
And all is fine.
Now what about the following situation.
<- log(c(-1, 2)) x
Warning in log(c(-1, 2)): NaNs produced
print_message2(x)
Error in if (is.na(x)) print("x is a missing value!") else if (x > 0) print("x is greater than zero") else print("x is less than or equal to zero"): the condition has length > 1
Now what?? Why are we getting this warning?
The warning says “the condition has length > 1 and only the first element will be used”.
The problem here is that I passed print_message2()
a vector x
that was of length 2 rather then length 1.
Inside the body of print_message2()
the expression is.na(x)
returns a vector that is tested in the if
statement.
However, if
cannot take vector arguments, so you get a warning.
The fundamental problem here is that print_message2()
is not vectorized.
We can solve this problem two ways.
- Simply not allow vector arguments.
- The other way is to vectorize the
print_message2()
function to allow it to take vector arguments.
For the first way, we simply need to check the length of the input.
<- function(x) {
print_message3 if(length(x) > 1L)
stop("'x' has length > 1")
if(is.na(x))
print("x is a missing value!")
else if(x > 0)
print("x is greater than zero")
else
print("x is less than or equal to zero")
invisible(x)
}
Now when we pass print_message3()
a vector, we should get an error.
print_message3(1:2)
Error in print_message3(1:2): 'x' has length > 1
Vectorizing the function can be accomplished easily with the Vectorize()
function.
<- Vectorize(print_message2)
print_message4 <- print_message4(c(-1, 2)) out
[1] "x is less than or equal to zero"
[1] "x is greater than zero"
You can see now that the correct messages are printed without any warning or error.
Being able to answer these questions is important not just for your own sake, but in situations where you may need to ask someone else for help with debugging the problem.
Seasoned programmers will be asking you these exact questions.
Debugging Tools in R
R provides a number of tools to help you with debugging your code. The primary tools for debugging functions in R are
traceback()
: prints out the function call stack after an error occurs; does nothing if there’s no errordebug()
: flags a function for “debug” mode which allows you to step through execution of a function one line at a timebrowser()
: suspends the execution of a function wherever it is called and puts the function in debug modetrace()
: allows you to insert debugging code into a function at specific placesrecover()
: allows you to modify the error behavior so that you can browse the function call stack
These functions are interactive tools specifically designed to allow you to pick through a function. There is also the more blunt technique of inserting print()
or cat()
statements in the function.
Using traceback()
The traceback()
function prints out the function call stack after an error has occurred.
The function call stack is the sequence of functions that was called before the error occurred.
For example, you may have a function a()
which subsequently calls function b()
which calls c()
and then d()
.
If an error occurs, it may not be immediately clear in which function the error occurred.
The traceback()
function shows you how many levels deep you were when the error occurred.
The traceback()
function must be called immediately after an error occurs. Once another function is called, you lose the traceback.
Looking at the traceback is useful for figuring out roughly where an error occurred but it’s not useful for more detailed debugging. For that you might turn to the debug()
function.
Using debug()
Click here for how to use debug()
with an interactive browser.
The debug()
function initiates an interactive debugger (also known as the “browser” in R) for a function. With the debugger, you can step through an R function one expression at a time to pinpoint exactly where an error occurs.
The debug()
function takes a function as its first argument. Here is an example of debugging the lm()
function.
> debug(lm) ## Flag the 'lm()' function for interactive debugging
> lm(y ~ x)
in: lm(y ~ x)
debugging : {
debug<- x
ret.x <- y
ret.y <- match.call()
cl
...if (!qr)
$qr <- NULL
z
z
} 2]> Browse[
Now, every time you call the lm()
function it will launch the interactive debugger. To turn this behavior off you need to call the undebug()
function.
The debugger calls the browser at the very top level of the function body. From there you can step through each expression in the body. There are a few special commands you can call in the browser:
n
executes the current expression and moves to the next expressionc
continues execution of the function and does not stop until either an error or the function exitsQ
quits the browser
Here’s an example of a browser session with the lm()
function.
2]> n ## Evalute this expression and move to the next one
Browse[: ret.x <- x
debug2]> n
Browse[: ret.y <- y
debug2]> n
Browse[: cl <- match.call()
debug2]> n
Browse[: mf <- match.call(expand.dots = FALSE)
debug2]> n
Browse[: m <- match(c("formula", "data", "subset", "weights", "na.action",
debug"offset"), names(mf), 0L)
While you are in the browser you can execute any other R function that might be available to you in a regular session. In particular, you can use ls()
to see what is in your current environment (the function environment) and print()
to print out the values of R objects in the function environment.
You can turn off interactive debugging with the undebug()
function.
undebug(lm) ## Unflag the 'lm()' function for debugging
Using recover()
Click here for how to use recover()
with an interactive browser.
The recover()
function can be used to modify the error behavior of R when an error occurs. Normally, when an error occurs in a function, R will print out an error message, exit out of the function, and return you to your workspace to await further commands.
With recover()
you can tell R that when an error occurs, it should halt execution at the exact point at which the error occurred. That can give you the opportunity to poke around in the environment in which the error occurred. This can be useful to see if there are any R objects or data that have been corrupted or mistakenly modified.
> options(error = recover) ## Change default R error behavior
> read.csv("nosuchfile") ## This code doesn't work
in file(file, "rt") : cannot open the connection
Error : Warning message:
In additionfile(file, "rt") :
In : No such file or directory
cannot open file ’nosuchfile’
0 to exit
Enter a frame number, or
1: read.csv("nosuchfile")
2: read.table(file = file, header = header, sep = sep, quote = quote, dec =
3: file(file, "rt")
: Selection
The recover()
function will first print out the function call stack when an error occurrs. Then, you can choose to jump around the call stack and investigate the problem. When you choose a frame number, you will be put in the browser (just like the interactive debugger triggered with debug()
) and will have the ability to poke around.
Summary
- There are three main indications of a problem/condition:
message
,warning
,error
; only anerror
is fatal - When analyzing a function with a problem, make sure you can reproduce the problem, clearly state your expectations and how the output differs from your expectation
- Interactive debugging tools
traceback
,debug
,recover
,browser
, andtrace
can be used to find problematic code in functions - Debugging tools are not a substitute for thinking!
Post-lecture materials
Final Questions
Here are some post-lecture questions to help you think about the material discussed.