Introduction
Everyone loves progress bars. This is even more true when running
long computations in parallel, where you’d really like to have some
approximation of when your job is going to finish. furrr currently has
its own progress bar through the usage of .progress = TRUE,
but in the future this will be deprecated in favor of generic and robust
progress updates through the progressr
package.
If you’ve never heard of progressr, I’d encourage you to read its introduction vignette. One of the neat things about it is that it isn’t limited to just progress bars. progressr is really a framework for progress updates, which can then be relayed to the user using a progress bar, a beeping noise from their computer, or even through email or slack notifications. It works for sequential, multisession, and cluster futures, which means that it even works with remote connections. It currently doesn’t work with multicore, but that is likely to change.
Before we begin, please be aware that progressr is still a new
experimental package. I doubt there will be many breaking changes in it,
but new patterns for signaling progress updates will likely emerge after
enough people start using it. If you’ve used furrr’s
.progress argument, then the solutions presented below
might feel a bit clunkier than that. As progressr gets more usage,
hopefully a simpler unified way of presenting progress information will
emerge that can be used in all of the map-reduce future packages (furrr,
future.apply, and doFuture).
How progressr is used varies slightly depending on whether you are a
package developer or an interactive user. There are two main functions
that are used: progressor(), which makes an object that can
signal progress updates, and with_progress(), which listens
for these progress signals. Generally, progressor() will be
used by a package developer inside of a function that they would like to
produce progress updates. When the user calls that function, they won’t
get any progress notifications unless they wrap the function call in
with_progress(). Additionally, the user has complete
control over how these progress updates are displayed through the use of
a progress handler. In progressr, these all start with
handler_*() and tell progressr how to display the progress
update. This separation of developer API and user API is important, and
can be summarized as:
- Developer:
-
p <- progressor()for making progress signalers -
p()for signaling a unit of progress
-
- User:
-
with_progress()for listening for progress signals -
handler_*()for displaying those caught progress signals
-
Package developers
If you are a package developer using furrr with progressr, the
function from your package that calls future_map() should
first use p <- progressor() to create a progress object,
and then call p() from within .f to signal a
progress update after each iteration of the map. For example, the
following function iterates over a list, x, calling
sum() on each element of the list. At each iteration, we
send a progress update. I’ve also introduced a bit of a delay because
this otherwise would run extremely fast.
my_pkg_fn <- function(x) {
p <- progressor(steps = length(x))
future_map(x, ~{
p()
Sys.sleep(.2)
sum(.x)
})
}From the user’s side, simply calling my_pkg_fun() won’t
display anything:
# No notifications
result <- my_pkg_fn(x)However, once the user wraps this in with_progress(),
notifications are displayed. The default is to use
handler_txtprogressbar(), which creates a progress bar with
utils::txtProgressBar().
with_progress({
result <- my_pkg_fn(x)
})
#> |=============================== | 30%As mentioned before, the user controls how to display
progress updates. You can change to a different handler locally by
providing it as an argument to with_progress(handlers = ),
or you can use handlers() to set them globally. You can
even use multiple handlers. For example,
handlers(handler_progress, handler_beepr) can be used to
generate a progress bar with the progress package and generate beeps
with the beepr package.
Interactive usage
When writing data analysis scripts that use furrr and progressr, the
separation between developer and user APIs is not quite as clear since
you’ll need to generate the progress objects with
progressor(), create the function that signals progress by
calling p(), and call with_progress(). It is
easiest to show this with an example:
plan(multisession, workers = 2)
with_progress({
p <- progressor(steps = length(x))
result <- future_map(x, ~{
p()
Sys.sleep(.2)
sum(.x)
})
})
#> |===================== | 20%Currently, with_progress() doesn’t return the value of
the expression that it evaluates, so you have to assign the
result to result <-. This is likely to change.
Rather than writing an anonymous function, you might want to wrap the
logic of .f up into a real function. The easiest way to do
this right now is to have an extra argument for p that you
can pass through.
plan(multisession, workers = 2)
fn <- function(x, p) {
p()
Sys.sleep(.2)
sum(x)
}
with_progress({
p <- progressor(steps = length(x))
result <- future_map(x, fn, p = p)
})The important thing here is that p <- progressor() is
called from inside with_progress(). You generally can’t
create the progressor object outside of the with_progress()
call. For example, this doesn’t work:
p <- progressor(steps = length(x))
with_progress({
result <- future_map(x, fn, p = p)
})
#> Error in error("length(timestamp) == 0L") :
#> .validate_internal_state(‘handler(type=update) ... end’): length(timestamp) #> == 0L
#> Error in error("length(timestamp) == 0L") :
#> .validate_internal_state(‘reporter_args() ... begin’): length(timestamp) == #> 0LWith dplyr
Because with_progress() doesn’t return the value of
expr right now, current usage of progressr, furrr, and
dplyr is far from perfect. The only way to use them together currently
is to wrap an entire dplyr pipeline in with_progress().
cars <- mtcars |>
group_by(carb) |>
group_nest()
model_fn <- function(data, p) {
Sys.sleep(.5)
mod <- lm(mpg ~ cyl + disp, data = data)
out <- mod$coef
p()
out
}You can create the progressor, p, at the top of the
expression, and then call future_map() in
mutate():
plan(multisession, workers = 2)
with_progress({
p <- progressor(steps = nrow(cars))
cars2 <- cars |>
mutate(mod = future_map(data, model_fn, p = p))
})
#> |================================================ | 67%Or you can wrap model_fn() in another function that
creates p and calls future_map() all at
once:
plan(multisession, workers = 2)
model_mapper <- function(data) {
p <- progressor(steps = length(data))
future_map(data, model_fn, p = p)
}
with_progress({
cars2 <- cars |>
mutate(mod = model_mapper(data))
})
#> |================================================ | 67%An additional constraint (for now), is that
with_progress() will only respect progress updates from the
first progressor object that signals one. This means that the first call
to model_mapper() will signal updates, but the second
won’t.
with_progress({
cars2 <- cars |>
mutate(
mod1 = model_mapper(data),
mod2 = model_mapper(data)
)
})
#> |================================================ | 67%
# ^ Note, we don't get a second progress barHowever, if you know that you are going to do something like this
then you can use the first approach and create a progressor object that
is twice the number of rows in the data frame, and pass it to both calls
to future_map(). The state is maintained between
future_map() calls so you end up with one long progress
bar.
plan(multisession, workers = 2)
with_progress({
p <- progressor(steps = nrow(cars) * 2)
cars2 <- cars |>
mutate(
mod1 = future_map(data, model_fn, p = p),
mod2 = future_map(data, model_fn, p = p)
)
})
#> |=================================== | 50%Conclusion
progressr represents an exciting move towards a unified framework for progress notifications in R, but it is still early in its development cycle and needs more usage and feedback to settle on the best API. In the future, the plan is for furrr to become more tightly integrated with progressr so that this is much easier.
