gt-cols_nanoplot – The gt package

The `cols_nanoplot()` function

Let’s make some nanoplots with the illness dataset. The columns beginning with ‘day’ all contain ordered measurement values, comprising seven individual daily results. Using cols_nanoplot() we create a new column to hold the nanoplots (with new_col_name = "nanoplots"), referencing the columns containing the data (with columns = starts_with("day")). It’s also possible to define a column label here using the new_col_label argument.

illness |>
  dplyr::slice_head(n = 10) |>
  gt(rowname_col = "test") |>
  tab_header("Partial summary of daily tests performed on YF patient") |>
  tab_stubhead(label = md("**Test**")) |>
  cols_hide(columns = starts_with("norm")) |>
  fmt_units(columns = units) |>
  cols_nanoplot(
    columns = starts_with("day"),
    new_col_name = "nanoplots",
    new_col_label = md("*Progression*")
  ) |>
  cols_align(align = "center", columns = nanoplots) |>
  cols_merge(columns = c(test, units), pattern = "{1} ({2})") |>
  tab_footnote(
    footnote = "Measurements from Day 3 through to Day 8.",
    locations = cells_column_labels(columns = nanoplots)
  )

Test	Progression¹
Partial summary of daily tests performed on YF patient
Viral load (copies per mL)
WBC (×10⁹/L)
Neutrophils (×10⁹/L)
RBC (×10¹²/L)
Hb (g/L)
PLT (×10⁹/L)
ALT (U/L)
AST (U/L)
TBIL (µmol/L)
DBIL (µmol/L)
¹ Measurements from Day 3 through to Day 8.

The previous table showed us some line-based nanoplots. We can also make very small bar plots with cols_nanoplot(). Let’s take the pizzaplace dataset and make a small summary table showing daily pizza sales by type (there are four types). This will be limited to the first ten days of pizza sales in 2015, so, there will be ten rows in total. We can use plot_type = "bar" to make bar plots from the daily sales counts in the chicken, classic, supreme, and veggie columns. Because we know there will always be four bars (one for each type of pizza) we can be a little creative and apply colors to each of the bars through use of the data_bar_fill_color argument in nanoplot_options().

pizzaplace |>
  dplyr::count(type, date) |>
  tidyr::pivot_wider(names_from = type, values_from = n) |>
  dplyr::slice_head(n = 10) |>
  gt(rowname_col = "date") |>
  tab_header(
    title = md("First Ten Days of Pizza Sales in 2015")
  ) |>
  cols_nanoplot(
    columns = c(chicken, classic, supreme, veggie),
    plot_type = "bar",
    autohide = FALSE,
    new_col_name = "pizzas_sold",
    new_col_label = "Sales by Type",
    options = nanoplot_options(
      show_data_line = FALSE,
      show_data_area = FALSE,
      data_bar_stroke_color = "transparent",
      data_bar_fill_color = c("brown", "gold", "purple", "green")
    )
  ) |>
  cols_width(pizzas_sold ~ px(150)) |>
  cols_align(columns = -date, align = "center") |>
  fmt_date(columns = date, date_style = "yMMMEd") |>
  opt_all_caps()

	chicken	classic	supreme	veggie
First Ten Days of Pizza Sales in 2015
Thu, Jan 1, 2015	36	46	39	41
Fri, Jan 2, 2015	32	57	45	31
Sat, Jan 3, 2015	42	47	32	37
Sun, Jan 4, 2015	28	26	28	24
Mon, Jan 5, 2015	31	37	28	29
Tue, Jan 6, 2015	31	48	31	37
Wed, Jan 7, 2015	28	45	33	32
Thu, Jan 8, 2015	31	47	46	49
Fri, Jan 9, 2015	21	40	34	32
Sat, Jan 10, 2015	27	37	40	42

Now we’ll make another table that contains two columns of nanoplots. Starting from the towny dataset, we first reduce it down to a subset of columns and rows. All of the columns related to either population or density will be used as input data for the two nanoplots. Both nanoplots will use a reference line that is generated from the median of the input data. And by naming the new nanoplot-laden columns in a similar manner as the input data columns, we can take advantage of select helpers (e.g., when using tab_spanner()). Many of the input data columns are now redundant because of the plots, so we’ll elect to hide most of those with cols_hide().

towny |>
  dplyr::select(name, starts_with("population"), starts_with("density")) |>
  dplyr::filter(population_2021 > 200000) |>
  dplyr::arrange(desc(population_2021)) |>
  gt() |>
  fmt_integer(columns = starts_with("population")) |>
  fmt_number(columns = starts_with("density"), decimals = 1) |>
  cols_nanoplot(
    columns = starts_with("population"),
    reference_line = "median",
    autohide = FALSE,
    new_col_name = "population_plot",
    new_col_label = md("*Change*")
  ) |>
  cols_nanoplot(
    columns = starts_with("density"),
    plot_type = "bar",
    autohide = FALSE,
    new_col_name = "density_plot",
    new_col_label = md("*Change*")
  ) |>
  cols_hide(columns = matches("2001|2006|2011|2016")) |>
  tab_spanner(
    label = "Population",
    columns = starts_with("population")
  ) |>
  tab_spanner(
    label = "Density ({{*persons* km^-2}})",
    columns = starts_with("density")
  ) |>
  cols_label_with(
    columns = -matches("plot"),
    fn = function(x) gsub("[^0-9]+", "", x)
  ) |>
  cols_align(align = "center", columns = matches("plot")) |>
  cols_width(
    name ~ px(140),
    everything() ~ px(100)
  ) |>
  opt_horizontal_padding(scale = 2)

	Population			Density (persons km⁻²)
	1996	2021	Change	1996	2021	Change
Toronto	2,385,421	2,794,356		3,779.8	4,427.8
Ottawa	721,136	1,017,449		258.6	364.9
Mississauga	544,382	717,961		1,859.6	2,452.6
Brampton	268,251	656,480		1,008.9	2,469.0
Hamilton	467,799	569,353		418.3	509.1
London	325,699	422,324		774.5	1,004.3
Markham	173,383	338,503		822.0	1,604.8
Vaughan	132,549	323,103		486.5	1,186.0
Kitchener	178,420	256,885		1,304.1	1,877.7
Windsor	197,694	229,660		1,353.9	1,572.8
Oakville	128,405	213,759		924.2	1,538.5
Richmond Hill	101,725	202,022		1,009.3	2,004.4

The sza dataset can, with just some use of dplyr and tidyr, give us a wide table full of nanoplottable values. We’ll transform the solar zenith angles to solar altitude angles and create a column of nanoplots using the newly calculated values. There are a few NA values during periods where the sun hasn’t risen (usually before 06:30 in the winter months) and those values will be replaced with 0 using missing_vals = "zero". We’ll also elect to create bar plots using the plot_type = "bar" option. The height of the plots will be bumped up to "2.5em" from the default of "2em". Finally, we will use nanoplot_options() to modify the coloring of the data bars.

sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  dplyr::mutate(saa = 90 - sza) |>
  dplyr::select(-sza) |>
  tidyr::pivot_wider(
    names_from = tst,
    values_from = saa,
    names_sort = TRUE
  ) |>
  gt(rowname_col = "month") |>
  tab_header(
    title = "Solar Altitude Angles",
    subtitle = "Average values every half hour from 05:30 to 12:00"
  ) |>
  cols_nanoplot(
    columns = matches("0"),
    plot_type = "bar",
    missing_vals = "zero",
    new_col_name = "saa",
    plot_height = "2.5em",
    options = nanoplot_options(
      data_bar_stroke_color = "GoldenRod",
      data_bar_fill_color = "DarkOrange"
    )
  ) |>
  tab_options(
    table.width = px(400),
    column_labels.hidden = TRUE
  ) |>
  cols_align(
    align = "center",
    columns = everything()
  ) |>
  tab_source_note(
    source_note = "The solar altitude angle is the complement to
    the solar zenith angle. TMYK."
  )

Solar Altitude Angles
Average values every half hour from 05:30 to 12:00
jan
feb
mar
apr
may
jun
jul
aug
sep
oct
nov
dec
The solar altitude angle is the complement to the solar zenith angle. TMYK.

You can use number and time streams as data for nanoplots. Let’s demonstrate how we can make use of them with some creative transformation of the pizzaplace dataset. A value stream is really a string with delimited numeric values, like this: "7.24,84.2,14". A value stream can also contain dates and/or datetimes, and here’s an example of that: "2020-06-02 13:05:13,2020-06-02 14:24:05,2020-06-02 18:51:37". Having data in this form can often be more convenient since different nanoplots might have varying amounts of data (and holding different amounts of data in a fixed number of columns is cumbersome). There are date and time columns in this dataset and we’ll use that to get x values denoting high-resolution time instants: the second of the day that a pizza was sold (this is true pizza analytics). We also have the sell price for a pizza, and that’ll serve as the y values. The pizzas belong to four different groups (in the type column) and we’ll group by that and create value streams with paste(..., collapse = ",") inside the dplyr::summarize() call. With two value streams in each row (having the same number of values) we can now make a gt table with nanoplots.

pizzaplace |>
  dplyr::filter(date == "2015-01-01") |>
  dplyr::mutate(date_time = paste(date, time)) |>
  dplyr::select(type, date_time, price) |>
  dplyr::group_by(type) |>
  dplyr::summarize(
    date_time = paste(date_time, collapse = ","),
    sold = paste(price, collapse = ",")
  ) |>
  gt(rowname_col = "type") |>
  tab_header(
    title = md("Pizzas sold on **January 1, 2015**"),
    subtitle = "Between the opening hours of 11:30 to 22:30"
  ) |>
  cols_nanoplot(
    columns = sold,
    columns_x_vals = date_time,
    expand_x = c("2015-01-01 11:30", "2015-01-01 22:30"),
    reference_line = "median",
    new_col_name = "pizzas_sold",
    new_col_label = "Pizzas Sold",
    options = nanoplot_options(
      show_data_line = FALSE,
      show_data_area = FALSE,
      currency = "USD"
    )
  ) |>
  cols_width(pizzas_sold ~ px(200)) |>
  cols_align(columns = pizzas_sold, align = "center") |>
  opt_all_caps()

	Pizzas Sold
Pizzas sold on January 1, 2015
Between the opening hours of 11:30 to 22:30
chicken
classic
supreme
veggie

Notice that the columns containing the value streams are hid due to the default argument autohide = TRUE because, while useful, they don’t need to be displayed to anybody viewing a table. Since we have a lot of data points and a connecting line is not as valuable here, we also set show_data_line = FALSE in nanoplot_options(). It’s more interesting to see the clusters of the differently priced pizzas over the entire day. Specifying a currency in nanoplot_options() is a nice touch since the y values are sale prices in U.S. Dollars (hovering over data points gives correctly formatted values). Finally, having a reference line based on the median gives pretty useful information. Seems like customers preferred getting the "chicken"-type pizzas in large size!

Using the gibraltar dataset, let’s make a series of nanoplots across the meteorological parameters of temperature, humidity, and wind speed. We’ll want to customize the appearance of the plots across three columns and we can make this somewhat simpler by assigning a common set of options through nanoplot_options(). In this table we want to make comparisons across nanoplots in a particular column easier, so, we’ll set autoscale = TRUE so that there is a common y-axis scale for each of the parameters (based on the extents of the data).

nanoplot_options_list <-
  nanoplot_options(
    data_point_radius = px(4),
    data_point_stroke_width = px(2),
    data_point_stroke_color = "black",
    data_point_fill_color = "white",
    data_line_stroke_width = px(4),
    data_line_stroke_color = "gray",
    show_data_line = TRUE,
    show_data_points = TRUE,
    show_data_area = FALSE,
  )

gibraltar |>
  dplyr::filter(date <= "2023-05-14") |>
  dplyr::mutate(time = as.numeric(hms::as_hms(paste0(time, ":00")))) |>
  dplyr::mutate(humidity = humidity * 100) |>
  dplyr::select(date, time, temp, humidity, wind_speed) |>
  dplyr::group_by(date) |>
  dplyr::summarize(
    time = paste(time, collapse = ","),
    temp = paste(temp, collapse = ","),
    humidity = paste(humidity, collapse = ","),
    wind_speed = paste(wind_speed, collapse = ","),
  ) |>
  dplyr::mutate(is_satsun = lubridate::wday(date) %in% c(1, 7)) |>
  gt(rowname_col = "date") |>
  tab_header(
    title = "Meteorological Summary of Gibraltar Station",
    subtitle = "Data taken from May 1-14, 2023."
  ) |>
  fmt_date(columns = stub(), date_style = "wd_m_day_year") |>
  cols_nanoplot(
    columns = temp,
    columns_x_vals = time,
    expand_x = c(0, 86400),
    autoscale = TRUE,
    new_col_name = "temperature_nano",
    new_col_label = "Temperature",
    options = nanoplot_options_list
  ) |>
  cols_nanoplot(
    columns = humidity,
    columns_x_vals = time,
    expand_x = c(0, 86400),
    autoscale = TRUE,
    new_col_name = "humidity_nano",
    new_col_label = "Humidity",
    options = nanoplot_options_list
  ) |>
  cols_nanoplot(
    columns = wind_speed,
    columns_x_vals = time,
    expand_x = c(0, 86400),
    autoscale = TRUE,
    new_col_name = "wind_speed_nano",
    new_col_label = "Wind Speed",
    options = nanoplot_options_list
  ) |>
  cols_units(
    temperature_nano = ":degree:C",
    humidity_nano = "% (RH)",
    wind_speed_nano = "m s^-1"
  ) |>
  cols_hide(columns = is_satsun) |>
  tab_style_body(
    style = cell_fill(color = "#E5FEFE"),
    values = TRUE,
    targets = "row",
    extents = c("body", "stub")
  ) |>
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_column_labels()
  )

	Temperature, °C	Humidity, % (RH)	Wind Speed, m s⁻¹
Meteorological Summary of Gibraltar Station
Data taken from May 1-14, 2023.
Mon, May 1, 2023
Tue, May 2, 2023
Wed, May 3, 2023
Thu, May 4, 2023
Fri, May 5, 2023
Sat, May 6, 2023
Sun, May 7, 2023
Mon, May 8, 2023
Tue, May 9, 2023
Wed, May 10, 2023
Thu, May 11, 2023
Fri, May 12, 2023
Sat, May 13, 2023
Sun, May 14, 2023

Box plots can be generated, and we just need to use plot_type = "boxplot" to make that type of nanoplot. Using a small portion of the pizzaplace dataset, we will create a simple table that displays a box plot of pizza sales for a selection of days. By converting the string-time 24-hour-clock time values to the number of seconds elapsed in a day, we get continuous values that can be incorporated into each box plot. And, by supplying a function to the y_val_fmt_fn argument within nanoplot_options(), we can transform the integer seconds values back to clock times for display on hover.

pizzaplace |>
  dplyr::filter(date <= "2015-01-14") |>
  dplyr::mutate(time = as.numeric(hms::as_hms(time))) |>
  dplyr::summarize(time = paste(time, collapse = ","), .by = date) |>
  dplyr::mutate(is_weekend = lubridate::wday(date) %in% 6:7) |>
  gt() |>
  tab_header(title = "Pizza Sales in Early January 2015") |>
  fmt_date(columns = date, date_style = 2) |>
  cols_nanoplot(
    columns = time,
    plot_type = "boxplot",
    options = nanoplot_options(y_val_fmt_fn = function(x) hms::as_hms(x))
  ) |>
  cols_hide(columns = is_weekend) |>
  cols_width(everything() ~ px(250)) |>
  cols_align(align = "center", columns = nanoplots) |>
  cols_align(align = "left", columns = date) |>
  tab_style(
    style = cell_borders(
      sides = "left", color = "gray"),
    locations = cells_body(columns = nanoplots)
  ) |>
  tab_style_body(
    style = cell_fill(color = "#E5FEFE"),
    values = TRUE,
    targets = "row"
  ) |>
  tab_options(column_labels.hidden = TRUE)

Pizza Sales in Early January 2015
Thursday, January 1, 2015
Friday, January 2, 2015
Saturday, January 3, 2015
Sunday, January 4, 2015
Monday, January 5, 2015
Tuesday, January 6, 2015
Wednesday, January 7, 2015
Thursday, January 8, 2015
Friday, January 9, 2015
Saturday, January 10, 2015
Sunday, January 11, 2015
Monday, January 12, 2015
Tuesday, January 13, 2015
Wednesday, January 14, 2015

The cols_nanoplot() function

The `cols_nanoplot()` function