The cols_label() function

Let’s use a portion of the countrypops dataset to create a gt table. We can relabel all of the table’s columns with the cols_label() function to improve its presentation. In this simple case, we supply the column name on the left-hand side of each = and the label text on the right-hand side.

library(gt)
countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Uganda",
    year %in% 2017:2021
  ) |>
  gt() |>
  cols_label(
    country_name = "Name",
    year = "Year",
    population = "Population"
  )
Name    Year  Population
Uganda  2017  40127085
Uganda  2018  41515395
Uganda  2019  42949080
Uganda  2020  44404611
Uganda  2021  45853778

Using the countrypops dataset again, we label the columns as before, but this time we make the labels bold with Markdown formatting (via the md() helper function). Either = or ~ can be used between the column name and the label text.

countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Uganda",
    year %in% 2017:2021
  ) |>
  gt() |>
  cols_label(
    country_name = md("**Name**"),
    year = md("**Year**"),
    population ~ md("**Population**")
  )
Name    Year  Population
Uganda  2017  40127085
Uganda  2018  41515395
Uganda  2019  42949080
Uganda  2020  44404611
Uganda  2021  45853778
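Why can ~ accept things that = cannot? In R, a two-sided formula stores both of its sides as unevaluated expressions. By contrast, the name in a name = value argument must be a plain name. The base-R sketch below only inspects what each syntax hands to a function. It does not reproduce how gt itself parses cols_label() arguments, and md() and ends_with() are never called because they appear only inside unevaluated formulas.

```r
# Two-sided formulas: neither side is evaluated when the formula is created,
# so md() and ends_with() need not be defined here.
f_label  <- population ~ md("**Population**")
f_select <- ends_with("01") ~ "2001"

lhs <- function(f) f[[2]]  # left-hand side of a two-sided formula
rhs <- function(f) f[[3]]  # right-hand side

lhs(f_label)   # the symbol `population`
rhs(f_label)   # the call md("**Population**")
lhs(f_select)  # the call ends_with("01"): an expression, not a name

# With `=`, only the argument name is available, and it must be a plain name
eq_args <- list(country_name = "Name", year = "Year")
names(eq_args)
```

This is why a select helper such as ends_with("01") can only appear on the left of a ~. It is an expression to be interpreted later, not a name.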

With a select portion of the metro dataset, let’s create a small gt table with three visible columns. The passengers column is used only to keep the ten stations with the most passengers, and it is then hidden with cols_hide(). Within cols_label() we’d like to provide column labels that contain line breaks. For that, we can use <br> to indicate where the line breaks should be. We also need the md() helper function to signal to gt that this text should be interpreted as Markdown. Instead of calling md() on each of the labels as before, we can more conveniently pass the bare function to the .fn argument, which applies it to each label defined in the cols_label() call.

metro |>
  dplyr::select(name, lines, passengers, connect_other) |>
  dplyr::slice_max(passengers, n = 10) |>
  gt() |>
  cols_hide(columns = passengers) |>
  cols_label(
    name = "Name of<br>Metro Station",
    lines = "Metro<br>Lines",
    connect_other = "Train<br>Services",
    .fn = md
  )
Name of                           Metro            Train
Metro Station                     Lines            Services
Gare du Nord                      4, 5             TGV, TER, Thalys, Eurostar
Saint-Lazare                      3, 12, 13, 14    TGV, TER, Intercités
Gare de Lyon                      1, 14            TGV, TGV Lyria, Renfe-SNCF, OUIGO, Frecciarossa
Montparnasse—Bienvenüe            4, 6, 12, 13     TGV, TER, Intercités, OUIGO
Gare de l'Est                     4, 5, 7          TGV, TER, OUIGO, Nightjet
Bibliothèque François Mitterrand  14               NA
République                        3, 5, 8, 9, 11   NA
Les Halles                        4                NA
La Défense                        1                NA
Châtelet                          1, 4, 7, 11, 14  NA
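Conceptually, .fn maps one function over every label supplied in the call, and each <br> marks where a header wraps. The base-R sketch below mimics both steps on the three labels from the example. Here as_markdown() is a hypothetical stand-in for md() that only tags the text with a class. It is not how gt implements md().

```r
# The labels exactly as written in the cols_label() call above
labels <- list(
  name          = "Name of<br>Metro Station",
  lines         = "Metro<br>Lines",
  connect_other = "Train<br>Services"
)

# Hypothetical stand-in for md(): mark the text as Markdown via a class
as_markdown <- function(x) structure(x, class = "markdown_text")

# What `.fn = md` amounts to: the same function applied to every label
tagged <- lapply(labels, as_markdown)

# Where each <br> breaks the rendered header into separate lines
header_lines <- lapply(labels, function(x) strsplit(x, "<br>", fixed = TRUE)[[1]])
header_lines$name  # "Name of" "Metro Station"
```

The split pieces correspond to the two-line headers in the rendered table above.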

Using a subset of the towny dataset, we can create an interesting gt table. First, only certain columns are selected from the dataset, some rows are filtered out, the rows are sorted, and then only the first 10 rows are kept. After the data is passed to gt(), we apply some spanner labels using two calls of tab_spanner(). Below those spanners, we want to label the columns by the years of interest. Using cols_label() with select expressions on the left side of the formulas, we can relabel multiple columns with common label text in one step. Note that we cannot use an = sign in any of the expressions within cols_label(). Because the left-hand side is not a single column name, we must use formula syntax (i.e., with the ~). The name column matches none of these expressions, so it keeps its default label.

towny |>
  dplyr::select(
    name, ends_with("2001"), ends_with("2006"), matches("2001_2006")
  ) |>
  dplyr::filter(population_2001 > 100000) |>
  dplyr::arrange(desc(pop_change_2001_2006_pct)) |>
  dplyr::slice_head(n = 10) |>
  gt() |>
  fmt_integer() |>
  fmt_percent(columns = matches("change"), decimals = 1) |>
  tab_spanner(label = "Population", columns = starts_with("population")) |>
  tab_spanner(label = "Density", columns = starts_with("density")) |>
  cols_label(
    ends_with("01") ~ "2001",
    ends_with("06") ~ "2006",
    matches("change") ~ md("Population Change,<br>2001 to 2006")
  ) |>
  cols_width(everything() ~ px(120))
               Population          Density         Population Change,
name           2001      2006      2001    2006    2001 to 2006
Brampton       325,428   433,806   1,224   1,632   33.3%
Vaughan        182,022   238,866   668     877     31.2%
Markham        208,615   261,573   989     1,240   25.4%
Barrie         103,710   128,430   1,047   1,297   23.8%
Richmond Hill  132,030   162,704   1,310   1,614   23.2%
Oakville       144,738   165,613   1,042   1,192   14.4%
Mississauga    612,925   668,599   2,094   2,284   9.1%
Cambridge      110,372   120,371   977     1,065   9.1%
Burlington     150,836   164,415   810     883     9.0%
Guelph         106,170   114,943   1,214   1,315   8.3%
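To see why the three formulas never compete for the same column, and why the first column keeps its default label name, the base-R sketch below replays the selection with endsWith() and grepl(). The column vector is an assumption: it lists the names the dplyr::select() call is expected to produce, in order. tidyselect's helpers also ignore case by default, which makes no difference for these names.

```r
# Assumed column names after the dplyr::select() step, in order
cols <- c(
  "name",
  "population_2001", "density_2001",
  "population_2006", "density_2006",
  "pop_change_2001_2006_pct"
)

# Base-R stand-ins for the selectors on the left of each formula:
# ends_with() is a literal suffix test, matches() takes a regular expression
sel_2001   <- cols[endsWith(cols, "01")]
sel_2006   <- cols[endsWith(cols, "06")]
sel_change <- cols[grepl("change", cols)]

# One label per selected column, mirroring the three formulas
new_labels <- c(
  setNames(rep("2001", length(sel_2001)), sel_2001),
  setNames(rep("2006", length(sel_2006)), sel_2006),
  setNames("Population Change,<br>2001 to 2006", sel_change)
)

anyDuplicated(names(new_labels))  # 0: no column is claimed twice
setdiff(cols, names(new_labels))  # "name" keeps its default label
```

The change column ends in "pct", so neither suffix test picks it up. Only matches("change") labels it.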

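Here’s another table that uses the towny dataset. The main difference from the previous gt table is that the labels given to cols_label() here include units notation (text enclosed in "{{" and "}}"). Within those braces, ^ marks a superscript (km^2 becomes km²), text between asterisks is set in italics, and :degrees: is replaced by a degree sign.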

towny |>
  dplyr::select(
    name, population_2021, density_2021, land_area_km2, latitude, longitude
  ) |>
  dplyr::filter(population_2021 > 100000) |>
  dplyr::arrange(desc(population_2021)) |>
  dplyr::slice_head(n = 10) |>
  gt() |>
  fmt_integer(columns = population_2021) |>
  fmt_number(
    columns = c(density_2021, land_area_km2),
    decimals = 1
  ) |>
  fmt_number(columns = latitude, decimals = 2) |>
  fmt_number(columns = longitude, decimals = 2, scale_by = -1) |>
  cols_label(
    starts_with("population") ~ "Population",
    starts_with("density") ~ "Density, {{*persons* km^-2}}",
    land_area_km2 ~ "Area, {{km^2}}",
    latitude ~ "Latitude, {{:degrees:N}}",
    longitude ~ "Longitude, {{:degrees:W}}"
  ) |>
  cols_width(everything() ~ px(120))
name         Population  Density, persons km⁻²  Area, km²  Latitude, °N  Longitude, °W
Toronto      2,794,356   4,427.8                631.1      43.74         79.37
Ottawa       1,017,449   364.9                  2,788.2    45.42         75.69
Mississauga  717,961     2,452.6                292.7      43.60         79.65
Brampton     656,480     2,469.0                265.9      43.69         79.76
Hamilton     569,353     509.1                  1,118.3    43.26         79.87
London       422,324     1,004.3                420.5      42.97         81.23
Markham      338,503     1,604.8                210.9      43.88         79.26
Vaughan      323,103     1,186.0                272.4      43.83         79.50
Kitchener    256,885     1,877.7                136.8      43.42         80.47
Windsor      229,660     1,572.8                146.0      42.28         83.00
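The label strings above mix ordinary text with a units portion enclosed in {{ and }}. The base-R sketch below first separates the two parts. It then applies toy_render(), a deliberately tiny, hypothetical function that handles only the constructs used here: ^-2, ^2, :degrees:, and the asterisks that mark italics. It is meant to show what the notation encodes, not how gt's units parser works.

```r
labels <- c(
  "Density, {{*persons* km^-2}}",
  "Area, {{km^2}}",
  "Latitude, {{:degrees:N}}"
)

# Text inside {{ ... }} is the units portion; the rest is the plain label
units_part <- sub(".*\\{\\{(.*)\\}\\}.*", "\\1", labels)
plain_part <- sub("\\s*\\{\\{.*\\}\\}", "", labels)

# Hypothetical toy renderer covering only the constructs above
toy_render <- function(u) {
  u <- gsub(":degrees:", "\u00b0", u, fixed = TRUE)  # degree sign
  u <- gsub("^-2", "\u207b\u00b2", u, fixed = TRUE)  # superscript minus two
  u <- gsub("^2", "\u00b2", u, fixed = TRUE)         # superscript two
  gsub("*", "", u, fixed = TRUE)                     # plain text: drop italics markers
}

paste(plain_part, toy_render(units_part))
```

The ^-2 replacement runs before the ^2 one so that the negative exponent is not split.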

The illness dataset stores measurement units in its units column, and they are already written in the notation gt expects. Let’s do some text manipulation through dplyr::mutate() and some pivoting with tidyr::pivot_longer() and tidyr::pivot_wider() so that the units become part of the column names in the reworked table. Because the units in these column names are enclosed in "{{" and "}}", we can use cols_label() with the .process_units = TRUE option to register the measurement units. No individual labels are passed to cols_label(); the column names themselves supply the label text. In addition, because each name includes a <br> (for a line break), we should use the .fn option with the md() helper function (as a bare function name) so that the line breaks are rendered.

illness |>
  dplyr::mutate(test = paste0(test, ",<br>{{", units, "}}")) |>
  dplyr::slice_head(n = 8) |>
  dplyr::select(-c(starts_with("norm"), units)) |>
  tidyr::pivot_longer(
    cols = starts_with("day"),
    names_to = "day",
    names_prefix = "day_",
    values_to = "value"
  ) |>
  tidyr::pivot_wider(
    names_from = test,
    values_from = value
  ) |>
  gt(rowname_col = "day") |>
  tab_stubhead(label = "Day") |>
  cols_label(
    .fn = md,
    .process_units = TRUE
  ) |>
  cols_width(
    stub() ~ px(50),
    everything() ~ px(120)
  )
Day  Viral load,     WBC,      Neutrophils,  RBC,       Hb,    PLT,      ALT,     AST,
     copies per mL   ×10⁹/L    ×10⁹/L        ×10¹²/L    g/L    ×10⁹/L    U/L      U/L
3    12000           5.26      4.87          5.72       153    67.0      12835.0  23672.0
4    4200            4.26      4.72          5.98       135    38.6      12632.0  21368.0
5    1600            9.92      7.92          4.23       126    27.4      6426.7   14730.0
6    830             10.49     18.21         4.83       115    26.2      4263.1   8691.0
7    760             24.77     22.08         4.12       75     74.1      1623.7   2189.0
8    520             30.26     27.17         2.68       87     36.2      672.6    1145.0
9    250             19.03     16.59         3.32       95     25.6      512.4    782.5
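The key step in this example is the dplyr::mutate() call. It writes each column name as label text, a <br>, and a units portion in {{ }}. The base-R sketch below builds two such names with paste0() and shows how they split into the two header lines seen above. The test and unit values are illustrative: they were chosen to match two of the rendered headers, not read from the illness dataset.

```r
# Illustrative values in the shape of the illness `test` and `units` columns
test  <- c("Viral load", "Hb")
units <- c("copies per mL", "g/L")

# Same construction as in the dplyr::mutate() call above
col_names <- paste0(test, ",<br>{{", units, "}}")
col_names  # "Viral load,<br>{{copies per mL}}" "Hb,<br>{{g/L}}"

# The <br> (rendered via `.fn = md`) splits each name into two header lines;
# the second line still carries the {{ }} units markup for `.process_units`
header_lines <- strsplit(col_names, "<br>", fixed = TRUE)
header_lines[[1]]  # "Viral load," "{{copies per mL}}"
```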