gt-cols_label – The gt package

The `cols_label()` function

Let’s use a portion of the countrypops dataset to create a gt table. We can relabel all the table’s columns with the cols_label() function to improve its presentation. In this simple case we are supplying the name of the column on the left-hand side, and the label text on the right-hand side.

countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Uganda",
    year %in% 2017:2021
  ) |>
  gt() |>
  cols_label(
    country_name = "Name",
    year = "Year",
    population = "Population"
  )

Name	Year	Population
Uganda	2017	40127085
Uganda	2018	41515395
Uganda	2019	42949080
Uganda	2020	44404611
Uganda	2021	45853778
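
Labels only change what is displayed in the table; the columns keep their original names for any later calls. Here is a minimal sketch of that point on a small, made-up data frame (the `pops` object and its values are hypothetical and not taken from countrypops).

```r
library(gt)

# Hypothetical data frame, invented for illustration
pops <- data.frame(
  country_name = c("A", "B", "C"),
  year = c(2020L, 2020L, 2020L),
  population = c(1500000, 250000, 98000)
)

tbl <- pops |>
  gt() |>
  cols_label(
    country_name = "Name",
    year = "Year",
    population = "Population"
  ) |>
  # The column is still addressed by its original name, not by its label
  fmt_integer(columns = population)

tbl
```

Because the relabeling is purely presentational, the order of cols_label() relative to the formatting calls should not matter here.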

Using the countrypops dataset again, we label columns as before, but this time we make the column labels bold through Markdown formatting (with the md() helper function). It’s possible here to use either a = or a ~ between the column name and the label text.

countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Uganda",
    year %in% 2017:2021
  ) |>
  gt() |>
  cols_label(
    country_name = md("**Name**"),
    year = md("**Year**"),
    population ~ md("**Population**")
  )

Name	Year	Population
Uganda	2017	40127085
Uganda	2018	41515395
Uganda	2019	42949080
Uganda	2020	44404611
Uganda	2021	45853778

With a select portion of the metro dataset, let’s create a small gt table with three columns. Within cols_label() we’d like to provide column labels that contain line breaks. For that, we can use <br> to indicate where the line breaks should be. We also need to use the md() helper function to signal to gt that this text should be interpreted as Markdown. Instead of calling md() on each of the labels as before, we can more conveniently use the .fn argument and provide the bare function there (it will be applied to each label defined in the cols_label() call).

metro |>
  dplyr::select(name, lines, passengers, connect_other) |>
  dplyr::slice_max(passengers, n = 10) |>
  gt() |>
  cols_hide(columns = passengers) |>
  cols_label(
    name = "Name of<br>Metro Station",
    lines = "Metro<br>Lines",
    connect_other = "Train<br>Services",
    .fn = md
  )

Name of Metro Station	Metro Lines	Train Services
Gare du Nord	4, 5	TGV, TER, Thalys, Eurostar
Saint-Lazare	3, 12, 13, 14	TGV, TER, Intercités
Gare de Lyon	1, 14	TGV, TGV Lyria, Renfe-SNCF, OUIGO, Frecciarossa
Montparnasse—Bienvenüe	4, 6, 12, 13	TGV, TER, Intercités, OUIGO
Gare de l'Est	4, 5, 7	TGV, TER, OUIGO, Nightjet
Bibliothèque François Mitterrand	14	NA
République	3, 5, 8, 9, 11	NA
Les Halles	4	NA
La Défense	1	NA
Châtelet	1, 4, 7, 11, 14	NA
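
To make the role of .fn concrete, here is a hedged sketch on a hypothetical two-row data frame (`stations` and its contents are invented for illustration). The two tables are intended to show the same headers: the first wraps each label in md() by hand, the second lets cols_label() apply md() to every label through .fn.

```r
library(gt)

# Hypothetical data frame, not taken from the metro dataset
stations <- data.frame(
  name = c("Station A", "Station B"),
  lines = c("1, 2", "3")
)

# Form 1: call md() on each label individually
tbl_each <- stations |>
  gt() |>
  cols_label(
    name = md("Name of<br>Station"),
    lines = md("Metro<br>Lines")
  )

# Form 2: give plain strings and pass the bare function to .fn
tbl_fn <- stations |>
  gt() |>
  cols_label(
    name = "Name of<br>Station",
    lines = "Metro<br>Lines",
    .fn = md
  )
```

The second form tends to be easier to maintain when many labels need the same treatment, since the helper is named only once.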

Using a subset of the towny dataset, we can create an interesting gt table. First, only certain columns are selected from the dataset, some filtering of rows is done, rows are sorted, and then only the first 10 rows are kept. After the data is introduced to gt(), we then apply some spanner labels using two calls of tab_spanner(). Below those spanners, we want to label the columns by the years of interest. Using cols_label() and select expressions on the left side of the formulas, we can easily relabel multiple columns with common label text. Note that we cannot use an = sign in any of the expressions within cols_label(); because the left-hand side is not a single column name, we must use formula syntax (i.e., with the ~).

towny |>
  dplyr::select(
    name, ends_with("2001"), ends_with("2006"), matches("2001_2006")
  ) |>
  dplyr::filter(population_2001 > 100000) |>
  dplyr::slice_max(pop_change_2001_2006_pct, n = 10) |>
  gt() |>
  fmt_integer() |>
  fmt_percent(columns = matches("change"), decimals = 1) |>
  tab_spanner(label = "Population", columns = starts_with("population")) |>
  tab_spanner(label = "Density", columns = starts_with("density")) |>
  cols_label(
    ends_with("01") ~ "2001",
    ends_with("06") ~ "2006",
    matches("change") ~ md("Population Change,<br>2001 to 2006")
  ) |>
  cols_width(everything() ~ px(120))

name	Population		Density		Population Change, 2001 to 2006
name	2001	2006	2001	2006	Population Change, 2001 to 2006
Brampton	325,428	433,806	1,224	1,632	33.3%
Vaughan	182,022	238,866	668	877	31.2%
Markham	208,615	261,573	989	1,240	25.4%
Barrie	103,710	128,430	1,047	1,297	23.8%
Richmond Hill	132,030	162,704	1,310	1,614	23.2%
Oakville	144,738	165,613	1,042	1,192	14.4%
Mississauga	612,925	668,599	2,094	2,284	9.1%
Cambridge	110,372	120,371	977	1,065	9.1%
Burlington	150,836	164,415	810	883	9.0%
Guelph	106,170	114,943	1,214	1,315	8.3%
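
The formula syntax can be tried out on a much smaller table. The sketch below uses a hypothetical `towns` data frame (names and values invented) and shows two kinds of left-hand side: a select helper, and several column names enclosed in c(). The c() form is how we understand the cols_label() documentation to describe multi-column selections, so treat it as an assumption to verify against your installed version.

```r
library(gt)

# Hypothetical data frame, not taken from the towny dataset
towns <- data.frame(
  name = c("Town A", "Town B"),
  population_2001 = c(120000, 95000),
  population_2006 = c(135000, 101000),
  density_2001 = c(800, 640),
  density_2006 = c(900, 680)
)

tbl <- towns |>
  gt() |>
  tab_spanner(label = "Population", columns = starts_with("population")) |>
  tab_spanner(label = "Density", columns = starts_with("density")) |>
  cols_label(
    # A select helper on the left-hand side labels every matching column
    ends_with("2001") ~ "2001",
    # Several column names on the left-hand side are enclosed in c()
    c(population_2006, density_2006) ~ "2006"
  )

tbl
```

The spanners above the repeated "2001" and "2006" labels are what keep the identically labeled columns distinguishable.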

Here’s another table that uses the towny dataset. The big difference compared to the previous gt table is that cols_label() as used here incorporates unit notation text (within "{{"/"}}").

towny |>
  dplyr::select(
    name, population_2021, density_2021, land_area_km2, latitude, longitude
  ) |>
  dplyr::slice_max(population_2021, n = 10) |>
  gt() |>
  fmt_integer(columns = population_2021) |>
  fmt_number(
    columns = c(density_2021, land_area_km2),
    decimals = 1
  ) |>
  fmt_number(columns = latitude, decimals = 2) |>
  fmt_number(columns = longitude, decimals = 2, scale_by = -1) |>
  cols_label(
    starts_with("population") ~ "Population",
    starts_with("density") ~ "Density, {{*persons* km^-2}}",
    land_area_km2 ~ "Area, {{km^2}}",
    latitude ~ "Latitude, {{:degrees:N}}",
    longitude ~ "Longitude, {{:degrees:W}}"
  ) |>
  cols_width(everything() ~ px(120))

name	Population	Density, persons km⁻²	Area, km²	Latitude, °N	Longitude, °W
Toronto	2,794,356	4,427.8	631.1	43.74	79.37
Ottawa	1,017,449	364.9	2,788.2	45.42	75.69
Mississauga	717,961	2,452.6	292.7	43.60	79.65
Brampton	656,480	2,469.0	265.9	43.69	79.76
Hamilton	569,353	509.1	1,118.3	43.26	79.87
London	422,324	1,004.3	420.5	42.97	81.23
Markham	338,503	1,604.8	210.9	43.88	79.26
Vaughan	323,103	1,186.0	272.4	43.83	79.50
Kitchener	256,885	1,877.7	136.8	43.42	80.47
Windsor	229,660	1,572.8	146.0	42.28	83.00

The illness dataset has units within the units column. They’re formatted in just the right way for gt too. Let’s do some text manipulation through dplyr::mutate() and some pivoting with tidyr::pivot_longer() and tidyr::pivot_wider() in order to include the units as part of the column names in the reworked table. These column names are in a format where the units are included within "{{"/"}}", so we can use cols_label() with the .process_units = TRUE option to register the measurement units. In addition to this, because there is a <br> included (for a line break), we should use the .fn option and provide the md() helper function (as a bare function name). This ensures that any line breaks will materialize.

illness |>
  dplyr::mutate(test = paste0(test, ",<br>{{", units, "}}")) |>
  dplyr::slice_head(n = 8) |>
  dplyr::select(-c(starts_with("norm"), units)) |>
  tidyr::pivot_longer(
    cols = starts_with("day"),
    names_to = "day",
    names_prefix = "day_",
    values_to = "value"
  ) |>
  tidyr::pivot_wider(
    names_from = test,
    values_from = value
  ) |>
  gt(rowname_col = "day") |>
  tab_stubhead(label = "Day") |>
  cols_label(
    .fn = md,
    .process_units = TRUE
  ) |>
  cols_width(
    stub() ~ px(50),
    everything() ~ px(120)
  )

Day	Viral load, copies per mL	WBC, ×10⁹/L	Neutrophils, ×10⁹/L	RBC, ×10¹²/L	Hb, g/L	PLT, ×10⁹/L	ALT, U/L	AST, U/L
3	12000	5.26	4.87	5.72	153	67.0	12835.0	23672.0
4	4200	4.26	4.72	5.98	135	38.6	12632.0	21368.0
5	1600	9.92	7.92	4.23	126	27.4	6426.7	14730.0
6	830	10.49	18.21	4.83	115	26.2	4263.1	8691.0
7	760	24.77	22.08	4.12	75	74.1	1623.7	2189.0
8	520	30.26	27.17	2.68	87	36.2	672.6	1145.0
9	250	19.03	16.59	3.32	95	25.6	512.4	782.5
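
The same idea can be sketched without any pivoting. The example below assumes a hypothetical `sites` data frame (invented for illustration) whose column names already contain units notation and a <br>; check.names = FALSE keeps data.frame() from altering those names. No labels are supplied, so cols_label() works on the existing column names, just as in the call above.

```r
library(gt)

# Hypothetical data frame; the column names carry the units notation
sites <- data.frame(
  site = c("Site A", "Site B"),
  `Area,<br>{{km^2}}` = c(631.1, 292.7),
  `Density,<br>{{*persons* km^-2}}` = c(4427.8, 2452.6),
  check.names = FALSE
)

tbl <- sites |>
  gt() |>
  cols_label(
    .fn = md,              # interpret <br> in the column names as a line break
    .process_units = TRUE  # register the text inside {{ }} as measurement units
  )

tbl
```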
