The data_color() function

data_color() can be used without any supplied arguments to colorize a gt table. Let’s do this with the exibble dataset:

library(gt)

exibble |>
  gt() |>
  data_color()
num char fctr date time datetime currency row group
1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 row_1 grp_a
2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 row_2 grp_a
3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 row_3 grp_a
4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 row_4 grp_a
5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 row_5 grp_b
NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 row_6 grp_b
7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA row_7 grp_b
8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 row_8 grp_b

What’s happened is that data_color() applies background colors to all cells of every column with the default palette in R (accessed through palette()). The default method for applying color is "auto", where numeric values will use the "numeric" method and character or factor values will use the "factor" method. The text color undergoes an automatic modification that maximizes contrast (since autocolor_text is TRUE by default).
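As a small sketch of the defaults described above, the following inspects the base R palette and then repeats the call with the automatic text recoloring switched off. The names default_colors and tbl_plain are only illustrative.

```r
library(gt)

# The current base R palette, which is what gets used when no
# `palette` is supplied
default_colors <- palette()
default_colors

# Same call as before, but keep the original text color by turning
# off the automatic contrast adjustment
tbl_plain <- exibble |>
  gt() |>
  data_color(autocolor_text = FALSE)
```

Printing tbl_plain renders the table; with autocolor_text = FALSE the text on darker backgrounds will likely be harder to read, which is why the default is TRUE.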

You can use any of the available method keywords and gt will only apply color to the compatible values. Let’s use the "numeric" method and supply palette values of "red" and "green".

exibble |>
  gt() |>
  data_color(
    method = "numeric",
    palette = c("red", "green")
  )
num char fctr date time datetime currency row group
1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 row_1 grp_a
2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 row_2 grp_a
3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 row_3 grp_a
4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 row_4 grp_a
5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 row_5 grp_b
NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 row_6 grp_b
7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA row_7 grp_b
8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 row_8 grp_b

With those options in place we see that only the numeric columns num and currency received color treatments. Moreover, the palette colors were mapped to the lower and upper limits of the data in each column; values lying between those limits received interpolated colors.
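The mapping from limits to palette colors can be sketched outside of gt with scales::col_numeric(). This assumes the "numeric" method behaves like that function, with the domain taken from the range of the data; the three values below are just a few entries from the currency column.

```r
# A handful of example values standing in for one column
values <- c(1.39, 17.95, 49.95)

# Build a color-mapping function whose domain is the data range
# (assumption: this mirrors what happens per column when no domain is set)
pal_fn <- scales::col_numeric(
  palette = c("red", "green"),
  domain = range(values)
)

# The smallest value maps to the first palette color, the largest to
# the last, and the middle value gets an interpolated color
colors <- pal_fn(values)
colors
```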

We can constrain the cells to which coloring will be applied with the columns and rows arguments. Further to this, we can manually set the limits of the data with the domain argument (which is preferable in most cases). Here, the domain will be set as domain = c(0, 50).

exibble |>
  gt() |>
  data_color(
    columns = currency,
    rows = currency < 50,
    method = "numeric",
    palette = c("red", "green"),
    domain = c(0, 50)
  )
num char fctr date time datetime currency row group
1.111e-01 apricot one 2015-01-15 13:35 2018-01-01 02:22 49.950 row_1 grp_a
2.222e+00 banana two 2015-02-15 14:40 2018-02-02 14:33 17.950 row_2 grp_a
3.333e+01 coconut three 2015-03-15 15:45 2018-03-03 03:44 1.390 row_3 grp_a
4.444e+02 durian four 2015-04-15 16:50 2018-04-04 15:55 65100.000 row_4 grp_a
5.550e+03 NA five 2015-05-15 17:55 2018-05-05 04:00 1325.810 row_5 grp_b
NA fig six 2015-06-15 NA 2018-06-06 16:11 13.255 row_6 grp_b
7.770e+05 grapefruit seven NA 19:10 2018-07-07 05:22 NA row_7 grp_b
8.880e+06 honeydew eight 2015-08-15 20:20 NA 0.440 row_8 grp_b
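One reason to pair a manual domain with a rows restriction is that values outside the domain have no color to map to. The sketch below shows this with scales::col_numeric() directly; whether gt handles out-of-domain values in exactly the same way is an assumption here.

```r
# Color-mapping function with a fixed domain of 0 to 50
pal_fn <- scales::col_numeric(
  palette = c("red", "green"),
  domain = c(0, 50)
)

# 65100 lies outside c(0, 50): scales warns and falls back to its
# `na.color` (gray by default), so the warning is suppressed here
in_and_out <- suppressWarnings(pal_fn(c(1.39, 49.95, 65100)))
in_and_out
```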

We can use any of the palettes available in the RColorBrewer and viridis packages. Let’s make a new gt table from a subset of the countrypops dataset. Then, through data_color(), we’ll apply coloring to the population column with the "numeric" method, use a domain between 150 and 170 million, specify palette = "viridis", and reverse the order of the palette colors with reverse = TRUE.

countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Bangladesh",
    year %in% 2012:2021
  ) |>
  gt() |>
  data_color(
    columns = population,
    method = "numeric",
    palette = "viridis",
    domain = c(150E6, 170E6),
    reverse = TRUE
  )
country_name year population
Bangladesh 2012 152090649
Bangladesh 2013 154030139
Bangladesh 2014 155961299
Bangladesh 2015 157830000
Bangladesh 2016 159784568
Bangladesh 2017 161793964
Bangladesh 2018 163683958
Bangladesh 2019 165516222
Bangladesh 2020 167420951
Bangladesh 2021 169356251

We can alternatively use the fn argument for supplying the scales-based function scales::col_numeric(). That function call will itself return a function (which is what the fn argument actually requires) that takes a vector of numeric values and returns color values. Here is an alternate version of the code that returns the same table as in the previous example.

countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Bangladesh",
    year %in% 2012:2021
  ) |>
  gt() |>
  data_color(
    columns = population,
    fn = scales::col_numeric(
      palette = "viridis",
      domain = c(150E6, 170E6),
      reverse = TRUE
    )
  )
country_name year population
Bangladesh 2012 152090649
Bangladesh 2013 154030139
Bangladesh 2014 155961299
Bangladesh 2015 157830000
Bangladesh 2016 159784568
Bangladesh 2017 161793964
Bangladesh 2018 163683958
Bangladesh 2019 165516222
Bangladesh 2020 167420951
Bangladesh 2021 169356251

Using your own function in fn can be very useful if you want to make use of specialized arguments in the scales::col_*() functions. You could even supply your own specialized function for performing complex colorizing treatments!
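As one possible illustration of a specialized argument, here is a sketch that uses scales::col_bin() with pretty = FALSE, an option that is passed to the scales function itself rather than to data_color(). The names bin_fn and tbl_binned are only illustrative.

```r
library(gt)

# Four equal-width bins across the domain; `pretty = FALSE` keeps the
# number of bins exactly as requested
bin_fn <- scales::col_bin(
  palette = "viridis",
  domain = c(150E6, 170E6),
  bins = 4,
  pretty = FALSE
)

# Values in the same bin share a color
bin_fn(c(151E6, 154E6, 169E6))

tbl_binned <- countrypops |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(
    country_name == "Bangladesh",
    year %in% 2012:2021
  ) |>
  gt() |>
  data_color(
    columns = population,
    fn = bin_fn
  )
```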

data_color() has a way to apply colorization indirectly to other columns. That is, you can apply colors to a column different from the one used to generate those specific colors. The trick is to use the target_columns argument. Let’s do this with a more complete countrypops-based table example.

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR")) |>
  dplyr::filter(year %% 10 == 0) |>
  dplyr::select(-contains("code")) |>
  dplyr::mutate(color = "") |>
  gt(groupname_col = "country_name") |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    target_columns = color,
    method = "numeric",
    palette = "viridis",
    domain = c(4E7, 7E7)
  ) |>
  cols_label(
    year = "",
    population = "Population",
    color = ""
  ) |>
  opt_vertical_padding(scale = 0.65)
Population
France
1960 46,649,927
1970 51,724,116
1980 55,052,582
1990 58,044,701
2000 60,921,384
2010 65,030,575
2020 67,571,107
United Kingdom
1960 52,400,000
1970 55,663,250
1980 56,314,216
1990 57,247,586
2000 58,892,514
2010 62,766,365
2020 67,081,234
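The same idea can be shown in a smaller, hedged sketch with exibble: colors are computed from currency but painted onto the row column. The domain of c(0, 70000) is chosen here only so that every currency value falls inside it.

```r
library(gt)

tbl_indirect <- exibble |>
  dplyr::select(row, currency) |>
  gt() |>
  data_color(
    columns = currency,      # values that drive the color mapping
    target_columns = row,    # cells that actually receive the color
    method = "numeric",
    palette = c("red", "green"),
    domain = c(0, 70000)
  )
```

The currency cells themselves stay uncolored, since only the target column is styled.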

When specifying a single column in columns we can use as many target_columns values as we want. Let’s make another countrypops-based table where we map the generated colors from the year column to all columns in the table. This time, the palette used is "inferno" (also from the viridis package).

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR", "ITA")) |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(year %% 5 == 0) |>
  tidyr::pivot_wider(
    names_from = "country_name",
    values_from = "population"
  ) |>
  gt() |>
  fmt_integer(columns = c(everything(), -year)) |>
  cols_width(
    year ~ px(80),
    everything() ~ px(160)
  ) |>
  opt_all_caps() |>
  opt_vertical_padding(scale = 0.75) |>
  opt_horizontal_padding(scale = 3) |>
  data_color(
    columns = year,
    target_columns = everything(),
    palette = "inferno"
  ) |>
  tab_options(
    table_body.hlines.style = "none",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black"
  )
year France United Kingdom Italy
1960 46,649,927 52,400,000 50,199,700
1965 49,282,756 54,348,050 52,112,350
1970 51,724,116 55,663,250 53,821,850
1975 53,715,733 56,225,800 55,441,001
1980 55,052,582 56,314,216 56,433,883
1985 56,569,195 56,550,268 56,593,071
1990 58,044,701 57,247,586 56,719,240
1995 59,543,659 58,019,030 56,844,303
2000 60,921,384 58,892,514 56,942,108
2005 63,188,395 60,401,206 57,969,484
2010 65,030,575 62,766,365 59,277,417
2015 66,548,272 65,116,219 60,730,582
2020 67,571,107 67,081,234 59,438,851

Now, it’s time to use pizzaplace to create a gt table. The color palette to be used is the "ggsci::red_material" one (it’s in the ggsci R package but also obtainable from the paletteer package). Colorization will be applied to the sold and income columns. We don’t have to specify those in columns because those are the only columns in the table. Also, the domain is not set here, so we’ll use the bounds of the available data in each column.

pizzaplace |>
  dplyr::group_by(type, size) |>
  dplyr::summarize(
    sold = dplyr::n(),
    income = sum(price),
    .groups = "drop_last"
  ) |>
  dplyr::group_by(type) |>
  dplyr::mutate(f_sold = sold / sum(sold)) |>
  dplyr::mutate(size = factor(
    size, levels = c("S", "M", "L", "XL", "XXL"))
  ) |>
  dplyr::arrange(type, size) |>
  gt(
    rowname_col = "size",
    groupname_col = "type"
  ) |>
  fmt_percent(
    columns = f_sold,
    decimals = 1
  ) |>
  cols_merge(
    columns = c(size, f_sold),
    pattern = "{1} ({2})"
  ) |>
  cols_align(align = "left", columns = stub()) |>
  data_color(
    method = "numeric",
    palette = "ggsci::red_material"
  )
sold income
chicken
S (20.1%) 2224 28356.00
M (35.2%) 3894 65224.50
L (44.6%) 4932 102339.00
classic
S (41.2%) 6139 69870.25
M (27.6%) 4112 60581.75
L (27.3%) 4057 74518.50
XL (3.7%) 552 14076.00
XXL (0.2%) 28 1006.60
supreme
S (28.2%) 3377 47463.50
M (33.8%) 4046 66475.00
L (38.1%) 4564 94258.50
veggie
S (22.9%) 2663 32386.75
M (30.8%) 3583 57101.00
L (46.4%) 5403 104202.70
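To see which colors sit behind a "package::palette" name like the one used above, the palette can be looked up directly. This is a minimal sketch that assumes the paletteer package is installed.

```r
# Look up the discrete palette by its "package::palette" name
red_material <- paletteer::paletteer_d("ggsci::red_material")

# Printing shows the individual colors, from light to dark reds
red_material

# Number of colors available for interpolation
length(red_material)
```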

Colorization can occur in a row-wise manner. The key to making that happen is using direction = "row". Let’s use the sza dataset to make a gt table. Then, color will be applied to values across each ‘month’ of data in that table. This is useful when no domain is set: the bounds of each row are then used, so each cell is colored relative to the range of its own row. The palette is "PuOr" from the RColorBrewer package (only the palette name is required here).

sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  tidyr::spread(key = "tst", value = sza) |>
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",
    palette = "PuOr",
    na_color = "white"
  )
    0530 0600 0630 0700 0730 0800 0830 0900 0930 1000 1030 1100 1130 1200
jan                84.9 78.7 72.7 66.1 61.5 56.5 52.1 48.3 45.5 43.6 43.0
feb           88.9 82.5 75.8 69.6 63.3 57.7 52.2 47.4 43.1 40.0 37.8 37.2
mar           85.7 78.8 72.0 65.2 58.6 52.3 46.2 40.5 35.5 31.4 28.6 27.7
apr      88.5 81.5 74.4 67.4 60.3 53.4 46.5 39.7 33.2 26.9 21.3 17.2 15.5
may      85.0 78.2 71.2 64.3 57.2 50.2 43.2 36.1 29.1 26.1 15.2  8.8  5.0
jun 89.2 82.7 76.0 69.3 62.5 55.7 48.8 41.9 35.0 28.1 21.1 14.2  7.3  2.0
jul 88.8 82.3 75.7 69.1 62.3 55.5 48.7 41.8 35.0 28.1 21.2 14.3  7.7  3.1
aug      83.8 77.1 70.2 63.3 56.4 49.4 42.4 35.4 28.3 21.3 14.3  7.3  1.9
sep      87.2 80.2 73.2 66.1 59.1 52.1 45.1 38.1 31.3 24.7 18.6 13.7 11.6
oct           84.1 77.1 70.2 63.3 56.5 49.9 43.5 37.5 32.0 27.4 24.3 23.1
nov           87.8 81.3 74.5 68.3 61.8 56.0 50.2 45.3 40.7 37.4 35.1 34.4
dec                84.3 78.0 71.8 66.1 60.5 55.6 50.9 47.2 44.2 42.4 41.8

Notice that na_color = "white" was used, and this avoids the appearance of gray cells for the missing values (we also removed the "NA" text with sub_missing(), opting for empty strings).
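The row-wise behavior is easier to see on a tiny table. In this sketch (the data and the name tbl_rowwise are made up for illustration), the two rows have very different magnitudes, yet each row should span the full palette because its own range is used.

```r
library(gt)

tbl_rowwise <- dplyr::tibble(
  item = c("a", "b"),
  q1 = c(1, 100),
  q2 = c(2, 200),
  q3 = c(3, NA)
) |>
  gt(rowname_col = "item") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",   # compute the color range within each row
    palette = "PuOr",
    na_color = "white"   # missing cell in row "b" stays white
  )
```

With the default direction = "column", the values in row "a" would instead be colored relative to the much larger values in row "b" of the same column.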