Background

The final step in the analysis of flow cytometry data is Reporting. Below, we show how we can quickly extract counts, frequencies, and expression values for their population(s) of interest.

Counts, Frequencies, and MFI

We can easily extract Counts and Frequencies of the gated cell population from the GatingSet.

# get count for all
counts <- gs_pop_get_count_fast(gs,  # indicate which GatingSet to use
                                statistic = "count", # indicate what statistic is requested
                                format = "wide") # indicate if data should be in wide (samples are columns) or long (each node for each sample gets a row) format

knitr::kable(head(counts,20), caption = "Event counts")
Event counts
4000_TNK-CR1 4001_TNK-CR1 4002_TNK-CR1 4003_TNK-CR1
/singlet 90201 82095 334739 128359
/singlet/live 85691 77990 318002 121941
/singlet/live/lymphocytes 79632 72217 290836 113559
/singlet/live/lymphocytes/CD3+ T cells 67477 57833 241125 80699
/singlet/live/lymphocytes/CD3+ T cells/NKT cells 5 112 477 75
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells 64774 55223 237368 80040
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells 61996 52005 228635 78791
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/MAIT Cells 461 738 1649 640
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT 61535 51267 226986 78151
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon 61578 51270 226928 78154
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4+CD8a+ 333 79 343 99
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4+CD8a- 39884 35776 138031 56808
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4-CD8a+ 20283 14674 85356 20771
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4-CD8a- 1078 741 3198 476
root 102015 92703 372224 146150

Changing statistic to freq return frequency of the gated population with respect to the parent population.

freq <- gs_pop_get_count_fast(gs,
                              statistic = "freq",
                              format = "wide")
knitr::kable(head(freq,20), caption = "Event frequencies relative to parent node")
Event frequencies relative to parent node
4000_TNK-CR1 4001_TNK-CR1 4002_TNK-CR1 4003_TNK-CR1
/singlet 0.8841935 0.8855700 0.8992945 0.8782689
/singlet/live 0.9500006 0.9499970 0.9499999 0.9499996
/singlet/live/lymphocytes 0.9292925 0.9259777 0.9145729 0.9312618
/singlet/live/lymphocytes/CD3+ T cells 0.8473604 0.8008225 0.8290755 0.7106350
/singlet/live/lymphocytes/CD3+ T cells/NKT cells 0.0000741 0.0019366 0.0019782 0.0009294
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells 0.9599419 0.9548701 0.9844189 0.9918339
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells 0.9571124 0.9417272 0.9632090 0.9843953
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/MAIT Cells 0.0074360 0.0141909 0.0072124 0.0081228
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT 0.9925640 0.9858091 0.9927876 0.9918772
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon 0.9932576 0.9858667 0.9925340 0.9919153
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4+CD8a+ 0.0054078 0.0015409 0.0015115 0.0012667
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4+CD8a- 0.6476989 0.6977960 0.6082590 0.7268726
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4-CD8a+ 0.3293871 0.2862103 0.3761369 0.2657701
/singlet/live/lymphocytes/CD3+ T cells/non-NKT Cells/conv_Tcells/not_MAIT_Polygon/CD4-CD8a- 0.0175063 0.0144529 0.0140926 0.0060905
root 1.0000000 1.0000000 1.0000000 1.0000000

What if one wants to report frequencies relative to a different population?

plot(gs, bool = TRUE)

Let’s consider the scenario here: you would like to calculate the frequency of MAIT Cells, and True_NKT Cells relative to total CD3+ Cells.

We can easily achieve this by using gs_pop_get_stats and indicating which node(s) we want

library(magrittr)
# get count for specified nodes and make wider 
pop_counts <- gs_pop_get_stats(gs,
                                    node = c("CD3+ T cells","MAIT Cells", "NKT cells"),
                                    type = "count"
                                    ) %>% # output is a long column
  tidyr::pivot_wider(names_from = pop, 
                     values_from = count,names_prefix = "count_") %>% # convert to wide
  dplyr::mutate(MAIT_freq = `count_MAIT Cells`/`count_CD3+ T cells`,
         NKT_freq = `count_NKT cells`/`count_CD3+ T cells`)


knitr::kable(pop_counts,caption = "Frequency of MAIT and NKT Cells (relative to CD3+ T Cells)")
Frequency of MAIT and NKT Cells (relative to CD3+ T Cells)
sample count_CD3+ T cells count_MAIT Cells count_NKT cells MAIT_freq NKT_freq
4000_TNK-CR1 67477 461 5 0.0068320 0.0000741
4001_TNK-CR1 57833 738 112 0.0127609 0.0019366
4002_TNK-CR1 241125 1649 477 0.0068388 0.0019782
4003_TNK-CR1 80699 640 75 0.0079307 0.0009294

Another common statistic that is reported if MFI (median fluorescence intensity). To extract MFI, we can again make use of gs_pop_get_stat. It is important to realize that MFI is often reported in raw/linear scale.

mfi_dataframe <- gs_pop_get_stats(gs,
                 nodes = c("CD3+ T cells", "MAIT Cells", "CD4+CD8a-", "CD4-CD8a+"),
                 type = pop.MFI,
                 inverse.transform = TRUE)

mfi_dataframe[1:5, 1:5]
##          sample          pop TCR Vd1 FITC CD127 BB630 PD1 BB660
## 1: 4000_TNK-CR1 CD3+ T cells     255.0227   1072.3564  142.1997
## 2: 4000_TNK-CR1   MAIT Cells     273.8928   2158.7324  557.8886
## 3: 4000_TNK-CR1    CD4+CD8a-     243.4144   1254.2268  139.9552
## 4: 4000_TNK-CR1    CD4-CD8a+     266.7055    856.4607  121.5325
## 5: 4001_TNK-CR1 CD3+ T cells     227.1721    626.4796  133.4854

For convenience, MFI are extracted for all markers for each of the specified population.

A simple report

In the table above, we noticed that expression of PD1 is higher in MAIT Cells. Below, we create 2 plots to help drive this point (if this is of interest). We will make use of ggcyto and ggplot2

Boxplot

# plot some data
pd1_mfi_plot <- dplyr::left_join(pData(gs),mfi_dataframe, 
          by = c("name" = "sample")) %>% 
  ggplot(aes(x = mock_treatment, y = `PD1 BB660`, fill = pop))+
  geom_boxplot()+
  labs(x = "Cell Type", y = "median PD1 expression")+
  scale_fill_manual(values = c("CD3+ T cells" = "black",
                               "MAIT Cells" = "red",
                               "CD4+CD8a-" = "blue",
                               "CD4-CD8a+" = "green"),
                    )+
  theme(aspect.ratio = 1, 
        legend.position = "right")

pd1_mfi_plot

Overlaid histogram

pd1_overlay <- ggcyto(gs,
       subset = "MAIT Cells",
       aes(x = "PD1"))+
  geom_density(fill = "red", alpha = 0.3)+ # MAIT Cells in red
  geom_overlay(
    data = gs_pop_get_data(gs,"CD3+ T cells"),
    fill = "black", alpha = 0.5)+ # CD3+ Cells in black
   geom_overlay(
    data = gs_pop_get_data(gs,"CD4+CD8a-"),
    fill = "blue", alpha = 0.5)+ # CD4+ T Cells in blue
   geom_overlay(
    data = gs_pop_get_data(gs,"CD4-CD8a+"),
    fill = "green", alpha = 0.5)+ # CD8+ T Cells in green
  axis_x_inverse_trans()+
  facet_wrap(mock_treatment~name, nrow = 1)+
  labs(title = "",x = "PD1")+
  theme(aspect.ratio = 1)
pd1_overlay <- as.ggplot(pd1_overlay)

Final Figure

gridExtra::grid.arrange(
 pd1_mfi_plot,
 pd1_overlay,
 nrow = 2,
 left = "PD1 expression in T cell subsets"
)

It is now clear that the expression of PD1 indeed tends to be higher on MAIT Cells compared to non-MAIT T Cells.

(Optional) Additional statistics

It is also possible to calculate additional statistics for the population of interest. For instance, if you were interested in nth percentile expression of a specific marker you could define a function and use it in gs_pop_get_stats like so:

# define a function
my_quantile <- function(fr,percentile,chnl){
  matched_chnl <- flowCore::getChannelMarker(fr,chnl) # match channel name
  res <- apply(exprs(fr)[,matched_chnl[["name"]],drop = FALSE], 2, quantile,percentile) # get quantile for specific channel
  names(res) <- matched_chnl[["desc"]]
  return(res)
}

# get stats
median_cd4 <- gs_pop_get_stats(gs,
                 c("lymphocytes","CD3+ T cells","CD4+CD8a-","CD4-CD8a+"), 
                 type = my_quantile,
                 inverse.transform = TRUE,
                 stats.fun.arg = list(percentile = .5,
                                      chnl = "cd4")) 
knitr::kable(median_cd4,caption = "Median expression of CD4 extracted using user defined function")
Median expression of CD4 extracted using user defined function
sample pop CD4 BUV805
4000_TNK-CR1 lymphocytes 1043.205715
4000_TNK-CR1 CD3+ T cells 1393.012354
4000_TNK-CR1 CD4+CD8a- 1866.839339
4000_TNK-CR1 CD4-CD8a+ 3.203119
4001_TNK-CR1 lymphocytes 873.184795
4001_TNK-CR1 CD3+ T cells 1366.918737
4001_TNK-CR1 CD4+CD8a- 1805.953805
4001_TNK-CR1 CD4-CD8a+ 4.153675
4002_TNK-CR1 lymphocytes 328.090372
4002_TNK-CR1 CD3+ T cells 959.012183
4002_TNK-CR1 CD4+CD8a- 1401.996476
4002_TNK-CR1 CD4-CD8a+ -8.668585
4003_TNK-CR1 lymphocytes 785.155278
4003_TNK-CR1 CD3+ T cells 1486.741756
4003_TNK-CR1 CD4+CD8a- 1758.953449
4003_TNK-CR1 CD4-CD8a+ -9.211168
As we see, we were able to extract median expression of the marker CD4 on 4 different populations. As expected, expression is highest on CD4+ T Cells and lowest on CD8+ T Cells.

Conclusion

In this section, we show how to easily generate reports and visualizations from the GatingSet object which we created earlier. We demonstrated some approaches to easily extract counts, frequencies, MFI, or a user defined metric from the GatingSet object easily. Also, as noted in the section Visualizations, the ggcyto makes use of the ggplot2 framework providing a familiar user interface. Importantly, this allows users to prepare intricate figures for events within the GatingSet.