A new data-driven cell population discovery and annotation method for single-cell data, FAUST, reveals correlates of clinical response to cancer immunotherapy

We introduce a non-parametric method for unbiased cell population discovery in single-cell flow and mass cytometry that annotates cell populations with biologically interpretable phenotypes through a new procedure called Full Annotation Using Shape-constrained Trees (FAUST). We used FAUST to discover novel (and validate known) cell populations associated with treatment outcome across three cancer immunotherapy clinical trials. In a Merkel cell carcinoma anti-PD-1 trial, we detected a PD-1 expressing CD8+ T cell population – undetected both by manual gating and existing computational discovery approaches – in blood at baseline that was associated with outcome and correlated with PD-1 IHC and T cell clonality in the tumor. We also validated a previously reported cellular correlate in a melanoma trial, and detected it de novo in two independent trials. We show that FAUST’s phenotypic annotations enable cross-study data integration and multivariate analysis in the presence of heterogeneous data and diverse immunophenotyping staining panels, demonstrating FAUST is a powerful method for unbiased discovery in single-cell data.

: Overview of FAUST. FAUST estimates annotation boundaries for an experimental unit. An experimental unit is user defined and can be a sample, stimulation condition, subject, batch, or site. This schematic overview of FAUST assumes the experimental unit is an individual sample stained with a panel of cell markers as detected by cytometry. A) To estimate annotation boundaries, FAUST grows an exhaustive forest of 1-dimensional, depth-3 gating strategies, constrained by shape: if, prior to depth-3, the cells in a node of the gating strategy have unimodal expression along all markers, the gating strategy along that path terminates. B) Annotation boundaries are estimated for markers within an experimental unit by averaging over gates drawn for that marker over the entire annotation forest. A "depth score" (Methods 4.4) is derived for each marker and it quantifies how well-gated the marker is in each experimental unit. The distribution of scores across experimental units is used to determine whether a marker should be included in the discovery process and to determine the number of annotation boundaries a marker should receive. C) This procedure ensures that FAUST selects a standard set of markers for discovery and annotation as well as a standard number of annotation boundaries per selected marker. D) For each experimental unit, FAUST then relaxes the depth-3 constraint and conducts a search of 1-dimensional gating strategies in order to discover and select phenotypes present in the experimental unit. Each discovered phenotype is given a score that quantifies the homogeneity of cells in an experimental unit with that phenotype; high-scoring phenotypes are then selected for annotation (Methods 4.8). Each selected phenotype is annotated using all selected markers from step C), regardless of the specific gating strategy that led to the phenotype's discovery. E) FAUST returns an annotated count matrix with counts of cells in each phenotypic region discovered and selected in step D) that also survives down-selection by frequency of occurrence across experimental units. the underlying protein data, and that any method that solely relies on UMAP for population 126 discovery would likely miss these sub-populations. Figure 2: FAUST annotations reflect underlying protein expression not captured by dimensionality reduction. A) In a baseline responder's sample, the densities of per-marker fluorescence intensity for cells in the four correlates (different colors) as well as the entire collection of live lymphocytes in the sample (gray) are compared. Cells used in density calculations are marked by tick marks and demonstrate that differences in cluster annotations reflect strict expression differences in the underlying data. B) A UMAP embedding computed from the same sample as panel A using the ten stated protein markers. All cells in the sample were used to compute the embedding. The embedding is colored by the relative intensity of observed PD-1 expression, windsorized at the 1st and 99th percentile, and scaled to the unit interval. A random subset of 10,000 cells is displayed from 233,736 cells in the sample together with the complete set of 61 CD8+ PD-1 dim cells, 176 CD8+ PD-1 bright cells, 450 CD45RA-CD4 bright PD-1 dim cells, and 76 CD45RA+ CD4 bright PD-1 dim cells. C) The same UMAP embedding highlighting the location of the cells from the four discovered sub-populations. FAUST annotations are listed in depth-score order (Methods 4.4), from highest depth score to lowest. The sub-populations are annotated by FAUST as: CD4 bright CD3+ CD8-CD45RA-HLA-DR-CD28+ PD-1 dim CD25-CD127-CCR7-(yellow cells in solid red box); CD4-CD3+ CD8+ CD45RA-HLA-DR+ CD28+ PD-1 bright CD25-CD127-CCR7-(orange cells in dashed blue box); CD4-CD3+ CD8+ CD45RA-HLA-DR+ CD28+ PD-1 dim CD25-CD127-CCR7-(purple cells in dashed blue box); CD4 bright CD3+ CD8-CD45RA+ HLA-DR-CD28-PD-1 dim CD25-CD127+ CCR7+ (dark blue in dotted green box).
Reports that CD8 T cells co-expressing HLA-DR and CD28 can exhibit anti-viral properties [29], 128 as well as reports of CD28 dependent rescue of exhausted CD8 T cells by anti-PD1 therapies in 129 mice [26], led us to investigate the association between the abundance of the therapeutic-response-130 associated sub-populations discovered by FAUST and tumor viral status of each subject, as MCC 131 is a viral-associated malignancy. We adapted the differential abundance GLMM to test for an 132 interaction between response to therapy and tumor viral status in the four cell sub-populations 133 discovered and annotated by FAUST. This interaction was statistically significant for both CD8+ 134 correlates. The observed interaction p-value of 0.026 for the CD8+ PD-1 dim correlate ( Figure 3A) 135 suggested that these T cells may be particularly relevant in subjects with virus-positive tumors.

136
In order to further investigate the relevance of these T cells measured in blood, we examined 137 published data on PD-1 immunohistochemistry (IHC) staining in tumor biopsies from the same  Figure 3B). 144 We also examined published TCR clonality data generated from patient tumor samples, 145 described in [31]. Ten subjects passing clonality QC were common to the two datasets, six of which 146 were virus positive. Frequencies of the FAUST populations within these six subjects were strongly 147 correlated (r = 0.952) with the measurement of productive clonality ( Figure 3C). Normalizing the 148 correlate cell counts by the total number of CD3+ annotated FAUST sub-populations (i.e., total T 149 cells, the recommended normalization constant for T cell clonality) instead of total lymphocyte 150 count produced an observed correlation of r = 0.972 (Supplementary Figure S1). Together, 151 these results led us to hypothesize that the CD8+ T cell correlate discovered by FAUST in 152 blood represents a circulating population of tumor-associated virus-specific T cells that are also 153 detectable in the tumor and whose presence in the tumor is known to correlate with outcome.

154
Due to the small sample size, this hypothesis must be confirmed on an independent, larger set 155 of patient samples. However, our results demonstrate that FAUST discovers and annotates cell 156 sub-populations that are immunologically plausible, suggest a testable hypothesis for follow-up experimentation, and potentially have clinical utility. Figure 3: A CD8+ PD-1 dim CD28+ HLA-DR+ T cell sub-population discovered and annotated by FAUST is associated with outcome in virus positive subjects and with independent measurements of PD-1 and T cell clonality in the tumor. A) Boxplots of the abundance of the CD8+ PD-1 dim CD28+ HLA-DR+ T cell outcome correlate discovered by FAUST, stratified by subjects' response to therapy (FDR adjusted p-value contrasting all responders (n = 18) vs all non-responders (n = 9): 0.022) and their viral status (unadjusted p-value of interaction: 0.026). B) The abundance of the CD8+ PD-1 dim CD28+ HLA-DR+ T cell correlate among virus positive subjects against total PD-1 expression measured by IHC from tumor biopsies as described in [30], with observed correlation in virus positive subjects (n=4) of 0.942. C) The abundance of the CD8+ PD-1 dim CD28+ HLA-DR+ T cell correlate among virus positive subjects plotted against productive clonality (1-normalized entropy) from tumor samples as described in [31], with observed correlation in virus positive subjects (n=6) of 0.959. Supplementary Figure S2 displays the remaining correlates.  We also analyzed flow cytometry data from a second CITN trial: CITN-07 (NCT02129075, see 171 supplementary table S1 for trial data), a randomized phase II trial studying immune responses   These results demonstrate that FAUST is able to detect, annotate, and correctly assign abun-   A) The baseline outcome-associated subpopulation discovered by FAUST in the MCC anti-PD-1 trial myeloid data (n=15, 10 Responders, 5 Non-Responders). The full FAUST annotation for the sub-population was CD33 bright CD16-CD15-HLA-DR bright CD14+ CD3-CD11B+ CD20-CD19-CD56-CD11C+. B) The baseline outcome-associated sub-population discovered by FAUST in the FLT3-L therapeutic Vx trial myeloid data (n=32, 18 No Recurrence, 14 Recurrence). The full FAUST annotation for the subpopulation was CD8-CD3-HLA-DR+ CD4-CD19-CD14+ CD11C+ CD123-CD16-CD56-. C) The baseline outcome-associated sub-population found by FAUST from the re-analysis of the Krieg CyTOF panel 03 (stratified by batch) (n=19, 10 Responder, 9 Non-Responder). The full FAUST annotation for the sub-population was CD16-CD14+ CD11B+ CD11C+ ICAM1+ CD62L-CD33+ PDL1+ CD7-CD56-HLA-DR+. D) The baseline outcome-associated sub-population found by FAUST from the re-analysis of the Krieg FACS validation data (n=31, 16 Responder, 15 Non-Responder). The full FAUST annotation for the sub-population was CD3-CD4+ HLA-DR+ CD19-CD14+ CD11B+ CD56-CD16-CD45RO+. 210 FAUST annotations make it possible to test hypotheses involving prior biological knowledge of 211 hierarchical relationships among cell types. By jointly modeling those annotated populations 212 related through biological hierarchy, we are able to account for their dependence structure when 213 conducting secondary tests of interests. This is analogous to the techniques used to perform gene 214 set enrichment analysis in gene expression data [35]. We contrast this approach against aggregating 215 (i.e., summing) cell sub-population counts on the basis of their common annotations to derive 216 ancestral populations that resemble those obtained by manual gating, which we hypothesized can 217 obscure interesting signals in the data.

218
To demonstrate this we tested each of four different myeloid sub-compartments for association 219 with outcome at baseline in each of the three trials which used heterogeneous marker panels. 220 We used the FAUST annotations to define membership in the myeloid compartment (described   Table S5).

251
In contrast, the multivariate model also detected a significant association between outcome and 252 increased abundance in the CD14-CD16-dendritic cell sub-compartment ( Figure 6B) in the two 253 CITN trials, consistent with our analysis of baseline predictors in those trials. We did not detect 254 such an association in the DC sub-compartment in the Melanoma anti-PD1 trial. Since both the 255 CITN trials used fresh blood samples for analysis while the latter used frozen PBMC samples [8], 256 we hypothesize the observed differences in modeling outcomes is due to cryopreservation status, 257 a hypothesis supported by studies [36, 37] that examine the differential effect of cryopreservation 258 on monocytes and DCs, respectively. This multivariate modeling approach demonstrates how 259 FAUST can enable cross-study data integration and analysis even in the presence of heterogeneous 260 staining panels. Figure 6: Standardized annotation of clusters enables cross-study meta-analysis of datasets stained with disparate marker panels. Differential abundance between responders and nonresponders across the different sub-compartments was tested by aggregating model coefficients (analogous to meta-analysis over cell sub-populations in a sub-compartment) from a multivariate GLMM and by univariate modeling of aggregated cell counts. One-sided, 99% (Bonferroniadjusted) confidence intervals for increased abundance in responders vs. non-responders are displayed for each sub-compartment in each dataset. In all modeling scenarios, when the whisker of a forest-plot line crosses the vertical red-line at 1, this indicates the increased odds in the responders vs non-responders are not statistically significant at the Bonferonni-adjusted level. A) Cells in the CD14+ CD16-HLA-DR+ sub-compartment were found to be significantly more abundant in responders than non-responders in all datasets tested using the multivariate modeling approach. In the univariate modeling of aggregate cell counts, the CD14+ CD16-HLA-DR+ subcompartment was only significant in the melanoma anti-PD-1 FACS dataset, consistent with the authors' published findings. The x-axis can be interpreted as the odds increase in the probability of observing more cells in the responders than the non-responders in the compartment. B) Cells in the CD14-CD16-HLA-DR+ sub-compartment were found to be significantly more abundant in responders than non-responders in the two CITN datasets tested using the multivariate modeling approach. We hypothesize that the observed difference between the CITN trials and the melanoma anti-PD-1 trial is explained by cryopreservation in the latter trial, since it has been reported that cryopreservation affects the relative abundance of pDCs and mDCs [37], but does not affect monocyte function [36]. See supplementary Information A.12 for results from the other compartments. 262 We re-analyzed the MCC anti-PD-1 T cell dataset described in section 2. 1  to therapy at the FDR-adjusted 5% level (Supplementary Section A.7). 267 We also conducted a simulation study that simulated the discovery process in cytometry data 268 analysis by inducing a differentially abundant population associated with a simulated response to 269 therapy, in datasets generated from a variety of mixture models (Supplementary Section A.13). 270 We compared FAUST to FlowSOM in this study since FlowSOM is computationally efficient, is  We applied FAUST to five datasets (CyTOF and flow) from three independent immunotherapy tri-278 als. Across these trials, FAUST discovered cell sub-populations and labeled them with annotations 279 that are generally consistent with previous manual gating of the cytometry data (when aggregated 280 by appropriate annotation) as well as with the known biological context, strongly supporting this 281 novel unbiased approach. 282 We found FAUST discovered cell populations associated with clinical outcome in the analyzed 283 datasets that are missed by other methods. Notably, manual gating did not identify statistically 284 significant correlates of outcome in the MCC anti-PD-1 baseline T cell data. The multivariate 285 analyses (Section 2.4) found that only some fully-annotated sub-populations exhibit differential  Table S6), since the manual 294 gating strategy was designed to interrogate MDSCs in the subjects. In the FLT3-L trial, CD14+, 295 mDCs, and pDCs were manually gated, but a differential signal at the FDR-adjusted 5% level was 296 not detected at baseline. subsequently discovers in each sample. In consequence, FAUST produces annotations that describe 304 the protein expression of each discovered sub-population relative to the starting population of 305 cells in the sample, and differs in kind from the standard paradigm of following a path in a single 306 gating strategy to arrive at a phenotype. We hypothesize it is these methodological characteristics -307 as well as its pervasive use of non-parametric statistical methods -that explain FAUST's discovery 308 performance relative to manual gating on the analyzed datasets. FAUST is detecting real biological signals in the analyzed datasets.

326
As with any computational method, FAUST has tuning parameters that need to be adjusted 327 to analyze real experimental data. These parameters are described in section 4.9, and in our 328 view are uniquely interpretable among computational methods since they affect how FAUST 329 processes 1-dimensional density estimates. Our results demonstrate that FAUST can consistently 330 detect immunologically-plausible candidate biomarkers from measurements made in blood using 331 a simple, well-understood assay. Many large experimental flow cytometry datasets already exist, 332 and FAUST has the potential for the productive re-analysis and meta-analysis of such data.  If pre-gating has not been performed by an investigator, computational methods [42, 43] can 340 be used before applying FAUST to cytometry data in order to guarantee this assumption is met.
FAUST assumes the mixture components f m,i of an experimental unit in (4.1) belong to the class of densities on the space of protein measurements for each experimental unit i, with the common class F is defined as    append(multimodalList, markerIndex) 10: if length(multimodalList) == 0 then 11: return strategy . Gating strategy stops due to shape constraint.
where, abusing notation, we let 408 a R ⌘ a R (N m,j ) ⌘ the dip test p-value in the root population of the gating strategy that led to N m,j .
We allow a C and a G to be defined similarly. The function Q can be interpreted as a measure of the 409 quality of the gating strategy that led to node N m,j . In the case of a grandchild node that had clear Finally, define The depth score is defined to be the normalized sum . score an e above zero, and are effectively never selected by FAUST for discovery and annotation.

427
Hence the restriction to 3-marker gating strategies.
In the event FAUST selects G j , j > 1 (i.e., multiple annotation boundaries), similar weighted 439 averages are taken for g 2 (N m,j ), etc.

FAUST method: phenotype discovery and cluster annotation
Partition tree construction is similar to tree construction for the annotation forest (4.3), but they are 466 not depth-constrained: a tree continues to grow following the previously described strategy until        we assume c i,k ⇠ Binomial(n i , µ i,k ). Our model is where Responder is an indicator variable equal to 1 when the subject exhibits complete or partial 525 response to therapy, and 0 otherwise, and each x i,k ⇠ N(0, s 2 i,k ) is a subject-level random effect.

526
The R package lme4 was used to fit all GLMMs [48].  132 cell populations. We tested each discovered cell population at the cohort-specific baseline for 554 association with recurrence of disease (14 subjects had disease recur, 18 did not have disease recur). 555 We analyzed the baseline counts using a model similar to (4.5). Here, the model was adjusted for subject-to-subject variability using a random effect, while cohort status, recurrence, and NYESO-1 557 staining of the tumor by immunohistochemistry (measured as positive, negative, or undetermined) 558 were modeled as population effects.  Our baseline model was similar to (4.5), but was modified by

CITN-09 Myeloid
where j 2 {1, 2}, and h i,j ⇠ N(0, s 2 j ) is a random effect included to model the batch effects.
where Cluster i,j is an indicator variable that is 1 when observation i is from cluster j and 0 otherwise, Responder i is an indicator variable when observation i is taken from a responding subject, and h i ⇠ N(0, s 2 i ) is an observation-level random effect. To test for differential abundance across a compartment, we test for positivity of linear combination of the coefficients b i,j in (4.6).
For example to test for differential abundance across an entire compartment, we test

585
For the aggregate analysis, compartment definitions are the same as presented in section 4.15.

586
Counts are derived by summing across FAUST clusters within each compartment. The model(4.5) 587 is then used to test each derived compartment for differential abundance.