Hamilton’s rule, gradual evolution, and the optimal (feedback) control of reaction norms and other function-valued traits

Most phenotypes, such as gene expression profiles, developmental trajectories, behavioural sequences or other reaction norms, are function-valued traits, since they vary across an individual’s age and in response to various internal and/or external factors (state variables). In turn, many individuals live in populations subject to limited genetic mixing and are thus likely to interact with their relatives. We here formalise selection on function-valued traits when individuals interact in a group-structured population, by deriving the marginal version of Hamilton’s rule for function-valued traits. This rule simultaneously gives a condition for the invasion of an initially rare mutant function-valued trait and its ultimate fixation in the population (invasion thus implies substitution). Hamilton’s rule thus underlies the gradual evolution of function-valued traits and gives rise to necessary first-order conditions for uninvadability (evolutionary stability). Using and extending results from optimal control theory and differential game theory, we characterise the first-order condition for time-dependent traits (dynamic traits) in terms of dynamic constraints on state variables and their marginal effects on reproductive value. Our results apply to both open-loop traits, which are functions of time (or age) only, and closed-loop (state-feedback) traits, which are functions of both time and state. This allows us to delineate the role of state-dependence of trait expression, and thus how responsiveness to others’ traits affects selection on function-valued traits, which pertains to both life-history and social evolution.


Introduction
Any biological organism is an open system exchanging energy and information with its surroundings.
As such, most if not all traits of an organism may vary in response to changes in its internal factors as well as to changes in its external biotic and abiotic environmental conditions. Examples include gene expression profiles, physiological processes, developmental trajectories, morphological shapes, and behavioural sequences, which we here refer to collectively as function-valued traits, by which we mean a phenotype whose expression level depends on some index variable describing the domain over which the phenotype can vary (e.g., time, space, temperature, behaviour of others; traits that vary with the environment are often collectively called reaction norms). The shapes of function-valued traits are moulded by natural selection, and formalising how selection shapes these traits therefore helps to understand their evolution (Gomulkiewicz et al., 2018).
Selection on quantitative function-valued traits has been formalised using different theoretical approaches, which consider different perspectives on the evolution of these traits. First, the evolution of life-history schedules has often been studied by applying Pontryagin's maximum principle (e.g., León, 1976; Oster and Wilson, 1977; Schaffer, 1982; Iwasa and Roughgarden, 1984; Stearns, 1992; Perrin, 1992; Bulmer, 1994; Irie and Iwasa, 2005; Metz et al., 2016). Here, a trait evolves to vary as a function of the age or time of interaction of individuals, while individual fitness can be constrained by the dynamics of "state variables". These are observables describing internal condition(s) of the individual and/or that of its environment (e.g., fat reserves, information, resource availability, behaviour of others), which in turn depend on trait expression. This approach formalises so-called open-loop traits (Weber, 2011; Liberzon, 2011), which are only time- or age-dependent. Selection on open-loop traits has been formalised to include interactions between relatives, which allows one to take into account limited dispersal and spatially or family-structured populations (Day and Taylor, 1997, 1998; Wild, 2011). Second, in behavioural ecology and evolutionary game theory, selection on function-valued traits has typically been studied using dynamic programming (see Houston et al., 1999; Mangel et al., 1988 for textbook treatments, and e.g. Leimar, 1997; Ewald et al., 2007; McNamara and Houston, 1987; Dechaume-Moncharmont et al., 2005). Here, a trait evolves to vary as a function of (some relevant) state variables and time. This formalises so-called closed-loop strategies (Weber, 2011; Liberzon, 2011), as these involve a feedback between current and future trait expression.
Both Pontryagin's maximum principle and dynamic programming are optimal control theory approaches (e.g., Bryson and Ho, 1975; Kamien and Schwartz, 2012; Weber, 2011; Liberzon, 2011), whose general aim is to identify a control variable over a period of time that maximises (in the best-response sense in the presence of interactions between individuals) an objective function (fitness in biology) under the constraints of a dynamical system on state variables. Third, in quantitative genetics theory, a directional selection coefficient on function-valued traits has been derived (Gomulkiewicz and Beder, 1996; Beder and Gomulkiewicz, 1998) (assuming no interactions between individuals), which can be decomposed into component-wise selection coefficients describing how the mean of each component of a function-valued trait evolves over short time scales, i.e., the time scales of demographic changes (Kirkpatrick and Heckman, 1989; Gomulkiewicz and Kirkpatrick, 1992; Gomulkiewicz and Beder, 1996).
Finally, the connected approach of invasion analysis has investigated the long-term adaptive dynamics of open-loop function-valued traits, which allows one to focus on candidate endpoints of evolution (Parvinen et al., 2006, 2013).
What are the relationships between these different approaches? When do closed-loop traits and open-loop traits lead to different evolutionary outcomes? How does the directional selection coefficient on function-valued traits connect to gradual evolution and Hamilton's rule in group-structured populations?
There are many open questions about the evolution of function-valued traits. By contrast, many general principles have been proven to hold for scalar traits (e.g. body size at maturity, sex allocation throughout life). In particular, for small trait deviations (weak selection), the selection coefficient on a scalar quantitative trait in a population subject to limited genetic mixing can be expressed as a marginal version of Hamilton's rule, where the direct and indirect fitness effects (the "cost" and "benefit") are given by partial derivatives of individual fitness (e.g., Taylor and Frank, 1996; Frank, 1998; Roze and Rousset, 2003; Rousset, 2004; Lehmann and Rousset, 2014; Van Cleve, 2015). This provides two useful results about gradual quantitative evolution. First, since the selection coefficient is independent of allele frequency at all frequencies and is of constant sign (Roze and Rousset, 2003, 2004; Rousset, 2004; Lehmann and Rousset, 2014), gradual evolution is vindicated even when the survival and reproduction of individuals depend on the behaviour of others, such as under density- and frequency-dependent selection. Second, when the selection gradient vanishes, Hamilton's rule provides the necessary first-order condition for a strategy to be locally uninvadable; that is, it allows one to determine candidate evolutionarily stable strategies, which is central to the characterisation of long-term evolution (Geritz et al., 1998; Rousset, 2004) and thus of adaptation.
Our goal in this paper is twofold. First, it is to formalise the selection coefficient on quantitative function-valued traits under weak selection and limited genetic mixing (taking the panmictic population into account as a special case) in order to characterise the gradual evolution of function-valued traits. Our second aim is to understand the role of trait dependence on state variables for selection on function-valued traits. To achieve these aims, the rest of this paper is organised as follows. (1) We derive the selection coefficient acting on a mutant allele coding for a function-valued trait in the island model of dispersal (group-structured population), which yields the marginal version of Hamilton's rule for function-valued traits. We deduce from Hamilton's rule the necessary first-order condition for local uninvadability, which yields the candidate uninvadable function-valued traits and applies to both continuous and discrete traits.
(2) We apply these results to time-dependent function-valued traits (dynamic traits) by deriving necessary conditions for uninvadability expressed in terms of dynamic constraints on state variables and their (marginal) effects on the reproductive value. This allows us to compare how selection acts on open-loop versus closed-loop traits, specifying the role of trait responsiveness, and thus to establish the connection between the dynamic-programming and maximum-principle types of approaches in the context of gradual phenotypic evolution. (3) We illustrate the main concepts of our approach by analysing the evolution of temporal common-pool resource production within groups. (4) Finally, we discuss the scope of our results.

Biological scenario
Consider a haploid population subdivided into an infinite number of homogeneous groups (without division into class structure), each with a fixed number N of individuals, where censusing takes place at discrete demographic time periods. All groups are subject to the same environmental conditions and are equally connected to each other by random dispersal. A discrete demographic time period spans an entire life cycle iteration during which various events can occur (e.g. growth, reproduction, dispersal) to individuals. The life cycle may allow for one, several, or all individuals per group to die (thus including whole-group extinction through environmental stochasticity or warfare). Generations can thus overlap, but when this occurs, the parents are considered equal (with respect to their "demographic properties") to their offspring in each generation (since there is no within-group class structure). Dispersal can occur before, during, or after reproduction, and in groups, so that more than one offspring from the same natal group can establish in a non-natal group (i.e., propagule dispersal). We refer to this group-structured population, where all individuals within groups are indistinguishable, as the homogeneous island population (broadly, this corresponds to the infinite island model of dispersal of Wright, 1931, used since at least Eshel, 1972 in various versions to understand selection on social traits, e.g., Rousset, 2004, and where the specifics of our demographic assumptions are equivalent to those considered in Mullon et al., 2016).
We assume that two alleles segregate in the homogeneous island population at a locus of interest: a mutant allele with trait u_m ∈ U[T] and a resident (wild-type) allele with trait u ∈ U[T]. Here, U[T] is the set of feasible traits that individuals can potentially express, defined as the set of real-valued functions with range U and domain T, where T is a space of some index variable(s) representing, for instance, time, an environmental gradient or some cue. We assume here that T is a closed interval over some discrete or continuous index variable t. If t is discrete, then the element u ∈ U[T] is a vector, and if t is continuous then the element u ∈ U[T] is a piece-wise continuous function. Hence, we write u = {u(t)}_{t∈T} to emphasise that it consists of the (finite or infinite) collection of all point-wise values u(t) of the trait. Namely, u can be thought of as a continuous or a discrete "path" (or a schedule, or a trajectory) on the space T.
The crucial assumption of this paper is that the mutant trait u_m can be expressed as a small deviation from the resident trait,

u_m = u + εη,    (1)

where the phenotypic deviation function η = {η(t)}_{t∈T} must satisfy u + εη ∈ U[T] for a sufficiently small non-negative parameter ε. Because U[T] may have a boundary, not all phenotypic deviations η generate a mutant strategy u_m that remains within the bounds of the feasible trait space U[T], independent of the choice of ε (see also Section 2.3.2). Note that we are making a distinction between a phenotypic deviation η and the effect size ε of that deviation (in the literature, a (scalar) mutant effect is often modelled with the notation δ = εη, e.g. Rousset, 2004). This distinction in the notation is necessary for analysing selection on function-valued traits.
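As a numerical illustration of this construction, a function-valued trait can be discretised on a grid of the index variable and a mutant obtained as the resident path plus a small admissible deviation. All concrete values below (grid, resident path, feasibility bounds, deviation) are hypothetical choices made only for the sketch:

```python
import numpy as np

# Discretisation of a function-valued trait: u = {u(t)} on a time grid (toy values).
T = np.linspace(0.0, 1.0, 11)      # index variable t on the interval [0, 1]
u = 0.5 * np.ones_like(T)          # resident trait path (hypothetical)
eta = np.sin(np.pi * T)            # a phenotypic deviation function eta(t)
eps = 0.01                         # small mutant effect size epsilon

u_m = u + eps * eta                # mutant path: u_m = u + eps * eta

# Admissibility: the mutant must stay inside the feasible set, here U = [0, 1].
assert np.all((0.0 <= u_m) & (u_m <= 1.0))
```

Note that if the resident sat on the boundary (e.g. u(t) = 1 for some t), the same deviation would be inadmissible for every ε > 0, which is the boundary issue discussed above.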

Allele frequency change and short-term evolution
Our first aim is to characterise the change in mutant allele frequency in the homogeneous island population under weak selection (ε ≪ 1). To that end, it is useful to follow the direct fitness approach (Taylor and Frank, 1996; Rousset and Billiard, 2000; Rousset, 2004) and introduce the individual fitness function w(u_•, u_∘, u), which gives the expected number of successful offspring produced over one life cycle iteration by a focal individual (possibly including self through survival) with trait u_•, when its average neighbour in the focal group has trait u_∘ and an average individual (from other groups) in the population has trait u, which is taken here to be the resident trait. We note that any individual in the population can be taken to be the focal individual (Rousset and Billiard, 2000; Rousset, 2004) and that the fitness of this individual can always be expressed in terms of the average phenotypes of other individuals in different roles with respect to the focal (e.g., group neighbour, cousin, members of other groups, etc.), whenever mutant and resident phenotypes are closely similar (see the argument in Appendix A.2 for function-valued traits and a textbook argument for scalar traits, e.g. Rousset, 2004, p. 95). These individuals in different roles, as well as the focal individual itself, are actors on the fitness of the focal, which will throughout be regarded as a focal recipient of the trait expressions of the different categories of actors (i.e., a recipient-centred approach).
In terms of this definition of individual fitness, we define the direct fitness effect of expressing the mutant allele as

−c_η(u) = d/dε [w(u + εη, u, u)]|_{ε=0},    (2)

which is the effect that the focal individual has on its own fitness if it were to switch from expressing the resident to the mutant allele for a small allelic effect. Analogously, we define the indirect fitness effect of expressing the mutant allele as

b_η(u) = d/dε [w(u, u + εη, u)]|_{ε=0},    (3)

which is the effect that the whole set of neighbours has on the focal's fitness if they were all to switch from expressing the resident to the mutant allele. Finally, let us denote by r(u) the neutral relatedness between two randomly sampled group neighbours (Michod and Hamilton, 1980; Frank, 1998) in the homogeneous island population that is monomorphic for the resident; namely, r(u) is the probability that, in a neutral process (where all individuals are alike), the two homologous alleles of these individuals coalesce in the same common ancestor (e.g. Roze and Rousset, 2003; Rousset, 2004; Lehmann and Rousset, 2014; Van Cleve, 2015). Note that relatedness so defined depends only on the resident trait. In Appendix A, we show that the change ∆p in the frequency p of the mutant allele over one demographic time period (one life cycle iteration) can be expressed in terms of these quantities as follows.
Invasion implies substitution-principle result. In the homogeneous island population with two alleles, the change in mutant allele frequency p in the population takes the form

∆p = p(1 − p) s_η(u) + O(ε²),    (4)

where p(1 − p) is the genetic variance at the locus of interest, s_η(u) is a selection coefficient of order O(ε) that is independent of p, and O(ε²) is a remainder of all higher-order terms. This entails an "invasion implies substitution" property of the mutant allele: if s_η(u) > 0, the mutant allele coding for a small function-valued deviation η is selected for and not only invades but substitutes the (ancestral) resident allele [since effects of order O(ε²) can be neglected in eq. (4) whenever s_η(u) is non-zero].
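The allele-frequency dynamic of eq. (4) can be iterated numerically. The sketch below assumes a fixed, frequency-independent selection coefficient (dropping the higher-order remainder) and illustrates how a positive coefficient carries a rare mutant all the way to fixation; the numerical values are arbitrary:

```python
# Toy iteration of eq. (4): Delta p = p(1 - p) * s_eta, with the O(eps^2)
# remainder dropped and a frequency-independent selection coefficient.
s_eta = 0.02          # hypothetical positive selection coefficient (order eps)
p = 0.001             # initially rare mutant allele
traj = [p]
for _ in range(2000):
    p = p + p * (1.0 - p) * s_eta
    traj.append(p)

# Because s_eta is independent of p and of constant sign, the trajectory is
# monotone: the mutant not only invades but substitutes the resident.
assert traj[0] < 0.01 and traj[-1] > 0.99
```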
We have thus formalised an "invasion implies substitution" principle (see Priklopil and Lehmann for an extended treatment), where the selection coefficient

s_η(u) = −c_η(u) + r(u) b_η(u)    (5)

takes the form of Hamilton's rule: the mutant spreads if r(u)b_η(u) − c_η(u) > 0. This result is a generalisation of previous analogous results for scalar traits (Roze and Rousset, 2003, 2004; Rousset, 2004; Lehmann and Rousset, 2014). Owing to its simplicity, the function-valued nature of our result is perhaps not yet fully apparent, but it is made explicit on noting that the direct and indirect effects (eqs. 2-3) are both formally Gâteaux derivatives, which are directional derivatives (see section A.1 in the Appendix for a formal definition and, e.g., Troutman, 1991, p. 45-50; Luenberger, 1997), and represent the change in fitness resulting from a sum of all weighted component-wise changes in trait expression (over the domain T) induced by the mutation function η. To outline the component-wise change in fitness, it is useful to write the selection coefficient as

s_η(u) = ⟨η, s(u)⟩,    (6)

where ⟨·, ·⟩ is an inner product on functions (the generalisation of a dot product, and will be used as such throughout) and s(u) = {s(t, u)}_{t∈T} is the selection gradient function, whose component s(t, u) gives the selection gradient on component u(t) of the trait, i.e. the value of u at time t, holding the other components u(t′) (for all t′ ≠ t ∈ T) of the trait fixed. Each component of the selection gradient function is then given by

s(t, u) = −c(t, u) + r(u) b(t, u),    (7)

where

−c(t, u) = ∂w(u_•, u_∘, u)/∂u_•(t) and b(t, u) = ∂w(u_•, u_∘, u)/∂u_∘(t)    (8)

are, respectively, the effect on the focal's own fitness of changing marginally component u_•(t) of its trait, while holding the other trait components u_•(t′) (for t′ ≠ t) fixed, and the effect of all group neighbours on the focal individual's fitness when changing marginally component u_∘(t) of their traits, while holding the other components u_∘(t′) (for t′ ≠ t) of their traits fixed. That is, the costs and benefits are partial derivatives, and s(t, u) is the inclusive fitness effect on a focal individual. When t is discrete and T finite, eq. (7) corresponds to the trait-specific inclusive fitness effect derived previously for a monomorphic resident homogeneous island population (Mullon et al., 2016, eq. 12).
Eq. (6) shows that the selection coefficient is a weighted sum of trait-specific changes. Note that for a continuous index variable t over the interval T, the partial derivatives −c(t, u) and b(t, u) in eq. (8) are formally functional derivatives (e.g. Troutman, 1991, p. 45-50; Luenberger, 1997). In the absence of interactions between relatives, s(t, u) reduces to β(y) in eq. 1 of Gomulkiewicz and Kirkpatrick (1992) for y = t, G(a) in eq. 4 of Parvinen et al. (2006) for a = t, or g(a; u) in eq. 3 of Metz et al. (2016) for a = t (but see ∆W_incl(t) in eq. 25 of Day and Taylor, 2000, which incorporates interactions between relatives).
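The component-wise bookkeeping of eqs. (6)-(8) can be made concrete on a discretised toy fitness function. The functional form of w, the relatedness value, and the deviation below are all hypothetical stand-ins; the sketch only shows the mechanics of assembling s(t, u) = −c(t, u) + r(u)b(t, u) by finite differences and contracting it with a deviation η via the inner product:

```python
import numpy as np

# Toy discretised fitness w(u_focal, u_neigh, u_pop) on a grid of n components.
# The functional form is hypothetical: a benefit from neighbours' paths, a
# quadratic personal cost, and a weak dependence on the population average.
n = 20
dt = 1.0 / n

def w(uf, un, up):
    return np.sum(dt * (0.8 * un - 0.5 * uf**2 + 0.1 * up))

u = 0.3 * np.ones(n)      # resident path
r = 0.25                  # neutral relatedness (assumed value)
h = 1e-6                  # finite-difference step

# Component-wise partial derivatives -c(t,u) and b(t,u) (cf. eq. 8).
c_minus = np.zeros(n)
b = np.zeros(n)
for t in range(n):
    e = np.zeros(n); e[t] = h
    c_minus[t] = (w(u + e, u, u) - w(u, u, u)) / h   # focal's effect on own fitness
    b[t]       = (w(u, u + e, u) - w(u, u, u)) / h   # neighbours' effect on focal

s = c_minus + r * b       # selection gradient function (cf. eq. 7)

# Selection coefficient on a deviation eta: the inner product <eta, s> (cf. eq. 6).
eta = np.ones(n)
s_eta = eta @ s
```

Here each component of s is −0.015 + 0.25 × 0.04 = −0.005, so the uniform deviation is selected against (s_eta < 0).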

Necessary condition for local uninvadability and long-term evolution
It follows from Section 2.2 that a necessary first-order condition for a trait u* to be locally uninvadable (resistant to invasion by any mutant in a small radius ε ≪ 1) is that the selection coefficient is non-positive for all admissible mutants in the resident u* population, that is,

s_η(u*) ≤ 0 for all admissible deviations η.    (9)

Local resistance to invasion by sets of alternative mutants allows one to characterise candidate long-term evolutionary outcomes (Eshel and Feldman, 1984; Eshel, 1996; Eshel et al., 1998) and is a first step (and often the only computationally accessible step) towards characterising uninvadable traits.
A crucial question is whether a locally uninvadable strategy u* will be approached by gradual evolution from within its neighbourhood and thus be convergence stable (Eshel, 1983; Lessard, 1990; Geritz et al., 1998; Rousset, 2004; Leimar, 2009). Because characterising convergence stability involves a second-order analysis of the selection coefficient, which is involved for multidimensional traits (Lessard, 1990; Leimar, 2009), it will not be investigated further in this paper. For the same reason, we will also not consider sufficient conditions for local uninvadability. In the remainder of this section, we focus on characterising in more detail the necessary condition for local uninvadability (eq. 9) in terms of the selection gradient function s(u), which allows one to remove consideration of the mutational effect η.

Local uninvadability for scalar-valued traits
Let us first consider the case of scalar quantitative traits, where the trait of each individual is an element of a bounded subset U ⊂ R of the real line. That is, the resident and mutant traits stay within the feasibility bounds u_min ≤ u, u_m ≤ u_max. For this case, the index t in eq. (7) can be dropped and one obtains

s(u) = −c(u) + r(u) b(u),    (10)

which is the standard selection gradient for scalar quantitative traits in the homogeneous island population (Taylor and Frank, 1996; Frank, 1998; Roze and Rousset, 2003, 2004; Rousset, 2004; Lehmann and Rousset, 2014; Van Cleve, 2015).
Note that for u = u_min an admissible phenotypic deviation must be non-negative (η ≥ 0), for u = u_max it must be non-positive (η ≤ 0), while for u_min < u < u_max the deviation η is unrestricted.
Substituting these admissible deviations into the first-order condition for uninvadability (eq. 9) yields that the necessary condition for uninvadability of scalar bounded traits can be expressed in the following form:

s(u*) ≤ 0 if u* = u_min,
s(u*) = 0 if u_min < u* < u_max,    (11)
s(u*) ≥ 0 if u* = u_max.

Note that if the set of admissible traits is unbounded (i.e. U = R), then the first-order necessary condition for local uninvadability is given by the second line of eq. (11).
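These boundary and interior cases can be checked mechanically. The sketch below classifies grid points of a bounded trait space according to the three cases of eq. (11), for a hypothetical scalar selection gradient:

```python
# Classify grid points by the first-order condition (eq. 11) for a scalar
# gradient s(u) on the bounded trait space [u_min, u_max].
def candidate_uninvadable(s, u_min, u_max, grid_size=10001, tol=1e-3):
    """Return grid points satisfying the boundary/interior first-order conditions."""
    out = []
    for i in range(grid_size + 1):
        u = u_min + (u_max - u_min) * i / grid_size
        g = s(u)
        if u == u_min and g <= 0:                      # selection pushes below bound
            out.append(u)
        elif u == u_max and g >= 0:                    # selection pushes above bound
            out.append(u)
        elif u_min < u < u_max and abs(g) < tol:       # interior stationarity
            out.append(u)
    return out

# Hypothetical gradient s(u) = 0.5 - u on U = [0, 1]: interior candidate u* = 0.5.
cands = candidate_uninvadable(lambda u: 0.5 - u, 0.0, 1.0)
```

With a gradient that is negative everywhere (selection always favouring smaller trait values), only the lower boundary u_min survives the classification, illustrating the first line of eq. (11).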

Local uninvadability for function-valued traits
Let us return to the general case where the trait of each individual is an element of U[T], being either a vector (T discrete) or a (bounded and piece-wise continuous) function (T continuous). More precisely, for all t ∈ T the resident and mutant traits stay within the feasibility bounds u_min(t) ≤ u(t), u_m(t) ≤ u_max(t), where u_min = {u_min(t)}_{t∈T} and u_max = {u_max(t)}_{t∈T}. Now, an admissible deviation η = {η(t)}_{t∈T} must satisfy for all t ∈ T conditions similar to those given for scalar traits in Section 2.3.1: for u(t) = u_min(t) an admissible phenotypic deviation must be non-negative (η(t) ≥ 0), for u(t) = u_max(t) it must be non-positive (η(t) ≤ 0), while for u_min(t) < u(t) < u_max(t) the deviation η(t) is unrestricted.
Substituting the admissible deviations into eq. (9) yields that a candidate uninvadable strategy u* = {u*(t)}_{t∈T} ∈ U[T] satisfies for all t ∈ T:

s(t, u*) ≤ 0 if u*(t) = u_min(t),
s(t, u*) = 0 if u_min(t) < u*(t) < u_max(t),    (12)
s(t, u*) ≥ 0 if u*(t) = u_max(t),

which is thus equivalent to eq. (11) in a point-wise way.
From the selection gradient to candidate optimal controls

The point-wise description of the candidate uninvadable trait u* given by eq. (12) is unlikely to be directly useful in solving for u* in concrete applications, because factors characterising the organism and its environment change over time (e.g. organisms can grow and the resources in the environment can be depleted). Hence, solving for u* from eq. (12) would entail simultaneously solving a large number of equations, while tracking the changes in the relevant time-dependent factors. A more useful characterisation of u* can be achieved with the mathematical framework of optimal control theory, most notably dynamic programming and Pontryagin's maximum principle, both of which have been used abundantly in evolutionary biology (e.g., León, 1976; Iwasa and Roughgarden, 1984; Mangel et al., 1988; Houston et al., 1999; Stearns, 1992; Perrin, 1992; Kozlowski, 1992; Day and Taylor, 1997, 2000; Cichon and Kozlowski, 2000; Irie and Iwasa, 2005; Parvinen et al., 2006, 2013; Priklopil et al., 2015; Metz et al., 2016; Avila et al., 2019). Yet these previous models often do not discuss and integrate the various modes of trait expression, in particular open-loop and closed-loop strategies (see e.g. p. 225-226 in Basar and Olsder, 1999 for a discussion in the game theory literature). In this section, we connect our analysis to optimal control theory and, in so doing, develop an integrative approach to the functional dependence of trait evolution.

Concept of control and state variables
For space reasons, we focus on a continuous-time formulation (but parallel developments apply to discrete time), and assume that a demographic time period is characterised by the time interval T = [0, t_f] during which trait expression is observed. This time interval can be thought of as the length of the reproductive season or the time during which behavioural interactions occur between individuals, eventually leading to reproduction. More specifically, we now assume that the fitness of the focal individual can be written in the form

w(u_•, u_∘, u) = ∫₀^{t_f} f(u_•(t), x_•(t)) dt + Φ(x_•(t_f)),    (13)

where f(u_•(t), x_•(t)) is the rate of increase of individual fitness at time t and Φ(x_•(t_f)) is the contribution to individual fitness at the final time t = t_f (formally f : U³ × R³ → R₊ and Φ : R³ → R₊). Here, the vectors u_•(t) and x_•(t) collect, respectively, the trait expression levels u_•(t), u_∘(t), and u(t) at time t of the focal individual, that of an average neighbour, and that of an average individual from the population, and the state variables x_•(t), x_∘(t), and x(t) of these respective individuals. State variables describe some measurable condition of an individual, e.g. size, stored energy, hunting skill, or of its environment, e.g. amount of resource patches, environmental toxicity (note that the "•" in the subscripts of u_•(t) and x_•(t) emphasises that these controls and state variables are those of all actors on a focal recipient). The defining feature of a state variable in our model is that its time dynamics depends on the evolving trait of one or more individuals in interaction, and we will henceforth call the elements of u_•(t) the control variables, as is customary for these types of models in the evolutionary literature (e.g., Perrin, 1992; Day and Taylor, 2000).
Because models with both control and state variables rapidly become notationally complex, we assume that both the controls and the state variables are one-dimensional real numbers. The state of every individual is assumed to change according to the function g, such that

dx_•(t)/dt = g(u_•(t), x_•(t)), with initial condition ("i.c.") x_•(0) = x_init,    (15)

which is the rate of change of the state of a focal individual with control u_•(t) in state x_•(t), when its neighbours have average control u_∘(t) and average state x_∘(t) in a population where the average control (in other groups) is u(t) and the average state is x(t). Similarly, we can also express the rates of change of the state of an average neighbour of the focal and of an average individual in the rest of the population, respectively, as

dx_∘(t)/dt = g(u_∘(t), x_∘(t)) and dx(t)/dt = g(u(t), x(t)),    (16)

where the vectors u_∘(t) and x_∘(t), and u(t) and x(t), collect the (average) controls and states of actors on the state variables of an average neighbour of the focal individual (first line), and on an average individual in the population (second line), respectively (here and throughout all vectors are defined by default as column vectors). These actors are thus second-order actors on the focal recipient, since they affect the state variables of actors affecting the focal's fitness. Note that the subscripts of the control vectors and state vectors emphasise the individual (actor) from whose perspective the second-order actors' controls and states are collected. Accordingly, the vectors in eq. (17) contain elements which are, for an average neighbour of the focal, the control and state expressions of average neighbours viewed as actors on the focal individual. While we have so far explicitly distinguished between the states of different individuals, which is required if state represents some property of an individual's condition (e.g. body size or individual knowledge), nothing prevents the individual state from representing some environmental condition common to the group or population, which can be influenced by individual behaviour (e.g. the local amount of resources in the group, in which case x_•(t) = x_∘(t); see the concrete example in section 4).
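A minimal numerical sketch of such a state constraint can be obtained by integrating a state equation forward with Euler steps. The dynamic below is a hypothetical common-pool-resource-like form (a group of N individuals produces a decaying shared resource), chosen only to illustrate how a control schedule determines the state trajectory:

```python
import numpy as np

# Euler integration of a toy state dynamic dx/dt = g(u(t), x(t)) (cf. eq. 15).
# Hypothetical form: g(u, x) = N*u - delta*x, a group-level resource produced
# at a rate set by the (shared) control and decaying at rate delta.
N, delta = 10, 0.5
t_f, steps = 5.0, 5000
dt = t_f / steps

def u_open_loop(t):
    return 0.2                    # fixed production schedule (open loop)

x = 0.0                           # initial condition x(0) = x_init = 0
for k in range(steps):
    t = k * dt
    x += dt * (N * u_open_loop(t) - delta * x)

# The resource approaches the equilibrium N*u/delta = 4 as t grows.
```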
Note that while tracking the dynamics of three state variables (eqs. 15-18) may appear complicated, it is much simpler than tracking the states of all individuals in a group separately (which would require as many equations as there are individuals in the group, and is the approach taken in Day and Taylor, 1997, 2000).
We now make a couple of remarks about the properties of the fitness function w(u_•, u_∘, u) (eq. 13) and its dynamic constraints (eqs. 15-16), which is a special case of the fitness function w(u_•, u_∘, u) considered in section 2. The fitness function (13) depends on the full trajectories of the control u_• = {u_•(t)}_{t∈T} and state x_• = {x_•(t)}_{t∈T} variables, but since the state variables are fully determined by the controls (by way of eqs. 15-16) and the initial condition x_init (which we assume here to be fixed), fitness is determined by the controls. In particular, even if fitness depends only on the state of the system at the final time t_f, it still depends critically on the control variables. We assumed in section 2 that the fitness w(u_•, u_∘, u) is Gâteaux differentiable (eqs. 2 and 3), which means here that the functions f, Φ and g are smooth enough with respect to their arguments (see e.g. section 3 of Liberzon, 2011 for a textbook treatment of the assumptions and Clarke, 1976 for the minimal assumptions needed).
We finally note that in the homogeneous island population, individual fitness may depend in a non-linear way on various vital rates (e.g., Roze and Rousset, 2003, eq. 35; Akçay and Van Cleve, 2012, eq. A12; Van Cleve, 2015, eq. 38; Mullon et al., 2016, eq. box 1a), which may themselves depend on integrals over the control schedules of the individuals in interaction. Such situations can be analysed either by defining state variables whose values represent the relevant integrals, which are then covered by eq. (13), or by noting that, to first order, functions of integrals can be replaced by integrals of first-order Taylor series of fitness, and hence the f(u_•(t), x_•(t)) fitness component in eq. (13) may be evaluated as a first-order Taylor expansion of fitness in its vital rates (e.g., Van Cleve, 2015, eq. 39).

Concept of neutral reproductive and shadow value
A central role in our analysis will be played by the (neutral) reproductive value v(t, x(t); u) of an individual at time t in a resident population, which gives the total contribution to fitness from time t onward of a (recipient) individual when the current state variables of the actors on its fitness are x(t). The argument u has been separated by the semicolon in order to emphasise that the reproductive value is evaluated assuming a fixed control trajectory, where u is treated as a parameter. Hence, the reproductive value is formally a function of current time t and state x(t). In Appendix B we show that the reproductive value satisfies the following partial differential equation:

−∂v(t, x(t); u)/∂t = f(u(t), x(t)) + [1 · λ(t, x(t); u)] g(u(t), x(t)), with final condition v(t_f, x(t_f); u) = Φ(x(t_f)),    (20)

where 1 = (1, 1, 1) and the vector

λ(t, x(t); u) = ∂v(t, x(t); u)/∂x_•(t)    (21)

is the gradient of the reproductive value with respect to changes in the state variables of each individual affecting the focal's fitness (and associated with the fixed resident control path u). Note that we use the vector x_•(t) as an argument of v when expressing the partial derivatives to emphasise whose actor state we are varying, even though we are here considering only a resident population where all actors on a focal recipient have the same controls. The "−" sign on the left-hand side of eq. (20) indicates that the reproductive value of an individual grows when looking backwards in time. Hence, it grows according to the current rate f(u(t), x(t)) of fitness increase and the sum 1 · λ(t, x(t); u) of the effects of the current state change of each type of actor on the future fitness of the focal individual, weighted by the change g(u(t), x(t)) of the state of the actors, which are all the same in a resident population. The elements of the gradient λ(t, x(t); u) are called the shadow values of the states in the optimal control literature (see e.g. Dorfman, 1969; Caputo, 2005), since by changing state there is no immediate effect on fitness, but only future fitness effects.
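The shadow value admits a simple numerical cross-check: along a fixed resident control, it can be obtained by integrating its backward-in-time equation from the final-time condition, and compared with a finite-difference derivative of the forward-computed value with respect to the initial state. All functional forms below are hypothetical toy choices for a single state variable:

```python
# Shadow value sketch for one state variable along a fixed (resident) control u.
# lambda(t) = dV/dx(t) satisfies the backward ODE
#   -dlambda/dt = df/dx + lambda * dg/dx,   lambda(t_f) = Phi'(x(t_f)).
# Toy model (hypothetical): f(u, x) = x,  g(u, x) = -0.5*x + u,  Phi(x) = 2*x.
t_f, steps = 2.0, 20000
dt = t_f / steps
u = 0.3

# Backward integration of the costate from the terminal condition lambda(t_f) = 2.
lam = 2.0
for _ in range(steps):
    lam += dt * (1.0 + lam * (-0.5))        # -dlam/dt = df/dx + lam * dg/dx

# Cross-check: finite difference, w.r.t. the initial state, of the value of
# starting in state x0 at time 0 and following the same control forward.
def value(x0):
    x, v = x0, 0.0
    for _ in range(steps):
        v += dt * x                          # accumulate f(u, x) = x
        x += dt * (-0.5 * x + u)             # state dynamic g(u, x)
    return v + 2.0 * x                       # add terminal contribution Phi(x)

h = 1e-5
lam_fd = (value(1.0 + h) - value(1.0 - h)) / (2 * h)
```

In this linear toy model the two computations agree (both give a shadow value of 2), illustrating that the backward costate equation is just a recipe for the gradient of the value along the resident path.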

Concept of open and closed-loop controls
Because the internal and external conditions of organisms vary, trait expression can evolve to be functionally dependent on these conditions (Sibly and McFarland, 1976; McFarland, 1977; McFarland and Houston, 1981; Houston et al., 1999). Hence, trait expression may depend not only on time but also on state variables. Focusing on the resident trait u(t), we can conceptualise trait expression in (at least) two different ways that are relevant to evolutionary biology. Namely, trait expression can be written as u(t) = d(t, x(t)), where the function d : T × R³ → U is the trait expression rule (or decision rule for short) in the so-called closed-loop (or feedback or Markovian) form of the control variable (Basar and Olsder, 1999, p. 221; Dockner et al., 2000, p. 59). A closed-loop (feedback) control can be thought of as a contingency plan, which specifies a trait expression rule according to both the state and the age (or time during the interaction phase T) of the individual. An open-loop control can be thought of as an entirely fixed course of trait expression from birth to death of an individual (trait expression happens "by the clock"). Note that closed-loop (feedback) control is the general characterisation that subsumes open-loop control.
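The distinction can be sketched in a few lines of code (the functional forms below are arbitrary illustrations, not the paper's model): an open-loop control is a map of time alone, a closed-loop control is a map of time and state, and the former is the special case of the latter that ignores its state argument.

```python
def open_loop(t):
    # open-loop: trait expression "by the clock" (illustrative form)
    return 0.5 + 0.1 * t

def closed_loop(t, x):
    # closed-loop: a contingency plan over time and state (illustrative form)
    return 0.5 + 0.1 * t - 0.2 * x

def open_as_closed(t, x):
    # an open-loop control is a closed-loop rule that ignores the state
    return open_loop(t)
```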

The first-order condition in terms of dynamic constraints
Let us now evaluate the point-wise fitness effects of Hamilton's marginal rule (7) by substituting the fitness function eq. (13) into eq. (8) and taking the derivatives with respect to the focal's and the average neighbour's control components. Calculations displayed in Appendix B (in particular eqs. B.16-B.28) then show that the direct effect consists of a current fitness effect and a state-change effect on future fitness (eq. 23), with a corresponding expression for the indirect effect (eq. 24), the derivatives being evaluated at the resident values. In order to obtain a full characterisation of the first-order condition taking into account the dynamic constraints brought by the shadow value, it is useful to introduce the Hamiltonian function (eq. 25). This can be thought of as the contribution to individual fitness of all current "activities" (Dorfman, 1969, p. 822); namely, the (phenotypic) expressions u•(t) of all individuals currently in state x•(t) at t, holding everything else fixed in a resident population. It is thus the sum of the current rate of fitness f(u•(t), x•(t)) and the changes in current states g(u•(t), x•(t)) resulting from the activities, weighted by λ(t, x(t); u) evaluated in the resident population, since the shadow values do not directly depend on the activities at time t. Our next result (proved in Appendices B.1 and B.2) establishes the necessary condition for uninvadability for closed-loop control paths as follows.
Consider a candidate uninvadable control with associated state path x*(t) and shadow value λ*(t, x*(t)) = λ(t, x*(t); u*). The candidate uninvadable control path d*(x*) has to necessarily satisfy eq. (12), where the point-wise selection coefficient s(t, u*) can be written for all t ∈ T as eq. (26), with the state path and the shadow value function λ* determined by eqs. (27)-(28). Hence, all quantities are evaluated on the resident control u* = d*(x*) and state path x*.
We now emphasise two points about this result. First, the dynamic constraints entail solving eq. (27), an ODE (ordinary differential equation), forward in time, and eq. (28), a PDE (partial differential equation), backwards in time. Thus, the Hamiltonian can be thought of as the growth rate of the reproductive value (when looking backwards in time). For a reader familiar with the dynamic programming literature, the reproductive value v*(t, x*(t)) is not the so-called value function of the model, and hence eq. (20) (even when evaluated along the candidate uninvadable control path u• = u*) is not the eponymous Hamilton-Jacobi-Bellman equation (e.g., Bryson and Ho, 1975; Kamien and Schwartz, 2012; Basar and Olsder, 1999; Dockner et al., 2000; Liberzon, 2011; Weber, 2011). This means that the above result says nothing about the sufficiency of uninvadability, like any standard first-order selection analysis. In this regard, our result provides a weaker, yet simpler and novel, condition to characterise closed-loop controls compared to the standard approach in the optimal control theory literature.
Second, substituting eq. (25) into eq. (26) yields that any interior candidate uninvadable strategy satisfying s(t, u*) = 0 (recall eq. 12) must satisfy eq. (29). This fundamental balance condition says that the current fitness effect (on the focal individual) is traded off (hence the negative sign) against the state-modulated future fitness effect resulting from the change in state variables. This trade-off is instrumental in allowing us to characterise the candidate uninvadable control u*(t) = d*(t, x*(t)), which can typically be done in two steps. The first step is to determine d*(t, x*(t)) satisfying eq. (29), while treating the system state x*(t) and its shadow value λ*(t, x*(t)) as parameters, yielding the implicit expression (30) in terms of some function D (that satisfies eq. 29). Essentially, this step is akin to solving a static (one-dimensional) first-order condition (and can in principle also be used whenever s(t, u*) = 0; Dockner et al., 2000, p. 97). We will refer to this first-step characterisation as the static characterisation, since it characterises the general nature of the solution in terms of x*(t) and λ*(t, x*(t)) independently of their explicit values. The second step entails solving for the trajectories of x*(t) and v*(t, x*(t)) generated by eqs. (27) and (28) under eq. (30), and then taking the gradient ∇v*(t, x*(t)) to obtain λ*(t, x*(t)). Finally, after solving for the trajectories x*(t) and λ*(t, x*(t)), we can explicitly characterise the candidate uninvadable control by substituting these solutions into eq. (30). Solving eq. (28) for v*(t, x*(t)) is the main technical challenge in finding the candidate uninvadable traits.
It is also often the case in biological models that the Hamiltonian is affine in the control variables, so that fitness depends linearly on the evolving traits (e.g. Macevicz and Oster, 1976; Perrin et al., 1993; Irie and Iwasa, 2005; Avila et al., 2019). In such cases, the controls do not appear in the selection gradient (26) and hence one cannot directly determine from it the static characterisation (30). These types of controls are known as singular arcs (see Kelley, 1964; Kopp and Moyer, 1965; Goh, 1966 for classic developments, and see e.g. Sethi and Thompson, 2006; Bryson and Ho, 1975 for textbook treatments). In order to characterise the candidate uninvadable singular arc, we can take the total time derivative of the selection gradient s(t, u*), which (potentially) provides an additional algebraic equation in the variables (u*, x*, λ*) that can contain the control(s) with a non-zero coefficient. In case it does not, another time derivative can be taken, and so on, until an expression for u* can be obtained. Hence, for singular arcs, we can obtain the static characterisation (30) by applying successive total time derivatives of s(t, u*) until u* can be obtained. Note that for a candidate uninvadable control to be a singular arc, eq. (31) has to hold over a finite interval. If eq. (31) does not hold over a finite interval, then u* is known to be a bang-bang control (see e.g. Sethi and Thompson, 2006; Bryson and Ho, 1975), meaning that u* takes values only on its boundaries (u*(t) = u_max(t) or u*(t) = u_min(t), owing to eq. 12).
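The bang-bang logic can be sketched as follows (the bounds and switching values are illustrative assumptions): maximising a Hamiltonian that is affine in u pushes the control to a boundary wherever the switching function ∂H/∂u is non-zero, while a vanishing switching function flags a candidate singular arc.

```python
def bang_bang_control(sigma, u_min, u_max):
    # maximise an affine Hamiltonian H = ... + sigma*u over [u_min, u_max]
    if sigma > 0:
        return u_max
    if sigma < 0:
        return u_min
    # sigma == 0: candidate singular arc; resolved by differentiating the
    # switching function in time until the control appears explicitly
    return None

controls = [bang_bang_control(s, 0.0, 1.0) for s in (0.5, -2.0, 0.0)]
```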

Shadow value dynamics and state feedback in a resident population
From the static (first-step) characterisation of d*(t, x*(t)) (eq. 30), we observe that the candidate uninvadable trait is at most a function of λ*(t, x*(t)), but does not directly depend on the reproductive value v* itself. Furthermore, taking the partial derivative of eq. (28) with respect to x*(t), using the definition of the Hamiltonian (25), and re-arranging (see sections B.1.2 and B.3 in the Appendix) yields eq. (32). The dynamics of the shadow value, given by eq. (32), may at first glance appear to be an ODE (and therefore easier to solve than eq. 28, which is clearly a PDE for the reproductive value v*). This may lead one to hope that it is possible to circumvent explicitly determining v*, by simply solving eq. (32) to directly obtain λ*. But this hope is crushed by the dependence of the trait on state, which entails that eq. (32) depends on the derivatives of the elements of d(t, x•(t)) with respect to state, which in turn depend on higher-order derivatives of v(t, x*(t); u*). This can be seen by using eq. (30), which unveils that eq. (32) is actually not an ODE. This means that, in general, it is not possible to determine the candidate uninvadable trait from eq. (32) alone, which has been repeatedly stressed in optimal control theory (e.g. Starr and Ho, 1969b,a; Başar, 1977). The analysis of the components of eq. (32) has been stressed less, however, yet it turns out to be informative in highlighting the main similarities and differences between selection on closed-loop and open-loop controls.
Let us now decompose eq. (32) for the component λ*. This says that the rate of change of the shadow value is given by a direct effect of state change on the Hamiltonian (a current fitness effect and a state-modulated fitness effect) and a feedback effect on the Hamiltonian, which arises because closed-loop traits react to changes in state. Using the expression for the Hamiltonian in eq. (25), the expressions for the direct and indirect effects in eqs. (23)-(24), and noting that ∂d(t, x(t))/∂x•(t) = 0, the trait feedback effect can be further expanded (eq. 35), where the derivatives of the decision rule with respect to the focal's state variable x•(t) give the trait sensitivities, at time t, of the focal individual and of its average neighbour, respectively.
Hence, the feedback effect of state change is equal to the trait sensitivities of all individuals in the group weighted by their effects on the focal's fitness (the latter are effectively the direct and indirect fitness effects).
We now make three observations about eqs. (34)-(35). First, trait sensitivity causes the shadow values to be higher (for a negative feedback term) or lower (for a positive feedback term), assuming that the sign is the same throughout the interaction period. Second, this means that the sign of the feedback effect determines how trait sensitivity affects the trade-off between current and future (state-modulated) fitness effects (by way of eq. 29). Third, the shadow value dynamics given by eqs. (34)-(35) differ from classical results in dynamic game theory (first developed by Starr and Ho, 1969a,b), where the feedback effect through the focal's own trait variation does not appear owing to the absence of interactions between relatives, whereby −c(t, u*) = 0 at s(t, u*) = 0. By contrast, in our model with interactions between relatives, one has −c(t, u*) + r(u*)b(t, u*) = 0 at s(t, u*) = 0.
Thus, we recover the classical result for the feedback effect from dynamic game theory when r(u * ) = 0.
We now consider three scenarios (all relevant for biology) under which the feedback term (given by eq. 35) describing the dynamics of λ*•(t, x*(t)) vanishes (similar arguments also hold for the feedback term of the neighbour's shadow value; recall also that we do not need to consider the dynamics of λ(t) here, because λ*(t, x*(t)) does not affect the selection gradient). That is, we consider scenarios for which eq. (32) is a system of ODEs for the within-group components of the shadow value, and solving the PDE (28) is therefore not necessary to determine the candidate uninvadable trait. These three scenarios are as follows. (iii) In a population of clonal groups (r(u*) = 1) that share a common state variable (e.g. a common resource in the group). Two observations can be made for this scenario. First, from eq. (26) it follows that −c(t, u*) + b(t, u*) = 0 for clones at s(t, u*) = 0. Second, since

First-order condition for open-loop controls
We now focus specifically on open-loop controls by pointing out the simplifications in our analysis that arise when trait expression is independent of state (eq. 36). As we showed in the previous section, under eq. (36) the state-feedback term (eq. 35) for the shadow value dynamics λ*•(t, x*(t)) vanishes (and similarly for the other shadow value components), which implies that eq. (32) reduces to a system of ODEs. The candidate uninvadable control path u* = d* has to necessarily satisfy eq. (12), where the point-wise selection coefficient s(t, u*) on a control component u*(t) = d*(t) can be written for all t ∈ T as eq. (38), with the shadow values satisfying eq. (39). This result is Pontryagin's weak principle for interactions between relatives (since only small mutant deviations are considered; Speyer and Jacobson, 2010, p. 74) and only requires consideration of the shadow value. It has been derived previously by Day and Taylor (1997, 2000), for a slightly less general model, where individuals locally play the field or interact in a pairwise way (see also Day and Taylor, 1998; Wild, 2011 for related work). We here re-derived this result as a corollary of the closed-loop result of the previous section when the feedback terms describing the dynamics of the shadow value λ(t) vanish (i.e., when eq. 35 vanishes). Hence, we have closed the loop between the selection gradient on function-valued traits, the invasion implies substitution principle, Hamilton's rule, dynamic programming, and Pontryagin's (weak) maximum principle.
4 Example: Common pool resource production

Biological scenario
We here present an application of our model to the production of a common-pool resource that returns fitness benefits to all group members but is produced at a personal fitness cost. The evolving trait we consider is the schedule u = {u(t)}_{t∈T} of effort invested into resource production during an interaction period T = [0, t_f], and we will hereinafter refer to the control u(t) as the production effort at time t. We study the evolution of the production effort under two different scenarios: (i) individuals can adjust their production effort according to the (local) resource level x_c(t) (closed-loop control), and (ii) individuals are committed to a fixed schedule of production effort (open-loop control). One difficulty in analysing the evolution of such traits is that limited genetic mixing generates relatedness between group members but also competition between them, which leads to kin competition (e.g., Taylor, 1988; Frank, 1998; Rousset, 2004; Van Cleve, 2015). Since we want to highlight the key effects of the evolution of open-loop and closed-loop controls in the context of interactions between relatives in a simple way, we want to avoid the complexities brought by kin competition, and we thus implicitly assume a life cycle that entails no kin competition. In particular, we assume that individual fitness takes the form of eq. (40), which depends on the resource level x_c(t_f) in the group at the end of the interaction period t_f and on the accumulated (personal) cost of producing the common resource for the group (the second term in eq. 40).
Here, c_effort is a parameter scaling the personal cost. The resource level x_c(t_f) and c_effort are measured in units of the number of offspring of the focal individual, and are scaled such that they inherently take into account the proportional effect of density-dependent competition (proportional scaling of fitness does not affect the direction of selection). Neglecting the effects of kin competition in eq. (40) does not lead to any loss of generality in our forthcoming analysis, since taking kin competition into account would only affect the final results by re-scaling the value of relatedness (e.g., Van Cleve, 2015).
We assume that the rate of increase of the resource level in a group depends on the total amount of production effort that individuals in the group invest into producing it, and that the return from this effort decreases exponentially with the current level of the resource (eq. 41), where the parameter a > 0 is the efficiency of producing the common resource.

Static characterisation of the production effort
We can express the reproductive value (19) as eq. (42), where the state variable is one-dimensional, since fitness depends only on the resource level x_c(t) produced in the group (here evaluated at the resident strategy, i.e. x_c(t) = x(t)). We denote the corresponding shadow value (common to all actors) by λ_c(t, x(t); u), which gives the effect on fitness of changing the state variable in the group, evaluated in the resident population. Using eq. (40), the fitness components f and Φ (as defined in eq. 13) take the form of eq. (43), while the rate of change of the state variable x_c(t) is given by eq. (44). Substituting eqs. (43) and (44) into the Hamiltonian function eq. (25) produces eq. (45). Since reproduction is assumed to take place after the interaction period [0, t_f] and producing resources is costly, the current fitness contribution f (the first term in eq. 45) is negative for all t ∈ [0, t_f].
Hence, the effect of the current state change weighted by the shadow value, g_c λ_c (the second term in eq. 45), has to be positive.
In terms of eq. (45), the direct fitness effect is given by eq. (46), while the indirect fitness effect is given by eq. (47), where λ*_c(t, x*(t)) = λ_c(t, x*(t); u*), from which we obtain the fundamental balance equation (eq. 29) for this model, eq. (48). This says that the net effect of the accumulation of personal cost due to spending effort to produce a unit of resource must balance the inclusive fitness benefit associated with that unit of resource. Solving eq. (48) for u*(t) yields eq. (49), where γ = a(1 + (N − 1)r(u*))/(2c_effort) scales the benefit-to-cost ratio of producing the resource (note that γ > 0). Eq. (49) says that the (candidate uninvadable) production effort u*(t) decreases exponentially with the resource level x*(t), increases linearly with the shadow value and with relatedness, and does not directly depend on time. This general nature of the solution applies to both open-loop and closed-loop controls and is depicted in Fig. 1.
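A minimal sketch of this static characterisation, assuming the functional shape stated in the text (effort decreasing exponentially in the resource level, linear in the shadow value, with γ = a(1 + (N − 1)r)/(2c_effort)); the function names and parameter values are ours, for illustration only:

```python
import math

def gamma(a, N, r, c_effort):
    # benefit-to-cost scaling stated in the text: a*(1 + (N-1)*r) / (2*c_effort)
    return a * (1.0 + (N - 1) * r) / (2.0 * c_effort)

def u_static(x, lam, a, N, r, c_effort):
    # assumed shape of eq. (49): effort decreases exponentially with the
    # resource level x, and is linear in the shadow value lam and (through
    # gamma) in relatedness r
    return gamma(a, N, r, c_effort) * math.exp(-x) * lam

# qualitative properties stated in the text
low_x = u_static(0.0, 0.5, a=1.0, N=3, r=0.5, c_effort=1.0)
high_x = u_static(1.0, 0.5, a=1.0, N=3, r=0.5, c_effort=1.0)
```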

Closed-loop production effort
We now turn to analysing u*(t) explicitly as a function of time, which requires evaluating x*(t) and v*(t, x*(t)) = v(t, x*(t); u*). To that end, we evaluate the dynamic eq. (41) for x_c(t) along u• = u* and x_c = x*; substituting the expression for u*(t) from eq. (49) therein yields eq. (51). Substituting the Hamiltonian (45) for our example into eq. (28), using the right-hand side of eq. (49) to express u*(t), and simplifying yields eq. (52), where the constant c_1 > 0. Using the method of characteristics, Ewald et al. (2007) showed that the partial differential equation (52) has a solution of the form v*(t, x*(t)) = log(·) (our eq. 52 corresponds to eq. (21) of Ewald et al., 2007, with c_1 = c = 3/(2k); their solution is presented on p. 1459 of their paper). Taking the derivative with respect to x*(t) and simplifying yields the expression for the shadow value λ*_c(t, x*(t)) = 2 exp(x*(t)) / ( √(8c_1(t_f − t) + exp(2x*(t))) + exp(x*(t)) ).
Substituting this into the static characterisation eq. (49) gives the closed-loop production effort, where the state variable x*(t) is the solution of the ODE obtained by substituting the shadow value from eq. (55) into the dynamic eq. (51) for the resource level, and for which we were not able to find a closed-form expression.

Open-loop production effort
We now derive an explicit expression for the candidate uninvadable open-loop trait u*(t) = d*(t). Substituting the Hamiltonian (45) for our example into eq. (39), and using the right-hand side of eq. (49) to express u*(t), yields the dynamic equation of the shadow value λ*_c(t, x*(t)); together with eq. (51) for the dynamics of the resource level x*(t), we arrive at the following two-point boundary value system. This system has one real-valued solution, and substituting this solution back into the static characterisation eq. (49), we obtain the explicit expression for the candidate uninvadable open-loop trait, which turns out to be constant in time.
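To illustrate the open-loop computation numerically, here is a sketch of the two-point boundary value system under functional forms we reconstruct from the text's description (quadratic effort cost, production rate a·e^{−x} times total group effort, terminal reward equal to the final resource level, and the static rule u* = γe^{−x}λ of eq. 49); these forms and the parameter values are assumptions, not the paper's equations verbatim. The text states that the open-loop effort is constant in time, which the sketch reproduces.

```python
import numpy as np
from scipy.integrate import solve_bvp

# illustrative parameter values (assumptions)
a, N, r, c_eff, tf = 1.0, 3, 0.5, 1.0, 1.0
gam = a * (1 + (N - 1) * r) / (2 * c_eff)   # gamma of eq. (49)

def rhs(t, y):
    x, lam = y
    u = gam * np.exp(-x) * lam              # static characterisation, eq. (49)
    dx = a * np.exp(-x) * N * u             # resource dynamics (forward in time)
    dlam = lam * a * np.exp(-x) * N * u     # costate equation dlam/dt = -dH/dx
    return np.vstack([dx, dlam])

def bc(ya, yb):
    # x(0) = 0 (no resource at the start), lam(tf) = dPhi/dx = 1
    return np.array([ya[0], yb[1] - 1.0])

t_mesh = np.linspace(0.0, tf, 50)
y_guess = np.vstack([np.zeros_like(t_mesh), np.ones_like(t_mesh)])
sol = solve_bvp(rhs, bc, t_mesh, y_guess)
u_star = gam * np.exp(-sol.y[0]) * sol.y[1]  # numerically constant in t
```

Under these assumed forms one can verify analytically that e^{−x(t)}λ(t) is constant along the solution, which is why the open-loop effort comes out constant.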

Comparison of closed-loop and open-loop production efforts
With this example, we recover the result that in a population of clonal groups (r(u*) = 1) the closed-loop and open-loop equilibria coincide (Figure 2). In a population of non-clonal groups (r(u*) < 1), the production effort u*(t) and the resulting amount of resource x*(t) tend to be lower under the closed-loop candidate uninvadable equilibrium (hereinafter, we simply write "equilibrium") than under the open-loop equilibrium (Figure 2). Overall, the production effort monotonically increases over time under the closed-loop control and stays constant under the open-loop control (Figure 2).
The difference between the closed-loop and open-loop (non-clonal) equilibria arises from the difference in the dynamic constraint on the shadow value (recall section 3.2.2). We find that the shadow value is lower under closed-loop control than under open-loop control when r(u*) < 1 (Fig. 3). This is because of the feedback effect of state change (eq. 35), which for our example is negative; hence, under the closed-loop equilibrium, the shadow value declines faster backwards in time than under the open-loop equilibrium. In order to understand why the feedback effect is negative, we need to consider the signs of b(t, u*) − c(t, u*) and of the trait sensitivity ∂d(t, x_c(t))/∂x_c(t) (which is here the same for all group members).
The term b(t, u*) − c(t, u*) is positive when groups are non-clonal and zero when they are clonal (Fig. 4, panel a). This means that if everyone in the group produces more of the resource, then the focal's fitness increases under the non-clonal equilibrium and is unaffected under the clonal equilibrium. The trait sensitivity is always negative (Fig. 4, panel b). Hence, individuals reduce their production effort in response to an increase in the resource level, and the magnitude of this effect is larger for higher values of relatedness.
In conclusion, investment effort is lower for closed-loop than for open-loop control. [...] uninvadability and allows one to characterise long-term evolutionary outcomes. While these three features are well known to hold for scalar traits (e.g., Rousset, 2004; Lehmann and Rousset, 2014; Van Cleve, 2015), our derivation of Hamilton's marginal rule for multidimensional traits generalises them to traits of arbitrary complexity and establishes a link between selection on scalar and function-valued quantitative traits.
We also showed how selection on dynamic function-valued traits (i.e. traits that are time-dependent and are potentially subject to dynamic state constraints) can be analysed using optimal control theory.
In particular, we analysed selection on both open-loop controls, whose expression is only time-dependent, and closed-loop controls, whose expression is also state-dependent. We also worked out an example of common pool resource production to illustrate these concepts. [...] (McNamara et al., 1991; Venner et al., 2006). Then, the uninvadable closed-loop strategy converges to a stationary strategy d*(x*) and the PDE for the value function becomes an ODE, which is easier to solve (see e.g. Dockner et al., 2000, p. 97). This could be applied to modelling reaction norms to (stationary) environmental factors (e.g. salinity, altitude, temperature).
Another special case included in our analysis is that of constant controls, where trait expression depends neither on state nor on time, but nevertheless affects the dynamics of some state variable that in turn affects fitness, in which case the value function becomes a constant. Several concrete biological situations fall into this category. For instance, neural networks are dynamical systems whose output is controlled by a finite number of scalar weights (Haykin, 2009), selection on which is encapsulated by our formalisation if the weights are taken to be traits evolving genetically and thus treated as controls (see Ezoe and Iwasa, 1997 for an application to evolutionary biology). Likewise, phenomena as different as gene expression profiles and learning during an individual's lifespan can be regarded as outcomes of dynamical systems controlled by a finite number of constant traits (see, respectively, Alon, 2020 and Dridi and Akçay, 2018).
Concerning limitations, we modelled a population reproducing in discrete time, where within each time period individuals can interact for a fixed time interval [0, t_f]. As such, the vital rates of individuals can change over this interaction period, but not between interaction periods. Hence, our model with limited dispersal and time-dependent vital rates (during [0, t_f]) applies to semelparous species, which covers models with conflict between relatives in annual organisms (Day and Taylor, 2000; Avila et al., 2019).
Further, if we allow for complete dispersal between groups (r(u) = 0), then our framework can be used to address the evolution of function-valued traits under overlapping generations with time-dependent vital rates, as in classical continuous-time life-history models with and without social interactions (e.g. León, 1976; Schaffer, 1983; Stearns, 1992; Perrin, 1992), but with the added possibility of considering the evolution of closed-loop controls. This scenario is encapsulated in our formalisation because the individual fitness function we used to analyse dynamic trait evolution (eq. 13) takes the same functional form as the basic reproductive number in age-structured populations (which is sign-equivalent to the Malthusian or geometric growth rate; e.g., Karlin and Taylor, 1981, p. 423-424). As such, our results on closed-loop controls (section 3.2) allow one to characterise long-term evolutionary outcomes when the fitness of an individual takes the form of the basic reproductive number. For this situation, our results for open-loop controls (section 3.3) reduce to the standard Pontryagin's weak principle used in life-history models (e.g. Schaffer, 1983; Stearns, 1992; Perrin, 1992). In order to cover time-dependent vital rates with overlapping generations within groups under limited dispersal, one needs to track the within-group age structure (e.g. Ronce et al., 2000), which calls for an extension of our formalisation. Finally, we did not consider between-generation fluctuations in environmental conditions, which certainly affect the evolution of function-valued traits, and it would be interesting to investigate this case. Hence, while our results are not demographically general, our hope is that the present formalisation is nevertheless helpful in providing broad intuition about the nature of directional selection on function-valued traits.

Appendix A: Derivation of Hamilton's rule for function-valued traits
In this Appendix, we prove the gradient version of Hamilton's rule for function-valued traits and show that this provides an invasion implies substitution principle under weak selection (eqs. 4-8). A central concept used in our proof is the notion of a Gâteaux derivative.
A.1 Gâteaux derivative and point-wise functional derivative
Suppose that Y[T] is a vector space over a domain T, and assume that for some y ∈ Y[T] the limit (A.1) exists for all deviations η that satisfy y + εη ∈ Y[T] for a sufficiently small non-negative parameter ε.
Then the function F is said to be Gâteaux differentiable at y, and δ_η F(y)/δ_η y is the shorthand notation for the Gâteaux derivative at y in the direction η (Hille and Phillips, 1957, Section 3). The Gâteaux derivative can thus be thought of as a generalisation of the directional derivative familiar from finite-dimensional spaces. Most rules that hold for ordinary derivatives also hold for Gâteaux derivatives, e.g. Taylor's theorem and the chain rule (see e.g. Section 2.1C in Berger, 1977, or Appendix A of Engel and Dreizler, 2013). The Gâteaux derivative can be expressed in terms of point variations (see e.g. Engel and Dreizler, 2013, eqs. A.15 and A.28) as eq. (A.2), where eq. (A.3) defines the point-wise functional derivative of F at y(t), and δ_t is the Dirac measure taking value 1 at t and 0 otherwise. That is, eq. (A.3) is the partial derivative of F with respect to y at t, and hence we use the more familiar 'partial derivative' notation from finite-dimensional spaces. The representation in eqs. (A.2)-(A.3) is useful because it allows us, for instance, to take a functional derivative of fitness with respect to the trait, partition it into a deviation η(t) and a marginal fitness effect ∂F(y)/∂y(t) at a specific (single) time point t ∈ T (i.e. a point-wise marginal fitness effect), and only then integrate over the domain T.
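As a numeric illustration of this limit, consider the toy functional F(y) = ∫₀¹ y(t)² dt (our choice, not from the paper), whose Gâteaux derivative at y in direction η is ∫₀¹ 2y(t)η(t) dt; the finite-ε difference quotient converges to this point-wise form.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)

def integrate(vals):
    # trapezoidal rule on the fixed grid t
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t)))

def F(y_vals):
    # toy functional F(y) = int_0^1 y(s)^2 ds
    return integrate(y_vals ** 2)

y = np.sin(t)        # the point y at which we differentiate
eta = t.copy()       # the direction of the deviation

eps = 1e-6
numeric = (F(y + eps * eta) - F(y)) / eps   # finite-eps Gateaux quotient
analytic = integrate(2.0 * y * eta)         # point-wise derivative form
```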

A.2 Dynamics of mutant-frequency
Consider the mutant allele coding for trait u_m and the resident allele coding for trait u, segregating in the homogeneous island population as described in the main text. Because no individual-level demographic heterogeneity is assumed within groups (i.e., no class structure), each group can be characterised, from a population-genetic perspective, by the number of mutants that inhabit it, and we denote the set of all group genetic states by I = {0, 1, 2, . . . , N}. The state of the entire homogeneous island population can thus be described by the vector φ_τ = {φ_{i,τ}}_{i∈I}, where φ_{i,τ} is the frequency of groups with i mutants at demographic time τ. Since population size is constant in the homogeneous island population (mean fitness is one), the change ∆p_τ = p_{τ+1} − p_τ in the average frequency of the mutant allele from demographic time τ to τ + 1 (over one life-cycle iteration) can be expressed as eq. (A.4), where W(u_m, u, φ_τ) is the marginal fitness (or lineage fitness) of the mutant allele. Namely, this is the expected number of offspring (including the surviving self) produced by a randomly sampled mutant individual from the collection of all mutants in the population when the distribution of mutants and residents across groups is φ_τ. This fitness can be written as the average (A.5), where q_{i,τ} = iφ_{i,τ}/Σ_{k∈I} kφ_{k,τ} is the probability that a randomly sampled mutant resides in a group with i mutants (whence Σ_i q_{i,τ} = 1), and where u_i = (u_m, u, i − 1) and u_φ = (u_m, u, φ_τ) are vectors that describe, from the perspective of an individual sampled in a group with i mutants, the distribution of traits among group neighbours (local individuals) and in the groups of the population at large (non-local individuals), respectively. The function ŵ : U × U² × I × U² × ∆(I) → R₊ is the individual fitness function, where ∆(I) denotes the space of frequency distributions on I (i.e. the simplex in R^{N+1}).

A.3 Weak-selection approximation
We now study the mutant gene-frequency change ∆p_τ assuming small ε. To that end, it is useful to note that the fitness of a mutant in a group in state i can be approximated by writing it in terms of average traits as ŵ(u_m, ū_i, ū_φ) (eq. A.6), which specifies that all non-local individuals have the same average population trait ū = u + εηp_τ, with p_τ = Σ_{i∈I}(i/N)q_{i,τ} being the mutant frequency in the population. Eq. (A.6), which has been used for scalar traits (Rousset, 2004, p. 95), follows by Taylor expanding ŵ(u_m, u_i, u_φ) to first order about ε = 0 and using the chain rule (which applies to Gâteaux derivatives, e.g. eq. A.38 in Engel and Dreizler, 2013) to see that the coefficients of the Taylor series involve (at most) Gâteaux derivatives weighted by average allele frequencies. This is an instantiation of the so-called generalised law of mass action (Meszéna et al., 2005; Dercole, 2016) and is secured by the assumption that all individuals within a group that have the same trait are exchangeable (individuals are demographically homogeneous).
Because all non-local (mutant and resident) individuals are considered to have the same average trait (and the same is true of group neighbours), ŵ(u_m, ū_i, ū_φ) is de facto independent of φ_τ. This allows us to further simplify the right-hand side of eq. (A.6) by writing eq. (A.7), where the function w : U³ → R₊ is the individual fitness expressed in terms of average traits, and where w(u_m, ū_i, ū)|_{ε=0} = 1 (fitness in a monomorphic population is one), whereby eq. (A.10) follows. We now apply eq. (A.1) and use the chain rule for Gâteaux derivatives (see e.g. eq. A.38 in Engel and Dreizler, 2013), which produces eq. (A.11), where all partial derivatives here and henceforth are evaluated at the resident value u. Since all the partial derivatives are independent of any allele frequency, they give the effects on any individual's fitness stemming, respectively, from itself, its average neighbour, and an average population member varying (infinitesimally) its trait expression. Hence, the type of the actor is not relevant when evaluating the fitness effects, and we can equivalently write eq. (A.11) as eq. (A.12), where we took into consideration that the sum of the partial derivatives of the fitness function with respect to all of its arguments is zero (since population size is constant; see e.g. Rousset, 2004, p. 96 for scalar traits), and where we relabelled the variables u_m, ū_i, and ū as the traits of a focal individual, its average neighbour, and an average population member (note that we have already substituted the resident trait into the final argument). This will be useful subsequently as it makes clear that fitness effects are independent of individual types and thus allows us to focus attention on the fitness of a focal individual.
Substituting eq. (A.12) into eq. (A.10) gives eq. (A.13). Because Σ_{i∈I} p_{i,τ} q_{i,τ} = p_{m|m,τ} is the probability that, conditional on being a mutant, a randomly sampled neighbour is also a mutant, and p_{m|m,τ} p_τ = p_{mm,τ} is the probability that two randomly sampled individuals are both mutants (i.e., the frequency of mutant pairs), eq. (A.13) can be written as eq. (A.14). Hence, to first order in ε, the change ∆p_τ is a function only of the direct and indirect fitness effects evaluated in the resident population, the average frequency p_τ, and the mutant-pair frequency p_{mm,τ}.
Further, we only need to study the dynamics of p_{mm,τ} under neutrality (ε = 0) because any higher-order terms contribute at O(ε²) in eq. (A.14). Eq. (A.14) thus generalises to function-valued traits a standard result for scalar traits (first detailed in Roze and Rousset, 2003, and re-derived a number of times since, e.g., Roze and Rousset, 2004; Rousset, 2004; Roze and Rousset, 2008; Lehmann and Rousset, 2014).

A.3.2 Mutant-pair dynamics and relatedness
Using standard population genetic arguments for writing recursions of moments of allelic state (e.g., Jacquard, 1974; Nagylaki, 1992; Roze and Rousset, 2008), we obtain eq. (A.15), where P_1(u) is the fraction of pairs within groups (of two randomly sampled individuals in the same group, without replacement) that descended from the same individual in the previous demographic time step, and where r̄(u) is the relatedness in a patch at the steady state, i.e. the fraction of pairs at the steady state that have a common ancestor in the patch. Owing to neutrality, this is also the probability that a randomly sampled neighbour of a randomly sampled focal individual carries the same allele as the focal. Moreover, the steady state r̄(u) changes continuously with the resident trait whenever P_1(u) and P_2(u) change continuously.
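As a numerical illustration of the steady-state relatedness r̄(u): under neutrality, the fraction of same-allele pairs follows a linear recursion r_{τ+1} = P_1 + P_2 r_τ (our reading of the pair recursion at ε = 0; the probabilities P1, P2 below are arbitrary illustrative values), whose fixed point is r̄ = P_1/(1 − P_2). A minimal sketch:

```python
def neutral_relatedness(P1, P2, tol=1e-12, max_iter=100000):
    """Iterate the neutral pair recursion r_{tau+1} = P1 + P2 * r_tau
    until it reaches its steady state r = P1 / (1 - P2)."""
    r = 0.0
    for _ in range(max_iter):
        r_next = P1 + P2 * r
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r
```

Since the map is a contraction for P_2 < 1, the iteration converges geometrically, and r̄ indeed varies continuously with P_1(u) and P_2(u), in line with the continuity remark above.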

A.4 Timescale separation and the invasion-implies-substitution principle
We can observe that the dynamics of the mutant frequency p_τ, given by eq. (A.14), is dominated by terms of order O(ε), while that of the mutant-pair frequency p_{mm,τ}, given by eq. (A.15), is dominated by terms of order O(1). Hence, when ε is small, the variable p_{mm,τ} undergoes significant changes over the demographic time step ∆τ = (τ + 1) − τ = 1 (one iteration of the life cycle) while p_τ is (nearly) constant. By contrast, p_τ changes significantly over the slower time interval ∆τ* = ∆τ/ε, while p_{mm,τ} stays near its equilibrium value.
We will refer to ∆τ* as the evolutionary time step; the phenotypic effect ε scales the relationship between evolutionary and demographic time (i.e. one evolutionary time step contains 1/ε demographic time steps, and equivalently we can write (1/ε)∆τ = ∆τ*). Combining eq. (A.14) and eq. (A.15), we see that the dynamics of the mutant frequency is thus fully described by a coupled system in demographic time (eq. A.18), and by a change of variables the system in eq. (A.18) can be equivalently expressed in slow evolutionary time as

ε ∆*p_{mm,τ*}/∆τ* = (P_2(u) − 1) p_{mm,τ*} + P_1(u) p_{τ*} + (1 − P_1(u) − P_2(u)) p²_{τ*} + O(ε).   (A.19)
We now separate the demographic and evolutionary timescales (i.e. the timescales of p_{mm,τ} and p_τ) by letting ε → 0; the two last systems above then reduce, respectively, to systems in which the pair frequency satisfies

0 = (P_2(u) − 1) p_{mm,τ*} + P_1(u) p_{τ*} + (1 − P_1(u) − P_2(u)) p²_{τ*},

while the mutant frequency p_{τ*} = p still changes (thus p is in a so-called quasi-steady state: it changes so slowly that it is considered a steady state on one timescale but a fluctuating variable on another). By performing the substitution p_{mm,τ*} = p̄_mm(u) and p_{τ*} = p in eq. (A.21), we obtain the dynamics of the mutant frequency in slow evolutionary time, where we used ∆τ = 1. This gives us the invasion-implies-substitution principle on the timescale of the demographic process we began with (e.g., eq. A.4). Therefore, we can re-write eq. (A.23) as eq. (A.25), and by using the definition of Gâteaux derivatives in eq. (A.1) we can explicitly write the direct effect (eq. A.26), which is the effect a focal individual has on itself if it were to express the mutant phenotype, and an analogous expression for the indirect effect (eq. A.27).
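The fast-slow structure can be illustrated numerically. The sketch below iterates a toy version of the coupled system: the pair frequency follows the O(1) recursion whose steady-state condition is displayed above, while the mutant frequency changes by a hypothetical O(ε) term (the expression eps*(pmm - p*p) is a stand-in for the actual selection term of eq. (A.14), not the real selection gradient). For small ε, p_mm rapidly locks onto its quasi-steady value while p drifts slowly:

```python
def pbar_mm(p, P1, P2):
    # quasi-steady pair frequency: solve 0 = (P2 - 1)*pmm + P1*p + (1 - P1 - P2)*p**2
    return (P1 * p + (1.0 - P1 - P2) * p * p) / (1.0 - P2)

def simulate(eps, P1, P2, steps, p0=0.05):
    # fast variable: mutant-pair frequency pmm; slow variable: mutant frequency p
    p, pmm = p0, p0 * p0
    for _ in range(steps):
        pmm = P2 * pmm + P1 * p + (1.0 - P1 - P2) * p * p  # O(1) demographic dynamics
        p = p + eps * (pmm - p * p)                         # hypothetical O(eps) selection
    return p, pmm

p, pmm = simulate(eps=1e-3, P1=0.2, P2=0.3, steps=500)
```

After a short transient, pmm remains within O(ε) of pbar_mm(p) even though p keeps changing, which is exactly the quasi-steady-state property exploited in the timescale separation.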

Appendix B: First-order condition for state-dependent models
In this Appendix we derive the results of main text section 3. These derivations are based on the standard approach of the calculus of variations as used in optimal control theory (Liberzon, 2011; Weber, 2011), but our argument differs somewhat from standard approaches insofar as we will not make use of the Hamilton-Jacobi-Bellman equation, since we are interested only in necessary first-order conditions (as opposed to the sufficient conditions delivered by the standard dynamic-programming approach). As such, it is important to stress that throughout sections B.1 and B.3, where we derive the dynamics of the (neutral) reproductive value v(t, x(t); u) and the shadow value λ(t, x(t); u) = ∇v(t, x(t); u), we evaluate all the traits u_•(t) = u_◦(t) = u(t) and the states at their resident values, and we show that we only need to analyse the (neutral) reproductive value v(t, x(t); u).
For conciseness, we also use the following short-hand notation: for total derivatives w.r.t. time t we write df(t, x(t))/dt ≡ ḟ(t, x(t)), for partial derivatives we write ∂f(t, x(t))/∂t ≡ f_t(t, x(t)), and for second-order partial derivatives we write ∂²f(t, x(t))/∂x(t)∂x(t) ≡ f_xx(t, x(t)). As in the main text, we always use the ∇ notation for gradients with respect to the state variables x(t).

B.1 Reproductive value dynamics in a resident population
We here derive the dynamic equation for the reproductive value, eq. (20) of the main text, together with an associated equation that is useful for the other derivations.
B.1.1 Partial differential equation for the reproductive value function
Recall from eq. (19) of the main text that the reproductive value at time t is defined by eq. (B.1), where the argument u has been separated by a semicolon in order to emphasise that the controls have been fixed. Hence, for a given u and initial condition x(t) at time t, the state trajectory x is fully determined (i.e. the solution to the ODE in eq. (16) exists and is unique). Because both functions u and x are now given, the reproductive value in eq. (B.1) is considered to be a function of time t and the initial condition x(t) only (strictly speaking, it should also be a function of the final time t_f).
In order to derive a dynamic equation for v(t, x(t); u), we consider a very small (but positive) time interval ∆t and write eq. (B.1) as eq. (B.2), where ∆x(t) = x(t + ∆t) − x(t) is the change in the state variables over ∆t and v(t + ∆t, x(t) + ∆x(t); u) is the reproductive value at t + ∆t, all arguments having been noted accordingly. Using a first-order Taylor expansion around t, we approximate the second term in the second line of eq. (B.2), where ∂v(t, x(t); u)/∂t is the partial derivative with respect to the first argument while ∇v(t, x(t); u) is the vector of partial derivatives with respect to the last argument. Now approximating the first term on the right-hand side of eq. (B.2) by f(u(t), x(t))∆t, subtracting v(t, x(t); u) from both sides, dividing by ∆t, letting ∆t → 0, noting that ∆x(t)/∆t → g(u(t), x(t)) (as ∆t → 0), and rearranging leads to

−∂v(t, x(t); u)/∂t = f(u(t), x(t)) + ∇v(t, x(t); u) · g(u(t), x(t)),   (B.7)

which is a PDE for v(t, x(t); u) with the final condition (f.c.) v(t_f, x(t_f); u) = Φ(x(t_f)) for the fixed control path u.
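A quick numerical sanity check of this construction, under entirely hypothetical toy choices (scalar state, f(u, x) = x, g(u, x) = a constant, Φ = 0, t_f = 1): the defining integral v(t₀, x₀) = ∫ f ds + Φ can be accumulated along the state trajectory and compared against the closed form x₀(t_f − t₀) + a(t_f − t₀)²/2, which satisfies −∂v/∂t = f + ∇v·g (our reading of eq. B.7) with v(t_f, ·) = Φ:

```python
def reproductive_value(t0, x0, tf=1.0, a=0.5, n=20000):
    """Forward-integrate v(t0, x0) = ∫_{t0}^{tf} f(x(s)) ds + Φ(x(tf))
    for the toy model f(x) = x, ẋ = g = a, Φ = 0."""
    dt = (tf - t0) / n
    x, v = x0, 0.0
    for _ in range(n):
        v += x * dt   # accumulate instantaneous contribution f(x(s)) ds
        x += a * dt   # state dynamics ẋ = g(u, x) = a
    return v

def closed_form(t0, x0, tf=1.0, a=0.5):
    # solution of -v_t = f + v_x * g with final condition v(tf, x) = Φ(x) = 0
    return x0 * (tf - t0) + a * (tf - t0) ** 2 / 2.0
```

The agreement between the two routines illustrates that the PDE and the integral definition encode the same object once the state trajectory is fixed by u.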
An equation analogous to eq. (B.7) (in the context of life-history evolution) has previously been derived for any resident population in Metz et al. (2016, see their eq. 71), but where the traits were assumed to be open-loop controls. It is important to stress here that eq. (B.7) is not a form of the so-called Hamilton-Jacobi-Bellman equation for the value function evaluated on the optimal control path of optimal control theory (e.g., eq. 3.7 in Dockner et al., 2000, chapter 3.2; eq. 5.10 in Liberzon, 2011; or eq. 3.16 in Weber, 2011), even though it has a similar structure. This is because (i) the reproductive value v is here defined to hold for any resident control schedule u (and is not evaluated at optimality like the value function), and (ii) the value function for our model cannot be computed from the reproductive value of the focal individual, but needs to be computed from the invasion fitness of the mutant, which is the value function in an evolutionary model (invasion fitness is given by eq. A.5 when the mutant becomes rare, or by eq. 38 in Day and Taylor, 2000, where, however, only open-loop traits were allowed).

B.1.2 Dynamic equation for shadow value (gradient of reproductive value)
Recall that the controls u(t) = d(t, x) are functions of x. We now derive the dynamic equation for the shadow value (the gradient of the reproductive value), which will be useful in later proofs. Taking the gradient of eq. (B.7) with respect to x(t) gives eq. (B.10), whose terms are (column) vectors. Bringing all the terms to the same side and using the chain rule to expand them yields eq. (B.11), where ⊤ is the transpose operator, 0 is a zero (column) vector, ∇²v(t, x(t); u) is the Hessian matrix of the reproductive value function, and ∇g is the gradient of the vector g.
Now, totally differentiating ∇v(t, x(t); u) with respect to time, and using the property that u is fixed along a path, we obtain an expression for d∇v(t, x(t); u)/dt which, upon substitution into eq. (B.11) and noting that the order of taking partial derivatives can be exchanged, yields the dynamic equation for the shadow value that will be used in the next section.
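In the simplest scalar case where the control does not feed back on the state (so the feedback terms involving d vanish), the shadow value obeys the familiar adjoint equation λ̇ = −f_x − g_x λ with final condition λ(t_f) = ∇Φ(x(t_f)). The sketch below checks this numerically for hypothetical toy choices f = x²/2, ẋ = g = −x, Φ = 0, for which λ(0) = x₀(1 − e^{−2 t_f})/2 in closed form:

```python
def adjoint_at_zero(x0, tf=1.0, n=20000):
    """Backward-integrate the adjoint λ̇ = -f_x - g_x λ = -x + λ
    for the toy model f = x²/2, ẋ = g = -x, Φ = 0 (so λ(tf) = 0)."""
    dt = tf / n
    xs = [x0]
    for _ in range(n):            # forward pass for the state: ẋ = -x
        xs.append(xs[-1] - dt * xs[-1])
    lam = 0.0                     # final condition λ(tf) = ∇Φ = 0
    for k in range(n, 0, -1):     # explicit Euler steps backward in time
        lam -= dt * (-xs[k] + lam)
    return lam

# closed form at t = 0 for this toy model: λ(0) = x0 * (1 - exp(-2 * tf)) / 2
```

The backward pass mirrors the structure of the derivation above: the shadow value is pinned down at the final time and propagated backwards along the (fixed) resident state trajectory.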

B.2 First-order condition and the Hamiltonian
We now turn to deriving the (point-wise) direct effect −c(t, u(t)) and the indirect effect b(t, u(t)), given by eqs. (23) and (24), as well as the point-wise selection gradient for closed-loop traits, eq. (26), and the dynamic equation for the shadow value, eq. (32).
In Appendix A we showed that the direct effect (eq. A.26) and the indirect effect (eq. A.27) can be expressed as Gâteaux derivatives of individual fitness. In order to compute these Gâteaux derivatives, we first re-write the fitness function w(u_•, u_◦, u) by augmenting to it a zero quantity containing the adjoint system of constraints (see e.g. Liberzon, 2011, p. 97); we then show how to decompose the direct effect −c_η(u) and the indirect effect b_η(u) into point-wise direct effects −c(t, u(t)) and point-wise indirect effects b(t, u(t)), respectively, which allows us to characterise the point-wise first-order condition (26).

B.2.1 Augmenting the fitness function with an adjoint system of constraints
Recall the individual fitness function, eq. (13) of the main text, and let us append to it the zero quantity in eq. (B.17), recalling (eq. 21 of the main text) that λ(t, x(t); u) = ∇v(t, x(t); u) is the shadow value (the gradient of the reproductive value function). We can integrate the last term in eq. (B.17) by parts, whereupon eq. (B.17) becomes an expression in which the term λ(0, x(0); u) · x_•(0) has disappeared under differentiation because it is a given initial condition, in which all the derivatives under the integrals are evaluated at u, and in which the remaining terms are point variations of x(t) caused by variations in u_•(t) and u_◦(t), respectively.
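The integration by parts invoked here is the standard identity (written for the focal's state path x_•; this is a sketch of the step, not the full augmented expression):

```latex
\int_{0}^{t_f} \lambda(t)\cdot \dot{x}_{\bullet}(t)\,\mathrm{d}t
  = \lambda(t_f)\cdot x_{\bullet}(t_f) \;-\; \lambda(0)\cdot x_{\bullet}(0)
  \;-\; \int_{0}^{t_f} \dot{\lambda}(t)\cdot x_{\bullet}(t)\,\mathrm{d}t,
```

so that, upon taking variations, the boundary term at t = 0 drops out because x_•(0) is a given initial condition, as stated above.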