Stat4DS - HW02 - Solved

1. Background: Cooperative Games (more info)

1.1 Generalities
A game in the sense of game theory is an abstract mathematical model of a scenario in which some sort of agents interact. It is abstract in the sense that irrelevant detail is omitted: the game aims to capture only those features of the scenario that are relevant to the decisions that must be made by players within the game.

Agents can be anything depending on the application: typically real humans (e.g. investors, political parties, etc.), but also signals/towers in a communication network, genes in a biological setup or even variables in a predictive/learning problem.

The form of games we are interested in is the most basic and widely-studied model of cooperative games.

More specifically, our games are populated by a (non-empty) set P = {1,...,p} of agents: the players of the game. A coalition C is simply any subset of the player-set P. The grand coalition is the set P of all players.

All this said, let’s define what we mean by a characteristic function game. In the following, with 2^P we will denote the power-set, that is, the set of all subsets, of P.

Definition 1. A characteristic function game G is given by a pair (P,ν), where P = {1,...,p} is a finite, non-empty set of agents, and ν : 2^P → R is a characteristic function, which maps each coalition C ⊆ P to a real number ν(C). The number ν(C) is usually referred to as the value of the coalition C.

There are many possible interpretations for the characteristic function, but note right away that characteristic function games assign values to a coalition as a whole, and not to its individual members. That is, the model behind a characteristic function game does not tell you how the coalition value ν(C) should be divided among the members of C.

In fact, the question of how to divide the coalition value is a fundamental research topic in cooperative game theory[1].

Notice also that, from a computational point of view, the naïve representation of a characteristic function game that consists in explicitly listing every coalition C together with the associated value ν(C), being of order 2^p in size, is not practical unless the number of players is very small. On the other hand, most real-life interactions admit an encoding of size polynomial in p; such an encoding provides an implicit description of the characteristic function. As an example, consider modeling the decision-making process in voting bodies.

Example 1. (weighted voting games)

•     A country has 101 seats in its parliament, and each representative belongs to one of four parties: Liberal (L 40 seats), Moderate (M 22 seats), Conservative (C 30 seats), or Green (G 9 seats).

•     The parliament needs to decide how to allocate “1 billion euros” of discretionary spending.

•     The decision is made by a simple majority vote, and we assume that all representatives vote along the party lines.

•     Parties can form coalitions; a coalition has value “1 billion euros” IF it can win the vote no matter what the other parties do, and value 0 otherwise.

Hence, after some thinking, we see that the associated 4-player game has P = {L,M,C,G} and characteristic function

ν(S) = 0 if #(S \ {G}) ≤ 1, and ν(S) = 10^9 otherwise, where #S = {cardinality of the coalition S}.
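To make the encoding concrete, here is a small Python sketch of this ν (seat counts taken from the example; the final loop re-derives which coalitions actually win a simple majority vote):

```python
from itertools import combinations

seats = {"L": 40, "M": 22, "C": 30, "G": 9}  # 101 seats in total

def v(S):
    """1 billion (10**9) if coalition S holds a strict majority (>= 51 seats), else 0."""
    return 10**9 if sum(seats[p] for p in S) >= 51 else 0

# sanity check: a coalition wins iff it contains at least two non-Green parties
for k in range(len(seats) + 1):
    for S in combinations(seats, k):
        assert (v(S) > 0) == (len(set(S) - {"G"}) >= 2)
```

Note how four numbers (the seat counts) implicitly describe all 2^4 = 16 coalition values: this is exactly the polynomial-size encoding mentioned above.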

1.2 Solution Concepts: the Shapley Value
A key problem in game theory is to try to understand what the outcomes of a game will be. In our cooperative framework, an outcome of a game G consists of two parts:

1.    a coalition structure, that is, a partition CS = {C1,...,Cm} of the player-set P = {1,...,p} into coalitions;

2.    a payoff vector x = (x1,...,xp) ∈ R^p for a coalition structure CS = {C1,...,Cm}, which distributes the value ν(Cj) of each coalition among its members. Any legit payoff vector x must satisfy the following natural conditions:

•     xj ≥ 0 for all j ∈ P;

•     Σ_{r∈Cj} xr ≤ ν(Cj) for any j ∈ {1,...,m}.

Now, any partition of agents into coalitions and any payoff vector that respects this partition corresponds to a plausible outcome of a characteristic function game. However, not all outcomes are equally desirable.

For instance, if all agents contribute equally to the value of a coalition, a payoff vector that allocates the entire payoff to just one of the agents is less appealing than the one that shares the profits equally among all agents.

Similarly, an outcome that pushes all agents to work together is (typically) preferable to an outcome that some of the agents want to deviate from.

More broadly, one can evaluate outcomes according to two criteria: fairness (i.e., how well each agent’s payoff reflects his contribution), and stability (i.e., what are the incentives for the agents to stay in the coalition structure). These criteria give rise to two families of solution concepts having as most notable members the Shapley Value and the Core respectively.

Given its broad success for defining variable importance in supervised learning problems (see here, here, here, here, and also here, just to mention a few), in the following we will focus on the former, the Shapley Value, forged in the ’50s by the one and only, sir Lloyd S. Shapley.

The Shapley value is a solution concept that is usually formulated with respect to the grand coalition: it defines a way of distributing the value ν(P) that could be obtained by the grand coalition, and it is based on the intuition that the payment that each agent receives should be proportional to his contribution.

Idea v1.0: a naïve implementation of this idea would be to pay each agent according to how much he increases the value of the coalition of all other players when he joins it, i.e., set the payoff of player j to ν(P) − ν(P\{j}).

Problem: under this payoff scheme the total payoff assigned to the agents may differ from the value of the grand coalition.

Idea v2.0: we can fix an ordering of the agents and pay each agent according to how much he contributes to the coalition formed by his predecessors in this ordering: Agent 1 receives ν({1}), Agent 2 receives ν({1,2}) − ν({1}), and so on. It is easy to see that this payoff scheme distributes the value of the grand coalition among the agents.

Problem: agents that play symmetric roles in the game may receive different payoffs depending on their position in the order.

Shapley’s insight: the dependence on the agents ordering can be eliminated by averaging over all possible orderings (or permutations) of the players.

Now, to formally define the Shapley value, we need some additional notation.

1.    Fix a characteristic function game G = (P,ν) and let ΠP be the set of all permutations of P. Given a specific permutation π ∈ ΠP, we denote by Sπ(j) the set of all predecessors of player j in π, i.e., we set Sπ(j) = {r ∈ P such that π(r) < π(j)}.

For example, if P = {1,2,3}, then

ΠP = {(1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)};

moreover, if π = (3,1,2), then Sπ(3) = ∅, Sπ(1) = {3} and Sπ(2) = {1,3}.

2.    The marginal contribution of an agent j with respect to a permutation π in a game G = (P,ν) is denoted by ∆Gπ(j) and measures how much player j increases the value of the coalition consisting of his predecessors in π. More specifically:

∆Gπ(j) = ν(Sπ(j) ∪ {j}) − ν(Sπ(j)).                                                                      (1)

We can now define the Shapley value ψG(j) of player j: it is simply his average marginal contribution, where the average is taken over all permutations of the player-set (assumed to be all equally likely):

ψG(j) = (1/p!) Σ_{π∈ΠP} ∆Gπ(j)  (♥)=  Σ_{C⊆P : j∉C} [(#C)! (p − 1 − #C)! / p!] · (ν(C ∪ {j}) − ν(C)).            (2)

The second version (♥) can be derived by noting that the marginal contributions ∆Gπ(j) are of the form ν(C ∪ {j}) − ν(C), where C is a coalition not containing j. Now, for how many orderings π does one have Sπ(j) = C? There are (#C)! possible orderings of C and (p − 1 − #C)! orderings of P\(C ∪ {j}), hence (#C)!(p − 1 − #C)! orderings in total, which explains the weights. This counting also gives the Shapley value another nice probabilistic interpretation (see page 307 here).
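Both forms of Equation 2, the permutation average and the coalition-weighted sum, can be coded directly and sanity-checked against each other on tiny games. A Python sketch (function names are mine):

```python
from itertools import permutations, combinations
from math import factorial

def shapley_perm(players, v):
    """Shapley via the permutation form: average marginal contribution over all p! orderings."""
    p = len(players)
    psi = dict.fromkeys(players, 0.0)
    for pi in permutations(players):
        pred = set()
        for j in pi:
            psi[j] += v(pred | {j}) - v(pred)  # marginal contribution of j in ordering pi
            pred.add(j)
    return {j: val / factorial(p) for j, val in psi.items()}

def shapley_comb(players, v):
    """Shapley via the coalition form: weighted sum over coalitions C not containing j."""
    p = len(players)
    psi = {}
    for j in players:
        rest = [r for r in players if r != j]
        total = 0.0
        for k in range(p):
            for C in combinations(rest, k):
                w = factorial(k) * factorial(p - 1 - k) / factorial(p)
                total += w * (v(set(C) | {j}) - v(set(C)))
        psi[j] = total
    return psi
```

For instance, with players {1,2,3} and ν(S) = (#S)², both functions return 3 for each of the three symmetric players, consistent with efficiency since ν(P) = 9.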

It can be shown that the Shapley value is in fact the only payoff division scheme that has all these four properties:

•     Efficiency: Σ_{j∈P} ψG(j) = ν(P).

•     Dummy player: if player j is dummy (i.e., ν(C ∪ {j}) = ν(C) for any coalition C), then ψG(j) = 0.

•     Symmetry: if j1 and j2 are symmetric players (i.e., ν(C ∪ {j1}) = ν(C ∪ {j2}) for any coalition C containing neither of them), then ψG(j1) = ψG(j2).

•     Additivity: given any two characteristic function games G1 = (P,ν1) and G2 = (P,ν2), and their sum G+ = (P,ν1 + ν2),

                                          ψG+(j) = ψG1(j) + ψG2(j)              for all j ∈ P.

Example 2. Suppose that Ciccio-Pharma (C) is a small biotech company that discovered a new vaccine but, to manufacture and market it, needs to team up with a larger partner. There are two candidates: Aristo-Medical (A) and BruttiMaBuoni-Inc (B). If A or B teams up with C, the big firm will split “1 billion” with C. Here is a possible characteristic function: ν(A) = ν(B) = ν(C) = ν(AB) = 0 and ν(AC) = ν(BC) = ν(ABC) = 1.

Since there are only 3 players, to get Shapley’s payoffs we can make a table indicating the value brought to a coalition by each player on the way to the formation of the grand coalition:

Permutation    Player A    Player B    Player C
ABC               0           0           1
ACB               0           0           1
BAC               0           0           1
BCA               0           0           1
CAB               1           0           0
CBA               0           1           0
Since in the derivation of the Shapley value it is assumed that the 3! = 6 permutations/arrival sequences are all equally likely, the average value of each biotech company is simply:

ψ(A) = 1/6,          ψ(B) = 1/6,          ψ(C) = 4/6 = 2/3.

So Ciccio-Pharma, the drug discoverer, will get two-thirds of the billion, and the big companies split the remaining third.
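The table-based computation can be replayed mechanically. A short Python sketch using exact fractions:

```python
from itertools import permutations
from fractions import Fraction

# characteristic function of the Ciccio-Pharma game (values in billions)
v = {frozenset(): 0, frozenset("A"): 0, frozenset("B"): 0, frozenset("C"): 0,
     frozenset("AB"): 0, frozenset("AC"): 1, frozenset("BC"): 1, frozenset("ABC"): 1}

# average each player's marginal contribution over the 3! = 6 arrival orders
psi = {j: Fraction(0) for j in "ABC"}
orderings = list(permutations("ABC"))
for pi in orderings:
    pred = set()
    for j in pi:
        psi[j] += v[frozenset(pred | {j})] - v[frozenset(pred)]
        pred.add(j)
psi = {j: val / len(orderings) for j, val in psi.items()}

print(psi)  # A and B each get 1/6, C gets 2/3
```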

Example 3. (Shapley in Simple Games)

A game G = (P,ν) is said to be simple if it is monotone (i.e., ν(C1) ≤ ν(C2) whenever C1 ⊆ C2) and its characteristic function only takes values 0 and 1. The game in Example 1 is clearly simple as soon as we rescale the payoffs so that ν(P) = 1.

In simple games, the Shapley value (a.k.a. the Shapley–Shubik power index in this context) has a particularly attractive interpretation: it measures the power of a player, i.e., the probability that she can influence the outcome of the game.

Indeed, the Shapley value of a player j in a simple game G = (P,ν) with #P = p can be rewritten as follows:

ψG(j) = {proportion of permutations π ∈ ΠP in which j is pivotal, i.e., ν(Sπ(j)) = 0 but ν(Sπ(j) ∪ {j}) = 1}.            (3)

In other words, if agents join the coalition in a random order, ψG(j) is exactly the probability that player j turns a losing coalition into a winning one.

Example 4. (Shapley for Variable Importance)

Suppose we are dealing with a generic (supervised) learning problem where we need to predict a response Y based on a bunch of covariates X = (X1,...,Xp). The goal here is to measure the importance of Xj in this task.

We can cast this problem into a suitable characteristic function game having the p covariates as players P = {1,...,p}, and some measure of fit as characteristic function ν. To be more specific, for any coalition (of covariates) C, let XC = (Xj : j ∈ C) and Ŷ(C) = E(Y | XC), the ideal optimal predictor (under a squared risk) based on the covariates in XC. Now, if we take ν(C) = −E[(Y − Ŷ(C))²], then

ψ(j) = (1/p!) Σ_{π∈ΠP} E[(Ŷ(Sπ(j) ∪ {j}) − Ŷ(Sπ(j)))²],          a population quantity to be estimated!            (4)

This is just an averaged version (over all possible submodels) of LOCO, another recently introduced variable-importance measure. It is instructive to try to rephrase, in this context, the previous 4 properties that characterize the Shapley value.
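To see the game of Equation 4 in action, here is a toy case where E(Y | XC) is available in closed form: a linear model with independent standard-normal covariates (a simplifying assumption of mine; in general the submodels must be estimated). Under it, each covariate's Shapley value reduces to its squared coefficient, and the dummy-player property shows up for the zero-coefficient covariate:

```python
from itertools import permutations
from math import factorial

# toy model: Y = 1*X1 + 2*X2 + 0*X3 + eps, covariates i.i.d. N(0,1), Var(eps) = 1
beta = {1: 1.0, 2: 2.0, 3: 0.0}
sigma2 = 1.0

def v(C):
    """nu(C) = -E[(Y - E[Y|X_C])^2]: with independent covariates, the residual
    variance is Var(eps) plus the squared coefficients of the excluded covariates."""
    return -(sigma2 + sum(beta[j] ** 2 for j in beta if j not in C))

players = list(beta)
psi = dict.fromkeys(players, 0.0)
for pi in permutations(players):
    pred = set()
    for j in pi:
        psi[j] += v(pred | {j}) - v(pred)
        pred.add(j)
psi = {j: val / factorial(len(players)) for j, val in psi.items()}

print(psi)  # {1: 1.0, 2: 4.0, 3: 0.0}: psi(j) = beta_j^2, and X3 is a dummy player
```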

Remark: this is all nice and good but, of course, Shapley for variable importance is not perfect. For example, it defines variable importance with respect to all submodels, but most of those submodels are not of interest and, in addition, it is strongly influenced by the correlation between covariates.

2. The Exercise: Let’s get together and feel all right...

 Your job  
1. Introductory
To check your understanding of Shapley, let’s start easy by playing around with a tiny game having only 3 players. So, imagine (!) there are three students curiously named: Antwohnette (A), BadellPadel (B) and Chumbawamba (C). At any given moment, exactly one of them needs to be working (not necessarily the same one all day long) on this HW to complete it. Here are their working hours:

A    Antwohnette     14:00-17:00
B    BadellPadel     11:00-16:00
C    Chumbawamba      9:00-13:00

A coalition C is an agreement by one or more students as to the times they will really be working[2]. Its value will be given by:

ν(C) = {# of hours potentially saved by a well-organized coalition}.

Clearly ν(A) = ν(B) = ν(C) = 0 and ν(ABC) = 4. Complete the definition of ν and find the Shapley payoffs (see Example 2).
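Since the schedules fall on whole hours, each can be encoded as a set of hourly slots. The following Python sketch completes ν and then gets the Shapley payoffs exactly as in Example 2:

```python
from itertools import combinations, permutations
from fractions import Fraction

# the three schedules, encoded as (start, end) hours
hours = {"A": (14, 17), "B": (11, 16), "C": (9, 13)}

def v(S):
    """Hours saved by coalition S: total individual workload minus the
    number of hours the coalition actually needs to cover (union of intervals)."""
    union = set()
    for a, b in (hours[j] for j in S):
        union.update(range(a, b))
    return sum(b - a for a, b in (hours[j] for j in S)) - len(union)

# the complete characteristic function...
for k in range(1, 4):
    for S in combinations("ABC", k):
        print("".join(S), v(S))

# ...and the Shapley payoffs, averaging marginal contributions over all 3! orderings
psi = {j: Fraction(0) for j in "ABC"}
for pi in permutations("ABC"):
    pred = set()
    for j in pi:
        psi[j] += v(pred | {j}) - v(pred)
        pred.add(j)
psi = {j: val / 6 for j, val in psi.items()}
print(psi)  # psi = (1, 2, 1) hours for (A, B, C), summing to nu(ABC) = 4
```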

2. Probabilistic
Now let’s get serious. Imagine you are in a real, larger team with 12 people having the following working schedule:

 

Now, I might ask you to find the characteristic function as before... but here you’re dealing with 2^12 = 4096 coalitions, so I built it for you[3] (Bonus: write an R function, the shorter the better, to get it for a general game of this type). I might also ask you to calculate the Shapley payoffs for each team member... but now there are 12! = 479,001,600 orderings to consider, not exactly trivial as in Example 2...

Nevertheless, look again at Equation 2: the Shapley is “just” an expectation (w.r.t. a very specific distribution), so:

•     Review our notes on Monte Carlo Approximation and Error Control.

•     Select 3 players of the team.

•     Fix a suitable[4] (ε, α) pair of tolerance and confidence, and design a Hoeffding-based simulation to approximate the Shapley payoffs of those 3 players (Hint: if you look closely, all you need is essentially sample()).

•     Produce separate Hoeffding-based confidence intervals for these payoffs and briefly comment on all your results.
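As a sketch of the kind of simulation being asked for (run here on a toy "dictator" game rather than the 12-player schedule, which is not reproduced here; the sample-size formula comes straight from Hoeffding's inequality for marginal contributions bounded in [0, bound]):

```python
import random
from math import log, sqrt

def mc_shapley(players, v, j, eps, alpha, bound):
    """Hoeffding-based Monte Carlo estimate of player j's Shapley value.
    Assumes every marginal contribution lies in [0, bound]; then
    n >= bound^2 * log(2/alpha) / (2 * eps^2) random orderings guarantee
    P(|estimate - psi(j)| > eps) <= alpha."""
    n = int(bound ** 2 * log(2 / alpha) / (2 * eps ** 2)) + 1
    total = 0.0
    for _ in range(n):
        pi = random.sample(players, len(players))  # one uniformly random ordering
        pred = set(pi[:pi.index(j)])               # predecessors S_pi(j) of j in pi
        total += v(pred | {j}) - v(pred)           # marginal contribution Delta
    est = total / n
    half = bound * sqrt(log(2 / alpha) / (2 * n))  # Hoeffding CI half-width
    return est, (est - half, est + half)

# toy bounded game: player 0 is a "dictator", v(S) = 1 iff 0 is in S
players = list(range(5))
v = lambda S: 1.0 if 0 in S else 0.0
est, ci = mc_shapley(players, v, j=0, eps=0.02, alpha=0.05, bound=1.0)
```

Here the estimate equals 1 exactly because the dictator's marginal contribution is 1 in every ordering; for the schedule game, set bound to the largest possible marginal saving.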

3. Statistical
Imagine you are a financial investor currently dealing with a portfolio of p stocks whose returns (see notes) are modeled by a set of random variables {X1,...,Xp}. In this setup, it may be important to allocate to each asset of the portfolio its contribution to the total utility, defined as Uω(Σ_{j=1}^p Xj), where the utility function Uω(·) for us will be a linear combination of the portfolio average return and volatility (Markowitz-style!, see here), that is

                                                                               Uω(X) = E(X) − ω · Var(X)           for some weight      ω > 0.

In a recent paper, the Authors tackle the problem from a cooperative game theory perspective by defining a suitable variance game over P = {1,...,p}, whose Shapley value turns out to be very simple, intuitive (and familiar).

More specifically, let C be any coalition of the p assets and RC = Σ_{j∈C} Xj their return. Now, for any C ⊆ P, define

ν(C) = Uω(RC) = E(RC) − ω · Var(RC) = νE(C) − ω · νV(C).

Hence, the game G = (P,ν) is a linear combination of two other games, GE = (P,νE) and GV=(P,νV) =⇒ by the additivity of Shapley, ψG(j)=ψGE(j) − ω · ψGV(j). By the linearity of expectation, the first term is trivially equal to ψGE(j)=E(Xj), whereas the second one, after elementary manipulations, turns out to be equal to ψGV(j) = Cov(Xj,RP).
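This closed form is easy to verify numerically: exhaustive Shapley on the variance game built from a small (hypothetical) covariance matrix must return the row sums of Σ, i.e., Cov(Xj, RP). A Python sketch:

```python
from itertools import permutations
from math import factorial, isclose

# hypothetical 3-asset covariance matrix Sigma (symmetric)
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
p = len(Sigma)

def nu_V(C):
    """Variance game: nu_V(C) = Var(R_C) = sum of Sigma entries over C x C."""
    return sum(Sigma[i][j] for i in C for j in C)

# exhaustive Shapley value of the variance game
psi = [0.0] * p
for pi in permutations(range(p)):
    pred = set()
    for j in pi:
        psi[j] += nu_V(pred | {j}) - nu_V(pred)
        pred.add(j)
psi = [x / factorial(p) for x in psi]

# matches Cov(X_j, R_P), i.e., the j-th row sum of Sigma
for j in range(p):
    assert isclose(psi[j], sum(Sigma[j]))
```

The reason is transparent from the permutation form: joining C, asset j adds Var(Xj) + 2·Cov(Xj, RC), and each other asset precedes j in exactly half of the orderings.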

In conclusion, once we pick a suitable weight ω > 0, the Shapley allocation to each stock j ∈ {1,...,p} is given by

ψG(j) = E(Xj) − ω · Cov(Xj, RP) = E(Xj) − ω · Σ_{r=1}^p Cov(Xj, Xr),          a population quantity to be estimated!            (5)

In other words, to learn the Shapley in this financial game, we need to study the marginal correlation graph among some standard measure of stock relative performance (see Appendix (B) in the notes). To this end, we may collect the daily closing prices for p stocks[5], selected within those consistently in the S&P500 index.

The stocks are categorized into 10 Global Industry Classification Standard (GICS) sectors, including Consumer Discretionary, Energy, Financials, Materials, Health Care, Utilities, Industrials, Information Technology, Consumer Staples, and Telecommunications Services. It is expected that stocks from the same GICS sector should tend to be clustered together, since they (allegedly) tend to interact more with each other[6].

So, although cherry-picking stocks to optimize our portfolio is not our goal here, ideally we want to collect a few stocks from different GICS sectors to boost the overall value/utility of the portfolio[7]. Here are the details:

•     Select a sensibly sized portfolio of p stocks and take data from the “Covid-Age”: January 1, 2020 till today (see notes). More specifically, if ct,j is the closing price of stock j on day t, let xt,j = log(ct,j/ct−1,j).

•     Based on the data matrix X = (xt,j)t,j having time on the rows and stocks on the columns, we want to estimate the Pearson-based correlation graph over stocks (i.e., each node in the graph is a stock), by simply treating the instances {xt,j}t as independent replicates even though they are not (they form a time series).

In particular, we place an (undirected) edge between stock j1 and stock j2 only if an asymptotic Normal-based confidence interval for their Pearson correlation does not include zero (you choose the level α of the many CIs).

•     Artistically (?) visualize the resulting graph (see here, and here), and draw some conclusions based on the decisions made along the way (i.e. the number and sector of the stocks, the level α, the interval type, etc.).

•     Finally, always based on the data matrix X, provide and possibly visualize bootstrapped confidence intervals for the Shapley values of the p stocks in your portfolio. As before, you choose the weight(s) ω, the level α, and the interval type. Comment on the results based on the choices you made.
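A minimal sketch of this last step, assuming the plug-in estimator of Equation 5 and a simple percentile bootstrap over days (function names and defaults are mine):

```python
import random

def shapley_alloc(X, omega):
    """Plug-in estimate of psi(j) = mean(X_j) - omega * sum_r Cov(X_j, X_r)
    from a T x p matrix X of (assumed i.i.d.) daily log-returns."""
    T, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / T for j in range(p)]
    psi = []
    for j in range(p):
        cov_sum = sum(
            sum((row[j] - means[j]) * (row[r] - means[r]) for row in X) / (T - 1)
            for r in range(p)
        )
        psi.append(means[j] - omega * cov_sum)
    return psi

def bootstrap_ci(X, omega, j, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stock j's Shapley value:
    resample whole days (rows) with replacement, recompute, take quantiles."""
    rng = random.Random(seed)
    stats = sorted(shapley_alloc([rng.choice(X) for _ in X], omega)[j]
                   for _ in range(B))
    return stats[int(B * alpha / 2)], stats[int(B * (1 - alpha / 2)) - 1]
```

Resampling whole rows keeps the cross-stock dependence intact while relying on the stated (and admittedly crude) assumption of independence over time.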

Remark: as done before, also here make the simplifying assumption that, for each stock, we are dealing with independent replicates.
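One concrete way to implement the "CI excludes zero" edge rule is via the Fisher z-transform of the sample correlation (an assumption on my part; any asymptotic Normal-based interval from the notes works the same way). A Python sketch:

```python
from math import atanh, sqrt

def corr(x, y):
    """Sample Pearson correlation between two equal-length series."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def edges(X, z_crit=1.96):
    """Undirected edge between stocks j1 and j2 iff the Fisher-z confidence
    interval for their correlation excludes zero, i.e.
    |atanh(r)| > z_crit / sqrt(T - 3)."""
    T, p = len(X), len(X[0])
    return [(j1, j2)
            for j1 in range(p) for j2 in range(j1 + 1, p)
            if abs(atanh(corr([row[j1] for row in X],
                              [row[j2] for row in X]))) > z_crit / sqrt(T - 3)]
```

With the default z_crit = 1.96 (α = 0.05), strongly co-moving columns get an edge while noise-level correlations do not; the resulting edge list can then be fed to any graph-drawing package.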

 


 
[1] An implicit assumption here: the coalition value ν(C) can be divided among the members of C in any way that the members of C choose. Formally, games with this property are said to be transferable utility games (TU games).
[2] This is an example of cooperative game theory applied to a resource allocation problem.
[3] The characteristic function is stored as a list char_fun with p = 12 slots so that, for example, char_fun[[2]] contains the (12 choose 2) = 66 values associated to coalitions formed by 2 players.
[4] First and foremost in terms of computational time!
[5] The number of stocks p will affect the computational time and the statistical performance, hence, as usual, choose it wisely!
[6] I wrote “allegedly” since this is a hypothesis we should verify empirically instead of taking it for granted.
[7] Diversification in investing is key, somebody once said.

Hint: keep this in mind when talking about ensemble techniques like random forests to tackle predictive problems.
