Javier Rodríguez^{1}, Luis R. Izquierdo^{2} and Segismundo S. Izquierdo^{3}

^{1} Telefónica I+D

^{2} Department of Management Engineering, Universidad de Burgos, Edificio la Milanera, Calle de Villadiego, 09001 Burgos, Spain

^{3} BioEcoUva Research Institute on Bioeconomy, Department of Industrial Organization, Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain

**Keywords:** Evolutionary Game Theory, Agent-Based Modeling, Evolutionary Dynamics, Distributed Control, Decentralized Algorithms.

# 1. Introduction

Over the past few years, the scientific community has been studying the usefulness of evolutionary game theory for solving distributed control problems. This approach consists of finding a game (i.e. a set of actions and a payoff function for each agent) and a revision protocol such that the induced dynamics lead to the achievement of the overall objective, even though individual agents may not have access to all the information needed to know the state of the system. In this paper we analyze a simple version of the Best Experienced Payoff algorithm [1,2], a revision protocol that is completely decentralized and has minimal information requirements. We assume that there is a population of agents that may engage in a 2-player symmetric game *G* = {*S*, *A*}, where *S* denotes the set of possible pure strategies and *A* = [*a _{ij}*] denotes the payoff matrix. In this paper we only consider games that satisfy the following

**payoff conditions**:

There is a strategy *s* ∈ *S* such that the following two conditions hold:

1. *a _{ss}* > *a _{ij}* for all *i* ≠ *s* and all *j* ∈ *S*.
2. *a _{sj}* ≥ Max _{i≠s} Min{*a _{ij}*, *a _{is}*} for all *j* ≠ *s*.

The first condition implies that the optimal symmetric state is the one where every agent is choosing strategy *s*.
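The two payoff conditions are straightforward to verify mechanically. As an illustration, the following sketch checks them for an arbitrary payoff matrix (the matrix `A` and the helper name are our own, not part of the original formulation):

```python
def satisfies_payoff_conditions(A, s):
    """Check the two payoff conditions for candidate strategy s in matrix A.

    Condition 1: a_ss strictly exceeds every payoff a_ij with i != s.
    Condition 2: a_sj >= Max over i != s of Min{a_ij, a_is}, for all j != s.
    """
    n = len(A)
    others = [i for i in range(n) if i != s]
    cond1 = all(A[s][s] > A[i][j] for i in others for j in range(n))
    cond2 = all(A[s][j] >= max(min(A[i][j], A[i][s]) for i in others)
                for j in range(n) if j != s)
    return cond1 and cond2

# A 2x2 coordination game: strategy 1 (payoff 2 on the diagonal) is optimal.
A = [[1, 0],
     [0, 2]]
```

Here `satisfies_payoff_conditions(A, 1)` holds, while `satisfies_payoff_conditions(A, 0)` fails condition 1, since *a _{00}* = 1 does not exceed *a _{11}* = 2.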

# 2. The BEP algorithm

The BEP algorithm runs in discrete timesteps. At each timestep, one agent is chosen at random to revise its strategy. In the simplest case, the revising agent tests each of its strategies exactly once, each time against a randomly drawn opponent. The agent then adopts the strategy that provided the greatest payoff, resolving ties using some pre-established rule; here, we assume that ties are broken uniformly at random. The version of BEP that we use in this paper, where revising agents test all their strategies, each against one single opponent, and break ties at random, is called BEPA1.
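A single BEPA1 revision can be sketched as follows. The representation of agents as dictionaries with a `"strategy"` key is our own illustrative choice, not part of the algorithm's specification:

```python
import random

def bepa1_revision(agent, population, A):
    """One BEPA1 revision: test every strategy once, each against a freshly
    drawn random opponent, then adopt a best-performing strategy (ties are
    broken uniformly at random)."""
    n = len(A)
    best_payoff, best = None, []
    for strategy in range(n):
        opponent = random.choice([a for a in population if a is not agent])
        payoff = A[strategy][opponent["strategy"]]
        if best_payoff is None or payoff > best_payoff:
            best_payoff, best = payoff, [strategy]
        elif payoff == best_payoff:
            best.append(strategy)
    agent["strategy"] = random.choice(best)
```

Note that each strategy is tested against an independently drawn opponent, so the comparison is between experienced payoffs, not expected ones.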

# 3. Analytical results

## 3.1 Absorbing states of the BEPA1 dynamics

Defining the population state by the number of agents that are using each strategy, the dynamics induced by the BEPA1 protocol can be seen as a Markov chain. The following observation characterizes the absorbing states of this Markov chain.

**Observation 1.** A state of the Markov chain induced by the BEPA1 protocol with more than 2 agents is absorbing if and only if all agents are playing the same pure strategy *i* ∈ *S* and the strategy profile (*i*, *i*) is a strict Nash equilibrium.

This does not mean that the dynamics starting from other states will necessarily end up there. To find out whether the optimal state will be approached from other initial conditions, in the next section we turn to the mean dynamics (MD), which is a set of differential equations that approximate the transient dynamics of the Markov chain very well, especially when the population is large.

## 3.2. Deterministic approximation for the transient dynamics

**Proposition 2** [3, prop. 5.11]. Consider a game *G* = {*S*, *A*} such that there is a strategy *s* ∈ *S* satisfying *a _{ss}* > *a _{ij}* for all *i* ≠ *s* and all *j* ∈ *S*, and *a _{sj}* ≥ Max _{i≠s} Min{*a _{ij}*, *a _{is}*} for all *j* ≠ *s*. For that game, all solution trajectories of the mean dynamics starting at a state *x* with *x _{s}* > 0 converge to the state where every agent is using strategy *s*.

These analytical results strongly suggest that the BEPA1 stochastic process on finite populations will converge to the optimal state with high probability from most initial conditions, at least for large populations of agents.

# 4. Simulation results

In this section we present simulation results of the BEPA1 dynamics for a (single-optimum coordination) game with *n* strategies whose payoff matrix is:

[latex]\left(\begin{array}{ccccc} 1 & 0 & 0 & ... & 0 \\ 0 & 2 & 0 & ... & 0 \\ 0 & 0 & \ddots & & \vdots \\ \vdots & \vdots & & n-1 & 0 \\ 0 & 0 &... & 0 & n \\ \end{array} \right)[/latex]
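This matrix family can be generated programmatically. A minimal sketch (the function name is our own); note that the resulting game satisfies the payoff conditions of Section 1 with *s* = *n*, the strategy with diagonal payoff *n*:

```python
def single_optimum_matrix(n):
    """n x n diagonal payoff matrix with a_ii = i + 1 (0-based indexing),
    so the last strategy, with payoff n, is the unique optimum."""
    return [[i + 1 if i == j else 0 for j in range(n)] for i in range(n)]
```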

To compare simulation results with the MD, we define one *tick* as the number of timesteps over which every agent receives one revision opportunity in expectation. We show that the BEPA1 dynamics converge to the optimal state quickly and with high probability, and that the MD provides a very good approximation of the transient BEPA1 dynamics even for populations of only 100 agents. The simulations show that the time until convergence, measured in ticks, is nearly independent of the number of agents. The MD provides a good approximation regardless of the number of strategies considered in the simulations. As one would expect, the greater the number of strategies, the slower the convergence.

We also checked the robustness of these results to different updating schemes [4]:

- *Asynchronous random independent*. We take one agent at random and give it the opportunity to revise its strategy. In each tick, we repeat this process as many times as there are agents.
- *Asynchronous random order*. In every tick, we give all agents the opportunity to revise their strategy sequentially, in a random order.
- *Synchronous*. In every tick, all agents revise their strategy at the same time.

Our results are robust to these different updating schemes.
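The three schemes differ only in how revision opportunities are scheduled within a tick. A minimal sketch, where `revise` is a hypothetical callback that updates an agent in place (for the synchronous scheme it instead computes the proposed strategy, which `commit` then applies):

```python
import random

def asynchronous_random_independent(agents, revise):
    # One tick: N independent draws with replacement, so some agents may
    # revise several times in a tick and others not at all.
    for _ in range(len(agents)):
        revise(random.choice(agents))

def asynchronous_random_order(agents, revise):
    # One tick: every agent revises exactly once, in a freshly shuffled order.
    for agent in random.sample(agents, len(agents)):
        revise(agent)

def synchronous(agents, revise, commit):
    # One tick: all agents compute their new strategy from the same current
    # state, and the updates are then committed simultaneously.
    updates = [revise(agent) for agent in agents]
    for agent, new_strategy in zip(agents, updates):
        commit(agent, new_strategy)
```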

Finally, we test the BEPA1 algorithm on different types of networks: ring, Barabási–Albert preferential attachment [5], Watts–Strogatz small world with different rewiring probabilities and average degrees [6], and complete. The time until convergence decreases as the size of the neighborhood increases. A neighborhood size of only 5% of the population seems to be enough to reach convergence speeds comparable to those achieved in the complete network, regardless of the topology. For neighborhood sizes below 5%, topology does play a role even when the average neighborhood size is fixed: the ring topology is much more regular than preferential attachment networks, and its convergence times are significantly higher. The hypothesis that information flows more slowly through regular topologies makes intuitive sense and is also borne out in small-world networks.
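On a network, the only change to BEPA1 is that a revising agent draws its test opponents from its neighborhood rather than from the whole population. As an illustration of the most regular case, the ring lattice, the following sketch (function name ours, assuming an even degree) enumerates an agent's neighbors:

```python
def ring_neighbors(index, n_agents, degree):
    """Neighbors of agent `index` on a ring lattice where each agent is
    linked to the degree/2 nearest agents on each side (degree even)."""
    half = degree // 2
    return [(index + offset) % n_agents
            for offset in range(-half, half + 1) if offset != 0]
```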

# 5. Conclusions

We have shown that BEPA1 can be used as a fast and scalable decentralized algorithm to achieve the global optimum in single-optimum coordination problems. The algorithm is simple, completely decentralized, and has minimal information requirements. The optimum is achieved from nearly all initial conditions, so the algorithm is well suited to uncertain environments where the objective function may change dynamically. Convergence is fast, nearly independent of the number of agents, and only weakly dependent on the number of strategies. The algorithm is also robust to different updating schemes. Experiments on different types of networks show that the algorithm remains useful even in environments where agents can interact with only a small fraction of the population.

**Acknowledgments**. Financial support from the Spanish State Research Agency (PID2020-118906GB-I00 / AEI/ 10.13039/501100011033), from “Junta de Castilla y Leon – Consejería de Educación” through BDNS 425389, from the Spanish Ministry of Science, Innovation and Universities (PRX18-00182, PRX19/00113), and from the Fulbright Program (PRX19/00113), is gratefully acknowledged.

# References

1. Osborne MJ, Rubinstein A (1998) Games with Procedurally Rational Players. Am Econ Rev 88:834–847. https://doi.org/10.2307/117008
2. Sandholm WH, Izquierdo SS, Izquierdo LR (2019) Best Experienced Payoff Dynamics and Cooperation in the Centipede Game. Theor Econ 14:1347–1385. https://doi.org/10.3982/TE3565
3. Sandholm WH, Izquierdo SS, Izquierdo LR (2020) Stability for best experienced payoff dynamics. J Econ Theory 185:104957. https://doi.org/10.1016/j.jet.2019.104957
4. Cornforth D, Green DG, Newth D (2005) Ordered asynchronous processes in multi-agent systems. Phys D Nonlinear Phenom 204:70–82. https://doi.org/10.1016/j.physd.2005.04.005
5. Barabási A-L, Albert R (1999) Emergence of Scaling in Random Networks. Science 286(5439):509–512. https://doi.org/10.1126/science.286.5439.509
6. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442. https://doi.org/10.1038/30918