State variables

Dating back to Bellman, the dynamic programming community (specifically, the community working on Markov decision processes) has avoided offering a proper definition of a state variable. The renowned probabilist Erhan Cinlar, a colleague of mine at Princeton, answered my question “What is a state variable?” with the response: “Ahhh… that which cannot be defined.”

This webpage provides some background on a topic that has long been overlooked: defining a state variable. This discussion is organized under the following:

State variables are important because they describe what information is needed to solve a problem.

Some attempts at definitions

Defining state variables

The lack of a proper definition is not universal. Two books on optimal control — Kirk (2004) and Cassandras and Lafortune (2008) — both use:

“A state variable is all the information you need at time $t$ to model the system from time $t$ onward.”

This definition is absolutely true; all that is missing is a description of what information is needed.

For that, I offer the following definition (RLSO, Chapter 9):

State variable (policy-dependent version) — a function of history that is necessary and sufficient to compute:

  1. The cost / contribution function $C(S_t, x_t)$.
  2. The decision function (the policy) $X^\pi(S_t)$.
  3. Any information required by the transition function to model the information needed for the cost/contribution and decision functions in the future,
\[S_{t+1} = S^M(S_t,\, x_t,\, W_{t+1}(S_t, x_t)).\]

There is a slightly different definition if a policy is not specified.
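To make the three requirements in the policy-dependent definition concrete, here is a minimal Python sketch of a hypothetical asset-selling problem. Everything in it (the `State` fields, the sell-above-threshold policy, the Gaussian price change) is an assumption made purely for illustration and is not a model taken from RLSO. The only point is that the state holds exactly the information that the three functions ask for.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical state S_t for a toy asset-selling problem."""
    inventory: float  # read by the policy and the transition function
    price: float      # read by the contribution function and the policy


def contribution(state: State, x: float) -> float:
    """C(S_t, x_t): revenue from selling x units at the current price."""
    return state.price * x


def policy(state: State, sell_above: float = 30.0) -> float:
    """X^pi(S_t): sell everything once the price exceeds a threshold."""
    return state.inventory if state.price > sell_above else 0.0


def transition(state: State, x: float, w: float) -> State:
    """S^M(S_t, x_t, W_{t+1}): w is the exogenous change in price."""
    return State(inventory=state.inventory - x,
                 price=max(0.0, state.price + w))


def simulate(s0: State, horizon: int, seed: int = 0):
    """Roll the model forward using only the current state at each step."""
    rng = random.Random(seed)
    state, total = s0, 0.0
    for _ in range(horizon):
        x = policy(state)                 # decision uses S_t only
        total += contribution(state, x)   # contribution uses S_t and x_t
        w = rng.gauss(0.0, 2.0)           # assumed exogenous information
        state = transition(state, x, w)   # next state from S_t, x_t, W_{t+1}
    return state, total


if __name__ == "__main__":
    final_state, total_revenue = simulate(State(10.0, 28.0), horizon=20)
    print(final_state, total_revenue)
```

In this sketch `inventory` and `price` belong in the state because the contribution function and the policy read them, and the transition function needs them to produce their future values. Nothing else about the history is carried forward.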

What is key is that we just have to look at three functions (and possibly a fourth) to identify the information that is needed.

An example of information needed by the transition function can be illustrated by modeling prices using a time-series model given by

\[p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon_{t+1}.\]

A common mistake is to assume that the “state” of this dynamic system is $p_t$, while $p_{t-1}$ and $p_{t-2}$ represent the history. This is not true. The state of the price process is given by

\[S_t^{\text{price}} = (p_t,\, p_{t-1},\, p_{t-2}).\]

The prices $p_{t-1}$ and $p_{t-2}$ are only needed to create estimates of $p_{t+1}$ and $p_{t+2}$, both of which will be needed in the performance metric.
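A small sketch may help show why $p_t$ alone is not the state of this price process. The coefficients below are hypothetical values chosen so the arithmetic is easy to follow, and `PriceState` and `price_transition` are names introduced only for this illustration.

```python
from typing import NamedTuple


class PriceState(NamedTuple):
    """S_t^price = (p_t, p_{t-1}, p_{t-2})."""
    p_t: float
    p_tm1: float
    p_tm2: float


# Hypothetical coefficients (theta_0, theta_1, theta_2), for illustration only.
THETA = (0.5, 0.25, 0.25)


def price_transition(state: PriceState, eps: float, theta=THETA) -> PriceState:
    """Apply p_{t+1} = theta_0 p_t + theta_1 p_{t-1} + theta_2 p_{t-2} + eps."""
    p_next = (theta[0] * state.p_t
              + theta[1] * state.p_tm1
              + theta[2] * state.p_tm2
              + eps)
    # Shift the lags: the new state keeps the two most recent past prices.
    return PriceState(p_next, state.p_t, state.p_tm1)


# Two situations with the same current price but different lagged prices.
a = PriceState(20.0, 16.0, 12.0)
b = PriceState(20.0, 8.0, 4.0)

next_a = price_transition(a, eps=1.0)
next_b = price_transition(b, eps=1.0)
print(next_a)  # next price is 18.0
print(next_b)  # next price is 14.0
```

Both situations share $p_t = 20$ and receive the same noise, yet the next price differs, so $p_t$ by itself cannot be the state. With all three prices in the state, the transition needs nothing older than $p_{t-2}$.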

Flavors of state variables

In our work it has been helpful to classify state variables into three categories:

A state variable may consist purely of any one of these three, or any combination of two or more, or all three. Examples of all of these can be found in Sequential Decision Analytics and Modeling (see in particular the discussion of state variables in chapter 7).

It is important to distinguish between the initial state variable $S_0$ and the dynamic state variables $S_t$ for $t > 0$:

State variables can be as simple as a node in a network, but for most applications they can be complex vectors of discrete and continuous values. When the state information includes statistical estimates (beliefs), it may be necessary to carry large matrices that capture correlations.

It is important to stop thinking of state variables as a node in a network, or the state of a game board. State variables are information, and information can be complicated.

A history of state variables

The concept of state variables goes back 200 years, yet the mathematical communities that use this concept have avoided offering a definition. The presentation below captures some of this colorful and surprising history.

Definitions from the MDP community

Some “definitions” of state variables:

Let me first start by asking: didn’t we all learn in grade school that we do not use the word we are defining in its definition??!!

A definition from the RL community

The reinforcement learning literature inherited the style of not defining state variables from the literature on Markov decision processes, but a notable exception is the second edition of Sutton and Barto’s Reinforcement Learning: An Introduction. While they never explicitly define a state variable, they offer descriptions:

The first bullet seems to suggest that all available information (about the environment) is in the state variable, but does not define “environment.” The second bullet includes the condition “that make(s) a difference for the future.” Keep reading.

From some theoreticians

I have spoken to numerous mathematicians (in stochastic control/optimization) who insist, “But I know what a state variable is.” Consider the following statements made by some of the best-known names in the field:

Side-by-side images: the covers of Fleming and Soner's 'Controlled Markov Processes and Viscosity Solutions' and René Carmona's 'Lectures on BSDEs, Stochastic Control, and Stochastic Differential Games with Financial Applications,' next to a scan of the section 'The Optimization Problem' that defines a cost functional without defining a state variable

If we agree that a state is all the information you need to model the system from time $t$ onward, then the system is, by definition (and by construction), Markovian. Further, you would never need information from history since, again by definition (and by construction), the state variable already has any information that may have arrived before time $t$ (or “time” $k$). So there is no need to “expand the state space sufficiently,” nor any need to depend on history.

Side note: a talented post-doc in my lab posed the question: what if we simply do not know all the information we need? This raises subtle issues that are more than I can cover on a webpage. See note (vii) on page 483 of RLSO (following the definition of states), and section 20.2 in RLSO which uses a two-agent model of flu mitigation to illustrate the setting of when a controlling agent does not know the environment.

Definitions from optimal control

Now look at some definitions in books on optimal control:

Scan of a paragraph from Kalman's 1963 paper: 'In modern terminology, we say that the numbers which specify the instantaneous position and momentum of each particle represent the state of the system. The state is to be regarded always as an abstract quantity. Intuitively speaking, the state is the minimal amount of information about the past history of the system which suffices to predict the effect of the past upon the future. Further, we say that the forces acting on the particles are the inputs of the system. Any variable in the system which can be directly observed is an output.'

Each of these definitions can be restated simply as:

The state is all the information you need at time $t$ to model the system from time $t$ onward.

This definition is also consistent with Sean Meyn’s characterization of state variables (given above) as “sufficient statistics,” which is just another way of stating the definition above.

Both of the definitions above understand that to model the system moving forward, you need the controls $u(t)$ (presumably determined by a “control law” or “policy”) as well as any exogenous (random) information. These definitions appear to be standard in optimal control.

I like the characterization, widely used in books on optimal control, that the state variable is all the information you need to model the system from time $t$ onward, regardless of when the information arrived! My only complaint is that it needs to be more explicit about what information is needed.

But what if there is missing information?

In a mathematical statement of a problem, we can design a state variable that includes all the information needed to meet the three requirements I outlined above. But what if we simply do not have access to some information?

Some examples of missing information are:

Missing the information about the age of materials within the transformer, or how the market responds to prices, makes it impossible to model the forward trajectory of the system. How, then, do we construct a state variable that meets the requirement, shared by all the definitions above, of including all the information needed to model the system forward in time?

What we do is replace the missing information with a probabilistic belief. The belief may be based on frequentist or Bayesian modeling, but either way it has to be more than just a point estimate: it has to include a probability distribution for the missing information.
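As a hedged illustration, suppose the missing information is the probability that a customer buys at a given price (a stand-in for “how the market responds to prices”). One simple choice, which is an assumption here rather than anything prescribed above, is a Bayesian Beta belief updated from buy/no-buy observations. The names `BetaBelief`, `alpha`, and `beta` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown purchase probability, held as Beta(alpha, beta).

    This belief would sit inside the state variable in place of the
    unknown probability itself.
    """
    alpha: float = 1.0  # prior pseudo-count of purchases (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of non-purchases

    def mean(self) -> float:
        """Point estimate of the purchase probability."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        """Uncertainty about the purchase probability."""
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1.0))

    def update(self, purchased: bool) -> "BetaBelief":
        """Bayesian update after observing one customer decision."""
        if purchased:
            return BetaBelief(self.alpha + 1.0, self.beta)
        return BetaBelief(self.alpha, self.beta + 1.0)


belief = BetaBelief()
for purchased in (True, True, False):
    belief = belief.update(purchased)

# Same point estimate as `belief`, but built from ten times the evidence.
confident = BetaBelief(30.0, 20.0)

print(belief.mean(), belief.variance())
print(confident.mean(), confident.variance())
```

The two beliefs have the same mean but very different variances, which is why a point estimate alone is not enough: a policy that wants to learn, or to hedge, needs the whole distribution in the state.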