The Knowledge Gradient

Warren B Powell Professor Emeritus, Princeton University

In 2025, Peter Frazier, one of my former Ph.D. students, won the 2025 “Test of Time” award for his paper on the knowledge gradient for correlated beliefs that he performed while a Ph.D. student in my lab (2005-2008).  Much more than a best paper award, the “Test of Time” award represents the judgment of everyone who adopted this approach in the years since it first appeared (2008-2009).   

Peter’s work opened up a decade of research performed by eight Ph.D. students and post-docs while in my lab. This webpage is a summary of this work.

This page is organized as follows:

Test of Time Award Citation 2007–2011

INFORMS Journal on Computing Test of Time Award certificate presented to Peter Frazier, Warren Powell, and Savas Dayanik for 'The Knowledge-Gradient Policy for Correlated Normal Beliefs,' November 2025

The Knowledge-Gradient Policy for Correlated Normal Beliefs

Peter Frazier, Warren Powell, Savas Dayanik INFORMS Journal on Computing 21(4):599–613, Fall 2009 https://pubsonline.informs.org/doi/10.1287/ijoc.1080.0314

This paper introduces the now well-known knowledge-gradient (KG) method for gathering information when faced with a black-box objective, where query measurements may be both costly and noisy. The focus of the paper is on ranking and selection, but the KG method can be used also to study multiarmed bandit problems and many other models in Bayesian information collection. This very broad applicability led to a lively KG research area that continues to this day, with new developments studied in modern machine learning. It is an excellent example of work that has stood the test of time.

Peter Frazier prepared a retrospective of the knowledge gradient, which summarizes the body of research that followed the publication of the knowledge gradient (Click here).

Note that Peter (and the Bayesian optimization community) refer to the knowledge gradient as an “acquisition function.” I use the more general term “policy” that recognizes that these active learning problems are just special cases of sequential decision problems, also known as dynamic programs.  In chapter 7 of my book Reinforcement Learning and Stochastic Optimization, I address this class of problems under the heading of “derivative-free stochastic search” and illustrate all four classes of policies.  The knowledge gradient falls in the fourth class, direct lookahead approximations.


Background

There is a vast range of sequential decision problems, but in my opinion, by far the most important falls under the broad umbrella of “derivative-free stochastic search” where we are trying to solve the problem 

\(min_x E_W F(x,W)\)  

where \(x\) may be a discrete scalar, or a continuous or discrete vector.  \(W\) represents random inputs, and \(F(x, W)\) is some process or function where we cannot compute the expectation, but we can observe a noisy sample observation \({\hat F(x,W)}\) for an \(x\) that we choose, and inputs \(W\) that we do not control.  We assume in this work that we do not have access to derivatives of \(F(x,W)\).

Examples of \(F(x, W)\) include: the revenue from advertising a product, the performance of a drug or medical treatment, the performance of a scientific experiment to produce a new material, the outcome of a simulation of a business process, the performance of an athlete or salesperson, ... The list is endless.

Although \(x\) may be discrete or continuous, scalar or vector, the most common version of this problem involves identifying the best of a set of discrete choices which have uncertain values, as depicted below:

Bar chart of seven discrete choices (A through G) with uncertainty bars, alongside a list of example decision settings: type of drug, supplier, trading policy, product design, battery technology, price, web page design, product to advertise, location for a clinic, diameter of silicon wafer, financial trading policy, advertising channel, choice of manager

In chapter 7 of my 2022 book, Reinforcement Learning and Stochastic Optimization, I show how this problem can be solved with any of the four classes of policies.  However, two are the most widely used: 

These policies are known in certain communities (such as Bayesian optimization) as acquisition functions.  I am not sure why this term has been adopted, since this term is unique to certain communities, which disguises its membership in the much more general problem class I refer to as sequential decision problems. 

My experience is that the knowledge gradient policy is best suited for problems where experiments are expensive, which means we have to exploit as much as possible any prior knowledge.  A particularly important property is its ability to minimize the dependence on tunable parameters.

The knowledge gradient for correlated beliefs appears to be the first policy that explicitly captured correlations between the beliefs of different alternatives.  This laid the groundwork for the research of other students (sometimes with Peter’s help) that used other belief models, including linear models; models with sampled, parametric beliefs; models where each alternative is described with a multidimensional attribute vector, leading to a belief model based on hierarchical aggregation; and a belief model using local parametric beliefs.  

Software

Peter Frazier has made his software library for the knowledge gradient, including the calculation of the knowledge gradient for correlated beliefs, on GitHub here.

Learning while doing video

A powerful strategy to achieve steady improvement of operations sometimes involves adjusting parameters in the field.  I prepared a video demonstrating this for a mutual fund struggling to find the right amount of cash to keep on hand to meet redemptions.  The video illustrates a UCB-style policy (called “Interval estimation”) along with an application of the knowledge gradient, which is explained without mathematics.  The video brings out the challenge of tuning parameters in simpler policies such as those in the UCB class.  KG is somewhat more complex, but does a better job of exploiting prior knowledge, and does not have any tunable parameters.

Access the video here.

Teaching materials

The “optimal learning” problems addressed by the knowledge gradient span an important class of sequential decision problems.  I would argue that this is by far the most common decision problem that people have to solve on a day-to-day basis.  I have prepared teaching materials on sequential decision problems:

Teaching materials for sequential decision problems

Search on “Optimal/active learning” for the materials relevant to the knowledge gradient.

Students and papers related to the knowledge gradient

Below is a list of the students I supervised, starting with Peter Frazier, who did work developing and extending the knowledge gradient, often motivated by real applications, often in the hard sciences.  There was considerable joint work  - I have listed the papers under the student who was the primary author on the paper. 

Peter Frazier (Ph.D. student)

Ilya Ryzhov (Ph.D. student)

Martijn Mes (post doc)

Warren Scott (Ph.D. student)

Xinyu He (Ph.D. student)

Yan Li (Ph.D. student)

Yingfei Wang (Ph.D. student)

Si Chen (Ph.D. student)

The experience of supervising all this research helped me develop a comfort level with recognizing that “belief states” are legitimate elements of any state variables for dynamic systems where learning is happening.  I supervised two papers that involved managing physical resources (vehicles) while learning was occurring.  Both are available here.  Brief summaries are: