Chung-Ang University researchers develop algorithms for optimal decision-making under loud noisy rewards

In data science, researchers usually deal with data containing noisy observations. An important problem explored by data scientists in this context is the problem of sequential decision making. This is commonly referred to as a “stochastic multi-armed bandit” (stochastic MAB). Here, an intelligent agent sequentially explores and selects actions based on noisy rewards in an uncertain environment. Its goal is to minimize cumulative regret – the difference between the maximum reward and the expected reward of the selected actions. A little regret implies more effective decision-making.

Most existing studies of stochastic MABs have performed regret analysis assuming that reward noise follows a light-tailed distribution. However, many real-world datasets actually show a heavy-tailed noise distribution. These include data on user behavior patterns used to develop personalized recommendation systems, stock price data for automatic transaction development, and sensor data for autonomous driving.

In a recent study, Assistant Professor Kyungjae Lee of Chung-Ang University and Assistant Professor Sungbin Lim of Ulsan Institute of Science and Technology, both in Korea, addressed this question. In their theoretical analysis, they proved that existing algorithms for stochastic MABs were suboptimal for heavy-tailed rewards. Specifically, the methods employed in these algorithms – robust upper confidence bound (UCB) and adaptively perturbed exploration (APE) with unlimited perturbation – do not guarantee minimax optimality (minimization of the maximum possible loss).

“Based on this analysis, robust minimax optimal (MR) UCB and APE methods have been proposed. MR-UCB uses a narrower confidence limit of robust mean estimators, and MR-APE is its randomized version. It uses a bounded perturbation whose scale follows the modified confidence limit in MR-UCB,” says Dr. Lee, talking about their work, which has been published in the IEEE Transactions on Neural Networks and Learning Systems. September 14, 2022.

The researchers then derived independent and dependent upper bounds of the cumulative regret gap. For the two proposed methods, this last value corresponds to the lower limit under the heavy-tailed noise hypothesis, thus reaching the minimax optimality. Moreover, the new methods require a minimum of prior information and depend only on the maximum order of the bounded moment of the rewards. In contrast, existing algorithms require the upper bound of this a priori moment – ​​information that may not be accessible in many real-world problems.

After establishing their theoretical framework, the researchers tested their methods by carrying out simulations under Pareto and Fréchet noises. They found that MR-UCB consistently outperformed other exploration methods and was more robust with increased number of actions under heavy-tailed noise.

Further, the duo verified their approach for real-world data using a cryptocurrency dataset, showing that MR-UCB and MR-APE were beneficial – minimax optimal regret bounds and prior knowledge minimal – for tackling heavy-tailed synthetic and real stochastic MAB. problems.

“Being vulnerable to heavy-tail noise, existing MAB algorithms show poor performance in modeling stock data. They fail to predict large rises or sudden falls in stock prices, causing huge losses. In contrast, MR-APE can be used in stand-alone trading systems with stable expected returns through equity investment,” comments Dr. Lee, discussing potential applications of the present work. “Furthermore, it can be applied to personalized recommender systems since the behavioral data shows heavy-tailed noise. With better predictions of individual behavior, it is possible to provide better recommendations than conventional methods, which can maximize ad revenue,” he concludes.




Author: Kyungjae Lee1Sungbin Lim2


1Department of Artificial Intelligence, Chung-Ang University, Seoul, South Korea

2Graduate School of Artificial Intelligence and Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea

About Chung-Ang University

Chung-Ang University is a private comprehensive research university located in Seoul, South Korea. It was started as a kindergarten in 1916 and gained university status in 1953. It is fully accredited by the Ministry of Education of Korea. Chung-Ang University conducts research activities under the slogan “Justice and Truth.” Its new vision to end its 100 years is “The world leader in creation”. Chung-Ang University offers undergraduate, postgraduate, and doctoral programs, which encompass a law school, a management program, and a medical school; it has 16 undergraduate and graduate schools each. Chung-Ang University’s cultural and artistic programs are considered the best in Korea.


About Assistant Professor Kyungjae Lee

Professor Kyungjae Lee got his BS and Ph.D. degrees in Electrical Engineering and Computer Engineering, respectively, from Seoul National University, Korea in 2015 and 2020, respectively. He is currently an assistant professor in the Department of Artificial Intelligence at Chung-Ang University, Seoul, Korea. His current research interests include multi-armed bandits, combinatorial bandits, reinforcement learning and their applications.

Risk Warning: Cryptocurrency is a notoriously volatile unregulated virtual asset with a high level of risk. All news, opinions, research, data or other information contained in this website is provided for news reporting purposes as general market commentary and does not constitute investment or trading advice.

/Public release. This material from the original organization/authors may be ad hoc in nature, edited for clarity, style and length. The views and opinions expressed are those of the author or authors.View Full here.

Sharon D. Cole