Earning and Learning with Varying Cost

2021-11-18 08:24:39

Topic: Earning and Learning with Varying Cost

Speaker: Associate Professor Guangwu LIU, City University of Hong Kong, China

Time and Date: 10:00, Nov. 24th, 2021

Platform: Tencent Meeting, ID: 215-378-957

Speaker Profile:

Dr. Guangwu LIU is a professor in the Department of Management Science, Business School of City University of Hong Kong. He graduated from the Department of Mathematics of Tsinghua University in 2005 and received a doctorate in industrial engineering and logistics management from the Hong Kong University of Science and Technology in 2009. His research fields include stochastic simulation, machine learning, business analytics, financial engineering, and risk management. Professor LIU has published many papers in leading journals of management science and operations research, including Management Science, Operations Research, INFORMS Journal on Computing, Production and Operations Management, Naval Research Logistics, and ACM Transactions on Modeling and Computer Simulation. He is currently an associate editor of Naval Research Logistics and Asia-Pacific Journal of Operational Research, and has received the 2012 Outstanding Simulation Publication Award of the INFORMS Simulation Society and the Early Career Award of the Research Grants Council of Hong Kong.

Abstract:

We study a dynamic pricing problem in which the observed cost varies from one selling period to the next, and the demand function is unknown and depends only on the price. Motivated by the classical upper confidence bound (UCB) algorithm for the multi-armed bandit problem, we propose a UCB-like policy to select the price. When the cost is a continuous random variable, the profit of the optimal price can be arbitrarily close to that of the second-best price as the cost varies, making it very difficult to make the correct decision. In this situation, we show that the expected cumulative regret of our policy grows in the order of (log T)^2. When the cost takes discrete values from a finite set and every price is optimal for some cost, we show that the expected cumulative regret is upper bounded by a constant for any T. This result suggests that suboptimal prices are selected in only a finite number of periods, so the trade-off between earning and learning vanishes and learning is no longer necessary beyond a certain period.
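
To make the setting in the abstract concrete, the following is a minimal sketch of a UCB-style pricing rule, not the speaker's actual policy. Everything specific in it is an assumption made for illustration: a finite price grid, Bernoulli demand with a hypothetical purchase probability per price (`demand_prob`), the classical confidence width sqrt(2 log t / n), and the function name `ucb_like_pricing`. In each period the cost is observed first, and the price with the most optimistic profit estimate for that cost is offered.

```python
import math
import random


def ucb_like_pricing(prices, demand_prob, costs, seed=0):
    """Illustrative UCB-style pricing; returns (chosen price indices, cumulative regret).

    Assumptions (not from the talk): finite price grid, Bernoulli demand,
    confidence width sqrt(2 log t / n).
    """
    rng = random.Random(seed)
    k = len(prices)
    sales = [0.0] * k   # total observed demand per price
    pulls = [0] * k     # number of periods each price was offered
    chosen, regret = [], 0.0

    for t, c in enumerate(costs, start=1):
        if t <= k:
            i = t - 1  # offer each price once to initialise the estimates
        else:
            def index(j):
                mean = sales[j] / pulls[j]
                width = math.sqrt(2.0 * math.log(t) / pulls[j])
                margin = prices[j] - c
                # Optimism: upper demand bound for a positive margin,
                # lower demand bound when the margin is negative.
                if margin >= 0:
                    bound = min(1.0, mean + width)
                else:
                    bound = max(0.0, mean - width)
                return margin * bound
            i = max(range(k), key=index)

        # Bernoulli demand: one unit sells with probability demand_prob[i].
        sold = 1.0 if rng.random() < demand_prob[i] else 0.0
        sales[i] += sold
        pulls[i] += 1
        chosen.append(i)

        # Regret: expected profit lost against the best price for this cost.
        profits = [(prices[j] - c) * demand_prob[j] for j in range(k)]
        regret += max(profits) - profits[i]

    return chosen, regret
```

A call such as `ucb_like_pricing([4.0, 6.0, 8.0], [0.9, 0.6, 0.3], costs)` with `costs` drawn from a small finite set corresponds to the discrete-cost case discussed above; the sketch measures regret only empirically and does not establish the bounds stated in the abstract.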

 
