An optimal policy for partially observable Markov decision processes with non‐independent monitors
Journal of Quality in Maintenance Engineering
ISSN: 1355-2511
Article publication date: 1 September 2005
Abstract
Purpose
This research investigated the optimal structure of a discrete‐time Markov deterioration system monitored by multiple non‐independent monitors. The purpose is to obtain a sufficient condition under which the optimal policy is a control limit policy.
Design/methodology/approach
The model of this research is formulated as a partially observable Markov decision process. The problem is to obtain an optimal policy that minimizes the expected total discounted cost over an infinite horizon.
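As an illustrative sketch only (the notation below is chosen here and is not taken from the article), a partially observable Markov decision process of this kind is typically solved over belief states. With $\pi$ the posterior distribution over deterioration states, $a$ an action, $c(\pi,a)$ the immediate cost, $\beta \in (0,1)$ the discount factor, $y$ an observation vector from the monitors, and $T(\pi,a,y)$ the Bayesian belief update, the optimal value function satisfies

\[
V(\pi) = \min_{a} \Big\{ c(\pi, a) + \beta \sum_{y} \Pr(y \mid \pi, a)\, V\big(T(\pi, a, y)\big) \Big\},
\]

and the policy minimizing the expected total discounted cost over an infinite horizon is obtained from the fixed point of this recursion.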
Findings
The research found that the optimal policy minimizing the expected cost over an infinite horizon is a control limit policy, provided that the transition probability matrix is totally positive of order 2 (TP2) and the conditional probability of the monitors' observations satisfies the weak multivariate monotone likelihood ratio property. Furthermore, the optimal policy has at most four action regions.
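For orientation, a hedged sketch of the first condition using standard definitions (notation chosen here, not taken from the article): a transition probability matrix $P = (p_{ij})$ is totally positive of order 2 (TP2) if every $2 \times 2$ minor is nonnegative, i.e.

\[
p_{ij}\, p_{kl} \;\ge\; p_{il}\, p_{kj} \qquad \text{for all } i \le k,\; j \le l .
\]

Under such monotonicity conditions, a control limit policy takes the stronger action (for example, repair or replacement) exactly when the belief state is beyond a fixed threshold in the relevant stochastic order; the weak multivariate monotone likelihood ratio condition on the monitors plays an analogous role, ensuring that larger observation vectors shift the posterior belief toward more deteriorated states.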
Practical implications
If the optimal policy is known in advance to be a control limit policy, the substantial computation required to find it can be greatly reduced, enabling the best decision to be identified much more quickly.
Originality/value
A deterioration system monitored incompletely by a single monitor has been studied in previous research. This research considers the case of multiple monitors whose observations are not independent.
Citation
Jin, L., Mashita, T. and Suzuki, K. (2005), "An optimal policy for partially observable Markov decision processes with non‐independent monitors", Journal of Quality in Maintenance Engineering, Vol. 11 No. 3, pp. 228-238. https://doi.org/10.1108/13552510510616441
Publisher
Emerald Group Publishing Limited
Copyright © 2005, Emerald Group Publishing Limited