We present a horizon-based value iteration algorithm called Reverse
Value Iteration (RVI). Empirical results on a variety of domains,
both synthetic and real, show that RVI often yields speedups of
several orders of magnitude. RVI achieves this by ordering backups by
horizon, giving preference to closer horizons, and thereby avoids
many unnecessary and incorrect backups. We also compare RVI
to related work, including prioritized and partitioned value iteration
approaches, and show that our technique performs favorably.
The techniques used in RVI are complementary to these approaches and
can be combined with them. We prove that RVI
converges and often has better (but never worse) complexity than
standard value iteration. To the authors’ knowledge, this is the first
comprehensive theoretical and empirical treatment of such an approach
to value iteration.
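
To make the idea of ordering backups by horizon concrete, the following is a minimal sketch and not the RVI algorithm itself. It assumes that a state's "horizon" is its breadth-first distance, in reversed transitions, from a goal state, and it compares in-place value iteration sweeps under two state orderings on a toy chain. The names `horizon_order`, `sweep_until_converged`, and the `chain` MDP are hypothetical and introduced only for illustration.

```python
from collections import deque

def horizon_order(transitions, goals):
    """Sort states by BFS distance from the goals along reversed transitions.

    Assumption: this distance stands in for the 'horizon' of a state.
    """
    preds = {s: set() for s in transitions}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for _prob, nxt, _reward in outcomes:
                preds[nxt].add(s)
    dist = {g: 0 for g in goals}
    queue = deque(goals)
    while queue:
        s = queue.popleft()
        for p in preds[s]:
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    # States that cannot reach a goal are backed up last.
    return sorted(transitions, key=lambda s: dist.get(s, float("inf")))

def sweep_until_converged(transitions, order, gamma=1.0, tol=1e-9, max_sweeps=10_000):
    """In-place value iteration; returns the values and the number of sweeps used."""
    values = {s: 0.0 for s in transitions}
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0
        for s in order:
            actions = transitions[s]
            if not actions:  # terminal state: value stays at 0
                continue
            best = max(
                sum(p * (r + gamma * values[n]) for p, n, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values, sweep
    return values, max_sweeps

# Toy deterministic chain 0 -> 1 -> ... -> 5 with a cost of 1 per step; state 5 is the goal.
chain = {s: {"right": [(1.0, s + 1, -1.0)]} for s in range(5)}
chain[5] = {}

near_first = horizon_order(chain, goals=[5])
far_first = list(reversed(near_first))

values_near, sweeps_near = sweep_until_converged(chain, near_first)
values_far, sweeps_far = sweep_until_converged(chain, far_first)
print(sweeps_near, sweeps_far)
```

Both orderings reach the same values, but backing up states nearest the goal first propagates correct values through the whole chain in a single sweep, whereas the opposite ordering spends most of its sweeps on backups that are later overwritten.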