The goal of reinforcement learning is to learn a policy, a mapping from states to actions, that maximizes the expected cumulative reward over time. Cities already produce large quantities of data (from security cameras and environmental sensors) and contain significant infrastructure networks (like People managing ta