Project Summary

Our project, Golem Globe, is a modified implementation of Wumpus World in Minecraft. The goal is to create a Minecraft AI agent that can navigate a maze containing pits and monsters, find the gold, and then return to the start position.

Approach

We begin by training our agent on a static map. The agent learns with a tabular Q-table: it is rewarded for successfully traversing the map and penalized for failing to do so. During a traversal, the agent stores every observation it makes in memory and associates each one with its corresponding reward. When the traversal ends (either by finding the gold or by dying), the agent maps all the actions it took to the total reward received. We designed the agent to prioritize long-term reward, and we added some randomness to its action selection so that it would encounter more scenarios.

By running the agent repeatedly on the same map and recording the results of previous attempts, the agent comes to associate its observations with the rewards it received, and thus learns which actions to take to maximize its reward.
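To make this concrete, below is a minimal sketch of the kind of end-of-episode tabular update described above. It is illustrative only: the action names, hyperparameter values (alpha, gamma, epsilon), and data layout are assumptions for this sketch, not our exact implementation.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.95   # high discount factor, so long-term reward dominates
EPSILON = 0.1  # probability of taking a random action (exploration)

ACTIONS = ["movenorth", "movesouth", "moveeast", "movewest"]  # illustrative
q_table = defaultdict(float)  # (observation, action) -> estimated return

def choose_action(observation):
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(observation, a)])

def update_from_episode(trajectory):
    """End-of-episode update: once the agent finds the gold or dies, back up
    the discounted total reward onto every (observation, action) it visited.

    trajectory: list of (observation, action, reward) tuples, in order taken.
    """
    G = 0.0
    for observation, action, reward in reversed(trajectory):
        G = reward + GAMMA * G  # discounted return from this step onward
        key = (observation, action)
        q_table[key] += ALPHA * (G - q_table[key])  # nudge estimate toward G
```

A high gamma makes rewards near the end of the episode (reaching the gold, or dying) dominate the update, which is what lets the agent prioritize the long-term outcome over immediate step costs.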

Evaluation

To evaluate the performance of our agent, we use a combination of qualitative and quantitative metrics.

Quantitative:

The primary quantitative metrics we use are the cumulative reward received and the success rate. The cumulative reward is calculated by summing the agent's rewards over all actions it took, while the success rate is the ratio of successful traversals to total attempts. If our agent receives a higher average cumulative reward and achieves a greater success rate after training, we consider the implementation successful.
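As a concrete illustration, these two metrics could be computed from per-episode logs roughly as follows (the log format here is hypothetical):

```python
def summarize(episodes):
    """Compute average cumulative reward and success rate from episode logs.

    episodes: list of dicts like {"rewards": [r1, r2, ...], "success": bool}
    (a hypothetical log format, for illustration only).
    """
    cumulative = [sum(ep["rewards"]) for ep in episodes]
    avg_reward = sum(cumulative) / len(cumulative)
    success_rate = sum(ep["success"] for ep in episodes) / len(episodes)
    return avg_reward, success_rate
```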

As an example, here are our results from before and after training:

Pre-Training Quantitative Results

As you can see, before training the agent consistently failed to complete its missions.

Test#      Result    Reward     Steps
Test 1     Fail      -1016      16
Test 2     Fail      -1054      54
Test 3     Fail      -1115      115
Test 4     Fail      -1010      10
Test 5     Fail      -1036      36
Average    0%        -1046.2    46.2

Post-Training Quantitative Results

These results show that after a large number of training episodes, the agent not only completes its missions but also finds short paths to the gold. Because each step carries a small penalty, a higher cumulative reward corresponds directly to a shorter path.

Test#      Result     Reward    Steps
Test 1     Success    989       11
Test 2     Success    987       13
Test 3     Success    985       15
Test 4     Success    981       19
Test 5     Success    981       19
Average    100%       984.6     15.4

Qualitative:

To verify that the project works, we first run the agent on a controlled map that does not change (the training data). If the agent succeeds, it moves on to randomized maps for further testing and learning. Qualitatively, we consider the difficulty of the maps our agent can successfully traverse and how easily it adapts to new maps.

Our moonshot goal is an agent that never dies and is always able to retrieve the gold, on every map.

Remaining Goals and Challenges

Goals

Challenges

Resources Used

Below is a list of resources we found helpful throughout the development of our project: