A check-in is a weekly opportunity to score some points and confirm that you are keeping up with the course content. The requirements vary from week to week, but may involve responding to a survey, taking a brief online quiz, or participating in a discussion. Each check-in is due by midnight on the designated day.

  1. +25
    • Skipped
  2. +25
    • Follow the instructions for the OpenAI gym setup, in order to get the script working on your system. Save the script and a screenshot named checkin2.png to your repository folder, commit, and push.
  3. +25
    • Read and download my program (in the test-demos folder of the public gitlab project). The function defined in that file has two doc tests that pass, but the function is actually full of bugs. Write several additional tests, and see if you can identify any of the bugs. You may either continue using doc tests, or switch to the unittest module as I demonstrated. Commit your updated file to your gitlab project.
  4. +25
    • Read and download my program in the bandits folder of the public gitlab project. The main program is at the bottom of the file, so run it to do the following tests. Save your answers and notes to the file c4bandit.txt in your repository.
    • The first chunk of output is from an agent called NaiveGambler. What is its average reward?
    • The next chunk is from an agent called BasicEstimatingGambler. What is its average reward?
    • Now we’re going to try to improve BasicEstimatingGambler. Currently the gambler only updates its estimates when exploring. Add or move around code so it updates estimates when exploiting too. Does that improve the average reward?
    • Currently the self.exploreRate is set to 0.6. Try a few numbers larger and a few smaller. What is the relationship between exploreRate and the average reward? Leave exploreRate set to the best value you found.
    • Finally, let’s take a look at the use of argmax in the exploit code. This finds the index of the largest estimated value. But what if there’s a tie? argmax will always return the left-most index equal to the max. Try to use this technique to break ties randomly instead. Does that improve the average reward?
  5. +25
    • TBD
  6. +25
    • TBD
  7. +25
    • TBD
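
For check-in 3, here is a minimal sketch of the two testing styles side by side. The function clamp and the class ClampTests are hypothetical names made up for illustration; they are not the function in the test-demos folder. The point is that a couple of passing doc tests say little about edge cases, so the extra tests aim at boundaries and out-of-range inputs.

```python
import unittest


def clamp(x, lo, hi):
    """Limit x to the range [lo, hi] (hypothetical example function).

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    """
    return max(lo, min(x, hi))


class ClampTests(unittest.TestCase):
    """Extra cases that the two doc tests above do not cover."""

    def test_above_range(self):
        self.assertEqual(clamp(42, 0, 10), 10)

    def test_boundaries(self):
        self.assertEqual(clamp(0, 0, 10), 0)
        self.assertEqual(clamp(10, 0, 10), 10)

    def test_negative_range(self):
        self.assertEqual(clamp(-7, -5, -1), -5)


def run_unit_tests():
    """Run the unittest cases and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_unit_tests()
```

If you save this as a file, the doc tests can be run separately with python -m doctest -v followed by the file name.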
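
For check-in 4, the two changes in question (updating estimates after every pull, and breaking argmax ties randomly) can be sketched roughly as follows. This is one possible approach using only the standard library, not the code in the bandits folder: SketchGambler, random_argmax, average_reward, and the Gaussian reward model are all assumptions made up for illustration.

```python
import random


def random_argmax(values, rng=random):
    """Return the index of a maximal entry, chosen uniformly among ties."""
    best = max(values)
    candidates = [i for i, v in enumerate(values) if v == best]
    return rng.choice(candidates)


class SketchGambler:
    """Hypothetical epsilon-greedy agent (not BasicEstimatingGambler)."""

    def __init__(self, num_arms, explore_rate=0.1, seed=None):
        self.rng = random.Random(seed)
        self.explore_rate = explore_rate
        self.estimates = [0.0] * num_arms  # running mean reward per arm
        self.counts = [0] * num_arms       # number of pulls per arm

    def choose(self):
        if self.rng.random() < self.explore_rate:
            return self.rng.randrange(len(self.estimates))  # explore
        return random_argmax(self.estimates, self.rng)      # exploit

    def update(self, arm, reward):
        # Called after every pull, whether it was an explore or an exploit.
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]


def average_reward(agent, arm_means, steps, seed=0):
    """Run the agent on Gaussian arms (an assumed model) and average the reward."""
    env = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        arm = agent.choose()
        reward = env.gauss(arm_means[arm], 1.0)
        agent.update(arm, reward)
        total += reward
    return total / steps


if __name__ == "__main__":
    gambler = SketchGambler(num_arms=2, explore_rate=0.1, seed=1)
    print(average_reward(gambler, [0.0, 5.0], steps=2000))
```

Keeping the update in its own method means the same line runs regardless of how the arm was chosen, which is the behavior the third question asks you to try. Whether random tie-breaking actually helps is something to measure in your own runs.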
© 2017 Christopher League  ·  Some rights reserved — CC by-sa