Exercises on Machine Learning

Part 1: Watch the 6 videos on YouTube from the online course “Data Mining with WEKA”

You are free to sign up for the course and watch the same videos there, but there is no need to, and this is not an endorsement of the course.

In either case, you should download the WEKA software and follow along with the videos. The download instructions are in the second video (I believe the latest stable version is 3.8, but the process is largely automated and I think you’ll get the latest version suitable for your computer). All of this will take about 90 minutes, and probably less if you speed up the videos.

The narrator is entertaining and well known in data mining.

You can work on the exercise below alone or in a group, but group members must be together, physically or on a video call, for the entirety of the exercise itself. If you do the exercise in a group, each of you will turn in a copy of the deliverable (and state the names of your partners in the submission text that accompanies the upload).

Part 2: (Backstory) Agriculture is an important aspect of sustainability. Imagine farmers in the developing world, where mobile phones are somewhat prevalent, taking a photo of a diseased soybean plant and sending it to an international agency for diagnosis and treatment options. At the core of the receiving agency’s analysis is a classifier that was constructed by a supervised machine learning system. The photo from the farmer is processed and translated into a set of features of the type accepted by the classifier. The imaged plant is diagnosed through classification, and recommendations are sent back to the farmer.

Part 3: (the deliverable).

  • Run WEKA Explorer.
  • Select the J48 classifier with default options, except
    • with minNumObj set to 10
  • Load the WEKA Soybean dataset, but
    • only for instances that belong to one of the four most common classes, and
    • with the Date attribute removed
  • Upload to Brightspace an image (JPG, PNG, etc.) of the tree (in nested list format) that is constructed by running J48 on the filtered Soybean dataset as described above.
  • Come to class prepared to discuss the trees you found with various other parameter settings and your theories about the results.
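
The dataset filtering in Part 3 is meant to be done inside WEKA Explorer (selecting the class attribute on the Preprocess tab shows the per-class counts, and filters such as RemoveWithValues and Remove can drop instances and attributes). If you want to cross-check what the filtered data should look like, the following is a minimal Python sketch of the same idea: keep only the instances of the four most frequent classes and drop one attribute. Everything in it is an assumption for illustration: the function name filter_arff, the toy SAMPLE data (made up, not the real Soybean file), and the simplifications that the file is a dense ARFF with one instance per line, no quoted commas, and the class as the last attribute.

```python
from collections import Counter


def filter_arff(text, drop_attr="date", top_k=4):
    """Keep rows of the top_k most frequent classes and drop one attribute.

    Assumes a dense ARFF file, one instance per line, no quoted commas,
    and the class stored as the last attribute (hypothetical helper).
    """
    preamble, attrs, rows = [], [], []
    in_data = False
    for line in text.splitlines():
        s = line.strip()
        if not s or s.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append([v.strip() for v in s.split(",")])
        elif s.lower().startswith("@attribute"):
            attrs.append(s)
        elif s.lower().startswith("@data"):
            in_data = True
        else:
            preamble.append(s)  # e.g. the @relation line

    # Locate the attribute to drop by name (case-insensitive).
    names = [a.split()[1].strip("'\"").lower() for a in attrs]
    drop = names.index(drop_attr.lower())

    # Count class labels and keep the top_k most frequent ones.
    counts = Counter(row[-1] for row in rows)
    keep = {cls for cls, _ in counts.most_common(top_k)}

    out = preamble + [a for i, a in enumerate(attrs) if i != drop] + ["@data"]
    for row in rows:
        if row[-1] in keep:
            out.append(",".join(v for i, v in enumerate(row) if i != drop))
    return "\n".join(out) + "\n", keep


# Toy data for illustration only; class "e" is the least frequent.
SAMPLE = """
@relation toy
@attribute date {may,june}
@attribute leaves {norm,abnorm}
@attribute class {a,b,c,d,e}
@data
may,norm,a
june,abnorm,a
may,abnorm,b
june,norm,b
may,norm,c
june,norm,c
may,abnorm,d
june,abnorm,d
may,norm,e
"""

filtered, kept = filter_arff(SAMPLE)
print(sorted(kept))
print(filtered)
```

Note that the sketch leaves the class attribute declaration unchanged, so the header still lists classes that no longer have instances. Whether your WEKA filtering does the same is one of the things worth checking when you compare trees.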