Nearest Neighbor

Summary	This assignment engages students in basic Machine-Learning concepts and implementation, including classification and similarity-based search, with minimal background knowledge. Students are tasked with implementing representative distance functions and applying them towards the task of classification across several small datasets.
Topics	Similarity-based search, classification
Audience	K-12/CS1 or AI/CS-Outreach
Difficulty	Easy/moderate. The supplied homework would require 1 week to complete.
Strengths	The primary strength of this project is exposure to real-world Machine-Learning concepts and a useful algorithm with little-to-no computational background. The presentation and type-along program assume no CS/AI/ML/programming background, only basic arithmetic. The homework can be used within one month of the start of a typical CS1 class (depending on the instruction language), includes self-grading tests for fast feedback, and offers numerous possibilities for student extension/exploration. The accompanying practice problems prepare students for the syntactic and conceptual programming necessary to implement nearest-neighbor classification and the associated distance functions. The materials have been successfully employed within an undergraduate CS1-level course, a graduate introductory CS course, and one outreach event to attract women of diverse backgrounds to study Computer Science.
Weaknesses	Because the assignment aims to minimize requisite background knowledge, its depth and extent are quite limited. The supplied materials assume Python 3; nearly any language would work, though deployment in a class might be slower.
Dependencies	Knowledge: for CS1, Python (variables, expressions, lists, loops, conditionals, functions, interactive execution); for outreach, none. Requirements: Python 3.
Variants	Obvious extensions include file I/O for more interesting datasets, automatically visualizing/explaining results, and kNN (i.e., k > 1). For outreach, once similarity-based search is explained, similarity-based clustering is a natural next step.
Acknowledgments	We would like to acknowledge and express our gratitude to Byron Wallace for valuable discussion on the assignment, contributions to the handout, and support in evaluating the assignment within an introductory data science course.
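
To make the summary concrete, the following is a minimal sketch of nearest-neighbor classification with two representative distance functions. It is not the supplied starter code: the function names, the choice of Euclidean and Manhattan distance, and the toy dataset are all assumptions made for illustration, and it uses only the Python features listed under Dependencies plus the standard `math` module.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two lists of numbers."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(query, examples, distance=euclidean_distance):
    """Return the label of the example closest to `query`.

    `examples` is a list of (features, label) pairs (a hypothetical layout,
    not necessarily the one used in the assignment).
    """
    best_label = None
    best_distance = float("inf")
    for features, label in examples:
        d = distance(query, features)
        if d < best_distance:
            best_distance = d
            best_label = label
    return best_label


# Made-up toy data: [height_cm, weight_kg] with a label.
examples = [
    ([20.0, 4.0], "cat"),
    ([25.0, 5.0], "cat"),
    ([60.0, 30.0], "dog"),
    ([55.0, 25.0], "dog"),
]

print(nearest_neighbor([22.0, 4.5], examples))                       # cat
print(nearest_neighbor([58.0, 28.0], examples, manhattan_distance))  # dog
```

Passing the distance function as a parameter lets the same classification loop be reused with each distance function a student implements.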

Assignment Components

Instructions
Textual overview of the project; example inputs/outputs
Starter Code
Commented Python code with missing blocks; unit tests; practice problems for a supplementary-instruction/lab
Outreach Materials
ML-introductory slides (with type-along Python code), demo program
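
As one possible follow-up to the starter code, the kNN extension listed under Variants (k > 1) might be sketched as below. This is an illustrative assumption rather than part of the supplied materials: the names `squared_distance` and `knn_classify`, the majority-vote rule, and the toy points are all hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, examples, k=3):
    """Return the majority label among the k examples closest to `query`.

    `examples` is a list of (features, label) pairs. If two labels tie,
    the one seen first among the nearest neighbors wins.
    """
    ranked = sorted(examples, key=lambda ex: squared_distance(query, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: one nearby "blue" point surrounded by "red" points.
examples = [
    ([0.5, 0.0], "blue"),
    ([1.0, 0.0], "red"),
    ([0.0, 1.0], "red"),
    ([5.0, 5.0], "blue"),
]

print(knn_classify([0.0, 0.0], examples, k=1))  # blue
print(knn_classify([0.0, 0.0], examples, k=3))  # red
```

The example shows why k matters: with k = 1 the single closest point decides the label, while with k = 3 the two slightly farther "red" points outvote it.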

Nearest Neighbor Classification with almost no background
