The ${\displaystyle k}$-nearest neighbor (k-NN) algorithm can be utilized for both classification and regression. It is commonly used when there is little or no prior knowledge about the distribution of the data. In other words, the advantage of ${\displaystyle k}$-nearest neighbors is that it enables discriminant analysis when reliable parametric estimates of probability densities are unknown. Note that ${\displaystyle k}$-nearest neighbor is a local and non-parametric method where the parameter ${\displaystyle k}$ controls the size of locality.

In order to perform ${\displaystyle k}$-nearest neighbors, a suitable ${\displaystyle k}$ must be chosen along with a distance measurement that is acceptable for the type of predictor data. Specifically, if the predictors are numerical, a Euclidean distance (such as the ${\displaystyle L_{1},L_{2}}$, or ${\displaystyle L_{\infty }}$ norms) can be used. If the data are categorical, some options are the Hamming distance and the Value Difference Measure (VDM). In practice, the choice of ${\displaystyle k}$ is usually determined by use of ${\displaystyle k}$-fold cross-validation. [1]

## K-nn Algorithm

Consider a matrix of training predictors ${\displaystyle {\textbf {X}}={\begin{bmatrix}x_{1,1}&x_{1,2}&\cdots &x_{1,m}\\x_{2,1}&x_{2,2}&\cdots &x_{2,m}\\\vdots &\vdots &\ddots &\vdots \\x_{n,1}&x_{n,2}&\cdots &x_{n,m}\\\end{bmatrix}}={\begin{bmatrix}{\textbf {x}}_{1}\\{\textbf {x}}_{2}\\\vdots \\{\textbf {x}}_{n}\\\end{bmatrix}}}$ with an associated response vector ${\displaystyle {\textbf {y}}={\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\\\end{bmatrix}}.}$ Note that ${\displaystyle {\textbf {y}}}$ may be either a vector of classes (i.e. categorical responses) or a vector of numerical values. The former case is known as ${\displaystyle k}$-nearest neighbors classification, while the latter is known as ${\displaystyle k}$-nearest neighbor regression. Next, consider a matrix ${\displaystyle {\textbf {Z}}}$ of test predictors for which we would like to predict the response. Denote ${\displaystyle {\textbf {Z}}}$ as ${\displaystyle {\textbf {Z}}={\begin{bmatrix}z_{1,1}&z_{1,2}&\cdots &z_{1,m}\\z_{2,1}&z_{2,2}&\cdots &z_{2,m}\\\vdots &\vdots &\ddots &\vdots \\z_{p,1}&z_{p,2}&\cdots &z_{p,m}\\\end{bmatrix}}={\begin{bmatrix}{\textbf {z}}_{1}\\{\textbf {z}}_{2}\\\vdots \\{\textbf {z}}_{n}\\\end{bmatrix}}.}$

The ${\displaystyle k}$-nearest neighbors algorithm is as follows:

Step 1: Choose ${\displaystyle k}$

Step 2: For each ${\displaystyle {\textbf {z}}_{i}}$ measure the distance to the ${\displaystyle k}$-nearest ${\displaystyle {\textbf {x}}_{j}}$'s. In other words measure the distance from each observation in ${\displaystyle {\textbf {Z}}}$ to the ${\displaystyle k}$-nearest observations in ${\displaystyle {\textbf {X}}}$.

Step 3: For classification (i.e. if ${\displaystyle {\textbf {y}}}$ is a categorical vector of responses), predict the response for ${\displaystyle {\textbf {z}}_{i}}$ as the class (i.e. category) that occurs the greatest number of times among the ${\displaystyle k}$ nearest ${\displaystyle {\textbf {x}}_{j}'s}$. For regression (i.e. if ${\displaystyle {\textbf {y}}}$ is a real valued vector of responses), predict the response for ${\displaystyle {\textbf {z}}_{i}}$ as the average of the ${\displaystyle k}$ responses among the ${\displaystyle k}$ nearest ${\displaystyle {\textbf {x}}_{j}'s}$.

## Controversies

The ${\displaystyle k}$-NN algorithm does not learn anything from the training data and simply uses the training data itself for classification. To predict the label of a new instance the ${\displaystyle k}$-NN algorithm will find the ${\displaystyle k}$ closest neighbors to the new instance from the training data, the predicted class label will then be set as the most common label among the ${\displaystyle k}$ closest neighboring points. This disadvantage can result in the algorithm not generalizing well and also not being robust to noisy data. Additionally, The estimation of ${\displaystyle k}$ can be problematic. For example, suppose ${\displaystyle k}$-fold cross validation reveals the presence of two minimums. This leads to the issue of deciding which minimum is optimal. [1]

## History

1951 : Fix and Hodges introduced a non-parametric method for pattern classification that has since become known the k-nearest neighbor rule

1967 : Cover and Hart showed that some of the formal properties of the ${\displaystyle k}$-nearest neighbor rule were worked out. [2]

## Top 5 Recent Tweets

Date Author Tweet
13 Dec 2015 @PeterOliverCaya Evaluating credit risk using K-nearest neighbors:
11 Dec 2015 @soulparking is getting my k-Nearest Neighbors running with my IRIS dataset #datascience https://www.swarmapp.com/c/juL7n4fTe8X
10 Dec 2015 @bloopish The k-NN algorithm is among the simplest of all #MachineLearning algorithms http://ow.ly/TBsPN
26 Nov 2015 @tetmorikawa #KNN (K-Nearest Neighbors) classifier can do a decent job in this context, I guess. #DataScience http://www.edvancer.in/logistic-regression-vs-decision-trees-vs-svm-part1/
7 Nov 2015 @vnfrombucharest Word Mover's Distance for K nearest neighbors document classification, @scikit_learn friendly code w/ Matt Kusner! http://vene.ro/blog/word-movers-distance-in-python.html

## Top 5 Recent News Headlines

21 Apr 2015 Kris Bryant: Which players are most similar? http://www.beyondtheboxscore.com/2015/4/21/8460365/kris-bryant-similar-players-katoh-k-nearest-neighbors
19 Jun 2015 The future of data at scale http://radar.oreilly.com/2015/06/the-future-of-data-at-scale.html
9 Jul 2015 Large-scale Direct Targeting for Drug Repositioning and Discovery http://www.nature.com/articles/srep11970
1 Jul 2015 A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods http://www.nature.com/hdy/journal/v115/n6/full/hdy201557a.html
10 AUG 2015 Amazon talks new predictive model for personalised recommendations http://www.cio.com.au/article/581562/amazon-talks-new-predictive-model-personalised-recommendations/

Date Author Tweet
4 Feb 2012 @bigdata k nearest neighbors: Parallel kNN Joins for Big Data in Hadoop/MapReduce, algo tested on 100M+ records & 30+ Dimensions http://goo.gl/XduYc
21 Feb 2012 @Gephi RT @pcbje: Visualizing k-nearest neighbors with #Gephi http://pcbje.com/post/17432986180/visualizing-k-nearest-neighbors-with-gephi
31 Mar 2014 @peterneubauer Fantastic @_nicolemargaret! RT @newsycombinator: Movie Recommendations with k-Nearest Neighbors and Cosine Similarity http://gist.neo4j.org/?8173017
18 Feb 2014 @YhatHQ K-nearest neighbors implemented in javascript http://bit.ly/1jNUiNt
2 August 2013 @yurevich Nice and easy - http://blog.yhathq.com/posts/classification-using-knn-and-python.html … - using K-nearest neighbors algorithm for classification with python and pandas