Nearest centroid classifier
In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation.

Rocchio Classification
When applied to text classification using tf*idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]
An extended version of the nearest centroid classifier, based on shrunken centroids, has been applied in the medical domain, specifically to the classification of tumors.[2]
Algorithm
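- Training procedure: given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
- Prediction function: the class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|$.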
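The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `fit_centroids` and `predict`, the use of Euclidean distance, and the toy data are assumptions made for the example, and ties are broken by whichever class was seen first.

```python
import math


def fit_centroids(samples, labels):
    """Training: return a dict mapping each class label to the mean of its samples."""
    sums = {}
    counts = {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def predict(centroids, x):
    """Prediction: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: euclidean(centroids[y], x))


# Toy two-class data, made up for illustration.
samples = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
labels = ["a", "a", "b", "b"]

centroids = fit_centroids(samples, labels)
print(centroids)                        # {'a': [0.0, 1.0], 'b': [5.0, 4.0]}
print(predict(centroids, (1.0, 1.0)))   # a
print(predict(centroids, (4.0, 3.0)))   # b
```

Training takes a single pass over the data, and prediction compares the observation against one centroid per class, so the cost of classifying a point does not grow with the number of training samples.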
References
1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. doi:10.1073/pnas.082099299. PMC 124443. PMID 12011421.