Frequent pattern discovery

Frequent pattern discovery (also FP discovery, FP mining, or frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining. It describes the task of finding the most frequent and relevant patterns in large datasets.[1][2] The concept was first introduced for mining transaction databases.[3] Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a dataset with a frequency no less than a user-specified or automatically determined threshold.[2][4]
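
The threshold-based definition can be illustrated with a minimal sketch. The example below assumes the patterns are itemsets, the threshold is an absolute count (here called min_support), and the data is a small hypothetical list of transactions. The function name and the toy data are illustrative and do not come from the cited sources. It enumerates every subset of every transaction, which is feasible only for very small inputs.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least
    min_support transactions (brute force, tiny inputs only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items
        # Count every non-empty subset of this transaction once.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    # Keep only itemsets whose frequency is no less than the threshold.
    return {s: n for s, n in counts.items() if n >= min_support}


# Hypothetical toy transaction database (illustrative only).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent = frequent_itemsets_bruteforce(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 2, each single item and each pair qualifies, while the three-item set appears in only one transaction and is excluded.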

Techniques

Techniques for FP mining include:

For the most part, FP discovery can be done using association rule learning, in particular with the Eclat, FP-growth, and Apriori algorithms.
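
As an illustration of one of these algorithms, the following is a simplified sketch in the style of Apriori, not a reference implementation. It assumes an absolute support threshold and a small in-memory list of transactions. The function name apriori, its parameters, and the toy data are chosen for this example only. The sketch builds candidate itemsets level by level and discards any candidate that has an infrequent subset, since such a candidate cannot itself be frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: count} for all itemsets contained in at least
    min_support transactions, using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate, then filter.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transaction database (illustrative only).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = apriori(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The pruning step is what separates this approach from exhaustive enumeration: a candidate is counted against the data only if all of its smaller subsets already met the threshold.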

Other strategies include:

and respective specific techniques.

Implementations exist in various machine learning systems and modules, such as MLlib for Apache Spark.[5]

References

  1. Jiawei Han; Hong Cheng; Dong Xin; Xifeng Yan (2007). "Frequent pattern mining: current status and future directions" (PDF). Data Mining and Knowledge Discovery. 15: 55–86. doi:10.1007/s10618-006-0059-1. Retrieved 2019-01-31.
  2. "Frequent Pattern Mining". SIGKDD. 1980-01-01. Retrieved 2019-01-31.
  3. Agrawal, Rakesh; Imieliński, Tomasz; Swami, Arun (1993-06-01). "Mining association rules between sets of items in large databases". ACM SIGMOD Record. 22 (2): 207–216. CiteSeerX 10.1.1.217.4132. doi:10.1145/170036.170072. ISSN 0163-5808.
  4. "Frequent pattern Mining, Closed frequent itemset, max frequent itemset in data mining". T4Tutorials. 2018-12-09. Retrieved 2019-01-31.
  5. "Frequent Pattern Mining". Spark 2.4.0 Documentation. Retrieved 2019-01-31.