Three basic frequent itemset mining methods
(1) Apriori principle
- Downward closure property: a k-itemset is frequent only if all of its sub-itemsets are frequent; equivalently, all subsets of a frequent itemset must be frequent (see the sketches after this list).
//An itemset is frequent only if all of its subsets are frequent.
- Apriori property: any super-pattern of a non-frequent pattern cannot be frequent.
- Antimonotone: if a set cannot pass a test, all of its supersets will fail the same test as well. The property is called "antimonotone" because it is monotonic in the context of failing a test.
//If an itemset is not frequent, its supersets cannot be frequent either.
- Horizontal data format: {TID: itemset}
(2) FP-tree: frequent pattern tree
- horizontal data format
(3) Vertical data format
- {item: TID_set}
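
To illustrate the Apriori principle above, here is a minimal level-wise sketch in Python over a horizontal database {TID: itemset}. The function name apriori, the min_sup parameter, and the toy database are illustrative assumptions, not a reference implementation.

from itertools import combinations

def apriori(db, min_sup):
    # db is a horizontal database {TID: itemset}; returns
    # {frozenset(itemset): support} for every frequent itemset.
    transactions = [frozenset(items) for items in db.values()]

    def frequent_of(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    # Level 1: frequent 1-itemsets.
    frequent = frequent_of({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Downward closure / antimonotone pruning: keep a
                # k-itemset only if every (k-1)-subset is frequent.
                if len(union) == k and all(
                        frozenset(s) in frequent
                        for s in combinations(union, k - 1)):
                    candidates.add(union)
        frequent = frequent_of(candidates)   # candidate generate-and-test
        result.update(frequent)
        k += 1
    return result

db = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "c"}, 4: {"b", "c"}}
print(sorted((tuple(sorted(s)), n) for s, n in apriori(db, min_sup=2).items()))
# -> 1-itemsets a, b, c (support 3) and 2-itemsets ab, ac, bc (support 2)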
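
The FP-tree compresses the database by reordering each transaction by descending item support and sharing common prefixes as tree paths. Below is a minimal construction sketch only (mining via conditional pattern bases is omitted); the FPNode class and the simplified list-based header table are assumptions for illustration.

from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Count supports and keep only the frequent items.
    support = Counter(i for t in transactions for i in set(t))
    support = {i: c for i, c in support.items() if c >= min_sup}
    root = FPNode(None, None)
    header = {}                    # item -> list of tree nodes for that item
    for t in transactions:
        # Reorder by descending support so that transactions with
        # common frequent prefixes share the same path.
        items = sorted((i for i in set(t) if i in support),
                       key=lambda i: (-support[i], i))
        node = root
        for i in items:
            if i not in node.children:
                node.children[i] = FPNode(i, node)
                header.setdefault(i, []).append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
root, header = build_fp_tree(transactions, min_sup=2)
# Each item's node counts sum to its support: {'a': 3, 'b': 3, 'c': 3}
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})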
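
For the vertical data format, support is counted by intersecting TID sets instead of scanning transactions. A tiny illustration, using the same toy database as the Apriori sketch above:

# Vertical data format: item -> TID_set.
vertical = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 3, 4}}
# The support of {a, b} is the size of the intersection of the two TID sets.
print(len(vertical["a"] & vertical["b"]))   # -> 2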
a pattern a is a closed frequent pattern in a data set D if
(1) a is frequent
(2) there exists no proper super-pattern b such that b has the same support as a in D
a pattern a is a maximal frequent pattern in a data set D if
(1) a is frequent
(2) there exists no super-pattern b such that a ⊂ b and b is frequent in D
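
To make the two definitions concrete, here is a small sketch that filters closed and maximal patterns out of an already-mined map {itemset: support}, such as the output of the apriori sketch above; closed_and_maximal is a hypothetical helper name.

def closed_and_maximal(freq):
    # freq maps frozenset(itemset) -> support for every frequent itemset.
    closed, maximal = set(), set()
    for a, sup_a in freq.items():
        # Closed: no proper super-pattern has the same support as a.
        if not any(a < b and sup_b == sup_a for b, sup_b in freq.items()):
            closed.add(a)
        # Maximal: no proper super-pattern is frequent at all.
        if not any(a < b for b in freq):
            maximal.add(a)
    return closed, maximal

For example, with the two transactions {a1, ..., a100} and {a1, ..., a50} and min_sup = 1, the closed frequent itemsets are {a1, ..., a50} (support 2) and {a1, ..., a100} (support 1), while {a1, ..., a100} is the only maximal frequent itemset.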
Sequential pattern mining
Several common algorithms:
GSP: A Sequential Pattern Mining Algorithm Based on Candidate Generate-and-Test
SPADE: An Apriori-Based Vertical Data Format Sequential Pattern Mining Algorithm
PrefixSpan: Prefix-Projected Sequential Pattern Growth
Performance: PrefixSpan > SPADE > GSP
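
PrefixSpan avoids candidate generation by recursively projecting the database on each frequent prefix. Below is a minimal sketch of that idea, restricted to sequences of single items (real sequence databases allow itemset elements); the function name prefixspan and the toy sequences are assumptions for illustration.

from collections import defaultdict

def prefixspan(sequences, min_sup):
    # Returns {frequent sequential pattern (tuple): support}.
    results = {}

    def mine(prefix, projected):
        # Count the support of each item that can extend the current prefix.
        counts = defaultdict(int)
        for seq, start in projected:
            seen = set()
            for item in seq[start:]:
                if item not in seen:
                    counts[item] += 1
                    seen.add(item)
        for item, sup in counts.items():
            if sup < min_sup:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = sup
            # Projected database: the suffix after the first occurrence
            # of `item` in each sequence that still contains it.
            new_projected = []
            for seq, start in projected:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        new_projected.append((seq, i + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(seq, 0) for seq in sequences])
    return results

db = [list("abcb"), list("abc"), list("bca"), list("abbca")]
patterns = prefixspan(db, min_sup=2)
print(patterns.get(("a", "b")))   # -> 3; sequences 1, 2 and 4 contain a then b

Each recursive call scans only the suffixes that follow the current prefix, which is the main reason prefix projection tends to outperform candidate generate-and-test approaches such as GSP.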