2009年7月23日 星期四

Frequent pattern mining

三種基本的frequent itemset mining方法

(1) Apriori principle
  • A downward closure property(向下封閉)
    a k-itemset is frequent only if all of its sub-itemsets are frequent.
    all subset of a frequent itemset must be frequent.
    //一個itemset只有當他所有的subset都是frequent時,才會是frequent

  • Apriori property
    any super-pattern of a nonfrequent pattern cannot be frequent

  • Antimonotone, if a set cannot pass a test, all its super set will fail the same test as well. It is called "antimonotone" because the property is monotonice in the context of failing a test.

    //若一個itemset不frequent,它的superset也不會是frequent

  • horizontal data format
    {TID: itemset}
(2) FP-growth
  • FP-tree: frequent pattern tree
  • horizontal data format
(3) eclat
  • vertical data format
    {item: TID_set}
closed frequent pattern & maximal frequent pattern (max-pattern)
a pattern a is a closed frequent pattern in a data set in D if
(1) a is frequent
(2) there exists no proper super-pattern b such that b has the same support as a in D

a pattern a is a maximal frequent pattern in a data set in D if
(1) a is frequent
(2) there exists no super-pattern b such that aclip_image002b
(3) b is frequent in D

Sequential pattern mining

常見的幾種算法
GSP: A Sequential Pattern Mining Algorithm Based on Candidate Generate-and-Test
SPADE: An Apriori-Based Vertical Data Format Sequential Pattern Mining Algorithm
PrefixSpan: Prefix-Projected Sequential Pattern Growth

性能: PrefixSpan > SPADE > GSP

1 則留言:

匿名 提到...

Interesting story you got here. It would be great to read something more about this theme. Thnx for posting that info.
Joan Stepsen
Hi tech gadgets