Feature tutorial

I:ZI Miner brings to the web most of the unique features of the 4ft procedure, LISp-Miner's association rule mining procedure.

  • Negation on attributes
  • Disjunction between attributes
  • Subpatterns, which allow scoping of logical connectives
  • 18 interest measures, which can be freely combined
  • Mines directly on multivalued attributes, no need to create "items"
  • Dynamic binning operators
  • PMML-based import and export
  • Computing grid support

In addition to exposing some of LISp-Miner's capabilities on the web, I:ZI Miner adds several new features:

  • Automatic preprocessing
  • Immediate results
  • Relevance feedback (Experimental)
  • Best pattern extension (Experimental)
  • Automatic report generation

Description of individual features

Negation on attributes

Negation is a very concise way to tell the miner to focus on rules that do not contain a specific value or set of values. This value can be fixed by the user or left to be determined automatically. In the latter case, binning wildcards can be used to merge multiple values automatically.

[Figure: Negation]
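To make the semantics concrete, here is a minimal Python sketch. It assumes a record is a dict mapping attribute names to values and a literal is an attribute with a set of accepted values; `Literal` and `holds` are hypothetical names for illustration, not part of I:ZI Miner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: attribute(values), optionally negated."""
    attribute: str
    values: frozenset
    negated: bool = False


def holds(literal: Literal, record: dict) -> bool:
    """Return True if the record satisfies the literal.

    A positive literal holds when the record's value is one of the listed
    values; a negated literal holds exactly when the positive one does not.
    """
    inside = record.get(literal.attribute) in literal.values
    return not inside if literal.negated else inside


record = {"District": "Praha", "Salary": "high"}
# ¬District(Benesov, Bruntal): any district except these two
not_benesov_bruntal = Literal("District", frozenset({"Benesov", "Bruntal"}), negated=True)
print(holds(not_benesov_bruntal, record))  # True
```

In this sketch a record with no value for the attribute fails the positive literal and therefore satisfies the negated one; how I:ZI Miner treats missing values is not covered here.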

Disjunction between attributes

When entering a rule part (antecedent or consequent), I:ZI Miner allows the disjunction connective to be placed between attributes.
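The difference between the two connectives can be sketched as follows, assuming a literal is a hypothetical `(attribute, allowed values)` pair and a record is a dict; `cedent_holds` is an illustrative name, not an I:ZI Miner function.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a literal is (attribute, allowed values).
Literal = Tuple[str, frozenset]


def literal_holds(literal: Literal, record: dict) -> bool:
    attribute, values = literal
    return record.get(attribute) in values


def cedent_holds(literals: Iterable[Literal], record: dict, connective: str = "and") -> bool:
    """Evaluate a rule part (antecedent or consequent) on one record.

    "and" requires every literal to hold; "or" requires at least one.
    """
    results = (literal_holds(lit, record) for lit in literals)
    if connective == "and":
        return all(results)
    if connective == "or":
        return any(results)
    raise ValueError(f"unknown connective: {connective!r}")


record = {"District": "Brno", "Age": "30-40"}
antecedent = [("District", frozenset({"Praha"})), ("Age", frozenset({"30-40"}))]
print(cedent_holds(antecedent, record, "and"))  # False: District does not match
print(cedent_holds(antecedent, record, "or"))   # True: Age matches
```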

Subpatterns - scoping for logical connectives

Disjunction can also be applied to a subpattern only. To create a subpattern, first hover the mouse over each attribute that should form the subpattern and click the star symbol in the floating menu that appears. After you have starred all the attributes, click "Group marked fields". Note that the rule part (antecedent or consequent) must contain at least three attributes for subpatterns to be enabled. Subpatterns cannot be nested, and individual subpatterns are always connected by conjunction.
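The scoping rules above (flat subpatterns, each with its own connective, joined to each other by conjunction) can be sketched like this. The representation of a subpattern as a hypothetical `(connective, literals)` pair is an assumption made for illustration; the three-attribute minimum is a user-interface constraint and is not checked here.

```python
from typing import List, Tuple

Literal = Tuple[str, frozenset]          # hypothetical: (attribute, allowed values)
Subpattern = Tuple[str, List[Literal]]   # hypothetical: (connective, literals)

CONNECTIVES = {"and": all, "or": any}


def literal_holds(literal: Literal, record: dict) -> bool:
    attribute, values = literal
    return record.get(attribute) in values


def subpattern_holds(subpattern: Subpattern, record: dict) -> bool:
    """Apply the subpattern's own connective to its literals."""
    connective, literals = subpattern
    return CONNECTIVES[connective](literal_holds(lit, record) for lit in literals)


def cedent_holds(subpatterns: List[Subpattern], record: dict) -> bool:
    """Subpatterns are flat (no nesting) and always joined by conjunction."""
    return all(subpattern_holds(sp, record) for sp in subpatterns)


# Age(young) AND (Salary(high) OR Education(university))
record = {"Age": "young", "Salary": "low", "Education": "university"}
antecedent = [
    ("and", [("Age", frozenset({"young"}))]),
    ("or", [("Salary", frozenset({"high"})), ("Education", frozenset({"university"}))]),
]
print(cedent_holds(antecedent, record))  # True: Age matches, and Education matches
```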

18 interest measures, which can be freely combined

Commonly used interest measures: Confidence, Support, Lift, Fisher, Chi-Square.
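All of these measures can be computed from the four-field contingency table counts a, b, c, d shown below in this section. The sketch uses the standard textbook formulas and a one-sided Fisher exact test. The function name and the exact variants are assumptions, and I:ZI Miner's implementation may differ in detail, for example in how thresholds are applied.

```python
from math import comb


def measures(a: int, b: int, c: int, d: int) -> dict:
    """Common interest measures from a four-field contingency table.

    a: antecedent and succedent, b: antecedent only,
    c: succedent only, d: neither.
    """
    n = a + b + c + d
    r, s, k, l = a + b, c + d, a + c, b + d  # marginal sums
    chi_denom = r * s * k * l
    return {
        "support": a / n,
        "confidence": a / r if r else 0.0,
        "lift": a * n / (r * k) if r and k else 0.0,
        "chi_square": n * (a * d - b * c) ** 2 / chi_denom if chi_denom else 0.0,
        # One-sided Fisher exact test: probability, under independence with the
        # margins fixed, of seeing at least `a` records in the top-left cell.
        "fisher_p": sum(comb(k, i) * comb(n - k, r - i)
                        for i in range(a, min(r, k) + 1)) / comb(n, r),
    }


print(measures(30, 10, 20, 40))
```

For the table a=30, b=10, c=20, d=40 this gives support 0.3, confidence 0.75 and lift 1.5.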

There are also additional interest measures coming from the GUHA theory: Double Founded Implication, Founded Equivalence, Lower Critical Implication, Upper Critical Implication, Lower Critical Equivalence, Upper Critical Equivalence, Double Lower Critical Implication, Double Upper Critical Implication.
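As an illustration, here is a sketch of four GUHA quantifiers over the four-field table counts a, b, c, d. It assumes the textbook GUHA definitions with a threshold `p`, a minimal count `base` for a, and a significance level `alpha`; the parameter names and exact formulation in I:ZI Miner may differ.

```python
from math import comb

# Assumed textbook GUHA definitions; c and d are accepted by every function
# for a uniform signature even where the formula does not use them.


def founded_implication(a, b, c, d, p, base):
    """a / (a + b) >= p and a >= base (the GUHA counterpart of confidence)."""
    return a >= base and a + b > 0 and a / (a + b) >= p


def double_founded_implication(a, b, c, d, p, base):
    """a / (a + b + c) >= p and a >= base."""
    return a >= base and a + b + c > 0 and a / (a + b + c) >= p


def founded_equivalence(a, b, c, d, p, base):
    """(a + d) / (a + b + c + d) >= p and a >= base."""
    n = a + b + c + d
    return a >= base and n > 0 and (a + d) / n >= p


def lower_critical_implication(a, b, c, d, p, alpha, base):
    """One-sided binomial test of H0: P(succedent | antecedent) <= p.

    Holds when observing at least `a` successes in a + b trials with
    success probability p has probability at most alpha, and a >= base.
    """
    r = a + b
    tail = sum(comb(r, i) * p**i * (1 - p) ** (r - i) for i in range(a, r + 1))
    return a >= base and tail <= alpha


print(founded_implication(30, 10, 20, 40, p=0.7, base=20))                     # True: 30/40 = 0.75
print(lower_critical_implication(30, 10, 20, 40, p=0.5, alpha=0.05, base=20))  # True
```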

Frequencies from the four-field contingency table can also be used as interest measures: a-frequency, b-frequency, c-frequency, d-frequency, r-frequency, s-frequency, k-frequency, l-frequency.


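                Succedent   ¬Succedent
Antecedent          a           b          r
¬Antecedent         c           d          s
                    k           l

Four field contingency table

Here a is the number of records that satisfy both the antecedent and the succedent (consequent), b the number that satisfy the antecedent but not the succedent, c the number that satisfy the succedent but not the antecedent, and d the number that satisfy neither. The margins are r = a + b, s = c + d, k = a + c and l = b + d.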
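The table can be filled by a single pass over the data. This is a minimal sketch, assuming records are dicts and the antecedent and succedent are given as hypothetical predicate functions; `four_field_table` is an illustrative name.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]


def four_field_table(records: Iterable[Record],
                     antecedent: Callable[[Record], bool],
                     succedent: Callable[[Record], bool]) -> Dict[str, int]:
    """Count a, b, c, d and the marginal sums r, s, k, l."""
    a = b = c = d = 0
    for rec in records:
        ant, suc = antecedent(rec), succedent(rec)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return {"a": a, "b": b, "c": c, "d": d,
            "r": a + b, "s": c + d, "k": a + c, "l": b + d}


data = [
    {"District": "Praha", "Loan": "good"},
    {"District": "Praha", "Loan": "bad"},
    {"District": "Brno", "Loan": "good"},
    {"District": "Brno", "Loan": "good"},
]
table = four_field_table(data,
                         lambda rec: rec["District"] == "Praha",
                         lambda rec: rec["Loan"] == "good")
print(table)  # {'a': 1, 'b': 1, 'c': 2, 'd': 0, 'r': 2, 's': 2, 'k': 3, 'l': 1}
```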

Mines directly on multivalued attributes, no need to create "items"

Binning wildcards are used to handle multivalued attributes: the individual values are connected by disjunction.

For example, District(Benesov, Bruntal) is equivalent to District(Benesov) or District(Bruntal). Such a result can be produced, for example, by setting the Subset binning wildcard with max length 2 on the District attribute in the task setting.

Dynamic binning operators

To allow attributes with many low-support values to be used directly in the mining task without special preprocessing, I:ZI Miner offers a unique feature: binning wildcards, which group fine-grained values on the fly and thus produce ‘items’ with higher support.

  • Subset 1-1 wildcard ("Simple wildcard") is the default added when a new attribute is dragged to the task pane. This wildcard tells the miner to generate as many ‘items’ as there are values of the attribute. This is similar to how other association rule mining systems handle attributes with multiple values.
  • Subset wildcard with max length n>1 instructs the miner to dynamically merge up to n values into one ‘item’ during mining.
  • Interval wildcard with max length n>1 instructs the miner to dynamically merge up to n consecutive values into one ‘item’ during mining.
  • Cyclical interval wildcard: same as interval, but the first and last values of the attribute range are considered consecutive.
  • Left cut with max length n>1: up to n lowest values in the attribute range are merged. This is useful for focusing only on extremely low attribute values.
  • Right cut with max length n>1: up to n highest values in the attribute range are merged. This is useful for focusing only on extremely high attribute values.
  • Cut with max length n>1: combines the functionality of the left cut and the right cut.
  • One category: Adding a Fixed value attribute to the mining setting allows the user to limit the search space only to rules containing a selected attribute-value pair.
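The wildcards above can be read as generators of candidate value sets (‘items’) for one attribute. This sketch assumes the attribute's values are given as an ordered list and that "max length n" means lengths 1 to n. The function names are hypothetical, and the real miner enumerates these sets internally rather than as Python lists.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Item = Tuple[str, ...]  # a set of values merged into one 'item'


def subsets(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Any combination of up to max_len values (max_len=1 is the simple wildcard)."""
    return [combo for n in range(min_len, max_len + 1)
            for combo in combinations(values, n)]


def intervals(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Runs of up to max_len consecutive values in the attribute's order."""
    return [tuple(values[start:start + n]) for n in range(min_len, max_len + 1)
            for start in range(len(values) - n + 1)]


def cyclic_intervals(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Like intervals, but the last value is followed by the first one.

    Lengths are capped at len(values) - 1 because a full-length run would
    only repeat the whole range in every rotation.
    """
    m = len(values)
    return [tuple(values[(start + j) % m] for j in range(n))
            for n in range(min_len, min(max_len, m - 1) + 1)
            for start in range(m)]


def left_cuts(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Up to max_len lowest values."""
    return [tuple(values[:n]) for n in range(min_len, min(max_len, len(values)) + 1)]


def right_cuts(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Up to max_len highest values."""
    return [tuple(values[-n:]) for n in range(min_len, min(max_len, len(values)) + 1)]


def cuts(values: Sequence[str], max_len: int, min_len: int = 1) -> List[Item]:
    """Left and right cuts together, without duplicates."""
    both = left_cuts(values, max_len, min_len) + right_cuts(values, max_len, min_len)
    return list(dict.fromkeys(both))


districts = ["Benesov", "Bruntal", "Praha"]
print(subsets(districts, 2))
# [('Benesov',), ('Bruntal',), ('Praha',), ('Benesov', 'Bruntal'), ('Benesov', 'Praha'), ('Bruntal', 'Praha')]
age = ["<20", "20-40", "40-60", ">60"]
print(left_cuts(age, 2))  # [('<20',), ('<20', '20-40')]
```

The first usage line corresponds to the Subset wildcard with max length 2 on District. It yields ('Benesov', 'Bruntal'), which the miner reads as District(Benesov, Bruntal).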

Immediate results

Results are displayed as soon as they are discovered: I:ZI Miner does not wait until mining finishes, but shows rules to the user as the miner progresses through the search space. This feature helps address a common problem of association rule mining: too many rules being discovered. Since the user sees results as they arrive, she can interrupt mining at any point and refine the task, typically by narrowing the search space.
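The streaming behaviour can be illustrated with a Python generator. The toy miner below is a hypothetical stand-in that only checks single-literal rules against a confidence threshold; it is not I:ZI Miner's search algorithm. The point is that each rule reaches the caller as soon as it is verified, and the caller can stop the search early.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

Rule = Tuple[str, str, float]  # (antecedent, consequent, confidence)


def mine(records: List[Dict[str, str]], min_conf: float) -> Iterator[Rule]:
    """Toy miner for single-literal rules attr(value) => attr(value).

    As a generator, it hands each rule to the caller as soon as the rule is
    verified, instead of collecting the full result list first.
    """
    literals = sorted({(attr, val) for rec in records for attr, val in rec.items()})
    for (a_attr, a_val), (c_attr, c_val) in permutations(literals, 2):
        if a_attr == c_attr:
            continue  # antecedent and consequent must use different attributes
        covered = [rec for rec in records if rec.get(a_attr) == a_val]
        hits = sum(rec.get(c_attr) == c_val for rec in covered)
        if hits and hits / len(covered) >= min_conf:
            yield (f"{a_attr}({a_val})", f"{c_attr}({c_val})", hits / len(covered))


data = [
    {"District": "Praha", "Loan": "good"},
    {"District": "Praha", "Loan": "good"},
    {"District": "Brno", "Loan": "bad"},
]
for count, rule in enumerate(mine(data, min_conf=0.8), start=1):
    print(rule)      # displayed to the user immediately
    if count == 2:   # the user interrupts after seeing enough rules
        break
```

Breaking out of the loop simply stops pulling from the generator, so the rest of the search space is never explored. This mirrors the user interrupting the task.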

Relevance feedback

Each discovered rule is compared with a knowledge base to check its novelty and disruptiveness, and is classified as one of the following:

  • Confirmation of a known rule: the knowledge base contains a rule whose antecedent contains only attributes that also appear in the discovered rule's antecedent, and for each of these attributes at least one value overlaps. The same must hold for the consequent.
  • Exception to a known rule: the knowledge base contains a rule with the same antecedent and a consequent that shares at least one attribute with the discovered rule's consequent, and for at least one of the shared attributes the values do not overlap.
  • Interesting rule: neither of the above applies.

If the discovered rule is only a confirmation of a known rule, it is visually suppressed by a gray font. In contrast, an exception to a known rule is highlighted in red.
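The classification and highlighting can be sketched as follows. The sketch assumes each rule part is a dict from attribute to a set of values, and it reads "same antecedent" as identical attributes and values. Checking exceptions before confirmations is also an assumption, because the text does not say which label wins when a rule matches several knowledge-base rules. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Cedent = Dict[str, FrozenSet[str]]  # attribute -> set of values


@dataclass
class Rule:
    antecedent: Cedent
    consequent: Cedent


def covers(known: Cedent, found: Cedent) -> bool:
    """Every attribute of `known` appears in `found` with at least one shared value."""
    return all(attr in found and known[attr] & found[attr] for attr in known)


def is_confirmation(known: Rule, found: Rule) -> bool:
    return covers(known.antecedent, found.antecedent) and covers(known.consequent, found.consequent)


def is_exception(known: Rule, found: Rule) -> bool:
    if known.antecedent != found.antecedent:
        return False
    shared = known.consequent.keys() & found.consequent.keys()
    return any(not (known.consequent[attr] & found.consequent[attr]) for attr in shared)


STYLE = {"confirmation": "gray", "exception": "red", "interesting": "default"}


def classify(found: Rule, knowledge_base: List[Rule]) -> str:
    # Assumption: an exception to any known rule outranks a confirmation.
    if any(is_exception(k, found) for k in knowledge_base):
        return "exception"
    if any(is_confirmation(k, found) for k in knowledge_base):
        return "confirmation"
    return "interesting"


kb = [Rule({"Salary": frozenset({"high"})}, {"Loan": frozenset({"good"})})]
found = Rule({"Salary": frozenset({"high"})}, {"Loan": frozenset({"bad"})})
label = classify(found, kb)
print(label, STYLE[label])  # exception red
```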

Filling the knowledge base

A domain expert can fill the knowledge base by entering rules manually. The knowledge base supports this, but there is currently no user interface for designing such rules.

The main source of rules for the knowledge base is relevance feedback: when the user clicks the green tick to move a rule to the Rule clipboard for later use in the report, this is stored as positive relevance feedback.