
Entity identification, householding, and fuzzy matching

Entity identification, matching, deduplication, record linkage, and householding are essentially the same process: finding records that refer to the same entity without relying on exact matching. DataLever™ offers a variety of tools and techniques for solving this common yet difficult problem.

The power of fuzzy matching

A sophisticated approach to the entity identification problem is fuzzy matching, a technique for finding data that is similar but not necessarily identical. Fuzzy matching compensates for data-entry errors and phonetic variations, and produces a similarity measure from any combination of fields. In DataLever, fuzzy matching can solve any of the following problems:

  • Cross-match two or more tables to perform a fuzzy join. Often the results of a fuzzy join are used to create hard keys linking the tables.
  • Match records within or across tables, and household the results into groups. This is most often used to group individuals within an organization (family, business, network, and so on), but can also link records for the same entity (person, place, company) across multiple databases.
  • Match incoming records to a “universe”, and append information from the universe to incoming records for further modeling.
  • Detect and remove duplicates within a single table, or roll up duplicates into a single master record.
  • Merge multiple tables, and purge duplicates from the collection, giving priority to some tables over others.
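
As a rough illustration of the ideas above (not DataLever's implementation), the following Python sketch computes a weighted similarity measure over several fields and then groups matching records, which covers both duplicate detection and simple householding. The field names, weights, threshold, and sample records are assumptions made for this example, and the standard library's `difflib.SequenceMatcher` stands in for whatever comparison functions a real product would use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (weights are hypothetical)."""
    total = sum(weights.values())
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items()) / total


def group_matches(records: list, weights: dict, threshold: float) -> list:
    """Group record indexes whose similarity meets the threshold (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; a real system would block or index to avoid O(n^2).
    for i, j in combinations(range(len(records)), 2):
        if record_similarity(records[i], records[j], weights) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sample data: the first two records describe the same person.
records = [
    {"name": "Jon Smith", "address": "12 Oak Street"},
    {"name": "John Smith", "address": "12 Oak St"},
    {"name": "Mary Jones", "address": "40 Pine Avenue"},
]
weights = {"name": 0.6, "address": 0.4}
groups = group_matches(records, weights, threshold=0.8)
```

Here `groups` places the two Smith records together and leaves the third record on its own. Each group could then be rolled up into a single master record or assigned a shared key.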

Matching is highly configurable. Different match criteria can be specified for each matching pass, so that each pass picks up matches missed by other comparisons and improves the overall match rate. Matches can be differentiated by the quality (or tightness) of the match, splitting results into “high” and “low” quality comparisons. Confidence factors can be generated with different match configurations.
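
One way to picture multi-pass matching with quality tiers is the sketch below. It is an assumption-laden illustration, not DataLever's configuration format: the pass names, field names, thresholds, and sample records are all hypothetical. Each pass compares a different combination of fields; the first pass that meets its threshold labels the pair "high" or "low" quality and reports the score as a simple confidence factor.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical pass definitions: fields to compare plus tier thresholds.
PASSES = [
    {"name": "name+zip", "fields": ("name", "zip"), "high": 0.95, "low": 0.85},
    {"name": "name+phone", "fields": ("name", "phone"), "high": 0.95, "low": 0.85},
]


def score(r1: dict, r2: dict, fields: tuple) -> float:
    """Unweighted mean of the per-field similarities for one pass."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def match_pair(r1: dict, r2: dict, passes: list):
    """Return (pass name, quality, confidence) from the first pass that matches, else None."""
    for p in passes:
        s = score(r1, r2, p["fields"])
        if s >= p["high"]:
            return p["name"], "high", s
        if s >= p["low"]:
            return p["name"], "low", s
    return None  # no pass considered the pair a match


# Hypothetical sample records.
a = {"name": "Jon Smith", "zip": "80302", "phone": "303-555-0100"}
b = {"name": "John Smith", "zip": "80302", "phone": "303-555-0100"}
c = {"name": "J. Smith", "zip": "80303", "phone": "303-555-0100"}
d = {"name": "Mary Jones", "zip": "10001", "phone": "212-555-0199"}

first = match_pair(a, b, PASSES)   # matched by the first pass
second = match_pair(a, c, PASSES)  # missed by the first pass, caught by the second
third = match_pair(a, d, PASSES)   # unrelated records
```

In this sketch the first matching pass wins. An alternative design would evaluate every pass and keep the best-quality result, at the cost of more comparisons.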

 


   
       
         
    Copyright ©1998-2008 DataLever Corporation. All rights reserved.