Merge two files with fuzzy / inexact key column

0

I have two spreadsheets with different dimensions. They share a common column, but not all the names are 100% similar. Specifically, I'm merging two data files from counties around the US, one has economic data and another has voting data

In one file, I may have "Harris County" and another would be "Harris" both under the common column. What is the best way to match files, either in Excel or R (though i am less familiar with R). I thought VLOOKUP had a solution for this, but I haven't been able to to get it to work.

County  Total Vote  C   O   E   R   Margin  %Margin Clinton Obama   Edwards Richardson  Other       Clinton Obama   Edwards Richardson  Other
Harris  12,525  1   2   3   4   41  0.33%   37.31%  36.98%  18.85%  4.31%   2.55%   4,673   4,632       2,361   540 319

STATEFP10   COUNTYFP10  census track    county  state   economic variable
48  201 Census Tract 2225.01     Harris County   Texas  5
48  201 Census Tract 4214.01     Harris County   Texas  6

The key columns don't differ in any order 100% of the time, so i can't just do a find and replace. What is a good way to merge these files? The one with the voting information has far (far) fewer rows and columns.

tom

Posted 2014-11-22T00:57:52.633

Reputation: 101

Answers

2

Tom, the most robust solution would be using the Fuzzy add-in. A number of other VBA solutions have been generated using the Levenshtein distance, but I suggest you try the add-in first.

http://www.microsoft.com/en-us/download/details.aspx?id=15011

Petr Fedorov

Posted 2014-11-22T00:57:52.633

Reputation: 559