1
1
The general question entails sorting a large Excel 2007 list to find entries that match smaller subset list.
I have a couple of ideas on how to approach the problem, but I lack the technical sophistication to implement those ideas. I will outline my specific use-case requirement to make the question more clear.
Specific Example:
I have a master list of company names that I manage for my sales territory (approximately 1000 customer accounts.) Every week my company publishes a list of all transacted business across every sales territory in the U.S. (mine and hundreds of other territories.) This transaction log is 10,000+ lines so scanning by eye to find transactions associated with my accounts is nearly impossible.
My current inadequate solution is to highlight my account list in yellow, copy that highlighted list, then paste that highlighted list at the bottom of the weekly transaction log, then sort A-Z, then scroll through manually to the highlighted items. If the transaction log contains one of my accounts, the transaction log entry will be directly above or below the highlighted entry that I inserted. This method is effective but extremely time consuming.
I know how to eliminate duplicates in Excel. Is there a way to eliminate everything BUT duplicates? This would make visually scanning the list easier.
Another problem remains because data inconsistency has limited the use of simple macros, filters, or the "find duplicates" button. Transaction log names are often spelled slightly different than on my master list.
Ex: Acme Widget Company, Inc.; Acme Widget Inc; Acme Widget; Ex: United States Hand-ball Organization; U.S. Handball Org; U S Handball; USHO
I know there are some third-party apps that can use fuzzy logic to match non-exact entries. However, I cannot run plug-ins on my enterprise machine. (Unless there is a very compelling case...)
Is there a macro that could 'normalize' the transaction log by eliminating spaces and punctuation? Is there a macro that can match the first X number of characters (more characters = higher accuracy, but greater chance of missing a near-duplicate entry...)? Is there a macro that can output or filter the resulting 'match' list?
If those tasks are too complicated, I have a much simpler idea. After merging my highlighted account list into the transaction log, it would be nice to be able to hide all other transaction log lines that are less than 5 lines above or below my highlighted items. This would allow some flexibility for non-standard spellings, but greatly simplify the task of visual inspection through the list.
Any input on how to implement these ideas - or completely different approaches - would be greatly appreciated. I think the general answer to this question will be valuable to others beyond the narrow use-case that I have described.
Thanks!
2@Chris, this post has too many questions. In the future, you should ask each one separately. – hyperslug – 2009-08-27T05:17:54.227
In example 2 (US handball), even removing whitespace + special chars and matching first X chars would produce low confidence matches: "US" is not close to "United". But a decent macro might be able to pull out some matches. Not sure if that would reduce your workload or still leave too much manual process. Again, you should break this up or indicate in which area would a solution save you the most time: remove non-duplicates? fuzzy matching? hide lines outside +- 5? – hyperslug – 2009-08-27T05:32:13.670