How to get VLOOKUP to return the *last* match?

11

3

I'm used to working with VLOOKUP but this time I have a challenge. I don't want the first matching value, but the last. How? (I'm working with LibreOffice Calc but an MS Excel solution ought to be equally useful.)

The reason is that I have two text columns with thousands of rows, let's say one is a list of transaction payees (Amazon, Ebay, employer, grocery store, etc.) and the other is a list of spending categories (wages, taxes, household, rent, etc.). Some transactions don't have the same spending category every time, and I want to grab the most recently used one. Note that the list is sorted by neither column (in fact by date), and I don't want to change the sort order.

What I have (excluding error handling) is the usual "first-match" formula:

=VLOOKUP( 
[payee field] , [payee+category range] , [index of category column] , 
0 )

I've seen solutions like this, but I get #DIV/0! errors:

=LOOKUP(2 , 1/( [payee range] = [search value] ) , [category range] )

The solution can be any formula, not necessarily VLOOKUP. I can also swap the payee/category columns around. Just no change in sorting column, please.


Bonus points for a solution that picks the most frequent value rather than the last!

Torben Gundtofte-Bruun

Posted 2014-07-11T18:46:53.363

Reputation: 16 308

Answers

3

You can use an array formula to get data from the last matching record.

=INDEX(IF($A$1:$A$20="c",$B$1:$B$20),MAX(IF($A$1:$A$20="c",ROW($A$1:$A$20))))

Enter the formula using Ctrl+Shift+Enter.

This works like the INDEX/MATCH construction of a VLOOKUP, but with a conditional MAX used instead of MATCH.

Note that this assumes that your table starts at row 1. If your data starts at a different row, you will need to adjust the ROW(...) part by subtracting the difference between the top row and 1.

Excellll

Posted 2014-07-11T18:46:53.363

Reputation: 11 857

I'm confused about that literal "c" - I'd think that the evaluation is always false, so what does it really do? – Torben Gundtofte-Bruun – 2014-07-11T19:23:47.937

I've tested your suggestion (and checked that it was accepted as an array formula). I assume Col A is payee and B is category, right? Unfortunately, LibreOffice returns "ERR:502" which translates into "Invalid argument: Function argument is not valid. For example, a negative number for the SQRT() function, for this please use IMSQRT()". I checked that all the functions exist with that name in LibreOffice, but I wonder whether LibreOffice's IF can't handle arrays. – Torben Gundtofte-Bruun – 2014-07-11T19:34:24.090

Sorry, the literal "c" was just the payee name you wanted to match. That was a relic from my sample data I was playing with. I assume that will be replaced with a cell reference in your sheet. – Excellll – 2014-07-11T20:21:19.450

@TorbenGundtofte-Bruun Care to share the formula you're using? I may be able to troubleshoot it if I can see it. Also, you can always try to step through the formula with Evaluate Formula to see which part of the formula is generating the error. This feature exists in Excel, and I'd be surprised if LibreOffice Calc doesn't have the same feature. – Excellll – 2014-07-11T20:25:54.850

My original formula is straightforward, that's why it's not adequate :-) =VLOOKUP(J1061;$J$2:$K$9999;2;0) where col J contains payees and col K the categories. It returns the first match as expected. – Torben Gundtofte-Bruun – 2014-07-12T11:20:15.250

My formula based on your suggestion is: =INDEX(IF($J$2:$J$9999=J1061;$K$2:$K$9999);MAX(IF($J$2:$J$9999=J1061;ROW($J$2:$J$9999)))) – Torben Gundtofte-Bruun – 2014-07-12T11:23:15.700

@TorbenGundtofte-Bruun Your data starts at row 2, so you need to adjust the ROW() part of the formula by subtracting 1 to make the indices and row numbers match. New formula to try: =INDEX(IF($J$2:$J$9999=J1061;$K$2:$K$9999);MAX(IF($J$2:$J$9999=J1061;ROW($J$2:$‌​J$9999)-1))) – Excellll – 2014-07-12T16:23:52.183

2

(Answering here as no separate question for sorted data.)

If the data were sorted, you could use VLOOKUP with the range_lookup argument TRUE (or omitted, since it's the default), which is officially described for Excel as "search for approximate match".

In other words, for sorted data:

  • setting the last argument to FALSE returns the first value, and
  • setting the last argument to TRUE returns the last value.

This is largely undocumented and obscure, but dates to VisiCalc (1979), and today holds at least in Microsoft Excel, LibreOffice Calc, and Google Sheets. It is ultimately due to the initial implementation of LOOKUP in VisiCalc (and thence VLOOKUP and HLOOKUP), when there was no fourth parameter. The value is found by binary search, using inclusive left bound and exclusive right bound (a common and elegant implementation), which results in this behavior.

Technically this means that one starts the search with the candidate interval [0, n), where n is the length of the array, and the loop invariant condition is that A[imin] <= key && key < A[imax] (the left bound is <= the target, the right bound, which starts one after the end, is > the target; to validate, either check values at endpoints before, or check result after), and successively bisecting and choosing whichever side preserves this invariant: by exclusion one side will, until you get to an interval with 1 term, [k, k+1), and the algorithm then returns k. This need not be an exact match (!): it's just the closest match from below. In case of duplicate matches, this results in returning the last match, as it requires that the next value be greater than the key (or the end of the array). In case of duplicates you need some behavior, and this is reasonable and easy to implement.

This behavior is stated explicitly in this old Microsoft Knowledge Base article (emphasis added): "XL: How to Return the First or Last Match in an Array" (Q214069):

You can use the LOOKUP() function to search for a value within an array of sorted data and return the corresponding value contained in that position within another array. If the lookup value is repeated within the array, it returns the last match encountered. This behavior is true for the VLOOKUP(), HLOOKUP(), and LOOKUP() functions.

Official documentation for some spreadsheets follow; in neither is the "last match" behavior stated, but it's implied in the Google Sheets documentation:

  • Microsoft Excel

    TRUE assumes the first column in the table is sorted either numerically or alphabetically, and will then search for the closest value.

  • Google Sheets:

    If is_sorted is TRUE or omitted, the nearest match (less than or equal to the search key) is returned

Nils von Barth

Posted 2014-07-11T18:46:53.363

Reputation: 191

That nearest match thing was driving me crazy! – dukedave – 2018-03-20T21:42:56.277

1

If the values in the search array are sequential (i.e. you're looking for the largest value, such as the latest date), you don't even need to use the INDIRECT function. Try this simple code:

=MAX(IF($A$1:$A$20="c",$B$1:$B$20,)

Again, enter the formula using CTRL + SHIFT + ENTER

The Stich

Posted 2014-07-11T18:46:53.363

Reputation: 11

0

=LOOKUP([payee field] , [payee range] , [category range])

This will get you the last value

Do I get bonus points for being 3 years late?

DavePenn

Posted 2014-07-11T18:46:53.363

Reputation: 328

0

I had a go at the most frequent value. Not sure if it would work in libreOffice, but it seems to work in excel

=INDEX($B$2:$B$9,MATCH(MAX(--($A$2:$A$9=D2)*COUNTIFS($B$2:$B$9,$B$2:$B$9,$A$2:$A$9,D2)),--($A$2:$A$9=D2)*COUNTIFS($B$2:$B$9,$B$2:$B$9,$A$2:$A$9,D2),0))

Column A would be the payee, column B would be the category, D2 is the payee you want to filter by. I'm not sure why it's putting extra linebreaks in the function above.

My function to find the last cell would be as follows:

=INDIRECT("B" & MAX(--($A$2:$A$9=D2)*ROW($A$2:$A$9)))

Indirect lets me specify the column I want to return and find the row directly (so I don't need to subtract the number of header rows.

Both these functions need to be entered using Ctrl+shift+enter

gtwebb

Posted 2014-07-11T18:46:53.363

Reputation: 2 852

-1

You got #DIV/0! errors because you should rather write your formula like :

=LOOKUP(2;IF(([payee range] = [search value]);1;"");[category range])

this will work and will found last match.

([payee range] = [search value]) : boolean matrix TRUE/FALSE

IF(([payee range] = [search value]);1;"") : pseudo-boolean matrix 1/""

=LOOKUP(2; {pseudo-boolean matrix 1/""} );[category range]) : alway return last 1 position

Samuel

Posted 2014-07-11T18:46:53.363

Reputation: 1

LOOKUP works only on sorted list, the output of your comarison will result in a list of 1s and spaces in a non-sorted way, so it won't give correct result. – Máté Juhász – 2015-09-24T11:19:29.677