(Answering here as no separate question for sorted data.)
If the data were sorted, you could use VLOOKUP
with the range_lookup
argument TRUE
(or omitted, since it's the default), which is officially described for Excel as "search for approximate match".
In other words, for sorted data:
- setting the last argument to
FALSE
returns the first value, and
- setting the last argument to
TRUE
returns the last value.
This is largely undocumented and obscure, but dates to VisiCalc (1979), and today holds at least in Microsoft Excel, LibreOffice Calc, and Google Sheets. It is ultimately due to the initial implementation of LOOKUP
in VisiCalc (and thence VLOOKUP
and HLOOKUP
), when there was no fourth parameter. The value is found by binary search, using inclusive left bound and exclusive right bound (a common and elegant implementation), which results in this behavior.
Technically this means that one starts the search with the candidate interval [0, n)
, where n
is the length of the array, and the loop invariant condition is that A[imin] <= key && key < A[imax]
(the left bound is <= the target, the right bound, which starts one after the end, is > the target; to validate, either check values at endpoints before, or check result after), and successively bisecting and choosing whichever side preserves this invariant: by exclusion one side will, until you get to an interval with 1 term, [k, k+1)
, and the algorithm then returns k
. This need not be an exact match (!): it's just the closest match from below. In case of duplicate matches, this results in returning the last match, as it requires that the next value be greater than the key (or the end of the array). In case of duplicates you need some behavior, and this is reasonable and easy to implement.
This behavior is stated explicitly in this old Microsoft Knowledge Base article (emphasis added): "XL: How to Return the First or Last Match in an Array" (Q214069):
You can use the LOOKUP() function to search for a value within an array of sorted data and return the corresponding value contained in that position within another array. If the lookup value is repeated within the array, it returns the last match encountered. This behavior is true for the VLOOKUP(), HLOOKUP(), and LOOKUP() functions.
Official documentation for some spreadsheets follow; in neither is the "last match" behavior stated, but it's implied in the Google Sheets documentation:
Microsoft Excel
TRUE assumes the first column in the table is sorted either numerically or alphabetically, and will then search for the closest value.
Google Sheets:
If is_sorted
is TRUE
or omitted, the nearest match (less than or equal to the search key) is returned
I'm confused about that literal "c" - I'd think that the evaluation is always false, so what does it really do? – Torben Gundtofte-Bruun – 2014-07-11T19:23:47.937
I've tested your suggestion (and checked that it was accepted as an array formula). I assume Col A is payee and B is category, right? Unfortunately, LibreOffice returns "ERR:502" which translates into "Invalid argument: Function argument is not valid. For example, a negative number for the SQRT() function, for this please use IMSQRT()". I checked that all the functions exist with that name in LibreOffice, but I wonder whether LibreOffice's
IF
can't handle arrays. – Torben Gundtofte-Bruun – 2014-07-11T19:34:24.090Sorry, the literal "c" was just the payee name you wanted to match. That was a relic from my sample data I was playing with. I assume that will be replaced with a cell reference in your sheet. – Excellll – 2014-07-11T20:21:19.450
@TorbenGundtofte-Bruun Care to share the formula you're using? I may be able to troubleshoot it if I can see it. Also, you can always try to step through the formula with
Evaluate Formula
to see which part of the formula is generating the error. This feature exists in Excel, and I'd be surprised if LibreOffice Calc doesn't have the same feature. – Excellll – 2014-07-11T20:25:54.850My original formula is straightforward, that's why it's not adequate :-)
=VLOOKUP(J1061;$J$2:$K$9999;2;0)
where col J contains payees and col K the categories. It returns the first match as expected. – Torben Gundtofte-Bruun – 2014-07-12T11:20:15.250My formula based on your suggestion is:
=INDEX(IF($J$2:$J$9999=J1061;$K$2:$K$9999);MAX(IF($J$2:$J$9999=J1061;ROW($J$2:$J$9999))))
– Torben Gundtofte-Bruun – 2014-07-12T11:23:15.700@TorbenGundtofte-Bruun Your data starts at row 2, so you need to adjust the
ROW()
part of the formula by subtracting 1 to make the indices and row numbers match. New formula to try:=INDEX(IF($J$2:$J$9999=J1061;$K$2:$K$9999);MAX(IF($J$2:$J$9999=J1061;ROW($J$2:$J$9999)-1)))
– Excellll – 2014-07-12T16:23:52.183