I'm trying to do a mailbox search against several mailboxes. I need to find all messages that mention some words AND don't mention other words EXCEPT when those words are on both lists. This is a complex search. I don't know how to twist the Keyword Query Language logic to accomplish this.

I recognize the search command will be complex and take a long time to run. That's fine.


Search for items that mention Teddy.
I want items that mention Roosevelt.
I want items that mention mugwump.
I want items that mention Rough Riders
I don't want anything that mentions Franklin Roosevelt.
BUT if a message mentions both Teddy and Franklin, I need that.
I don't want anything with "new-deal" in the subject
I don't want anything that mentions Truman.
I want to include wildcards.

Searching for "Roosevelt" will definitely bring in "Franklin." I can't unilaterally exclude "Franklin" without missing some "Teddy" messages.

My current query:

New-MailboxSearch -SearchQuery "Teddy* OR mugwump* OR Roosevelt* OR 'Rough Riders*' NOT ((Franklin* NOT Teddy*) OR Truman* OR subject:'new-deal')"

Ran the above search. Received 9,500 hits. I found some e-mails with "mugwump" and "franklin pierce". Something's not right here.

Back when I ran "Teddy* OR Roosevelt*" I received 10,000 hits.
When I ran "Teddy* OR Roosevelt* NOT (Franklin* NOT Teddy*)" I received 9,500 hits. So I thought it worked?

In case it was a parenthetical issue, I also tried "(Teddy* OR Roosevelt*) NOT (Franklin* NOT Teddy*)". I still received 9,500 hits.
Bracketing the positive search terms didn't make a difference.

I then tried "(Teddy* OR Roosevelt*) NOT (Franklin*)". I still received 9,500 hits. There were Franklins in the results, so something is really off here.

Is it a quotation problem? I can't find clear documentation on single quotes ' ' vs double quotes " ", vis-à-vis how they affect search operators and parentheses.
Microsoft's KQL documentation doesn't mention it. Most of the Google hits for KQL are for SharePoint, which has a different bent (and options) compared to Exchange. A number of Exchange hits are actually for AQS.
I haven't found good examples of a complex KQL query like mine with nested search terms....
I tried swapping all of the single quotes with double quotes. It brought my results down to 4,000. It was just an -EstimateOnly run; I haven't had a chance to run the actual job and inspect the results.

1. Will the "NOT (Franklin* NOT Teddy*)" double-negative trip over itself?
2. Is there another way to say, "include this, unless it also includes that" ?
3. Is there a better way of arranging the parentheticals?
4. Am I doing something wrong with my quotations?
I FIGURED OUT A SOLUTION. I wanted to share my final query, along with some important lessons I learned along the way.

You can only use one parenthetical in a query. However, you can nest another set of parentheses inside that query.
Ex: -SearchQuery "(a OR b) AND (c OR d)" WILL PRODUCE AN ERROR
Ex: -SearchQuery "a AND (b AND (c OR d))" WORKS
I don't know if you can use multiple parentheticals inside of the parent, or if you can nest a third level down. Didn't have to figure it out.

Confession: I never figured out how -SearchQuery groups AND and OR when there are no parentheses (operator precedence). My final query was composed mostly of OR statements, so I didn't spend any time digging into it.
Ex: -SearchQuery "a AND b OR c AND d"
    - is this equivalent to "(a AND b) OR (c AND d)"?
    - or is this equivalent to "a AND (b OR c) AND d"?
I don't know. Didn't have to figure it out.
Something for future readers to keep in mind.
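
To show why the grouping question matters, here is a small truth-table sketch in plain Python. It is not KQL and makes no claim about which grouping Exchange actually picks; the function names are just labels I made up for the two readings above. It only checks whether the two readings can ever disagree.

```python
from itertools import product

def grouping_one(a, b, c, d):
    # Reading 1: (a AND b) OR (c AND d)
    return (a and b) or (c and d)

def grouping_two(a, b, c, d):
    # Reading 2: a AND (b OR c) AND d
    return a and (b or c) and d

# Try every True/False combination of the four terms and keep
# the ones where the two readings give different answers.
differing = [
    combo for combo in product([False, True], repeat=4)
    if grouping_one(*combo) != grouping_two(*combo)
]

print(len(differing), "of 16 combinations give different results")
for combo in differing:
    print(dict(zip("abcd", combo)))
```

Since the two readings do disagree for some messages (for example, one that matches a and b but not c or d), it is safer to add explicit parentheses than to rely on whatever the default is.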

You can use { }, ' ', or " " for wrapping your search query.
-SearchQuery {wordone OR wordtwo OR "word three"}
-SearchQuery 'wordone OR wordtwo OR "word three"'
-SearchQuery "wordone OR wordtwo OR 'word three'"
I think there might be some idiosyncrasies with how each wrapper parses literal terms with quotes/doublequotes, or wildcards, inside the query. I had some uncertainty with my live tests that made me think that.
By the time I created a test mailbox with a small set of test messages to experiment upon, I had settled on singlequotes with doublequotes inside. That's what I was using when I figured out what worked, so I never went back for further experiments.

I created a test mailbox. I sent that mailbox 16 e-mails.
The subject of each e-mail was "(01 to 16) (Good or Bad)".
The body of each e-mail contained the search terms in various patterns based on real examples.
The desirable pattern e-mails used "Good" in the subject and the undesired used "Bad".
Ex: Subject: "05 Good", Body: "Roosevelt mugwump"
Ex: Subject: "07 Bad", Body: "Franklin Roosevelt"
Ex: Subject: "08 Good", Body: "Teddy Franklin Roosevelt"
If my query result returned any "Bad" e-mails I knew it failed.
If my query result didn't return a "Good" e-mail I knew it failed.
I used this environment to figure out the logic and prune unwanted results.
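
The Good/Bad check described above can be sketched in plain Python. This is only a model of the idea, not KQL or an Exchange cmdlet: the helper names (has_prefix, failures, naive_query) are made up for illustration, the prefix match is a rough stand-in for a trailing wildcard, and only the three example messages listed above are included.

```python
# Three of the test messages, as (subject, body) pairs.
MESSAGES = [
    ("05 Good", "Roosevelt mugwump"),
    ("07 Bad", "Franklin Roosevelt"),
    ("08 Good", "Teddy Franklin Roosevelt"),
]

def has_prefix(body, prefix):
    """True if any word in body starts with prefix (rough stand-in for 'prefix*')."""
    return any(word.startswith(prefix) for word in body.lower().split())

def failures(matches):
    """Return the subjects a query gets wrong: Bad items returned or Good items missed."""
    wrong = []
    for subject, body in MESSAGES:
        expected = subject.endswith("Good")
        if matches(body) != expected:
            wrong.append(subject)
    return wrong

def naive_query(body):
    # Models "(teddy* OR roosevelt*) NOT franklin*"
    positive = has_prefix(body, "teddy") or has_prefix(body, "roosevelt")
    return positive and not has_prefix(body, "franklin")

print(failures(naive_query))  # ['08 Good']
```

The blanket franklin exclusion drops "08 Good", which is exactly the kind of miss the Good/Bad subjects are meant to expose.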

Using that test mailbox and the lessons above I constructed the following query:

New-MailboxSearch -Name 2019Feb7test13 -Force -SourceMailboxes user1,user2,user3 -StartDate 1/1/1890 -EndDate 1/1/1940 -SearchQuery 'teddy* OR roosevelt* OR mugwump* OR "rough riders*" NOT (franklin* NOT (teddy* OR mugwump* OR "rough riders")) NOT "new-deal" NOT truman*' -TargetMailbox discoveryresults -TargetFolder 2019Feb7test13 -ExcludeDuplicateMessages $true -LogLevel Full -StatusMailRecipients me

(My real query was a bit more complicated than this, but this example works to illustrate my solution.)

Breaking this -SearchQuery apart we have:
Matches any of: teddy, roosevelt, mugwump, rough riders
Does not match: new-deal, truman
Does not match: franklin when it does not also match teddy, mugwump, rough riders

The first condition picks up all "roosevelt" including "franklin roosevelt".
The third condition discards anything with "franklin" without also teddy, mugwump, or rough riders.
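
As a sanity check on the logic, here is a plain-Python model of those three conditions. It assumes the query is read the way I intended it (every NOT clause applies to the whole group of OR terms); it does not prove how Exchange parses it. The name "rescue" is just my label for "teddy, mugwump, or rough riders".

```python
from itertools import product

def exclusion(franklin, rescue):
    # NOT (franklin NOT rescue)
    return not (franklin and not rescue)

def exclusion_rewritten(franklin, rescue):
    # The same clause without the double negative: no franklin, or rescued
    return (not franklin) or rescue

def final_query(positive, franklin, rescue, new_deal, truman):
    # positive: matches any of teddy, roosevelt, mugwump, rough riders
    return positive and exclusion(franklin, rescue) and not new_deal and not truman

# The double negative and its rewritten form agree on every input.
assert all(
    exclusion(f, r) == exclusion_rewritten(f, r)
    for f, r in product([False, True], repeat=2)
)

# "Franklin Roosevelt": positive via roosevelt, franklin present, not rescued
print(final_query(True, True, False, False, False))  # False
# "Teddy Franklin Roosevelt": franklin present but rescued by teddy
print(final_query(True, True, True, False, False))   # True
# "Roosevelt mugwump" with truman also mentioned
print(final_query(True, False, True, False, True))   # False
```

So the nested NOT does not trip over itself as a matter of logic: it means "no franklin, or franklin together with one of the rescue terms".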

If I couldn't find a solution to this issue my backup plan would have had two parts.
1. Do the search without any filtering. Just the positive OR terms. Send the results to a temporary mailbox, not the Discovery box.
2. Run a Search-Mailbox on that temporary mailbox. Find all the items matching the terms I didn't want, and use the -DeleteContent switch.

This would probably have worked but requires an extra hoop. I'm much happier with my solution.

Based on your request, the filter scope is "items that mention Teddy Roosevelt", including "items that mention both Teddy and Franklin". You might simplify the cmdlet as below:

New-MailboxSearch -SearchQuery 'Teddy* OR mugwump* OR Roosevelt* OR Rough Riders* NOT (Truman* OR subject:"new-deal")'
  • Sorry, that won't work. I may not have explained clearly. With your search, if an e-mail only mentions "Franklin Roosevelt" but not Teddy, it will come up in the results. I want to ignore all Franklin unless it also says Teddy. Does that make sense? – R_C_III Feb 08 '19 at 15:45
  • After further research, your requirement to include messages mentioning 'Teddy Franklin' conflicts with excluding 'Franklin Roosevelt'. They can't both be met at the same time. – Kelvin_D Feb 13 '19 at 09:17