Potential weakness: predictable passwords. I expect the primary weakness to be that users choose a "picture password" that is guessable or predictable. If the user chooses a predictable set of locations and gestures, someone may be able to guess it.
For Microsoft's evaluation of their own design, you should read this Microsoft blog post on the security of Windows Picture Password. They provide some simple calculations to estimate the strength of a picture password, and to estimate how many bits of entropy they provide. However, I think their security analysis is overly optimistic, because they make several dubious assumptions:
First, they assume that all locations in the image are equally likely to be chosen by the user. I think this is unrealistic; for any given photo, some locations are more likely to be picked than others, and it will be predictable which ones. Someone is probably more likely to choose some unique feature in the picture, not a random location in the middle of a broad expanse of blue sky. For instance, in the video you shared, the user chose to tap on the location of a window in the image.
Second, they assume that the locations of the user's three gestures are all chosen independently. In practice, I don't think this is realistic; I think there will often be some pattern. For instance, in the video you shared, the user chose to tap on three windows in sequence. If you've guessed that the first tap location is over a window, then it would be natural to guess that the next two are on other windows, too.
Third, they assume that users will use a combination of taps, circles, and lines. (They have three kinds of gestures: tap on a specific location, circle around a particular location with some radius, or drag your finger in a line from one location to another location.) However, tapping is the quickest and easiest and most natural of these gestures. Therefore, I would suspect that many users will just tap on three locations, and not bother with the other gestures. At the same time, tapping has the weakest security, because of the limited number of locations on the picture that someone is likely to select. Therefore, I think some of Microsoft's calculations about security level may be based upon a somewhat optimistic view, and users may not behave the way they are assuming.
As a result, I suspect that the entropy of picture passwords might be significantly lower than the estimates found in those Microsoft blog posts.
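To illustrate how much a non-uniform choice of locations can matter, here is a minimal sketch that compares the Shannon entropy of three taps under a uniform model and under a skewed one. Every number in it is made up for illustration (100 distinguishable tap locations, 10 salient features attracting 90% of taps); none comes from Microsoft's posts. It also still treats the three taps as independent, which is generous to the defender.

```python
import math

def shannon_entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def three_tap_entropy_bits(probs):
    """Entropy of three taps drawn independently from the same distribution."""
    return 3 * shannon_entropy_bits(probs)

# Hypothetical: 100 distinguishable tap locations, all equally likely.
N_LOCATIONS = 100
uniform = [1 / N_LOCATIONS] * N_LOCATIONS

# Hypothetical: 10 salient features (windows, faces, ...) attract 90% of taps;
# the remaining 90 locations share the other 10%.
N_SALIENT = 10
N_OTHER = N_LOCATIONS - N_SALIENT
skewed = [0.9 / N_SALIENT] * N_SALIENT + [0.1 / N_OTHER] * N_OTHER

uniform_bits = three_tap_entropy_bits(uniform)
skewed_bits = three_tap_entropy_bits(skewed)

print(f"uniform model: {uniform_bits:.1f} bits")
print(f"skewed model:  {skewed_bits:.1f} bits")
```

With these made-up numbers, the uniform model gives roughly 20 bits and the skewed model roughly 12 bits. The exact figures mean nothing; the point is only that a modest preference for salient features removes a large share of the entropy.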
Microsoft has a follow-up blog post where they give advice to users on how to choose a hard-to-guess picture password, but I'm not persuaded that the average user will be aware of this or will bother.
That said, Microsoft has deployed one very significant defense against guessing attacks: you only get 5 tries to enter the picture password. After 5 tries, the system locks you out and requires you to enter the textual password. Therefore, someone who gets ahold of your phone will only get 5 tries to guess your picture password -- and if they don't get it right within 5 guesses, they're done. If implemented properly, this seems like a powerful and effective defense against guessing attacks.
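One way to think about the 5-try limit: an attacker who guesses the most likely picture passwords first succeeds with probability equal to the combined probability of the 5 most popular choices. The sketch below computes that for two hypothetical distributions; the password counts and probabilities are assumptions chosen for illustration, not measurements.

```python
def success_within_k(probs, k=5):
    """Chance that an attacker who tries the k most likely passwords
    (in decreasing order of probability) hits the right one."""
    return sum(sorted(probs, reverse=True)[:k])

# Hypothetical: 1000 possible picture passwords, all equally likely.
uniform = [1 / 1000] * 1000

# Hypothetical: 5 "obvious" passwords are each chosen by 4% of users,
# and the remaining 995 passwords share the other 80%.
skewed = [0.04] * 5 + [0.8 / 995] * 995

p_uniform = success_within_k(uniform)  # about 0.005
p_skewed = success_within_k(skewed)    # about 0.2

print(f"uniform: {p_uniform:.3f}, skewed: {p_skewed:.3f}")
```

So the lockout is very effective when choices are spread out, and noticeably less so when a handful of choices are popular.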
Potential weakness: the weakest-link effect. There are now two ways to log into your account: either enter the text password, or enter the picture password. If an attacker can guess either one, he can get access to your account. Therefore, your security is only as good as the weaker of those two passwords. To be secure, both of them need to be well-chosen. This might trip some users up.
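As a rough illustration of the weakest-link effect, suppose the attacker has some chance of guessing each password within the allowed tries. The sketch below combines the two, assuming (for simplicity) that the two guessing attempts succeed independently; the probabilities are hypothetical.

```python
def either_guessed(p_text, p_picture):
    """Chance that at least one of the two passwords is guessed,
    assuming the two attempts succeed independently."""
    return 1 - (1 - p_text) * (1 - p_picture)

# Hypothetical: a strong text password paired with a weak picture password.
p_text = 0.0001
p_picture = 0.2

combined = either_guessed(p_text, p_picture)
print(f"combined chance of compromise: {combined:.4f}")
```

The combined chance is never lower than the larger of the two, so a strong text password does nothing to compensate for a weak picture password.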
Potential weakness: smudge attacks. Suppose someone gets ahold of your phone. Another way they could try to guess your picture password is by looking at the pattern of smudge marks on the screen left by your finger oils.
Past research has looked at smudge marks, in the context of phone lock screens. They found that this attack can be surprisingly effective. If you hold the phone at just the right angle to the light, you can often see the smudge marks clearly. And if they use a digital camera to take a picture at the right angle, the smudge marks become even clearer. Amazingly, they found that the smudge marks remained clearly visible even if the user put the phone in their pocket -- you might expect this would wipe the fingerprints off, but nope, they still remained visible!
The research is described in the following paper:
While I haven't seen any work on this in the context of Windows picture passwords, I would expect that similar methods might help an attacker guess the picture password and significantly reduce the entropy in the password. Fortunately, the attacker only gets 5 guesses, which should help make smudge attacks harder.
For instance, suppose the attacker gets lucky and there are only 3 smudge marks on the screen. Then there are 3! = 3 × 2 × 1 = 6 possible orderings of these. The attacker gets 5 tries to guess the picture password. So, in this scenario, the attacker has a 5/6 chance of guessing the picture password correctly before being locked out. That said, this is almost the best possible case for the attacker, and in practice, the attack will likely be harder to mount, because the attacker will have to guess which of the smudges come from the picture password and which ones come from other use of the touchscreen.
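The arithmetic above can be checked, and extended to the case of extra smudges, with a short sketch. It assumes the password is three taps on distinct smudged locations, that every ordered choice is equally likely, and that the attacker never repeats a guess; the function name and parameters are mine, for illustration only.

```python
from fractions import Fraction
from itertools import permutations
from math import perm

def smudge_guess_chance(n_smudges, taps=3, tries=5):
    """Chance of guessing an ordered sequence of `taps` distinct locations
    out of `n_smudges` visible smudges within `tries` distinct guesses."""
    candidates = perm(n_smudges, taps)  # ordered selections without repetition
    return Fraction(min(tries, candidates), candidates)

# Best case for the attacker: exactly 3 smudges, so 3! = 6 orderings.
orderings = list(permutations(["A", "B", "C"]))
best_case = smudge_guess_chance(3)

# Harder case: 6 smudges, only 3 of which belong to the picture password.
harder_case = smudge_guess_chance(6)

print(len(orderings), best_case, harder_case)
```

With 3 smudges the chance is 5/6, as computed above; with 6 smudges there are 120 ordered candidates and the chance within 5 tries drops to 1/24.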
As a lower-tech and more-extreme version of this attack, consider the following photograph of a PIN-entry keypad:
Can you guess the PIN, based upon the wear pattern on the keys? Yes, very good, I knew you could!
Potential weakness: shoulder surfing.
Another possible attack is that, if someone is looking over your shoulder when you log in, they will find it very easy to notice where you are tapping. In fact, it will be hard not to notice the picture password being entered. So, entering your picture password while you are in view of someone else is not safe.
Advantage: convenience.
I expect the picture password to be easier for users to use and more convenient. Right now, all of the methods for authenticating on mobile/touchscreen devices are a pain in the butt. Entering a text password on a touchscreen keypad is a horrible experience, which just drives users to choose poor, short passwords -- and that's not good for security. So, I expect picture passwords will be good for users.
Advantage: no worse than the alternatives.
Right now, on mobile platforms, the main alternatives are asking the user to enter a 4-digit PIN or asking the user to use an unlock gesture, as is done on Android or the iPhone. However, those have their own security weaknesses as well, and they may be no better than the Windows picture password.
Therefore, for mobile platforms, the Windows picture password may represent a pragmatic and reasonable tradeoff between ease of use and security, one that is sufficient for the average user.
Conclusion and takeaways.
My personal impression is: it seems like a plausible scheme, one that might be adequate for many or most users. Overall, it feels like a pragmatic choice to me, given the engineering constraints -- I would be hard-pressed to try to come up with something better. However, it is a new scheme, and more research will be needed to better understand how secure it is, in practice. The security of the scheme will depend heavily upon how users use it, and it's probably too early to say whether it will work well for typical users or not.
Related schemes.
To read about other schemes for authentication on mobile or touchscreen devices, see Are the iPhone “connect the dots” passwords secure? and Is Android's Password Screen Lock Enough Data Theft Protection?.