There is legal precedent for this.
Field v. Google Inc., 412 F. Supp. 2d 1106 (D. Nev. 2006). Google won summary judgment on several grounds, most notably that the author did not use a robots.txt file or "noarchive" meta tags to block Google from his website, either of which would have prevented Google from crawling or caching pages the site owner did not want indexed.
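To make the two mechanisms concrete, here is a minimal sketch of what a well-behaved crawler checks before fetching and caching a page: robots.txt rules control crawling, and the `noarchive` value of the robots meta tag (which search engines have historically honored) controls caching. It uses Python's standard `urllib.robotparser` and `html.parser`; the robots.txt content, the `ExampleBot` user agent, and the helper names `may_crawl` and `may_cache` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish: all bots are
# asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noindex, noarchive"
        for directive in (attrs.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_cache(html: str) -> bool:
    """Return False if the page asks not to be archived (noarchive)."""
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noarchive" not in meta.directives
```

Nothing technically forces a crawler to call either check; honoring them is voluntary, which is exactly why the legal question of what happens when a crawler ignores them matters.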
There is NO U.S. law specifically dealing with robots.txt files; however, another court case has set some precedent that could eventually lead to robots.txt files being treated as intentional electronic measures to protect content, so that ignoring them would count as circumvention.
In Healthcare Advocates, Inc. v. Harding, Earley, Follmer & Frailey, et al., Healthcare Advocates argued that Harding et al. essentially hacked the Wayback Machine to gain access to archived copies of pages that Healthcare Advocates had since blocked with a robots.txt file. While Healthcare Advocates lost this case, the District Court noted that the problem was not that Harding et al. "picked the lock," but that they gained access to the files because a server-load problem with the Wayback Machine served the archived files when it shouldn't have, and therefore there was "no lock to pick."
It is only a matter of time, IMHO, until someone takes this ruling and turns it around: by implication, the court treated robots.txt as a lock that prevents crawling, which means circumventing it would be picking the lock.
Many of these lawsuits, unfortunately, aren't as simple as "I told your crawler it wasn't allowed, and your crawler ignored those settings/commands." Each of these cases involves a host of other issues that ultimately affect the outcome more than the core question of whether a robots.txt file should be considered an electronic protection measure under the U.S. DMCA.
That said, this is U.S. law, and someone in China can do what they want, not because of anything in the law itself, but because China won't enforce U.S. trademark and copyright protections, so good luck going after them.
Not a short answer, but there really isn't a short, simple answer to your question!