
I'm looking for a command-line utility or some other way to test effectiveness of file locks, specifically POSIX advisory locks (which aren't only for POSIX, btw) in a Linux filesystem.

Specifically, I want to ensure POSIX advisory locking (file locking) is working correctly in simfs in a Linux/Ubuntu VM used for continuous integration testing. We've had file corruption that only occurs to a SQLite DB file when there are concurrent writes by 30 processes. This is only being used in testing by one project, but we'd like to help track down the problem so others won't run into it.

According to the SQLite team and documentation, concurrent writes are only supported when POSIX advisory locks are working in the filesystem/OS. The test I have that uses SQLite works in v3.7.7 of SQLite in OS X, but the same test corrupts the DB file in v3.7.9 of SQLite in the Ubuntu VM provided by TravisCI (and hosted by Blue Box). The SQLite team did not indicate that there were any concurrency issues fixed between those two versions, since concurrency is dependent on the OS/filesystem's POSIX advisory locks working.

Additional information about the environment that I'm trying to investigate:

$ sqlite3 -version
3.7.9 2011-11-01 00:52:41 c7c6050ef060877ebe77b41d959e9df13f8c9b5e

$ uname -r
2.6.32-042stab061.2

$ cat /proc/version
Linux version 2.6.32-042stab061.2 (root@rh6-build-x64) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Aug 24 09:07:21 MSK 2012

$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

(The home directory that is exhibiting the problem is within the / mount.)

$ cat /proc/mounts
/dev/simfs / simfs rw,relatime 0 0
...

$ mount
/vz/private/6062841 on / type simfs (rw)
...

I have a ticket open with the VM provider here, where they stated that they are not using network filesystems, which are commonly associated with POSIX lock-related issues because of the complexity of implementing POSIX locks in such environments. In addition to the info above, although this press release would seem to indicate that OpenStack is being used, the path above shows 'vz' in the mount, which suggests that OpenVZ is being used.

As for tools to help diagnose POSIX lock failures, the only one that I've heard about is a ping-pong test that is part of a suite called smbtorture, which tests POSIX locking with Samba. I'm not using Samba in this case, so I'm not sure that would help.

If there is no command-line test available, how would I go about testing that it is working if all I have available to me is limited access to the VM? (sudo doesn't require a password for my user, but the commands that should output something under sudo don't work, so I think it is overridden.) Are there commands that I could have the VM administrator run to gather more info to help resolve this problem?

HopelessN00b
Gary S. Weaver
  • How is this utility supposed to know what "corrupt" means to these arbitrary processes? – EEAA Aug 18 '13 at 12:57
  • @EEAA I suppose that all of them could write, doing whatever is necessary to use filesystem to attempt to lock the file, then all write the same set of data while the file is assumed to be locked, and then once complete you could read all of that data and assume that it should be the data you expect, not partial data and not data that is "interrupted" by other data or is in an order that is not expected if multiple writes done while still locked. – Gary S. Weaver Aug 18 '13 at 13:18
  • i.e. it would spawn a number of processes, wait on them to complete, then read the file that they all tried writing to. Although, it wouldn't necessarily have to do that if there were other methods to test that file locking works (specifically POSIX locks). I'm more interested in confirming that it works than in the particular method I described for determining that. – Gary S. Weaver Aug 18 '13 at 13:20
  • I have never come across such a tool. – fpmurphy Aug 19 '13 at 15:01
  • I've made the question less about corruption and more about testing POSIX advisory file locking in a filesystem from command-line, since people seem to be getting hung up on how you test that a file was corrupted. – Gary S. Weaver Aug 20 '13 at 20:47
  • Read [here](https://access.redhat.com/site/articles/48659) that, "POSIX fcntl locks are not inherited across the fork(2) system call." Perhaps OS X's behavior with Ruby's [fork](http://www.ruby-doc.org/core-2.0/Process.html#method-c-fork) command does not match that of Linux? Ruby allows access to fcntl(2)/posix advisory locking via [IO::fcntl](http://www.ruby-doc.org/stdlib-2.0/libdoc/fcntl/rdoc/Fcntl.html), so I could use that in a test, similar to Matz's [TestFcntlLock](http://bogomips.org/ruby.git/tree/test/fcntl/test_fcntl_lock.rb?h=fcntl-lock&id=0dc9f4c6629eece833e9ec49a6de57f06a97c39f). – Gary S. Weaver Aug 21 '13 at 13:30
  • The Ruby fork doc also suggests that Ruby's [fork](http://www.ruby-doc.org/core-2.0/Process.html#method-c-fork) doesn't work in Windows and BSD, so instead I should be using `pid = spawn(...)`, but that just takes a system/shell command rather than a block of Ruby code and would be more difficult to set up. – Gary S. Weaver Aug 21 '13 at 13:31
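
The stress test described in the comments above (spawn several writers, have each take the lock before writing, then check the result) could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tool from this thread: `locked_increment`, `run_stress_test`, and the counter-file format are hypothetical names and choices, and it relies on `os.fork`, so it only runs on Unix-like systems.

```python
import fcntl
import os
import tempfile


def locked_increment(path, times):
    """Increment the integer stored in `path` `times` times, each under an fcntl lock."""
    for _ in range(times):
        with open(path, "r+") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until the exclusive lock is granted
            value = int(f.read() or 0)      # read-modify-write is the critical section
            f.seek(0)
            f.write(str(value + 1))
            f.truncate()
            f.flush()                       # push the data out before releasing the lock
            fcntl.lockf(f, fcntl.LOCK_UN)


def run_stress_test(path, workers=30, times=100):
    """Fork `workers` processes that all increment the same counter; return the final value."""
    with open(path, "w") as f:
        f.write("0")
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then leave without running the parent's code.
            status = 1
            try:
                locked_increment(path, times)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    # Hypothetical usage: put the counter file on the filesystem under suspicion.
    counter = os.path.join(tempfile.mkdtemp(), "counter.txt")
    workers, times = 30, 100
    print("expected", workers * times, "got", run_stress_test(counter, workers, times))
```

If the final count is lower than `workers * times`, at least one update was lost, which would suggest that the fcntl locks were not being honoured on that filesystem. A matching count does not prove the locks are correct; it only means this run did not expose a problem.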

2 Answers

First off: file locks and pthread mutexes are entirely different beasts. File locks are used to advise the current or other processes that a file is currently not to be used. Pthread mutexes are used to coordinate critical sections between threads in the current process only.

File locking is done with flock(2) and friends, and conveniently, there's a command-line wrapper for it, flock(1). To test whether file locks work, open two terminals and run the following, using the same lock file in both.

In terminal one:

flock /tmp/foo.lock sleep 120

And in the other terminal while the first one is holding the lock:

if ! flock -n /tmp/foo.lock true ; then echo "flock works"; else echo "flock fails"; fi

That should tell you whether file locks work.

And if you have to run it in one script, try this:

flock /tmp/foo.lock sleep 120 &
sleep 1  # give the background flock time to acquire the lock
if ! flock -n /tmp/foo.lock true ; then echo "flock works"; else echo "flock fails"; fi
kill $!
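
If only a scripting language is available rather than two terminals, the same flock(2) check can be approximated from a single Python process. This is a minimal sketch, assuming a local Linux filesystem where flock(2) locks belong to the open file description, so two separate `open()` calls on the same path contend with each other even inside one process. The function name `flock_is_exclusive` is hypothetical.

```python
import fcntl
import os
import tempfile


def flock_is_exclusive(path):
    """Return True if a second open() of `path` is refused a flock() lock
    while the first open() still holds an exclusive one."""
    with open(path, "w") as holder:
        fcntl.flock(holder, fcntl.LOCK_EX)
        # A separate open() yields a separate open file description,
        # which flock() treats as an independent lock owner.
        with open(path, "w") as contender:
            try:
                fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                return True    # refused: the first lock is being honoured
            return False       # granted twice: flock is not excluding anyone


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "foo.lock")
    print("flock works" if flock_is_exclusive(lock_path) else "flock fails")
```

Note that this exercises flock(2) only. fcntl locks are owned by the process rather than by the open file description, so this single-process trick does not carry over to them.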

Another way of locking files is the fcntl system call, which is what provides POSIX advisory locks. It's rather annoying to test with Ruby, but this Python code should do the trick:

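import fcntl, os, time

# Parent and child inherit the same descriptor, but an fcntl lock belongs to one process.
fd = open('/tmp/test.lock', 'w')
if os.fork():
    # Parent: take the exclusive lock and hold it until the child exits.
    fcntl.lockf(fd, fcntl.LOCK_EX)
    os.wait()
else:
    # Child: let the parent lock first, then try a non-blocking lock.
    time.sleep(0.1)
    fcntl.lockf(fd, fcntl.LOCK_EX|fcntl.LOCK_NB)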

It tries to lock the same file from two different processes. The second attempt is non-blocking, so it should fail immediately with an error instead of waiting for the first lock to be released. The expected output, if fcntl locks are working properly, is:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    fcntl.lockf(fd, fcntl.LOCK_EX|fcntl.LOCK_NB)
IOError: [Errno 11] Resource temporarily unavailable
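
For use from a CI script, a variant that returns a verdict instead of relying on a traceback may be easier to check. The following is a sketch under assumptions rather than part of the test above: `fcntl_lock_is_exclusive` is a hypothetical name and it targets Python 3, where the refused lock surfaces as an `OSError` subclass. Because fcntl locks are not inherited across fork(2), the parent can take the lock before forking, which removes the need for the `sleep`.

```python
import fcntl
import os
import tempfile


def fcntl_lock_is_exclusive(path):
    """Return True if a forked child is refused an fcntl lock held by its parent."""
    with open(path, "w") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)      # the parent owns the lock from here on
        pid = os.fork()
        if pid == 0:
            # Child: it inherits the descriptor but not the lock, so a
            # non-blocking attempt should be refused while the parent holds it.
            try:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                os._exit(0)                # refused: locking is being honoured
            os._exit(1)                    # granted: two owners at the same time
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "test.lock")
    print("fcntl locks work" if fcntl_lock_is_exclusive(lock_path) else "fcntl locks fail")
```

Like the original, this only checks that a conflicting lock is refused between two processes. It does not exercise the heavier contention that triggered the corruption in the question.
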
Dennis Kaarsemaker
  • +1 for the info! I don't have access to the console- I can only shell out commands, so did this in the Ruby test: `system 'flock test.lock sleep 5 & sleep 1; if ! flock -n test.lock true ; then echo "flock works"; else echo "flock fails"; fi'`, but that is saying that flock works. From what I'm reading, I think I need to test fcntl, and I think I can only do that with smbtorture's [ping pong](https://access.redhat.com/site/articles/48659) test, but not sure how and not sure if that is only for Samba? I might still be on the wrong track, though. – Gary S. Weaver Aug 20 '13 at 22:21
  • Looking at the ruby documentation: testing fcntl locks from pure ruby code is a PITA. Do you have to test it in ruby? If you can test it in python, I can give you a quick test. – Dennis Kaarsemaker Aug 21 '13 at 17:10
  • Thanks for the py code! Will convert to single command-line and try it out. – Gary S. Weaver Aug 21 '13 at 20:09
  • I don't think that's gonna work, but you could write it to a file and run system 'python that_file' – Dennis Kaarsemaker Aug 21 '13 at 20:31
  • `py_out = \`python -c "import fcntl, os, time\n\nfd = open('/tmp/test.lock', 'w')\nif os.fork():\n fcntl.lockf(fd, fcntl.LOCK_EX)\n os.wait()\nelse:\n time.sleep(0.1)\n fcntl.lockf(fd, fcntl.LOCK_EX|fcntl.LOCK_NB)\n"\`` :) – Gary S. Weaver Aug 21 '13 at 20:34
  • So whether good or bad, your latest test is passing/not failing. It results in: "Traceback (most recent call last): File "", line 9, in IOError: [Errno 11] Resource temporarily unavailable" in both OS X where it works and in the Linux VM where I thought it'd be failing. – Gary S. Weaver Aug 21 '13 at 20:39
  • I'm thinking now it is because of the statement I mentioned above in the comments from [here](https://access.redhat.com/site/articles/48659), "POSIX fcntl locks are not inherited across the fork(2) system call." What I don't know yet is why Ruby's [fork](http://www.ruby-doc.org/core-2.0/Process.html#method-c-fork) command in OS X does not cause problems with fcntl locks but it does in Linux. In the rdoc it says "Creates a subprocess", but need to look at the Ruby source/do some more reading. Using Tracepoint in Ruby 2, I only see `[7, Kernel, :fork, :c_call]` and then the c_return. – Gary S. Weaver Aug 21 '13 at 20:50
  • btw- [this](https://github.com/ruby/ruby/blob/v2_0_0_247/test/ruby/test_io.rb#L2183) is Ruby 2 p247's test of fcntl lock in linux. The link I put in the comments at top was just the first thing I found in google. Will continue to look later. – Gary S. Weaver Aug 21 '13 at 21:01
  • I don't have an OpenVZ VM to test with, but I think you need to read this note about advisory locking – c4f4t0r Aug 22 '13 at 07:16
  • Ok, found the internal fork source for Ruby 2.0.0p247 at https://github.com/ruby/ruby/blob/v2_0_0_247/process.c#L3307 and SQLite 3.7.9 code [here](http://www.sqlite.org/cgi/src/tarball/SQLite-34aafb743627e469.tar.gz?uuid=34aafb743627e469681d9d3be4f831399418cc17) - In SQLite, src/os_unix.c (among others) has a number of uses of fcntl and tool/getlock.c states "This only works on unix when the posix advisory locking method is used (which is the default on unix) and when the PENDING_BYTE is in its usual place." – Gary S. Weaver Aug 23 '13 at 00:07
  • Well, don't have an answer yet, but @DennisKaarsemaker answered the original question with ways to test flock and fcntl locks at command-line. Thanks again, Dennis! – Gary S. Weaver Aug 23 '13 at 21:55

I don't have an OpenVZ VM to test with, but I think you need to read this note about advisory locking: Advisory locking requires cooperation from the participating processes. Suppose process “A” acquires a WRITE lock and starts writing into the file. Process “B” can still open the file and write into it without trying to acquire a lock. Here process “B” is the non-cooperating process. If process “B” tries to acquire a lock, then it is cooperating to ensure “serialization”.

Advisory locking will work only if the participating processes are cooperative. Advisory locking is sometimes also called “unenforced” locking.
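
The “A” and “B” scenario can be reproduced with a short Python sketch. It assumes ordinary advisory locking (no mandatory locking enabled on the mount), and `uncooperative_write_succeeds` is a hypothetical name chosen for illustration.

```python
import fcntl
import os
import tempfile


def uncooperative_write_succeeds(path):
    """Process A holds an exclusive fcntl lock; process B writes without asking for one.
    Returns True if B's write landed in the file anyway."""
    with open(path, "w") as f:
        f.write("written by A\n")
        f.flush()
        fcntl.lockf(f, fcntl.LOCK_EX)       # A: holds the advisory lock
        pid = os.fork()
        if pid == 0:
            # B: never calls lockf(), so nothing stops the write.
            try:
                with open(path, "a") as intruder:
                    intruder.write("written by B\n")
            except OSError:
                os._exit(1)
            os._exit(0)
        os.waitpid(pid, 0)                  # A still holds the lock at this point
    with open(path) as f:
        return "written by B" in f.read()


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "advisory.txt")
    print("uncooperative write succeeded:", uncooperative_write_succeeds(target))
```

The write from B going through is the expected behaviour of advisory locks, not a sign that locking is broken. A locking test is only meaningful when every writer involved actually requests the lock.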

c4f4t0r