How to replace values in a file from two concurrent scripts

I'm trying to write the values of two variables, generated by two independent scripts, into a single "file.cfg". The two variables are constantly updated and saved back to "file.cfg". Below is an example of my setup.

example "file.cfg" content:

a=null
b=null

example "script_a.sh" update "a" value with:

#!/bin/bash
while :; do
    .............
    val_a=1 
    sed -i "s/^\(a=\).*/\1$val_a/" file.cfg
    .............
done

example "script_b.sh" update "b" value with:

#!/bin/bash
while :; do
    .............
    val_b=2 
    sed -i "s/^\(b=\).*/\1$val_b/" file.cfg
    .............
done

The scripts work perfectly and the values are updated. But if the two scripts are executed simultaneously, one of the two values is sometimes not updated.

I discovered that sed with the "-i" option creates a temporary file, and the two simultaneous operations end up overwriting each other's results. How can I solve this?

Jax2171

Answers

This other answer exploits the idea of a lockfile. There is a dedicated utility for exactly this: flock(1). From its manual:

flock [options] file|directory command [arguments]
flock [options] file|directory -c command
[…]

This utility manages flock(2) locks from within shell scripts or from the command line.

The first and second of the above forms wrap the lock around the execution of a command, in a manner similar to su(1) or newgrp(1). They lock a specified file or directory, which is created (assuming appropriate permissions) if it does not already exist. By default, if the lock cannot be immediately acquired, flock waits until the lock is available.

And because it uses the flock(2) system call, I believe the kernel guarantees no two processes can hold a lock on the same file at the same time:

LOCK_EX Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time.

In your scripts, instead of sed … run flock some_lockfile sed …, e.g.:

flock some_lockfile sed -i "s/^\(a=\).*/\1$val_a/" file.cfg

And that's it, the lock gets released when sed exits. The only disadvantages are:

  • some_lockfile may already be in use as a lockfile by something else; the safe way is to use mktemp to create a temporary file and use that;
  • at the end you need to remove some_lockfile (you probably don't want to leave it behind as garbage); but if anything else uses the file (probably not as a lockfile), you may not want to remove it; again, mktemp is the way to go: create a temporary file, use it, remove it, regardless of what other processes do. A minimal sketch follows below.
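
For example, script_a.sh could look like this. This is only a sketch: the lock-file path is illustrative, and both scripts must use the same path for the mutual exclusion to work (a wrapper script could create it with mktemp and hand the path to both):

#!/bin/bash
# Sketch only: /tmp/file.cfg.lock is an assumed, shared lock-file path.
lockfile=/tmp/file.cfg.lock
while :; do
    .............
    val_a=1
    # flock takes an exclusive lock on $lockfile, runs sed,
    # and releases the lock as soon as sed exits.
    flock "$lockfile" sed -i "s/^\(a=\).*/\1$val_a/" file.cfg
    .............
done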

Why not flock file.cfg sed … then? It would lock the exact file being operated on and would leave no garbage behind at all. So why not?

Well, because that approach is flawed. To understand why, let's see what (GNU) sed -i actually does:

-i[SUFFIX]
--in-place[=SUFFIX]

This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.

[…]

When the end of the file is reached, the temporary file is renamed to the output file’s original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.

I have tested that flock locks the inode rather than the name (path). This means that just after sed -i renames its temporary file to the original name (file.cfg in your case), the lock no longer applies to that name.

Now consider the following scenario:

  1. The first flock file.cfg sed -i … file.cfg locks the original file and works with it.
  2. Before the first sed finishes, another flock file.cfg sed -i … file.cfg arises. This new flock targets the original file.cfg and waits for the first lock to be released.
  3. The first sed moves its temporary file to the original name and exits. The first lock is released.
  4. The second flock spawns the second sed which now opens the new file.cfg. This file is not the original file (because of different inode). But the second flock targeted and locked the original file, not the one the second sed just opened!
  5. Before the second sed finishes, another flock file.cfg sed -i … file.cfg arises. This new flock checks the current file.cfg and finds it's not locked; it locks the file and spawns sed. The third sed begins to read the current file.cfg.
  6. There are now two sed -i processes reading from the same file in parallel. Whichever ends first loses – the other one will eventually overwrite the result by moving its independent copy to the original name.

That's why you need a separate some_lockfile whose inode never changes.
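
You can watch the inode change yourself (the inode numbers below are illustrative):

$ ls -i file.cfg
123456 file.cfg
$ sed -i 's/^a=.*/a=1/' file.cfg
$ ls -i file.cfg
123789 file.cfg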

Kamil Maciorowski

A lockfile should work well: if the lockfile exists, then some process is using the target file & other processes will have to wait.

If you've got the lockfile-progs package then you could use it to check for an existing valid lock (created within the last 5 minutes) with lockfile-check, plus the similar lockfile-create & lockfile-remove.

Note that these lockfiles do not lock or block access to the file, but are just informative so your scripts know not to interfere with each other.

lockfile-create has a default retry delay if a lockfile already exists: it will wait until the file's unlocked before proceeding. Here's an excerpt from its man page:

-r retry-count, --retry retry-count

Try to lock filename retry-count times before giving up. Each attempt will be delayed a bit longer than the last (in 5 second increments) until reaching a maximum delay of one minute between retries. If retry-count is unspecified, the default is 9 which will give up after 180 seconds (3 minutes) if all 9 lock attempts fail.

Here's a basic example allowing multiple commands while file.cfg is locked (including an exit if lockfile-create fails), but see the man page for more details:

lockfile-create file.cfg  || { echo "lockfile-create failed, exiting now"; exit; }
...
sed -i ... file.cfg
...
lockfile-remove file.cfg

If you need the lockfile for longer than 5 minutes, use lockfile-touch to "run forever, touching the lock once every minute until killed." Here's an excerpt from the man page:

Locking a file during a lengthy process:

     lockfile-create /some/file
     lockfile-touch /some/file &
     # Save the PID of the lockfile-touch process
     BADGER="$!"
     do-something-important-with /some/file
     kill "${BADGER}"
     lockfile-remove /some/file

If you did want to do something special while waiting for the file to unlock, you could use a while loop like this. There could be a window of a few milliseconds (0.003 s in my timing tests) between checking and locking the file, but then lockfile-create will just wait until it's safe to proceed anyway:

while lockfile-check file.cfg
do
  echo doing stuff waiting for lock to clear
  sleep 1
done

lockfile-create file.cfg || exit
...
sed -i ... file.cfg
...
lockfile-remove file.cfg

And as long as both scripts use & respect the lockfiles, sed should never be able to replace the file while the other script holds the lock, so there should be no file copying & renaming conflicts. Putting it together, a loop for script_a.sh might look like the sketch below.
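
Only a sketch: lockfile-create manages a file.cfg.lock alongside file.cfg, as described in the man page excerpts above:

#!/bin/bash
while :; do
    .............
    val_a=1
    # Waits (with retries) until file.cfg.lock can be created;
    # gives up & exits if all retry attempts fail.
    lockfile-create file.cfg || { echo "lockfile-create failed, exiting now"; exit 1; }
    sed -i "s/^\(a=\).*/\1$val_a/" file.cfg
    lockfile-remove file.cfg
    .............
done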


Or there are other similar options, like:

  • dotlockfile
  • your own test -a FILE & touch...
  • flock as in Kamil's answer, which ships in the util-linux package, which is nice
  • Store the values in a database program that can handle simultaneous access safely

Xen2050

Actually lockfile-create will wait if the file's already locked, I'll edit in its -r option info... so the while loop's not even necessary if you're happy with its default delay. It's described as "guaranteed compatible with Debian's file locking policies" so I think it should perform equivalently to flock... @KamilMaciorowski

– Xen2050 – 2019-02-23T03:52:26.673

The path/inode sed problem is interesting, but assuming both processes create a lockfile before starting sed, and clear the lockfile when sed's finished & the filename's "stable", I don't think there can be a conflict, since it creates an actual file.cfg.lock; so it's similar to your answer using some_lockfile that needs cleaning up (lockfile-remove here) afterwards. – Xen2050 – 2019-02-23T03:54:07.297

Those sound valid, I'll edit in a bit about "holding" the lock for longer, and an "exit on fail" for the code example – Xen2050 – 2019-02-23T10:28:24.027

From the sed man page:

-i[SUFFIX], --in-place[=SUFFIX]

edit files in place (makes backup if extension supplied). The default operation mode is to break symbolic and hard links. This can be changed with --follow-symlinks and --copy.

-c, --copy

use copy instead of rename when shuffling files in -i mode. While this will avoid breaking links (symbolic or hard), the resulting editing operation is not atomic. This is rarely the desired mode; --follow-symlinks is usually enough, and it is both faster and more secure.

Have a look at whether you've got some aliases set up, or at what your command looks like exactly. According to the man page, it shouldn't be creating a backup if you just use -i.

That doesn't mean both scripts couldn't access the file simultaneously and overwrite each other's changes. Using a mutex or similar might be advisable in that situation; see the sketch below.
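
A minimal mutex sketch using flock(1) with a dedicated lock file held on a file descriptor; the lock-file path is illustrative, and both scripts would have to use the same one:

#!/bin/bash
# Open (creating it if needed) the assumed lock file on descriptor 9.
exec 9>/tmp/file.cfg.lock
val_a=1
# Block until we hold an exclusive lock on descriptor 9.
flock -x 9
sed -i "s/^\(a=\).*/\1$val_a/" file.cfg
# Release the lock (it is also released when the script exits).
flock -u 9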

Seth

Thanks for the reply. It's a bit generic though, could you elaborate on it? – Jax2171 – 2019-02-22T14:08:54.410

Elaborate on what? Have a look at this rubber chicken explanation of what a mutex is if you're unsure about that. How to implement it is up to you. To set up a mutex you could use something like flock.

– Seth – 2019-02-25T06:19:46.483

This is the classic real-time update problem: script_a.sh reads file.cfg, and before it writes any changes, script_b.sh reads the same information; then whichever script writes its update first has its changes overwritten when the other script posts its own update. It does not matter whether the updates are done through a temporary file or by writing directly.

There is no native semaphore or mutex handling within bash, but you can use file.cfg itself as the lock by adding lines to your scripts, e.g. in script_a.sh:

#!/bin/bash
while :; do
    .............
    # The rename doubles as the lock: it succeeds for only one script at a time.
    while ! mv file.cfg file.cfg_a 2>/dev/null; do sleep 0.1; done
    val_a=1
    sed -i "s/^\(a=\).*/\1$val_a/" file.cfg_a
    mv file.cfg_a file.cfg
    .............
done

The changes to script_b.sh are similar, except that the file is renamed to file.cfg_b for updating.

By using a rename, the script both checks the availability of the file for updating and obtains exclusive access in a single, atomic operation.

I never like polling loops, but without compiling code which supports functions to handle semaphores and mutexes, this is the best that can be easily done.

Note that some versions of sleep do not support fractional delays, in which case you will need to delay for a minimum of one second before retrying, unless you use a different utility.

AFH

Best solution for now !!! ;) – Jax2171 – 2019-02-23T14:46:07.843

@Jax2171 Warnings and concerns: (1) Obviously this will overwrite the already existing file.cfg_a, if any. A safe way is to create a temporary file (mktemp -p ./) and to overwrite it. (2) "The two variables are constantly updated". This makes sense when there's a final tool that opens file.cfg at random times (otherwise it would be enough to update the values once, one by one, just before the final tool starts). The final tool now needs to cope with the situation when file.cfg is temporarily missing. (3) In general you cannot be sure mv is atomic, but atomicity is crucial here. – Kamil Maciorowski – 2019-02-24T10:49:21.443

@KamilMaciorowski - Thanks for your comments. (1) file.cfg_a and file.cfg_b were intended as files unique to this procedure, but any other reserved names would do, including those generated by mktemp. (2) It would be better to have a single process doing the updates, maybe taking sed commands from an input pipe, but this is a lot more complex, so I didn't suggest it. My simpler solution requires that file.cfg exists before either script is run. (3) A rename will either succeed or fail: it can't be interrupted, allowing file.cfg to be renamed to two different files. – AFH – 2019-02-24T14:34:13.027

(1) OK. (2) The point is there's probably a program the file.cfg is intended for; it now needs to be prepared to not find the file and to wait for it to reappear. (3) True, I guess, if every mv uses rename(2). But if it's copy+delete, the issue is not about file.cfg being renamed to two different files. It's about mv file.cfg file.cfg_b copying the partial file while mv file.cfg_a file.cfg is still running. This may happen if there's a multi-threaded FUSE involved or if the files are under different mountpoints. It's not obvious that a seemingly slight change, like using /tmp/file.cfg_a, may be unsafe. – Kamil Maciorowski – 2019-02-24T18:02:04.217

@KamilMaciorowski - (2) OK. I hadn't considered how the file would be processed. I guess the file reader would need a polling loop similar to those in the scripts. (3) As I understand it, mv copies and deletes only when the target is on a different file system, which is not the case here. – AFH – 2019-02-24T21:21:12.630

(3) Not the case here (although FUSE may be), but the point of SE is to create a knowledge base for users with similar problems yet to come. You said "any other reserved names would do" and this is true. Similarly they may think any other available paths would do. They may enter a different filesystem thinking it's not a big deal. (Hey! This guy is right about mktemp. Unfortunately my mktemp doesn't support -p so I'll just drop it. Isn't /tmp for temporary files anyway?) Warning them is the right thing to do. – Kamil Maciorowski – 2019-02-24T21:38:32.500