When is the data in the journal written to the disk?

7

1

(1) mentions that "With a journal the file is first written to the journal, punch-in, and then the journal writes the file to disk when ready. Once it has successfully written to the disk, it is removed from the journal, punch-out, and the operation is complete."

So, when I create a file, it's written to the journal and written to the disk later. If I create a file of 1 MB, then actually 2 MB of data is written to the disk: 1 MB to the journal and another 1 MB to the disk later. This might actually decrease the lifespan of the disk. My question is: when is the data in the journal transferred to the disk? If it's not done immediately, then subsequent reads of that data from the disk are not possible. Also, is the write complete to the user when the data is written to the journal, or to the disk?

Also, there is a mention that, because of journaling, some file systems need less defragmentation. How is disk defragmentation related to the journal?

(1) http://www.howtogeek.com/howto/33552/htg-explains-which-linux-file-system-should-you-choose/

Praveen Sripati

Posted 2011-08-28T13:58:40.993

Reputation: 1 385

The journal can be in many locations in the clusters of the drive (non-sequential), and it is not normally a movable file, so it cannot normally be defragmented. It could still be defragmented at boot, with a program that did that. – Psycogeek – 2011-10-04T13:46:26.930

Answers

1

when is data in the journal transferred to the disk?

Depends on two main things: the file system in use and the physical storage device. XFS uses write barriers. EXT3 uses write barriers, if enabled. EXT4 has barriers on by default. Traditional HDDs use caches. Solid-State Drives may or may not have a cache. Ultimately, it is a combination of the operating system, file system and underlying hardware architecture and specifications that determine when data is persisted on the storage device.
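To make this concrete, here is a minimal C sketch (file names and sizes are placeholders, and error handling is omitted) showing that the same write can either be left to the kernel's caching and journaling machinery or forced to the device immediately with O_SYNC:

    /* Sketch: the same 1 MB write, issued two ways.
     * With a plain open()/write(), the data may sit in the page cache
     * (and the drive's own cache) until the kernel and journal commit it.
     * With O_SYNC, write() does not return until the data has been
     * pushed to the storage device. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        char *buf = malloc(1 << 20);            /* 1 MB of data */
        memset(buf, 'x', 1 << 20);

        /* Buffered: returns quickly; persistence happens later. */
        int fd1 = open("buffered.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(fd1, buf, 1 << 20);
        close(fd1);

        /* Synchronous: write() blocks until the data is durable. */
        int fd2 = open("synced.dat", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
        write(fd2, buf, 1 << 20);
        close(fd2);

        free(buf);
        return 0;
    }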

is the write complete to the user when the data is written to the journal or to the disk?

This also depends on the application in use and your operating system. Linux has the fsync system call that applications and file systems use to flush cached data to the physical devices. Not all applications use fsync to explicitly flush cached data to storage. You can always issue a sync command to manually flush file system buffers.
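For applications that need a write to be durable before they continue, the usual pattern on Linux is write() followed by fsync(). A minimal sketch (the file name is just an illustration, and error handling is abbreviated):

    /* Sketch: explicitly flushing written data with fsync(2), as many
     * databases and mail servers do. Until fsync() returns, the data
     * may exist only in the page cache and/or the journal. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char msg[] = "important record\n";
        int fd = open("record.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg)) {
            perror("write");
            return 1;
        }

        /* Ask the kernel to push the file's data (and metadata) to the
         * storage device; only after this returns is the write durable. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        return 0;
    }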

How is disk defragmentation related to journal?

Disk fragmentation affects performance, especially when dealing with large files whose blocks are not contiguous. There are different techniques for mitigating fragmentation. For example, XFS and other file systems use an allocate-on-flush technique to minimize fragmentation.
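Allocate-on-flush is the file system's own optimization, but an application can help from user space by reserving a file's full size up front with posix_fallocate(3), giving the file system a chance to pick contiguous blocks. A rough sketch, with a made-up file name and size:

    /* Sketch: pre-allocating a file's full size so the file system can
     * try to choose contiguous blocks up front, instead of growing the
     * file piecemeal as data arrives. This is a user-space mitigation,
     * not the file system's internal allocate-on-flush. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        const off_t size = 100 * 1024 * 1024;   /* 100 MB */
        int fd = open("big_download.tmp", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* posix_fallocate returns an error number, not -1/errno. */
        int err = posix_fallocate(fd, 0, size);
        if (err != 0) {
            fprintf(stderr, "posix_fallocate failed: %d\n", err);
            return 1;
        }

        close(fd);
        return 0;
    }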

Dan Cruz

Posted 2011-08-28T13:58:40.993

Reputation: 1 095

Although the poster marked this as the answer, this doesn't address the question at all: the cache refers to memory buffers and delayed writes, and so has nothing to do with journaling. – harrymc – 2011-10-05T05:54:43.533

3

Some better links for information about journaling are:

Journaling file system
Anatomy of Linux journaling file systems

The latter explains the three journaling strategies: writeback, ordered, and data, where ordered is normally the default:

Ordered mode is metadata journaling only but writes the data before journaling the metadata. In this way, data and file system are guaranteed consistent after a recovery.

So, unless you have set your journaling strategy to data mode (also called Journal mode), where both metadata and data are journaled, your disk will not suffer much from the fact that it is journaled.
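If you want to check which mode a mounted file system is using, one way is to list the mount options; on ext3/ext4 you would look for data=journal, data=ordered or data=writeback (the default mode is often not shown explicitly). A small sketch using getmntent(3):

    /* Sketch: print each mounted file system's options so you can see
     * whether a data=journal / data=ordered / data=writeback mode was
     * requested at mount time. */
    #include <mntent.h>
    #include <stdio.h>

    int main(void) {
        FILE *mtab = setmntent("/proc/mounts", "r");
        if (!mtab) { perror("setmntent"); return 1; }

        struct mntent *ent;
        while ((ent = getmntent(mtab)) != NULL) {
            printf("%-20s %-10s %s\n",
                   ent->mnt_dir, ent->mnt_type, ent->mnt_opts);
        }

        endmntent(mtab);
        return 0;
    }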

The journal itself is allocated on a fixed area of the disk, and therefore doesn't add to the fragmentation. Some filesystem variants will also let it grow and shrink, so some fragmentation may occur.

On a journaling file-system, fsck will normally run the journal automatically, and if the filesystem is otherwise clean, will skip doing a full filesystem check.

harrymc

Posted 2011-08-28T13:58:40.993

Reputation: 306 093

Great read, thanks. So now there are two chances for data not to finish being written: the journal, and the write that could have finished if the journal weren't there. Journaling is OS-controlled, and it helps when the OS fails, but an OS failure does not constitute a hard drive failure, so the hard drive could have finished, but the journaling had a dependency on what failed. The EXT file system is slow, and many users of it have had file problems that didn't do squat to repair themselves. I could believe more if the theories had ever played out as planned. This stuff might be more useful in a server. – Psycogeek – 2011-10-04T14:32:12.523

And everyone knows that OSes never fail :-) and that hard drives and flash cards fail in ~5 years, so it makes perfect sense that an OS-dependent system should spend half of its time screwing up the storage because of its OS dependency, because in 5 years it will fix a hard disk or flash disk once, right before you have to retire the hard drive or flash disk because it isn't working right :-) – Psycogeek – 2011-10-04T14:43:20.657

1

You might also like to read Comparison of file systems. I believe that currently both Linux and Windows are doing a good job, and will do even better in the near future. These file systems try to protect against crashes while providing security and efficiency. They are not perfect, but are still evolving.

– harrymc – 2011-10-04T17:40:11.210

Yup, there is no turning back; we need those 4+ GB files and 5000-terabyte drives. The rest of it they can keep; most of the things they add "don't impress me much". I don't run a server (yet). Encryption can be done 50 ways if I had anything to hide. – Psycogeek – 2011-10-04T18:21:35.857

0

There's no evidence that disk life is correlated with activity level. An unused, but spinning, disk tends to last about as long as a heavily used disk.

In any event, actual file data is normally not journaled. It's not necessary. Normally only metadata, necessary to maintain the integrity of the filesystem, is journaled.

David Schwartz

Posted 2011-08-28T13:58:40.993

Reputation: 58 310

I think "There's no evidence that disk life is correlated with activity level" is incorrect. See http://www.storagereview.com/guide/specCycles.html for more info; specifically: "Each time the drive starts and stops a small amount of wear occurs to the heads and also to other components such as the spindle motor. For this reason, hard drives are given a specification for the minimum number of start/stop cycles they are designed to handle during their service life."

– Dan Cruz – 2011-10-04T12:47:52.967

Metadata only . . . "A file system with a logical journal still recovers quickly after a crash, but may allow unjournaled file data and journaled metadata to fall out of sync with each other, causing data corruption." https://secure.wikimedia.org/wikipedia/en/wiki/Journaling_file_system "In a metadata-only journal, step 3 would not be logged. If step 3 was not done, but steps 1 and 2 are replayed during recovery, the file will be appended with garbage." The whole of it doesn't fit here.

– Psycogeek – 2011-10-04T17:36:53.493