For efficiency reasons, the OS keeps changes made to the file system in RAM for a while. Once it has accumulated some changes, it flushes them to disk. There are two reasons for this. A few large, mostly sequential writes are more efficient than many small, scattered ones. Batching also lets the OS choose where to place the data on disk. The journal of a file system is an area of the disk, a file, or even a separate disk. Every operation about to be performed (moving, deleting, or creating files, etc.) is recorded there. These records are written before the changes themselves reach their final location on disk.
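To make the buffering idea concrete, here is a toy model. A plain dict stands in for the disk, and WriteBackCache is a hypothetical class, not a kernel interface. Repeated writes to the same block are merged in RAM. A flush then writes the dirty blocks in ascending block order, so the disk sees one mostly sequential pass.

```python
class WriteBackCache:
    """Hypothetical in-RAM write-back cache; a dict plays the role of the disk."""

    def __init__(self, disk, flush_threshold=4):
        self.disk = disk                  # block number -> contents on "disk"
        self.dirty = {}                   # block number -> newest contents still in RAM
        self.flush_threshold = flush_threshold
        self.disk_writes = 0              # block writes that actually reached the disk

    def write(self, block, data):
        # A later write to the same block replaces the earlier one in RAM,
        # so only the newest version is ever written to disk.
        self.dirty[block] = data
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write dirty blocks in ascending order: one mostly sequential pass.
        for block in sorted(self.dirty):
            self.disk[block] = self.dirty[block]
            self.disk_writes += 1
        self.dirty.clear()


disk = {}
cache = WriteBackCache(disk)
for block, data in [(7, b"a"), (2, b"b"), (7, b"c"), (5, b"d"), (2, b"e")]:
    cache.write(block, data)
cache.flush()
print(disk, cache.disk_writes)  # {2: b'e', 5: b'd', 7: b'c'} 3
```

Five writes end up as three disk writes. The downside is also visible here. Anything still in self.dirty when the power fails is gone, and that is the problem the journal addresses.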
There are two types of journals:
Physical:
All changes to metadata and data are written to the journal before being written to their final location. The journal is kept on its own device, usually smaller and faster than the final disk.
Suppose we have a file and we append data to it, making it grow:
- The OS keeps the new data in RAM.
- It writes the inode's size change and the new data to the physical journal.
- When the OS deems it appropriate, it writes the data from RAM to its final location on disk.
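The three steps above can be sketched as follows. This is a simplified model and not how ext4/jbd2 is implemented. RAM, the journal and the final disk are plain Python containers. The block names ("inode:size", "block:42") are made up for the example.

```python
class PhysicalJournalFS:
    """Hypothetical model of a physical journal."""

    def __init__(self):
        self.ram = {}       # pending block writes held in memory
        self.journal = []   # committed transactions: each one is {block: contents}
        self.disk = {}      # final location of every block

    def write(self, block, data):
        # Step 1: the OS keeps the new data (and the updated inode) in RAM.
        self.ram[block] = data

    def commit(self):
        # Step 2: full copies of all modified blocks, metadata and data,
        # are written to the journal as a single transaction.
        if self.ram:
            self.journal.append(dict(self.ram))
            self.ram.clear()

    def checkpoint(self):
        # Step 3: journaled blocks are copied to their final location,
        # after which the journal space can be reused.
        for txn in self.journal:
            self.disk.update(txn)
        self.journal.clear()

    def crash(self):
        # A power failure wipes RAM; the journal and the disk survive.
        self.ram.clear()

    def recover(self):
        # On the next mount, replaying the journal is just a checkpoint.
        self.checkpoint()


fs = PhysicalJournalFS()
fs.write("inode:size", 8192)
fs.write("block:42", b"new data")
fs.commit()    # the transaction is safe in the journal
fs.crash()     # power fails before step 3
fs.recover()
print(fs.disk)  # {'inode:size': 8192, 'block:42': b'new data'}
```

If the crash happens before commit(), the changes are still only in RAM. They are lost, and the disk keeps its previous, consistent contents.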
If a problem occurs while writing to the journal, the incomplete entry will fail its CRC check and be discarded. Its changes are lost, but the file system remains in a consistent state. If the information reached the journal but not the final disk, it is reconstructed from the journal the next time the file system is mounted.
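The CRC check can be sketched like this. The record format (a JSON payload followed by a 4-byte CRC32) is an assumption for illustration only. Real jbd2 journals use descriptor and commit blocks, optionally with checksums. The recovery idea is the same: apply every transaction that validates, and stop at the first one that does not.

```python
import json
import zlib


def encode_txn(changes):
    """Serialize one transaction and append a CRC32 of the payload."""
    payload = json.dumps(changes, sort_keys=True).encode()
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def decode_txn(record):
    """Return the changes if the CRC matches, or None for a damaged record."""
    if len(record) < 4:
        return None
    payload, stored = record[:-4], record[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != stored:
        return None
    return json.loads(payload)


def replay(records, disk):
    """Apply valid transactions in order; stop at the first invalid one."""
    for record in records:
        changes = decode_txn(record)
        if changes is None:
            break  # incomplete transaction: its changes are discarded
        disk.update(changes)
    return disk


good = encode_txn({"inode7.size": 4096, "block12": "hello"})
full = encode_txn({"inode7.size": 8192, "block13": "world"})
torn = full[:-1] + bytes([full[-1] ^ 0xFF])  # last byte not written correctly
disk = replay([good, torn], {})
print(disk)  # {'block12': 'hello', 'inode7.size': 4096}
```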
Logical:
The journal is usually kept in the fastest part of the disk. Ext4 supports several operating modes, selected with the data= mount option:
Writeback Mode:
data=writeback: Only the metadata is journaled, not the data. This is the fastest journaling mode.
No ordering is enforced between data and metadata. The metadata can therefore be written to the journal and then to the file system's inode table before the data reaches the disk.

If the write is interrupted while the inode table is being updated, the journal is read on the next boot. The missing entry is recreated from it, and the file gets the size recorded there. Its data blocks were never written, so they contain whatever stale data was there before (garbage), but the file system remains consistent.

The same applies if the operation is interrupted while the data itself is being written. The journal says the file occupies X (what it would occupy had the write finished), but only Y of new data actually reached the disk. After the journal is replayed the file has size X, and nothing checks its contents. The remaining X - Y are garbage, so the file is lost :(. The file system, however, is in a consistent state, since every file starts and ends where the inode table says.
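Here is a hypothetical model of the writeback case. The new file needs three blocks that previously belonged to a deleted file, and the power fails after only some of them have been written. The function name and the "stale" marker are made up for the illustration.

```python
def writeback_crash(new_blocks, blocks_written_before_crash):
    """Simulate data=writeback: metadata is journaled, data is not,
    and nothing orders one against the other (hypothetical model)."""
    # The blocks still hold stale contents from a previously deleted file.
    disk_data = ["stale"] * len(new_blocks)
    journal = []

    # The metadata (the file's new size, in blocks) is committed to the journal.
    journal.append({"size_blocks": len(new_blocks)})

    # Only some data blocks reach the disk before the power fails.
    for i in range(blocks_written_before_crash):
        disk_data[i] = new_blocks[i]

    # Recovery: replaying the journal restores the metadata, so the inode
    # claims the full size, but nothing verifies the data blocks.
    inode = {}
    for txn in journal:
        inode.update(txn)
    return disk_data[: inode["size_blocks"]]


print(writeback_crash(["A", "B", "C"], 1))  # ['A', 'stale', 'stale']
```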
Ordered Mode:
data=ordered: Only the metadata is journaled, but the related data blocks are forced to disk before the metadata is committed to the journal. This is Ext4's default mode and is slightly slower than writeback.
Data and metadata are grouped so that they form atomic transactions “auto-magically”. The data blocks are written to their final location first. Then the metadata is committed to the journal, and only after that is it written to its final location. If a problem occurs before the commit, the new file data is lost, but the file system always remains in a consistent state. Data and metadata form an atomic, indivisible unit, so committed metadata never points to blocks whose data has not been written.
NOTE: I cannot understand why the journal is written after the data has been written to the disk. I would appreciate it if someone could clarify this point for me by email: kr0m@alfaexploit.com
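The ordered-mode sequence can be modelled the same way. Again, this is a hypothetical sketch and not ext4 code. A file that owns block 1 is extended with blocks 5 and 6, which are free but still hold stale bytes. We simulate a power failure after each step.

```python
def ordered_append(crash_after_step):
    """Simulate data=ordered (hypothetical model).
    Steps: 1 = data written to its final location,
           2 = metadata committed to the journal,
           3 = metadata written to the inode table.
    crash_after_step=0 means the power fails before step 1."""
    disk = {1: "old", 5: "stale", 6: "stale"}
    inode_table = {"blocks": [1]}
    journal = []
    new_data = {5: "A", 6: "B"}

    steps = [
        lambda: disk.update(new_data),                  # step 1
        lambda: journal.append({"blocks": [1, 5, 6]}),  # step 2
        lambda: inode_table.update(journal[-1]),        # step 3
    ]
    for step in steps[:crash_after_step]:
        step()

    # Power fails here. On mount, committed journal transactions are replayed.
    for txn in journal:
        inode_table.update(txn)
    # What a reader of the file sees.
    return [disk[b] for b in inode_table["blocks"]]


for n in range(4):
    print(n, ordered_append(n))
```

Whatever the crash point, the file never shows "stale" blocks. Either the append is lost (crash after step 0 or 1) or it is complete (after step 2 or 3).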
Journal Mode:
data=journal: Both data and metadata are written to the journal first, and only then to their final location on disk.
This imposes a considerable I/O performance penalty, but gives greater reliability. It works like the physical journal described above. If the write to the journal is not completed, only the file changes in that unfinished transaction are lost.
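Here is the same scenario in data=journal mode, as a hypothetical sketch. Data and metadata travel together in one journal transaction, and a "committed" flag stands in for the commit record.

```python
def journal_mode_append(crash_point):
    """Simulate data=journal (hypothetical model).
    crash_point: "before_commit", "after_commit", or None for no crash."""
    disk = {1: "old", 5: "stale", 6: "stale"}  # blocks 5 and 6 are free but stale
    inode_table = {"blocks": [1]}
    journal = []

    def checkpoint():
        # Copy committed data and metadata from the journal to their final place;
        # uncommitted transactions are ignored.
        for txn in journal:
            if txn["committed"]:
                disk.update(txn["data"])
                inode_table.update(txn["inode"])
        journal.clear()

    # 1. Data AND metadata are written to the journal as one transaction.
    journal.append({"data": {5: "A", 6: "B"},
                    "inode": {"blocks": [1, 5, 6]},
                    "committed": False})
    if crash_point != "before_commit":
        # 2. The commit record makes the transaction valid.
        journal[-1]["committed"] = True
        if crash_point is None:
            # 3. Normal checkpoint, no failure.
            checkpoint()

    # A power failure (if any) happens here; on mount, the journal is replayed.
    checkpoint()
    return [disk[b] for b in inode_table["blocks"]]


for point in ("before_commit", "after_commit", None):
    print(point, journal_mode_append(point))
```

Without the commit record, recovery ignores the transaction and the file keeps its old contents. With it, replay writes both data and metadata, so the new blocks never expose stale data. The price is that every data block is written twice: once to the journal and once to its final location.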
Some references that may be of interest:
http://www.ibm.com/developerworks/library/l-journaling-filesystems/
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
http://www.ibm.com/developerworks/library/l-anatomy-ext4/
https://ext4.wiki.kernel.org/index.php/Main_Page
http://en.wikipedia.org/wiki/Journaling_file_system