Hello there everyone,
I've run into an issue trying to save disk space. I'm running a map, similar to Google Maps, that is built from a set of tiles. Many of these tiles are duplicates: 314,000 of the 340,000 tiles are copies of just two distinct tiles, land and water (beige and blue). Each duplicate is only 103 bytes, but it doesn't take long to figure out that if I hard-linked all of them to those two tiles, I could reclaim the space used by about 314,000 files, roughly 92% of the tiles. With ext4's current limit of 65,000 hard links to a single file, I've only saved about 900M of space.
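One way to stay within the cap without patching the kernel is to spread the duplicates over several master copies of each tile. This may matter because, as far as I know, the on-disk link count (`i_links_count` in `struct ext4_inode`) is a 16-bit field, so raising the constant past 65,535 would likely not work on its own. Below is a minimal sketch, assuming a POSIX system and that the caller has already confirmed each file is byte-identical to the current master (for example, by comparing it against the land or water tile). The function name `dedup_tile` and the `.lnk.tmp` suffix are hypothetical. Whenever `link()` fails with `EMLINK`, the current duplicate becomes the next master.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any existing tool): replace the duplicate
 * tile at `dup` with a hard link to the current master copy.
 *
 * `master` holds the path of the current master ("" if there is none yet).
 * When the master's inode reaches the per-inode link limit (EXT4_LINK_MAX,
 * 65000 on ext4), link() fails with EMLINK and `dup` becomes the next
 * master, so the duplicates end up spread over several masters.
 * Returns 0 on success, -1 on any other error.
 */
int dedup_tile(char *master, size_t cap, const char *dup)
{
    char tmp[4096];
    struct stat sm, sd;

    assert(master != NULL && cap > 0);

    if (master[0] == '\0') {
        /* First copy seen: keep the file and make it the master. */
        snprintf(master, cap, "%s", dup);
        return 0;
    }

    /* Already linked to the master (e.g. on a re-run): nothing to do. */
    if (stat(master, &sm) == 0 && stat(dup, &sd) == 0 &&
        sm.st_dev == sd.st_dev && sm.st_ino == sd.st_ino)
        return 0;

    /* Create the link under a temporary name, then rename() it over the
     * duplicate so the tile path never disappears, even briefly. */
    snprintf(tmp, sizeof tmp, "%s.lnk.tmp", dup);
    if (link(master, tmp) == 0) {
        if (rename(tmp, dup) == 0)
            return 0;
        unlink(tmp);
        return -1;
    }

    if (errno == EMLINK) {
        /* Master is full: this duplicate becomes the new master. */
        snprintf(master, cap, "%s", dup);
        return 0;
    }
    return -1;
}
```

Keep one `master` buffer per distinct tile (one for land, one for water) and call `dedup_tile` once per duplicate. The temporary-name-plus-`rename()` step is a design choice so a web server never sees a missing tile mid-run.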
I have full control over the server; I can even recompile the kernel if required. I noticed that, just as with ext3, the limit is the EXT4_LINK_MAX constant, set in /usr/src/linux-18.104.22.168/fs/ext4/ext4.h. I suspect the only way to raise this hard-coded limit is to change the following line in the source:
#define EXT4_LINK_MAX 65000
Symlinks will not help with files this small because of how the filesystem allocates space. Right now the filesystem uses a 4k block size; from what I read in the source, the minimum block size for ext4 is 1k. Every file takes up space on disk rounded up to a whole block, 4k in my case, and a symlink is itself a small file, so no space is actually saved.
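To check the claim that every file is rounded up to a whole block, here is a small sketch. It assumes Linux, where `st_blocks` counts 512-byte units, and a filesystem without ext4's `inline_data` feature, which can keep very small files inside the inode. The names `rounded_size` and `on_disk_bytes` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Space a file of `size` bytes occupies if every file is rounded up to a
 * whole number of `block`-byte blocks (ignores inline data, sparse files
 * and metadata blocks). */
long long rounded_size(long long size, long long block)
{
    assert(block > 0 && size >= 0);
    return (size + block - 1) / block * block;
}

/* Bytes actually allocated to `path`, as reported by stat().
 * Assumes Linux, where st_blocks counts 512-byte units.
 * Returns -1 if stat() fails. */
long long on_disk_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

On a 4k-block filesystem, `on_disk_bytes()` on one of the 103-byte tiles would be expected to report 4096. Under the same one-block-per-file assumption, 314,000 duplicates occupy 314,000 × 4,096 = 1,286,144,000 bytes (about 1.29 GB) of data blocks, compared with roughly 32 MB if only their 103 bytes of content counted.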
This difference between a symlink and a hard link is why a hard link saves me space and a symlink does not.
Think of a directory as just a list of names, each pointing to an inode, which is the file itself. When you create a hard link, you add another name to the directory that points directly at the existing inode. When you create a symlink, you add a name that points to a new inode: a separate small file whose contents are the path of the source file.
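The hard link versus symlink distinction can be seen directly with `lstat()`. The sketch below assumes a POSIX system; `same_inode`, `link_count` and `show` are hypothetical helper names. A hard link is a second directory entry for the same inode, so it reports the same `st_ino` and a higher `st_nlink`; a symlink, examined with `lstat()` rather than followed, is an inode of its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the two directory entries name the very same inode.
 * lstat() is used so a symlink is compared as itself rather than
 * being followed to its target. */
bool same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    assert(a != NULL && b != NULL);
    if (lstat(a, &sa) != 0 || lstat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Number of directory entries (hard links) pointing at the inode of
 * `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Print the inode number and link count of `path` (not following symlinks). */
void show(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0) {
        perror(path);
        return;
    }
    printf("%-20s inode %lu, links %lu%s\n", path,
           (unsigned long)st.st_ino, (unsigned long)st.st_nlink,
           S_ISLNK(st.st_mode) ? " (symlink)" : "");
}
```

After `link("water.png", "hard.png")` and `symlink("water.png", "sym.png")`, `show()` reports the same inode for `water.png` and `hard.png` with a link count of 2, while `sym.png` has a different inode with a link count of 1.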