40

A hard link is defined as a pointer to an inode. A soft link, also known as a symbolic link, is defined as an independent file pointing to another link without the restrictions of hard links.

What is the difference between a file and a hard link? A hard link points to an inode, so what is a file? The inode entry itself? Or an inode with a hard link?

Let's say I create a file with touch. Then an inode entry is created in the inode table. And I create a hard link, which has the same inode number as the file. So did I create a new file? Or is the file just defined as an inode?

7
  • This is almost certainly a duplicate of unix.stackexchange.com/questions/9575/…
    – infixed
    Commented Feb 20, 2017 at 23:10
  • 8
    @infixed Exactly not: I am asking about the difference between a file and a hard link. Commented Feb 20, 2017 at 23:17
  • So I've undeleted my original answer that I believe was also covered in the answers to that linked question. So is it still 'exactly not'?
    – infixed
    Commented Feb 20, 2017 at 23:50
  • 7
    The difference between a file and a hardlink is the same as the difference between you and the line with your name in the phonebook. Commented Feb 21, 2017 at 10:25
  • 2
    This is a duplicate of unix.stackexchange.com/questions/234402/…
    – seumasmac
    Commented Feb 21, 2017 at 14:25

8 Answers

67

The very short answer is:

  • a file is an anonymous blob of data
  • a hardlink is a name for a file
  • a symbolic link is a special file whose content is a pathname

Unix files and directories work exactly like files and directories in the real world (and not like folders in the real world); Unix filesystems are (conceptually) structured like this:

  • a file is an anonymous blob of data; it doesn't have a name, only a number (inode)
  • a directory is a special kind of file which contains a mapping of names to files (more specifically inodes); since a directory is just a file, directories can have entries for directories, that's how recursion is implemented (note that when Unix filesystems were introduced, this was not at all obvious, a lot of operating systems didn't allow directories to contain directories back then)
  • these directory entries are called hardlinks
  • a symbolic link is another special kind of file, whose content is a pathname; this pathname is interpreted as the name of another file
  • other kinds of special files are: sockets, fifos, block devices, character devices

Keeping this metaphor in mind, and specifically keeping in mind that Unix directories work like real-world directories and not like real-world folders explains many of the "oddities" that newcomers often encounter, like: why can I delete a file I don't have write access to? Well, for one, you're not deleting the file, you are deleting one of many possible names for the file, and in order to do that, you only need write access to the directory, not the file. Just like in the real world.

Or, why can I have dangling symlinks? Well, the symlink simply contains a pathname. There is nothing that says that there actually has to be a file with that name.
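To make the dangling case concrete, here is a minimal Python sketch. It assumes a POSIX system; the names `shortcut` and `does-not-exist` are made up for illustration. It creates a symlink whose target is never created and then inspects it.

```python
import os
import tempfile


def dangling_symlink_demo():
    """Create a symlink to a path that does not exist and inspect it."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "shortcut")          # hypothetical name
        target = os.path.join(d, "does-not-exist")  # never created
        os.symlink(target, link)                    # succeeds anyway
        return {
            # The symlink's content is just the pathname it was given.
            "stored_path_is_target": os.readlink(link) == target,
            # lexists() looks at the symlink itself, without following it.
            "link_exists": os.path.lexists(link),
            # exists() follows the symlink and finds nothing there.
            "target_exists": os.path.exists(link),
        }


if __name__ == "__main__":
    print(dangling_symlink_demo())
```

The symlink is a perfectly valid file; only the lookup of the name stored inside it fails.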

My question is simply what is the difference of a file and a hard link ?

The difference between a file and a hard link is the same as the difference between you and the line with your name in the phone book.

Hard link is pointing to an inode, so what is a file ? Inode entry itself ? Or an Inode with a hard link ?

A file is an anonymous piece of data. That's it. A file is not an inode; a file has an inode, just as you are not a Social Security Number; you have an SSN.

A hard link is a name for a file. A file can have many names.

Let's say, I create a file with touch, then an Inode entry is created in the Inode Table.

Yes.

And I create a hard link, which has the same Inode number with the file.

No. A hard link doesn't have an inode number, since it's not a file. Only files have inode numbers.

The hardlink associates a name with an inode number.

So did I create a new file ?

Yes, touch created a new file. The hard link did not: it only added a second name for that same file.

Or the file is just defined as an Inode ?

No. The file has an inode, it isn't an inode.

5
  • 15
    I'd never really understood (or properly thought about) what metaphor was behind the word "directory". The phone book example is a great one; perhaps you should introduce it earlier (when you first mention the real world). Similarly, most people rarely deal with "files" outside of a computer, so perhaps it would be clearer to say "just like paper files, and a directory like a phone book".
    – IMSoP
    Commented Feb 21, 2017 at 13:08
  • 3
    @IMSoP It's a generation gap. Before computers, a phone book was one of the kinds of a directory. Cambridge dictionary says: "directory: a book that gives a list of names, addresses, or other facts [...example] Look up their number in the telephone directory."
    – kubanczyk
    Commented Feb 22, 2017 at 12:11
  • 2
    @kubanczyk Indeed - to people who worked in pre-digital offices, I guess the metaphors seem so obvious that it feels almost condescending to explain them. But to those of my generation and below, it's as obscure as why the storage area at the back of a car is called a "boot" or "trunk", so you have to really spell it out.
    – IMSoP
    Commented Feb 22, 2017 at 12:33
  • The word "have" in the phrase "A hard link doesn't have an inode number" is possibly misleading, because then you say that "The hardlink associates a name with an inode number". The "hardlink" directory entry's data-structure actually contains the inode # - this is how the link is "associated" with the inode #. By "doesn't have" I think you mean the hardlink doesn't have an inode # that indicates where the link is stored on disk.
    – Kelvin
    Commented Feb 22, 2017 at 17:28
  • 2
    Saying that a file has an inode is somewhat backwards. The inode is the structure that contains the information about where the "blob of data" is. If there's no inode, there's no file.
    – Barmar
    Commented Feb 22, 2017 at 18:20
19

A hard link is a directory entry. A file may have multiple directory entries, if it's present under different names or in different directories. A directory entry is called “hard link” when it's put in relation with other directory entries for the same file.

The inode contains the file's metadata other than its name and contents (location of the contents, permissions, timestamps, etc.). There's one inode per file. (Not all filesystems put the metadata in a clearly identifiable space on disk that you could call “inode”, but it's a common architecture.) A directory entry links a name to an inode. It's possible for more than one directory entry to link to the same inode, hence the term “link”. Such a link is called a “hard link” by opposition to “soft links” or “symbolic links” which don't say “for this name, use this inode” but “for this name, look up that other name”.
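The "directory entry links a name to an inode" idea can be seen directly by listing a directory as a name-to-inode mapping. This is only a sketch, assuming a POSIX filesystem that supports hard links; `report.txt` and `alias.txt` are hypothetical names.

```python
import os
import tempfile


def name_to_inode(directory):
    """Return the directory's entries as a {name: inode number} mapping."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


def two_names_one_inode():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")   # hypothetical name
        with open(original, "w") as f:
            f.write("hello\n")
        # Add a second directory entry for the same inode.
        os.link(original, os.path.join(d, "alias.txt"))
        return name_to_inode(d)


if __name__ == "__main__":
    print(two_names_one_inode())  # two names mapped to one inode number
```

Both names map to the same inode number, which is all that "hard link" means here.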

Think of files as rooms and directory entries as doors. “Open the file /foo/bar” means “go to corridor /foo and go to room bar”. “Go to room bar” really means “open the door marked bar and enter the room” but “go to room bar” is an unremarkable way to say the same thing in a shorter way. It's possible to have more than one door leading to the same room.

When you create a hard link to an existing file (ln existing new), you're creating a second link to the same file, i.e. you're creating a new directory entry that links to the already-existing file. After creation, the two directory entries have equal status: there isn't one that is “primary” and one that's “secondary”, they're just both links to the same file.
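A small Python sketch of the "equal status" point, assuming a POSIX system: `os.link` plays the role of `ln existing new`, and the file names are the same placeholders as in that command. Removing the name that was created first does not affect the other one.

```python
import os
import tempfile


def survive_removal_of_first_name():
    with tempfile.TemporaryDirectory() as d:
        existing = os.path.join(d, "existing")
        new = os.path.join(d, "new")
        with open(existing, "w") as f:
            f.write("payload")
        os.link(existing, new)                 # like: ln existing new
        same = os.path.samefile(existing, new) # both names reach one file
        os.unlink(existing)                    # drop the "original" name
        with open(new) as f:
            return same, f.read()              # data is still reachable


if __name__ == "__main__":
    print(survive_removal_of_first_name())
```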

You can also remove all the links to a file without removing the file itself. This happens if you delete a file (i.e. you remove all its directory entries) while a program still has the file open. The file remains on the filesystem; it is only actually removed when the last process that had the file open closes it. In the room-and-doors metaphor, a room that has no doors still takes up space.
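The room-without-doors situation can be reproduced in a few lines. This sketch assumes a POSIX system (on Windows, removing an open file generally fails); the name `scratch` is arbitrary.

```python
import os
import tempfile


def read_after_unlink():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "scratch")   # hypothetical name
        with open(path, "w+") as f:
            f.write("still here")
            f.flush()
            os.unlink(path)                 # remove the only directory entry
            links = os.fstat(f.fileno()).st_nlink  # link count of the open file
            gone = not os.path.exists(path)        # the name is gone
            f.seek(0)
            return links, gone, f.read()    # contents are still readable


if __name__ == "__main__":
    print(read_after_unlink())
```

The data is readable through the open handle until that handle is closed, even though no name leads to it any more.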

5
  • when were hard and soft links first introduced, respectively?
    – n611x007
    Commented Feb 21, 2017 at 2:52
  • 2
    @n611x007: Could you please open a new question if you have a new or follow-up question? The comment section is not suitable or meant for new questions or extended discussion. Thanks. Commented Feb 21, 2017 at 11:52
  • 1
    @n611x007 Hard links are older than Unix, v1 had them. Symlinks in Unix are a bit newer; Wikipedia has some history. Commented Feb 21, 2017 at 12:46
  • Rooms and doors is a great analogy! Symlinks are then like signs to the doors. Commented Feb 22, 2017 at 2:39
  • 1
    @curiousdannii: Symlinks are more like rooms with a person sitting in them who says "oi m8 wrong office go to #234 instead" Commented Feb 22, 2017 at 20:02
8

In addition to all other answers I want to point out the following important properties:

A softlink is a true reference, i.e. it is a small file that contains a pathname. Resolving a softlink happens transparently to the application: if a process opens a file, say /this/path/here, which is a symlink pointing to /that/other/path, then the entire handling of opening /that/other/path is done by the OS. Furthermore, if /that/other/path happens to be a symlink itself, the OS deals with that too. In fact, the OS follows the chain of symlinks until it finds something else (e.g. a regular file) or until it has followed SYMLOOP_MAX links (see sysconf(3)), in which case the OS (more precisely, the corresponding system call) returns an error and sets errno to ELOOP. Thus, a circular reference like xyz -> xyz will not stall the process. (For Linux systems see path_resolution(7) for full details.)
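The circular case is easy to try out. The following sketch assumes a POSIX system and uses the same xyz -> xyz example; Python surfaces the failed system call as an `OSError` carrying the errno value.

```python
import errno
import os
import tempfile


def open_self_referencing_symlink():
    """Try to open a symlink that points at itself; return the errno."""
    with tempfile.TemporaryDirectory() as d:
        loop = os.path.join(d, "xyz")
        os.symlink("xyz", loop)      # xyz -> xyz, relative to its directory
        try:
            open(loop).close()
        except OSError as exc:
            return exc.errno         # the OS gave up following the chain
        return None


if __name__ == "__main__":
    code = open_self_referencing_symlink()
    print(code == errno.ELOOP)
```

The call returns promptly with an error instead of looping forever.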

Note that a process can check whether a pathname is a symlink or not through the use of lstat(2) and may modify its file attributes (stored in the inode table) through lchown(2) and others (see symlink(7) for the whole story.)
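As an illustration of the lstat(2) check, here is a sketch using Python's `os.lstat`, which wraps that call. The helper name `classify` and the file names are made up for the example.

```python
import os
import stat
import tempfile


def classify(path):
    """Describe what the name itself is, without following symlinks."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def lstat_versus_stat():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        open(target, "w").close()
        os.symlink(target, link)
        # os.stat() follows the symlink, so it reports the target's type.
        followed_is_regular = stat.S_ISREG(os.stat(link).st_mode)
        return classify(target), classify(link), followed_is_regular


if __name__ == "__main__":
    print(lstat_versus_stat())
```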

Now, in terms of permissions, you will notice that symlinks always have permissions 777 (rwxrwxrwx in symbolic notation). This is due to the fact that any other permissions could be bypassed by accessing the actual file anyway. Conversely, 777 for a symlink does not make the symlinked file accessible if it was not accessible in the first place. For instance, a symlink with permissions 777 pointing to a file with permissions 640 does not make the file accessible for "other" (the general public). In other words, a file xyz is accessible through a symlink if and only if it is directly accessible, i.e. without indirection. Thus, the symlink's permissions have no security effect whatsoever.
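One way to see that the permissions that matter live on the target: a sketch, assuming a POSIX system, in which a chmod issued on the symlink's name lands on the target while the symlink's own mode bits stay as they were. The file names are arbitrary.

```python
import os
import stat
import tempfile


def chmod_through_symlink():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        open(target, "w").close()
        os.symlink(target, link)
        link_mode_before = stat.S_IMODE(os.lstat(link).st_mode)
        os.chmod(link, 0o640)   # follows the symlink by default
        target_mode = stat.S_IMODE(os.stat(target).st_mode)
        link_mode_after = stat.S_IMODE(os.lstat(link).st_mode)
        return target_mode, link_mode_before, link_mode_after


if __name__ == "__main__":
    target_mode, before, after = chmod_through_symlink()
    # Print the modes in octal; the symlink's mode is expected to be unchanged.
    print(oct(target_mode), oct(before), oct(after))
```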

One of the main visible differences between hardlinks and symlinks (a.k.a. softlinks) is that symlinks work across filesystems while hardlinks are confined to one filesystem. That is, a file on partition A can be symlinked to from partition B, but it cannot be hardlinked from there. This is clear from the fact that a hardlink is actually an entry in a directory, which consists of a file name and an inode number, and that inode numbers are unique only per file system.
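In practice this restriction shows up as the error EXDEV from link(2). A hedged sketch of how a program might cope: the helper `link_or_symlink` is hypothetical, not a standard function, and falling back to a symlink is just one possible policy.

```python
import errno
import os
import tempfile


def link_or_symlink(src, dst):
    """Hard-link dst to src; fall back to a symlink across filesystems."""
    try:
        os.link(src, dst)
        return "hard"
    except OSError as exc:
        # EXDEV means src and dst live on different filesystems.
        if exc.errno != errno.EXDEV:
            raise
        os.symlink(os.path.abspath(src), dst)
        return "symbolic"


def demo_same_filesystem():
    # Both names are in one directory, so the hard link branch is taken.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "data")
        dst = os.path.join(d, "data.link")
        open(src, "w").close()
        kind = link_or_symlink(src, dst)
        return kind, os.path.samefile(src, dst)


if __name__ == "__main__":
    print(demo_same_filesystem())
```

The demo only exercises the same-filesystem branch; triggering EXDEV needs a destination on a second mounted filesystem.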

The term hardlink is actually somewhat misleading. While for symlinks source and destination are clearly distinguishable (the symlink has its own entry in the inode table), this is not true for hardlinks. If you create a hardlink for a file, the original entry and the hardlink are indistinguishable in terms of what was there first. (Since they refer to the same inode, they share their file attributes such as owner, permissions, timestamps etc.) This leads to the statement that every directory entry is actually a hardlink, and that hardlinking a file just means to create a second (or third, or fourth...) hardlink. In fact, each inode stores a counter for the number of hardlinks to that inode.
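The per-inode counter mentioned above is visible as `st_nlink` in the stat result. A minimal sketch, assuming a POSIX filesystem with hard link support; the names a, b and c are arbitrary.

```python
import os
import tempfile


def link_counts():
    """Record the inode's link count as names are added and removed."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        c = os.path.join(d, "c")
        open(a, "w").close()
        counts.append(os.stat(a).st_nlink)   # one name so far
        os.link(a, b)
        counts.append(os.stat(a).st_nlink)   # second name added
        os.link(a, c)
        counts.append(os.stat(a).st_nlink)   # third name added
        os.unlink(b)
        counts.append(os.stat(a).st_nlink)   # one name removed again
    return counts


if __name__ == "__main__":
    print(link_counts())
```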

Finally, note that ordinary users may not hardlink directories. This is because this must be done with utmost caution: an unwary user may introduce cycles into the otherwise strictly hierarchical file tree, which all usual tools (like fsck) and the OS itself are not prepared to deal with.

7

A simple answer:

  • A file entry in a directory is a hard link to that file.

  • Some files have more than one such hard link, as multiple hard links to the same file are allowed.

3

In the early days of Unix, the files internally were inodes on a particular disk drive. The file names were a more friendly way to access them.

A hard link was assigning more than one file name to an inode. You could create a file, hard link a second name to it and delete the first name and it was indistinguishable from simply having made the file with the second name in the first place.

Indeed, the system call that a program needs to use to delete a file is unlink(2). The data doesn't go away until the last name is unlinked from the inode (and the inode isn't held open by a process somewhere).

This is what makes it easier for Linux to upgrade things while programs are still running. If a process is running an executable and an update happens, the program name gets reused, but the inode containing the old version still exists, so the process can continue to run. When the last process running that old version stops, the old version's storage is released.
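Here is a rough simulation of that upgrade scenario in Python. It is only an analogy: an open file handle stands in for the running process, and the update is assumed to write the new version under a temporary name and rename it into place. The names `program` and `program.new` are made up.

```python
import os
import tempfile


def upgrade_while_open():
    with tempfile.TemporaryDirectory() as d:
        program = os.path.join(d, "program")
        with open(program, "w") as f:
            f.write("version 1")
        running = open(program)   # stands in for a running process
        old_inode = os.fstat(running.fileno()).st_ino
        staged = os.path.join(d, "program.new")
        with open(staged, "w") as f:
            f.write("version 2")
        os.replace(staged, program)   # the name now refers to a new inode
        try:
            with open(program) as fresh:
                new_text = fresh.read()
            new_inode = os.stat(program).st_ino
            # The old handle still reads the old inode's data.
            return running.read(), new_text, old_inode != new_inode
        finally:
            running.close()   # last reference gone; old storage can be freed


if __name__ == "__main__":
    print(upgrade_while_open())
```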

Soft links came about because when you have a unitary file tree with multiple mount points, you couldn't make a hard link from one hard drive to an inode on another. So soft links were invented.

3
  • I guess this is a duplicate of unix.stackexchange.com/questions/9575/…
    – infixed
    Commented Feb 20, 2017 at 23:09
  • 2
    "Early days": why is it any different now? Your answer doesn't seem to reflect that view anyhow.
    – n611x007
    Commented Feb 21, 2017 at 2:44
  • @n611x007 Because 'these days' things like Linux can mount non-unix type file systems that don't fit the inode model. Like FAT derivatives and ISO-9660 for example. It's a much more diverse file system ecology instead of a one-size fits all
    – infixed
    Commented Feb 21, 2017 at 14:39
1

A file is the data written on the disk. This data is referenced by its inode, which contains metadata about the file telling the system what blocks on the disk are used by this file, among other things. A hard link points to the inode number of this file.

So technically, yes, you are creating a new file, but all this file contains is the inode number for the file it references and its name. It's better to think of it as creating a pointer to the inode, or a pointer to the file.

1

File is a widely used concept about entries in a filesystem.

Usually it includes directories, regular files (hard links), and symbolic links (soft links), and it may even include devices and sockets.

My question is simply what is the difference of a file and a hard link ? Hard link is pointing to an inode, so what is a file ? Inode entry itself ? Or an Inode with a hard link ?

Let's say, I create a file with touch, then an Inode entry is created in the Inode Table. And I create a hard link, which has the same Inode number with the file. So did I create a new file ? Or the file is just defined as an Inode ?

Since even symbolic link is usually counted as file, a hard link itself can also be counted as a file. You can say it's a file regardless of whether it's a hard or soft link.

The concept is a bit ambiguous so it's also okay to say that an inode entry is a file, though you may actually want to refer to the data.

If you are a C++ or Java programmer you might want to read about std::filesystem::file_type, java.io.File, and java.nio.file.Files.

Details about differences between hard link and soft link can be found in the link in infixed's comment.

1

The difference between a "file" with a given name and a "hard link" is one of history. A (regular) file with a given name is created using a creat system call, a hard link is created using a link system call.

However, while humans talk about and remember the history of directory entries and call them files and hard links accordingly, the file system doesn't. The directory entries of "original file" and "hard link" are totally indistinguishable in quality: both establish a reference between a file name and the inode of a file, and once the last such reference is gone (references are not just file names for a file but also file descriptors with which an opened file can be accessed), the file for the unreferenced inode is considered deleted and the inode and associated file space are reclaimed.

So when humans contrast "files" and "hard links", the first came into being with a "link count of 1", and all the others came into being with a larger link count. The difference is academic, and indeed renaming a file at one time consisted of creating a hard link for the target name and then removing the link for the source name. Nowadays, a single system call, rename(2), usually does this atomically.
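Both ways of renaming can be compared side by side. In this sketch (POSIX assumed), `rename_by_link_unlink` is a hypothetical helper that mimics the old two-step approach, while `os.rename` wraps the single system call; either way the inode stays the same and only the name changes.

```python
import os
import tempfile


def rename_by_link_unlink(src, dst):
    """Two-step rename: not atomic, both names exist in between."""
    os.link(src, dst)     # add the new name
    os.unlink(src)        # remove the old name


def rename_both_ways():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        c = os.path.join(d, "c")
        open(a, "w").close()
        inode = os.stat(a).st_ino
        rename_by_link_unlink(a, b)            # old style: a -> b
        after_two_step = os.stat(b).st_ino
        os.rename(b, c)                        # single call: b -> c
        after_rename = os.stat(c).st_ino
        same_inode = inode == after_two_step == after_rename
        return same_inode, sorted(os.listdir(d))


if __name__ == "__main__":
    print(rename_both_ways())
```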
