Symbolic and hard links for dummies

Let me start by saying that this will be mostly for users of Vista and Windows 7, as those are the systems I’ve tested and understood. Apologies to users of Windows XP and Mac OS: what I cannot test, I cannot comment on.

First of all, I wanted to write this because it became apparent that no single place out there offers solid and valid information on the subject. Once I understood what each term means, the general misconceptions and misunderstandings became clear.

Most of the confusion comes from using the wrong terms. For instance, many sites describe the mklink /j (path) (path) command as creating a hard link to a directory. The mistake is that Windows does not allow hard links to directories; it uses junctions instead (hence the /j switch of the command, which stands for junction). Another mistake is giving the same explanation for hard links and junctions when they are two different things.

***WARNING***
None of the links here apply to network connections outside of your own local network, so no links to FTP servers, cloud backups and so on.

Let’s start with the simplest:

Shortcut

has no correlation to symbolic links, hard links or junctions. For normal home use it is best thought of as something only the user can act upon: opening the shortcut yourself in Windows Explorer is the only way to reach the file or directory it points to. No program (other than Windows Explorer) can understand it.

Note to purists: yes, I have heard of programs that can work with shortcuts and mimic the way symbolic links work, but that is specific to the program used and not the whole system. It might be time to replace those shortcuts with symbolic links so that every program you use can potentially use the data you point to without worrying about compatibility.

Some general rules to remember from now on.

All of the links are transparent to the applications you use. A program has no way of telling that it is following a link (except maybe some backup programs that try to be smart, but that is very specific and you should read their manual or contact their support to find out how they handle links). The effect is that you can point ANY program you like ANYWHERE on your system. Examples include moving the Google Chrome cache, Steam games and the iTunes library. The reason you might want to move something does not matter; you simply can. What matters is choosing the right type of link for the job at hand, and even that is simple, as you will almost always use a symbolic link over all others. This is why I will describe the links from the most restrictive to the least restrictive, so you can choose the right one.

Hard link

this is all about files and only files. You cannot use hard links with folders (remember we are talking about Windows here), you cannot use them across volumes (so no hard link on C: pointing to any volume other than C:), and you cannot use them across the local network.

So what is it? To understand that you must understand what a file is. For the purposes of this discussion, a file is the filename and the data it contains. So you have this database file (maybe your expense-income file?) that is 10MB and has a filename house.db and it is inside the folder Peter (I’m keeping it simple, don’t rip my head off). For whatever reason you see fit, you decide to make a hard link to that file in your temp folder so you use mklink /h c:\temp\house1.db c:\Peter\house.db

Let’s break that down a bit. All the links we talk about here are made with the mklink command. The switch /h denotes you want to make a hard link. Then comes the full path of the link you are creating along with its name, and then the original filename with its full path. The name of the link can be anything you want, including the same as the original file, except in the case where you are creating the link in the same folder as the original file, in which case it must be different.

mklink /h c:\temp\house.db c:\Peter\house.db (correct)
mklink /h c:\temp\house1.txt c:\Peter\house.db (Correct, you can do that, but programs will get confused about the txt extension just as if you had renamed the original file)
mklink /h c:\test\house.db c:\Peter\house.db (correct)
mklink /h c:\Peter\house1.db c:\Peter\house.db (correct)
mklink /h c:\Peter\house.db c:\Peter\house.db (INCORRECT)

What you are doing here is creating a new FILENAME for the same DATA on your disk. Remember that a file is not just the data or the filename, but the combination of the two. This means that now the DATA on your disk has two filenames. Opening either one will access the same data, so if you open house.db, make changes, save and then open house1.db, the changes you made are there.

Furthermore, because all you do is create a different filename for the same data, you don’t use any extra hard disk space, no matter how many hard links you make for the same file. You can have 5 hard links of the same 50MB file and the space occupied on your system is still 50MB. Hard links are just pointers with different filenames to the same data. This is important to understand as deleting ANY hard link does NOT delete the data on your disk, as long as there is still at least one hard link available. Only when all hard links are deleted will the data get deleted.
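
The behaviour described above can be tried out with a short script. This is only a minimal sketch in Python, not part of the mklink command: os.link is Python's standard call for making a hard link, and the function name hard_link_demo and the file contents are made up for illustration. It works in a throwaway temporary folder, so nothing on your system is touched.

```python
import os
import tempfile


def hard_link_demo():
    """Create house.db plus a hard link house1.db and report what happens."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "house.db")
        link = os.path.join(workdir, "house1.db")

        with open(original, "w") as f:
            f.write("income=100\n")

        # Same idea as: mklink /h house1.db house.db
        os.link(original, link)

        # Append through the second name; the first name sees the change.
        with open(link, "a") as f:
            f.write("expense=40\n")
        with open(original) as f:
            seen_via_original = f.read()

        # st_nlink is the number of filenames that point at the data.
        names_before = os.stat(original).st_nlink

        # Deleting one name leaves the data reachable through the other.
        os.remove(original)
        with open(link) as f:
            seen_after_delete = f.read()
        names_after = os.stat(link).st_nlink

    return seen_via_original, names_before, seen_after_delete, names_after


if __name__ == "__main__":
    print(hard_link_demo())
```

The name count drops from 2 to 1 when house.db is deleted, yet the data can still be read through house1.db.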

If you don’t understand that, it might be because you don’t see the point. Why would anyone need something like that, especially considering that hard links cannot cross volumes or the network? The truth is that hard links are advanced stuff and address advanced needs that everyday users do not usually have.

One common use is renaming. You don’t actually see it, but a rename can be thought of as two steps: first create a hard link with the name you want, then delete the old name. All this is transparent to you, but it is also one of the best uses of hard links in programming, since a programmer can perform each step separately instead of calling a rename, which allows him to avoid various problems. Why this is, how systems renamed files before hard links and other such questions are outside the scope of this text, so I will stop here.

Another use for hard links is to make the same file available to multiple programs (or users) without having multiple copies on your system that would consume space. If 10 programs want to access a file that is 100MB but they all want it in different folders, you would need 10 x 100MB = 1,000MB. Hard linking will, in this case, save you 900MB.

All in all, hard links are extremely specific in their uses and cover a very small percentage of what a normal user might want to do. Here is one danger. Remember Peter’s file house.db and the hard link he made. If Peter opens house.db and starts working on it, then Marry opens house1.db and starts working on it, each user will most probably make different changes. The data is accessed by both users without a problem, but if Peter saves his changes, Marry doesn’t know about it. After a while, she saves her changes and in effect the data now shows only the changes made by Marry. Peter’s changes are gone forever. However, if you know a file is accessed by multiple users (or programs) that do not (or cannot) change anything, hard links are a good way to save hard disk space.
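
The Peter and Marry problem can be reproduced in a few lines. This is a hedged sketch in Python with made-up file contents and a hypothetical function name (lost_update_demo); it assumes both users load the file before either one saves, and that saving means overwriting the file in place.

```python
import os
import tempfile


def lost_update_demo():
    """Two users edit the same data through two hard-linked names."""
    with tempfile.TemporaryDirectory() as workdir:
        peters_name = os.path.join(workdir, "house.db")
        marrys_name = os.path.join(workdir, "house1.db")
        with open(peters_name, "w") as f:
            f.write("rent=500\n")
        os.link(peters_name, marrys_name)

        # Both users load the file before either one saves.
        with open(peters_name) as f:
            peters_copy = f.read()
        with open(marrys_name) as f:
            marrys_copy = f.read()

        # Peter saves first, then Marry saves her own version on top.
        with open(peters_name, "w") as f:
            f.write(peters_copy + "food=200\n")
        with open(marrys_name, "w") as f:
            f.write(marrys_copy + "car=150\n")

        # Reading through Peter's name now shows only Marry's version.
        with open(peters_name) as f:
            return f.read()


if __name__ == "__main__":
    print(lost_update_demo())
```

Peter's food=200 line never survives, because Marry's save started from the data as it was before Peter saved. Note that some programs save by writing a brand new file and renaming it over the old one; in that case the two names stop sharing data instead, which is a different surprise.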

As a recap:
>> Used only for files
>> Used only on the same drive letter (volume)
>> Cannot cross volumes, even when both volumes are on the same physical hard drive
>> Definitions must be absolute paths (the whole path starting with the drive letter and ending with the filename)
>> They do not consume extra hard disk space
>> For the data to be deleted from your disk and the space reclaimed, you need to delete ALL the hard links pointing to it
>> Beware of syncing problems
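
The first recap points can be checked before attempting a hard link. This is a minimal sketch in Python under one stated assumption: that comparing the st_dev value reported by os.stat is a good enough test for "same volume". The function name can_hard_link is hypothetical.

```python
import os
import tempfile


def can_hard_link(source_file, link_folder):
    """Return True if a hard link to source_file could live in link_folder.

    Assumption: two paths are on the same volume when os.stat reports
    the same st_dev value for both.
    """
    if not os.path.isfile(source_file):
        return False  # hard links are for files only
    return os.stat(source_file).st_dev == os.stat(link_folder).st_dev


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "house.db")
        open(source, "w").close()
        print(can_hard_link(source, tmp))  # file and folder share a volume
        print(can_hard_link(tmp, tmp))     # a folder is not a valid source
```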

Junction

this gave me the most trouble when I was first trying to figure it out. Most sources suggest that this is a hard link for directories. However, it has nothing in common with the way hard links work.

A junction will not keep the directory alive if the original directory is deleted (unlike hard links, where the data is deleted only once all of them are gone). Deleting the original folder really does delete the data: it is removed from the hard drive and the space is released back to you. The junction will still exist but point to nothing, which makes it an “orphan”. Trying to access it will produce a system error message informing you that the target is not there. You can make a new folder with the same name as the one you deleted (and it must be in the same place as the old folder) and then the junction will work again as if nothing happened. Of course, all the data the old folder contained will still be gone.

A junction can cross volumes so you can create a junction folder in C: and point it to D: so your data are in D: but programs can access them from C: without issue.

The only similarities to hard links are that junctions cannot point to network locations, and they both require absolute paths in their definitions.

The command is in this form:

mklink /j c:\Peter d:\Home\Peter

This will create a junction folder Peter in your C: drive root that points to data in your D: drive that are contained in the folder \Home\Peter

If you are on Windows XP or Windows 2000, then junctions are all you have available for folder links. If you are using Vista or Win7, there is really no point in using junctions; use symbolic links instead, as they work the same way for everyday purposes and can also work over your local network. Junctions are available in Vista and Win7 for reasons of backwards compatibility. I suggest you learn to use symlinks from now on.

As a recap:
>> Used only for folders
>> Can be used over all available local disk volumes
>> Definitions must be absolute paths (the whole path starting with the drive letter and ending with the folder name)
>> They do not consume extra hard disk space
>> Deleting the junction will not delete the original data. (On Win2000, deleting a junction from within Explorer can delete all the files it contains. I tested this on Vista and Win7 using the delete command, the delete key, emptying the recycle bin and even shift+delete, and every time the original folder remained intact.)
>> Deleting the original folder will “orphan” the junction. If you try to access the junction, a system error will be produced that it cannot find the folder it was expecting.
>> Recreating a folder of the same name and path as the old folder will make the junction work again.

Symbolic link

used for both files and directories, can point to different volumes (that means between different drive letters such as C: and D:) and can also be used across a local network. Furthermore, they can be used across platforms if the target system supports them, so a Vista machine can have a symlink to a Win7 machine, a Linux machine or any other POSIX-compliant OS. There are some restrictions to note for cross-platform links, such as Linux allowing special characters in filenames that Windows cannot read, or Windows following at most 31 links along a single path where other systems allow more. Again, those topics are too advanced for this text, so I’ll move on.

Symlinks also allow relative paths in their definitions. Relative paths are in the form of:

Sample\file.txt
..\temp
\Utils

All these are relative paths, and the definitions can get complicated if you don’t know what you are doing. However, they are not that hard and they can provide much-needed functionality. To explain the above examples, let’s assume a path c:\Utils\test\Sample\

>> To use Sample\file.txt you must be in c:\Utils\test\ and you are telling mklink to look for (or create) a file (or a file symlink) called file.txt inside the folder Sample
>> To use ..\temp you cannot be in the root folder but at least one folder inside, as you are telling mklink to look for (or make) a folder (or a folder symlink) with the name temp by going one folder back and looking in there. If you are in C:\ there is no “back” to go to.
>> To use \Utils it doesn’t matter where you are, as you are telling mklink to just look at the root of the volume you are in now.
>> A relative target stored in a symlink is followed starting from the folder that holds the link, so the simplest case is to create the link in the folder you are currently in.
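
To see how a relative target behaves once it is stored in a symlink, here is a small sketch in Python. It rebuilds the Utils\test\Sample layout inside a temporary folder; the link name shortcut.txt and the function name relative_symlink_demo are made up for illustration. os.symlink is Python's standard call for creating a symlink, and on Windows it may need an elevated prompt.

```python
import os
import tempfile


def relative_symlink_demo():
    """Show that a relative target is resolved from the link's own folder."""
    with tempfile.TemporaryDirectory() as root:
        # Layout: <root>/Utils/test/Sample/file.txt
        test_dir = os.path.join(root, "Utils", "test")
        sample_dir = os.path.join(test_dir, "Sample")
        os.makedirs(sample_dir)
        with open(os.path.join(sample_dir, "file.txt"), "w") as f:
            f.write("hello\n")

        # The link lives in Utils/test and stores the relative target
        # exactly as given; it is not converted to an absolute path.
        link = os.path.join(test_dir, "shortcut.txt")
        relative_target = os.path.join("Sample", "file.txt")
        os.symlink(relative_target, link)

        stored_target = os.readlink(link)

        # The script's current folder is somewhere else entirely, yet
        # opening the link still finds Sample/file.txt next to the link.
        with open(link) as f:
            content = f.read()

    return stored_target == relative_target, content


if __name__ == "__main__":
    print(relative_symlink_demo())
```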

Symlinks act the same way for both file and folder symlinks. What is true for one is true for the other. What makes them different is what you define at the start. If you want to point to a file you make a file symlink with the command

mklink c:\Utils\John.pst "g:\My Documents\Outlook\personal.pst"

There is no switch here. Also note that the source file path is enclosed in double quotes ("") because the path contains a significant character, the space between My and Documents. This applies to all mklink commands, regardless of the switch you use. If a path contains space characters, then you need to enclose it in double quotes no matter which link you are making.
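
If you generate mklink commands from a script, the quoting rule can be applied automatically. The sketch below is in Python and leans on subprocess.list2cmdline, a helper that ships in Python's subprocess module and quotes arguments containing spaces in the Windows style. The function name mklink_command is hypothetical, and the code only builds the command text; it does not run anything.

```python
import subprocess


def mklink_command(link, target, switch=None):
    """Build the text of an mklink command, quoting paths with spaces."""
    parts = ["mklink"]
    if switch:
        parts.append(switch)  # for example "/d", "/h" or "/j"
    parts += [link, target]
    # list2cmdline wraps any argument that contains a space in double quotes.
    return subprocess.list2cmdline(parts)


if __name__ == "__main__":
    print(mklink_command(r"c:\Utils\John.pst",
                         r"g:\My Documents\Outlook\personal.pst"))
    print(mklink_command(r"c:\My Documents\Family Music",
                         r"k:\Family\Music", "/d"))
```

Only the paths that contain a space end up quoted, which matches the commands shown in this text.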

If you are making a folder symlink then the command is

mklink /d "\Marry Housekeeping\Home Theatre" d:\manuals\hdtvcombo

which tells it to make a folder symlink called Home Theatre inside the Marry Housekeeping folder. If there is no folder Marry Housekeeping in the root folder (in other words, if the path \Marry Housekeeping does not exist on the current drive), the command will not work: it would first have to make the folder Marry Housekeeping and then make the folder symlink in there, and that is something mklink does not do.
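
A script can take care of the missing parent folder before making the link. This is a minimal sketch in Python, not a feature of mklink: the function name make_folder_symlink is hypothetical, and the demo uses a temporary folder in place of a real drive. On Windows, os.symlink may need an elevated prompt.

```python
import os
import tempfile


def make_folder_symlink(link, target):
    """Create any missing parent folders, then create a folder symlink."""
    parent = os.path.dirname(os.path.abspath(link))
    os.makedirs(parent, exist_ok=True)  # the step mklink will not do for you
    os.symlink(target, link, target_is_directory=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "manuals", "hdtvcombo")
        os.makedirs(target)
        link = os.path.join(root, "Marry Housekeeping", "Home Theatre")
        make_folder_symlink(link, target)
        print(os.path.islink(link))
```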

Finally, if you want a machine on your network to appear somewhere on your system, map the network drive it is on and then use that path as the source. Let’s say that your whole music collection is not available locally to you. You map the network drive that the music is on and give it a drive letter (and for some reason you choose K). Now you can see all the music in K:\Family\Music, but you want it to appear in c:\My Documents\Family Music instead.

First, make sure there is no existing folder called Family Music inside the My Documents folder, because mklink will not overwrite it. Then you just type:

mklink /d "c:\My Documents\Family Music" k:\Family\Music

and that is all.

As a recap:
>> Used for folders and files
>> Can be used over all available local disk volumes
>> Can be used over local network with any platform using POSIX commands and SMB network protocol
>> Definitions can be absolute paths or relative paths
>> They do not consume extra hard disk space
>> Deleting the symlink will not delete the original data
>> Deleting the original folder or file will “orphan” the symlink. If you try to access the symlink, a system error will be produced that it cannot find the folder or file it was expecting.
>> Recreating a folder or file of the same name and path as the old one will make the symlink work again.
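
The last three recap points can be watched in action with a short Python sketch. The file names are borrowed from the examples above, the function name orphan_demo is made up, and everything happens in a temporary folder. On Windows, os.symlink may need an elevated prompt.

```python
import os
import tempfile


def orphan_demo():
    """Delete a symlink's target, then recreate it at the same path."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "personal.pst")
        link = os.path.join(workdir, "John.pst")
        with open(target, "w") as f:
            f.write("mail\n")
        os.symlink(target, link)

        # Deleting the original orphans the link.
        os.remove(target)
        link_still_there = os.path.islink(link)  # the link itself survives
        target_reachable = os.path.exists(link)  # exists() follows the link

        # Recreating a file with the same name and path revives the link.
        with open(target, "w") as f:
            f.write("new mail\n")
        with open(link) as f:
            content_after_recreate = f.read()

    return link_still_there, target_reachable, content_after_recreate


if __name__ == "__main__":
    print(orphan_demo())
```

The link comes back to life, but it shows the new file's contents; the old data is not restored.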
