File extension

From LQWiki

A file extension is a more or less arbitrary component of a filename, used to identify the file's type. Because DOS and Windows use extensions widely and fairly consistently, they have become well known, and there is broad consensus on what most of them mean, with subtle variations. A failing of file extensions is that any extension can be applied to any file, so the extension is no guarantee of the contents. The extension takes the form of a dot followed by a few letters, usually three, appended to the filename. For a list of commonly used file extensions, see List of file extensions.
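
The point that an extension guarantees nothing about the contents can be sketched in a few lines of Python. This is a minimal illustration, not a real file-type detector: the helper names `extension_of` and `sniff_type` are made up for this example, and the signature table holds only three well-known leading byte sequences.

```python
import os

# A few well-known leading byte signatures. Illustrative only, far from complete.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
}


def extension_of(filename):
    """Return the extension without its dot, lowercased ('' if there is none)."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def sniff_type(data):
    """Guess a type from the first bytes of the contents, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

For a PDF that someone has renamed to holiday.jpg, `extension_of` would report "jpg" while `sniff_type` on the file's first bytes would report "pdf": the name and the contents simply disagree, and only the contents are authoritative.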

DOS

DOS filenames supported a maximum of eight characters and, optionally, an extension of up to three characters, separated from the name by a dot. The most familiar example is probably 'AUTOEXEC.BAT' (DOS filenames were all uppercase). This extension had the rare feature of being meaningful to the operating system. Since DOS did not have an executable permission bit, the extension was how the system knew to execute a non-binary executable: rename foo.bat to foo.baz and it would not run. Most extensions were made up by the user (MYLETTER.LTR) or generated by the application on a per-application basis (MYDATA.DBF). Since DOS was a command-line system, the extension didn't matter for many things, but text-interface shells (file managers) could 'associate' extensions with applications, so that pressing Enter on a .TXT file would open it in EDIT, for instance.
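
The 8.3 rule and the extension-based execution check can be sketched roughly as follows. Several things here are assumptions for illustration: the function names are hypothetical, the allowed character set is deliberately simplified (real DOS accepted some additional punctuation), and the set of runnable extensions adds .COM and .EXE to the .BAT case described above.

```python
import re

# Simplified 8.3 pattern: 1-8 name characters, then optionally a dot and
# 1-3 extension characters. Assumption: only A-Z, 0-9, '_' and '-' are allowed.
EIGHT_DOT_THREE = re.compile(r"[A-Z0-9_-]{1,8}(\.[A-Z0-9_-]{1,3})?")

# Extensions the DOS command interpreter treated as runnable.
RUNNABLE = {"COM", "EXE", "BAT"}


def is_8_3(name):
    """True if the name fits the simplified 8.3 pattern (case-insensitive)."""
    return EIGHT_DOT_THREE.fullmatch(name.upper()) is not None


def dos_would_run(name):
    """True if the extension alone marks the file as something to execute."""
    _base, dot, ext = name.upper().rpartition(".")
    return bool(dot) and ext in RUNNABLE
```

With these definitions, `dos_would_run("foo.bat")` is true and `dos_would_run("foo.baz")` is false, which mirrors the renaming example: nothing about the file changed except its name.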

Windows

With the introduction of DOS7/Windows4 (Windows 95), the 8.3 restriction was removed and filenames like 'ReallyLongFilename.html' became possible. If a DOS user downloaded a set of web pages full of links to '.html' pages while their system could only save them as .HTM, the links would be broken. However, as the file format is actually called 'HyperText Markup Language' and 'HML' somehow escaped people, '.html' has become the dominant extension. The same applies to .JPG and .jpeg, except that .jpg seems to have remained standard.

Linux

Given Unix's "Everything Is A File" philosophy, file extensions are a latecomer to the Unix/Linux world. Restricting a file to use with a certain program (or type of program), or even suggesting that it be used with a certain program, would go against that philosophy. Over time, however, file extensions, or at least file naming conventions, have been introduced without ever being worked out into a logical, consistent, and useful semi-standard. Examples include:

  • Many configuration files, such as 'bashrc', end in 'rc' without a dot, and many others don't.
  • ~ - Not strictly an extension, as it is appended to a filename with or without an extension (letter~, letter.txt~); it usually indicates a backup file.
  • GCC expects a C source file to end in '.c' (mysource.c) and an assembler source file to end in '.s' (myasm.s). Case matters here: GCC treats '.C' as C++ source and '.S' as assembler source that must be preprocessed first. Object files that have not yet been linked with libraries have a .o extension.
  • Code libraries' names end in .a for static libraries, and in .so, usually with further numeric extensions, for shared libraries.
  • The shell can execute a shell script whatever its name, thanks to its executable bit, but some text editors base syntax highlighting on a simplistic check for a '.sh' extension rather than looking at the shebang magic number (the characteristic '#!' sequence at the start of the file). So some shell scripts have the .sh extension and some do not.
  • Many non-Microsoft formats have long been known by their extensions, such as X pixmaps by '.xpm'.
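
The conventions in this list can be collected into a small Python sketch. It is only an illustration of how loose the rules are: `classify_by_name` and `interpreter_from_shebang` are hypothetical helpers, the suffix table covers only the examples mentioned above, and the 'rc' check is a guess by design, since many configuration files do not follow it.

```python
import re

# Matches libfoo.so as well as versioned names such as libfoo.so.1.2.3.
SHARED_LIB = re.compile(r"\.so(\.\d+)*$")

# Case-sensitive on purpose, as the compiler's own checks are.
SUFFIXES = {
    ".c": "C source",
    ".s": "assembler source",
    ".o": "object file",
    ".a": "static library",
    ".sh": "shell script (by name only)",
    ".xpm": "X pixmap",
}


def classify_by_name(filename):
    """Guess what a file is from its name alone; the name may be misleading."""
    if filename.endswith("~"):
        return "backup file"
    if SHARED_LIB.search(filename):
        return "shared library"
    for suffix, kind in SUFFIXES.items():
        if filename.endswith(suffix):
            return kind
    if filename.endswith("rc"):
        return "configuration file (maybe)"
    return "unknown"


def interpreter_from_shebang(first_line):
    """Return the interpreter path named on a '#!' line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None
```

An editor that relied only on `classify_by_name` would miss an executable script called backup, while reading its first line with `interpreter_from_shebang` would still find /bin/sh or whichever interpreter the script names.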

Overall, Unix/Linux file extension development has been fairly random.

Since extensions are so useful and indicative, and as Unix/Linux has become increasingly graphically oriented (thus needing to execute applications and load files indirectly), extensions have become more prevalent and naturalized.