On a Mac, Excel produces csv files with the wrong line endings, which causes problems for git (amongst other things).
This issue plagues at least Excel 2008 and 2011, and possibly other versions.
Basically, saving a file as comma separated values (csv) uses a carriage return \r rather than a line feed \n as a newline. Way back before OS X, this was actually the correct Mac line ending, but after the move to be more unix-y, the correct line ending should be \n.
Given that nothing has used \r as the proper line ending for over a decade, this is a bug. It’s a real pity that Microsoft does not see fit to fix it.
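To check whether a particular file is affected, one option is to count the two kinds of line terminator. The sketch below is only illustrative: the file name excel.csv and its contents are made up to mimic the format described above, and tr and wc do the counting.

```shell
# Scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)

# Stand-in for a file saved by an affected Excel version: rows end in \r only.
printf 'name,value\rx,1\ry,2\r' > "$dir/excel.csv"

# tr -cd deletes every byte except the one named; wc -c counts what is left.
# The trailing tr strips the padding that some wc implementations add.
cr_count=$(tr -cd '\r' < "$dir/excel.csv" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$dir/excel.csv" | wc -c | tr -d ' ')

printf 'CR: %s, LF: %s\n' "$cr_count" "$lf_count"
```

A file that contains carriage returns but no line feeds at all is in the old Mac format.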
Why this is a problem
This breaks a number of scripts that require specific line endings.
This also causes problems when version controlling your data. In particular, tools like git diff basically stop working as they work line-by-line and see only one long line. Not having git diff work properly makes it really hard to see where changes have occurred in your data.
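The effect on line-based tools can be seen without git at all. In the sketch below, awk stands in for any tool that splits its input on \n, and the file name and data are invented for illustration. It asks how many lines the file has and on which line the row 2,25 sits, before and after translating \r to \n.

```shell
dir=$(mktemp -d)

# Made-up data set with \r line endings, as the affected Excel versions write it.
printf 'id,value\r1,10\r2,25\r' > "$dir/data.csv"

# awk splits records on \n, so the whole file is one record.
raw_lines=$(awk 'END { print NR }' "$dir/data.csv")
raw_where=$(awk '/2,25/ { print NR }' "$dir/data.csv")

# The same questions after translating \r to \n.
tr '\r' '\n' < "$dir/data.csv" > "$dir/data.lf.csv"
lf_lines=$(awk 'END { print NR }' "$dir/data.lf.csv")
lf_where=$(awk '/2,25/ { print NR }' "$dir/data.lf.csv")

printf 'raw: %s line(s), row found on line %s\n' "$raw_lines" "$raw_where"
printf 'translated: %s line(s), row found on line %s\n' "$lf_lines" "$lf_where"
```

With \r endings, every change is reported as a change to line 1, which is the whole file.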
Git has really nice facilities for translating between different line endings – in particular between Windows and Unix/(new) Mac endings. However, they do basically nothing with old-style Mac endings because no sane application should create them. See here, for example.
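As a rough illustration of that difference, the sketch below stages one file with Windows endings and one with old-style Mac endings in a scratch repository, with core.autocrlf set to input. The file names and contents are made up, and core.autocrlf is just one of git's line-ending settings, chosen here for brevity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ask git to convert \r\n to \n whenever content is staged.
git config core.autocrlf input

printf 'a,b\r\n1,2\r\n3,4\r\n' > windows.csv   # Windows endings
printf 'a,b\r1,2\r3,4\r' > oldmac.csv           # old-style Mac endings
git add windows.csv oldmac.csv 2>/dev/null      # hide the CRLF conversion warning

# Count the carriage returns that made it into the staged copies.
windows_crs=$(git show :windows.csv | tr -cd '\r' | wc -c | tr -d ' ')
oldmac_crs=$(git show :oldmac.csv | tr -cd '\r' | wc -c | tr -d ' ')

printf 'windows.csv: %s, oldmac.csv: %s\n' "$windows_crs" "$oldmac_crs"
```

The Windows file is normalised on the way in, while the old-style Mac file is stored with its carriage returns untouched.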
The solution is to edit .git/config (within your repository) to add a clean/smudge filter, and then create a file .gitattributes that contains a line applying that filter to csv files.
This translates the line endings on import and back again on export (so you never change your working file). Things like git diff use the “clean” version, and so magically start working again.
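The setup described above can be sketched with git config commands, which write to .git/config for you. This is a sketch under assumptions: the filter name cr and the tr commands are illustrative choices rather than the exact configuration used here, and the scratch repository and data.csv exist only so the example can be tried safely.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical filter named "cr": "clean" runs when content is staged,
# "smudge" runs when it is checked out. This blind byte swap assumes the
# csv files contain \r endings only.
git config filter.cr.clean "tr '\r' '\n'"
git config filter.cr.smudge "tr '\n' '\r'"

# Apply the filter to csv files.
echo '*.csv filter=cr' > .gitattributes

# An Excel-style file: three rows separated by carriage returns only.
printf 'a,b\r1,2\r3,4\r' > data.csv
git add data.csv

# The staged ("clean") copy has one row per line; the working file is untouched.
staged_lines=$(git show :data.csv | wc -l | tr -d ' ')
working_crs=$(tr -cd '\r' < data.csv | wc -c | tr -d ' ')

printf 'staged lines: %s, CRs in working file: %s\n' "$staged_lines" "$working_crs"
```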
While the .gitattributes file can be (and should be) put under version control, the .git/config file needs to be set up separately on every clone. There are good reasons for this (a filter can run arbitrary commands, so git does not let a repository configure them for you). It would be possible to automate this to some degree with the --config argument to git clone, but that’s still basically manual.
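A sketch of what that might look like follows. The filter name cr and the tr commands are assumptions made for illustration, and the scratch "upstream" repository stands in for a real remote; in practice the last two arguments to git clone would be your own URL and directory.

```shell
# Scratch "upstream" repository standing in for a real remote.
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
echo '*.csv filter=cr' > .gitattributes
printf 'a,b\n1,2\n' > data.csv
git add .gitattributes data.csv
git -c user.name=Example -c user.email=example@example.com commit -q -m 'Add data'
cd "$work"

# Clone with the filter defined up front, so .git/config needs no hand-editing.
git clone -q \
    --config "filter.cr.clean=tr '\r' '\n'" \
    --config "filter.cr.smudge=tr '\n' '\r'" \
    "$work/upstream" "$work/clone"

# Confirm the setting landed in the new clone's configuration.
clean_cmd=$(git -C "$work/clone" config filter.cr.clean)
printf '%s\n' "$clean_cmd"
```

Everyone who clones still has to remember to pass these arguments, which is why this only goes part of the way.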
This generally seems to work, but twice in use, large numbers of files were marked as changed when the filter got out of sync. We never worked out what caused this, but one possible culprit seems to be Dropbox (though you probably should not keep repositories on Dropbox anyway).
The nice thing about the clean/smudge solution is that it leaves files in the working directory unmodified. An alternative approach would be to set up a pre-commit hook that runs csv files through a similar filter. This would modify the contents of the working directory (and may require reloading the files in Excel), but from that point on the file would have proper line endings.
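A minimal sketch of such a hook, assuming the csv files contain \r endings only, is shown below. The hook body is hypothetical and has not been used in anger; the scratch repository and data.csv are there only to show it running.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .git/hooks

# Hypothetical hook: rewrite \r as \n in every staged csv file, then re-stage it.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
git diff --cached --name-only --diff-filter=ACM -- '*.csv' |
while IFS= read -r file; do
    tr '\r' '\n' < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    git add -- "$file"
done
EOF
chmod +x .git/hooks/pre-commit

# Commit an Excel-style file and let the hook fix it on the way in.
printf 'a,b\r1,2\r3,4\r' > data.csv
git add data.csv
git -c user.name=Example -c user.email=example@example.com commit -q -m 'Add data'

committed_lines=$(git show HEAD:data.csv | wc -l | tr -d ' ')
working_crs=$(tr -cd '\r' < data.csv | wc -c | tr -d ' ')

printf 'committed lines: %s, CRs left in working file: %s\n' "$committed_lines" "$working_crs"
```

Unlike the clean/smudge filter, the file on disk is rewritten, and like .git/config the hook has to be installed separately in every clone.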
More manually, if files are saved as “Windows comma separated (.csv)” you will get Windows-style line endings (\r\n) which are at least treated properly by git and are in common usage this century. However, this requires more remembering and makes saving csv files from Excel even more tricky than normal.