On a Mac, Excel produces csv files with the wrong line endings, which causes problems for git (amongst other things).
This issue plagues at least Excel 2008 and 2011, and possibly other versions.
Basically, saving a file as comma separated values (csv) uses a
carriage return \r
rather than a line feed \n
as a newline. Way
back before OS X, this was actually the correct Mac file ending, but
after the move to be more unix-y, the correct line ending should be
\n
.
Given that nothing has used this as the proper line endings for over a decade, this is a bug. It’s a real pity that Microsoft does not see fit to fix it.
Why this is a problem
This breaks a number of scripts that require specific line endings.
This also causes problems when version controlling your data. In
particular, tools like git diff
basically stop working as they work
line-by-line and see only one long line
(e.g. here).
Not having diff
work properly makes it really hard to see where
changes have occurred in your data.
Git has really nice facilities for translating between different line endings – in particular between Windows and Unix/(new) Mac endings. However, they do basically nothing with old-style Mac endings because no sane application should create them. See here, for example.
A solution
There are at leat two stack overflow questions that deal with this (1 and (2).
The solution is to edit .git/config
(within your repository) to add
lines saying:
1 2 3 |
|
and then create a file .gitattributes
that contains the line
1
|
|
This translates the line endings on import and back again on export
(so you never change your working file). Things like git diff
use
the “clean” version, and so magically start working again.
While the .gitattributes
file can be (and should be) put under
version control, the .git/config
file needs to be set up separately
on every clone. There are good reasons for this (see
here.
It would be possible to automate this to some degree with the
--config
argument to git clone
, but that’s still basically manual.
Issues
This seems to generally work, but twice in use large numbers of files have been marked as changed when the filter got out-of-sync. We never worked out what caused this, but one possible culprit seems to be Dropbox (but you probably should not keep repositories on dropbox anyway).
Alternative solutions
The nice thing about the clean/smudge solution is that it leaves files in the working directory unmodified. An alternative approach would be to set up a pre-commit-hook that ran csv files through a similar filter. This will modify the contents of the working directory (and may require reloading the files in Excel) but from that point on the file will have proper line endings.
More manually, if files are saved as “Windows comma separated (.csv)”
you will get windows-style line endings (\r\n
) which are at least
treated properly by git and are in common usage this century.
However, this requires more remembering and makes saving csv files
from Excel even more tricky than normal.