I’m a Unix guy, but the participants in my Python classes overwhelmingly use Windows. Inevitably, when we get to talking about working with files in Python, someone will want to open a file using the complete path to the file. And they’ll end up writing something like this:
filename = 'c:\abc\def\ghi.txt'
But when my students try to open the file, they discover that Python gives them an error, indicating that the file doesn’t exist! In other words, they write:
for one_line in open(filename):
    print(one_line)
What’s the problem? This seems like pretty standard Python, no?
Remember that strings in Python normally contain characters. Those characters are normally printable, but there are times when you want to include a character that isn’t really printable, such as a newline. In those cases, Python (like many programming languages) includes special codes that will insert the special character.
The best-known example is newline, aka ‘\n’, or ASCII 10. If you want to insert a newline into your Python string, then you can do so with ‘\n’ in the middle. For example:
s = 'abc\ndef\nghi'
When we print the string, we’ll see:
>>> print(s)
abc
def
ghi
What if you want to print a literal ‘\n’ in your code? That is, you want a backslash, followed by an “n”? Then you’ll need to double the backslash: The “\\” in a string will result in a single backslash character. The following “n” will then be normal. For example:
s = 'abc\\ndef\\nghi'
When we print it, we’ll see:
>>> print(s)
abc\ndef\nghi
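One way to see the difference is to compare lengths. This is a minimal sketch that reuses the two strings above; the variable names `with_newlines` and `with_backslashes` are illustrative, not from the original post.

```python
# '\n' is stored as ONE character (a newline); '\\n' is stored as TWO
# characters (a backslash followed by the letter n).
with_newlines = 'abc\ndef\nghi'
with_backslashes = 'abc\\ndef\\nghi'

print(len(with_newlines))             # 11
print(len(with_backslashes))          # 13
print(with_newlines.splitlines())     # ['abc', 'def', 'ghi']
print(with_backslashes.splitlines())  # ['abc\\ndef\\nghi']
```

The last line shows doubled backslashes only because a list displays its items with repr(); the string itself still holds single backslashes.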
It’s pretty well known that you have to guard against this translation when you’re working with \n. But what other characters require it? It turns out, more than many people might expect:
- \a — alarm bell (ASCII 7)
- \b — backspace (ASCII 8)
- \f — form feed
- \n — newline
- \r — carriage return
- \t — tab
- \v — vertical tab
- \ooo — character with octal value ooo
- \xhh — character with hex value hh
- \N{name} — Unicode character {name}
- \uxxxx — Unicode character with 16-bit hex value xxxx
- \Uxxxxxxxx — Unicode character with 32-bit hex value xxxxxxxx
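The list above can be checked directly in Python. This is only a sketch: the dictionary name `single_char_escapes` is made up for illustration, and each key is written with a doubled backslash so that it stays literal.

```python
# Each key is how the sequence is typed in source code; each value is the
# single character that Python actually stores for it.
single_char_escapes = {
    '\\a': '\a',
    '\\b': '\b',
    '\\f': '\f',
    '\\n': '\n',
    '\\r': '\r',
    '\\t': '\t',
    '\\v': '\v',
}

for typed, stored in single_char_escapes.items():
    print(typed, '->', len(stored), 'character, code point', ord(stored))

# The numeric and named forms also collapse to one character each.
print('\x41', '\101', '\u0041', '\N{LATIN CAPITAL LETTER A}')  # A A A A
```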
In my experience, you’re extremely unlikely to use some of these on purpose. I mean, when was the last time you needed to use a form feed character? Or a vertical tab? I know — it was roughly the same day that you drove your dinosaur to work, after digging a well in your backyard for drinking water.
But nearly every time I teach Python — which is, every day — someone in my class bumps up against one of these characters by mistake. That’s because the combination of the backslashes used by these characters and the backslashes used in Windows paths makes for inevitable, and frustrating, bugs.
Remember that path I mentioned at the top of the blog post, which seems so innocent?
filename = 'c:\abc\def\ghi.txt'
It contains a “\a” character. Which means that when we print it:
>>> print(filename)
c:bc\def\ghi.txt
See? The “\a” is gone, replaced by an alarm bell character. If you’re lucky.
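Because the bell character is invisible when printed, repr() is a handy way to spot it. The sketch below uses only the first part of the path, under the illustrative name `fragment`. The rest of the path survives because “\d” and “\g” are not recognized escapes, so Python leaves them alone (recent versions warn about such sequences).

```python
# '\a' is a recognized escape, so this literal holds a bell character
# (ASCII 7) where the backslash and the "a" were typed.
fragment = 'c:\abc'

print(repr(fragment))      # 'c:\x07bc'
print(len(fragment))       # 5
print('\x07' in fragment)  # True
```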
So, what can we do about this? Double the backslashes, of course. Strictly speaking, you only need to double those that would be turned into special characters, from the list I’ve reproduced above. But come on, are you really likely to remember that “\f” is special, but “\g” is not? Probably not.
So my general rule, and what I tell my students, is that they should always double the backslashes in their Windows paths. In other words:
>>> filename = 'c:\\abc\\def\\ghi.txt'
>>> print(filename)
c:\abc\def\ghi.txt
It works!
But wait: No one wants to really wade through their pathnames, doubling every backslash, do they? Of course not.
That’s where Python’s raw strings can help. I think of raw strings in two different ways:
- what-you-see-is-what-you-get strings
- automatically doubled backslashes in strings
Either way, the effect is the same: All of the backslashes are doubled, so all of these pesky and weird special characters go away. Which is great when you’re working with Windows paths.
All you need to do is put an “r” before the opening quotes (single or double):
>>> filename = r'c:\abc\def\ghi.txt'
>>> print(filename)
c:\abc\def\ghi.txt
Note that a “raw string” isn’t really a different type of string at all. It’s just another way of entering a string into Python. If you check, type(filename) will still be “str”, and the string will be identical to the one you’d get by doubling every backslash yourself.
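A quick check of that claim, as a minimal sketch (the names `raw` and `doubled` are illustrative):

```python
# A raw string and a manually doubled string produce the same str object value.
raw = r'c:\abc\def\ghi.txt'
doubled = 'c:\\abc\\def\\ghi.txt'

print(raw == doubled)  # True
print(type(raw))       # <class 'str'>
print(len(raw))        # 18
print(repr(raw))       # 'c:\\abc\\def\\ghi.txt'
```

The repr() output shows doubled backslashes because that is how Python displays a backslash inside quotes; the length of 18 confirms that each one is stored as a single character.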
Bottom line: If you’re using Windows, then you should just write all of your hard-coded pathname strings as raw strings. Even if you’re a Python expert, I can tell you from experience that you’ll bump up against this problem sometimes. And even for the best of us, finding that stray “\f” in a string can be time consuming and frustrating.
PS: Yes, it’s true that Windows users can get around this by using forward slashes, like we Unix folks do. But my students find this to be particularly strange looking, and so I don’t see it as a general-purpose solution.
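For anyone curious how the forward-slash form relates to the backslash form, here is a small sketch. It uses the standard library’s ntpath module (the Windows flavor of os.path) so that it behaves the same on any operating system; the variable name `forward` is illustrative.

```python
import ntpath  # Windows implementation of os.path, importable on any platform

forward = 'c:/abc/def/ghi.txt'

# No backslashes means nothing can be mistaken for an escape sequence.
print('\a' in forward)                                     # False

# normpath converts the forward slashes to the Windows separator.
print(ntpath.normpath(forward))                            # c:\abc\def\ghi.txt
print(ntpath.normpath(forward) == r'c:\abc\def\ghi.txt')   # True
```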
In my experience, an approach like this works well for paths created from a concatenated environment variable and string:
adobe_common_path = os.getenv('APPDATA') + r'AdobeCommon'
Oh…ironically, it seems backslashes get removed here. Let’s try that again.
with ascii-encoded backslashes…
adobe_common_path = os.getenv('APPDATA') + r'\Adobe\Common'
or maybe double backslashes are required…
adobe_common_path = os.getenv('APPDATA') + r'\Adobe\Common'
I have repeatedly tried double backslashes, forward slashes, and both the os and pathlib methods. The problem is that open chokes on them all.
As you can see from this code, pathlib gets it right, open doesn’t.
The code:
print(pathlib.Path('c:\Data','file1.csv'))
with open(pathlib.Path('c:\Data','file1.csv'),'r') as csvfile:
The output:
c:\Data\file1.csv
Traceback (most recent call last):
File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\data\untitled0.py:23
with open(pathlib.Path('c:\Data','file1.csv'),'r') as csvfile:
FileNotFoundError: [Errno 2] No such file or directory: 'c:\\Data\\file1.csv'
So the question remains – How to open a file in Windows using Python?
BTW: this is Anaconda install with Spyder IDE
Thank you.
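A FileNotFoundError generally means Python understood the path but found nothing at that location, so it can help to check what actually exists before calling open. The sketch below is an assumption-laden illustration, not a diagnosis of the setup above: it works in a temporary directory rather than a real data folder, and the names `data_dir`, `csv_path`, and `missing` are made up.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_dir = pathlib.Path(tmp)

    # Create a small file so there is something to open.
    csv_path = data_dir / 'file1.csv'
    csv_path.write_text('a,b\n1,2\n')

    # open() accepts a pathlib.Path directly.
    with open(csv_path, 'r') as csvfile:
        first_line = csvfile.readline()
    print(first_line.strip())  # a,b

    # Before opening a path that fails, check it and list the folder.
    missing = data_dir / 'file2.csv'
    missing_exists = missing.exists()
    names = sorted(p.name for p in data_dir.iterdir())
    print(missing_exists)  # False
    print(names)           # ['file1.csv']
```

Listing the folder this way shows the real file names, which is useful when a name differs slightly from what was typed.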
Thank you for this article, it helped to get a grasp of using raw strings in Python.
One addition: Users cannot always get around this by using “/”. If you want to automate starting applications, you’ll have to use the OS-specific path separators while emulating command-line calls with the subprocess library, for example. The same goes for using Python in Blender.
One more addition: You could use platform-independent tools like os.path.join("path", "to", "file"). This would be a good workaround to stay compatible with both Windows and Linux.
Thank you so much, this was very clear.
I still have one issue though. Using the command os.getcwd() I get the path with one backward slash. How can I easily transform this/work with it? Logically adding r’ just makes the os.getcwd() a string.
If you’re getting ‘\\’ back from os.getcwd(), then I think that means you’re currently in the root Windows directory. I honestly don’t know that much about Windows, so I’m not 100% sure — I actually thought you needed to have a drive name before it, such as ‘c:\\’ or the like.
You can’t add ‘r’ to an existing string; the ‘r’ before the opening quote is used when creating a new string. So once you have the string from os.getcwd(), you can just use it as a string; it has already doubled whatever you need.
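To illustrate, here is a minimal sketch. The value of os.getcwd() depends on where the code runs, so the fixed string `example` is included as a stand-in, and `target` is an illustrative name.

```python
import os

cwd = os.getcwd()
print(type(cwd))  # <class 'str'>

# The string can be used as-is, for example to build a longer path.
target = os.path.join(cwd, 'ghi.txt')
print(target)

# Doubled backslashes only appear in the repr() display of a string.
example = 'c:\\abc'
print(example)        # c:\abc
print(repr(example))  # 'c:\\abc'
print(len('\\'))      # 1
```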
Boy, so much text for a simple thing. Could have been a 5 liner.
Use forward slashes (and avoid drive letters) if you want your paths to work under multiple operating systems. This can work splendidly for relative paths, or well-defined fragments of paths.
Use raw strings if you have to copy and paste paths between Python and other Windows programs (e.g., the command prompt).
If both apply, then you need to think carefully, and get creative.
True, forward slashes work just fine (as I mention at the “PS” at the bottom of the post). But for people who only work on Windows, and who are relatively new to cross-platform issues, forward slashes seem super-weird and non-intuitive to them. I generally encourage people to use raw strings when they have to hard-code paths, so as to avoid the (nearly inevitable, it would seem) conflicts that they’ll experience.
Actually the Windows version of Python accepts regular slashes in the file paths. Example:
>>> infile = open("C:/python27/tools/scripts/suff.py")
>>> infile.read()
"#! /usr/bin/env python\n\n# suff\n …
PowerShell also accepts regular slashes in file paths. Example:
PS C:\> dir c:/python27/tools/scripts

    Directory: C:\python27\tools\scripts

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        2015-08-21   3:37 PM         96 2to3.py
-a---        2015-08-21   3:37 PM       4335 analyze_dxp.py
-a---        2015-08-21   3:37 PM       4075 byext.py
-a---        2015-08-21   3:37 PM       1699 byteyears.py
-a---        2015-08-21   3:37 PM       4825 checkappend.py
OTOH, cmd.exe requires backslashes in file paths.
Right, at the bottom of the post I mentioned that you can use forward slashes — but that for many Windows people, this is weird and/or unintuitive.
Or, perhaps even better:
>>> filename = os.path.join("C:" + os.sep, "abc", "def", "ghi.txt")
>>> print(filename)
C:\abc\def\ghi.txt
This is true, but if people are just trying to hard-code a path in their program and then open a file, that seems like overkill…
In 10 years’ time, future you will be stuck dealing with a mess of hardcoded hacks, and will wish you had done it “properly”.
It all depends on context.
If you’re writing a quick, one-off program, then it doesn’t matter.
But yes, if you’re writing production code that will need to survive for years and be maintained by others, then a hard-coded path is almost certainly a bad idea, and using something like os.path.join is almost certainly better. Or even better than that, using pathlib, which solves even more problems.
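As a rough sketch of the pathlib approach, the example below builds the same path used in the post. It uses PureWindowsPath so that it prints Windows-style results on any operating system; on an actual Windows machine, pathlib.Path would normally be the class to reach for.

```python
from pathlib import PureWindowsPath

# Components are joined with the / operator; no backslashes are typed at all.
filename = PureWindowsPath('C:/') / 'abc' / 'def' / 'ghi.txt'

print(filename)         # C:\abc\def\ghi.txt
print(filename.name)    # ghi.txt
print(filename.suffix)  # .txt
print(filename.parent)  # C:\abc\def
```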