Introduction: Files in Python
A file is a sequence of bytes stored on disk. Python treats files as objects you interact with through a file handle returned by open(). There are two broad categories:
- Text files: contain human-readable characters (e.g., .txt, .csv, .json). Python handles newline translation and encoding for you.
- Binary files: contain raw bytes (e.g., images, PDFs, executables). You work with bytes objects instead of strings.
Before any operation you must open the file; after you're done you must close it. Python's with statement automates the closing step, even when exceptions occur.
The open() Function
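The full signature of open() is:

```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```

In everyday code you will mostly pass file, mode, and encoding, plus errors or newline when needed.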

| Mode | Meaning | Creates file? | Truncates? |
|---|---|---|---|
| 'r' | Read (default) | No (raises FileNotFoundError) | No |
| 'w' | Write | Yes | Yes (overwrites existing content) |
| 'a' | Append | Yes | No (adds to end of file) |
| 'x' | Exclusive create | Yes (fails if the file already exists) | N/A |
| 'r+' | Read + write | No | No |
| 'b' suffix | Binary mode (e.g., 'rb') | Same as base mode | Same as base mode |
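The creation and truncation columns can be checked with a short script. This is a minimal sketch: demo_modes is a hypothetical helper, and it works inside a throwaway directory from tempfile so it never touches real files.

```python
import os
import tempfile

def demo_modes(directory):
    """Hypothetical helper: exercise 'x', 'a', 'w', and 'r' on one file."""
    path = os.path.join(directory, "modes_demo.txt")

    # 'x' creates a new file and fails if the file already exists
    with open(path, "x", encoding="utf-8") as f:
        f.write("first\n")
    try:
        open(path, "x", encoding="utf-8")
    except FileExistsError:
        print("'x' refused to overwrite")

    # 'a' keeps existing content and writes at the end
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")
    with open(path, encoding="utf-8") as f:
        after_append = f.read()

    # 'w' truncates the file before writing
    with open(path, "w", encoding="utf-8") as f:
        f.write("replaced\n")
    with open(path, encoding="utf-8") as f:
        after_write = f.read()

    # 'r' never creates a file
    try:
        open(os.path.join(directory, "missing.txt"), "r", encoding="utf-8")
    except FileNotFoundError:
        print("'r' raised FileNotFoundError")

    return after_append, after_write

with tempfile.TemporaryDirectory() as tmp:
    appended, written = demo_modes(tmp)

print(repr(appended))  # 'first\nsecond\n'
print(repr(written))   # 'replaced\n'
```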
Pass encoding="utf-8" explicitly to avoid surprises across operating systems. When encoding is omitted, Python uses the locale's preferred encoding: on Windows this is often a legacy code page such as cp1252, while most Linux and macOS systems default to UTF-8.
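To see why the explicit encoding matters, the sketch below (standard library only) writes a non-ASCII word as UTF-8 and reads it back with the right and the wrong codec. cp1252 stands in for a typical Windows default; the value printed on the first line depends on your OS and locale.

```python
import locale
import os
import tempfile

# Roughly what open() falls back to when encoding= is omitted (varies by OS/locale)
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cafe.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("café")            # 'é' is stored as two bytes in UTF-8

    with open(path, "rb") as f:
        raw = f.read()
    print(raw)                     # b'caf\xc3\xa9'

    # Decoding with the wrong codec silently produces mojibake
    with open(path, "r", encoding="cp1252") as f:
        wrong = f.read()
    print(wrong)                   # cafÃ©

    with open(path, "r", encoding="utf-8") as f:
        right = f.read()
    print(right)                   # café
```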
Context Managers: The with Statement
The with statement is the correct way to open files. It guarantees file.close() is called even if an exception is raised:
```python
# Old style (risky if an exception occurs before close)
f = open("notes.txt", "r", encoding="utf-8")
content = f.read()
f.close()  # might not run if read() raises!

# Modern style: close is guaranteed
with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()
# f is automatically closed here, even on exception
print(f.closed)  # True
```
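Conceptually, with open(...) as f: behaves like a try/finally block around the file. The sketch below shows that simplified equivalent (the real statement goes through the context manager's __enter__ and __exit__ methods) and confirms the file is closed even when the body raises. read_with_try_finally is a hypothetical name used only for illustration.

```python
import os
import tempfile

def read_with_try_finally(path):
    """Hypothetical helper: roughly what `with open(...) as f:` does for you."""
    f = open(path, "r", encoding="utf-8")
    try:
        return f.read()
    finally:
        f.close()  # runs on normal return AND when read() raises

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    content = read_with_try_finally(path)

    # Even when the body raises, the with statement closes the file
    try:
        with open(path, "r", encoding="utf-8") as g:
            raise ValueError("simulated failure")
    except ValueError:
        pass
    print(g.closed)  # True
```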
Reading Files
Python provides several methods for reading file content depending on how much you need at once:
```python
# Assume "haiku.txt" contains three lines of a haiku

# read(): entire file as one string
with open("haiku.txt", "r", encoding="utf-8") as f:
    entire = f.read()
print(repr(entire))
# 'An old silent pond\nA frog jumps into the pond\nSplash! Silence again.\n'

# readline(): one line at a time (including '\n')
with open("haiku.txt", "r", encoding="utf-8") as f:
    first_line = f.readline()
    second_line = f.readline()
print(first_line.strip())   # An old silent pond
print(second_line.strip())  # A frog jumps into the pond

# readlines(): list of all lines
with open("haiku.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(f"Line count: {len(lines)}")  # 3

# Iteration (most memory-efficient: processes one line at a time)
with open("haiku.txt", "r", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        print(f"{line_no}: {line.rstrip()}")
```
file.read() loads the entire file into RAM, which becomes a problem once files reach hundreds of megabytes or gigabytes. Iterating over the file object reads one line at a time, so memory use depends on the longest line rather than on the total file size.
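Line iteration does not help when a file has very long lines or no line breaks at all; in that case you can read fixed-size chunks instead. A minimal sketch, where count_chars_in_chunks and count_nonblank_lines are hypothetical helpers and the sample file is generated in a temporary directory:

```python
import os
import tempfile
from functools import partial

def count_chars_in_chunks(path, chunk_size=1024 * 1024):
    """Count characters while holding at most chunk_size characters in memory."""
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size) until it returns ""
        for chunk in iter(partial(f.read, chunk_size), ""):
            total += len(chunk)
    return total

def count_nonblank_lines(path):
    """Stream the file line by line; only one line is in memory at a time."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("line\n\n" * 1000)   # 1000 text lines, 1000 blank lines
    chars = count_chars_in_chunks(path, chunk_size=64)
    nonblank = count_nonblank_lines(path)

print(chars, nonblank)  # 6000 1000
```

iter(callable, sentinel) is a built-in idiom: it calls the function repeatedly and stops when the return value equals the sentinel ("" in text mode, b"" in binary mode).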
Writing Files
Use mode 'w' to create or overwrite, and 'a' to append without touching existing content:
```python
# write(): write a string
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")
    f.write("Line 3\n")

# writelines(): write an iterable of strings (does NOT add newlines automatically)
lines = ["alpha\n", "beta\n", "gamma\n"]
with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

# Append mode: add to an existing file
with open("log.txt", "a", encoding="utf-8") as f:
    f.write("2024-01-15 09:23:01 INFO Server started\n")
    f.write("2024-01-15 09:23:05 DEBUG Listening on port 8080\n")

# Write multiple lines cleanly with print() (it adds the "\n" for you)
with open("report.txt", "w", encoding="utf-8") as f:
    for i in range(1, 6):
        print(f"Item {i}: {'*' * i}", file=f)

# Verify the written content
with open("report.txt", "r", encoding="utf-8") as f:
    print(f.read())
```
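Because writelines() writes each string exactly as given, a generator expression is a convenient way to add the line endings yourself. A small sketch, using a temporary directory so it runs anywhere:

```python
import os
import tempfile

words = ["alpha", "beta", "gamma"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    # writelines() writes each string as-is, so append "\n" to every item
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(f"{word}\n" for word in words)
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(content == "alpha\nbeta\ngamma\n")  # True
```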
Working with Binary Files
Use 'rb' and 'wb' modes for images, audio files, compressed archives, or any non-text data:
```python
# Copy a file in binary mode (works for any file type)
def copy_file(src, dst, chunk_size=65536):
    """Copy src to dst, reading chunk_size bytes at a time."""
    with open(src, "rb") as source, open(dst, "wb") as dest:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # b"" means end of file
                break
            dest.write(chunk)
    print(f"Copied '{src}' -> '{dst}'")

# Check the 8-byte PNG signature ("magic bytes") at the start of a file
def is_png(filepath):
    PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'
    try:
        with open(filepath, "rb") as f:
            return f.read(8) == PNG_SIGNATURE
    except OSError:  # covers FileNotFoundError, PermissionError, etc.
        return False
```
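The chunked loop in copy_file() is also what the standard library's shutil.copyfileobj(fsrc, fdst, length) does for two already-open file objects. A sketch of using it directly, with made-up sample bytes in a temporary directory:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original.bin")
    dst = os.path.join(tmp, "copy.bin")
    payload = bytes(range(256)) * 1000   # 256,000 bytes of sample data

    with open(src, "wb") as f:
        f.write(payload)

    # Same read-a-chunk / write-a-chunk loop as copy_file(), from the standard library
    with open(src, "rb") as source, open(dst, "wb") as dest:
        shutil.copyfileobj(source, dest, 65536)

    with open(dst, "rb") as f:
        copied = f.read()

print(copied == payload)  # True
```

For copying by path, shutil.copyfile() (contents only) and shutil.copy2() (contents plus metadata such as timestamps) are the usual higher-level options.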
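File Position: seek() and tell()
tell() reports the current position in the stream, and seek() moves it. Open the file in binary mode when you need exact byte offsets: in text mode, tell() returns an opaque number that is only meaningful when passed back to seek(), and read(5) counts characters rather than bytes.

```python
import os

# Binary mode makes tell()/seek() positions exact byte offsets
with open("output.txt", "rb") as f:
    print(f.tell())          # 0 (at the start)
    first = f.read(5)
    print(f"Read: {first!r}")
    print(f.tell())          # 5
    f.seek(0)                # rewind to start
    print(f.tell())          # 0
    f.seek(0, os.SEEK_END)   # seek to end (0 bytes from end)
    print(f"File size: {f.tell()} bytes")
```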
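seek() also accepts positions relative to the end, which makes it easy to read only the tail of a file. The sketch below uses io.BytesIO as a stand-in for a file opened with "rb"; read_last_bytes is a hypothetical helper. It computes an absolute position instead of calling f.seek(-n, os.SEEK_END) directly, so a short file never causes a seek before the start. Nonzero end-relative seeks are only allowed in binary mode; text-mode files raise io.UnsupportedOperation.

```python
import io
import os

def read_last_bytes(f, n):
    """Hypothetical helper: return up to the last n bytes of a binary file object."""
    size = f.seek(0, os.SEEK_END)          # seek() returns the new absolute position
    f.seek(max(size - n, 0), os.SEEK_SET)  # clamp so we never go before the start
    return f.read()

buf = io.BytesIO(b"alpha\nbeta\ngamma\n")
print(read_last_bytes(buf, 6))    # b'gamma\n'
print(read_last_bytes(buf, 100))  # b'alpha\nbeta\ngamma\n'
```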
The os Module: File System Operations
The os module provides functions for creating, removing, and inspecting files and directories:
```python
import os

# Current working directory
print(os.getcwd())  # e.g. /home/user/project

# Check existence
print(os.path.exists("notes.txt"))  # True or False
print(os.path.isfile("notes.txt"))  # True if it's a regular file
print(os.path.isdir("data"))        # True if it's a directory

# File metadata
size = os.path.getsize("notes.txt")
print(f"Size: {size} bytes")

# Create a directory (and nested directories)
os.makedirs("data/output", exist_ok=True)

# List directory contents
for entry in os.listdir("."):
    print(entry)

# Rename / move a file
os.rename("old_name.txt", "new_name.txt")

# Delete a file (os.remove raises FileNotFoundError if it is missing, so check first)
if os.path.exists("temp.txt"):
    os.remove("temp.txt")

# Walk a directory tree
for root, dirs, files in os.walk("project"):
    for filename in files:
        full_path = os.path.join(root, filename)
        print(full_path)
```
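os.walk() combines naturally with os.path.getsize() to total up a directory tree. A minimal sketch, where tree_size is a hypothetical helper that skips symbolic links so only real files are counted; the sample tree is built in a temporary directory:

```python
import os
import tempfile

def tree_size(top):
    """Hypothetical helper: total size in bytes of all regular files under top."""
    total = 0
    for root, dirs, files in os.walk(top):
        for filename in files:
            path = os.path.join(root, filename)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "output"), exist_ok=True)
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(tmp, "data", "output", "b.bin"), "wb") as f:
        f.write(b"y" * 32)
    size = tree_size(tmp)

print(size)  # 42
```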
Modern File Paths with pathlib
pathlib.Path (introduced in Python 3.4) provides an object-oriented API for paths. It is more readable and cross-platform than string-based os.path manipulation:
```python
from pathlib import Path

# Create a Path object
p = Path("data/notes.txt")

# Path inspection
print(p.name)        # notes.txt
print(p.stem)        # notes
print(p.suffix)      # .txt
print(p.parent)      # data
print(p.absolute())  # /home/user/project/data/notes.txt

# Build paths with / operator (cross-platform!)
base = Path("project")
config = base / "config" / "settings.json"
print(config)  # project/config/settings.json

# Check existence
print(p.exists())  # True / False
print(p.is_file())
print(p.is_dir())

# Read / write text directly
config_path = Path("config.txt")
config_path.write_text("debug=true\nport=8080\n", encoding="utf-8")
content = config_path.read_text(encoding="utf-8")
print(content)

# Read / write bytes directly
Path("data.bin").write_bytes(b'\x00\x01\x02\x03')
raw = Path("data.bin").read_bytes()

# Create directories
Path("logs/2024").mkdir(parents=True, exist_ok=True)

# List files matching a pattern (glob)
project = Path(".")
for py_file in project.glob("**/*.py"):
    print(py_file)

# File size
print(f"Size: {p.stat().st_size} bytes")
```
For new code, prefer pathlib. It handles path separators on Windows (\) and Unix (/) automatically, and the dot-attribute API is far more readable than nested os.path.join(os.path.dirname(...)) calls.
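To make the readability point concrete, here are the os.path and pathlib versions of the same path computation side by side. The script location is a made-up example, and no files need to exist because both approaches only manipulate path names.

```python
import os
from pathlib import Path

script = "/home/user/project/src/app.py"  # hypothetical script location

# os.path: nested calls that read inside-out
old_style = os.path.join(os.path.dirname(os.path.dirname(script)), "data", "input.csv")

# pathlib: attributes and / that read left to right
new_style = Path(script).parent.parent / "data" / "input.csv"

print(old_style)                     # /home/user/project/data/input.csv (on POSIX)
print(Path(old_style) == new_style)  # True
```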
Working with CSV and JSON Files
```python
import csv
import json

# ---- CSV ----
# Writing a CSV
students = [
    {"name": "Alice", "grade": "A", "score": 95},
    {"name": "Bob", "grade": "B", "score": 82},
    {"name": "Carol", "grade": "A", "score": 91},
]
with open("students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade", "score"])
    writer.writeheader()
    writer.writerows(students)

# Reading a CSV
with open("students.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']}: {row['score']}")

# ---- JSON ----
config = {
    "debug": True,
    "port": 8080,
    "allowed_hosts": ["localhost", "127.0.0.1"],
}

# Write JSON
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Read JSON
with open("config.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["port"])  # 8080
```
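One difference between the two formats is worth knowing: csv.DictReader returns every field as a string, while JSON round-trips numbers, booleans, and lists. The sketch below uses io.StringIO with sample rows mirroring students.csv so it runs without touching disk.

```python
import csv
import io
import json

# Stand-in for students.csv (csv writes "\r\n" line endings by default)
csv_text = "name,grade,score\r\nAlice,A,95\r\nBob,B,82\r\nCarol,A,91\r\n"

rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
print(type(rows[0]["score"]))  # <class 'str'>

# Convert explicitly before doing arithmetic
scores = [int(row["score"]) for row in rows]
average = sum(scores) / len(scores)
print(f"Average score: {average:.2f}")  # Average score: 89.33

# JSON keeps the original types
config = {"debug": True, "port": 8080, "allowed_hosts": ["localhost", "127.0.0.1"]}
restored = json.loads(json.dumps(config))
print(restored == config)  # True
```

JSON is not perfectly lossless either: tuples come back as lists, and non-string dictionary keys come back as strings.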
Handling File Errors
```python
from pathlib import Path

def safe_read(filepath):
    """Read a file and return its content, or None on failure."""
    try:
        return Path(filepath).read_text(encoding="utf-8")
    except FileNotFoundError:
        print(f"File not found: {filepath}")
    except PermissionError:
        print(f"Permission denied: {filepath}")
    except IsADirectoryError:
        print(f"Expected a file, got a directory: {filepath}")
    except OSError as e:  # any other OS-level failure; must come last
        print(f"OS error ({e.errno}): {e.strerror}")
    return None

def safe_write(filepath, content):
    """Write content to filepath, creating parent directories as needed."""
    try:
        path = Path(filepath)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        return True
    except OSError as e:  # PermissionError is a subclass of OSError
        print(f"Could not write {filepath}: {e}")
        return False

result = safe_read("data/report.txt")
if result is not None:  # an empty file returns "", which is still a success
    print(result[:100])
```
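The specific exceptions in safe_read() are all subclasses of OSError, which is why the general except OSError handler has to come last. The sketch below triggers a FileNotFoundError in a temporary directory and inspects the attributes every OSError carries:

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "does_not_exist.txt")
    try:
        open(missing, encoding="utf-8")
    except OSError as e:
        caught = e  # keep a reference; `e` is unbound after the except block

print(type(caught).__name__)         # FileNotFoundError
print(caught.errno == errno.ENOENT)  # True
print(caught.filename == missing)    # True
print(issubclass(FileNotFoundError, OSError), issubclass(PermissionError, OSError))  # True True
```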
Practical Exercise
Write a word_frequency(filepath) function that:
- Reads a text file line by line (efficient for large files).
- Splits each line into words, converts to lowercase, and strips punctuation.
- Counts how often each word appears using a dictionary.
- Returns the 10 most common words with their counts.
- Handles FileNotFoundError and PermissionError gracefully.
Challenge Exercise
Build a simple logging system using only file I/O and pathlib. Create a Logger class that:
- Accepts a log directory path and a log level (DEBUG, INFO, WARNING, ERROR).
- Writes each log message to a daily log file named YYYY-MM-DD.log.
- Rotates (compresses or moves) logs older than 7 days.
- Exposes .debug(), .info(), .warning(), and .error() methods.
Interview Questions
- What does the with statement guarantee when working with files?
- What is the difference between 'w' and 'a' file modes?
- How do you read a very large file without loading it entirely into memory?
- What is the difference between os.path and pathlib.Path?
- What does newline="" do when opening a CSV file?
- How do you create nested directories in one call in Python?
- What exception is raised when you try to open a file that doesn't exist?
- What is the difference between read(), readline(), and readlines()?
- How do you check if a path exists and is a file (not a directory)?
- What does seek(0) do to a file handle?
Summary
- Use open(file, mode, encoding="utf-8") to get a file handle.
- Always use the with statement; it closes the file automatically, even on exceptions.
- Modes: 'r' (read), 'w' (write/overwrite), 'a' (append), 'x' (exclusive create); append 'b' for binary.
- Read methods: read() (all at once), readline() (one line), readlines() (list), or iterate the file object (memory-efficient).
- Write methods: write(str), writelines(list), or print(..., file=f).
- pathlib.Path provides a modern, object-oriented API; use / to join paths.
- os.makedirs(..., exist_ok=True) creates nested directories safely.
- Always handle FileNotFoundError, PermissionError, and OSError when doing file I/O.
Frequently Asked Questions
By default, Python buffers writes and flushes them when the buffer fills up or when the file is closed. Call f.flush() to push buffered data to the operating system immediately without closing the file; if the data must survive a crash or power loss, follow it with os.fsync(f.fileno()). The with block closes the file (and thus flushes) when it exits, so you rarely need to call flush() manually.
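A small sketch of an explicit flush, assuming a temporary directory: once flush() has run, a second handle on the same file can read the data even though the writer is still open. The os.fsync() call is optional here and only matters when the data must reach the physical disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "progress.log")
    with open(path, "w", encoding="utf-8") as writer:
        writer.write("step 1 done\n")
        writer.flush()             # hand Python's buffer to the OS now
        os.fsync(writer.fileno())  # optional: ask the OS to commit it to disk

        # A second handle already sees the data while writer is still open
        with open(path, encoding="utf-8") as reader:
            seen = reader.read()

print(repr(seen))  # 'step 1 done\n'
```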
Use errors='replace' or errors='ignore' in open() to handle bad bytes. For unknown encodings, install the chardet library (pip install chardet) and call chardet.detect(raw_bytes) to detect the encoding before opening.
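The effect of the errors argument is easy to see on a file containing a byte that is invalid in UTF-8. This sketch writes the Latin-1 encoding of "café" to a temporary file and decodes it three ways:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1.txt")
    with open(path, "wb") as f:
        f.write(b"caf\xe9\n")  # 'é' encoded as Latin-1, invalid as UTF-8

    # Default errors="strict": decoding fails
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as e:
        print(f"strict (default): {e.reason}")

    # errors="replace": bad bytes become U+FFFD REPLACEMENT CHARACTER
    with open(path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()

    # errors="ignore": bad bytes are silently dropped
    with open(path, encoding="utf-8", errors="ignore") as f:
        ignored = f.read()

print(replaced == "caf\ufffd\n")  # True
print(ignored == "caf\n")         # True
```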
Yes. Separate them with a comma: with open("src.txt") as src, open("dst.txt", "w") as dst:. Both files are opened and both are closed when the block exits, regardless of exceptions.
For new code, always prefer pathlib. It is more Pythonic, cross-platform by design, and supports method chaining. Use os.path only when working with legacy codebases or libraries that require string paths (though Path objects can be passed to most modern APIs).