As a Python developer, being able to programmatically determine the size of files is an important and useful skill. Whether you need to check disk usage, compare file sizes, or perform other file-related tasks, Python provides simple ways to get a file's size.
In this guide, I'll cover several methods for getting a file's size in Python using modules available in the standard library.
Why Check File Sizes in Python?
Here are some common reasons you may want to check file sizes in a Python script or application:
- Disk usage monitoring – Check how much space is being used on a disk by analyzing file sizes. This can help catch disks nearing capacity.
- Data processing – When processing a set of files, you may want to filter, sort or access files based on size to optimize operations.
- User information – Display file sizes to users so they understand how much space their files consume.
- Comparison – Compare sizes of different files to see which is larger or smaller.
- Validation – Check if a file meets size requirements, like ensuring an image is under 2 MB.
- File transfer – When sending/receiving files over a network, having the file size can help calculate transfer time or monitor progress.
So in summary, file sizes are useful in many scenarios, and knowing how to retrieve them is a handy tool for any Python programmer.
Overview of File Size Methods in Python
Python's standard library includes several functions that return a file's size in bytes. Here's a quick overview:
- os.path.getsize() – Get the size directly from a file path
- os.stat() – Get the size, along with other file metadata, from a file path
- pathlib – Get the size from a Path object via Path.stat()
These functions generally return the file size measured in bytes as an integer. You can easily convert the byte count to other units like KB, MB or GB in Python if desired.
Now let's look at each method in more detail.
Method 1: os.path.getsize()
The simplest way to get a file's size in Python is using os.path.getsize(). This function lives in the os.path module and takes a file path as a string (or any other path-like object).
Here's an example:
import os

file_path = 'folder/report.pdf'
size = os.path.getsize(file_path)
print(size)
# Prints the size in bytes, e.g. 250047
To break this down:
- Import the os module
- Specify the file path as a string
- Call getsize and pass the file path
- This returns the size in bytes as an integer
Some key points about os.path.getsize():
- Returns size in bytes – The integer returned is the number of bytes. You'll need to convert it to other units like KB or MB yourself if needed.
- Does not total directories – Passing a folder path returns the size of the directory entry itself (a platform-dependent value), not the combined size of its contents.
- Raises errors – Raises OSError (for example FileNotFoundError) if the path doesn't exist or is inaccessible.
- Lightweight – A fast, lightweight way to check a file's size.
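If what you actually need is the combined size of everything inside a folder, one option is to walk the directory tree and add up the individual file sizes. The sketch below is a minimal illustration rather than a definitive implementation; directory_size is a hypothetical helper name, and skipping symbolic links is an assumption you may want to change.

```python
import os

def directory_size(root):
    """Sum the sizes of all files under root (hypothetical helper)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: skip symlinks so a linked file's target is not counted
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling directory_size('data') would return the total in bytes for a folder named data, if one exists. Files that disappear while the walk is running would raise OSError, so long-running scans may want a try/except around the getsize call.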
Let's look at a more advanced example, showing some size conversions and error handling:
import os

def get_size(file_path):
    """Return file size in bytes, KB and MB"""
    try:
        size_bytes = os.path.getsize(file_path)
        size_kb = size_bytes / 1024
        size_mb = size_kb / 1024
        return size_bytes, size_kb, size_mb
    except OSError as error:
        print(f'Error getting size for {file_path}: {error}')
        return 0, 0, 0

print(get_size('data/test.pdf'))
# Example: a 2,500,474-byte file gives about 2441.87 KB and 2.38 MB
Here we calculated the size in bytes, KB and MB by dividing by 1024 to go from bytes -> KB and again from KB -> MB.
We also handle any OS errors with a try/except block, printing a custom error message while allowing our code to continue executing.
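One side effect of returning 0, 0, 0 on failure is that a missing file and a genuinely empty file look the same to the caller. A possible variation, sketched here under the assumption that only a missing path should be treated as "no size", returns None instead and lets other errors propagate. The name size_or_none is hypothetical.

```python
import os

def size_or_none(file_path):
    """Return the size in bytes, or None if the path does not exist."""
    try:
        return os.path.getsize(file_path)
    except FileNotFoundError:
        # A missing path is reported as None so it cannot be confused
        # with an existing file that happens to be 0 bytes long
        return None
```

Other OSError subclasses, such as PermissionError, are deliberately not caught in this sketch, so the caller still sees them.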
So in summary, os.path.getsize() is an easy way to get file sizes, and its failures surface as ordinary OSError exceptions that are simple to catch. It's part of Python's standard library, so it works anywhere Python does.
Method 2: os.stat()
An alternative in the os module is the os.stat() function. This provides additional file properties beyond just size.
os.stat() takes a file path and returns an os.stat_result object with info like size, modified time, access time, permission bits (read/write/execute) and more.
Here's an example usage:
import os
import time

file_info = os.stat('data/reports/monthly.csv')
file_size = file_info.st_size
modified = time.ctime(file_info.st_mtime)
print(f'File size is {file_size} bytes')
print(f'Last modified: {modified}')
Output:
File size is 2560004 bytes
Last modified: Sat Mar  5 17:35:43 2022
The main things to notice when using os.stat():
- It returns an os.stat_result object whose attributes hold file properties (it can also be indexed like a tuple)
- Use the st_size field to get the file size in bytes
- Other useful fields include st_mtime, st_atime, st_ctime for file times
The result contains fields such as:
- st_mode – Protection bits
- st_ino – Inode number
- st_dev – Device
- st_nlink – Number of hard links
- st_uid – User ID of owner
- st_gid – Group ID of owner
- st_size – Total size, in bytes
- st_atime – Time of last access
- st_mtime – Time of most recent content modification
- st_ctime – Platform-dependent; time of most recent metadata change on Unix, or the time of creation on Windows
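As a rough illustration of how these fields can be used together, the sketch below pairs os.stat() with the standard stat module to interpret st_mode. The describe function and its dictionary keys are hypothetical names chosen for this example.

```python
import os
import stat

def describe(path):
    """Collect a few os.stat() fields into a dictionary (hypothetical helper)."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "is_file": stat.S_ISREG(info.st_mode),   # True for a regular file
        "is_dir": stat.S_ISDIR(info.st_mode),    # True for a directory
        # Human-readable mode string, e.g. '-rw-r--r--' on Unix-like systems
        "permissions": stat.filemode(info.st_mode),
        "modified_timestamp": info.st_mtime,     # seconds since the epoch
    }
```

A single os.stat() call supplies every value here, which is the main reason to prefer it over os.path.getsize() when more than the size is needed.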
So in summary, os.stat() provides more metadata about a file than just its size, but retrieving the size through the st_size field is simple. The benefit over os.path.getsize() is getting additional context if you need it.
Method 3: pathlib
The pathlib module also offers a way to get a file's size. pathlib contains a Path class that represents the path as an object.
Here's an example:
from pathlib import Path

data_file = Path('data/2023/sales.csv')
file_size = data_file.stat().st_size
print(f'File size is {file_size} bytes')
The key things to note when using pathlib:
- Import Path class from pathlib
- Create a Path object for the file path
- Call the .stat() method on that Path object – similar to os.stat()
- Access the st_size field for size in bytes
This mirrors the os.stat() approach, but uses Path objects instead of passing strings.
pathlib has many advantages for file path manipulation, so it is handy to use when you are already working with paths and also need the file size.
Some benefits of pathlib include:
- Object-oriented approach to paths
- Easier manipulation than raw strings
- Built-in methods like .glob(), .read_text(), .write_text()
- Integrates well with other modules that utilize Path objects
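To show how .glob() and .stat() can work together, here is a small sketch that lists the largest files matching a pattern. The function name largest_files, the default '*.csv' pattern and the limit parameter are all assumptions made for illustration.

```python
from pathlib import Path

def largest_files(folder, pattern="*.csv", limit=5):
    """Return (path, size) pairs for the largest matching files (hypothetical helper)."""
    sizes = [
        (path, path.stat().st_size)
        for path in Path(folder).glob(pattern)
        if path.is_file()  # ignore directories that happen to match
    ]
    # Largest first
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:limit]
```

For a recursive search, Path.rglob() could be swapped in for .glob() with the same pattern.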
So in summary, pathlib enables an object-oriented approach to getting a file size while providing many other helpful file-related methods.
Comparing the 3 File Size Methods
Method | Returns | Notes |
---|---|---|
os.path.getsize() | Size in bytes (int) | Simple. Lightweight. Raises OSError if the path is missing or inaccessible. |
os.stat() | os.stat_result including size | More metadata about the file. Use st_size for bytes. |
Path().stat() | os.stat_result including size | Integrates well with pathlib's other Path methods. |
In most cases, the simplest option, os.path.getsize(), is perfect for checking a file size.
But os.stat() and the pathlib option are useful if you need access to other file properties at the same time, or want to leverage Path objects for easier path wrangling.
So choose the best approach based on your specific application!
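As a quick sanity check that the three approaches report the same number, the following sketch writes a small temporary file and measures it each way. The temporary file and the variable names are assumptions for the demo only.

```python
import os
import tempfile
from pathlib import Path

# Write a small temporary file so the example does not depend on existing data
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"x" * 2048)
    tmp_path = tmp.name

size_getsize = os.path.getsize(tmp_path)
size_stat = os.stat(tmp_path).st_size
size_pathlib = Path(tmp_path).stat().st_size

print(size_getsize, size_stat, size_pathlib)
# 2048 2048 2048

os.remove(tmp_path)  # clean up the temporary file
```

Since all three read the same underlying file metadata, the choice between them comes down to convenience rather than accuracy.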
Converting File Sizes
A final piece worth covering is converting the file size returned in bytes to more readable units like KB, MB or GB.
All the methods return the byte count as an integer. Here is simple code to display the size in KB or MB automatically:
import os

file_bytes = os.path.getsize('data/media/video.mov')
file_kb = file_bytes / 1024
file_mb = file_kb / 1024
print(f'File size is {file_bytes} bytes')
print(f'File size is {file_kb:.2f} KB')
print(f'File size is {file_mb:.2f} MB')
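The key point is that dividing the byte count by 1024 converts it to the next higher unit: bytes -> KB -> MB -> GB. (Strictly speaking, 1024-based units are called KiB, MiB and GiB, and some tools use 1000-based KB, MB and GB instead, so reported figures can differ slightly.)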
I also formatted the KB and MB versions to display with 2 decimal places to be more readable using f-string formatting.
This allows easy human-readable output for conveying file sizes.
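Rather than printing every unit, you may prefer to pick the most suitable one automatically. Here is one way that could look, as a sketch that assumes 1024-based units and two decimal places; format_size is a hypothetical helper name.

```python
def format_size(num_bytes):
    """Format a byte count using 1024-based units (hypothetical helper)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    index = 0
    # Keep dividing until the value drops below 1024 or we run out of units
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} bytes"
    return f"{size:.2f} {units[index]}"

print(format_size(512))      # 512 bytes
print(format_size(1536))     # 1.50 KB
print(format_size(2500474))  # 2.38 MB
```

The result of any of the size functions covered earlier can be passed straight into a helper like this.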
Summary
Determining file sizes is useful in many Python applications. Key takeaways:
- Use os.path.getsize() for a simple way to get the size in bytes
- Leverage os.stat() or Path objects if you want additional file metadata
- Access the st_size attribute of the returned stat result for the size in bytes
- Divide the byte count by 1024 to display it in larger units like KB or MB
So in Python, checking a file's size is straightforward. Now you have several options to fit varying needs and can describe sizes to users in human-friendly formats.
Implementing these file size capabilities provides a foundation for building many useful Python programs ranging from disk utilities to automated media processing pipelines.