Pandas is one of the most popular data manipulation libraries in Python. It allows you to easily load, analyze, and transform tabular data. A common task when working with Pandas DataFrames is inserting new rows.
In this guide, we will cover several ways to insert rows into a Pandas DataFrame:
- Using DataFrame.loc
- Using DataFrame.append() (removed in pandas 2.0)
- Using DataFrame.insert() (which adds columns, not rows) and a positional alternative
- Using pd.concat()
- Inserting at the beginning of a DataFrame
- Inserting multiple rows
We will look at code examples and explanations for each method, including where each one works and where it does not.
What is a Pandas DataFrame?
Before we dive into the various row insertion methods, let's briefly go over Pandas DataFrames.
A DataFrame is a 2-dimensional tabular data structure with labeled rows and columns. You can think of it like a spreadsheet or SQL table.
Here is an example DataFrame with 3 columns (Name, Age, Location) and 4 rows:
import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Claire", "Dan"],
    "Age": [25, 30, 27, 32],
    "Location": ["California", "Texas", "New York", "Washington"]
}
df = pd.DataFrame(data)
print(df)
     Name  Age    Location
0   Alice   25  California
1     Bob   30       Texas
2  Claire   27    New York
3     Dan   32  Washington
The DataFrame allows easy access to the data. For example, we can select a column like this:
ages = df["Age"]  # a pandas Series, not a plain list
print(ages.tolist())
# [25, 30, 27, 32]
Now that we know the basics of Pandas DataFrames, let's look at the various methods to insert new rows.
1. Insert Row Using loc
The loc indexer accesses rows by index label. Assigning to a label that does not exist yet adds a new row at the end of the DataFrame; assigning to an existing label overwrites that row instead.
new_row = {"Name": "Erin", "Age": 28, "Location": "Ohio"}
df.loc[4] = new_row
print(df)
     Name  Age    Location
0   Alice   25  California
1     Bob   30       Texas
2  Claire   27    New York
3     Dan   32  Washington
4    Erin   28        Ohio
The new row was added with index 4, the next free label, because the original DataFrame had 4 rows labeled 0 to 3.
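The difference between adding and overwriting with loc can be checked with a short sketch. The two-row DataFrame and its values are made up for illustration, and len(df) is used as the next free label, which assumes the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Location": ["California", "Texas"],
})

# Label 2 does not exist yet, so this assignment adds a row.
df.loc[len(df)] = ["Claire", 27, "New York"]
rows_after_add = len(df)

# Label 0 already exists, so this assignment overwrites Alice's row.
df.loc[0] = ["Zed", 41, "Utah"]
rows_after_overwrite = len(df)

print(rows_after_add, rows_after_overwrite)
print(df)
```

The row count grows only in the first assignment; the second one replaces data without shifting anything.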
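loc adds only one new label per assignment; assigning to a slice such as df.loc[5:6] does not create the missing rows. To add several rows with loc, assign them one at a time:
rows = [{"Name": "Frank", "Age": 33, "Location": "Florida"},
        {"Name": "Grace", "Age": 26, "Location": "Arizona"}]
for row in rows:
    df.loc[len(df)] = row  # next free label on a default 0..n-1 index
print(df)
This adds two rows, at index 5 and 6.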
2. Insert Row Using DataFrame.append() (pandas < 2.0)
DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this section run only on older versions; on current versions, use pd.concat() (section 4) instead. On pandas < 2.0, append() adds a new row like this:
new_row = {"Name": "Hannah", "Age": 34, "Location": "Georgia"}
df = df.append(new_row, ignore_index=True)  # raises AttributeError on pandas >= 2.0
print(df)
This adds the row to the bottom of the DataFrame and renumbers the row index from 0.
When appending a dict, ignore_index=True is required: without it, append() raises a TypeError, because a dict carries no label to use as the new row's index.
append() also accepts a list of rows:
rows2 = [{"Name": "Isaac", "Age": 40, "Location": "Michigan"},
         {"Name": "Julia", "Age": 29, "Location": "Pennsylvania"}]
df = df.append(rows2, ignore_index=True)  # pandas < 2.0 only
print(df)
So on older versions, append() provides a convenient way to add one or more rows.
One difference between loc and append() is that append() returns a new DataFrame and leaves the original unchanged, while loc modifies the existing DataFrame in place.
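On pandas 2.0 and later, the same effect as append() can be had with pd.concat(). The sketch below wraps that in a helper; append_rows is a hypothetical name chosen for this example, not a pandas function, and the sample data is made up.

```python
import pandas as pd


def append_rows(df, rows):
    """Return a new DataFrame with `rows` (a list of dicts) added at the bottom.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return pd.concat([df, pd.DataFrame(rows)], ignore_index=True)


df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
extended = append_rows(df, [
    {"Name": "Hannah", "Age": 34},
    {"Name": "Isaac", "Age": 40},
])
print(extended)
```

Like append(), this returns a new DataFrame and leaves df untouched, so the result has to be assigned to a name.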
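3. Insert Rows at a Specific Position (DataFrame.insert() Is for Columns)
Despite its name, DataFrame.insert() does not insert rows: it inserts a column at a given position. Pandas has no built-in method for inserting a row in the middle of a DataFrame, but you can split the DataFrame at the target position and concatenate the pieces around the new row:
new_row = {"Name": "Kate", "Age": 30, "Location": "Hawaii"}
# Insert at position 2: rows before, the new row, rows after
df = pd.concat([df.iloc[:2], pd.DataFrame([new_row]), df.iloc[2:]], ignore_index=True)
print(df)
Starting from the original four-row DataFrame, this gives:
     Name  Age    Location
0   Alice   25  California
1     Bob   30       Texas
2    Kate   30      Hawaii
3  Claire   27    New York
4     Dan   32  Washington
For reference, the arguments for insert() are:
- loc: integer position of the new column
- column: label of the new column
- value: data for the new column
The split-and-concatenate approach gives precise position-based row insertion, which loc alone cannot do, at the cost of building a new DataFrame.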
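As a point of reference, DataFrame.insert(loc, column, value) in pandas places a new column at integer position loc and modifies the DataFrame in place. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Location": ["California", "Texas"],
})

# Put an "Age" column at position 1, between "Name" and "Location".
# insert() works in place and returns None.
returned = df.insert(1, "Age", [25, 30])

column_order = list(df.columns)
print(column_order)
print(df)
```

The number of rows is unchanged; only the set of columns grows.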
4. Inserting Rows Using concat()
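The pd.concat() function stacks DataFrames on top of each other, similar to SQL UNION ALL (duplicate rows are kept) or copy-pasting rows in Excel. It is a top-level pandas function, not a DataFrame method.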
We can leverage this for row insertion too:
# Original DataFrame
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# New row to insert
df2 = pd.DataFrame({"A": [5], "B": [6]})
result = pd.concat([df1, df2], ignore_index=True)
print(result)
   A  B
0  1  3
1  2  4
2  5  6
So concat() works for inserting rows. It takes slightly more code than loc for a single row, but it handles any number of rows and is the replacement pandas recommends for the removed append().
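The reason for renumbering the index after concatenation is easy to see by comparing the two results side by side. A small sketch using the same two DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5], "B": [6]})

# By default concat keeps each input's own labels, so label 0 appears twice.
kept = pd.concat([df1, df2])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```

Duplicate labels are legal in pandas, but they make label lookups such as kept.loc[0] return several rows, which is rarely what is wanted after inserting a row.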
5. Insert Row at Beginning of DataFrame
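To insert a row at the top of a DataFrame with the default integer index, add it with loc under the label -1, then sort by index and renumber:
new_row = {"Name": "Zara", "Age": 40, "Location": "Oregon"}
df.loc[-1] = new_row  # added at the bottom with label -1
df = df.sort_index().reset_index(drop=True)  # -1 sorts first; renumber from 0
print(df)
On its own, df.loc[-1] only adds a row labeled -1 at the end. After sorting and resetting the index, the new row is at index 0 and the existing rows are shifted down by one.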
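An alternative that does not depend on the index labels at all is to concatenate with the new row placed first. A minimal sketch, using a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Location": ["California", "Texas"],
})

# Wrap the new row in a one-row DataFrame and put it first in the list.
top = pd.DataFrame([{"Name": "Zara", "Age": 40, "Location": "Oregon"}])
df = pd.concat([top, df], ignore_index=True)

print(df)
```

Because the order of the list passed to pd.concat() decides the row order, this also works when the index is not a simple 0..n-1 range.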
6. Inserting Multiple Rows
We've looked at various methods to insert a single row. Let's look at some techniques for efficient insertion of multiple rows.
Method 1: Construct New DataFrame and Concatenate
One method is to construct a new DataFrame with the data for additional rows. Then concatenate it with the original DataFrame.
For example:
new_rows = [{"Name": "Nina", "Age": 32, "Location": "Alabama"},
{"Name": "Oliver", "Age": 19, "Location": "Rhode Island"}]
new_df = pd.DataFrame(new_rows)
updated_df = pd.concat([df, new_df]).reset_index(drop=True)
print(updated_df)
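Method 2: Extend existing DataFrame
Rather than creating a new DataFrame, we can also extend the original DataFrame in place. loc accepts only one new row per assignment, so loop over the rows:
more_rows = [{"Name": "Piper", "Age": 35, "Location": "Minnesota"},
             {"Name": "Quinn", "Age": 25, "Location": "Wisconsin"}]
for row in more_rows:
    df.loc[len(df)] = row
print(df)
Each row is assigned to the label len(df), the next free label on a default integer index, so the DataFrame grows at the end. Growing a DataFrame one row at a time gets slow for large inputs, so prefer Method 1 when adding many rows.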
In summary, multiple row insertion can be done either by:
- Concatenating a new DataFrame
- Directly extending existing DataFrame length
Choose the approach based on your use case.
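When the rows arrive one by one, for example from a loop, a common pattern is to collect them in a plain list and concatenate once at the end. The sketch below assumes a made-up starting DataFrame and a hypothetical incoming list standing in for whatever produces the rows.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice"], "Age": [25], "Location": ["California"]})

# Hypothetical source of new rows; in practice this could be a file or an API.
incoming = [("Piper", 35, "Minnesota"), ("Quinn", 25, "Wisconsin")]

# Collect plain dicts first; appending to a Python list is cheap.
records = []
for name, age, location in incoming:
    records.append({"Name": name, "Age": age, "Location": location})

# Build one DataFrame from the records and concatenate a single time.
df = pd.concat([df, pd.DataFrame(records)], ignore_index=True)

print(df)
```

This keeps the number of DataFrame constructions constant no matter how many rows are added.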
Summary
We went over several methods for inserting rows into Pandas DataFrames:
- loc: Add a row by assigning to a new index label
- append(): Add rows at the end (pandas < 2.0 only; removed in 2.0)
- insert(): Inserts columns, not rows; use iloc slices with pd.concat() to insert a row at a position
- pd.concat(): Stack DataFrames to add one or more rows
- Negative index: Add a row with loc[-1], then sort the index to move it to the top
- Construct new DataFrame or Extend existing: Techniques for multiple rows
You now know the common techniques for inserting rows with Pandas. The best method depends on your specific needs: whether you want to insert at a particular position, at the end, and so on.
Now go ahead and apply your learnings to wrangle tabular data in Python effectively! Let me know in the comments if you have any other row insertion tricks in Pandas.