Pandas DataFrame Operations: A Comprehensive Guide

Scaibu
4 min read · Sep 7, 2024

In this guide, we’ll explore how to manipulate data using Pandas — one of the most powerful and popular libraries in Python. Whether you’re new to Pandas or brushing up on your skills, this step-by-step walkthrough will make working with DataFrames intuitive and even fun! Let’s dive into CRUD operations (Create, Read, Update, Delete) for both columns and rows, and then level up with advanced features like grouping, merging, and time series analysis.

Get Hands-On with Pandas: Step-by-Step

Column Operations in Pandas

Columns in Pandas are like fields in a spreadsheet — they hold specific types of information. We’ll cover how to create, read, update, and delete columns effortlessly.

Create: How Do You Add a New Column?

Adding columns in Pandas is as simple as giving your DataFrame a new “header” and assigning values to it. For example, let’s add a new ‘Salary’ column to our existing data.

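import pandas as pd
# Starting data (reconstructed from the output shown below)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'San Francisco', 'Los Angeles']})
# Add a 'Salary' column (the list needs one value per row)
df['Salary'] = [50000, 60000, 70000]
print(df)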

Output:

      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30  San Francisco   60000
2  Charlie   35    Los Angeles   70000
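
One detail worth seeing in action is how the assigned value has to line up with the rows. The sketch below is only an illustration: `people` is a stand-in for the sample data, and the 'Country' column is made up for the example.

```python
import pandas as pd

# Stand-in for the guide's sample data
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# A single value is repeated on every row ('Country' is a made-up column)
people['Country'] = 'USA'

# A list must supply exactly one value per row; otherwise pandas raises ValueError
try:
    people['Salary'] = [50000, 60000]  # only two values for three rows
except ValueError as err:
    print('Rejected:', err)

print(people.columns.tolist())
```

The failed assignment leaves the DataFrame unchanged, so only 'Country' is added.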

Read: How Do You Access a Column in Pandas?

Let’s say you want to know the ages of all the people in your dataset. With Pandas, that’s as simple as calling the column name.

# Access the 'Age' column
print(df['Age'])

Output:

0    25
1    30
2    35
Name: Age, dtype: int64
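
If you need more than one column, the same bracket syntax accepts a list of names. Here is a minimal sketch, using a stand-in `people` frame rebuilt from the sample data, that contrasts the two forms.

```python
import pandas as pd

# Stand-in for the guide's sample data
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

ages = people['Age']                    # one name -> a Series
name_and_age = people[['Name', 'Age']]  # list of names -> a DataFrame

print(type(ages).__name__)          # Series
print(type(name_and_age).__name__)  # DataFrame
print(ages.tolist())                # [25, 30, 35]
```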

Update: How Do You Modify Column Values?

Need to adjust your data? Maybe you realize everyone’s a year older now. No problem — Pandas makes it easy to update column values.

# Increment everyone's age by 1
df['Age'] = df['Age'] + 1
print(df)

Output:

      Name  Age           City  Salary
0    Alice   26       New York   50000
1      Bob   31  San Francisco   60000
2  Charlie   36    Los Angeles   70000

Delete: How Do You Remove a Column?

Not every piece of data is always useful. Let’s say you no longer need the ‘City’ information. With Pandas, dropping a column is a breeze.

# Drop the 'City' column
df = df.drop('City', axis=1)
print(df)

Output:

      Name  Age  Salary
0    Alice   26   50000
1      Bob   31   60000
2  Charlie   36   70000
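
Note that drop() returns a new DataFrame rather than changing the original, which is why the example reassigns df. The sketch below shows this with a stand-in `people` frame, using the equivalent `columns=` keyword.

```python
import pandas as pd

# Stand-in for the guide's sample data
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [26, 31, 36],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# columns='City' has the same effect as drop('City', axis=1)
trimmed = people.drop(columns='City')

print(trimmed.columns.tolist())  # ['Name', 'Age']
print(people.columns.tolist())   # ['Name', 'Age', 'City'] (original untouched)
```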

Row Operations in Pandas

Rows represent individual records in your DataFrame. Here’s how to manipulate them with Pandas.

Create: Adding a New Row

Adding a row is like filling in a new entry in your table. Let’s add a new person, ‘David’, to our data.

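# Add a new row for David
# (DataFrame.append was removed in pandas 2.0, so wrap the row in a DataFrame and use pd.concat)
new_row = {'Name': 'David', 'Age': 40, 'Salary': 80000}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)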

Output:

      Name  Age  Salary
0    Alice   26   50000
1      Bob   31   60000
2  Charlie   36   70000
3    David   40   80000

Read: How to Access a Specific Row

What if you want to view all the details for Bob? Use Pandas’ iloc[] to grab a row by its integer position, counting from 0.

# Get the row at position 1 (Bob)
print(df.iloc[1])

Output:

Name        Bob
Age          31
Salary    60000
Name: 1, dtype: object
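
Position and label only coincide while the index is the default 0, 1, 2, and so on. As a rough sketch, the hypothetical `staff` frame below has index labels starting at 1, so iloc[] (position) and loc[] (label) return different rows.

```python
import pandas as pd

# Hypothetical frame whose index labels start at 1 instead of 0
staff = pd.DataFrame(
    {'Name': ['Bob', 'Charlie', 'David'], 'Age': [31, 36, 40]},
    index=[1, 2, 3],
)

by_position = staff.iloc[1]  # second row, whatever its label
by_label = staff.loc[1]      # row whose index label is 1

print(by_position['Name'])  # Charlie
print(by_label['Name'])     # Bob
```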

Update: Modifying Row Values

Suppose Charlie just got a raise. Updating that in Pandas is as easy as finding the row and modifying the value.

# Update Charlie's salary
df.loc[2, 'Salary'] = 75000
print(df)

Output:

      Name  Age  Salary
0    Alice   26   50000
1      Bob   31   60000
2  Charlie   36   75000
3    David   40   80000
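
If you don't know the row's index label ahead of time, you can select it by value instead. This is a minimal sketch with a stand-in `staff` frame. The boolean mask is one common approach, not the only one.

```python
import pandas as pd

# Stand-in for the guide's data before the raise
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [26, 31, 36, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Boolean mask: True only on rows where Name is 'Charlie'
is_charlie = staff['Name'] == 'Charlie'
staff.loc[is_charlie, 'Salary'] = 75000

print(staff.loc[is_charlie, 'Salary'].tolist())  # [75000]
```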

Delete: How to Remove a Row

Need to delete an entry? Maybe Alice has left the dataset. With Pandas, removing rows is painless.

# Remove the row labeled 0 (Alice)
df = df.drop(0)
print(df)

Output:

      Name  Age  Salary
1      Bob   31   60000
2  Charlie   36   75000
3    David   40   80000
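
Notice that the remaining rows keep their old labels (1, 2, 3). If you would rather remove rows by condition and renumber the index from 0, something like the following sketch works. `staff` is a stand-in for the data above.

```python
import pandas as pd

# Stand-in for the guide's data before the deletion
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [26, 31, 36, 40],
    'Salary': [50000, 60000, 75000, 80000],
})

# Keep every row except Alice, then rebuild a 0-based index
remaining = staff[staff['Name'] != 'Alice'].reset_index(drop=True)

print(remaining['Name'].tolist())  # ['Bob', 'Charlie', 'David']
print(remaining.index.tolist())    # [0, 1, 2]
```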

Advanced Pandas Operations

Now that we’ve covered the basics, let’s explore some advanced operations to unlock the full potential of Pandas.

Basic Aggregations: Crunching the Numbers

Pandas lets you compute useful statistics quickly, like averages or totals. Let’s find the average salary and the maximum age in our dataset.

# Aggregations
print("Average Salary:", round(df['Salary'].mean(), 2))
print("Maximum Age:", df['Age'].max())
print("Total Salary:", df['Salary'].sum())

Output:

Average Salary: 71666.67
Maximum Age: 40
Total Salary: 215000
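
To compute several statistics in one call, agg() accepts a list of function names. The sketch below uses a standalone Series holding the three salaries from the table above.

```python
import pandas as pd

# The three salaries left in the guide's data at this point
salaries = pd.Series([60000, 75000, 80000], name='Salary')

# One call returns a Series indexed by statistic name
summary = salaries.agg(['mean', 'max', 'sum']).round(2)
print(summary)
```

Because the mean is a float, the whole result is stored as floats, so the maximum and total print as 80000.0 and 215000.0.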

Groupby and Aggregation: Summarizing by Groups

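If you want insights like the average salary by department, you can group by any column and apply aggregations. This example uses a second DataFrame, df1, which has a ‘Department’ column alongside ‘Name’, ‘Age’, and ‘Salary’.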

# Group by department and calculate salary stats
dept_stats = df1.groupby('Department').agg({
    'Salary': ['mean', 'max', 'min'],
    'Age': 'mean'
})
print(dept_stats)
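
The guide never defines df1, so here is a runnable sketch under an assumption: df1 is rebuilt from the rows that appear in the merged output later in this guide. With that data each department has a single member, so mean, max, and min coincide.

```python
import pandas as pd

# Assumed contents of df1, inferred from the merged output in this guide
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
})

dept_stats = df1.groupby('Department').agg({
    'Salary': ['mean', 'max', 'min'],
    'Age': 'mean',
})

# The result has two-level column labels such as ('Salary', 'mean')
print(dept_stats.columns.tolist())
print(dept_stats.loc['IT', ('Salary', 'mean')])  # 60000.0
```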

Merging DataFrames: Combining Information

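Merging two datasets in Pandas is like performing an SQL join. Let’s combine two DataFrames, df1 and df2, on their shared ‘Name’ column. An outer join keeps every name from both sides and fills the missing values with NaN.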

# Merge two DataFrames
merged_df = pd.merge(df1, df2, on='Name', how='outer')
print(merged_df)

Output (Age and Salary become floats because the unmatched rows introduce NaN):

      Name   Age   Salary  Department           City  Skills
0    Alice  25.0  50000.0          HR            NaN     NaN
1      Bob  30.0  60000.0          IT       New York  Python
2  Charlie  35.0  70000.0     Finance  San Francisco    Java
3    David  40.0  80000.0   Marketing            NaN     NaN
4      Eve   NaN      NaN         NaN        Chicago     C++
5    Frank   NaN      NaN         NaN        Houston     SQL
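
Since df1 and df2 are not defined in the guide, the sketch below rebuilds both from the merged output above. Treat the exact contents as an assumption. It also passes `indicator=True`, which adds a `_merge` column recording which side each row came from.

```python
import pandas as pd

# Assumed inputs, inferred from the merged output shown above
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
})
df2 = pd.DataFrame({
    'Name': ['Bob', 'Charlie', 'Eve', 'Frank'],
    'City': ['New York', 'San Francisco', 'Chicago', 'Houston'],
    'Skills': ['Python', 'Java', 'C++', 'SQL'],
})

# indicator=True labels each row as left_only, right_only, or both
merged_df = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

# Map each name to its origin for a compact view
origin = dict(zip(merged_df['Name'], merged_df['_merge'].astype(str)))
print(origin)
```

Switching `how` to 'inner' would keep only Bob and Charlie, the names present in both DataFrames.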

Applying Functions: Transforming Data

Need to categorize data on the fly? Pandas’ apply() function is perfect for applying custom logic.

# Categorize salaries into 'High' and 'Low'
df1['Salary_Category'] = df1['Salary'].apply(lambda x: 'High' if x > 65000 else 'Low')
print(df1)
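
For logic that outgrows a one-line lambda, a named function reads better. A minimal sketch follows. The function name `salary_band` and its `threshold` parameter are made up for illustration, and the salaries are the four values assumed for df1.

```python
import pandas as pd

def salary_band(salary, threshold=65000):
    """Return 'High' above the threshold, otherwise 'Low'."""
    return 'High' if salary > threshold else 'Low'

salaries = pd.Series([50000, 60000, 70000, 80000], name='Salary')

# apply() calls the function once per value
bands = salaries.apply(salary_band)
print(bands.tolist())  # ['Low', 'Low', 'High', 'High']

# Extra keyword arguments are forwarded to the function
strict = salaries.apply(salary_band, threshold=75000)
print(strict.tolist())  # ['Low', 'Low', 'Low', 'High']
```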

Time Series Operations: Handling Dates and Times

Pandas excels at handling time series data. You can create date ranges and perform time-based aggregations, such as calculating quarterly sales.

# Create a time series and aggregate by quarter
import numpy as np
import pandas as pd

# 'ME' (month end) and 'QE' (quarter end) are the pandas 2.2+ aliases; older versions use 'M' and 'Q'
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='ME')
time_series_df = pd.DataFrame({
    'Date': date_range,
    'Sales': np.random.randint(1000, 5000, size=len(date_range))  # random demo values
})
print(time_series_df.set_index('Date').resample('QE')['Sales'].agg(['mean', 'max', 'min']))
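
Because the sales figures above are random, the printed numbers change on every run. The sketch below swaps in made-up, deterministic values so the quarterly result can be checked by hand. It also passes offset objects instead of string aliases, which is one way to sidestep the alias differences between pandas versions.

```python
import pandas as pd

# Twelve month-end dates for 2023
dates = pd.date_range(start='2023-01-01', end='2023-12-31',
                      freq=pd.offsets.MonthEnd())

# Made-up sales: 1000 in January, 2000 in February, ..., 12000 in December
sales = pd.DataFrame({'Date': dates,
                      'Sales': [1000 * month for month in range(1, 13)]})

# Group the monthly rows into calendar quarters and summarize each one
quarterly = (
    sales.set_index('Date')
         .resample(pd.offsets.QuarterEnd())['Sales']
         .agg(['mean', 'max', 'min'])
)
print(quarterly)
```

The first quarter covers 1000, 2000, and 3000, so its mean is 2000, its max 3000, and its min 1000.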

Final Thoughts

And there you have it: a tour of Pandas for data manipulation, from basic CRUD operations on columns and rows to grouping, merging, applying functions, and time series resampling. These operations cover much of the day-to-day work with tabular data, whether you’re a data scientist, analyst, or developer. Happy coding!
