7 Ways To Filter A Pandas Dataframe

When you need to deal with data inside your code in python pandas is the go-to library. There are so many subjects and functions we could talk about but now we are only focusing on what pandas dataframe filtering options are available and how to use them effectively to filter stuff out from your existing dataframe.

Filtering functions

So below I go through some of the functions that you can use for dataframe filtering purposes. As a starting point, let’s create a simple dataframe in IPython that we are going to use in this article:

import pandas as pd
data = {'name': ['Jack', 'Frank', 'Kelly', 'Rebecca', "Monica"], 
        'year': [2015, 2011, 2010, 2014, None], 
        'reports': [24, 4, 2, 31, None]}
df = pd.DataFrame(data, index = ['New York', 'New Orleans', 'Budapest', 'Helsinki', "Cologne"])
df
                name  reports    year
New York        Jack     24.0  2015.0
New Orleans    Frank      4.0  2011.0
Budapest       Kelly      2.0  2010.0
Helsinki     Rebecca     31.0  2014.0
Cologne       Monica      NaN     NaN

View Only Specific Columns

Showing only one column

df['name']
			      name
New York          Jack
New Orleans      Frank
Budapest         Kelly
Helsinki       Rebecca

Showing multiple columns:

df[['name', 'year']]
                name  year
New York        Jack  2015
New Orleans    Frank  2011
Budapest       Kelly  2010
Helsinki     Rebecca  2014

Filter Rows Where…

Showing only the rows where the year is greater than 2012:

df[df['year'] > 2012]
             name  reports  year
New York     Jack       24  2015
Helsinki  Rebecca       31  2014

Showing only the rows where the year is greater than 2012 AND reports is smaller than 10:

df[(df['year'] > 2012) & (df['reports'] < 30)]
          name  reports  year
New York  Jack       24  2015

First/Last Rows

Showing only the first 2 rows:

df[:2]
              name  reports  year
New York      Jack       24  2015
New Orleans  Frank        4  2011

Showing only the last row:

df[-1:]
             name  reports  year
Helsinki  Rebecca       31  2014

Query String

You can also use a query string (which has to be a boolean expression) to filter your dataframe using the query function. It may come handy when your filter options are dynamic.

Showing only the rows where the year is greater than 2012 OR name is “Frank”:

df.query('year > 2012 | name == "Frank"')
                name  reports  year
New York        Jack       24  2015
New Orleans    Frank        4  2011
Helsinki     Rebecca       31  2014

Where Value Is In Specified List

Let’s say we have a list:

numbers = [4, 2]

We want to view rows where the reports value is in our numbers list:

df[df['reports'].isin(numbers)]
              name  reports  year
New Orleans  Frank        4  2011
Budapest     Kelly        2  2010

Where Value Is/Not null(NaN)

Show rows where year value is not null (aka. exists):

df[df['year'].notnull()]

Show rows where year value is null (aka. missing):

df[df['year'].isnull()]

Wrapping Up

Okay so we covered some functions that we can use to filter our dataframe. Sure, there are more ways to filter stuff out but these are the ones that I find the most useful and easiest to use.

Click here if you’re interested in how pandas is used in a real world data analysis project.