Tamara Shatar May 13, 2021
Pandas Header

Creating groups


In this example, we will work with a DataFrame that contains annual life expectancy and food availability (energy) data for 175 countries.

groupby()

 

groupby()

We can group by country and use the resulting groupby object to simplify reporting group statistics:

groupby Python
groupby Python


To report multiple statistics for each group, we can use the groupby object in combination with the agg() (aggregate) method.

groupby Python
groupby Python

Alternatively, we can change the grouping criteria, to instead report summaries of values by year.
groupby Python
groupby Python


Accessing groups of data

Individual groups can be accessed from the groupby object using the get_group() method. It can be used to simplify accessing subsets of data for exporting or other operations, like plotting.

groupby Python
groupby Python
groupby Python image
groupby Python image


Looping

The groupby object can also be used to perform iterative operations. We can loop over each group, retrieving both the data and the name of the group. The name of the group is the value of the variable used to group the data. The code below can be used to automate the process of exporting each group to its own csv file. Each file is saved with the group name.

groupby Python code
groupby Python code

By grouping on multiple columns, we can avoid using nested loops. In the example below, a column has been added (early_years) that indicates whether the data was collected prior to 1975 or after this date. By grouping the data on both country and entity, we can send data for each country, to two separate files; one for data prior to 1975 and one for data collected since 1975. Note, that the group names consist of tuples. These have been concatenated using the join() function to create a single name from each element of the tuple.

groupby Python screenshot
groupby Python screenshot


Filtering on number of data points

The groupby() method can also be used to determine the total number of rows of data for each group and used in combination with the filter() method to remove groups if they contain insufficient data.

The first line of code produces a regular DataFrame, that contains data only for those countries that have more than 53 years of data. By regrouping and reporting the count, we can see how many years of data are available for each of the remaining groups.

groupby Python example
groupby Python example


Learn more about Python and Pandas check out our Python Training Courses. Available remote online across South-East Asia. 

Related Topics

Contact Us

Why Nexacu? 

Valued by Individuals

4.72 / 5
Over 77289 Reviews

Trusted by Business

Awards and Accreditations

Follow us