Geek Logbook

Tech sea log book

Pandas Dataframe: apply method

Calculating Discounts, Taxes, and Total Amount in a DataFrame

Suppose you have the following data in a DataFrame:

  Product  Price    Category
0       A    100  Electronic
1       B    200       Cloth
2       C    150  Electronic
3       D    300       Colth
4       E    250  Electronic

In this DataFrame, you want to create three new columns: Discount, Taxes, and Total_Amount. The discount and the taxes follow different rules: the taxes depend on both the price and the product category, while the discount depends only on the price.

DataFrame and Function Definitions

First, let’s create the DataFrame and define the functions that compute the discount and the taxes:

import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Category': ['Electronic', 'Cloth', 'Electronic', 'Colth', 'Electronic']
}

products = pd.DataFrame(data)

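def taxes(price, category):
    # The tax rate depends on the category; unrecognized categories get no tax.
    tax = 0
    if category == 'Electronic':
        tax = price * 0.15  # 15%
    elif category == 'Cloth':
        tax = price * 0.10  # 10%
    return tax

def discount(price):
    # 10% off prices above 200, 5% off everything else.
    # The local name differs from the function name to avoid shadowing it.
    if price > 200:
        amount = price * 0.10  # 10%
    else:
        amount = price * 0.05  # 5%
    return amount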

One way to compute these columns is the apply method, which the pandas documentation describes as follows:

Apply a function along an axis of the DataFrame.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

Applying the Functions to the DataFrame

To create the Discount, Taxes, and Total_Amount columns, we’ll call apply in two ways: on a single column when the function needs only one value, and on the whole DataFrame with axis=1 when it needs several columns from each row.

Calculating the Discount

products['Discount'] = products['Price'].apply(discount)

Here, apply calls the discount function on each element of the ‘Price’ column. Because products['Price'] is a Series, this is Series.apply, which works element by element and takes no axis argument. Since discount only needs the ‘Price’ value, no lambda function is required.
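As a quick check of this step, the following sketch applies a compact version of the discount rule to the example prices on their own. The names prices and discounts are illustrative; the rule is the same one defined earlier, written as a single expression.

```python
import pandas as pd

def discount(price):
    # Same rule as in the article: 10% above 200, otherwise 5%.
    return price * 0.10 if price > 200 else price * 0.05

# The 'Price' column from the example, as a standalone Series.
prices = pd.Series([100, 200, 150, 300, 250], name='Price')

# Series.apply calls discount once per element and returns a new Series.
discounts = prices.apply(discount)

print(discounts.tolist())
```

Note that a price of exactly 200 is not greater than 200, so product B falls into the 5% branch.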

Calculating the Taxes

products['Taxes'] = products.apply(lambda x: taxes(x['Price'], x['Category']), axis=1)

In this case, the taxes function needs both the ‘Price’ and ‘Category’ columns. Therefore, we use apply with a lambda function to pass both columns to the taxes function. The axis=1 argument indicates that the function should be applied to each row.
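To see what the row-wise call does, here is a minimal sketch that replaces the lambda with a named helper. The helper name row_tax is an assumption made for illustration; the data and the taxes function are the ones from this article.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Category': ['Electronic', 'Cloth', 'Electronic', 'Colth', 'Electronic'],
})

def taxes(price, category):
    # 15% for 'Electronic', 10% for 'Cloth', nothing for any other value.
    tax = 0
    if category == 'Electronic':
        tax = price * 0.15
    elif category == 'Cloth':
        tax = price * 0.10
    return tax

def row_tax(row):
    # With axis=1, each call receives one row as a Series indexed by column name.
    return taxes(row['Price'], row['Category'])

products['Taxes'] = products.apply(row_tax, axis=1)

print(products[['Product', 'Category', 'Taxes']])
```

Product D ends up with a tax of 0 because its category string, 'Colth', matches neither branch of taxes.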

Importance of axis

The axis parameter is crucial as it defines whether the function is applied to each column or each row:

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied:

  • 0 or ‘index’: apply function to each column.
  • 1 or ‘columns’: apply function to each row.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
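To make the two settings concrete, here is a minimal sketch on a small made-up frame. The column names a and b and their values are illustrative and not part of the product example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series,
# so the result has one entry per column.
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result has one entry per row.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums)
print(row_sums)
```

Here col_sums is indexed by the column names and row_sums by the row labels, which is a quick way to confirm which axis a call is using.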

The total amount is calculated the same way as Taxes: with a lambda function applied to each row (axis=1).

Calculating the Total Amount

products['Total_Amount'] = products.apply(lambda x: x['Price'] - x['Discount'] + x['Taxes'], axis=1)

Finally, to calculate Total_Amount, we use apply with a lambda function on each row: it subtracts ‘Discount’ from ‘Price’ and adds ‘Taxes’.
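For plain arithmetic like this, apply is not strictly required. As a sketch, assuming the Discount and Taxes columns already hold the values from the previous steps (they are hard-coded here so the snippet runs on its own), the same result can be written as column arithmetic and compared with the row-wise version:

```python
import pandas as pd

# Values of the example after the Discount and Taxes steps (hard-coded).
products = pd.DataFrame({
    'Price': [100, 200, 150, 300, 250],
    'Discount': [5.0, 10.0, 7.5, 30.0, 25.0],
    'Taxes': [15.0, 20.0, 22.5, 0.0, 37.5],
})

# Row-wise version, as used in the article.
with_apply = products.apply(
    lambda x: x['Price'] - x['Discount'] + x['Taxes'], axis=1
)

# Column arithmetic: the operation is applied element by element.
vectorized = products['Price'] - products['Discount'] + products['Taxes']

print(vectorized.tolist())
print((with_apply - vectorized).abs().max())
```

Both forms give the same totals; the column form is shorter and usually faster on large frames, while apply remains useful when the per-row logic involves branching, as in taxes.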

Final DataFrame

After applying all the transformations, the resulting DataFrame will look like this:

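  Product  Price    Category  Discount  Taxes  Total_Amount
0       A    100  Electronic       5.0   15.0         110.0
1       B    200       Cloth      10.0   20.0         210.0
2       C    150  Electronic       7.5   22.5         165.0
3       D    300       Colth      30.0    0.0         270.0
4       E    250  Electronic      25.0   37.5         262.5

Each Total_Amount value is Price - Discount + Taxes. Product D has Taxes of 0.0 because its category is spelled ‘Colth’ in the data, which matches neither branch of the taxes function.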

Conclusion

In conclusion, the apply method in pandas lets you run custom functions over DataFrame columns or rows. By calling apply on a single column, or on the whole DataFrame with a lambda function and the correct axis, you can calculate new columns from existing data in a few lines.
