Pandas DataFrame: The apply Method
Calculating Discounts, Taxes, and Total Amount in a DataFrame
Suppose you have the following data in a DataFrame:
| | Product | Price | Category |
|---|---|---|---|
| 0 | A | 100 | Electronic |
| 1 | B | 200 | Cloth |
| 2 | C | 150 | Electronic |
| 3 | D | 300 | Colth |
| 4 | E | 250 | Electronic |
You want to add three new columns to this DataFrame: Discount, Taxes, and Total_Amount. The discount and the taxes are calculated in different ways: the taxes depend on the product category as well as the price, while the discount depends only on the price.
DataFrame and Function Definitions
First, let’s create the DataFrame and define the functions that compute the discount and the taxes:
```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [100, 200, 150, 300, 250],
    'Category': ['Electronic', 'Cloth', 'Electronic', 'Colth', 'Electronic']
}
products = pd.DataFrame(data)


def taxes(price, category):
    # Tax rate depends on the category; unrecognized categories get 0
    tax = 0
    if category == 'Electronic':
        tax = price * 0.15  # 15%
    elif category == 'Cloth':
        tax = price * 0.10  # 10%
    return tax


def discount(price):
    # Prices above 200 get 10% off, all others 5%
    if price > 200:
        amount = price * 0.10  # 10%
    else:
        amount = price * 0.05  # 5%
    return amount
```
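Before wiring these functions into a DataFrame, it can help to check them on a few plain values. The sketch below restates the two functions in a compact form so it runs on its own; the sample inputs are illustrative and simply mirror rows of the table above.

```python
def taxes(price, category):
    # Same rule as above: 15% for 'Electronic', 10% for 'Cloth', otherwise 0
    if category == 'Electronic':
        return price * 0.15
    if category == 'Cloth':
        return price * 0.10
    return 0


def discount(price):
    # Same rule as above: 10% above 200, otherwise 5%
    return price * 0.10 if price > 200 else price * 0.05


electronic_tax = taxes(100, 'Electronic')  # 15% of 100
cloth_tax = taxes(200, 'Cloth')            # 10% of 200
unknown_tax = taxes(300, 'Colth')          # category matches neither branch
small_discount = discount(200)             # 200 is not above 200, so 5%
large_discount = discount(300)             # above 200, so 10%

print(electronic_tax, cloth_tax, unknown_tax, small_discount, large_discount)
```

Note that the comparison in discount is strict, so a price of exactly 200 falls in the 5% branch.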
One way to do this is with the apply method, which the pandas documentation describes as follows:
Apply a function along an axis of the DataFrame.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
Applying the Functions to the DataFrame
To create the Discount, Taxes, and Total_Amount columns, we call apply once per column, passing it the function or lambda that computes that column’s values.
Calculating the Discount
```python
products['Discount'] = products['Price'].apply(discount)
```
Here, apply calls the discount function on each element of the ‘Price’ column. Because products['Price'] is a single column (a Series) and discount needs only the price, the function can be passed to apply directly, with no lambda function and no axis argument.
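For a simple threshold rule like this one, a vectorized expression is a common alternative to apply. The following is a minimal sketch rather than part of the original walkthrough: it assumes NumPy is installed (pandas depends on it), and the column name Discount_vectorized is hypothetical, chosen only to compare the two results side by side.

```python
import numpy as np
import pandas as pd

products = pd.DataFrame({'Price': [100, 200, 150, 300, 250]})


def discount(price):
    # Same rule as in the article: 10% above 200, otherwise 5%
    return price * 0.10 if price > 200 else price * 0.05


# Element-wise: discount is called once per price
products['Discount'] = products['Price'].apply(discount)

# Vectorized: pick the rate for every row at once, then multiply the column
rate = np.where(products['Price'] > 200, 0.10, 0.05)
products['Discount_vectorized'] = products['Price'] * rate

print(products)
```

Both columns should hold the same values; apply remains the more flexible choice when the rule cannot be written as a single column expression.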
Calculating the Taxes
```python
products['Taxes'] = products.apply(lambda x: taxes(x['Price'], x['Category']), axis=1)
```
In this case, the taxes function needs both the ‘Price’ and the ‘Category’ values. We therefore call apply on the whole DataFrame with a lambda function that receives one row at a time (x) and passes x['Price'] and x['Category'] to taxes. The axis=1 argument tells apply to call the function once per row.
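To see what the lambda actually receives, the sketch below records the type and labels of the first object passed in when axis=1 is used. The helper inspect_row and the list seen are hypothetical names introduced only for this illustration, and the two-row DataFrame is a shortened copy of the example data.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['A', 'B'],
    'Price': [100, 200],
    'Category': ['Electronic', 'Cloth'],
})

seen = []  # collects (type name, index labels) for each object apply passes in


def inspect_row(row):
    # With axis=1, each call receives one row as a Series
    # whose index holds the column names
    seen.append((type(row).__name__, list(row.index)))
    return row['Price']


result = products.apply(inspect_row, axis=1)

print(seen[0])
print(result.tolist())
```

Because each row arrives as a Series labeled by column name, expressions such as x['Price'] and x['Category'] work inside the lambda.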
Importance of axis
The axis parameter determines whether the function is applied to each column or to each row. The pandas documentation defines it as follows:
axis {0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:
- 0 or ‘index’: apply function to each column.
- 1 or ‘columns’: apply function to each row.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
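The difference between the two axis values is easiest to see on a small numeric DataFrame. The frame df and its column names a and b below are made up for this illustration and are not part of the products example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column in turn,
# so the result has one entry per column
per_column = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row in turn,
# so the result has one entry per row
per_row = df.apply(lambda row: row.sum(), axis=1)

print(per_column)
print(per_row)
```

Here per_column is indexed by the column names, while per_row is indexed by the original row labels.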
The total amount is calculated the same way as the taxes: row by row, with axis=1.
Calculating the Total Amount
```python
products['Total_Amount'] = products.apply(lambda x: x['Price'] - x['Discount'] + x['Taxes'], axis=1)
```
Finally, to calculate Total_Amount, we use apply with a lambda function that operates on each row, subtracting ‘Discount’ from ‘Price’ and adding ‘Taxes’.
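Since this last step is plain arithmetic on three columns, the same result can also be obtained without apply by operating on the columns directly. The sketch below is an alternative, not the article's method; it hard-codes the Discount and Taxes values that the earlier steps produce so that it runs on its own.

```python
import pandas as pd

products = pd.DataFrame({
    'Price': [100, 200, 150, 300, 250],
    'Discount': [5.0, 10.0, 7.5, 30.0, 25.0],
    'Taxes': [15.0, 20.0, 22.5, 0.0, 37.5],
})

# Row-wise version, as used in the article
row_wise = products.apply(
    lambda x: x['Price'] - x['Discount'] + x['Taxes'], axis=1
)

# Column-wise version: pandas aligns the columns and computes every row at once
column_wise = products['Price'] - products['Discount'] + products['Taxes']

print(row_wise.tolist())
print(column_wise.tolist())
```

Both versions should give identical totals; the row-wise form is mainly useful when the per-row logic is too involved for a single column expression.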
Final DataFrame
After applying all the transformations, the resulting DataFrame looks like this:
| | Product | Price | Category | Discount | Taxes | Total_Amount |
|---|---|---|---|---|---|---|
| 0 | A | 100 | Electronic | 5.0 | 15.0 | 110.0 |
| 1 | B | 200 | Cloth | 10.0 | 20.0 | 210.0 |
| 2 | C | 150 | Electronic | 7.5 | 22.5 | 165.0 |
| 3 | D | 300 | Colth | 30.0 | 0.0 | 270.0 |
| 4 | E | 250 | Electronic | 25.0 | 37.5 | 262.5 |
Product D gets a tax of 0 because its category is spelled ‘Colth’ in the data, which matches neither branch of the taxes function.
Conclusion
In conclusion, the apply method in pandas lets you run custom functions over the columns or rows of a DataFrame. By calling apply with or without a lambda function and specifying the correct axis, you can concisely compute new columns from existing data.