Geek Logbook

Tech sea log book

Loading JSON Data into a Pandas DataFrame

When working with data, it’s common to encounter various file formats. JSON (JavaScript Object Notation) is a popular format for data exchange due to its readability and ease of use. In this post, we’ll explore how to load JSON data into a pandas DataFrame for further analysis.

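Let’s assume we have a file, data.json, containing survey responses. The file is in JSON Lines form: each line is a complete JSON record of its own, rather than the whole file being one JSON array. The records do not all carry the same fields. In the sample below, the first record holds only metadata (__key__, __error__, __has_error__), while the others also include the survey answers: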

{"__key__":{"namespace":"","app":"s~app-12345","path":"\"responses_2022\", \"user1@example.com|XYZ123|2022-02-11|1|1\"","kind":"responses_2022","name":"user1@example.com|XYZ123|2022-02-11|1|1"},"__error__":[],"__has_error__":false}
{"startDate":"2021-10-20","stage":"I. Detection Stage","questionNum":"1","communityAdmin":"admin@example.com","title":"First Questionnaire","lastName":"Doe","endDate":"2022-08-15","firstName":"John","email":"user1@example.com","question":"Where do people migrate to in your community?","community":"Green Valley","response":"d) People do not migrate","age":"30","timestamp":"2022-01-28 21:13:59.289 UTC","__key__":{"namespace":"","app":"s~app-12345","path":"\"responses_2022\", \"user1@example.com|XYZ123|2021-10-20|1|1\"","kind":"responses_2022","name":"user1@example.com|XYZ123|2021-10-20|1|1"},"__error__":[],"__has_error__":false}
{"startDate":"2021-10-21","stage":"I. Detection Stage","questionNum":"4","communityAdmin":"admin@example.com","title":"First Questionnaire","lastName":"Smith","endDate":"2022-08-15","firstName":"Jane","email":"user2@example.com","question":"Did you pay for the recruiter's service?","community":"Blue Lake","response":"a) Yes","age":"25","timestamp":"2022-02-14 19:47:00.205 UTC","__key__":{"namespace":"","app":"s~app-12345","path":"\"responses_2022\", \"user2@example.com|XYZ123|2021-10-21|1|4\"","kind":"responses_2022","name":"user2@example.com|XYZ123|2021-10-21|1|4"},"__error__":[],"__has_error__":false}
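To make the one-record-per-line layout concrete, here is a minimal sketch using only the standard json module. The string raw and its two trimmed records are made up for illustration and stand in for the contents of data.json.

```python
import json

# Two made-up records in the same one-object-per-line layout (fields trimmed).
raw = (
    '{"email": "user1@example.com", "__has_error__": false}\n'
    '{"email": "user2@example.com", "response": "a) Yes", "__has_error__": false}\n'
)

# Parsing the whole text as one JSON document fails, because a second
# object follows the first one.
try:
    json.loads(raw)
    whole_text_is_json = True
except json.JSONDecodeError:
    whole_text_is_json = False

# Each non-empty line, however, is a complete JSON document on its own.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

print(whole_text_is_json)
print(len(records))
print("response" in records[0], "response" in records[1])
```

This is why a loader has to be told that the input is line-delimited instead of a single JSON value.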

Load JSON Data into a DataFrame

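We will use the pd.read_json function from pandas to load the data into a DataFrame. Since the file holds one record per line, we pass lines=True so that pandas parses each line as its own JSON object. Without it, pandas expects the whole file to be a single JSON document and typically raises a ValueError for a file like this one.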

import pandas as pd

# Load JSON data from file
df = pd.read_json('data.json', lines=True)

# Display the DataFrame
print(df)
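If you want to try the call without a file on disk, the following sketch may help. It assumes an in-memory sample (the names sample and df_sample and the two trimmed records are hypothetical) wrapped in io.StringIO in place of data.json. It also shows what happens when a record lacks a field that other records have.

```python
import io

import pandas as pd

# Two made-up records in JSON Lines form. The first one has no "response" field.
sample = (
    '{"email": "user1@example.com", "community": "Green Valley"}\n'
    '{"email": "user2@example.com", "community": "Blue Lake", "response": "a) Yes"}\n'
)

# read_json accepts a file-like object as well as a path.
df_sample = pd.read_json(io.StringIO(sample), lines=True)

# One row per line, one column per distinct field name.
print(df_sample.shape)  # (2, 3)

# Fields missing from a record show up as missing values in that row.
print(df_sample["response"].isna().tolist())  # [True, False]
```

The columns are the union of the field names across all records, so a metadata-only record like the first one in the sample file still becomes a row, just a mostly empty one.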

Inspect the DataFrame

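Once the data is loaded, it’s a good idea to inspect the DataFrame to understand its structure and contents. You can use head() to see the first rows, info() to list the columns with their dtypes and non-null counts, and describe() to get summary statistics, which by default cover only the numeric columns. Note that info() prints its report directly and returns None.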

# Display the first few rows
print(df.head())

# Display information about the DataFrame
# (info() prints its report itself, so no print() is needed)
df.info()

# Display summary statistics
print(df.describe())
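The sketch below walks through the same inspection steps on a small in-memory sample, so it can be run as is. The records and the names sample, df_sample, text_summary and key_df are made up for illustration. It also looks at the nested __key__ field, which pandas keeps as a column of dictionaries, and shows one possible way to expand it with pd.json_normalize. That last step is an optional extra, not something the loading itself requires.

```python
import io

import pandas as pd

# Made-up records shaped like the sample data (fields trimmed for brevity).
sample = (
    '{"community": "Green Valley", "response": "d) People do not migrate", '
    '"__key__": {"kind": "responses_2022", "name": "rec-1"}}\n'
    '{"community": "Blue Lake", "response": "a) Yes", '
    '"__key__": {"kind": "responses_2022", "name": "rec-2"}}\n'
)
df_sample = pd.read_json(io.StringIO(sample), lines=True)

# info() writes its report to stdout and returns None.
info_result = df_sample.info()

# With no numeric columns selected, describe() summarizes the text columns
# (count, unique, top, freq). The dict-valued __key__ column is left out
# because dictionaries are not hashable and cannot be counted this way.
text_summary = df_sample[["community", "response"]].describe()
print(text_summary)

# The nested JSON object arrives as one dict per row.
print(type(df_sample["__key__"].iloc[0]))

# Optional: expand the dicts into ordinary columns.
key_df = pd.json_normalize(df_sample["__key__"].tolist())
print(key_df)
```

If the expanded columns are useful, they can be joined back onto the main DataFrame by index, since key_df has one row per original record in the same order.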

Conclusion

Loading JSON data into a pandas DataFrame is a straightforward process, thanks to the pd.read_json function: for a file with one record per line, a single call with lines=True is enough. Once the data is in a DataFrame, you can inspect, filter, and analyze it with the usual pandas tools, whether it comes from survey responses, log files, or any other line-delimited JSON source.
