Geek Logbook

Tech sea log book

Matei Zaharia – Spark: The Definitive Guide. Common Operations

Define Schemas manually

When using Spark for production Extract, Transform, and Load (ETL), it is often a good idea to define your schemas manually, especially when working with untyped data sources like CSV and JSON, because schema inference can vary depending on the type of data that you read in. (Chambers, 2017, 66)

SQL Expressions