Geek Logbook

Tech sea log book

Matei Zaharia – Spark: The Definitive Guide. Common Operations

Define schemas manually: When using Spark for production Extract, Transform, and Load (ETL), it is often a good idea to define your schemas manually, especially when working with untyped data sources like CSV and JSON, because schema inference can vary depending on the type of data that you read in. (Chambers, 2017, 66) SQL Expressions
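To illustrate the schema advice quoted above, here is a minimal sketch in plain Python using only the standard library. It is not Spark's own API: the helpers `infer_type`, `inferred_schema`, and `read_with_schema`, and the two sample batches, are hypothetical names invented for this example. It only shows how a naive inference step can return different column types for two extracts of the same feed, while a manually defined schema stays stable.

```python
import csv
import io


def infer_type(values):
    """Naively guess a column type from the sample values it is given."""
    try:
        for value in values:
            int(value)
        return int
    except ValueError:
        return str


def inferred_schema(text):
    """Infer one type per column from whatever rows happen to be in the text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


def read_with_schema(text, schema):
    """Parse CSV text, casting each column with the explicitly given schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in reader]


# Two hypothetical extracts of the same feed (assumed data, for illustration only).
batch_a = "zip_code,count\n10001,3\n94105,7\n"
batch_b = "zip_code,count\nSW1A 1AA,2\n10001,5\n"

# Inference depends on the data read in: zip_code is int for one batch, str for the other.
schema_a = inferred_schema(batch_a)
schema_b = inferred_schema(batch_b)

# A manually defined schema yields the same column types for every batch.
MANUAL_SCHEMA = {"zip_code": str, "count": int}
rows_a = read_with_schema(batch_a, MANUAL_SCHEMA)
rows_b = read_with_schema(batch_b, MANUAL_SCHEMA)
```

In Spark itself the same idea is expressed by passing an explicit schema to the reader instead of relying on inference; the sketch above only mirrors the reasoning.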

Kleppmann – Designing Data-Intensive Applications

A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example, many applications need to:
• Store data so that they, or another application, can find it again later (databases)
• Remember the result of an expensive operation, to speed up reads (caches)
• Allow users to search data
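The cache building block, remembering the result of an expensive operation to speed up reads, can be sketched with Python's standard `functools.lru_cache`. This is a minimal in-process illustration only: the function `expensive_square` and the counter `call_count` are hypothetical names, and a real application would often use a separate cache service instead.

```python
from functools import lru_cache

# Counts how many times the underlying computation actually runs.
call_count = 0


@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Stand-in for a slow computation; the decorator stores results by argument."""
    global call_count
    call_count += 1
    return n * n


first = expensive_square(12)   # computed and stored in the cache
second = expensive_square(12)  # served from the cache, the function body is skipped
```

After both calls the function body has run only once, and `expensive_square.cache_info()` reports the hit.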

Testing in Python: Pytest Vs Unit test

How important are tests? Testing is one of the most important skills we need to develop once we join the industry. Yet knowledge of testing is not evaluated as much as it could be. In general, the challenges focus on programming logic or on knowing about design

Setting environments in Python

When we start a project in Python, we often make the beginner mistake of installing each tool just anywhere. However, as our knowledge grows and we look to improve what we do, we start thinking about good practices. One of them is the “virtual environment”. The official documentation says the following: A virtual environment

Empowerment for the new leaders in tech

Once a new hire is designated as team leader, one of the first challenges is how this new person can take ownership of the project and inspire the team members. Companies have been talking about empowerment in recent years, but I couldn’t see it

Agro Analytics Datasets

Looking for datasets to put some Agro Analytics knowledge into practice, I found some interesting challenges. There are a lot of courses about it, for example at Wageningen University (in fact, there are some exciting courses on @edXOnline). But I’ve had some problems getting valuable Agro Analytics datasets to work or play in

What is a bastion host?

Definition of bastion host: A bastion host is a specific computer on a network whose purpose is to withstand attacks from outside the network (for example, from the internet), so that such an attack does not affect other parts of the system.