Geek Logbook

Tech sea log book

Basic concepts about Amazon Redshift

One of the first things you learn in the course Getting Started with Amazon Redshift is the following:

Redshift is based on PostgreSQL, and there are four key concepts to understand about it:

Key concepts for working with Amazon Redshift

  • Massively parallel processing (MPP) – “Amazon Redshift distributes the rows of a table to the compute nodes so that data can be processed in parallel.”
  • Columnar Storage: “each data block stores values of a single column for multiple rows”
  • There are different ways to ingest data into Redshift:
    • AWS Glue
    • The COPY command
    • Third-party ETL tools
  • There are different ways to access the data in Redshift (data access):
    • Amazon Redshift Query Editor
    • ODBC, JDBC
    • Amazon QuickSight
    • Amazon Redshift Data API
    • Amazon Redshift RSQL
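
To make the first two concepts more concrete, here is a toy sketch in plain Python. It is not how Redshift is implemented: the rows, the column names, the two "nodes", and the CRC32 hash are all assumptions made up for illustration. It only shows the general idea of spreading rows across nodes by a key, storing each column's values together, and combining per-node partial results.

```python
from collections import defaultdict
from zlib import crc32

# Toy table: (order_id, customer, amount). All values are invented.
COLUMNS = ("order_id", "customer", "amount")
ROWS = [
    (1, "ana", 10.0),
    (2, "bob", 25.5),
    (3, "ana", 7.25),
    (4, "cho", 40.0),
    (5, "bob", 3.0),
    (6, "cho", 14.25),
]


def distribute(rows, num_nodes, key_index):
    """Assign each row to a node by hashing one column.

    The hashed column stands in for a distribution key; rows with the
    same key value always land on the same node.
    """
    nodes = defaultdict(list)
    for row in rows:
        node_id = crc32(str(row[key_index]).encode()) % num_nodes
        nodes[node_id].append(row)
    return nodes


def to_columnar(rows):
    """Group values by column, so each list holds one column for many rows."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def distributed_sum(nodes, column):
    """Sum one column per node, then add up the partial results.

    Here the nodes are processed one after another; in a real MPP system
    the per-node work would run concurrently.
    """
    partials = [sum(to_columnar(rows)[column]) for rows in nodes.values()]
    return sum(partials)


nodes = distribute(ROWS, num_nodes=2, key_index=1)
total = distributed_sum(nodes, "amount")
print(total)
```

Note that the sum only reads the "amount" list on each node and never touches "order_id" or "customer". That is the intuition behind why columnar storage suits analytic queries that aggregate a few columns over many rows.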

The importance of understanding the “Node Types”

Amazon Redshift offers two node types to choose from, depending on the required performance, data size, and growth.

  • Use DC2 nodes for compute-intensive data warehouses with local solid-state drive (SSD) storage included. DC2 nodes store your data locally for high performance, and as the data size grows, you can add more compute nodes to increase the storage capacity of the cluster. For datasets under 1 TB uncompressed, AWS recommends DC2 node types for the best performance at the lowest price.
  • With RA3, you choose the number of nodes based on your performance requirements and pay only for the managed storage that you use.

Can I resize the cluster? Yes, but read the guidelines first: Resizing clusters in Amazon Redshift