This and the next five notebooks, by Ivan Sadikov and Raazesh Sainudiin, elaborate on the [Spark SQL Programming Guide](http://spark.apache.org/docs/latest/sql-programming-guide.html).
# Spark SQL Programming Guide
- Overview
  - SQL
  - DataFrames
  - Datasets
- Getting Started
  - Starting Point: SQLContext
  - Creating DataFrames
  - DataFrame Operations
  - Running SQL Queries Programmatically
  - Creating Datasets
  - Interoperating with RDDs
    - Inferring the Schema Using Reflection
    - Programmatically Specifying the Schema
- Data Sources
  - Generic Load/Save Functions
    - Manually Specifying Options
    - Run SQL on files directly
    - Save Modes
    - Saving to Persistent Tables
  - Parquet Files
    - Loading Data Programmatically
    - Partition Discovery
    - Schema Merging
    - Hive metastore Parquet table conversion
      - Hive/Parquet Schema Reconciliation
      - Metadata Refreshing
    - Configuration
  - JSON Datasets
  - Hive Tables
    - Interacting with Different Versions of Hive Metastore
  - JDBC To Other Databases
  - Troubleshooting
- Performance Tuning
  - Caching Data In Memory
  - Other Configuration Options
- Distributed SQL Engine
  - Running the Thrift JDBC/ODBC server
  - Running the Spark SQL CLI
SDS-2.x, Scalable Data Engineering Science