// Databricks notebook source exported at Sun, 19 Jun 2016 02:52:45 UTC
Scalable Data Science
prepared by Raazesh Sainudiin and Sivanand Sivaram
This is an elaboration of the Apache Spark 1.6 sql-programming-guide.
Contributions to this 'databricksification' of the programming guide are most welcome. Please feel free to send pull requests, or simply fork and push, at https://github.com/raazesh-sainudiin/scalable-data-science.
Spark SQL Programming Guide
- Overview
  - SQL
  - DataFrames
  - Datasets
- Getting Started
  - Starting Point: SQLContext
  - Creating DataFrames
  - DataFrame Operations
  - Running SQL Queries Programmatically
  - Creating Datasets
  - Interoperating with RDDs
    - Inferring the Schema Using Reflection
    - Programmatically Specifying the Schema
- Data Sources
  - Generic Load/Save Functions
    - Manually Specifying Options
    - Run SQL on files directly
    - Save Modes
    - Saving to Persistent Tables
  - Parquet Files
    - Loading Data Programmatically
    - Partition Discovery
    - Schema Merging
    - Hive metastore Parquet table conversion
      - Hive/Parquet Schema Reconciliation
      - Metadata Refreshing
    - Configuration
  - JSON Datasets
  - Hive Tables
    - Interacting with Different Versions of Hive Metastore
  - JDBC To Other Databases
  - Troubleshooting
- Performance Tuning
  - Caching Data In Memory
  - Other Configuration Options
- Distributed SQL Engine
  - Running the Thrift JDBC/ODBC server
  - Running the Spark SQL CLI
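As a quick taste of where the guide begins, here is a minimal Scala sketch of the "Starting Point: SQLContext" step in the Spark 1.6 API. It assumes a `SparkContext` named `sc` is already available, as it is in a Databricks notebook, and uses the `people.json` sample file that ships with the Spark distribution; adjust the path for your environment.

```scala
import org.apache.spark.sql.SQLContext

// In Spark 1.6 the entry point into all Spark SQL functionality is SQLContext,
// built from an existing SparkContext (Databricks provides `sc` and `sqlContext`).
val sqlContext = new SQLContext(sc)

// Create a DataFrame from a JSON file and display its contents.
// The path below points to the sample file bundled with the Spark distribution.
val df = sqlContext.read.json("examples/src/main/resources/people.json")
df.show()
```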