from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType
# Create a SparkSession
spark = (SparkSession.builder
         .master('local[*]')                      # cluster location: local mode, using all cores
         .appName("Load and Query CSV with SQL")
         .getOrCreate())
# Define an explicit schema (an alternative to inferSchema below)
schema = StructType([
    StructField("col1", StringType()),
    StructField("col2", IntegerType()),
    StructField("col3", DoubleType())
])
# Load the CSV file into a DataFrame
# (pass schema=schema instead of inferSchema=True to use the explicit schema above)
df = spark.read.csv("file.csv", sep=',', header=True, inferSchema=True, nullValue='NA')
# Check column types
df.printSchema()
print(df.dtypes)  # list of (column name, type) tuples
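# Sketch (assumption): if type inference gets a column wrong, it can be
# cast explicitly; "col2" here is the hypothetical column from the schema above
from pyspark.sql.functions import col
df = df.withColumn("col2", col("col2").cast(IntegerType()))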
# Register the DataFrame as a temporary table or view
df.createOrReplaceTempView("my_table")
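# Note: a temp view is scoped to this SparkSession; for a view shared across
# sessions, use df.createOrReplaceGlobalTempView("my_table") instead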
# Print the tables in the catalog
print(spark.catalog.listTables())
# Run SQL queries on the DataFrame
query_result = spark.sql("SELECT * FROM my_table WHERE col1 = 'value'")
query_result.show()
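# A further illustrative query (sketch, using the hypothetical column names
# from the schema above): aggregate col3 grouped by col1
agg_result = spark.sql("SELECT col1, AVG(col3) AS avg_col3 FROM my_table GROUP BY col1")
agg_result.show()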
sc = spark.sparkContext  # Access the underlying SparkContext from the SparkSession
spark = SparkSession(sc)  # A SparkSession can also be constructed from an existing SparkContext
spark.stop()  # Stop the SparkSession (and its SparkContext) when done