How do I configure Jupyter Notebook to use PySpark?

Step 1: First, install Jupyter Notebook with the following commands:

pip install jupyter
pip install "ipython[notebook]"
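One quick way to confirm the installs above succeeded is to check that the packages are importable. This is a small sketch, not part of the original instructions:

```python
import importlib.util

def is_installed(package):
    # True if `package` is importable in the current environment --
    # a quick sanity check after running the pip commands above.
    return importlib.util.find_spec(package) is not None
```

For example, `is_installed("jupyter")` should return True once Step 1 has completed.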

Step 2: Then set up the required environment variables in ~/.bashrc:

export SPARK_HOME="/usr/hdp/current/spark-client"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
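These exports are what the startup script in Step 5 reads back through `os.environ`. A small sketch of that lookup, with illustrative fallback defaults (the actual values depend on your own installation):

```python
import os

def spark_env():
    # Read the Spark-related variables exported in ~/.bashrc above.
    # The fallbacks here are illustrative defaults, not required settings.
    spark_home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark-client")
    submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "--master local[2]")
    return spark_home, submit_args
```

Remember to run `source ~/.bashrc` (or open a new shell) so the exports are visible to Jupyter.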

Step 3: Then run the following command in your terminal to generate a Jupyter configuration file:

jupyter notebook --generate-config

Step 4: Now open the generated configuration file (by default ~/.jupyter/jupyter_notebook_config.py) with vim or any text editor and paste in the following lines:

c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8889
c.NotebookApp.notebook_dir = u'/usr/hdp/current/spark-client/'
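Jupyter executes this configuration file as plain Python against a config object named `c`. The sketch below mimics that with a SimpleNamespace just to show what the four lines above actually set:

```python
from types import SimpleNamespace

# Stand-in for Jupyter's config object; the real one is a traitlets
# Config instance, but attribute assignment works the same way.
c = SimpleNamespace(NotebookApp=SimpleNamespace())

config_lines = """
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8889
c.NotebookApp.notebook_dir = u'/usr/hdp/current/spark-client/'
"""
exec(config_lines)

print(c.NotebookApp.port)  # -> 8889, the port the notebook server listens on
```

Setting `ip = '*'` makes the server listen on all interfaces, so only do this on a trusted network.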

Step 5: Then create a file named “” in the “/root/.ipython/profile_default/” directory and paste in the following text:

import os
import sys

# Locate the Spark installation from the environment set up in Step 2.
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError("SPARK_HOME environment variable is not set")

# Make the PySpark packages importable.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/'))

# pyspark expects the submit args string to end with 'pyspark-shell'.
pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Run pyspark's startup script. Note: the path here should point to a
# Python file, and execfile is Python 2 only; on Python 3 use
# exec(open(path).read()) instead.
execfile(os.path.join(spark_home, 'python/pyspark/'))
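The PYSPARK_SUBMIT_ARGS handling in the startup file above can be sketched as a standalone function, which makes it easy to see that appending is idempotent:

```python
def ensure_pyspark_shell(submit_args):
    # Mirror the startup file's logic: pyspark requires the submit-args
    # string to end with 'pyspark-shell', so append it only if missing.
    if "pyspark-shell" not in submit_args:
        submit_args += " pyspark-shell"
    return submit_args
```

For example, `ensure_pyspark_shell("--master local[2]")` returns `"--master local[2] pyspark-shell"`, and calling it again on that result leaves the string unchanged.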

Step 6: Now run the “jupyter notebook” command in your terminal.

Enjoy using PySpark from Jupyter Notebook!