1. Install software
1. python 3.10
2. hadoop-3.3.4 (remember to add winutils.exe to its bin directory)
3. java-17
4. spark-3.5.1-bin-hadoop3
Install pyspark and Jupyter Notebook with pip:
pip install pyspark
pip install jupyter notebook
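To confirm the installs are visible to Python, a quick sanity check (the expected versions match the downloads above):

```python
import sys
import pyspark

print(sys.version)          # expect 3.10.x
print(pyspark.__version__)  # expect 3.5.1
```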
2. Set environment variables
- JAVA_HOME=C:\PySparkService\java-17
- HADOOP_HOME=C:\PySparkService\hadoop-3.3.4
- SPARK_HOME=C:\PySparkService\spark-3.5.1-bin-hadoop3
Append these to the Path variable:
- %JAVA_HOME%\bin
- %HADOOP_HOME%\bin
- %SPARK_HOME%\bin
PySpark will fail at runtime if the following variable is not set:
PYSPARK_PYTHON=python
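If you would rather not edit the system settings, the same variables can be set per-process from Python before pyspark launches the JVM. A minimal sketch, assuming the install paths listed above:

```python
import os

# Assumes the install locations from the list above; adjust to your machine.
os.environ["JAVA_HOME"] = r"C:\PySparkService\java-17"
os.environ["HADOOP_HOME"] = r"C:\PySparkService\hadoop-3.3.4"
os.environ["SPARK_HOME"] = r"C:\PySparkService\spark-3.5.1-bin-hadoop3"
os.environ["PYSPARK_PYTHON"] = "python"

# winutils.exe and the launcher scripts are found via PATH,
# so prepend the three bin directories as well.
os.environ["PATH"] = os.pathsep.join([
    os.path.join(os.environ["JAVA_HOME"], "bin"),
    os.path.join(os.environ["HADOOP_HOME"], "bin"),
    os.path.join(os.environ["SPARK_HOME"], "bin"),
    os.environ["PATH"],
])
```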
# Launching pyspark in a Jupyter notebook
# If you installed Jupyter Notebook yourself via pip, use:
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
# Under Anaconda it may instead be:
PYSPARK_DRIVER_PYTHON=ipython
PYSPARK_DRIVER_PYTHON_OPTS=notebook
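With those variables set, running pyspark from cmd opens a notebook rather than the plain shell. A minimal cell to verify the session works (the sample DataFrame is made up for illustration):

```python
from pyspark.sql import SparkSession

# When launched via pyspark, a `spark` session may already exist;
# getOrCreate() simply returns it in that case.
spark = SparkSession.builder.appName("notebook-smoke-test").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()  # show() is the call that fails later if PYSPARK_PYTHON is missing
```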
Launching pyspark from the cmd command line
Run pyspark in a cmd window; on a successful start you get the Spark banner and an interactive >>> prompt.
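Inside the shell, `spark` and `sc` are already defined, so one line is enough to exercise a real job:

```python
# At the >>> prompt of the pyspark shell; `spark` is predefined.
spark.range(5).show()
```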
PYSPARK_PYTHON=python
If this variable is missing, the shell starts but jobs fail with the error below:
Py4JJavaError: An error occurred while calling o56.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
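The stage failure happens because the worker processes cannot locate a Python interpreter. Besides setting the variable system-wide, one workaround is to set it from the driver before any session exists; a minimal sketch, assuming the workers should reuse the driver's interpreter:

```python
import os
import sys

# Must run before the SparkSession / SparkContext is created:
# point the worker processes at the driver's own interpreter.
os.environ["PYSPARK_PYTHON"] = sys.executable
```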