```python
from pyspark import SparkConf, SparkContext

# Run locally with the application name "My App"
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

logFile = "file:///usr/local/spark/README.md"
# Read the file as an RDD with 2 partitions and cache it,
# since it is scanned twice below
logData = sc.textFile(logFile, 2).cache()

numAs = logData.filter(lambda line: 'a' in line).count()
numBs = logData.filter(lambda line: 'b' in line).count()
print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
```
Save this as a .py file and run it directly from the command line with python **.py.
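Conceptually, the two actions above just count lines that contain a given character. A plain-Python sketch of the same computation, using a few made-up sample lines instead of the Spark README:

```python
# Plain-Python equivalent of the two filter/count actions above,
# run on made-up sample lines rather than a real file.
lines = [
    "Apache Spark",
    "is a fast and general engine",
    "for big data",
]

num_as = sum(1 for line in lines if 'a' in line)  # lines containing 'a'
num_bs = sum(1 for line in lines if 'b' in line)  # lines containing 'b'

print('Lines with a: %s, Lines with b: %s' % (num_as, num_bs))
```

The Spark version does the same thing, but distributes the filter over the RDD's partitions and only computes when `count()` forces the action.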
Error:
ModuleNotFoundError: No module named 'pyspark'
Cause: a path problem, since the plain python interpreter cannot find the pyspark package bundled with Spark.
Fix:
export SPARK_HOME=/usr/local/spark
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
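What these exports accomplish can be seen from inside Python: PYTHONPATH entries are prepended to `sys.path`, which is where `import pyspark` looks. A minimal sketch, assuming the /usr/local/spark install location used throughout this post:

```python
import os
import sys

# Sketch of what the exports above achieve: putting Spark's bundled
# Python package onto the interpreter's module search path.
# /usr/local/spark is the install location assumed in this post.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
pyspark_path = os.path.join(spark_home, "python")

# Prepending mirrors PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
if pyspark_path not in sys.path:
    sys.path.insert(0, pyspark_path)

# From here on, `import pyspark` resolves, provided Spark is
# actually installed under spark_home.
```

Setting the variables in the shell (or in ~/.bashrc) achieves the same effect for every interpreter you launch, without touching the script.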

Error:
ModuleNotFoundError: No module named 'py4j'
Fix:
Look in the Spark installation directory (/usr/local/spark/python/lib/) for the py4j version that ships with your Spark, then add that zip archive to PYTHONPATH:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip:$PYTHONPATH
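Since the py4j version in the zip filename changes between Spark releases, the path can also be discovered instead of hardcoded. A small hypothetical helper (`find_py4j_zip` is not part of Spark, just an illustration) that globs for the bundled archive:

```python
import glob
import os

def find_py4j_zip(spark_home):
    """Return the py4j source zip bundled under <spark_home>/python/lib.

    A hypothetical helper, not part of Spark: it globs for
    py4j-*-src.zip so the version number never has to be
    hardcoded into PYTHONPATH by hand.
    """
    matches = glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    return matches[0] if matches else None
```

The returned path can then be appended to `sys.path` (or echoed into a PYTHONPATH export), and the same script keeps working after a Spark upgrade changes the py4j version.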
