1. Read the JSON data (Python)
sparkDF = spark.read.json("/FileStore/shared_uploads/2020_10_01_9_json.gz")
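By default, Spark's JSON reader expects one JSON object per line (JSON Lines), and it decompresses `.gz` files transparently based on the extension. To make that file layout concrete, here is a minimal plain-Python sketch (standard library only, not Spark) that writes a tiny gzipped JSON Lines file and reads it back. The sample records, the file name `sample_json.gz`, and the helper names are all made up for illustration; the real file in this post may contain many more fields.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample events; only the fields queried later in this post are included.
SAMPLE_EVENTS = [
    {
        "type": "PushEvent",
        "actor": {"login": "alice"},
        "repo": {"name": "alice/demo"},
        "created_at": "2020-10-01T09:00:00Z",
    },
    {
        "type": "WatchEvent",
        "actor": {"login": "bob"},
        "repo": {"name": "bob/tools"},
        "created_at": "2020-10-01T09:00:01Z",
    },
]


def write_json_lines_gz(path, records):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_json_lines_gz(path):
    """Read a gzip-compressed JSON Lines file into a list of dicts, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def round_trip_demo():
    """Write the sample events to a temporary .gz file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample_json.gz")
        write_json_lines_gz(path, SAMPLE_EVENTS)
        return read_json_lines_gz(path)


if __name__ == "__main__":
    for event in round_trip_demo():
        print(event["type"], event["actor"]["login"])
```

If a file instead holds a single JSON document spread over several lines, the Spark reader needs its `multiLine` option enabled; the sketch above only covers the one-object-per-line case.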
2. Print the data schema
sparkDF.printSchema()
3. Display the data
display(sparkDF)
4. Create a temporary view over the data
%sql
Create temporary view json_table
using json
options (path "/FileStore/shared_uploads/2020_10_01_9_json.gz")
5. Query the data
%sql
select type as Event_Type, actor as Actor, repo as Repository, created_at as Date_Time from json_table limit 10;
6. Query specific fields inside nested JSON columns
%sql
select type as Event_Type, actor.login as Handle, repo.name as Repository, created_at as Date_Time from json_table limit 10;
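The dotted paths in this query (`actor.login`, `repo.name`) reach into nested JSON objects. As a rough illustration of what that projection does to a single record, here is a plain-Python sketch (not Spark). The helper names `get_path` and `project_event` and the sample record are hypothetical; returning `None` for a missing key is an assumption chosen to resemble how a missing value shows up as null in a query result.

```python
# Output column alias -> dotted path into the record, mirroring the SQL select list.
COLUMNS = {
    "Event_Type": "type",
    "Handle": "actor.login",
    "Repository": "repo.name",
    "Date_Time": "created_at",
}


def get_path(record, dotted_path):
    """Follow a dotted path such as 'actor.login'; return None if any key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def project_event(record):
    """Build a flat dict with one entry per aliased column."""
    return {alias: get_path(record, path) for alias, path in COLUMNS.items()}


if __name__ == "__main__":
    # Hypothetical record shaped like the fields used in the query.
    sample = {
        "type": "PushEvent",
        "actor": {"login": "alice", "id": 1},
        "repo": {"name": "alice/demo"},
        "created_at": "2020-10-01T09:00:00Z",
    }
    print(project_event(sample))
```

Keeping the alias-to-path mapping in one dictionary makes it easy to add or rename columns without touching the traversal logic.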