《Hive实战—通过指定经纬度点找出周围的POI列表》要点:
本文介绍了Hive实战—通过指定经纬度点找出周围的POI列表,希望对您有用。如果有疑问,可以联系我们。
维易PHP培训学院每天发布《Hive实战—通过指定经纬度点找出周围的POI列表》等实战技能,PHP、MYSQL、LINUX、APP、JS,CSS全面培养人才。
Hive实战-通过指定经纬度点找出周围的POI列表
需求描述:
有60多个店铺,按经纬度圈出周围1公里内所有POI列表
输入:
店铺1:116.123324,39.343313
店铺2:116.34232423,40.3423423
店铺3:116.4231,40.2343
店铺4:116.2342,39.3433
......
店铺65:.......
输出结果:
excel表,表头包含:店铺id、POI名称、一级分类、二级分类、经纬度
数据描述:
aoipoi表
主要字段:name, category_id, lng, lat
name:aoi或poi名称
type:name的类型(aoi或poi)
category_id:分类id.存储格式为“一级分类id|二级分类id,一级分类id|二级分类id,一级分类id|二级分类id”,可为空字符
lng:经度
lat:纬度
mst_dict表:
主要字段:id, name
id:分类的id
name:分类的名称
思路:
分别得到每个店铺周围1公里内的数据,然后合并在一起,再把分类id转化为中文存入HDFS文件.
(把id转化放在后面可以减少需要转化的id)
Hive SQL:
# 添加临时函数:指定点经纬度1km内的数据(下面的UDF在以后讲UDF/UDAF/UDTF时一并贴出)
add jar /home/zyl/hive-udf-1.0-SNAPSHOT.jar;
create temporary function dis_lnglat as 'com.zyl.udf.CalculatedLnglatDistance';
# 查询:店铺ID、POI名称、一级分类名称、二级分类名称、经度、纬度
with q1 as
(
select name, category_id, lng, lat from zyl.aoipoi a where a.type='poi'
)
insert overwrite directory '/user/zyl/tmp_cpy_tab/20171017/lnglat'
select /*+mapjoin(b)*/concat_ws("\t", a.id, poiname, one_cate, coalesce(name, ''), lng, lat) line from (
select /*+mapjoin(b)*/a.id, poiname, coalesce(name, '') one_cate, two_cate, lng, lat from (
select name poiname, coalesce(split(category_id2,'\\|')[0], '') one_cate, coalesce(split(category_id2,'\\|')[1], '') two_cate, lng, lat, id from (
select name, category_id, lng, lat, '1' id from q1 where dis_lnglat(lng,lat,116.123324,39.343313)<=1000
union all
select name, category_id, lng, lat, '2' id from q1 where dis_lnglat(lng,lat,116.34232423,40.3423423)<=1000
union all
select name, category_id, lng, lat, '3' id from q1 where dis_lnglat(lng,lat,116.4231,40.2343)<=1000
union all
select name, category_id, lng, lat, '4' id from q1 where dis_lnglat(lng,lat,116.2342,39.3433)<=1000
) a LATERAL VIEW explode(split(category_id,'\\,')) myTable1 AS category_id2
) a left join zyl.mst_dict b on (a.one_cate = b.id)
) a left join zyl.mst_dict b on (a.two_cate = b.id)
group by a.id, poiname, one_cate, name, lng, lat
讲解
函数:
coalesce()
split()
explode()
非空查找函数: COALESCE
语法: COALESCE(T v1, T v2,…)
返回值: T
说明: 返回参数中的第一个非空值;如果所有值都为NULL,那么返回NULL
举例:
select COALESCE(null,'is null','not null');
is null
select COALESCE(null, 0, 1);
0
select COALESCE('abc', 0, 1);
abc
分割字符串函数: SPLIT
语法: split(string str, stringpat)
返回值: array
说明:按照pat字符串分割str,会返回分割后的字符串数组
举例:
select split('abtcdtef','t');
["ab","cd","ef"]
select split('ab,cd,ef','\\,');
["ab","cd","ef"]
行拆列函数:EXPLODE
explode(ARRAY) 列表中的每个元素生成一行.在本篇文章中用到了split之后为数组(ARRAY)交给explode处理.
explode(MAP) map中每个key-value对,生成一行,key为一列,value为一列
限制:
1、No other expressions are allowed in SELECT
SELECT pageid, explode(adid_list) AS myCol... is not supported
2、UDTF's can't be nested
SELECT explode(explode(adid_list)) AS myCol... is not supported
3、GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
下面语法/功能在以后再讲:
with q1 as table
union all
insert overwrite directory
LATERAL VIEW ... myTable AS ... alias
mapjoin
left join
转义(\\|、\\,)
注册函数