ITPub博客

首页 > Linux操作系统 > Linux操作系统 > ITPUB SQL大赛第三期

ITPUB SQL大赛第三期

原创 Linux操作系统 作者:yangtingkun 时间:2011-04-06 23:55:10 0 删除 编辑

SQL大赛第三期解法。

第三期题目参考:http://www.itpub.net/thread-1408182-1-1.html

版主newkid点评参考:http://www.itpub.net/thread-1415335-1-1.html

 

 

根据评委的点评,第三次答题中存在一些考虑不周的地方,不过这并不妨碍分享优化的思路。至于是哪里出了问题,由于个人原因,暂时实在没有时间深究了,等过一段时间空闲下来,再仔细研究一下。

还是仿照前两期,贴出SQL以及优化的过程:

SQL> WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
  2  (
  3     SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
  4     FROM ROUTES
  5     UNION ALL
  6     SELECT CITY1, CITY2, LINES || '-' || CITY1 || CITY2, DISTANCE + DIS
  7     FROM ROUTES R, ROUTE_ALL_S A
  8     WHERE A.C2 = R.CITY1
  9  ),
 10  ROUTE_S AS
 11  (
 12     SELECT SUBSTR(LINES, 1, 1) R, SUBSTR(LINES, LENGTH(LINES)) T, MIN(DIS) DIS
 13     FROM ROUTE_ALL_S
 14     GROUP BY SUBSTR(LINES, 1, 1), SUBSTR(LINES, LENGTH(LINES))
 15  ),
 16  ROUTE_D AS
 17  (
 18     SELECT R, T, DIS
 19     FROM ROUTE_S
 20     UNION ALL
 21     SELECT T, R, DIS
 22     FROM ROUTE_S
 23  ),
 24  ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS
 25  (
 26     SELECT R, T, R || T, DIS
 27     FROM ROUTE_D
 28     UNION ALL
 29     SELECT SUBSTR(LINES, 1, 1), T, LINES || '-' || R || T, DIS + DISTANCE
 30     FROM ROUTE_D R, ROUTE_ALL_D A
 31     WHERE A.C2 = R.R
 32     AND INSTR(LINES, R || T) = 0
 33     AND INSTR(LINES, T || R) = 0
 34     AND C1 != T
 35     AND INSTR(LINES, R, 1, 2) = 0
 36     AND INSTR(LINES, T, 1, 2) = 0
 37     AND DISTANCE + DIS <= NVL((SELECT DIS FROM ROUTE_D RS WHERE C1 = RS.R AND R.T = RS.T), 9.9E38)
 38  ),
 39  RESULT AS
 40  (
 41     SELECT C1 R, C2 T,
 42             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1, C2) COST,
 43             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1) COST_CITY
 44     FROM ROUTE_ALL_D R, CITIES C
 45     WHERE R.C2 = C.CITY_NAME(+)
 46     GROUP BY C1, C2
 47  )
 48  SELECT R, NVL(T, 'TOTAL') T, NVL(SUM(COST), 0) COST
 49  FROM RESULT
 50  WHERE COST_CITY = (SELECT MIN(COST_CITY) FROM RESULT)
 51  GROUP BY GROUPING SETS ((R, T), R)
 52  ORDER BY R, DECODE(T, 'TOTAL', '0', T);

R          T                COST
---------- ---------- ----------
D          TOTAL           68356
D          A                3200
D          B                3224
D          C                3634
D          E                7300
D          F                7598
D          G                3840
D          H               14580
D          I                4400
D          J               12352
D          K                8228
D          L                   0

已选择12行。

已用时间:  00: 00: 03.61

这是原始SQL语句,首先构造出当前路径中两点的最短距离,然后利用UNION ALL获取所有路径,再次利用递归WITH获取所有可能的路径,并去除路径中重复的情况。最后计算起点和终点间最短的路径,并根据人数计算出每个城市的费用,进而算出哪个城市举办花费最少。

SQL> WITH ROUTE_D AS
  2  (
  3     SELECT CITY1 R, CITY2 T, DISTANCE DIS
  4     FROM ROUTES
  5     UNION ALL
  6     SELECT CITY2, CITY1, DISTANCE
  7     FROM ROUTES
  8  ),
  9  ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS
 10  (
 11     SELECT R, T, R || T, DIS
 12     FROM ROUTE_D
 13     UNION ALL
 14     SELECT A.C1, T, LINES || R || T, DIS + DISTANCE
 15     FROM ROUTE_D R, ROUTE_ALL_D A
 16     WHERE A.C2 = R.R
 17     AND C1 != T
 18     AND INSTR(LINES, R, 1, 2) = 0
 19     AND INSTR(LINES, T, 1, 2) = 0
 20     AND DISTANCE + DIS <= NVL((SELECT DIS FROM ROUTE_D RS WHERE A.C1 = RS.R AND R.T = RS.T), 9.9E38)
 21  ),
 22  RESULT AS
 23  (
 24     SELECT C1 R, C2 T,
 25             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1, C2) COST,
 26             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1) COST_CITY
 27     FROM ROUTE_ALL_D R, CITIES C
 28     WHERE R.C2 = C.CITY_NAME(+)
 29     GROUP BY R.C1, R.C2
 30  )
 31  SELECT R, NVL(T, 'TOTAL') T, NVL(SUM(COST), 0) COST
 32  FROM RESULT
 33  WHERE COST_CITY = (SELECT MIN(COST_CITY) FROM RESULT)
 34  GROUP BY GROUPING SETS ((R, T), R)
 35  ORDER BY R, DECODE(T, 'TOTAL', '0', T);

R          T                COST
---------- ---------- ----------
D          TOTAL           68356
D          A                3200
D          B                3224
D          C                3634
D          E                7300
D          F                7598
D          G                3840
D          H               14580
D          I                4400
D          J               12352
D          K                8228
D          L                   0

已选择12行。

已用时间:  00: 00: 00.09

随后优化的时候发现,原始SQL中第一步获取两个城市间最多距离的步骤完全没有必要,而且使得获取全部路径后数据量增大,使得查询性能下降。

这里直接将ROUTES表通过UNION ALL获取全部路径,然后利用递归WITH语句来获取所有可能的路径,除了添加判断路径中出现过的城市不在重复出现外,还添加判断,如果当前的两点间距离已经大于ROUTES表中保存的两点间距离,则路径被过滤,如果当前的路径在ROUTES表中找不到,那么路径保留。

优化后对于测试数据仅需要不到0.1秒。

SQL> WITH ROUTE_D AS
  2  (
  3     SELECT CITY1 R, CITY2 T, DISTANCE DIS
  4     FROM ROUTES
  5     UNION ALL
  6     SELECT CITY2, CITY1, DISTANCE
  7     FROM ROUTES
  8  ),
  9  ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS
 10  (
 11     SELECT R, T, CAST('"' || R || '"' || T || '"' AS VARCHAR2(4000)), DIS
 12     FROM ROUTE_D
 13     UNION ALL
 14     SELECT A.C1, T, LINES || T || '"', DIS + DISTANCE
 15     FROM ROUTE_D R, ROUTE_ALL_D A
 16     WHERE A.C2 = R.R
 17     AND INSTR(LINES, '"' || T || '"', 1, 1) = 0
 18     AND DISTANCE + DIS <= NVL((SELECT DISTANCE FROM ROUTES RS WHERE (A.C1 = RS.CITY1 AND R.T = RS.CITY2) OR (A.C1 = RS.CITY2 AND R.T = RS.CITY1)), 9.9E38)
 19  ),
 20  RESULT AS
 21  (
 22     SELECT C1 R, C2 T,
 23             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1, C2) COST,
 24             SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1) COST_CITY
 25     FROM ROUTE_ALL_D R, CITIES C
 26     WHERE R.C2 = C.CITY_NAME(+)
 27     GROUP BY R.C1, R.C2
 28  )
 29  SELECT R, NVL(T, 'TOTAL') T, NVL(SUM(COST), 0) COST
 30  FROM RESULT
 31  WHERE COST_CITY = (SELECT MIN(COST_CITY) FROM RESULT)
 32  GROUP BY GROUPING SETS ((R, T), R)
 33  ORDER BY R, DECODE(T, 'TOTAL', CHR(0), T);

R          T                COST
---------- ---------- ----------
D          TOTAL           68356
D          A                3200
D          B                3224
D          C                3634
D          E                7300
D          F                7598
D          G                3840
D          H               14580
D          I                4400
D          J               12352
D          K                8228
D          L                   0

已选择12行。

已用时间:  00: 00: 00.07

对过滤条件进行了优化,去掉了一些不需要的过滤条件,在关联ROUTES表时,不再关联UNION ALL后的结果,而是通过OR条件进行过滤,这使得效率又提到了近1/5。而且这里还考虑城市名称互相包含的情况。

SQL> WITH ROUTE_D AS /* get full routes */
  2  (
  3     SELECT CITY1 R, CITY2 T, DISTANCE DIS
  4     FROM ROUTES
  5     UNION ALL
  6     SELECT CITY2, CITY1, DISTANCE
  7     FROM ROUTES
  8  ),
  9  ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS /* get every route of any two cities */
 10  (
 11     SELECT R, T, CAST('"' || R || '"' || T || '"' AS VARCHAR2(4000)), DIS /* avoid ora-1489 error */
 12     FROM ROUTE_D
 13     WHERE R != (SELECT MAX(CITY_NAME) FROM CITIES)
 14     UNION ALL
 15     SELECT A.C1, T, LINES || T || '"', DIS + DISTANCE /* '"' for avoid city name contains other city name */
 16     FROM ROUTE_D R, ROUTE_ALL_D A
 17     WHERE A.C2 = R.R
 18     AND INSTR(LINES, '"' || T || '"', 1, 1) = 0 /* avoit duplicate city */
 19     AND DISTANCE + DIS <= /* filter the distance longer than routes */
 20             NVL
 21             (
 22                     (
 23                             SELECT DISTANCE
 24                             FROM ROUTES RS
 25                             WHERE (A.C1 = RS.CITY1
 26                                     AND R.T = RS.CITY2)
 27                             OR (A.C1 = RS.CITY2
 28                                     AND R.T = RS.CITY1)
 29                     ),
 30                     9.9E38
 31             )
 32  ),
 33  RESULT_HALF AS
 34  (
 35     SELECT C1 R, C2 T, MIN(DISTANCE) DIS
 36     FROM ROUTE_ALL_D
 37     WHERE C1 < C2
 38     GROUP BY C1, C2
 39  ),
 40  RESULT_ALL AS
 41  (
 42     SELECT R, T, DIS
 43     FROM RESULT_HALF
 44     UNION ALL
 45     SELECT T, R, DIS
 46     FROM RESULT_HALF
 47  ),
 48  RESULT AS
 49  (
 50     SELECT R, T,
 51             DIS * 2 * C.MEMBERS COST,
 52             SUM(DIS * 2 * C.MEMBERS) OVER(PARTITION BY R) COST_CITY
 53     FROM RESULT_ALL R, CITIES C
 54     WHERE R.T = C.CITY_NAME(+) /* any city in route can win even the city have no member */
 55  )
 56  SELECT R, NVL(T, 'TOTAL') T, NVL(SUM(COST), 0) COST
 57  FROM
 58  (
 59     SELECT R, T, COST, RANK() OVER(ORDER BY COST_CITY) RN
 60     FROM RESULT
 61  )
 62  WHERE RN = 1
 63  GROUP BY GROUPING SETS ((R, T), R)
 64  ORDER BY R, DECODE(T, 'TOTAL', CHR(0), T);

R          T                COST
---------- ---------- ----------
D          TOTAL           68356
D          A                3200
D          B                3224
D          C                3634
D          E                7300
D          F                7598
D          G                3840
D          H               14580
D          I                4400
D          J               12352
D          K                8228
D          L                   0

已选择12行。

已用时间:  00: 00: 00.06

这是最后提交的版本,优化主要是考虑到ABBA的最终距离是等价的,因此最后可以先计算CITY1小于CITY2的情况,然后通过UNION ALL获取全部的结果。而且这种方法使得在递归WITH构造路径的时候,不再需要构造起点为名称最大的城市,因为这个城市为起点的所有路径,均可以从其他城市到这个城市的路径获得。

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-691816/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
暂无介绍

注册时间:2007-12-29

  • 博文量
    1955
  • 访问量
    10525202