微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

如何在Google BigQuery中进行转换

假设我将以下查询发送到BQ:

SELECT shipmentID, category, quantity
FROM [myDataset.myTable]

此外,假设查询返回如下数据:

shipmentID  category  quantity
1           shoes     5
1           hats      3
2           shirts    1
2           hats      2
3           toys      3
2           books     1
3           shirts    1

如何从BQ中调整结果以产生输出,如下所示:

 shipmentID   shoes  hats  shirts  toys  books
 1            5      3     0       0     0
 2            0      2     1       0     1
 3            0      0     1       3     0

作为一些额外的背景,我实际上有2000个需要转动的类别,数据量是这样的,我不能直接通过Python中的Pandas DataFrame(使用所有内存,然后慢速爬行).我尝试使用关系数据库,但遇到了列限制,所以我希望能够直接在BQ中执行它,即使我必须通过python构建查询本身.有什么建议?

**编辑1
我应该提一下,转动数据本身可以在块中完成,因此不是问题.真正的麻烦在于之后尝试进行聚合,因此每个shipmentID只有一行.这就是吃掉所有RAM的原因.

**编辑2
在尝试下面接受的答案后,我发现尝试使用它来创建2k列数据透视表导致“资源超出”错误.我的BQ团队能够重构查询以将其分解为更小的块并允许它通过.查询的基本结构如下:

SELECT
  SetA.*,
  SetB.*,
  SetC.*
FROM (
  SELECT
    shipmentID,
    SUM(IF (category="Rocks", qty, 0)),
    SUM(IF (category="Paper", qty, 0)),
    SUM(IF (category="Scissors", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetA
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Jello Molds", quantity, 0)),
    SUM(IF (category="Torque Wrenches", quantity, 0))
  FROM (
    SELECT
      a.shipmentID shipmentID,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetB
ON
  SetA.shipmentid = SetB.shipmentid
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Deep Thoughts", qty, 0)),
    SUM(IF (category="Rainbows", qty, 0)),
    SUM(IF (category="Ponies", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetC
ON
  SetB.shipmentID = SetC.shipmentID

通过一个一个添加INNER JOIN EACH段,可以无限期地继续上述模式.对于我的应用程序,BQ能够处理每个块大约500列.

解决方法:

这是一种方法

select shipmentID,
  sum(IF (category='shoes', quantity, 0)) AS shoes,
  sum(IF (category='hats', quantity, 0)) AS hats,
  sum(IF (category='shirts', quantity, 0)) AS shirts,
  sum(IF (category='toys', quantity, 0)) AS toys,
  sum(IF (category='books', quantity, 0)) AS books,
from
  (select 1 as shipmentID,           'shoes' as category,    5 as quantity),
  (select 1 as shipmentID,           'hats' as category,      3 as quantity),
  (select 2 as shipmentID,           'shirts' as category,    1 as quantity),
  (select 2 as shipmentID,           'hats' as category,      2 as quantity),
  (select 3 as shipmentID,           'toys' as category,      3 as quantity),
  (select 2 as shipmentID,           'books' as category,     1 as quantity),
  (select 3 as shipmentID,           'shirts' as category,    1 as quantity),
group by shipmentID

返回:

+-----+------------+-------+------+--------+------+-------+---+
| Row | shipmentID | shoes | hats | shirts | toys | books |   |
+-----+------------+-------+------+--------+------+-------+---+
|   1 |          1 |     5 |    3 |      0 |    0 |     0 |   |
|   2 |          2 |     0 |    2 |      1 |    0 |     1 |   |
|   3 |          3 |     0 |    0 |      1 |    3 |     0 |   |
+-----+------------+-------+------+--------+------+-------+---+

请参阅其他pivot table example的手册.

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 [email protected] 举报,一经查实,本站将立刻删除。

相关推荐