Overall execution flow diagram:
VPC parameters for DLA by region (including Shanghai):
DLA Region | Availability zone | VPC id | VSwitch id |
---|---|---|---|
China East 1 (Hangzhou) | cn-hangzhou-g | vpc-bp1g66t4f0onrvbht2et5 | vsw-bp1nh5ri8di2q7tkof474 |
China East 2 (Shanghai) | cn-shanghai-d | vpc-uf6wxkgst74es59wqareb | vsw-uf6m7k4fcq3pgd0yjfdnm |
China North 2 (Beijing) | cn-beijing-g | vpc-2zeawsrpzbelyjko7i0ir | vsw-2zea8ct4hy4hwsrcpd52d |
China South 1 (Shenzhen) | cn-shenzhen-a | vpc-wz9622zx341dy24ozifn3 | vsw-wz91ov6gj2i4u2kenpe42 |
China North 3 (Zhangjiakou) | cn-zhangjiakou-a | vpc-8vbpi1t7c0devxwfe19sn | vsw-8vbjl32xkft0ewggef6g9 |
Singapore | ap-southeast-a | vpc-t4n3sczhu5efvwo1gsupf | vsw-t4npcrmzzk64r13e3nhhm |
UK (London) | eu-west-1a | vpc-d7ovzdful8490upm8b413 | vsw-d7opmgixr2h34r1975s8a |
Create a VPC network connection for DLA in AnalyticDB. Note: connect with the MySQL command-line client through AnalyticDB's classic-network endpoint, then run:
alter database txk_cldsj set zone_id='xxx' vpc_id='xxx' vswitch_id='xxx';
Here, "zone_id", "vpc_id" and "vswitch_id" take the availability zone, VPC id and VSwitch id of DLA in the same region, as listed in the table above.
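As a minimal sketch of how the statement can be filled in from the table, the Python helper below looks up a region's values and formats the `alter database` command. The dictionary keys (`cn-shanghai`, `cn-hangzhou`) and the function name `build_alter_database` are assumptions made for illustration; only the zone, VPC and VSwitch values come from the table.

```python
# Values copied from the table above; the region keys are illustrative labels.
DLA_VPC_PARAMS = {
    "cn-hangzhou": ("cn-hangzhou-g", "vpc-bp1g66t4f0onrvbht2et5", "vsw-bp1nh5ri8di2q7tkof474"),
    "cn-shanghai": ("cn-shanghai-d", "vpc-uf6wxkgst74es59wqareb", "vsw-uf6m7k4fcq3pgd0yjfdnm"),
}


def build_alter_database(database, region):
    """Return the alter database statement for the given AnalyticDB database.

    Raises KeyError if the region is not present in DLA_VPC_PARAMS.
    """
    zone_id, vpc_id, vswitch_id = DLA_VPC_PARAMS[region]
    return (
        f"alter database {database} set zone_id='{zone_id}' "
        f"vpc_id='{vpc_id}' vswitch_id='{vswitch_id}';"
    )


if __name__ == "__main__":
    # txk_cldsj is the database name used in the statement above.
    print(build_alter_database("txk_cldsj", "cn-shanghai"))
```

The AnalyticDB database and the DLA service must be in the same region, so the lookup key should match the region where the database lives.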
After the command succeeds, refresh the DMS for AnalyticDB console page; a VPC URL should now be visible.
For details on creating tables in AnalyticDB, see: https://help.aliyun.com/document_detail/26403.html
-- For example:
-- The target table is a real-time dimension table:
CREATE DIMENSION TABLE etl_ads_db.etl_ads_dimension_table (
  col1 INT,
  col2 STRING,
  col3 INT,
  col4 STRING,
  primary key (col1)
)
options (updateType='realtime');

-- The target table is a real-time partitioned table:
CREATE TABLE etl_ads_db.etl_ads_partition_table (
  col1 INT,
  col2 INT,
  col3 INT,
  col4 INT,
  col5 DOUBLE,
  col6 DOUBLE,
  col7 DOUBLE,
  primary key (col1, col2, col3, col4)
)
PARTITION BY HASH KEY(col1)
PARTITION NUM 32
TABLEGROUP xxx_group
options (updateType='realtime');
In this case, the table-creation statements are relatively simple.
The following parameters must be specified:
-- Target AnalyticDB
LOCATION = 'jdbc:mysql://etl_ads_db-e85fbfe8-vpc.cn-shanghai-1.ads.aliyuncs.com:10001/etl_ads_db'
-- Username for accessing the target AnalyticDB
USER='xxx'
-- Password for accessing the target AnalyticDB
PASSWORD='xxx'
CREATE SCHEMA `etl_dla_schema` WITH DBPROPERTIES (
  CATALOG = 'ads',
  LOCATION = 'jdbc:mysql://etl_ads_db-e85fbfe8-vpc.cn-shanghai-1.ads.aliyuncs.com:10001/etl_ads_db',
  USER='xxx',
  PASSWORD='xxx'
);

USE etl_dla_schema;

CREATE EXTERNAL TABLE etl_ads_dimension_table (
  col1 INT,
  col2 VARCHAR(200),
  col3 INT,
  col4 VARCHAR(200),
  primary key (col1)
);

CREATE EXTERNAL TABLE etl_ads_partition_table (
  col1 INT,
  col2 INT,
  col3 INT,
  col4 INT,
  col5 DOUBLE,
  col6 DOUBLE,
  col7 DOUBLE,
  primary key (col1, col2, col3, col4)
);
CREATE SCHEMA oss_data_schema WITH DBPROPERTIES (
  LOCATION = 'oss://my_bucket/',
  catalog='oss'
);

-- Switch to the new schema so the tables below are created in it.
USE oss_data_schema;

CREATE EXTERNAL TABLE IF NOT EXISTS dla_table_1 (
  col_1 INT,
  col_2 VARCHAR(200),
  col_3 INT,
  col_4 VARCHAR(200)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://my_bucket/oss_table_1';

CREATE EXTERNAL TABLE IF NOT EXISTS dla_table_2 (
  col_1 INT,
  col_2 INT,
  col_3 INT,
  col_4 INT,
  col_5 DOUBLE,
  col_6 DOUBLE,
  col_7 DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://my_bucket/oss_table_2';
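INSERT FROM SELECT is usually a long-running task, so running it asynchronously is recommended:
Note: when using the MySQL command-line client, pass the -c option when connecting. It preserves comments, so the hint placed before the statement reaches the server: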
mysql -hxxx -Pxxx -uxxx -pxxx db_name -c
Example:
-- Full data load from OSS into AnalyticDB
/*+run-async=true*/
INSERT INTO etl_dla_schema.etl_ads_dimension_table
SELECT * FROM oss_data_schema.dla_table_1;

-- Data load from OSS into AnalyticDB, with filtering applied to the OSS data
/*+run-async=true*/
INSERT INTO etl_dla_schema.etl_ads_partition_table (col1, col2, col3, col7)
SELECT col_1, col_2, col_3, col_7
FROM oss_data_schema.dla_table_2
WHERE col_1 > 1000
LIMIT 10000;
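When the statements are generated by a script rather than typed by hand, the hint can be prepended programmatically. The sketch below is one way to do that in Python; the helper name `with_async_hint` is hypothetical, and it only formats text, leaving submission to whatever MySQL client is in use.

```python
# The hint text is taken verbatim from the example above.
ASYNC_HINT = "/*+run-async=true*/"


def with_async_hint(statement):
    """Prefix a SQL statement with the async hint unless it already has one."""
    stripped = statement.strip()
    if stripped.startswith(ASYNC_HINT):
        return stripped
    return f"{ASYNC_HINT} {stripped}"


if __name__ == "__main__":
    print(with_async_hint("INSERT INTO target_table SELECT * FROM source_table;"))
```

A client that strips comments before sending would drop the hint, which is why the command-line client needs the -c option.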
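Note:
If you run the statement in the DMS for Data Lake Analytics console (https://datalakeanalytics.console.aliyun.com/), select "asynchronous execution".
You can then click "Refresh" under "Execution History" to check the task's status.
Running an INSERT FROM SELECT statement asynchronously returns a task id. Use this id to poll the task's progress; when status is "SUCCESS", the task has completed: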
SHOW query_task WHERE id = '26c6b18b_1532588796832'
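The polling described above can be sketched as a small loop. The Python function below is an illustration under assumptions: `fetch_status` is a hypothetical callable that the caller supplies to run the query through a MySQL client and return the status column as a string, and the interval and retry limit are arbitrary defaults. Only the "SUCCESS" status is confirmed by the text, so any other value is treated as "not finished yet".

```python
import time


def wait_for_task(task_id, fetch_status, interval_seconds=5.0, max_polls=120,
                  sleep=time.sleep):
    """Poll an async task until it reports SUCCESS or the poll limit is hit.

    fetch_status(query) is supplied by the caller (hypothetical); it should run
    the query and return the task's status as a string.
    Returns True on SUCCESS, False if max_polls is exhausted.
    """
    query = f"SHOW query_task WHERE id = '{task_id}'"
    for _ in range(max_polls):
        if fetch_status(query) == "SUCCESS":
            return True
        sleep(interval_seconds)
    return False


if __name__ == "__main__":
    # Stand-in status source for demonstration; "RUNNING" is a placeholder value.
    fake_statuses = iter(["RUNNING", "SUCCESS"])
    done = wait_for_task("26c6b18b_1532588796832",
                         lambda query: next(fake_statuses),
                         sleep=lambda seconds: None)
    print(done)
```

A real script would likely also stop early on a failure status; the status names for that case are not given here, so the sketch leaves them out.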
Author: julian.zhou
This article is original content from the Yunqi Community and may not be reproduced without permission.