With Sqoop installed, this post walks through the basic operations. First, list the databases on the MySQL server:
sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password Oracle123

Then list the tables in a specific database (here, the hive metastore database):
sqoop list-tables \
--connect jdbc:mysql://localhost:3306/hive \
--username root \
--password Oracle123

Next, import a table from MySQL into HDFS:
[hadoop@hd1 conf]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password Oracle123 \
> --table sqoop1 \
> --delete-target-dir \
> -m 1

The first attempt failed immediately:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
	at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
	at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Apparently a jar is missing; the fix is to put a json jar on Sqoop's classpath and re-run the import.
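A minimal sketch of that fix, assuming the jar providing org.json.JSONObject was downloaded as java-json.jar (the file name and download path here are illustrative):

# Drop the jar into Sqoop's lib directory so it lands on the job classpath
cp /home/hadoop/downloads/java-json.jar $SQOOP_HOME/lib/

The re-run got past the classpath problem but then hit an HDFS replication error: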
40899430803_0003/libjars/parquet-hadoop-1.5.0-cdh5.7.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
The error says the job's libjars could not be replicated to even the minimum of one replica, so either the DataNodes are down or the NameNode cannot talk to them.
There are generally two fixes:

1. Check the firewall settings and network connectivity between the DataNodes and the NameNode, and confirm the DataNode process is actually running on each DN host.
2. If the NameNode has been reformatted, its version/cluster ID may no longer match what the DataNodes recorded; in that case the IDs have to be brought back in sync by formatting the NameNode again.
Here the firewall between the NameNode and the DataNodes was shut down (commands sketched below) and the import re-run.
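A minimal sketch of the checks from step 1, assuming CentOS 7-style hosts (service names differ on other distros):

# On every node: stop and disable the firewall (or open the HDFS ports instead)
systemctl stop firewalld
systemctl disable firewalld

# On each DataNode: confirm the DataNode JVM is up
jps | grep -i datanode

# From any node: verify all DataNodes have registered with the NameNode
hdfs dfsadmin -report

The re-run then failed with a different error: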
430803_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1540918072803 found 1540908325730
Note: System times on machines may be out of sync. Check system time and time zones.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
. Failing the application.
18/10/30 21:55:26 INFO mapreduce.Job: Counters: 0
18/10/30 21:55:26 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 211.1476 seconds (0 bytes/sec)
18/10/30 21:55:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/10/30 21:55:26 ERROR tool.ImportTool: Error during import: Import job failed!
The message points at clock skew between the cluster nodes (note the two timestamps differ by almost three hours), which makes the container token look expired. Set the server time on every node:
date -s "2018-10-30 18:22:10" && hwclock --systohc
hwclock --show
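Setting clocks by hand drifts again over time; a more durable approach is to sync every node against the same NTP source (a sketch, assuming ntpdate is installed and the hosts can reach a public pool):

# One-shot sync on each node, then persist to the hardware clock
ntpdate pool.ntp.org && hwclock --systohc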
After fixing the clocks, the import was re-run:
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
	at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
This looked like a JDBC driver problem, but swapping in a different driver produced the same error, so I decided to switch to another MySQL version (mysql-5.6.23).
[root@hd1 mysql]# vi /etc/my.cnf
[mysqld]
wait_timeout=86400
basedir=/home/mysql/mysql-5.6.23
datadir=/home/mysql/mysql-5.6.23/data
socket=/tmp/mysql.sock
log_error=/home/mysql/mysql-5.6.23/mysql.err
user=mysql
[mysql]
socket=/tmp/mysql.sock
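After editing my.cnf, MySQL has to be restarted to pick up the change (a sketch using the init script bundled in this install's support-files directory):

# Restart MySQL so the new my.cnf takes effect
/home/mysql/mysql-5.6.23/support-files/mysql.server restart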
[mysql@hd1 support-files]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
ERROR 1046 (3D000): No database selected
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)
The import still failed with the same error. After much head-scratching, I replaced localhost in the connect string with the concrete IP address:

--connect jdbc:mysql://192.168.83.11:3306/sqoop

and the job finally succeeded. In hindsight this makes sense: the map tasks run on the worker nodes, where localhost resolves to the worker itself rather than to the host running MySQL.
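Connecting by IP also requires MySQL to accept remote connections for that user; a sketch of the grant to verify or add (the user, password, and database here mirror this setup, adjust as needed):

# On the MySQL host: allow root to connect from the other cluster nodes
mysql -uroot -pOracle123 -e "GRANT ALL PRIVILEGES ON sqoop.* TO 'root'@'%' IDENTIFIED BY 'Oracle123'; FLUSH PRIVILEGES;"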
18/10/31 13:22:16 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=137094
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=4
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5935
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5935
		Total vcore-seconds taken by all map tasks=5935
		Total megabyte-seconds taken by all map tasks=6077440
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=139
		CPU time spent (ms)=990
		Physical memory (bytes) snapshot=107266048
		Virtual memory (bytes) snapshot=2714398720
		Total committed heap usage (bytes)=22544384
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=4
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Transferred 4 bytes in 24.3133 seconds (0.1645 bytes/sec)
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Retrieved 1 records.
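To double-check the result, the imported file can be read back from HDFS (a sketch; with no --target-dir given, the import lands under the user's HDFS home in a directory named after the table):

# One record, 4 bytes, as the counters reported
hdfs dfs -ls sqoop1
hdfs dfs -cat 'sqoop1/part-m-*'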
But why does this happen? I found the following post on the official site:
You get a ConnectionRefused Exception when there is a machine at the address specified, but there is no program listening on the specific TCP port the client is using -and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.
Unless there is a configuration error at either end, a common cause for this is the Hadoop service isn't running.
This stack trace is very common when the cluster is being shut down -because at that point Hadoop services are being torn down across the cluster, which is visible to those services and applications which haven't been shut down themselves. Seeing this error message during cluster shutdown is not anything to worry about.
If the application or cluster is not working, and this message appears in the log, then it is more serious.
The exception text declares both the hostname and the port to which the connection failed. The port can be used to identify the service. For example, port 9000 is the HDFS port. Consult the Ambari port reference, and/or those of the supplier of your Hadoop management tools.
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
Check the port the client is trying to talk to using matches that the server is offering a service on. The netstat command is useful there.
On the server, try a telnet localhost <port> to see if the port is open there.
On the client, try a telnet <server> <port> to see if the port is accessible remotely.
Please do not file bug reports related to your problem, as they will be closed as Invalid.
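In practice that checklist boils down to a few commands (a sketch; substitute your own host and port):

# On the server: is anything listening on the port?
netstat -tlnp | grep 3306

# On the server: is the port open locally?
telnet localhost 3306

# On the client: is the port reachable remotely?
telnet 192.168.83.11 3306

# Make sure the hostname is not mapped to a loopback address
grep -E '127\.0\.0\.1|127\.0\.1\.1' /etc/hosts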
Next, create a Hive table from a MySQL table definition:

sqoop create-hive-table \
--connect jdbc:mysql://hd1:3306/hive \
--table TBLS \
--username root \
--password Oracle123 \
--hive-table TBLS
18/10/31 17:16:02 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/10/31 17:16:02 ERROR tool.CreateHiveTableTool: Encountered IOException running create table job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.CreateHiveTableTool.run(CreateHiveTableTool.java:58)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:259)
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
	... 11 more
Fix: add HADOOP_CLASSPATH to the user's environment variables so Sqoop can find the Hive classes:
export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
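To make the variable stick across shells, it can go into the user's profile (a sketch; the HIVE_HOME path below is an assumption, use your actual install location):

# e.g. in the hadoop user's ~/.bash_profile
export HIVE_HOME=/home/hadoop/hive          # assumed install path, adjust
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
source ~/.bash_profile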