Java Network Programming
package java.net.*
Network Protocol Stack
Socket
Definition:
A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to.
An endpoint is a combination of an IP address and a port number. Every TCP connection is uniquely identified by its two endpoints, which is why several connections can exist between the same host and server at the same time.
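To make the endpoint idea concrete, here is a minimal sketch that opens two connections to the same listening port on the loopback interface and prints both endpoints of each. The class name EndpointDemo and its helper methods are illustrative assumptions, not part of any library; the local ports are chosen by the operating system, so the output varies between runs.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EndpointDemo {
    // Formats one endpoint as "ip:port" (hypothetical helper).
    static String endpoint(String ip, int port) {
        return ip + ":" + port;
    }

    // Describes a connection by its local and remote endpoints.
    static String describe(Socket s) {
        return endpoint(s.getLocalAddress().getHostAddress(), s.getLocalPort())
                + " -> " + endpoint(s.getInetAddress().getHostAddress(), s.getPort());
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the backlog of 50 is arbitrary.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket first = new Socket(loopback, listener.getLocalPort());
             Socket second = new Socket(loopback, listener.getLocalPort())) {
            // Same remote endpoint, different local ports: two distinct connections.
            System.out.println(describe(first));
            System.out.println(describe(second));
        }
    }
}
```

Both lines share the remote endpoint but differ in the local port, which is what lets TCP tell the two connections apart.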
Code example:
GreetingClient.java
import java.net.*;
import java.io.*;

public class GreetingClient {
    public static void main(String[] args) {
        String serverName = args[0];
        int port = Integer.parseInt(args[1]);
        try {
            System.out.println("Connecting to " + serverName
                    + " on port " + port);
            Socket client = new Socket(serverName, port);
            System.out.println("Just connected to "
                    + client.getRemoteSocketAddress());
            OutputStream outToServer = client.getOutputStream();
            DataOutputStream out = new DataOutputStream(outToServer);
            out.writeUTF("Hello from "
                    + client.getLocalSocketAddress());
            InputStream inFromServer = client.getInputStream();
            DataInputStream in =
                    new DataInputStream(inFromServer);
            System.out.println("Server says " + in.readUTF());
            client.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
GreetingServer.java (extends the Thread class so the server runs on its own thread)
import java.net.*;
import java.io.*;

public class GreetingServer extends Thread {
    private ServerSocket serverSocket;

    public GreetingServer(int port) throws IOException {
        serverSocket = new ServerSocket(port);
        serverSocket.setSoTimeout(10000);
    }

    public void run() {
        while (true) {
            try {
                System.out.println("Waiting for client on port "
                        + serverSocket.getLocalPort() + " ...");
                Socket server = serverSocket.accept();
                System.out.println("Just connected to "
                        + server.getRemoteSocketAddress());
                DataInputStream in = new DataInputStream(server.getInputStream());
                System.out.println(in.readUTF());
                DataOutputStream out =
                        new DataOutputStream(server.getOutputStream());
                out.writeUTF("Thank you for connecting to "
                        + server.getLocalSocketAddress() + "\nGoodbye!");
                server.close();
            } catch (SocketTimeoutException s) {
                System.out.println("Socket timed out!");
                break;
            } catch (IOException e) {
                e.printStackTrace();
                break;
            }
        }
    }

    public static void main(String[] args) {
        int port = Integer.parseInt(args[0]);
        try {
            Thread t = new GreetingServer(port);
            t.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Another example: test a network port (returns true if the port is open)
// Requires: java.io.IOException, java.net.InetSocketAddress, java.net.Socket
public static boolean testInet(String site, int port) {
    Socket sock = new Socket();
    int timeout = 3000; // ms
    InetSocketAddress addr = new InetSocketAddress(site, port);
    try {
        // Succeeds only if something accepts the connection within the timeout
        sock.connect(addr, timeout);
        return true;
    } catch (IOException e) {
        return false;
    } finally {
        try {
            sock.close();
        } catch (IOException e) {
            // nothing useful to do if close fails
        }
    }
}
Payload [wikipedia]
The payload is the part of the transmitted data that carries the actual intended message.
The term is borrowed from transportation, where "payload" refers to the part of the load that pays for transportation.
Example
Below is a piece of JSON data:
{
  "data": {
    "message": "Hello, world!"
  }
}
The string "Hello, world!" is the payload; everything else is protocol overhead.
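The split between payload and overhead can be measured in bytes. The sketch below assumes the JSON is serialized compactly (no whitespace) and encoded as UTF-8; the class name PayloadOverhead is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class PayloadOverhead {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Assumption: compact serialization of the JSON document above.
        String message = "{\"data\":{\"message\":\"Hello, world!\"}}";
        String payload = "Hello, world!";
        int total = utf8Length(message);
        int useful = utf8Length(payload);
        // prints "total=36 payload=13 overhead=23"
        System.out.println("total=" + total + " payload=" + useful
                + " overhead=" + (total - useful));
    }
}
```

In this tiny message the overhead is larger than the payload; with bigger payloads the ratio improves.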
Ethernet
IP (Internet Protocol) [rfc791]
IP provides upper-layer protocols with a stateless, connectionless, unreliable service.
Transport Layer
Ports are defined at the transport layer.
UDP [rfc768]
Format
------

     0      7 8     15 16    23 24    31
    +--------+--------+--------+--------+
    |     Source      |   Destination   |
    |      Port       |      Port       |
    +--------+--------+--------+--------+
    |                 |                 |
    |     Length      |    Checksum     |
    +--------+--------+--------+--------+
    |
    |          data octets ...
    +---------------- ...

         User Datagram Header Format
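The four 16-bit fields of the header can be read directly from the first eight bytes of a datagram. This is a minimal sketch, assuming the raw bytes are already available (the standard DatagramSocket API does not expose the header); the class UdpHeader and the sample bytes are made up for illustration.

```java
import java.nio.ByteBuffer;

public class UdpHeader {
    final int sourcePort;
    final int destinationPort;
    final int length;    // header plus data, in octets
    final int checksum;

    UdpHeader(int sourcePort, int destinationPort, int length, int checksum) {
        this.sourcePort = sourcePort;
        this.destinationPort = destinationPort;
        this.length = length;
        this.checksum = checksum;
    }

    // Reads the four unsigned 16-bit fields from the start of a datagram.
    static UdpHeader parse(byte[] datagram) {
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.wrap(datagram);
        int src = buf.getShort() & 0xFFFF;
        int dst = buf.getShort() & 0xFFFF;
        int len = buf.getShort() & 0xFFFF;
        int sum = buf.getShort() & 0xFFFF;
        return new UdpHeader(src, dst, len, sum);
    }

    public static void main(String[] args) {
        // Hypothetical datagram: port 49152 -> port 53, 8 header octets + 4 data octets.
        byte[] sample = {(byte) 0xC0, 0x00, 0x00, 0x35, 0x00, 0x0C, 0x00, 0x00,
                'p', 'i', 'n', 'g'};
        UdpHeader h = parse(sample);
        // prints "49152 -> 53, length 12"
        System.out.println(h.sourcePort + " -> " + h.destinationPort
                + ", length " + h.length);
    }
}
```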
TCP [rfc793]
TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.
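As a rough illustration of how the fields in the diagram map to bytes, the sketch below decodes the Data Offset field and the six control bits from a raw header. It follows the RFC 793 layout shown above (byte 12 holds the data offset in its high four bits, byte 13 holds the flags in its low six bits); the class TcpFlags and the sample header are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class TcpFlags {
    // Control bits of byte 13, listed from the lowest bit upward.
    static final String[] NAMES = {"FIN", "SYN", "RST", "PSH", "ACK", "URG"};

    // Header length in bytes: Data Offset counts 32-bit words.
    static int headerLength(byte[] header) {
        return ((header[12] >> 4) & 0x0F) * 4;
    }

    // Names of the control bits that are set in this header.
    static List<String> flags(byte[] header) {
        int bits = header[13] & 0x3F;
        List<String> set = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((bits & (1 << i)) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // Hypothetical 20-byte header without options, SYN and ACK set.
        byte[] header = new byte[20];
        header[12] = 0x50; // data offset = 5 words
        header[13] = 0x12; // 0x10 (ACK) | 0x02 (SYN)
        // prints "20 bytes, flags [SYN, ACK]"
        System.out.println(headerLength(header) + " bytes, flags " + flags(header));
    }
}
```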
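TCP three-way handshake: establishing a connection
A brief explanation: the client sends a SYN, the server replies with a SYN+ACK, and the client completes the handshake with an ACK.
TCP four-way handshake: closing a connection
A brief explanation: the two directions of a connection are independent, so closing each direction takes two steps (a FIN and its ACK).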
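The independence of the two directions is visible from Java through Socket.shutdownOutput(), which closes only the sending side. The sketch below is an illustrative loopback exchange (class and method names are assumptions): the client finishes sending, yet it can still read the server's reply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HalfCloseDemo {
    // Reads until the peer closes its sending direction (read returns -1).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Sends the request to a throwaway loopback server and returns its reply.
    static String exchange(String request) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try (Socket peer = listener.accept()) {
                    // Returns only after the client has shut down its output.
                    String received = readAll(peer.getInputStream());
                    peer.getOutputStream().write(
                            ("echo:" + received).getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            server.start();
            try (Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // closes client -> server only
                String reply = readAll(client.getInputStream()); // server -> client still open
                server.join();
                return reply;
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exchange("ping")); // prints "echo:ping"
    }
}
```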
HTTP
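HTTP is a request/response standard between a client (the user) and a server (the website), normally carried over TCP. Using a web browser, a web crawler, or another tool, the client sends an HTTP request to a given port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Several intermediaries, such as proxies, gateways, or tunnels, may sit between the user agent and the origin server.
Although TCP/IP is the most widely used protocol suite on the Internet, HTTP does not require it or the layers it supports. In fact, HTTP can be implemented on top of any Internet protocol or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers this guarantee can be used. That is why, within the TCP/IP suite, HTTP uses TCP as its transport layer.
Typically, an HTTP client initiates a request by opening a TCP connection to a given port on the server (port 80 by default). The HTTP server listens on that port for client requests. Once it receives a request, the server returns a status line, such as "HTTP/1.1 200 OK", together with content such as the requested file, an error message, or other information.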
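The exchange described here can be reproduced with a plain Socket, without any HTTP library. The following is a minimal sketch, assuming a server that speaks plain HTTP on port 80; the class RawHttpGet and its buildRequest helper are illustrative names, and the host is taken from the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHttpGet {
    // Builds a minimal HTTP/1.1 GET request; every header line ends with CRLF.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java RawHttpGet <host>");
            return;
        }
        String host = args[0];
        // Open the TCP connection to the default HTTP port.
        try (Socket socket = new Socket(host, 80)) {
            socket.getOutputStream().write(
                    buildRequest(host, "/").getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            // The first line of the response is the status line.
            System.out.println(reader.readLine());
        }
    }
}
```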
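HTTP status codes
The first digit of the status code indicates the class of the response: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error.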
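Since the class is carried by the first digit, integer division by 100 is enough to classify a three-digit code. A small sketch (the class name StatusClass is illustrative; the labels follow the five classes defined by the HTTP specification):

```java
public class StatusClass {
    // Maps a three-digit status code to its response class via the first digit.
    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(200 + " " + describe(200)); // prints "200 Success"
        System.out.println(404 + " " + describe(404)); // prints "404 Client Error"
    }
}
```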
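An HTTP response consists of three parts:
Status code: describes the status of the response. It can be used to check whether the request completed successfully and, when the request failed, to find the cause of the failure. If a Servlet does not set a status code, the success code HttpServletResponse.SC_OK is returned by default.
HTTP headers: they carry additional information about the response. For example, a header can specify the date after which the response is considered stale, or the encoding used to transfer the entity body safely to the user.
Body: contains the content of the response, which can be HTML code, images, and so on. The body consists of the data bytes that immediately follow the headers in the HTTP message.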
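To show where the three parts sit in the raw message, here is a simplified sketch that splits a response held in a String. It assumes a text body and CRLF line endings, and ignores details such as repeated headers and chunked encoding; the class RawHttpResponse is a made-up name for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RawHttpResponse {
    final int statusCode;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    RawHttpResponse(String raw) {
        // An empty line (CRLF CRLF) separates the head from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        // Status line: HTTP-version SP status-code SP reason-phrase
        statusCode = Integer.parseInt(lines[0].split(" ", 3)[1]);
        // Remaining lines of the head are "Name: value" headers.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        RawHttpResponse r = new RawHttpResponse(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello");
        // prints "200 {Content-Type=text/html, Content-Length=5} hello"
        System.out.println(r.statusCode + " " + r.headers + " " + r.body);
    }
}
```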
Code example: use HttpURLConnection to POST data to a web server
// Requires: java.io.*, java.net.HttpURLConnection, java.net.URL,
// java.nio.charset.StandardCharsets
public static String executePost(String targetURL, String urlParameters) {
    HttpURLConnection connection = null;

    try {
        // Encode the body once so Content-Length matches the bytes actually sent
        byte[] postData = urlParameters.getBytes(StandardCharsets.UTF_8);

        // Create connection
        URL url = new URL(targetURL);
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");

        connection.setRequestProperty("Content-Length",
                Integer.toString(postData.length));
        connection.setRequestProperty("Content-Language", "en-US");

        connection.setUseCaches(false);
        connection.setDoOutput(true);

        // Send request
        DataOutputStream wr = new DataOutputStream(
                connection.getOutputStream());
        wr.write(postData);
        wr.close();

        // Get response
        InputStream is = connection.getInputStream();
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        StringBuilder response = new StringBuilder(); // use StringBuffer before Java 5
        String line;
        while ((line = rd.readLine()) != null) {
            response.append(line);
            response.append('\r');
        }
        rd.close();
        return response.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    } finally {
        if (connection != null) {
            connection.disconnect();
        }
    }
}
A Simple Web Crawler
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>artificerPi</groupId>
    <artifactId>WebCrawler</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.8.3</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.25</version>
        </dependency>
    </dependencies>
</project>
DDL
create database crawler;

use crawler;

CREATE TABLE IF NOT EXISTS `Record` (
  `RecordID` INT(11) NOT NULL AUTO_INCREMENT,
  `URL` text NOT NULL,
  PRIMARY KEY (`RecordID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
DB.java
/**
 * Created by artificerPi on 2016/4/7.
 */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DB {

    public Connection conn = null;

    public DB() {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            String url = "jdbc:mysql://localhost:3306/Crawler";
            conn = DriverManager.getConnection(url, "root", "passw0rd");
            System.out.println("conn built");
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Runs a query that returns rows (SELECT)
    public ResultSet runSql(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.executeQuery(sql);
    }

    // Runs any other statement (INSERT, TRUNCATE, ...)
    public boolean runSql2(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.execute(sql);
    }

    @Override
    protected void finalize() throws Throwable {
        // && rather than ||: avoids a NullPointerException when conn is null
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }
}
Main.java
/**
 * Created by artificerPi on 2016/4/7.
 */

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;


public class Main {
    public static DB db = new DB();

    public static void main(String[] args) throws SQLException, IOException {
        db.runSql2("TRUNCATE Record;");
        processPage("http://www.mit.edu");
    }

    public static void processPage(String URL) throws SQLException, IOException {
        // check if the given URL is already in the database
        // (parameterized so that quotes in the URL cannot break the query)
        PreparedStatement check = db.conn.prepareStatement("select * from Record where URL = ?");
        check.setString(1, URL);
        ResultSet rs = check.executeQuery();
        if (rs.next()) {
            // already visited: nothing to do
        } else {
            // store the URL in the database to avoid parsing it again
            String sql = "INSERT INTO `Crawler`.`Record` " + "(`URL`) VALUES " + "(?);";
            PreparedStatement stmt = db.conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
            stmt.setString(1, URL);
            stmt.execute();

            // fetch the page being processed (not a fixed URL)
            Document doc = Jsoup.connect(URL).get();

            // query param: research
            if (doc.text().contains("research")) {
                System.out.println(URL);
            }

            // get all links and recursively call the processPage method
            Elements questions = doc.select("a[href]");
            for (Element link : questions) {
                if (link.attr("href").contains("mit.edu"))
                    processPage(link.attr("abs:href"));
            }
        }
    }
}
The crawler uses jsoup (a Java HTML parser); see the official documentation or IBM developerWorks for reference.
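The crawler decides whether to follow a link with a substring test on "mit.edu", which also matches URLs that merely mention that text, for instance in a query string. A stricter variant, shown here only as a sketch, compares the host part of the URL instead. The class DomainFilter and the method inDomain are hypothetical names and are not part of the crawler above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DomainFilter {
    // True if the URL's host is the domain itself or one of its subdomains.
    static boolean inDomain(String url, String domain) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false; // relative link or no authority part
            }
            host = host.toLowerCase();
            return host.equals(domain) || host.endsWith("." + domain);
        } catch (URISyntaxException e) {
            return false; // malformed links are simply skipped
        }
    }

    public static void main(String[] args) {
        System.out.println(inDomain("http://web.mit.edu/research", "mit.edu"));     // prints "true"
        System.out.println(inDomain("http://example.com/?ref=mit.edu", "mit.edu")); // prints "false"
    }
}
```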
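How resumable downloads work:
HTTP Code 206
206 Partial Content: the server has successfully fulfilled a partial GET request. HTTP download tools such as FlashGet or Xunlei (Thunder) use this response to resume interrupted downloads, or to split a large document into several segments that are downloaded in parallel.
The request must include a Range header indicating the range the client wants, and may include an If-Range header to make the request conditional.
The response must include the following header fields:
Content-Range, indicating the range returned in this response; for a multi-part download with Content-Type multipart/byteranges, each part should include its own Content-Range field indicating the range of that part. If the response includes Content-Length, its value must match the actual number of bytes in the returned range.
Date
ETag and/or Content-Location, if the same request would have returned them in a 200 response.
Expires, Cache-Control, and/or Vary, if their values might differ from those sent in earlier responses for the same variant.
If the request used If-Range with a strong cache validator, the response should not include other entity headers; if it used If-Range with a weak cache validator, the response must not include other entity headers. This avoids inconsistencies between cached entity bodies and updated entity headers. Otherwise, the response should include all the entity headers that a 200 response to the same request would have returned.
If the ETag or Last-Modified headers do not match exactly, a client cache must not combine the content of the 206 response with any previously cached content.
A cache that does not support the Range and Content-Range headers must not cache the content of a 206 response.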
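A client that receives a 206 response typically inspects Content-Range to learn which bytes it was given. The sketch below parses the common single-range form "bytes first-last/total"; the class ContentRange is a hypothetical helper, and the header value in main is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentRange {
    // Matches e.g. "bytes 0-499/1234"; the total may be "*" when unknown.
    private static final Pattern FORMAT =
            Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    final long first;
    final long last;
    final long total; // -1 when the server sent "*"

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    static ContentRange parse(String headerValue) {
        Matcher m = FORMAT.matcher(headerValue.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + headerValue);
        }
        long total = m.group(3).equals("*") ? -1 : Long.parseLong(m.group(3));
        return new ContentRange(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), total);
    }

    // Number of bytes in the returned range; both bounds are inclusive.
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ContentRange r = parse("bytes 2000070-2999999/3000000");
        // prints "999930 of 3000000"
        System.out.println(r.length() + " of " + r.total);
    }
}
```

The value of length() is what Content-Length has to match in a single-range 206 response.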
Demo:
Set up the request: send Range: bytes=2000070-
URL url = new URL("http://www.sjtu.edu.cn/down.zip");
HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();

// Set the User-Agent
httpConnection.setRequestProperty("User-Agent", "NetFox");
// Set the start position for resuming: from byte 2000070 to the end of the file
// (the trailing "-" is required by the byte-range syntax)
httpConnection.setRequestProperty("Range", "bytes=2000070-");
// Get the input stream
InputStream input = httpConnection.getInputStream();
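Continue writing to the file from the appropriate position
RandomAccessFile oSavedFile = new RandomAccessFile("down.zip", "rw");
long nPos = 2000070;
// Move the file pointer to position nPos
oSavedFile.seek(nPos);
byte[] b = new byte[1024];
int nRead;
// Read bytes from the input stream and write them to the file
while ((nRead = input.read(b, 0, 1024)) > 0) {
    oSavedFile.write(b, 0, nRead);
}
oSavedFile.close();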
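The demo hard-codes the offset 2000070. In practice the resume position is often taken from the size of the partially downloaded file. A minimal sketch under that assumption (the class ResumeOffset, its methods, and the file name down.zip.part are hypothetical):

```java
import java.io.File;

public class ResumeOffset {
    // Range header value requesting everything from `offset` to the end of the resource.
    static String rangeFrom(long offset) {
        return "bytes=" + offset + "-";
    }

    // Resume where the partial file ends; start from 0 if there is no such file.
    static long resumePosition(File partial) {
        return partial.isFile() ? partial.length() : 0L;
    }

    public static void main(String[] args) {
        File partial = new File("down.zip.part"); // hypothetical file name
        long offset = resumePosition(partial);
        System.out.println("Range: " + rangeFrom(offset));
    }
}
```

This only works if the partial file contains exactly the bytes received so far, and the server may still answer with 200 and the full content, so the status code should be checked before seeking.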
Session & Cookie
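A cookie is a piece of information that a web server sends to the browser. The browser stores the cookies of each web server in a local file; afterwards, whenever it sends a request to that server, it also sends all the cookies stored for it. The differences between a session and a cookie are listed below:
A session should work regardless of how the client browser is configured. The client can choose to disable cookies, but sessions still work because the client cannot disable server-side sessions.
They also differ in what they can store: a session can hold arbitrary Java objects, whereas a cookie can only hold String values.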
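The String-only nature of cookies can be seen with the JDK's java.net.HttpCookie class, whose value is always a String. The sketch below parses a made-up Set-Cookie header; the class CookieDemo and the header contents are illustrative.

```java
import java.net.HttpCookie;
import java.util.List;

public class CookieDemo {
    // Parses a Set-Cookie header line and returns the first cookie in it.
    static HttpCookie parseFirst(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical header, as a server might send it to carry a session id.
        HttpCookie cookie = parseFirst("Set-Cookie: JSESSIONID=abc123; Path=/; HttpOnly");
        String name = cookie.getName();   // cookie names are Strings
        String value = cookie.getValue(); // and so are cookie values
        System.out.println(name + "=" + value + " path=" + cookie.getPath());
    }
}
```

A server-side session, by contrast, lives in the server's memory, and the cookie typically carries only an identifier for it, as the JSESSIONID name in the example suggests.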
References:
https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html
https://zh.wikipedia.org/wiki/%E8%B6%85%E6%96%87%E6%9C%AC%E4%BC%A0%E8%BE%93%E5%8D%8F%E8%AE%AE