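Whatever language you use to simulate a login or scrape a page, the approach comes down to the same steps:
First, start a web session or instantiate a web request class, such as HttpWebRequest in .NET;
Second, build the data to submit via POST or GET;
Third, set the request headers;
Fourth, send the request, read the response, and process the response as needed.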
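As a rough illustration of these four steps, here is a minimal sketch. The class name FourSteps, the helper names BuildRequest and Send, and the User-Agent value are assumptions made for this example; they are not taken from the original code.

```csharp
using System.IO;
using System.Net;
using System.Text;

static class FourSteps
{
    // Steps 1 and 3: create the request object and set its headers.
    public static HttpWebRequest BuildRequest(string url, string method, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.UserAgent = "Mozilla/5.0";   // assumed value; copy the real one from your capture
        request.CookieContainer = cookies;   // cookies are stored in and sent from this container
        request.AllowAutoRedirect = false;   // inspect each hop manually
        return request;
    }

    // Steps 2 and 4: write the form data (if any), send the request, read the body.
    public static string Send(HttpWebRequest request, string formBody)
    {
        if (formBody != null)
        {
            byte[] bytes = Encoding.UTF8.GetBytes(formBody);
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = bytes.Length;
            using (Stream body = request.GetRequestStream())
            {
                body.Write(bytes, 0, bytes.Length);
            }
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Passing the same CookieContainer to every BuildRequest call is what keeps the session alive across requests.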
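Here we use logging in to Renren as the example, which involves both POST and GET requests.
Any packet-capture tool will do (IE developer tools or httpwatch); I use httpwatch here. Logging in to Renren (www.renren.com) produces one POST request and two GET requests, as the capture below shows: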

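After the POST, the first response with status 200 is normally the post-login home page. Some sites go through more redirects, but the method is the same.
Looking at the details of the three requests, it is easy to see that they run in sequence: the URL of the first GET comes from the POST response, and the URL of the second GET comes from the response to the first GET.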
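The chained structure can be sketched as a small loop. This is only an illustration: RedirectChain, Follow, and the fetchNext callback are hypothetical names, and fetchNext stands in for whatever code sends one request and returns the next URL (or null once the final page is reached).

```csharp
using System;
using System.Collections.Generic;

static class RedirectChain
{
    // Visits startUrl, then every URL returned by fetchNext, until fetchNext
    // returns null or maxHops requests have been made.
    public static List<string> Follow(string startUrl, Func<string, string> fetchNext, int maxHops = 10)
    {
        var visited = new List<string>();
        string current = startUrl;
        while (current != null && visited.Count < maxHops)
        {
            visited.Add(current);
            current = fetchNext(current);   // the next URL comes from this response
        }
        return visited;
    }
}
```

The maxHops limit is a guard against redirect loops; the value 10 is arbitrary.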
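What ties each request to the next is the cookie data returned by each response: the cookies from one response must be sent to the server along with the next request. This is the key to simulating a website login in C#.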
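To make the cookie hand-off concrete, the sketch below stores a Set-Cookie value received for one URL in a CookieContainer and asks which Cookie header would be sent to another URL. The class and method names and the sample cookie t=abc123 are invented for illustration.

```csharp
using System;
using System.Net;

static class CookieCarryOver
{
    // Records the Set-Cookie value of a response from responseUrl and returns
    // the Cookie header the container would send with a request to nextUrl.
    public static string HeaderFor(string setCookie, string responseUrl, string nextUrl)
    {
        var cc = new CookieContainer();
        cc.SetCookies(new Uri(responseUrl), setCookie);
        return cc.GetCookieHeader(new Uri(nextUrl));
    }
}
```

A cookie set by http://www.renren.com/PLogin.do with path=/ is returned for http://www.renren.com/home, while a request to an unrelated host gets an empty header.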
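A few points to note:
1. Choose the URL to POST to. You can find it with the capture tool or by viewing the page source.

2. The content view shows the returned body, which may contain the link for the next hop. The last response is always the HTML of the home page.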
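For the second point, here is a hedged sketch of pulling the next-hop link out of a returned body. RedirectBody and ExtractHref are hypothetical names, and the regular expression assumes the link appears as a double-quoted href attribute, as in the redirect body seen in this capture.

```csharp
using System.Text.RegularExpressions;

static class RedirectBody
{
    // Returns the value of the first double-quoted href in the body,
    // or null when the body contains no such attribute.
    public static string ExtractHref(string body)
    {
        Match m = Regex.Match(body, "href=\"([^\"]+)\"", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Returning null when no link is found makes it easy to tell a redirect body from the final page.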



First, simulate the POST request:
// Requires: using System; using System.IO; using System.Net; using System.Text;
HttpWebRequest request = null;
HttpWebResponse response = null;
string gethost = string.Empty;              // URL of the next hop
CookieContainer cc = new CookieContainer(); // shared by all three requests
string Cookiesstr = string.Empty;           // cookie string for later requests
try
{
    // Form fields captured from the login POST
    string postdata = "email=adm13956587&password=786954887&icode=&origURL=http%3A%2F%2Fwww.renren.com%2Fhome&domain=renren.com&key_id=1&captcha_type=web_login";
    string LoginUrl = "http://www.renren.com/PLogin.do";
    request = (HttpWebRequest)WebRequest.Create(LoginUrl);
    request.Method = "POST";

    request.ContentType = "application/x-www-form-urlencoded";
    byte[] postdatabytes = Encoding.UTF8.GetBytes(postdata);
    request.ContentLength = postdatabytes.Length;
    request.AllowAutoRedirect = false; // handle each redirect ourselves
    request.CookieContainer = cc;
    request.KeepAlive = true;

    // Write the form body to the request stream
    Stream stream;
    stream = request.GetRequestStream();
    stream.Write(postdatabytes, 0, postdatabytes.Length);
    stream.Close();

    response = (HttpWebResponse)request.GetResponse();

    // Save the cookies set by the response
    response.Cookies = request.CookieContainer.GetCookies(request.RequestUri);
    CookieCollection cook = response.Cookies;
    string strcrook = request.CookieContainer.GetCookieHeader(request.RequestUri);
    Cookiesstr = strcrook;
    // The response body looks like:
    // The URL has moved <a href="http://www.renren.com/home">here</a>
    StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    string content = sr.ReadToEnd();
    response.Close();
    // The href is the text between the first pair of double quotes
    string[] substr = content.Split(new char[] { '"' });
    gethost = substr[1]; //http://www.renren.com/home
}
catch (Exception ex)
{
    System.Diagnostics.Debug.WriteLine(ex); // log instead of silently swallowing the failure
}
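The POST body above is a hand-written, already URL-encoded string. As a sketch of how such a body could be assembled from raw field values, the helper below percent-encodes each name and value with Uri.EscapeDataString. FormBody and Encode are names made up for this example, and the field values in the usage note are placeholders rather than real credentials.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FormBody
{
    // Joins name=value pairs with '&', percent-encoding each name and value.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Encoding the pair origURL / http://www.renren.com/home this way gives origURL=http%3A%2F%2Fwww.renren.com%2Fhome, the same form as in the captured body. Note that Uri.EscapeDataString writes a space as %20 rather than +.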
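The comments are fairly detailed, so I won't analyze the code again. Some readers may wonder about request = (HttpWebRequest)WebRequest.Create(LoginUrl); you can google the difference between HttpWebRequest and WebRequest. In short, WebRequest is an abstract class that cannot be instantiated directly and must be inherited, and HttpWebRequest derives from WebRequest. WebRequest.Create returns an HttpWebRequest for an http URL, which is why the cast works.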
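A quick way to see this relationship for yourself is sketched below; RequestTypes and its method name are hypothetical.

```csharp
using System;
using System.Net;

static class RequestTypes
{
    // WebRequest.Create is a factory on the abstract base class; for an
    // http URL the object it returns is an HttpWebRequest.
    public static bool CreateReturnsHttpWebRequest(string url)
    {
        WebRequest request = WebRequest.Create(url);
        return request is HttpWebRequest;
    }
}
```

No network traffic happens here: creating the request object does not send anything until GetRequestStream or GetResponse is called.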
Next, simulate the first and second GET requests:
try
{
    request = (HttpWebRequest)WebRequest.Create(gethost);
    request.Method = "GET";
    request.KeepAlive = true;
    // Send the saved cookie string; the shared container below also carries the cookies
    request.Headers.Add("Cookie:" + Cookiesstr);
    request.CookieContainer = cc;
    request.AllowAutoRedirect = false;
    response = (HttpWebResponse)request.GetResponse();

    // Refresh the cookie string for the next request
    Cookiesstr = request.CookieContainer.GetCookieHeader(request.RequestUri);

    StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    string ss = sr.ReadToEnd();
    // As before, the next URL is the text between the first pair of double quotes
    string[] substr = ss.Split(new char[] { '"' });
    gethost = substr[1]; //http://www.renren.com/1915651750
    sr.Close();
    response.Close();
}
catch (Exception ex)
{
    System.Diagnostics.Debug.WriteLine(ex); // log instead of silently swallowing the failure
}
try
{
    request = (HttpWebRequest)WebRequest.Create(gethost);
    request.Method = "GET";
    request.KeepAlive = true;
    request.Headers.Add("Cookie:" + Cookiesstr);
    request.CookieContainer = cc;
    request.AllowAutoRedirect = false;
    response = (HttpWebResponse)request.GetResponse();

    Cookiesstr = request.CookieContainer.GetCookieHeader(request.RequestUri);

    StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    string ss = sr.ReadToEnd();
    // Show the logged-in home page in the WebBrowser control
    webBrowser1.Navigate("about:blank");
    webBrowser1.Document.OpenNew(true);
    webBrowser1.Document.Write(ss);
    sr.Close();
    response.Close();
}
catch (Exception ex)
{
    System.Diagnostics.Debug.WriteLine(ex); // log instead of silently swallowing the failure
}
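GET and POST requests are much alike, so I won't go over them again. Once the three requests have finished, save your cookie string and attach it to the header of every later request, and you will stay logged in.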
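A minimal sketch of reusing the saved cookie string on a later request might look like the following. LoggedInRequest and Create are hypothetical names; the sketch assumes no CookieContainer is attached to the request, so the header is set by hand.

```csharp
using System.Net;

static class LoggedInRequest
{
    // Builds a GET request that carries a previously saved cookie string.
    public static HttpWebRequest Create(string url, string cookieString)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Headers.Add(HttpRequestHeader.Cookie, cookieString);
        return request;
    }
}
```

Calling GetResponse on the returned request then fetches the page with the session cookies attached. Server-side sessions expire, so a saved string will eventually stop working and the login has to be repeated.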