PHP urlencode vs Java URLEncoder.encode

 

Conclusion: compared with URLEncoder.encode, urlencode additionally encodes the "*" character; otherwise the two behave the same.

PHP urlencode

  With phpversion() >= 5.3 the encoding is described as compliant with RFC 3986, while with phpversion() <= 5.2.7RC1 it is not (although, as noted below, urlencode still encodes "~").

  Encodes with reference to RFC 3986

  

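Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent sign (%) followed by two hex digits, and spaces are encoded as plus signs (+).
This is the same encoding used for data posted from a WWW form, and the same as the application/x-www-form-urlencoded media type.
For historical reasons, it differs from the RFC 3986 encoding (see rawurlencode()) in that spaces are encoded as plus signs (+).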

 

PHP does not follow RFC 3986 completely: the standard says the "~" character does not need to be encoded, but urlencode encodes it anyway.

 

So the final list of unencoded characters is [-], [_], [.], just as the documentation describes.
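
Since the only difference described here is the "*" character, PHP's urlencode output can be approximated from Java. The following is a minimal sketch, not a definitive implementation: the class name PhpStyleUrlEncoder and the method name urlencode are hypothetical, UTF-8 input is assumed, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; the name is not from any library.
class PhpStyleUrlEncoder {

    // URLEncoder.encode leaves "*" unencoded, while PHP urlencode encodes it,
    // so "*" is percent-encoded in a second step. Assumes UTF-8 text.
    static String urlencode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("*", "%2A");
    }

    public static void main(String[] args) {
        // Space becomes "+", "-_." stay as they are, "*" and "~" are encoded.
        System.out.println(urlencode("a b-_.*~")); // prints a+b-_.%2A%7E
    }
}
```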

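Java URLEncoder.encode

  Encodes with reference to RFC 2396

  However, because browsers (Netscape and Internet Explorer, according to the source comment quoted below) escape every special character except "-", "_", ".", "*", Java adopted the same list as the browsers.

  So the final list of unencoded characters is [-], [_], [.], [*]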

  

The list of characters that are not encoded has been determined as follows:

RFC 2396 states:
-----
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.

unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
-----

It appears that both Netscape and Internet Explorer escape all special characters from this list with the exception of "-", "_", ".", "*". While it is not clear why they are escaping the other characters, perhaps it is safest to assume that there might be contexts in which the others are unsafe if not escaped. Therefore, we will use the same list. It is also noteworthy that this is consistent with O'Reilly's "HTML: The Definitive Guide" (page 164).

As a last note, Internet Explorer does not encode the "@" character which is clearly not unreserved according to the RFC. We are being consistent with the RFC in this matter, as is Netscape.
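
The list can be checked by encoding each RFC 2396 "mark" character and keeping the ones that come back unchanged. This is only a sketch: the class name UnescapedMarks and the method name unescapedMarks are made up for illustration, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class UnescapedMarks {

    // Returns the RFC 2396 "mark" characters that URLEncoder.encode leaves as-is.
    static List<String> unescapedMarks() {
        String[] marks = {"-", "_", ".", "!", "~", "*", "'", "(", ")"};
        List<String> kept = new ArrayList<>();
        for (String m : marks) {
            // A character is "unencoded" if encoding it returns the same string.
            if (URLEncoder.encode(m, StandardCharsets.UTF_8).equals(m)) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(unescapedMarks()); // prints [-, _, ., *]
    }
}
```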

 

History of related RFCs:

RFC 1738 section 2.2
only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

RFC 2396 section 2.3
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

RFC 2732 section 3
(3) Add "[" and "]" to the set of 'reserved' characters:

RFC 3986 section 2.3
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

RFC 3987 section 2.2
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
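
For comparison, output that leaves exactly the RFC 3986 unreserved set unencoded can be derived from URLEncoder.encode by adjusting three cases: the space, "*" and "~". The sketch below is an illustration under stated assumptions (hypothetical class name Rfc3986Encoder, UTF-8 input, Java 10 or later), roughly what the PHP documentation points to with rawurlencode().

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
class Rfc3986Encoder {

    // Post-processes form-style output so that only ALPHA / DIGIT / "-" / "." / "_" / "~"
    // remain unencoded.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")   // "+" in the output always stands for a space
                .replace("*", "%2A")   // "*" is not unreserved in RFC 3986
                .replace("%7E", "~");  // "~" is unreserved in RFC 3986
    }

    public static void main(String[] args) {
        System.out.println(encode("a b*~")); // prints a%20b%2A~
    }
}
```

The replacements are safe in this order because a literal "+" in the input has already become "%2B" and a literal "%" has become "%25" before they run.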
