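Piwik, now renamed Matomo, is a well-known open-source web analytics system from abroad, similar to Baidu Tongji and Google Analytics. The biggest difference is that you can read its source code, which is exactly what I wanted. I have long been curious about how analytics systems work internally, so when I happened to learn about this one I tried it right away. Material on it in China is scarce: most of what exists covers installation, and I could not find any article analysing the source. What follows is a first, rough analysis that will become more detailed over time. I currently work as a front-end developer, so I will analyse the tracking script first and the back-end code afterwards.
Piwik's official site is matomo.org. It is written in PHP, and since I used to be a PHP engineer, reading the code poses no obstacle. The latest version at the time of writing is 3.6, and the GitHub repository is matomo-org/matomo; opening it shows the content in the figure below (only the key part is captured).
Open the js folder; the piwik.js inside it (boxed in red in the figure below) is the script analysed here. It is fairly large, at 7838 lines.
First download the whole codebase, configure a virtual directory locally, and then start the installation. The installer lets you choose a language, and Simplified Chinese is supported (see the part boxed in red in the figure below). The system performs several steps (see the left side of the figure), including checking whether the current environment can run the install and creating the database. Just follow the prompts step by step; it is straightforward.
Once installation finishes you are redirected to the admin dashboard (shown in the figure below), which has charts and analyses much like other commonly used analytics systems. I have only taken a first look and have not examined the features in detail, but the interface is quite friendly.
The JavaScript embedded in the page is similar to that of other analytics systems, as shown below. It is also loaded asynchronously; the only difference is that the request URL is not disguised as an image address (see the line highlighted in red).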
```html
<script type="text/javascript">
  var _paq = _paq || [];
  /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="//loc.piwik.cn/"; // custom address
    _paq.push(['setTrackerUrl', u+'piwik.php']);
    _paq.push(['setSiteId', '1']);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src='piwik.js'; s.parentNode.insertBefore(g,s);
  })();
</script>
```
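With this snippet embedded, refreshing the page produces the request shown in the figure below. The request carries a long list of parameters, each of which is explained later in this article.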
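The `_paq` array in the snippet works as a command queue: the page pushes `['methodName', args...]` entries before the library has loaded, and the library replays them afterwards. The sketch below illustrates that general pattern only. `createDemoTracker`, `applyCommand`, and `drainQueue` are hypothetical names of mine, and this is not the implementation found in piwik.js.

```javascript
// Hypothetical stand-in for the real tracker: it only records what was called.
function createDemoTracker() {
  var calls = [];
  return {
    calls: calls,
    setTrackerUrl: function (url) { calls.push("setTrackerUrl:" + url); },
    setSiteId: function (id) { calls.push("setSiteId:" + id); },
    trackPageView: function () { calls.push("trackPageView"); }
  };
}

// Run one queued command of the form ['methodName', arg1, arg2, ...].
function applyCommand(tracker, command) {
  var method = command[0];
  if (typeof tracker[method] === "function") {
    tracker[method].apply(tracker, command.slice(1));
  }
}

// Replay everything queued before the library loaded, then return an object
// whose push() executes later commands immediately.
function drainQueue(queue, tracker) {
  for (var i = 0; i < queue.length; i++) {
    applyCommand(tracker, queue[i]);
  }
  return {
    push: function (command) { applyCommand(tracker, command); }
  };
}

// Usage: commands pushed before "load" are replayed in order.
var demoQueue = [];
demoQueue.push(["setTrackerUrl", "//loc.piwik.cn/piwik.php"]);
demoQueue.push(["setSiteId", "1"]);
demoQueue.push(["trackPageView"]);

var demoTracker = createDemoTracker();
demoQueue = drainQueue(demoQueue, demoTracker);
demoQueue.push(["trackPageView"]); // executes immediately now
console.log(demoTracker.calls);
```

Because the queue is a plain array until the library takes over, the embed code never has to wait for the script to finish downloading.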
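A script of more than 7000 lines cannot be read line by line; it has to be split into modules first and then analysed one by one. The script is this large because it contains a great deal of code for compatibility across browser versions, including such relics as IE4, Firefox 1.0, and Netscape. I split the source into 6 parts: json, private, query, content-overlay, tracker, and piwik, as boxed in red in the figure below; piwik-all contains the complete code for easy comparison. The code has been uploaded to GitHub.
json.js is the open-source library JSON3, designed for browsers that lack a native JSON object; its code can be studied separately. private.js contains private variables and private functions used globally, such as aliases for system objects and type checks. query.js contains many methods for working with HTML elements, such as setting element attributes and finding elements with a given CSS class; it resembles a miniature jQuery, though with many functions of its own. content-overlay.js consists of two parts: one covers content tracking and URL concatenation, the other handles nested pages; I have not looked at it closely. tracker.js contains a single Tracker() function, but it is the largest part at more than 4700 lines, and the main tracking logic lives here. piwik.js itself is short and contains some initialisation and the plugin hooks; I have not yet looked at how the hooks work.
Although the code is split into 6 parts, each part is still sizable and they are interrelated, so it is hard to work out everything in a short time. I picked the points I personally consider most important to analyse first.
I previously knew only two ways of sending data. One is Ajax. The other is creating an Image object and setting its src attribute, with the data passed to the back end as URL parameters; this approach is widely used and also sidesteps cross-origin restrictions. A performance-metrics collection plugin I wrote earlier, primus.js, sends data the same way. While reading the source I found a third way: the sendBeacon() method of the Navigator object.
MDN says: "This method can be used to asynchronously transfer a small amount of data over HTTP to a web server." Although the method has compatibility issues, I was still impressed, because it suits analytics well. MDN goes on: "Analytics code sends data to the web server before the page closes (window.onunload), but sending data too early may miss the chance to collect it. However, ensuring that data is sent while the page is closing has always been difficult, because browsers usually ignore asynchronous requests made in unload handlers. With the sendBeacon() method, the browser sends the data asynchronously when it has the opportunity, without delaying the unload of the page or affecting the load of the next one. This solves all of the problems with submitting analytics data: it is reliable, asynchronous, and does not affect the loading of the next page, and the code is simpler." Below is the code fragment (see the line highlighted in red), found in tracker.js.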
```javascript
function sendPostRequestViaSendBeacon(request) {
  var supportsSendBeacon =
    "object" === typeof navigatorAlias &&
    "function" === typeof navigatorAlias.sendBeacon &&
    "function" === typeof Blob;

  if (!supportsSendBeacon) {
    return false;
  }

  var headers = {
    type: "application/x-www-form-urlencoded; charset=UTF-8"
  };
  var success = false;

  try {
    var blob = new Blob([request], headers);
    success = navigatorAlias.sendBeacon(configTrackerUrl, blob);
    // returns true if the user agent is able to successfully queue the data for transfer,
    // Otherwise it returns false and we need to try the regular way
  } catch (e) {
    return false;
  }

  return success;
}
```
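To tie the transports together, here is a minimal sketch of how a sender might try sendBeacon() first and fall back to an image request when the beacon is unavailable or refuses to queue the data. The function name `sendTrackingData` and the `env` wrapper are my own assumptions, introduced so the logic can run outside a browser; the order in which piwik.js itself picks a transport is not shown here.

```javascript
// Try sendBeacon first; fall back to an image request if it is missing,
// throws, or returns false. `env` is a hypothetical wrapper around the
// browser globals (navigator, Image) so that they can be replaced by fakes.
function sendTrackingData(env, trackerUrl, query) {
  var nav = env.navigator;
  if (nav && typeof nav.sendBeacon === "function") {
    try {
      // A plain string is an accepted payload; the Piwik code above wraps
      // it in a Blob in order to set the content type.
      if (nav.sendBeacon(trackerUrl, query)) {
        return "beacon";
      }
    } catch (e) {
      // fall through to the image request
    }
  }
  if (typeof env.Image === "function") {
    var img = new env.Image(1, 1);
    // The data travels as URL parameters of a GET request.
    img.src = trackerUrl + (trackerUrl.indexOf("?") < 0 ? "?" : "&") + query;
    return "image";
  }
  return "none";
}

// Demo with fakes; in a browser one would pass
// { navigator: navigator, Image: Image } instead.
var fakeImages = [];
function FakeImage() { fakeImages.push(this); }

var viaBeacon = sendTrackingData(
  { navigator: { sendBeacon: function () { return true; } }, Image: FakeImage },
  "/piwik.php",
  "idsite=1&rec=1"
);
var viaImage = sendTrackingData(
  { navigator: {}, Image: FakeImage },
  "/piwik.php",
  "idsite=1&rec=1"
);
console.log(viaBeacon, viaImage, fakeImages[0].src);
```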
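The following method (found in tracker.js) collects the page's tracking data and concatenates it into the parameters of the tracking URL; these parameters are what is ultimately sent to the server.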
```javascript
/**
 * Returns the URL to call piwik.php,
 * with the standard parameters (plugins, resolution, url, referrer, etc.).
 * Sends the pageview and browser settings with every request in case of race conditions.
 */
function getRequest(request, customData, pluginMethod, currentEcommerceOrderTs) {
  var i,
    now = new Date(),
    nowTs = Math.round(now.getTime() / 1000),
    referralTs,
    referralUrl,
    referralUrlMaxLength = 1024,
    currentReferrerHostName,
    originalReferrerHostName,
    customVariablesCopy = customVariables,
    cookieSessionName = getCookieName("ses"),
    cookieReferrerName = getCookieName("ref"),
    cookieCustomVariablesName = getCookieName("cvar"),
    cookieSessionValue = getCookie(cookieSessionName),
    attributionCookie = loadReferrerAttributionCookie(),
    currentUrl = configCustomUrl || locationHrefAlias,
    campaignNameDetected,
    campaignKeywordDetected;

  if (configCookiesDisabled) {
    deleteCookies();
  }

  if (configDoNotTrack) {
    return "";
  }

  var cookieVisitorIdValues = getValuesFromVisitorIdCookie();
  if (!isDefined(currentEcommerceOrderTs)) {
    currentEcommerceOrderTs = "";
  }

  // send charset if document charset is not utf-8. sometimes encoding
  // of urls will be the same as this and not utf-8, which will cause problems
  // do not send charset if it is utf8 since it's assumed by default in Piwik
  var charSet = documentAlias.characterSet || documentAlias.charset;

  if (!charSet || charSet.toLowerCase() === "utf-8") {
    charSet = null;
  }

  campaignNameDetected = attributionCookie[0];
  campaignKeywordDetected = attributionCookie[1];
  referralTs = attributionCookie[2];
  referralUrl = attributionCookie[3];

  if (!cookieSessionValue) {
    // cookie 'ses' was not found: we consider this the start of a 'session'

    // here we make sure that if 'ses' cookie is deleted few times within the visit
    // and so this code path is triggered many times for one visit,
    // we only increase visitCount once per Visit window (default 30min)
    var visitDuration = configSessionCookieTimeout / 1000;
    if (
      !cookieVisitorIdValues.lastVisitTs ||
      nowTs - cookieVisitorIdValues.lastVisitTs > visitDuration
    ) {
      cookieVisitorIdValues.visitCount++;
      cookieVisitorIdValues.lastVisitTs = cookieVisitorIdValues.currentVisitTs;
    }

    // Detect the campaign information from the current URL
    // Only if campaign wasn't previously set
    // Or if it was set but we must attribute to the most recent one
    // Note: we are working on the currentUrl before purify() since we can parse the campaign parameters in the hash tag
    if (
      !configConversionAttributionFirstReferrer ||
      !campaignNameDetected.length
    ) {
      for (i in configCampaignNameParameters) {
        if (
          Object.prototype.hasOwnProperty.call(configCampaignNameParameters, i)
        ) {
          campaignNameDetected = getUrlParameter(
            currentUrl,
            configCampaignNameParameters[i]
          );

          if (campaignNameDetected.length) {
            break;
          }
        }
      }

      for (i in configCampaignKeywordParameters) {
        if (
          Object.prototype.hasOwnProperty.call(
            configCampaignKeywordParameters,
            i
          )
        ) {
          campaignKeywordDetected = getUrlParameter(
            currentUrl,
            configCampaignKeywordParameters[i]
          );

          if (campaignKeywordDetected.length) {
            break;
          }
        }
      }
    }

    // Store the referrer URL and time in the cookie;
    // referral URL depends on the first or last referrer attribution
    currentReferrerHostName = getHostName(configReferrerUrl);
    originalReferrerHostName = referralUrl.length
      ? getHostName(referralUrl)
      : "";

    if (
      currentReferrerHostName.length && // there is a referrer
      !isSiteHostName(currentReferrerHostName) && // domain is not the current domain
      (!configConversionAttributionFirstReferrer || // attribute to last known referrer
        !originalReferrerHostName.length || // previously empty
        isSiteHostName(originalReferrerHostName)) // previously set but in current domain
    ) {
      referralUrl = configReferrerUrl;
    }

    // Set the referral cookie if we have either a Referrer URL, or detected a Campaign (or both)
    if (referralUrl.length || campaignNameDetected.length) {
      referralTs = nowTs;
      attributionCookie = [
        campaignNameDetected,
        campaignKeywordDetected,
        referralTs,
        purify(referralUrl.slice(0, referralUrlMaxLength))
      ];

      setCookie(
        cookieReferrerName,
        JSON_PIWIK.stringify(attributionCookie),
        configReferralCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // build out the rest of the request
  request +=
    "&idsite=" + configTrackerSiteId +
    "&rec=1" +
    "&r=" + String(Math.random()).slice(2, 8) + // keep the string to a minimum
    "&h=" + now.getHours() +
    "&m=" + now.getMinutes() +
    "&s=" + now.getSeconds() +
    "&url=" + encodeWrapper(purify(currentUrl)) +
    (configReferrerUrl.length
      ? "&urlref=" + encodeWrapper(purify(configReferrerUrl))
      : "") +
    (configUserId && configUserId.length
      ? "&uid=" + encodeWrapper(configUserId)
      : "") +
    "&_id=" + cookieVisitorIdValues.uuid +
    "&_idts=" + cookieVisitorIdValues.createTs +
    "&_idvc=" + cookieVisitorIdValues.visitCount +
    "&_idn=" + cookieVisitorIdValues.newVisitor + // currently unused
    (campaignNameDetected.length
      ? "&_rcn=" + encodeWrapper(campaignNameDetected)
      : "") +
    (campaignKeywordDetected.length
      ? "&_rck=" + encodeWrapper(campaignKeywordDetected)
      : "") +
    "&_refts=" + referralTs +
    "&_viewts=" + cookieVisitorIdValues.lastVisitTs +
    (String(cookieVisitorIdValues.lastEcommerceOrderTs).length
      ? "&_ects=" + cookieVisitorIdValues.lastEcommerceOrderTs
      : "") +
    (String(referralUrl).length
      ? "&_ref=" +
        encodeWrapper(purify(referralUrl.slice(0, referralUrlMaxLength)))
      : "") +
    (charSet ? "&cs=" + encodeWrapper(charSet) : "") +
    "&send_image=0";

  // browser features
  for (i in browserFeatures) {
    if (Object.prototype.hasOwnProperty.call(browserFeatures, i)) {
      request += "&" + i + "=" + browserFeatures[i];
    }
  }

  var customDimensionIdsAlreadyHandled = [];
  if (customData) {
    for (i in customData) {
      if (
        Object.prototype.hasOwnProperty.call(customData, i) &&
        /^dimension\d+$/.test(i)
      ) {
        var index = i.replace("dimension", "");
        customDimensionIdsAlreadyHandled.push(parseInt(index, 10));
        customDimensionIdsAlreadyHandled.push(String(index));
        request += "&" + i + "=" + customData[i];
        delete customData[i];
      }
    }
  }

  if (customData && isObjectEmpty(customData)) {
    customData = null; // we deleted all keys from custom data
  }

  // custom dimensions
  for (i in customDimensions) {
    if (Object.prototype.hasOwnProperty.call(customDimensions, i)) {
      var isNotSetYet =
        -1 === indexOfArray(customDimensionIdsAlreadyHandled, i);
      if (isNotSetYet) {
        request += "&dimension" + i + "=" + customDimensions[i];
      }
    }
  }

  // custom data
  if (customData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(customData));
  } else if (configCustomData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(configCustomData));
  }

  // Custom Variables, scope "page"
  function appendCustomVariablesToRequest(customVariables, parameterName) {
    var customVariablesStringified = JSON_PIWIK.stringify(customVariables);
    if (customVariablesStringified.length > 2) {
      return (
        "&" + parameterName + "=" + encodeWrapper(customVariablesStringified)
      );
    }
    return "";
  }

  var sortedCustomVarPage = sortObjectByKeys(customVariablesPage);
  var sortedCustomVarEvent = sortObjectByKeys(customVariablesEvent);

  request += appendCustomVariablesToRequest(sortedCustomVarPage, "cvar");
  request += appendCustomVariablesToRequest(sortedCustomVarEvent, "e_cvar");

  // Custom Variables, scope "visit"
  if (customVariables) {
    request += appendCustomVariablesToRequest(customVariables, "_cvar");

    // Don't save deleted custom variables in the cookie
    for (i in customVariablesCopy) {
      if (Object.prototype.hasOwnProperty.call(customVariablesCopy, i)) {
        if (customVariables[i][0] === "" || customVariables[i][1] === "") {
          delete customVariables[i];
        }
      }
    }

    if (configStoreCustomVariablesInCookie) {
      setCookie(
        cookieCustomVariablesName,
        JSON_PIWIK.stringify(customVariables),
        configSessionCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // performance tracking
  if (configPerformanceTrackingEnabled) {
    if (configPerformanceGenerationTime) {
      request += "&gt_ms=" + configPerformanceGenerationTime;
    } else if (
      performanceAlias &&
      performanceAlias.timing &&
      performanceAlias.timing.requestStart &&
      performanceAlias.timing.responseEnd
    ) {
      request +=
        "&gt_ms=" +
        (performanceAlias.timing.responseEnd -
          performanceAlias.timing.requestStart);
    }
  }

  if (configIdPageView) {
    request += "&pv_id=" + configIdPageView;
  }

  // update cookies
  cookieVisitorIdValues.lastEcommerceOrderTs =
    isDefined(currentEcommerceOrderTs) && String(currentEcommerceOrderTs).length
      ? currentEcommerceOrderTs
      : cookieVisitorIdValues.lastEcommerceOrderTs;
  setVisitorIdCookie(cookieVisitorIdValues);
  setSessionCookie();

  // tracker plugin hook
  request += executePluginMethod(pluginMethod, {
    tracker: trackerInstance,
    request: request
  });

  if (configAppendToTrackingUrl.length) {
    request += "&" + configAppendToTrackingUrl;
  }

  if (isFunction(configCustomRequestContentProcessing)) {
    request = configCustomRequestContentProcessing(request);
  }

  return request;
}
```
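Stripped of cookies, campaigns, and plugins, the core of getRequest() is "collect name/value pairs, encode the values, join them with &". The sketch below shows only that step for a handful of the parameters. `buildTrackingQuery`, `fields`, and `now` are illustrative names of mine rather than anything from piwik.js, and I am assuming that encodeWrapper behaves like encodeURIComponent.

```javascript
// Build a tracking query string from a plain object. Only a few of the
// parameters from getRequest() are shown; `fields` and `now` are
// illustrative inputs, not names used by piwik.js.
function buildTrackingQuery(fields, now) {
  var pairs = [
    ["idsite", fields.siteId],
    ["rec", 1],
    ["h", now.getHours()],
    ["m", now.getMinutes()],
    ["s", now.getSeconds()],
    ["url", fields.url],
    ["urlref", fields.referrer], // skipped when empty, as in getRequest()
    ["send_image", 0]
  ];
  var parts = [];
  for (var i = 0; i < pairs.length; i++) {
    var value = pairs[i][1];
    if (value === undefined || value === null || value === "") {
      continue; // optional parameter without a value
    }
    // Assumption: encodeURIComponent stands in for Piwik's encodeWrapper.
    parts.push(pairs[i][0] + "=" + encodeURIComponent(String(value)));
  }
  return parts.join("&");
}

var demoQuery = buildTrackingQuery(
  { siteId: 1, url: "http://example.com/a b", referrer: "" },
  new Date(2018, 0, 1, 9, 5, 7)
);
console.log(demoQuery);
// idsite=1&rec=1&h=9&m=5&s=7&url=http%3A%2F%2Fexample.com%2Fa%20b&send_image=0
```

Passing the clock in as an argument instead of calling new Date() inside the function keeps the output reproducible, which the original has no need for but which makes the sketch easy to check.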
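The tracking code sends data on every call, and each request carries a long string of parameters with abbreviated names. A brief explanation follows (corrections are welcome); some points, such as how the UUID is generated, are not yet explained. The parameters fall into two groups. The first group is:
1. idsite: site ID
2. rec: 1 (hard-coded)
3. r: random string
4. h: current hour
5. m: current minute
6. s: current second
7. url: the current page URL, cleaned by purify() and then URL-encoded
8. _id: the visitor's UUID
9. _idts: timestamp at which the visitor ID was created (createTs in the code)
10. _idvc: visit count
11. _idn: new-visitor flag (currently unused)
12. _refts: timestamp of the referral
13. _viewts: timestamp of the previous visit
14. cs: character encoding of the current page (sent only when it is not UTF-8)
15. send_image: whether the data is transferred as an image request (hard-coded to 0 in this code)
16. gt_ms: time spent loading the content (response end time minus request start time)
17. pv_id: unique identifier (configIdPageView in the code)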
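The second group describes browser capabilities, obtained from properties of the Navigator object (mimeTypes, javaEnabled, and so on) and of the Screen object (width and height).
1. pdf: whether the PDF file type is supported
2. qt: whether the QuickTime Player is supported
3. realp: whether RealPlayer is supported
4. wma: whether Windows Media Player is supported (MIME type application/x-mplayer2)
5. dir: whether Macromedia Director is supported
6. fla: whether Adobe Flash Player is supported
7. java: whether Java is enabled
8. gears: whether Google Gears is installed
9. ag: whether Microsoft Silverlight is installed
10. cookie: whether cookies are enabled
11. res: screen width and height (not calculated correctly for high-DPI displays)
The code that gathers these 11 parameters is in the method below (also in tracker.js). Note the pluginMap variable (highlighted in red): it holds several MIME types that are used to detect whether a given plugin or feature is installed or enabled.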
```javascript
/*
 * Browser features (plugins, resolution, cookies)
 */
function detectBrowserFeatures() {
  var i,
    mimeType,
    pluginMap = {
      // document types
      pdf: "application/pdf",

      // media players
      qt: "video/quicktime",
      realp: "audio/x-pn-realaudio-plugin",
      wma: "application/x-mplayer2",

      // interactive multimedia
      dir: "application/x-director",
      fla: "application/x-shockwave-flash",

      // RIA
      java: "application/x-java-vm",
      gears: "application/x-googlegears",
      ag: "application/x-silverlight"
    };

  // detect browser features except IE < 11 (IE 11 user agent is no longer MSIE)
  if (!new RegExp("MSIE").test(navigatorAlias.userAgent)) {
    // general plugin detection
    if (navigatorAlias.mimeTypes && navigatorAlias.mimeTypes.length) {
      for (i in pluginMap) {
        if (Object.prototype.hasOwnProperty.call(pluginMap, i)) {
          mimeType = navigatorAlias.mimeTypes[pluginMap[i]];
          browserFeatures[i] = mimeType && mimeType.enabledPlugin ? "1" : "0";
        }
      }
    }

    // Safari and Opera
    // IE6/IE7 navigator.javaEnabled can't be aliased, so test directly
    // on Edge navigator.javaEnabled() always returns `true`, so ignore it
    if (
      !new RegExp("Edge[ /](\\d+[\\.\\d]+)").test(navigatorAlias.userAgent) &&
      typeof navigator.javaEnabled !== "unknown" &&
      isDefined(navigatorAlias.javaEnabled) &&
      navigatorAlias.javaEnabled()
    ) {
      browserFeatures.java = "1";
    }

    // Firefox
    if (isFunction(windowAlias.GearsFactory)) {
      browserFeatures.gears = "1";
    }

    // other browser features
    browserFeatures.cookie = hasCookies();
  }

  var width = parseInt(screenAlias.width, 10);
  var height = parseInt(screenAlias.height, 10);
  browserFeatures.res = parseInt(width, 10) + "x" + parseInt(height, 10);
}
```
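The res value is built from screen.width and screen.height alone, which is why high-DPI displays are reported at their logical size. As a rough sketch of one way to account for that, the function below scales the size by a pixel ratio. `formatResolution` is a hypothetical helper, and multiplying by window.devicePixelRatio is my own assumption for illustration, not something piwik.js does.

```javascript
// Format a `res` value. `screenLike` mirrors window.screen and `pixelRatio`
// mirrors window.devicePixelRatio; scaling by the ratio is an assumption
// made for illustration and is not what piwik.js does.
function formatResolution(screenLike, pixelRatio) {
  var ratio = typeof pixelRatio === "number" && pixelRatio > 0 ? pixelRatio : 1;
  var width = Math.round(parseInt(screenLike.width, 10) * ratio);
  var height = Math.round(parseInt(screenLike.height, 10) * ratio);
  return width + "x" + height;
}

// The same logical screen size reported without and with a 2x pixel ratio.
var logicalRes = formatResolution({ width: 1440, height: 900 }, 1);
var physicalRes = formatResolution({ width: 1440, height: 900 }, 2);
console.log(logicalRes, physicalRes); // 1440x900 2880x1800
```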
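Beyond the 20-odd parameters above, the full list can be found on the official site under "Tracking HTTP API", although it is available only in English.
The code used above has been uploaded to https://github.com/pwstrick/mypiwik; feel free to download it if needed.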