How the Hacker News Ranking Algorithm Works

In this post I'll try to explain how the Hacker News ranking algorithm works and how you can reuse it in your own applications. It's a very simple ranking algorithm, and it works surprisingly well when you want to highlight hot or new stuff.


Digging into news.arc code

Hacker News is implemented in Arc, a Lisp dialect created by Paul Graham. Hacker News is open source, and the code can be found at arclanguage.org. Digging through the news.arc code, you can find the ranking algorithm, which looks like this:


; Votes divided by the age in hours to the gravityth power.
; Would be interesting to scale gravity in a slider.

(= gravity* 1.8 timebase* 120 front-threshold* 1 
   nourl-factor* .4 lightweight-factor* .3 )

(def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
  (* (/ (let base (- (scorefn s) 1)
          (if (> base 0) (expt base .8) base))
        (expt (/ (+ (item-age s) timebase*) 60) gravity))
     (if (no (in s!type 'story 'poll))  1
         (blank s!url)                  nourl-factor*
         (lightweight s)                (min lightweight-factor* 
                                             (contro-factor s))
                                        (contro-factor s))))

In essence, the ranking performed by Hacker News looks like this:


Score = (P-1) / (T+2)^G

where,
P = points of an item (the -1 negates the submitter's own vote)
T = time since submission (in hours)
G = Gravity, defaults to 1.8 in news.arc
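
The simplified formula leaves out two details that are visible in the Arc code: the vote count is dampened by an exponent of .8, and the +2 is timebase* (120) divided by 60. The following is a rough Python transliteration of just that quotient. It assumes item-age is measured in minutes (the excerpt does not show its definition) and it ignores the final type/URL/controversy multiplier. The name base_rank is made up for illustration.

```python
GRAVITY = 1.8    # gravity* in news.arc
TIMEBASE = 120   # timebase* in news.arc; assumed here to be minutes

def base_rank(points, age_minutes, gravity=GRAVITY):
    """Sketch of the quotient inside frontpage-rank, without the final multiplier."""
    base = points - 1                 # drop the submitter's own vote
    if base > 0:
        base = base ** 0.8            # dampen large vote counts
    hours = (age_minutes + TIMEBASE) / 60
    return base / hours ** gravity

# Hypothetical item: 30 points, submitted 60 minutes ago.
print(base_rank(30, 60))
```

With the .8 exponent removed and the age expressed in hours, this reduces to the Score formula above.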

As you can see, the algorithm is rather trivial to implement. In the next section we'll see how it behaves.


Effects of gravity (G) and time (T)

Gravity and time have a significant impact on the score of an item. Generally, these things hold true:

  • the score decreases as T increases, meaning that older items will get lower and lower scores
  • the score decreases much faster for older items if gravity is increased
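
Both points can be checked numerically. The sketch below holds the votes fixed at a hypothetical 10 points and prints, for a few gravity values, the share of the initial score that is left after 1, 6, 12 and 24 hours. The helper names are chosen for illustration only.

```python
def score(points, hours, gravity=1.8):
    return (points - 1) / (hours + 2) ** gravity

def remaining_fraction(hours, gravity):
    # Share of the T=0 score that is left after `hours`, votes held constant.
    return score(10, hours, gravity) / score(10, 0, gravity)

for gravity in (0.5, 1.8, 2.0):
    row = [round(remaining_fraction(t, gravity), 3) for t in (1, 6, 12, 24)]
    print(f"G={gravity}: {row}")
```

Each row shrinks from left to right (older means lower), and rows with a larger gravity shrink faster.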

To see this visually, we can plot the algorithm in Wolfram Alpha.


How the score behaves over time

[Figure: score over 24 hours]

As you can see, the score decreases a lot as time goes by. For example, a 24-hour-old item will have a very low score regardless of how many votes it got.

Plot query:


plot(
    (30 - 1) / (t + 2)^1.8, 
    (60 - 1) / (t + 2)^1.8,
    (200 - 1) / (t + 2)^1.8
) where t=0..24
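
To put numbers on the plotted curves, here is a small comparison using the simplified formula with the default gravity. The two items are made-up examples: one with 200 points that is a day old, and one with 30 points that is an hour old.

```python
def score(points, hours, gravity=1.8):
    return (points - 1) / (hours + 2) ** gravity

old_popular = score(200, 24)   # 200 points, 24 hours old
fresh_modest = score(30, 1)    # 30 points, 1 hour old

print(old_popular, fresh_modest)
print("fresh item ranks higher:", fresh_modest > old_popular)
```

The day-old item scores roughly 0.56 and the fresh one roughly 4.0, so under this formula age outweighs a large difference in votes.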

How the gravity parameter behaves


[Figure: gravity effects]

As the graph shows, the larger the gravity, the faster the score decreases.

Plotting query:


plot(
    (p - 1) / (t + 2)^1.8, 
    (p - 1) / (t + 2)^0.5,
    (p - 1) / (t + 2)^2.0
) where t=0..24, p=10
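
One way to summarize the effect of gravity in a single number is the time it takes for the score to halve when the votes stay constant. Setting (2 / (T + 2))^G = 1/2 and solving for T gives T = 2 * (2^(1/G) - 1). The sketch below evaluates this for the three gravity values plotted above; hours_to_halve is an illustrative name, not something from news.arc.

```python
def hours_to_halve(gravity):
    """Hours until (P-1)/(T+2)^G falls to half of its value at T=0, votes held constant."""
    return 2 * (2 ** (1 / gravity) - 1)

for g in (0.5, 1.8, 2.0):
    print(f"G={g}: score halves after about {hours_to_halve(g):.2f} hours")
```

With a gravity of 0.5 the score takes 6 hours to halve; with 1.8 or 2.0 it takes less than an hour.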

Python implementation

As already stated, the score function is rather simple to implement:

def calculate_score(votes, item_hour_age, gravity=1.8):
    return (votes - 1) / pow((item_hour_age+2), gravity)
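
To reuse this in an application, you would typically compute the age from a stored timestamp and sort by the score. The following is a minimal sketch of that idea. The item layout (dicts with "title", "votes" and "submitted_at" keys) and the rank_items helper are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def calculate_score(votes, item_hour_age, gravity=1.8):
    return (votes - 1) / pow(item_hour_age + 2, gravity)

def rank_items(items, now, gravity=1.8):
    """Return items sorted from highest to lowest score."""
    def item_score(item):
        age_hours = (now - item["submitted_at"]).total_seconds() / 3600
        return calculate_score(item["votes"], age_hours, gravity)
    return sorted(items, key=item_score, reverse=True)

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
items = [
    {"title": "day-old hit", "votes": 200, "submitted_at": now - timedelta(hours=24)},
    {"title": "fresh post", "votes": 30, "submitted_at": now - timedelta(hours=1)},
    {"title": "brand new", "votes": 1, "submitted_at": now},
]
for item in rank_items(items, now):
    print(item["title"])
```

Because the score depends on the current time, it has to be recomputed (or the list re-sorted) periodically rather than stored once at submission.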

The most crucial aspect is understanding how the algorithm behaves and how you can customize it for your application. I hope I have contributed that knowledge :-)

Happy hacking!


Edit:
You can view comments to this post and a lot more thoughts on HN's ranking here:

Edit:
Paul Graham has shared the updated HN ranking algorithm:

    (= gravity* 1.8 timebase* 120 front-threshold* 1
       nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                    gag-factor*
                                                   (lightweight s)
                                                    lightweight-factor*
                                                   1)))))
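
For readers less familiar with Arc, the multiplier at the end of the updated function can be read as a chain of condition/value pairs. Here is a rough Python rendering of that chain only. The results of contro-factor and lightweight are passed in as plain arguments because their definitions are not shown in the excerpt, and treating an empty string or None as a blank URL is an assumption.

```python
NOURL_FACTOR = 0.4         # nourl-factor*
LIGHTWEIGHT_FACTOR = 0.17  # lightweight-factor*
GAG_FACTOR = 0.1           # gag-factor*

def rank_multiplier(item_type, url, keys, contro_factor=1.0, lightweight=False):
    """Follow the branch order of the `if` chain in the updated frontpage-rank."""
    if item_type not in ("story", "poll"):
        return 0.8
    if not url:                      # assumed stand-in for Arc's `blank`
        return NOURL_FACTOR
    if "bury" in keys:
        return 0.001
    if "gag" in keys:
        penalty = GAG_FACTOR
    elif lightweight:
        penalty = LIGHTWEIGHT_FACTOR
    else:
        penalty = 1
    return contro_factor * penalty

# Hypothetical story with a URL, no special keys, flagged as lightweight.
print(rank_multiplier("story", "http://example.com", [], lightweight=True))
```

The final rank is this multiplier times the votes-over-age quotient computed in the first half of the function.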