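"Big data" may be a loosely defined buzzword, but talk about big data, and about processing and analyzing it, keeps heating up. These days it is routinely discussed as the next industrial revolution.
What is big data? As a data-collection team, we have spent a long time asking ourselves what big data actually is, and where its prospects and value lie.
In this article I'll share my own views, along with a range of interesting material and resources, on: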
What big data is
Big data in practice
Application scenarios for big data
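Shameless plug: our team's tool helps you collect data with zero barrier to entry:
造数: the handiest cloud crawler tool, a crawler tool on the attack!
Layoffs have been in the news lately. If you want to know whether the wave of internet-industry layoffs has really had a lasting negative effect on jobs and salaries, you can use our tool: schedule it to collect data a few times a day and generate a list for you to look over.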
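First, let's hear from the experts:
Big data just means a lot. A whole lot. The old equipment can't store it and can't crunch it.
-- "Pablo" Picasso (啪菠萝·毕加索)
Big data is not a random sample but all the data; not exactness but messiness; not causation but correlation.
-- Schönberger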
Over to TED: Kenneth Cukier, "Big data is better data"
America's favorite pie is?
Audience: Apple.
Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.
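People used to assume that apple pie was everyone's favorite. With finer-grained data, though, you discover that apple pie's popularity is really the outcome of a compromise: apple is everyone's second-favorite flavor.
Once you have the sales data for small pies, you find that apple actually ranks only around fourth or fifth.
With more data, you can see information you could not see before.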
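The mechanism Cukier describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the households and their flavor rankings are invented, and the Borda count is just one plausible stand-in for "the whole family has to agree". It is not data or a method from the talk.

```python
from collections import Counter

# Hypothetical households, invented for illustration. Each inner list is one
# person's flavors, ordered from most liked to least liked.
HOUSEHOLDS = [
    [["cherry", "apple", "pecan"],
     ["pecan", "apple", "cherry"],
     ["pumpkin", "apple", "cherry"]],
    [["blueberry", "apple", "cherry"],
     ["cherry", "apple", "pecan"],
     ["pecan", "apple", "blueberry"]],
    [["apple", "cherry", "pecan"],
     ["pumpkin", "apple", "cherry"]],
]


def shared_pie(members):
    """Choose one flavor for the whole household with a Borda count."""
    scores = Counter()
    for ranking in members:
        for position, flavor in enumerate(ranking):
            # A higher position in someone's ranking earns more points.
            scores[flavor] += len(ranking) - position
    # sorted() makes ties resolve alphabetically, so the result is deterministic.
    return max(sorted(scores), key=scores.get)


def large_pie_sales(households):
    """One large pie per household: the compromise flavor."""
    return Counter(shared_pie(members) for members in households)


def small_pie_sales(households):
    """One small pie per person: everyone buys their first choice."""
    return Counter(ranking[0] for members in households for ranking in members)


large = large_pie_sales(HOUSEHOLDS)
small = small_pie_sales(HOUSEHOLDS)
print("large pies:", large.most_common())
print("small pies:", small.most_common())
```

In this toy data set apple wins every shared purchase yet is almost nobody's first choice, so it drops out of the top of the small-pie ranking. The finer-grained sales data is what exposes the compromise.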
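Big data means collecting big data
The big data industry is, at bottom, a service industry that depends on its data sources.
What is most fundamental about big data is the major change in how information gets collected. The rise of big data is closely tied to the sheer volume of information that now appears directly online.
Weibo, Tmall, Taobao, WeChat and the like directly generate enormous amounts of information, including location, message history, purchase records, reviews and reading activity. You could say that internet companies are data companies by nature. But if we look more closely at where data originates, we still find that a great deal of it needs to be collected and categorized.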
Joel Selanikio: Transcript of "The big-data revolution in healthcare"
There's a concept that people talk about nowadays called "big data." And what they're talking about is all of the information that we're generating through our interaction with and over the Internet, everything from Facebook and Twitter to music downloads, movies, streaming, all this kind of stuff, the live streaming of TED. And the folks who work with big data, for them, they talk about that their biggest problem is we have so much information. The biggest problem is: how do we organize all that information?
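Everyone talks about big data now, but what they really mean is the information generated every day on Facebook, Twitter, streaming services and similar sites. The people who work with big data feel that the volume we have is simply overwhelming.
(Organizing the information remains the hardest problem.)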
I can tell you that, working in global health, that is not our biggest problem. Because for us, even though the light is better on the Internet, the data that would help us solve the problems we're trying to solve is not actually present on the Internet. So we don't know, for example, how many people right now are being affected by disasters or by conflict situations. We don't know for, really, basically, any of the clinics in the developing world, which ones have medicines and which ones don't. We have no idea of what the supply chain is for those clinics. We don't know -- and this is really amazing to me -- we don't know how many children were born -- or how many children there are -- in Bolivia or Botswana or Bhutan. We don't know how many kids died last week in any of those countries. We don't know the needs of the elderly, the mentally ill. For all of these different critically important problems or critically important areas that we want to solve problems in, we basically know nothing at all.
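A lot of useful data is not on the internet at all and still has to be gathered by old-fashioned means. In a great many fields, very basic problems with data remain glaringly obvious.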
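I recently came across an example about how Pokémon Go changed its players' level of physical activity:
1. An example of data analysis in an app:
After six months, the activity level of most Pokémon Go players had gradually converged with that of non-players.
It does seem to be a game with a considerable effect.
2. An example of big data analysis of traffic conditions: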
Susan Etlinger: What do we do with all this big data?
Now, there's a group of data scientists out of the University of Illinois-Chicago, and they're called the Health Media Collaboratory, and they've been working with the Centers for Disease Control to better understand how people talk about quitting smoking, how they talk about electronic cigarettes, and what they can do collectively to help them quit. The interesting thing is, if you want to understand how people talk about smoking, first you have to understand what they mean when they say "smoking." And on Twitter, there are four main categories: number one, smoking cigarettes; number two, smoking marijuana; number three, smoking ribs; and number four, smoking hot women.
This part is quite interesting.
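To make the ambiguity concrete, here is a minimal sketch of sorting posts that mention "smoking" into the four senses Etlinger lists. The cue words, the category labels and the `classify` helper are all hypothetical, chosen only for illustration. This is not how the Health Media Collaboratory actually works, and a real system would need far more than keyword matching.

```python
import re

# Hypothetical context cues for each sense of "smoking" (illustrative only).
# Dictionaries keep insertion order, so earlier categories win when several match.
CONTEXT_CUES = {
    "tobacco": {"cigarette", "cigarettes", "nicotine", "quit", "pack"},
    "marijuana": {"weed", "marijuana", "joint"},
    "food": {"ribs", "brisket", "bbq"},
    "slang": {"hot"},
}


def classify(text):
    """Return the sense of "smoking" in text, "unknown", or None if absent."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "smoking" not in words:
        return None
    for label, cues in CONTEXT_CUES.items():
        if words & cues:
            return label
    return "unknown"


for post in [
    "Trying to quit smoking cigarettes this year",
    "Smoking ribs all afternoon",
    "The smoking area is closed",
]:
    print(classify(post), "<-", post)
```

Even this toy version shows why the raw count of posts containing "smoking" says little until the sense of the word has been resolved.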
Today, both in policy and at the level of national strategy, big data is receiving more and more attention.
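In terms of application scenarios, it is currently found in:
Supply chain and channel analysis and optimization
Pricing analysis and optimization
Fraud analysis and detection
Equipment management
Social media analysis and customer analysis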
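Viktor Mayer-Schönberger, author of the book 《大数据时代》 (Big Data), argues that the big data era brings three major shifts:
"First, we can analyze far more data, sometimes even all of the data relating to a particular phenomenon, instead of relying on random sampling. The higher precision lets us discover more detail.
Second, there is so much data to study that we are no longer fixated on exactness. Loosening up on precision at the micro level brings better insight and greater commercial benefit.
Third, we are no longer fixated on finding causation, but look instead for correlations between things. For example, we do not investigate why airfares change, but we do pay attention to the best time to buy a ticket."
Big data breaks down the boundaries of traditional enterprise data. Business intelligence used to rely solely on a company's internal business data. Big data diversifies the sources, taking in not only internal data but also external data, especially data about consumers.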
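According to unofficial histories, the ancient Central Asian kingdom of Khwarezm had a peculiar custom: any messenger who brought the king good news was promoted, while anyone who brought bad news was fed to the tigers. People have long liked to mock this king's naivety in believing that rewarding the bearers of good news would encourage good news to arrive, and that executing the bearers of bad news would stamp bad news out.
In today's age of information overload, we can't promise that the messenger will always bring good news, but you can have our crawler deliver, on a schedule, the information that is most useful and best suited to your needs.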