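In the world of semantics, one can say, roughly, that everything is feature extraction: once you find the features, the problem becomes tractable. …… Do you expect to settle it all in one stroke? In real-world natural language processing applications, you will rarely find a single feature that handles every case. The work is always features stacked layer upon layer, with each layer removing part of the junk data. Repeat this, and you eventually reach the goal. Mind the methodology.
Statistics is coarse and rough: the big hammer. Rules are fine and precise: the small hammer. Take the big points first, then play the fine moves.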
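The layering idea above can be pictured as a filter cascade: a coarse statistical layer runs first, a finer rule layer runs after it, and each one drops part of whatever junk is left. What follows is only a minimal sketch. The class names and both predicates (a digit-ratio test standing in for a statistical feature, a URL rule standing in for a hand-written rule) are hypothetical illustrations, not features taken from the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: a cascade of filters applied in order.
// Each layer keeps only the items its predicate accepts.
final class FilterCascade {
    private final List<Predicate<String>> layers = new ArrayList<>();

    FilterCascade addLayer(Predicate<String> keep) {
        layers.add(keep);
        return this;
    }

    List<String> run(List<String> items) {
        List<String> current = new ArrayList<>(items);
        for (Predicate<String> keep : layers) {
            List<String> next = new ArrayList<>();
            for (String s : current) {
                if (keep.test(s)) {
                    next.add(s);
                }
            }
            current = next; // each layer removes part of the remaining junk
        }
        return current;
    }
}

final class CascadeDemo {
    // Coarse, statistical layer (hypothetical feature): drop strings that are mostly digits
    static boolean mostlyText(String s) {
        if (s.isEmpty()) {
            return false;
        }
        long digits = s.chars().filter(Character::isDigit).count();
        return digits * 2 < s.length();
    }

    // Fine, rule-based layer (hypothetical rule): drop anything containing a URL
    static boolean noUrl(String s) {
        return !s.contains("http://") && !s.contains("https://");
    }

    // Big hammer first, small hammer second
    static List<String> filter(List<String> items) {
        return new FilterCascade()
                .addLayer(CascadeDemo::mostlyText)
                .addLayer(CascadeDemo::noUrl)
                .run(items);
    }
}
```

Ordering the cheap, broad layer first means the stricter rules only ever see the items that survived it.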
KafkaSink.java
import kafka.javaapi.producer.Producer;
……
public class KafkaSink extends AbstractSink implements Configurable {
    ……
    private Producer<String, byte[]> producer;
    ……
    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        try {
            tx.begin();
            // Take one event from the Flume channel inside the transaction
            Event e = channel.take();
            if (e == null) {
                // Channel is empty: roll back and tell Flume to back off
                tx.rollback();
                return Status.BACKOFF;
            }
            // Forward the event body to the Kafka topic, then commit so the channel drops the event
            producer.send(new KeyedMessage<String, byte[]>(topic, e.getBody()));
            tx.commit();
            return Status.READY;
        } catch (Exception e) {
            ……
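KafkaSink.process() follows Flume's channel transaction cycle: take an event, roll back and return BACKOFF if there is none, otherwise send it and commit. The sketch below models that cycle without Flume or Kafka on the classpath. TxChannel, SinkStatus and SketchSink are hypothetical stand-ins, not the Flume API, and the sketch assumes the elided catch branch rolls the transaction back, which is the usual pattern but is not shown in the article.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in for Flume's Status values
enum SinkStatus { READY, BACKOFF }

// Hypothetical in-memory channel with a single-event transaction (NOT the Flume API)
final class TxChannel {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    private byte[] taken; // event taken in the open transaction, not yet committed

    void put(byte[] body) {
        queue.addLast(body);
    }

    byte[] take() {
        taken = queue.pollFirst(); // null when the channel is empty
        return taken;
    }

    void commit() {
        taken = null; // the event is gone for good
    }

    void rollback() {
        if (taken != null) {
            queue.addFirst(taken); // return the event so it is retried later
            taken = null;
        }
    }

    int size() {
        return queue.size();
    }
}

final class SketchSink {
    // Mirrors the shape of KafkaSink.process(); 'send' stands in for producer.send(...)
    static SinkStatus process(TxChannel channel, Consumer<byte[]> send) {
        try {
            byte[] body = channel.take();
            if (body == null) {
                channel.rollback();
                return SinkStatus.BACKOFF; // nothing to deliver
            }
            send.accept(body);
            channel.commit();
            return SinkStatus.READY;
        } catch (RuntimeException ex) {
            channel.rollback(); // assumed failure handling: keep the event in the channel
            return SinkStatus.BACKOFF;
        }
    }
}
```

Because rollback() returns the event to the head of the channel, a failed send leaves it in place for the next process() call.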
KafkaSpout.java
public abstract class KafkaSpout implements IRichSpout {
    ……
    @Override
    public void activate() {
        ……
        // One consumer task per Kafka stream, run on the executor
        for (final KafkaStream<byte[], byte[]> stream : streamList) {
            executor.submit(new Runnable() {
                @Override
                public void run() {
                    ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
                    while (iterator.hasNext()) {
                        // Pending budget exhausted: wait before pulling more messages
                        if (spoutPending.get() <= 0) {
                            sleep(1000);
                            continue;
                        }
                        MessageAndMetadata<byte[], byte[]> next = iterator.next();
                        byte[] message = next.message();
                        List<Object> tuple = null;
                        try {
                            // Decode the raw Kafka message into tuple fields
                            tuple = generateTuple(message);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                        // Skip messages that failed to decode or have the wrong number of fields
                        if (tuple == null || tuple.size() != outputFieldsLength) {
                            continue;
                        }
                        collector.emit(tuple);
                        spoutPending.decrementAndGet();
                    }
                }
                ……
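The spout throttles itself with spoutPending: it checks the counter before pulling a message and calls decrementAndGet() after emitting. The check and the decrement are separate operations, and each KafkaStream runs on its own executor thread, so two threads can both pass the check and push the counter below zero. A minimal sketch of a tighter variant folds both steps into one compare-and-set loop. PendingBudget is a hypothetical name, and the release() call on ack or fail is an assumption, since the article elides ack(). java.util.concurrent.Semaphore.tryAcquire() would give the same guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a pending-tuple budget shared by several consumer threads
final class PendingBudget {
    private final AtomicInteger permits;

    PendingBudget(int maxPending) {
        permits = new AtomicInteger(maxPending);
    }

    // Check and decrement in one atomic step, so concurrent threads
    // cannot all pass a "> 0" check and drive the counter negative.
    boolean tryAcquire() {
        while (true) {
            int current = permits.get();
            if (current <= 0) {
                return false; // caller sleeps and retries, as the spout does with sleep(1000)
            }
            if (permits.compareAndSet(current, current - 1)) {
                return true;
            }
            // another thread changed the value in between: re-read and retry
        }
    }

    // Assumed to be called from ack()/fail(), returning a slot to the budget
    void release() {
        permits.incrementAndGet();
    }

    int available() {
        return permits.get();
    }
}
```

In the consumer loop, `if (!budget.tryAcquire()) { sleep(1000); continue; }` would replace both the check and the later decrementAndGet().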
EvaluateBolt.java
public class EvaluateBolt extends BaseBasicBolt {
    ……
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        ……
        if (LogWebsiteSpout.PAGE_EVENT_BROWSE.equals(event)) {
            // Page view: a goods page counts toward BROWSE_ALL, a PAY1 page toward ORDER_ALL
            if (LogWebsiteSpout.PAGE_TYPE_GOODS.equals(pageType)) {
                incrBaseStatistics(baseKeyMap, BROWSE_ALL, 1);
            } else if (LogWebsiteSpout.PAGE_TYPE_PAY1.equals(pageType)) {
                incrBaseStatistics(baseKeyMap, ORDER_ALL, 1);
            }
            // Record recommendation display statistics for this page view
            String recDisplay = input.getStringByField(LogWebsiteSpout.FIELD_REC_DISPLAY);
            recDisplayStatistics(recDisplay, time, pageType, baseKeyMap);
        } else if (LogWebsiteSpout.PAGE_EVENT_CLICK.equals(event)) {
            // Click: read which recommendation type was clicked
            String recType = input.getStringByField(LogWebsiteSpout.FIELD_REC_TYPE);
            ……
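EvaluateBolt turns each log event into counter increments: a browse on a goods page adds to BROWSE_ALL, and a browse on a PAY1 page adds to ORDER_ALL. The sketch below reproduces that dispatch with a plain map of named counters. The string values of the event and page constants, the Map<String, Long> layout and the CLICK_<recType> counter are all hypothetical, because the article does not show the constants, the baseKeyMap structure or the rest of the click branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the counting done in EvaluateBolt.execute()
final class EvalCounters {
    // Illustrative constant values; the real LogWebsiteSpout constants are not shown
    static final String EVENT_BROWSE = "browse";
    static final String EVENT_CLICK = "click";
    static final String PAGE_GOODS = "goods";
    static final String PAGE_PAY1 = "pay1";

    private final Map<String, Long> counters = new HashMap<>();

    // Plays the role of incrBaseStatistics(baseKeyMap, key, delta)
    void incr(String key, long delta) {
        counters.merge(key, delta, Long::sum);
    }

    void onEvent(String event, String pageType, String recType) {
        if (EVENT_BROWSE.equals(event)) {
            if (PAGE_GOODS.equals(pageType)) {
                incr("BROWSE_ALL", 1);
            } else if (PAGE_PAY1.equals(pageType)) {
                incr("ORDER_ALL", 1);
            }
        } else if (EVENT_CLICK.equals(event) && recType != null) {
            incr("CLICK_" + recType, 1); // hypothetical per-recommendation-type click counter
        }
    }

    long get(String key) {
        return counters.getOrDefault(key, 0L);
    }
}
```

Map.merge with Long::sum creates a counter on first use and adds to it afterwards, so no separate initialization step is needed.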
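Wowo's series of solution write-ups:
#R&D Solutions# Intelligent monitoring based on StatsD+Graphite
#R&D Middleware# JobCenter: scheduled job dispatching and management
#R&D Solutions# Recsys-Evaluate (recommendation evaluation)
#R&D Solutions# Tracing (Eagle Eye)
#R&D Solutions# IdCenter (internal unified authentication system)
#R&D Solutions# ES-based search + filtering + sorting solution
#Data Technology Selection# Ad-hoc queries with Shib+Presto, cluster job scheduling with HUE+Oozie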