In local mode, we can use the Storm framework to implement a scenario that continuously reads messages from a message broker. Each message arrives as a sentence, is split into words on spaces, and the number of occurrences of each word is counted and finally printed.
Add the necessary dependencies to the project so that the Storm core components and related tool packages work properly. The core dependencies are configured as follows:
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>com.codahale.metrics</groupId>
    <artifactId>metrics-core</artifactId>
    <version>3.0.2</version>
</dependency>
This Spout generates the sentences to be counted and emits them continuously. Its implementation is shown below:
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class WordCountSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private List<String> sentenceList = Arrays.asList(
            "The quick brown fox jumps over the lazy dog",
            "Dog does not eat dog",
            "The fox may grow grey but never good"
    );

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Emit a randomly chosen sentence; sleep briefly to avoid busy-spinning.
        Random rand = new Random();
        String sentence = sentenceList.get(rand.nextInt(sentenceList.size()));
        collector.emit(new Values(sentence));
        Utils.sleep(1);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
This Bolt splits each sentence into words and emits every word to the next Bolt. Its implementation is shown below:
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            collector.emit(new Values(word));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
This Bolt counts the occurrences of each word and emits the running count to the next Bolt. Its implementation is shown below:
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseRichBolt {

    private OutputCollector collector;
    private Map<String, Long> wordCountMap = null;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.wordCountMap = new HashMap<>();
    }

    @Override
    public void execute(Tuple input) {
        // Increment the in-memory count for this word and emit the updated value.
        String word = input.getStringByField("word");
        Long count = wordCountMap.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        wordCountMap.put(word, count);
        collector.emit(new Values(word, count));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
This Bolt prints the counting results. Its implementation is shown below:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class PrintResultBolt extends BaseRichBolt {

    private Map<String, Long> wordCountMap = null;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.wordCountMap = new HashMap<>();
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Long count = input.getLongByField("count");
        wordCountMap.put(word, count);
        System.out.println("Real-time result: " + word + " = " + count);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields to declare
    }

    @Override
    public void cleanup() {
        // Print the final tally when the topology shuts down.
        System.out.println("--------------Result------------");
        List<String> wordList = new ArrayList<>(wordCountMap.keySet());
        Collections.sort(wordList);
        for (String word : wordList) {
            System.out.println(word + " = " + wordCountMap.get(word));
        }
        System.out.println("----------------------------------");
    }
}
This is the main class that wires all the components into a topology. Its implementation is shown below:
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

public class WordCountTopology {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("WordCountSpout", new WordCountSpout(), 1);
        builder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 1)
               .shuffleGrouping("WordCountSpout");
        // With parallelism 1, shuffleGrouping is sufficient; with more tasks a
        // fieldsGrouping on "word" would be needed to keep each word on one task.
        builder.setBolt("WordCountBolt", new WordCountBolt(), 1)
               .shuffleGrouping("SplitSentenceBolt");
        builder.setBolt("PrintResultBolt", new PrintResultBolt(), 1)
               .globalGrouping("WordCountBolt");

        StormTopology topology = builder.createTopology();
        Config config = new Config();
        config.setDebug(true);

        // Submit directly in cluster mode
        StormSubmitter.submitTopology("WordCountTopology", config, topology);
    }
}
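The main class above submits to a cluster. For the local mode mentioned at the start, a minimal sketch (assuming Storm 2.1.0, where org.apache.storm.LocalCluster is AutoCloseable) could replace the StormSubmitter call with an in-process LocalCluster:

// Local-mode sketch (assumption: Storm 2.1.0).
// Requires additional imports: org.apache.storm.LocalCluster and org.apache.storm.utils.Utils.
try (LocalCluster cluster = new LocalCluster()) {
    // Run the same topology in-process for a while, then kill it, which
    // triggers PrintResultBolt.cleanup() and prints the final tally.
    cluster.submitTopology("WordCountTopology", config, topology);
    Utils.sleep(10 * 1000);   // let the topology run for ~10 seconds
    cluster.killTopology("WordCountTopology");
    Utils.sleep(2 * 1000);    // give the workers time to shut down cleanly
}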
In cluster mode, the topology is run by packaging it into a jar and submitting it via StormSubmitter. An example submit command:
./bin/storm jar /Users/sunnan/BigData/storm-wordcount/target/storm-wordcount-1.0-SNAPSHOT.jar org.example.wordcount.topology.WordCountTopology
Once the topology is running, its status can be viewed in the Storm UI or queried with the storm list command. The Storm UI can also be used to manage the topology, e.g. to activate, deactivate (pause), or kill it.
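For reference, the corresponding CLI calls look roughly like this (using the topology name submitted above):

# List running topologies and their status
./bin/storm list

# Deactivate (pause), reactivate, or kill the topology by name
./bin/storm deactivate WordCountTopology
./bin/storm activate WordCountTopology
./bin/storm kill WordCountTopology -w 10   # -w: seconds to wait before killing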
The output of PrintResultBolt is written to the worker.log file; the real-time counting results can be checked by inspecting these logs.
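As a rough illustration only (the exact path depends on the Storm installation, log directory, and the worker port assigned to the topology), the log can typically be followed with something like:

# Worker logs are usually grouped by topology id and worker port under the Storm log directory
tail -f $STORM_HOME/logs/workers-artifacts/WordCountTopology-*/*/worker.log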
With the configuration above, the word-count Storm topology can be run in either local mode or cluster mode.