Analyzing the film 《我和我的家乡》 (My People, My Homeland) with Python (2)


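The result looks like this:

The chart shows that commenters are most active from the afternoon through the evening. Around 19:00 is dinner time, so the dip in comment volume at that hour is to be expected.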
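As a rough sketch of how such an hourly activity count can be tallied, the comment timestamps can be bucketed by hour of day. The function name `comments_per_hour`, the timestamp format `%Y-%m-%d %H:%M:%S`, and the sample values are assumptions for illustration; they are not taken from the collected data.

```python
from collections import Counter
from datetime import datetime


def comments_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return 24 counts, one per hour of the day (index 0 = 00:00-00:59)."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Hypothetical sample timestamps, not real data
sample_times = ["2020-10-01 14:05:00", "2020-10-01 14:59:10", "2020-10-01 19:30:00"]
hourly = comments_per_hour(sample_times)
print(hourly[14], hourly[19])
```

The resulting list of 24 values can be passed directly as the y-axis data of a bar or line chart.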

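Main cast

Next, let's look at how often the main actors (and the characters they play) are mentioned in the comments. The main code is as follows: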

from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.globals import ThemeType

# df is the comment DataFrame loaded earlier; column 2 holds the comment text
cts_list = df.iloc[:, 2]
cts_str = "".join([str(i) for i in cts_list])
# actor name -> name of the character the actor plays
cast = {"黄渤": "黄大宝", "王宝强": "老唐", "刘昊然": "小秦", "葛优": "张北京",
        "刘敏涛": "玲子", "范伟": "老范", "张译": "姜前方", "邓超": "乔树林",
        "闫妮": "闫飞燕", "沈腾": "马亮", "马丽": "秋霞"}
px = list(cast)
# mentions of the actor plus mentions of the character
py = [cts_str.count(actor) + cts_str.count(role) for actor, role in cast.items()]
(
    Bar(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width="700px", height="400px"))
    .add_xaxis(px)
    .add_yaxis("", py)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Mentions of main actors and their characters", subtitle="Data source: Maoyan Movies", pos_left="center")
    )
).render_notebook()

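The result looks like this:

The chart shows that the three most-mentioned actors are Shen Teng (沈腾), Fan Wei (范伟) and Deng Chao (邓超), which suggests that these actors drew the most attention and discussion in the comment section.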
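Counting substrings in one long joined string counts every repetition, so a single comment that repeats a name several times inflates the total. A minimal alternative sketch counts the number of comments that mention an actor or character at least once. The helper name `comments_mentioning` and the sample comments are hypothetical.

```python
def comments_mentioning(comments, aliases):
    """Count comments that contain at least one of the given aliases."""
    return sum(1 for c in comments if any(a in str(c) for a in aliases))


# Hypothetical comments for illustration only
sample_comments = ["沈腾太好笑了，马亮这个角色很讨喜", "沈腾沈腾沈腾", "范伟演得真好"]

per_comment = comments_mentioning(sample_comments, ["沈腾", "马亮"])
joined = "".join(sample_comments)
raw_total = joined.count("沈腾") + joined.count("马亮")
print(per_comment, raw_total)  # 2 5
```

Which measure is more appropriate depends on the question: raw substring counts reflect how often a name is typed, while per-comment counts reflect how many commenters brought it up.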

Film segments

Next, let's look at how often each segment of the film is mentioned in the comments. The main code is as follows:

# titles of the five segments of the film
mx = ["天上掉下个UFO", "北京好人", "最后一课", "回乡之路", "神笔马亮"]
# number of times each title appears in the joined comment text
my = [cts_str.count(m) for m in mx]
(
    Bar(init_opts=opts.InitOpts(theme=ThemeType.DARK, width="700px", height="400px"))
    .add_xaxis(mx)
    .add_yaxis("", my)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Mentions of each film segment", subtitle="", pos_left="center")
    )
).render_notebook()

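The result looks like this:

The chart shows that the segment 《最后一课》 (The Last Lesson) is mentioned more often than all the other segments combined, which suggests that it resonated most strongly with viewers and clearly stands out from the rest.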
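The claim that one segment outweighs all the others combined can be checked directly from the counts rather than read off the chart. The sketch below uses a hypothetical helper `exceeds_rest` and made-up numbers; the real values would come from the `my` list computed above.

```python
def exceeds_rest(counts, name):
    """Return True if counts[name] is larger than the sum of all other counts."""
    rest = sum(v for k, v in counts.items() if k != name)
    return counts[name] > rest


# Hypothetical mention counts, not the real figures from the chart
sample_counts = {"天上掉下个UFO": 30, "北京好人": 25, "最后一课": 120,
                 "回乡之路": 20, "神笔马亮": 35}
print(exceeds_rest(sample_counts, "最后一课"))  # True
```

With the real data, `dict(zip(mx, my))` would build the same kind of mapping.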

Word clouds

Overall word cloud

First, let's look at a word cloud built from all the comments. The code is as follows:

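import stylecloud
from IPython.display import Image

# join the text of every comment (column 2) into one string
cts_list = df.iloc[:, 2]
cts_str = "".join([str(i) for i in cts_list])
stylecloud.gen_stylecloud(text=cts_str, max_words=400,
                          collocations=False,
                          font_path="SIMLI.TTF",  # a font that can render Chinese characters
                          icon_name="fas fa-home",  # Font Awesome icon used as the cloud shape
                          size=800,
                          output_name="total.png")
Image(filename="total.png")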

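The result looks like this:

The cloud makes it easy to see that terms such as 好看 (good), 很好看 (very good), 值得一看 (worth watching), 不错 (not bad) and 最后一课 appear most often. This suggests that most viewers were fairly satisfied with the film, and that the segment 《最后一课》 drew a lot of attention and resonated with many people.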

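Hot-comment word cloud

Finally, let's look at a word cloud of the hot comments, meaning those with many likes or many replies. The code is as follows: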

hot_str = ""
for index, row in df.iterrows():
    content = row.iloc[2]  # comment text
    support = row.iloc[6]  # number of likes
    reply = row.iloc[7]    # number of replies
    # a comment counts as hot if it has more than 30 likes or more than 5 replies
    if support > 30 or reply > 5:
        hot_str += str(content)
stylecloud.gen_stylecloud(text=hot_str, max_words=200,
                          collocations=False,
                          font_path="SIMLI.TTF",
                          icon_name="fas fa-fire",
                          size=800,
                          output_name="hot.png")
Image(filename="hot.png")

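The result looks like this:

The hot comments paint a rather different picture. The most prominent (largest) terms are UFO, 难看 (bad), 电影倒是没看 ("didn't actually watch the film") and 和对象开演前十分钟分手了 ("broke up with my partner ten minutes before the film started"). I won't say more about that last one; draw your own conclusions.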
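One plausible reason whole sentences show up as single terms in this cloud: when Chinese text is not segmented into words first, a regex-style tokenizer treats every run of characters between punctuation marks as one token. The sketch below imitates that behavior with a plain `\w+` pattern. This is an assumption about how the text was tokenized, not something confirmed here, and the sample string is made up.

```python
import re
from collections import Counter


def phrase_counts(text):
    """Split text into runs of word characters and count each run."""
    return Counter(re.findall(r"\w+", text))


# Hypothetical hot comments, loosely modeled on the phrases mentioned above
sample_text = "电影倒是没看，和对象开演前十分钟分手了。难看！难看"
phrases = phrase_counts(sample_text)
print(phrases.most_common(1))  # [('难看', 2)]
```

Segmenting the text with a Chinese word segmenter before building the cloud is a common way to get word-level terms instead of whole phrases.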

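Because only a limited number of comments were collected, the results may deviate somewhat from the real picture, so please treat them with caution.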

Reposted from Python小二