We Analyzed 13,124 HubSpot Blog Posts: Here’s What We Learned

In this data story, we dive into the HubSpot Blog, one of the world’s most active marketing blogs. Our goal is to get a high-level understanding of HubSpot’s content strategy by analyzing their topics, links, and content structure at scale. To this end, we crawled 13,124 URLs and used Frase’s NLP engine to automatically extract topics and other useful metadata from them.

The HubSpot Blog: Overview

  • Launched in 2005
  • Description: “HubSpot’s Blog for marketing, sales, agency, and customer success content, which has more than 400,000 subscribers and attracts over 4.5 million monthly visitors.”
  • Website property: blog.hubspot.com
  • Main categories:
    1. Marketing: everything you need to know to master inbound marketing
    2. Sales: expert inbound sales content for today’s sales organization
    3. Service: dedicated to helping transform today’s customer service organization
    4. News & Trends: seeks to inform the curious learner of the latest research, developments, and trends from the tech sector, and where it converges with business, life, and entertainment
  • Traffic:
    • More than 4.5 million visits per month
    • 80% of traffic is organic (source)

The Dataset

  • URLs successfully processed: 13,124
  • All URLs belong to the blog.hubspot.com domain
  • Blog posts published between 2013 and 2019. Approximately 80% of the content in this dataset is from 2016 to 2018; this study focuses on this period in order to keep results more accurate.

Results

1. Content Length

The average word count has more than doubled since 2016. A higher word count aligns with studies of SEO ranking factors, which suggest that Google prefers long-form content.

2. Images

Along with content length, the number of images has also increased significantly since 2016. Studies have also shown Google’s preference for content that includes images.

3. Content Types

We used simple keyword matching to generate the content types displayed in the chart. For example, if the title includes the word “ebook,” then the post is considered an ebook. In the case of “List,” we used an algorithm that identified numbers or text features implying a list-type article, for example, “The ten biggest trends in content marketing.”

As shown in the chart below, more than half of HubSpot’s content is either a list or a how-to article, which aligns with studies showing that these content types deliver the best click-through rates.
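The title-based classification described above can be approximated with a short sketch. This is not Frase’s actual implementation: the keyword map, the set of number words, and the rule of checking only the first three tokens are all assumptions made for illustration.

```python
import re

# Hypothetical keyword-to-type map; the study does not publish its actual list.
KEYWORD_TYPES = {
    "ebook": "Ebook",
    "how to": "How-to",
    "guide": "Guide",
    "template": "Template",
}

# Spelled-out numbers that hint at a list-style title (assumed set).
NUMBER_WORDS = {
    "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
    "eleven", "twelve", "fifteen", "twenty",
}


def classify_title(title: str) -> str:
    """Return a content type for a post title using simple matching."""
    lowered = title.lower()

    # Plain substring matching: the first keyword found decides the type.
    for keyword, content_type in KEYWORD_TYPES.items():
        if keyword in lowered:
            return content_type

    # List heuristic: a small number (digits or a number word) among the
    # first three tokens. Longer digit runs such as years are ignored.
    leading = re.findall(r"[a-z0-9]+", lowered)[:3]
    for token in leading:
        if (token.isdigit() and len(token) <= 3) or token in NUMBER_WORDS:
            return "List"

    return "Other"
```

With these assumed rules, the example title from the text, “The ten biggest trends in content marketing,” is labeled a list because “ten” appears among its leading tokens. Substring matching is deliberately crude: a title containing “guidelines” would also match “guide,” which is the kind of noise a simple keyword approach accepts.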

4. Links

This section analyzes hyperlinks used in the full text. The number of hyperlinks used per post has almost doubled since 2016.

The majority of links are external (they point to domains outside hubspot.com), but the share of internal links increased significantly in 2018.
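As a rough illustration of how the internal/external split can be computed, the sketch below treats a link as internal when its host is hubspot.com or any subdomain of it. The function names are hypothetical, and the sketch assumes absolute URLs; relative links would need to be resolved against the page URL first.

```python
from urllib.parse import urlparse


def is_internal(url: str, root: str = "hubspot.com") -> bool:
    """True if the URL's host is the root domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == root or host.endswith("." + root)


def internal_share(urls: list[str]) -> float:
    """Fraction of links that point to the root domain (0.0 for no links)."""
    if not urls:
        return 0.0
    return sum(is_internal(u) for u in urls) / len(urls)
```

Matching on the suffix ".hubspot.com" rather than the bare string avoids counting an unrelated host that merely ends in the same letters as internal.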

Over half of HubSpot’s internal links point to the blog, followed by landing pages hosted at the root hubspot.com level.

The chart below shows the domains HubSpot’s posts link to most often. They appear to be a combination of highly reputable technology publishers (TechCrunch, Wired), business publishers (HBR, Forbes), research-oriented sites (Wikipedia, Statista, Investopedia), and internet marketing blogs (Moz, Search Engine Land).

5. Social media

The chart below breaks down HubSpot’s hyperlinks pointing to social media and content-sharing platforms.

6. Topics

For every blog post, Frase analyzed both titles and full text to extract topics. Topic extraction was performed using Frase’s Named Entity Recognition engine, which can detect topics and classify them by concept, organization, person, and location.

The chart below shows the most prominent topics mentioned across titles (2016-2018 period).

The table below shows the top 10 title topics by year. This table excludes company names (such as “Facebook” or “Google”), and focuses on conceptual topics (such as “social media” or “landing page”).

The chart below shows the most mentioned organizations in the full text. Excluded from this chart are Facebook, Google, Twitter, and LinkedIn.

The chart below shows people (first name and last name) detected in the full text.

7. Topic Trends

One of the goals of this study was to identify topic trends. Some topics that HubSpot used heavily in the past are now less prominent or absent altogether.

The chart below shows some clear examples of topics that were popular in 2017, but much less used in 2018. In the case of “inbound marketing,” the topic was mentioned in 150 titles in 2017, but it didn’t make it to a single title in 2018.

In contrast, the chart below shows topics that were not prominent in 2017 but picked up visibility in 2018. For example, “customer success” never appeared in titles in 2017, but became a relevant topic in 2018.
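A year-over-year comparison of this kind can be sketched as a difference between per-year title counts. The input shape, a list of (year, topic) pairs with one pair per title mention, is an assumption about how the extracted topics might be stored, and the function names are hypothetical.

```python
from collections import Counter


def title_topic_counts(records, year):
    """Count how many titles mention each topic in the given year."""
    return Counter(topic for y, topic in records if y == year)


def topic_shift(records, earlier, later):
    """Map each topic to its change in title mentions between two years.

    Negative values mark declining topics, positive values rising ones.
    """
    before = title_topic_counts(records, earlier)
    after = title_topic_counts(records, later)
    # Counter returns 0 for missing keys, so topics seen in only one
    # of the two years are handled without special cases.
    return {t: after[t] - before[t] for t in set(before) | set(after)}
```

Sorting the resulting dictionary by value would place the steepest declines at one end and the strongest risers at the other.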

After analyzing the full dataset (with blog posts dating back to 2013), we identified specific topics in 2018 that had never appeared in titles before.
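Finding topics that appear for the first time in a given year amounts to a set difference. The sketch below assumes the same hypothetical input as a list of (year, topic) pairs drawn from titles; it illustrates the idea and is not the study’s actual code.

```python
def first_seen_topics(records, year):
    """Return topics that appear in `year` but in no earlier year."""
    earlier = {topic for y, topic in records if y < year}
    current = {topic for y, topic in records if y == year}
    return current - earlier
```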

Conclusions

  • HubSpot is not only publishing dozens of blog posts per week, but also creating long-form content.
  • HubSpot is betting on educational content in the form of guides, how-tos, and lists.
  • HubSpot has cast a wide net of topics, which maximizes their chance of ranking for many search queries.
  • HubSpot is keeping their content up to date with news-driven topics and trends. With the launch of their News & Trends category in 2018, HubSpot is no longer only an inbound marketing blog; it is moving closer to becoming a publisher.