Analysis of Personal LinkedIn Networks with Python

Learning More About My LinkedIn Connections through Data Processing and Visualization

Photo by Greg Bulla on Unsplash

As a social media manager, one of my favorite types of projects is audience analysis for our company channels. The insights gained from this research are essential at the start of a new campaign, helping us decide how best to position a message, get the most out of our ad spend, and more.

Having completed these types of reports before, I know what to expect from company page data, but I was unsure what might be available at a personal level. By digging into the data LinkedIn collects, I'll learn which companies I'm most connected to, which roles my connections most frequently hold, and what interests they share, all of which can help inform what I write about in the future.

In this post, I'll outline my approach to this analysis. I'm also posting a copy of the code on my GitHub so that you can recreate this analysis on your own network.

Downloading my Data

To begin, I need to download the available data, which is easy to do from your account settings. After you select the data files you're most interested in, a CSV file will be available for download within 10 minutes. Everything I'm analyzing in this article comes from the 'Connections' file.

LinkedIn also offers a second category of data, available for download within 48 hours. This set includes details such as how ads are targeted to you, the ads you've engaged with, and other data inferred from your activity. You can find out more about the specific data gathered and how to access it here: Downloading Your Account Data.

Data Analysis — First Steps

Once the file is downloaded, I can begin processing the data and thinking more concretely about the questions I can ask of it. Below is a snapshot of the 'Connections.csv' file showing my ten most recent connections and, more importantly, the structure of the data. The available columns are First Name, Last Name, Email Address, Company, Position, and Connected On.
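To reproduce this step, a minimal loading sketch with pandas might look like the following. It assumes the "Connected On" values look like "12 Mar 2021" and that the header row is the first line of the file; some exports begin with a few lines of notes, in which case pass `skiprows` with however many lines precede the header in your own file. `load_connections` is a hypothetical helper, and `SAMPLE_CSV` holds hypothetical rows, not my data.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same column layout as Connections.csv.
SAMPLE_CSV = """First Name,Last Name,Email Address,Company,Position,Connected On
Jane,Doe,,Acme Corp,Marketing Manager,12 Mar 2021
John,Roe,john@example.com,Beta GmbH,Data Analyst,01 Jan 2020
"""


def load_connections(path_or_buffer, skiprows=0):
    """Load a LinkedIn Connections.csv export, newest connections first."""
    df = pd.read_csv(path_or_buffer, skiprows=skiprows)
    # Strip stray whitespace from the header names.
    df.columns = df.columns.str.strip()
    # Assumed date format, e.g. "12 Mar 2021"; remove `format=` to let pandas infer it.
    df["Connected On"] = pd.to_datetime(df["Connected On"], format="%d %b %Y")
    return df.sort_values("Connected On", ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    connections = load_connections(io.StringIO(SAMPLE_CSV))
    print(connections.head(10))  # the ten most recent connections
```

With a real export, `load_connections("Connections.csv")` (plus `skiprows=` if needed) returns the same structure.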

By calling describe() on this DataFrame, I can quickly get a count of total connections (1,113), shown by the count of 'Connected On' entries. The most common first name in my network is Jessica (16 instances), and Brown is the most common surname. Eight people in my network have made their email addresses available to their connections.
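Here is a sketch of pulling those headline figures out of `describe()`, assuming the raw file has been read with `pd.read_csv` and the dates have not been parsed yet. `network_summary` is a hypothetical helper; casting to `object` first is a defensive choice so that a column which happens to be entirely blank doesn't push `describe()` into numeric statistics.

```python
import pandas as pd


def network_summary(raw: pd.DataFrame) -> dict:
    """Headline counts from a raw Connections DataFrame (dates not yet parsed)."""
    # Casting to object makes describe() treat every column as text, even one that
    # pandas read as float because it was entirely blank (e.g. Email Address).
    desc = raw.astype("object").describe()
    return {
        "total_connections": int(desc.loc["count", "Connected On"]),
        "top_first_name": desc.loc["top", "First Name"],
        "top_first_name_freq": int(desc.loc["freq", "First Name"]),
        "top_last_name": desc.loc["top", "Last Name"],
        # count ignores blank cells, so this is the number of shared email addresses.
        "emails_shared": int(desc.loc["count", "Email Address"]),
    }
```

On text columns, `describe()` returns `count`, `unique`, `top`, and `freq` rows; when two names tie for most common, only one of them is reported as `top`.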

New Connections per Month

I've plotted a chart of my monthly new connections below. You can see that I was particularly active on LinkedIn early in my career. I may have been especially ambitious in making new connections on the channel; however, it's also likely that, once I had a profile, I was simply making my existing offline connections official.
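A monthly count like the one in this chart can be computed by resampling the connection dates. This is a sketch, assuming the same "12 Mar 2021" date format; `monthly_new_connections` is a hypothetical helper name.

```python
import pandas as pd


def monthly_new_connections(df: pd.DataFrame) -> pd.Series:
    """Number of new connections per calendar month, including empty months."""
    # Assumed "12 Mar 2021"-style dates, as in the Connected On column.
    dates = pd.to_datetime(df["Connected On"], format="%d %b %Y")
    # One row per connection, bucketed by month start ("MS"); empty months sum to 0.
    ones = pd.Series(1, index=pd.DatetimeIndex(dates)).sort_index()
    return ones.resample("MS").sum()
```

Calling `.plot(kind="bar")` on the result draws the chart if matplotlib is installed. Resampling, rather than grouping, keeps months with zero new connections, so quiet periods show up as empty bars instead of disappearing from the axis.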

Cumulative Sum of Total Connections

However, one insight isn't so obvious in the monthly new connections chart. Viewing the same data as a cumulative total, I can see that I grew my network consistently over the last several years.
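The cumulative view is a running sum of the monthly counts. A minimal sketch under the same date-format assumption, with `cumulative_connections` as a hypothetical name:

```python
import pandas as pd


def cumulative_connections(df: pd.DataFrame) -> pd.Series:
    """Running total of connections through each month (labelled by month start)."""
    dates = pd.to_datetime(df["Connected On"], format="%d %b %Y")  # assumed date format
    monthly = pd.Series(1, index=pd.DatetimeIndex(dates)).sort_index().resample("MS").sum()
    # cumsum turns monthly additions into the size of the network over time.
    return monthly.cumsum()
```

The final value of the series equals the total number of rows in the file, which is a quick sanity check against the count from `describe()`.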

Gender Representation in My Network

I was a bit conflicted about exploring the Genderize.io API. Genderize.io compares your list of first names against its database of names and returns the probability that each name belongs to a given gender.

Is it perfect? Certainly not. First names are often gender-neutral or can carry different associations by region. More importantly, the tool fails to account for a person's actual gender identity. Genderize.io addresses only part of these issues: it supports localization if you can narrow your network to a specific country.
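For reference, a batch lookup against Genderize.io could be sketched with the standard library as below. It assumes the public endpoint `https://api.genderize.io`, its `name[]` batch parameter (up to ten names per request), the optional `country_id` parameter for localization, and an optional `apikey`; check the current Genderize.io documentation before relying on any of these. The 0.8 confidence cutoff and all helper names are hypothetical choices, not part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GENDERIZE_URL = "https://api.genderize.io"
BATCH_SIZE = 10  # assumed batch limit: up to ten names per request


def build_genderize_url(names, country_id=None, api_key=None):
    """Build a batch query URL, e.g. ...?name[]=Jessica&name[]=Tom&country_id=DE."""
    params = [("name[]", name) for name in names]
    if country_id:  # optional localization, e.g. "DE" or "US"
        params.append(("country_id", country_id))
    if api_key:  # optional key for higher usage
        params.append(("apikey", api_key))
    return f"{GENDERIZE_URL}?{urlencode(params)}"


def parse_genderize(payload, min_probability=0.8):
    """Map each name to 'male', 'female' or 'unknown' from a batch JSON response."""
    labels = {}
    for item in payload:
        gender = item.get("gender")
        probability = item.get("probability") or 0.0
        # Treat missing or low-confidence predictions as unknown rather than guessing.
        confident = gender is not None and probability >= min_probability
        labels[item["name"]] = gender if confident else "unknown"
    return labels


def genderize(first_names, country_id=None, api_key=None):
    """Query Genderize.io in batches for the unique first names (makes network calls)."""
    labels = {}
    unique = sorted({n for n in first_names if isinstance(n, str) and n.strip()})
    for start in range(0, len(unique), BATCH_SIZE):
        url = build_genderize_url(unique[start:start + BATCH_SIZE], country_id, api_key)
        with urlopen(url) as response:
            labels.update(parse_genderize(json.load(response)))
    return labels
```

Sending only the unique first names keeps the number of requests down, which matters because free usage of the service is capped.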

With these limitations in mind, I do think this information allows us to explore more complex questions about implicit bias and gender equity in the workplace. Here are some past uses of the API that I found interesting:

Anyway, here's how the gender representation of my network has changed over time. I'm honestly amazed at how consistent it has stayed over the last several years.
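One way to produce a breakdown like this is to join predicted genders back onto the connections and normalize by year. This sketch assumes a `{first name: gender}` dictionary from a name-based lookup such as Genderize.io, and it measures the share among connections made in each year (a cumulative share would be another valid view). `gender_share_by_year` is a hypothetical helper.

```python
import pandas as pd


def gender_share_by_year(df: pd.DataFrame, gender_by_name: dict) -> pd.DataFrame:
    """Share of each predicted gender among the connections made in each year."""
    year = pd.to_datetime(df["Connected On"], format="%d %b %Y").dt.year  # assumed format
    # Names the lookup doesn't cover are counted as "unknown" instead of being dropped.
    gender = df["First Name"].map(gender_by_name).fillna("unknown")
    # normalize="index" divides each row by its total, so every year sums to 1.
    return pd.crosstab(year, gender, normalize="index")
```

Keeping an explicit "unknown" column makes it visible how much of each year the name-based prediction couldn't classify.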

Breakdown of Network by Company

Which organizations are most represented in my network? I could use a simple bar graph like the one below to view this data, but it doesn’t tell me much that I couldn’t already assume.
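The counts behind a bar graph like this take a single pandas chain. A sketch, with `top_values` as a hypothetical helper that works for any text column:

```python
import pandas as pd


def top_values(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """The n most common entries in a column, e.g. "Company" or "Position"."""
    # Strip stray spaces so "Acme " and "Acme" are counted together;
    # value_counts drops blank (NaN) cells by default.
    return df[column].str.strip().value_counts().head(n)
```

`top_values(connections, "Company").sort_values().plot(kind="barh")` draws it as a horizontal bar chart with matplotlib, and the same helper lists the most frequent entries in the "Position" column.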

Alternatively, I could use a treemap like the one below to give a more complete view of my network at the company level. Each rectangle represents a single organization, sized and ordered by the number of connections I have there.
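A treemap needs the same per-organization counts in long form. The sketch below uses Plotly Express as one possible plotting library (an assumed dependency); `company_counts` and `company_treemap` are hypothetical helpers, and capping the chart at the 50 largest organizations is an arbitrary choice to keep it legible.

```python
import pandas as pd


def company_counts(df: pd.DataFrame) -> pd.DataFrame:
    """One row per organization with its number of connections, largest first."""
    counts = df["Company"].str.strip().value_counts()
    return counts.rename_axis("Company").reset_index(name="Connections")


def company_treemap(df: pd.DataFrame, n: int = 50):
    """Treemap of the n largest organizations; plotly is an assumed dependency."""
    import plotly.express as px  # imported here so company_counts works without plotly

    data = company_counts(df).head(n)
    # Each rectangle is one organization, its area proportional to "Connections".
    return px.treemap(data, path=["Company"], values="Connections")
```

`company_treemap(connections).show()` opens the interactive figure; the long-form table from `company_counts` can equally be fed to another treemap library.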

As you can see, my network spans a wide range of organizations, but I suspect the group of organizations I'm most connected to is fairly narrow. I don't have much to compare this to, so if you end up doing this analysis on your own network, let me know how yours compares!

Most Frequent Positions Held by Connections

Similarly, I can view the most frequently held positions in my network. While it's interesting to see that I'm more connected to Marketing Managers than to any other occupation, viewing the data this way isn't particularly useful.

By converting the same 'Position' column into a word cloud, I get considerably deeper insight into the thematic make-up of my network.
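A word cloud starts from word frequencies across all job titles. Here is a minimal sketch that counts them with the standard library and, optionally, renders them with the third-party `wordcloud` package (an assumed dependency). The stop-word list and helper names are hypothetical.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical stop-word list; extend it for your own job titles.
STOPWORDS = {"and", "of", "the", "at", "for", "in", "to"}


def position_word_counts(positions: pd.Series) -> Counter:
    """Count lower-cased words across all job titles, skipping stop words."""
    counts = Counter()
    for title in positions.dropna():
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", str(title).lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


def position_wordcloud(positions: pd.Series):
    """Render the counts as a word cloud; `wordcloud` is an assumed dependency."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(position_word_counts(positions))
```

`position_wordcloud(connections["Position"]).to_file("positions.png")` saves the image. Dropping filler words such as "of" and "and" keeps the cloud focused on seniority levels and functions.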

I see that connections are primarily in senior and director-level positions. This group also has a strong interest in digital marketing and communications, with an emphasis on strategy, analytics, and creative execution.

Conclusion

You should now know a bit more about the data LinkedIn collects on your network and how to quickly analyze it in Python. It's a fairly straightforward project, but I enjoyed learning more about my LinkedIn network, exploring different ways to present this data, and documenting these insights online with Datapane.

I've made the code available on GitHub, and I'd love to hear how you use and build on it. Finally, if you enjoy exploring the technical side of social media and network analysis as much as I do, feel free to connect with me on LinkedIn or Twitter.

Social media manager on assignment in Germany. Currently pursuing MBA and MS Finance degrees. Let’s connect: https://www.linkedin.com/in/wesleymatthews/