Customer Profiling using Machine Learning on Instagram

As a business, what if you could understand your customers by analyzing the content they follow on Instagram?

With machine learning, you can do just that.

By looking at the profiles that your followers follow, you can get a sense of the types of content they really engage with.

To illustrate this, I scraped the profiles followed by everyone who followed @sportstown, a pool hall and local staple in Orlando, Florida. The profiles selected were either businesses or profiles verified by Instagram. These types of profiles typically posted content that aligned with a common theme.

The idea was to use machine learning (in particular, Latent Dirichlet Allocation) to automatically group these profiles according to shared themes or categories.
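As a rough illustration of how Latent Dirichlet Allocation groups text into themes, here is a minimal pure-Python sketch of a collapsed Gibbs sampler. It is not the code used for the analysis: the function names, the hyperparameters (`alpha`, `beta`, `n_iter`), and the toy captions are all assumptions, and treating each profile's captions as one document is also an assumption about how the input was prepared. In practice a library implementation would be the more sensible choice.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # occurrences of word w in topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        assignments.append(zs)
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token, then resample its topic given all the others.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Most frequent words per topic, i.e. the keywords that characterize it."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Hypothetical profiles: each document is all of one profile's captions, tokenized.
profiles = [
    "food foodie delicious chicken cheese foodie food".split(),
    "delicious cheese chicken food foodporn".split(),
    "fitness workout gym health strong fit".split(),
    "gym workout fitness body strong health".split(),
]
doc_topic, topic_word = lda_gibbs(profiles, n_topics=2)
for keywords in top_words(topic_word):
    print(" ".join(keywords))
```

Each printed line is one topic's keyword list. The number of topics has to be chosen up front, and the human-readable label for each topic is still assigned by hand after reading the keywords.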

These themes are characterized by certain keywords that are found in the captions of the latest 50 posts by each profile. The following topics arose after running the analysis on 42,500 captions from 850 Instagram profiles.

food foodie orlandoeats orlandofoodie delicious chicken foodporn cheese

body fitness workout gym health strong fit

Bikes and Tattoos
harley harleydavidson tattoo motorcycle bike

florida downtownorlando art thecitybeautiful water travel

ball usopen billiards seekers pool

Liquor and Wine
whisky cocktails whiskey bourbon distillery gin wine

dog dogs dogsofinstagram puppy pet animals pets doglover

bassplayersunited hiphop bass rap beat

que uma brasil mais ufc

design collection art piece modern painting

beer florida craftbeer food ipa brewery beers

trulieve hemp marijuana cannabis cbd florida dank sunshinecannabis

que con mejor florida gracias

lash bits lashes palette classic ebony afropunk

Food Culture
foodtruck handcrafted patreon foodporn dinner burger

This means that the folks who follow Sportstown are also into sports (who knew?), fitness, art, and many other interesting topics. These topics are not evenly represented, so I looked at how many profiles were categorized into each one.
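Counting profiles per topic can be sketched as follows, assuming each profile has already been given a list of topic weights by the topic model and is assigned to its single highest-weighted topic. The profile names and weights below are invented for illustration; only the three topic labels come from the list above.

```python
from collections import Counter


def profiles_per_topic(profile_weights, topic_labels):
    """Assign each profile to its highest-weighted topic and count profiles per topic."""
    counts = Counter()
    for weights in profile_weights.values():
        dominant = max(range(len(weights)), key=lambda k: weights[k])
        counts[topic_labels[dominant]] += 1
    return counts.most_common()


topic_labels = ["Food Culture", "Liquor and Wine", "Bikes and Tattoos"]

# Hypothetical topic weights per profile (one weight per topic, same order as the labels).
profile_weights = {
    "profile_a": [0.7, 0.2, 0.1],
    "profile_b": [0.6, 0.3, 0.1],
    "profile_c": [0.1, 0.8, 0.1],
    "profile_d": [0.2, 0.1, 0.7],
    "profile_e": [0.5, 0.4, 0.1],
}

for label, count in profiles_per_topic(profile_weights, topic_labels):
    print(f"{label}: {count}")
```

Assigning each profile to one dominant topic is a simplification, since a profile can mix several themes. Summing the weights per topic instead would be an equally reasonable way to measure representation.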

It’s clear that most followers are local foodies who are into fitness and drinking beer. There’s also a large faction of followers who enjoy art and have a strong affinity for dogs.

This type of information can naturally be used to drive marketing campaigns on Instagram that are based around the topics that Sportstown’s followers respond to the most. If you know what your customers want, you have a higher chance of designing successful campaigns.

To verify these results, I suggest you visit Sportstown. Based on my experience, you’re certain to encounter locals who bring their dogs, tattooed bikers who play music, and fitness aficionados who, surprisingly, drink heavily!

City Culture: A New Perspective

In biology, a culture maintains tissue cells, bacteria, etc. in conditions suitable for growth.

Similarly, we can get a sense of how a city has grown over time using a heatmap of home sale data. The following is a visualization of 4,000 home sales across 40 years in Orlando, FL, U.S., scraped from Zillow. It is remarkable to see how human behavior often resembles life at the microscopic scale.

If you’re familiar with Orlando, you’ll see that the two fastest growing neighborhoods are College Park and Colonial Town, with the Hourglass District and SoDo right behind them.

Commercial developers and potential home buyers can use this type of data to project growth and decide whether to build a business or buy a home in a neighborhood before it becomes saturated.
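The aggregation behind a heatmap like this one can be sketched as a simple binning step: group sales into small latitude/longitude cells and time periods, then count. The record shape, the cell size, the ten-year period, and the sample coordinates below are all assumptions made for illustration, not the actual scraped data.

```python
import math
from collections import defaultdict


def bin_sales(sales, cell_deg=0.01, period_years=10):
    """Count sales per (period start year, grid cell); each sale is (lat, lon, year)."""
    grid = defaultdict(int)
    for lat, lon, year in sales:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        period = year // period_years * period_years
        grid[(period, cell)] += 1
    return dict(grid)


# Hypothetical sales near downtown Orlando: (latitude, longitude, sale year).
sales = [
    (28.554, -81.378, 1985),
    (28.556, -81.376, 1989),
    (28.554, -81.378, 2015),
    (28.571, -81.392, 2016),
]

for (period, cell), count in sorted(bin_sales(sales).items()):
    print(period, cell, count)
```

Comparing a cell's count from one period to the next gives a crude growth signal. A real projection would also need to account for how many homes exist in each cell to begin with.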

The Heartbeat of a City

Cities are living, breathing animals with a pulse.

Each day, traffic flow swells and contracts through the major arteries of a city. The blood that runs through these vessels is teeming with cells of life that do work in the heart of the city to keep it alive and bustling.

In Orlando, FL, U.S., the city center is imbued with life from the two major highways that cut through its heart.

The data for this visualization was obtained from the freely available repository of traffic sensor data provided by the FL Department of Transportation. Below is an interactive map of all the sensor locations in Florida.

University Catalog Visualization

Visualizing a university’s catalog can illuminate aspects of the curriculum design. These characteristics should match our understanding of the disciplines being taught at the school.

As one of the largest universities in the U.S. by enrollment, the University of Central Florida (UCF) has painstakingly built a robust catalog of colleges, departments, majors, and courses to serve approximately 60,000 students each semester. These resources are related in various ways: colleges have departments, departments curate majors, majors are composed of courses, and courses have other courses as prerequisites.

These relationships give rise to implicit connections between the disciplines of study. In what follows, we use a method of laying out these resources called a force layout, which visually positions elements together or apart based on simulated forces that depend on the connections between the elements.
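To make the idea of simulated forces concrete, here is a minimal sketch of a force layout in the style of the Fruchterman-Reingold algorithm: every pair of nodes repels, connected nodes attract, and a cooling "temperature" caps how far a node may move per step. This is not the code behind the visualizations in this post; the constants, the function name, and the tiny graph (including the placeholder majors) are assumptions for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, seed=0):
    """Return {node: [x, y]} after simulating repulsion and spring-like attraction."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    temperature = 1.0
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes pushes apart, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Connected nodes pull together, more strongly when far apart.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, limited by the current temperature.
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.97
    return pos


# Illustrative graph: one college, three departments, one placeholder major each.
nodes = ["College of Sciences", "Mathematics", "Statistics", "Psychology",
         "Major A", "Major B", "Major C"]
edges = [("College of Sciences", "Mathematics"), ("College of Sciences", "Statistics"),
         ("College of Sciences", "Psychology"), ("Mathematics", "Major A"),
         ("Statistics", "Major B"), ("Psychology", "Major C")]

layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Because only connected nodes attract, elements that share many connections end up near one another, while unrelated elements drift apart. That is why positions in such a layout can hint at relationships that are never stated explicitly in the catalog.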

Visualizing a College

A visualization of UCF’s College of Sciences reveals patterns that match our intuitions. Large nodes are departments, medium nodes are majors, and small nodes are courses. Lines connect nodes according to the aforementioned relationships:

As you traverse the graph around the top outer edge going left, you see the manifestation of the purity spectrum of the sciences, hilariously depicted in this xkcd comic:

Another interesting facet of this graph is the central position occupied by the department of statistics. As one moves from theoretical to empirical, the language of science shifts from the mother tongue of mathematics to the dialect of statistics. This medial position corroborates the classification of statistics as the “servant of all science.”

Lastly, we can find political science almost isolated at the bottom, suggesting that it’s a science of a special breed. Indeed, political science is a primarily observational endeavor, not experimental like most of the other sciences, although this isn’t always the case.

Visualizing a Major

Using UCF’s Mathematics major as an example, we can see a rich structure emerge. The intricate set of course requirements induces a structure that reflects the graduated nature of mathematical understanding. Certain concepts must be learned incrementally, within the student’s zone of proximal development, to ensure that the student can digest higher-level material.

In the graph above, the radius of a course corresponds to the number of courses that have it as a prerequisite. From this we can quickly identify the seminal courses in a math degree at UCF: Logic and Proof, Linear Algebra, and Calculus with Analytic Geometry III.
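The sizing rule can be sketched by counting, for each course, how many other courses list it as a prerequisite. The prerequisite map below is a made-up toy (the electives are placeholders, and the links are not UCF's actual requirements), and the linear radius formula is an assumption rather than the scaling used in the graph.

```python
def dependent_counts(prerequisites):
    """Map each course to the number of courses that list it as a prerequisite."""
    counts = {course: 0 for course in prerequisites}
    for required in prerequisites.values():
        for course in required:
            counts[course] = counts.get(course, 0) + 1
    return counts


def node_radius(count, base=4, scale=2):
    """Hypothetical sizing rule: radius grows linearly with the dependent count."""
    return base + scale * count


# Toy catalog: course -> list of its direct prerequisites (illustrative only).
prerequisites = {
    "Logic and Proof": [],
    "Linear Algebra": [],
    "Elective A": ["Logic and Proof"],
    "Elective B": ["Logic and Proof", "Linear Algebra"],
    "Elective C": ["Logic and Proof"],
    "Elective D": ["Linear Algebra"],
}

counts = dependent_counts(prerequisites)
for course in sorted(counts, key=counts.get, reverse=True):
    print(f"{course}: {counts[course]} dependents, radius {node_radius(counts[course])}")
```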

According to Alex Wissner-Gross, intelligence is characterized by the maximization of future freedom of action. To ensure that you have the most freedom of choice in what you can take in following semesters, it makes sense to take these foundational courses early.

By contrast, a psychology degree at UCF has a less rigid structure:

From this graph it’s apparent that once we’ve taken General Psychology, we’ve unlocked access to almost all the courses necessary for a Psychology degree at UCF.
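The notion of "unlocking" courses can be sketched as a check of which courses have all their prerequisites satisfied by what a student has already completed. The catalog below is a hypothetical toy with placeholder electives, not the actual Psychology requirements.

```python
def available_courses(completed, prerequisites):
    """Courses not yet taken whose prerequisites are all in the completed set."""
    done = set(completed)
    return sorted(
        course
        for course, required in prerequisites.items()
        if course not in done and set(required) <= done
    )


# Toy catalog: course -> list of its direct prerequisites (illustrative only).
prerequisites = {
    "General Psychology": [],
    "Elective A": ["General Psychology"],
    "Elective B": ["General Psychology"],
    "Elective C": ["General Psychology"],
    "Capstone": ["Elective A", "Elective B"],
}

print(available_courses([], prerequisites))
print(available_courses(["General Psychology"], prerequisites))
```

In a flat structure like this toy, one introductory course opens up most of the catalog at once. In a deeper prerequisite chain, the available set would grow by only a course or two per semester.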

Possible Extensions with Data

If we had data on student grades and course choices semester after semester, we could identify the choke points of a degree, i.e., the courses where students stall or fail in their progression. We could use this information along with machine learning to improve advising services that help students choose optimal paths through a degree, based on the data of similar students who have succeeded. This data-driven advising could augment or replace curricular suggestions that have historically relied on fallible human intuition, to the benefit of all students.
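As a first step toward finding choke points, one could rank courses by failure rate. The sketch below assumes a simple record format of (student, course, passed) and uses invented records; no such dataset is part of this post, and a real analysis would need far more data and care about small sample sizes.

```python
from collections import Counter


def failure_rates(records, min_attempts=1):
    """Rank courses by the share of attempts that ended in failure, highest first."""
    attempts, failures = Counter(), Counter()
    for _student, course, passed in records:
        attempts[course] += 1
        if not passed:
            failures[course] += 1
    rates = {
        course: failures[course] / attempts[course]
        for course in attempts
        if attempts[course] >= min_attempts
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records: (student id, course, passed?).
records = [
    ("s1", "Logic and Proof", False),
    ("s2", "Logic and Proof", True),
    ("s3", "Logic and Proof", False),
    ("s4", "Logic and Proof", True),
    ("s1", "Linear Algebra", True),
    ("s2", "Linear Algebra", True),
    ("s3", "Linear Algebra", False),
    ("s4", "Linear Algebra", True),
]

for course, rate in failure_rates(records):
    print(f"{course}: {rate:.0%} of attempts failed")
```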