Writing a “Live Follower Count” Python Program for my Website

An Alternative to traditional web scraping with AppleScript and OCR

By Jack Blair

On my personal website, I have a “live” counter of my total followers across all the different platforms: LinkedIn, GitHub, YouTube, Instagram, Twitter, Medium, and Facebook. I’ve been trying to find ways to gamify social media for myself, and having accurate counts of my followers and data is crucial to that goal.

See it in action here: jack-blair.com

[Image: The “live” follower count on my website.]

Currently, to update this follower count, I manually go to each website, sum up the number, and change it on Framer. However, I’m an adventurous data scientist, so let’s see if we can find a better option. My goal is to create an autonomous script that can automatically visit each platform and tally up the results.

Automating the Scraping Process

Attempt 1: Using Other Platforms

My first thought was to use a third-party platform that could unify all of this data in one place on my behalf. Platforms like https://www.hootsuite.com/ and https://funnel.io/ offer these tools, but they are quite expensive and built for small businesses that are willing to invest in this critical data. For my purposes, I literally just need a script to sum the number of followers I have.

These large platforms also have the huge downside of needing to “connect” all your accounts to them (which basically means giving up your passwords/security to use them). This route is definitely a dead end for my goal.

Attempt 2: Using Social Media APIs

This is the second-worst option. Every single social media platform has its own API, billing, feature set, and rate limits.
For example, there is a Twitter API, a Facebook Graph API, an Unsplash API, and so on. Although they are quite safe, since they are offered by the social media companies themselves, setting up and maintaining each of them would take an enormous amount of time and force me to enter my credit card into even more websites. Additionally, all of your usage is tied to your account, which can lead to bans and abuse. This is not something that I want to do.

Attempt 3: Using Python Libraries

Let’s see if anyone else has tried to tackle this problem before me. Seems like someone made an API for LinkedIn:

GitHub - tomquirk/linkedin-api: Linkedin API for Python (github.com)

There’s also pytube, a YouTube library: https://github.com/pytube/pytube

Twitter has banned third-party API libraries since Elon Musk took over, so it looks like I’ll have to pay for that one. Medium also seems to have one.

Looks like I’ll need to maintain a version of each Python library for every single website I want to count from. Also, some of these libraries work by passing in my exact username and password, which could get my accounts banned down the line. Additionally, I’d have to manage all the updates to these libraries and change my code as these websites update their frontends.

It seems so simple to me: by just visiting the links for each of my social media accounts, I can clearly see how many followers I have. I think there is a better way to solve this problem!

Attempt 4: Using HTML Scraping (Puppeteer, Chromium, etc.)

Now here is where we can start some real coding. I found this code to grab the number of Twitter followers:

from bs4 import BeautifulSoup
import requests
handle = input('Input your account name on Twitter: ')
temp = requests.get('https://twitter.com/'+handle)
bs = BeautifulSoup(temp.text,'lxml')
try:
    follow_box = bs.find('li', {'class': 'ProfileNav-item ProfileNav-item--followers'})
    followers = follow_box.find('a').find('span', {'class': 'ProfileNav-value'})
    print("Number of followers: {} ".format(followers.get('data-count')))
except:
    print('Account name not found...')

Let’s see if it works!

OK, that definitely did not work… Twitter no longer serves this markup to a plain requests call, so BeautifulSoup has nothing to find. Also, it seems like for other websites, the sign-in and join pop-ups block the content that I want to see.

I just need a way to automatically open up my Chrome browser and navigate to the websites that I am already authenticated with. It seems like using an automated browser like Puppeteer or Playwright is not going to be scalable enough.

Attempt 5: Using Open Interpreter

With this in mind, let’s figure out how I can open up Chrome automatically and grab the content. I saw recently that open-interpreter is able to open applications, so let’s ask it to count my LinkedIn followers.

This is great: the page opens correctly. The problem is, there is no way to extract information from the screen. We can’t access the HTML from open-interpreter, and it still can’t screenshot consistently.

As a side note, from here I explored a JS CLI tool called capture-website, which did a good job at screenshotting but fell to the same shortcomings that Puppeteer and the others did: https://www.npmjs.com/package/capture-website

So, we’re able to navigate to all the websites and manually screenshot the page. Where do we go from here?

Attempt 6: Using AppleScript and Tesseract (OCR)

We know we can navigate to each website using AppleScript; that’s how open-interpreter did it under the hood anyway. Now we just need to figure out how to automatically screenshot the pages once we get there. From there, we can use OCR to extract the text from the screen. This is great because if the UIs of these social media pages change, our script will be able to adapt and won’t rely on other libraries to update.

With this plan, let’s write our script.

import os
import cv2
import pytesseract
import pyautogui
import applescript
from tqdm import tqdm

YOUTUBE_LINK = 'https://www.youtube.com/channel/UC-uTdkWQ8doqRwXBlkH67Dw'
LINKEDIN_LINK = 'https://www.linkedin.com/in/jackblair876/'
GITHUB_LINK = 'https://github.com/JackBlair87'
INSTAGRAM_LINK_MAIN = 'https://www.instagram.com/jack.blairr/'
INSTAGRAM_LINK_SECONDARY = 'https://www.instagram.com/jack.bl.ai.rt/'
TWITTER_LINK = 'https://twitter.com/JackBlair87'
MEDIUM_LINK = 'https://medium.com/@jackblair87'
FACEBOOK_LINK = 'https://www.facebook.com/jack.blair.94043/'

PROFILE_LINKS = [{ 'link' : YOUTUBE_LINK, 'platform' : 'youtube', 'account' : 'main' },
{ 'link' : LINKEDIN_LINK, 'platform' : 'linkedin', 'account' : 'main' },
{ 'link' : GITHUB_LINK, 'platform' : 'github', 'account' : 'main' },
{ 'link' : INSTAGRAM_LINK_MAIN, 'platform' : 'instagram', 'account' : 'main' },
{ 'link' : INSTAGRAM_LINK_SECONDARY, 'platform' : 'instagram', 'account' : 'art' },
{ 'link' : TWITTER_LINK, 'platform' : 'twitter', 'account' : 'main' },
{ 'link' : MEDIUM_LINK, 'platform' : 'medium', 'account' : 'main' },
{ 'link' : FACEBOOK_LINK, 'platform' : 'facebook', 'account' : 'main' }]

TOTAL_FOLLOWERS = 0

Let’s first formally define each of the profiles we want to sum. We include an account field in each dictionary so we can sum two or more profiles per platform, like an Instagram main and an Instagram alt.
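The extraction step later calls a helper, convert_youtube_strings_to_values(), whose implementation lives in the repo and isn't shown in this post. As a sketch of what such a helper might look like (my own guess, not the repo's actual code), it would normalize OCR'd counts like '3,706' or '1.2K' into plain integers:

```python
# Hypothetical sketch of convert_youtube_strings_to_values(); the real
# implementation is in the HyperHerd repo and may differ. OCR'd counts
# arrive as strings like "3,706", "1.2K", or "85.3M"; we want integers.
def convert_youtube_strings_to_values(count):
    count = count.strip().replace(',', '')
    multipliers = {'K': 1_000, 'M': 1_000_000, 'B': 1_000_000_000}
    suffix = count[-1].upper()
    if suffix in multipliers:
        # "1.2K" -> 1.2 * 1000 -> 1200
        return int(round(float(count[:-1]) * multipliers[suffix]))
    return int(float(count))
```

For example, '3,706' becomes 3706 and '1.2K' becomes 1200, so counts from different platforms can be summed directly.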
def take_screenshots():
    for link in tqdm(PROFILE_LINKS):
        temp_link = 'screenshots/' + link['platform'] + '[' + link['account'] + '].png'

        # join this path with the current working directory,
        # escaping spaces for the shell command below
        full_link = os.path.join(os.getcwd(), temp_link.replace(' ', '\\ '))
        print(full_link)

        # open the profile in Chrome, wait for it to load,
        # then capture the screen with the macOS screencapture command
        r = applescript.run(f"""
        tell application "Google Chrome"
            activate
            open location "{link['link']}"
            set thePath to "{full_link}"
            delay 5
            do shell script ("screencapture " & thePath)
        end tell
        """)

        # take a second screenshot with pyautogui and save it to the same path
        image1 = pyautogui.screenshot(full_link)
        image1.save(full_link)

        # close the tab we just opened
        r = applescript.run(f"""
        tell application "Google Chrome"
            try
                tell window 1 of application "Google Chrome" to ¬
                    close active tab
            end try
        end tell
        """)

This function iterates over the list of links, navigates to each page, screenshots it, and saves the screenshot. Next, let’s use an OCR library called Tesseract (via pytesseract) to extract the text from the screenshots.

def extract_follower_count(text, platform):
    platform_mappings = {
        'youtube': 'subscribers',
        'instagram': 'followers',
        'twitter': 'followers',
        'facebook': 'friends',
        'github': 'followers',
        'linkedin': 'followers',
        'medium': 'followers'
    }

    if platform in platform_mappings:
        keyword = platform_mappings[platform]
        count_text = text.split(keyword)[0]
        count = list(map(str.strip, count_text.split()))[-1]
        print(f'{platform.capitalize()} count:', count)
        return convert_youtube_strings_to_values(count)
    else:
        print('Platform not supported')
        return None

def ocr_image(image_path, platform):
    global TOTAL_FOLLOWERS
    # Load the screenshot
    img = cv2.imread(image_path)
    # Convert the image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply threshold to convert to binary image
    # threshold_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # Pass the image through pytesseract
    text = pytesseract.image_to_string(gray)
    # Print the extracted text
    # print(text)

    # save the extracted text to a text file
    with open('text_files/' + image_path.split('/')[1].split('.')[0] + '.txt', 'w') as f:
        f.write(text)

    TOTAL_FOLLOWERS += extract_follower_count(text.lower(), platform)

    return text

Example screenshotted image: [screenshot of my LinkedIn profile page]

Example extracted text:

GM OMOUR®) @ oO © © S CG GG CEG 9 sBeg x +

23 https://www.linkedin.com/in/jackblair876/ A small AppleScript to take a
screenshot every 30 seconds for 8...

a: 3 Try 1 month offer i
‘in| Q Search aa _ : yt gist.github.com
My Network Messaging Notifications For Business Premium for $

Profile language V4
English
Public profile & URL V4

www.linkedin.com/in/jackblair876

Ad ee

Get the latest jobs and industry news

i zm Purdue University College
Jack Blair 9 , of Engineering \\
@ PURDUE '26 | @ TJHSST '22 | °. 2x Founder | & Al Researcher |
® Full Stack Developer | »# Eagle Scout Jack, explore relevant opportunities
San Francisco Bay Area - Contact info with Twilio
My Portfolio (7 ( Follow -

\ )

3,706 followers - 500+ connections

t opento J Add profile section )( More )

Open to work o Ron Nachum - ‘st
Software Engineer, System Engineer, Mechanical Engineer and Project Engineer roles Founder | Harvard CS/Stat |
Show details

Other similar profiles

ronnachum.com

Rohin Inani - 2nd
Incoming SWE Intern @

Analytics
© Private to you

$ 845 profile views ul, 4,125 post impressions Q. 339 search appearances Ambarella | CS @ Purdue...
Discover who's viewed your Check out who's engaging with See how often you appear in
profile. your posts. search results. a* Connect
Past 7 days
. Kushagr Khanna - ‘st A Messagin: ow BA
Show all analytics > 9 . we ging

extract_follower_count() finds the correct keyword (followers, subscribers, friends) for each platform and grabs the value that appears just before that word. This works quite well, but to improve it, an LLM could be added as a post-processing step.

Let’s see how this script is doing… As of May 8th, 2024, it perfectly extracts the number of followers I have, even across multiple attempts.

The full source code is in the repo below. You will be able to clone it and add any links you want. I am calling this project HyperHerd, since it relates to “herding” your followers.

GitHub - JackBlair87/HyperHerd (github.com)

What Can Be Improved

Reflection and Limitations

This script works extremely well. I need to set up benchmarking later, but from my own tests it is quite consistent. There are a few downsides, however:

- It only works on Mac.
- The computer needs to be awake, and the user needs to manually start the script.
- It completely stops the user from doing anything else while it is running.
- If there is no WiFi, or slow WiFi, it will break.
- There is no way to track the followers per platform over time.

In an ideal world, this code could be executed on a server on a set time interval and stored in a database in the cloud, which could then be queried or pushed out to any number of sources. However, since the user needs to be signed into all of the services, with the correct cookies and session information to prevent bans, it seems like staying on-device is the correct solution.

Improvements

To look further into this, I want to explore using a Chrome extension instead of AppleScript to complete this process. From what I understand, it has a few main advantages. Our core operations are to open Chrome, open a link in a new tab, screenshot the webpage (or scrape the HTML), perform OCR, and parse the response.
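For reference, in the current on-device version those operations reduce to a small driver. This is a sketch, not the repo's actual entry point: screenshot_path() mirrors the file naming used inside take_screenshots(), and run_pipeline() takes the two functions shown earlier as parameters so the sketch stays self-contained.

```python
# Hypothetical driver sketch; the HyperHerd repo's real entry point may differ.
def screenshot_path(profile):
    # Mirrors the naming used inside take_screenshots()
    return 'screenshots/' + profile['platform'] + '[' + profile['account'] + '].png'

def run_pipeline(profile_links, take_screenshots, ocr_image):
    # Open Chrome, visit each link, and screenshot (all inside take_screenshots)
    take_screenshots()
    # Then OCR each saved screenshot and parse out the follower count
    for profile in profile_links:
        ocr_image(screenshot_path(profile), profile['platform'])
```

Calling run_pipeline(PROFILE_LINKS, take_screenshots, ocr_image) would leave the summed total in TOTAL_FOLLOWERS.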
Since Tesseract has a JavaScript port (Tesseract.js), all of those operations are possible from a Chrome extension. That would solve the cross-platform issue and fix not being able to do anything else while the script is running. Additionally, we could run the extension in the background at set intervals and have it store the data in localStorage (or Firebase) to track follower counts over time.

If the OCR introduces too many errors, a Chrome extension also grants us access to the raw HTML, which, combined with HTML-to-Markdown conversion, can be exported and parsed, greatly increasing accuracy.

Either way, a Chrome extension seems like the intuitive next step for the project, and I’ll write a part 2 outlining it once I build it! In the meantime, if you have any advice or questions, feel free to leave a comment below.

Also, make sure to follow me on all platforms to help me test my script!! :)

P.S. Life is not about follower count. But it is about data. ❤️

Tags: Web Scraping, AppleScript, Follower Count, Scripting, Python Scraping