When you join a social network, keep in mind that you are going to share a lot of personal information about your real self. That personal data is a beacon that points to you. It is very important to share as little information about yourself as possible: only what is needed to create an account.
There are networks of cybercriminals that use scraping tools to build databases of public profiles and then sell them to the highest bidder. They no longer need to break into databases illegally. Instead, they write programs that extract the data from public profiles.
When the extraction of personal data is permitted
Web scraping is the automatic collection of information (for example, by data collection software) from a website or from other interfaces and functions intended for human users. If you want to do it, you have to start from the premise that some websites do not allow it. For example, point 8.2 of the LinkedIn terms and conditions lists the following among the things users may not do:
Develop, support or use software, devices, scripts, robots or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services;
That is, LinkedIn does not permit developing, supporting or using software, devices, scripts, robots or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology) to scrape the Services or to copy profiles and other data from them.
If the service does not allow it, any extraction is unauthorized. Those who do it anyway work covertly, so that the activity is camouflaged among normal uses. This is the most common scenario.
The case of LinkedIn
Some time ago I wrote about a data breach at the professional social network LinkedIn (see https://avertigoland.com/2021/06/new-security-breach-in-linkedin/). From the outset the company reported that its database had not been breached. It is now known that the person selling the data, TomLiner, obtained it by using the LinkedIn API.
This person collected public data from LinkedIn users for fun and then put it up for sale on a hacking forum: no fewer than 700 million LinkedIn user accounts.
Obtaining personal data using APIs
This user's modus operandi was very simple. For two months he extracted information from the LinkedIn API every day. API stands for application programming interface. An API exposes data that a website offers to its users or customers in a form that programs can download. APIs mainly serve commercial purposes and allow one application to connect with another and share information.
It took so long because he extracted the data little by little: if the system detects that a user is calling the API too heavily, it can ban that user permanently.
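The idea of spacing requests out over time can be sketched in a few lines of Python. This is only an illustration of how a client that is allowed to use an API can stay under its usage limits, not the method TomLiner used. The names fetch_slowly, fetch_page and min_interval_s are hypothetical, and fetch_page stands in for whatever function actually calls the API.

```python
import time


def fetch_slowly(page_ids, fetch_page, min_interval_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_page(page_id) for each id, keeping consecutive calls
    at least min_interval_s seconds apart.

    sleep and clock are injectable so the pacing can be tested
    without really waiting.
    """
    results = []
    last_call = None  # time at which the previous call started
    for page_id in page_ids:
        if last_call is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch_page(page_id))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real API call.
    def fake_fetch(page_id):
        return {"page": page_id}

    print(fetch_slowly([1, 2, 3], fake_fetch, min_interval_s=0.5))
```

With a real service, the interval would come from the rate limits the provider documents, which this sketch does not know about.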
Denying the evidence
LinkedIn's response to this leak has been to deny that it occurred. The company has addressed the news of the leak on its blog and denies that there is a problem, but the use of its API to obtain the data is more than evident.
In fact, TomLiner commented that he somehow managed to trick the LinkedIn API and thus get more results without triggering any kind of alarm.
Using Python to extract personal data
On the internet there are many web pages that explain how to do web scraping using Python, a programming language widely used nowadays.
Whether you're a data scientist, an engineer, or anyone who analyzes large amounts of data, the ability to extract data from the web is a very useful skill. Suppose you find data on the web and there is no direct way to download it. Web scraping with Python lets you extract that data into a useful form that can be imported elsewhere.
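As a small illustration of turning web content into something importable, here is a minimal sketch that reads an HTML table and returns it as CSV text, using only Python's standard library. The sample HTML, the class name TableParser and the function html_table_to_csv are hypothetical, and the data is made up and not personal. Real pages are messier, which is why the tutorials linked in this article use dedicated libraries.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample page, standing in for HTML downloaded from a website.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The CSV text can then be saved to a file and opened in a spreadsheet or loaded by a data analysis tool.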
If you need more information about this you can visit these websites: https://www.datacamp.com/community/tutorials/web-scraping-using-python, https://www.edureka.co/blog/web-scraping-with-python/ and https://realpython.com/beautiful-soup-web-scraper-python/.
Summary
It's a good idea to share as little personal information as possible on any social network you use. Data extraction techniques are used today to collect public data from web pages and build databases that are later sold on the black market.
Programming languages such as Python make it possible to obtain personal data automatically through web scraping. This kind of data extraction cannot be avoided and is often done without the website knowing.
Security in online services has to be increasingly strict. The use of APIs can be a problem, and their security should be improved in the near future. Cases like the 700 million LinkedIn accounts are a clear demonstration of this.
References
- https://www.bbc.com/news/business-57841239
- https://www.youtube.com/watch?v=Bg9r_yLk7VY
- https://www.datacamp.com/community/tutorials/web-scraping-using-python
- https://www.edureka.co/blog/web-scraping-with-python/
- https://realpython.com/beautiful-soup-web-scraper-python/