Many early adopters started using online services without knowing that their data could be tracked. When Google started as a search engine built on page ranking, it almost certainly had not envisioned search personalisation. When Facebook started as a mere social network, personalised ads and recommendations across all its platforms might not have been the goal. What was the turning point, and how did these giants start making profits through data?
Personal Data and Anonymisation
Traditionally, in the non-digital era, anything that could identify an individual was considered Personally Identifiable Information (PII). This includes attributes like name, portrait or photograph, government-issued IDs, physical attributes and health records. Some seemingly non-personal data, like purchase records and travel records, can also become PII. Today, many more attributes can be treated as PII. IP addresses, device identifiers, account names, user handles, email addresses and phone numbers have effectively become digital fingerprints and thus PII.
In a hypothetical village with a population of 100, seemingly non-personal information like shoe size or hair length can become PII. In some cases, a set of attributes known together can act as personally identifiable information. It is possible that in a given educational institute, there is only one student with green eyes. Knowing those two attributes, namely the institute’s name and the eye colour, can reveal the otherwise “anonymised” information.
Removing all personally identifiable information from a data set is called data anonymisation. Once the data is correctly anonymised, there is no way to trace it back to the original data subject. There is also a different process called pseudonymisation, where PII is mapped to a different, unidentifiable attribute. For example, a name could be converted into a randomly generated number. Unlike anonymisation, this process makes it possible to trace the data back to the user through the mapping. Pseudonymised data can be considered anonymous only if access to the attribute-pseudonym mapping is restricted.
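The green-eyed student example can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the helper `unique_combinations` are all made up for this sketch. It counts how many records share each combination of attributes and flags those whose combination is unique.

```python
from collections import Counter

def unique_combinations(records, attributes):
    """Return the records whose combination of the given attributes is unique."""
    def key(record):
        return tuple(record[a] for a in attributes)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

# Hypothetical "anonymised" data: names removed, other attributes kept.
students = [
    {"institute": "A", "eye_colour": "brown", "grade": 82},
    {"institute": "A", "eye_colour": "brown", "grade": 67},
    {"institute": "A", "eye_colour": "green", "grade": 91},
    {"institute": "B", "eye_colour": "green", "grade": 74},
    {"institute": "B", "eye_colour": "green", "grade": 58},
]

# Neither attribute alone singles anyone out, but the pair does.
exposed = unique_combinations(students, ["institute", "eye_colour"])
print(exposed)
```

In this made-up data, eye colour alone or institute alone matches at least two students, but the two together point to exactly one record.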
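A minimal sketch of pseudonymisation, under the assumption that names are replaced by random numbers and the mapping is held by a single trusted component. The class `Pseudonymiser`, the sample names and the purchase data are hypothetical and exist only for illustration.

```python
import secrets

class Pseudonymiser:
    """Replaces names with random numbers and keeps the mapping private."""

    def __init__(self):
        # These two tables are the sensitive part: whoever can read them
        # can trace every pseudonym back to a person.
        self._name_to_pseudonym = {}
        self._pseudonym_to_name = {}

    def pseudonymise(self, name):
        if name not in self._name_to_pseudonym:
            pseudonym = secrets.randbelow(10**9)
            while pseudonym in self._pseudonym_to_name:  # avoid collisions
                pseudonym = secrets.randbelow(10**9)
            self._name_to_pseudonym[name] = pseudonym
            self._pseudonym_to_name[pseudonym] = name
        return self._name_to_pseudonym[name]

    def reidentify(self, pseudonym):
        """Only possible for someone with access to the mapping."""
        return self._pseudonym_to_name[pseudonym]

records = [("Asha", "shoes"), ("Ravi", "books"), ("Asha", "umbrella")]
p = Pseudonymiser()
released = [(p.pseudonymise(name), item) for name, item in records]
print(released)
```

The released list carries no names, yet the same person always gets the same number, so the holder of the mapping can still call `reidentify`. That is why restricting access to the mapping matters.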
With the increased availability and usage of internet services, digital personal data collection has become widespread. A fairly well-versed person might stay away from creating accounts and try using incognito or private browsing. But that still does not account for the small blocks of tracker code added to almost every website in the name of ads or analytics. A slightly more advanced user might use content blockers. But what about the web browser itself and, lastly, the internet service provider? What about the little voice assistants lurking in corners, waiting for your command? And finally, there is the ever-improving facial recognition technology and the increased use of cameras, which can pave the way to surveillance at an unprecedented level.
Data is New Gold
Recommendations have always existed. Local bookshop owners and librarians have always recommended books to regular customers depending on their interests. The restaurant next door remembers regular customers and serves their food without them ordering. In these cases, there is a personal connection between the people involved. Also, they don’t give recommendations to strangers. The bookshop or restaurant owners exercise that discretion on their own.
The improvements in Artificial Intelligence and Machine Learning have made it possible to create complex data models. Since machines process the data, there is no discretion regarding whose data is being processed and to whom the recommendations should be given. There is no stranger in the digital world. Unlike the shop owner, machines do not distinguish between a regular and a stranger.
Today, there are a lot of companies that act as data collectors, that is, entities that collect personal information. At times, these companies store and process the data themselves to enhance user experience. Some companies might sell this precious data to a third party for processing. A lot of these third-party companies use the data for targeted marketing. A few might use it for scamming. There is a third possibility, where a data breach happens at the data collector’s or processor’s site. This is the most dangerous aspect, since no one has control over what happens to that data afterwards. The possibility of a breach makes data only as secure as the site that holds it.
Let us focus on just the aspect of selling user data. A lot of businesses, including Google and Facebook, provide many services for free. These companies are not non-profits. How do they make their services profitable? Targeted advertising is one of the many ways. With trackers across websites, extensive user data collected over the years and highly sophisticated user profiling models, these companies are able to facilitate individually targeted advertising. Though they may not directly sell their data, they use it to let advertisers target specific user segments.
On the positive side, all this personal information can provide a customised and “better” user experience. Personalised search results and relevant ads can save time and add convenience. In the medical field, user profiling can lead to better prognosis and save lives. Tracking and surveillance can also be used to reduce crime. At first sight, it looks like profiling is good and useful.
But by now, many would have already lost their privacy. There are very few regulations and laws in place that deal with digital data privacy. Most of the regulations are in their infancy, and some countries, like India, are still working towards creating them. The biggest problem is the lack of consent for data collection and processing. Though this is changing, most services do not disclose what data is being collected and how it is processed and used. The option to opt in to or opt out of tracking is either non-existent or extremely hard to find. There is also a lack of clarity on where and for how long the data is stored and processed. Ironically, the subject, the very person the data is about, ends up having no control over it.
Some might say that they have nothing private or worth hiding. Sure, they might be fine with all sorts of customisation and targeted advertising. But where does one draw the line? Are they okay sharing all their financial information? Do they mind getting scammed? Are they alright with giving away their phone numbers and government IDs? Does being under audio and video surveillance all day not bother them? We need regulations in place soon to ensure that valuable personal information is not misused and abused.
All major technology companies collect our data. Google collects everything, including every click and voice command, on almost all its products, on Android devices and through website trackers. Facebook collects data in a similar way. Amazon collects browsing and purchase history, and voice commands through Alexa. Netflix tracks a user’s watch and search history. Apple collects usage data from its devices and voice commands through Siri.
Some of these giants provide a level of transparency about where the data is stored and processed, while some don’t. Some provide extensive privacy settings and some make the settings hard to find. Unless one still lives the way they did 20 years ago and has never opened a full-featured browser, their privacy is already lost. All we can do is protect the future.
By using applications and services like email, messaging and social networks, we are already opening up our personal information. It is possible to limit the data we let them collect by minimising usage, being careful with what we share there and using the privacy settings they offer. But these settings don’t cover digital fingerprints and data obtained through tracking via advertisements and analytics.
All major browsers provide an incognito or private mode. Surfing in this mode does not save history, but it does not stop websites from learning your digital fingerprint, such as IP address, browser details, operating system and device details. Google Chrome has settings to block third-party cookies and can also send a “Do Not Track” request. Firefox provides similar features. However, it is up to the website to heed the request. Going a step further, Safari presents only a simplified version of the system configuration to avoid fingerprinting and also has settings to stop third-party tracking. All these browsers offer extensions that make them more secure. The most privacy-centric browser today is Tor Browser, developed by the non-profit Tor Project. Tor provides this security at the cost of speed.
A secure browser may be able to solve the problem of privacy to an extent, but the search engine might also be a culprit. Google Search is known to keep track of user searches and clicks. Microsoft’s Bing also uses data to enhance user experience. DuckDuckGo is the most privacy-centric search engine and does not track user searches.
There is one last point where privacy can be compromised, and that is the Internet Service Provider (ISP). Every single interaction on the internet is routed through the ISP. The ISP can potentially collect and use this interaction data. Circumventing this is usually not something the average user does. Using Tor Browser largely addresses this for web browsing. Otherwise, one has to change their Domain Name System (DNS) resolver or use a Virtual Private Network (VPN) to access the internet.
For both the data collector and the data subject, it is important to know where the data is stored. Country boundaries matter when it comes to storing and processing data. It is important to keep sensitive and personal information within a country for legal and security reasons. Thus, this is an important aspect when choosing which services to entrust your data with. Supporting homegrown software and services helps keep data inside the borders.
With the EU General Data Protection Regulation (GDPR) in force for a couple of years and the California Privacy Rights Act (CPRA) under way, businesses have already started working towards data privacy. India is drafting the Personal Data Protection Bill. First a committee headed by BN Srikrishna, and now a committee chaired by Kris Gopalakrishnan, have submitted reports. While countries work towards regulations, individuals need awareness. Companies lauded as heroes today may become tomorrow’s villains.
(The author is CEO, vConnects, and has worked at Apple, USA, as a Senior Software Engineer. Views are purely personal)