Machine Learning for Entity Resolution on Social Media User Accounts

A convenient way to quickly engineer features using large volumes of data and rapidly construct a pilot exploratory model.

PI.EXCHANGE partnered with this client to provide cyber surveillance functionality by delivering a model trained and deployed as an API endpoint with the AI & Analytics Engine. This enabled the client to provide a "user matching" service to law-enforcement agencies such as the Department of Home Affairs and the Australian Federal Police.

 

Overview

The data collected for this pilot project consists of social media posts obtained by running keyword searches across social-media platforms and applying filtering criteria to the results. In addition to the text of the messages, metadata (such as profile name, profile information, message timestamp and post URL) was also collected. This resulted in around a million messages spanning 10 days.

The client needed a way to automatically analyze a massive volume of cyber surveillance data and to predict and identify potential "matters & persons of interest" with possible malicious intent to conduct cyber attacks. The challenge was complicated by the need to identify persons of interest under multiple aliases and across communication channels. There were also technical constraints: analyzing the data fast enough within a short time frame and on a modest budget; responding to fast-changing dynamics through a self-serve, self-learning and self-adapting system; and meeting the strict standards of operational security & privacy inherent to the client's work.


 

Solving the problem with an AI & Analytics Engine application

The AI & Analytics Engine offers a highly accessible and automated solution to such challenges, using AI-powered automation and recommendations to vastly simplify the process of developing a specialised AutoML application. The client used our rich library of pre-built data connectors to connect their data stream, which was then instantly and automatically analyzed by the Engine, before our Model Recommender offered the 5 best model template options for the client to proceed with. Of these, the client selected 3 and rapidly explored and experimented on the Engine by training all 3 options in parallel to identify the top performers at understanding the data in a cyber security context. The best-performing model was then made available for single-click deployment via the Deployment & Management Engine, and the production instance was deployed in an environment that met the client's high security requirements.
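
As a rough illustration of the "train several candidates in parallel, keep the best" step described above, the following Python sketch may help. It does not use the AI & Analytics Engine or its API. The three candidate learners, the single "profile-name similarity" feature, and the toy data are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up account pairs: (profile-name similarity, 1 if same user else 0).
TRAIN = [(0.9, 1), (0.8, 1), (0.75, 1), (0.6, 0), (0.3, 0), (0.1, 0)]
VALID = [(0.85, 1), (0.65, 1), (0.35, 0), (0.2, 0)]


def fit_majority(rows):
    """Baseline: always predict the most common training label."""
    label = int(2 * sum(y for _, y in rows) >= len(rows))
    return lambda x: label


def fit_midpoint(rows):
    """Threshold halfway between the mean similarity of each class."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= cut)


def fit_nearest(rows):
    """1-nearest-neighbour: copy the label of the closest training pair."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def train_and_score(candidate):
    """Fit one candidate on TRAIN and score it on VALID."""
    name, fit = candidate
    return name, accuracy(fit(TRAIN), VALID)


# Hypothetical stand-ins for the 3 model templates the client selected.
CANDIDATES = [
    ("majority_baseline", fit_majority),
    ("midpoint_threshold", fit_midpoint),
    ("nearest_neighbour", fit_nearest),
]

# Train all candidates concurrently, then keep the top performer.
with ThreadPoolExecutor(max_workers=len(CANDIDATES)) as pool:
    scores = dict(pool.map(train_and_score, CANDIDATES))

best_name = max(scores, key=scores.get)
print(best_name, scores[best_name])
```

In a real setting, the candidates would be full model templates trained on engineered features rather than one-line rules, and the comparison metric would be chosen to suit the matching task.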

Altogether, a fully operational Prediction API was developed in only 5 working days (compared with a typical 1-month turnaround using traditional methods) by leveraging the automation and pre-tuned technology available within the AI & Analytics Engine. This allowed the application development team to rapidly prototype their application, further innovate within the time frame, and ultimately meet their deadline whilst maintaining the highest standards of security & privacy throughout the entire production pipeline.

"We looked to PI.EXCHANGE to analyze large volumes of cyber surveillance data and predict and identify matters and persons of interest. By leveraging the smart and highly scalable automation and pre-tuned technology available within their AI & Analytics Engine, we were able to significantly accelerate insight extraction whilst maintaining the highest standards of security & privacy in a very cost-effective setup."

 

Benefits and Outcomes

  • Delivered a fully operational Prediction API in 5 working days
  • Reduced total development cost
  • Lowered ongoing maintenance cost (no human interaction required)
  • Full high-security & privacy compliance
  • Accelerated insight extraction & analysis by 300%