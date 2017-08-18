Amazon Macie automates cloud data protection with machine learning

Amazon promises AWS S3 customers that they will be able to identify and protect sensitive data faster with Macie, but is it enough to catch up to what Microsoft and Google offer?



Amazon offers a number of excellent tools to help enterprises keep their data and applications safe in the cloud. Last year, Amazon unveiled Amazon Inspector, a host-based application vulnerability assessment tool that monitors what is installed and configured on each virtual instance. This year, it’s Amazon Macie, a security service designed to automatically discover and protect sensitive data stored in AWS.

As organizations move more of their data to Amazon’s various cloud offerings, security teams have the unenviable task of continuously tracking the data to identify, classify and protect sensitive pieces of information such as personally identifiable information (PII), personal health information (PHI), regulatory documents, API keys, secret key material and intellectual property.

Amazon Macie automates what has traditionally been a labor-intensive task by using machine learning to understand where sensitive information is stored and how it is accessed. Macie dynamically analyzes all attempts to access data and flags anomalies, such as large amounts of data being downloaded, uncommon login patterns, or data showing up in an unexpected location. Macie can also alert when someone accidentally makes sensitive data externally accessible or stores credentials insecurely.

“Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there, once your data has been classified by Macie, it assigns each data item a business value, and then continuously monitors the data in order to detect any suspicious activity based upon access patterns,” Tara Walker, AWS tech evangelist, wrote on the Amazon Web Services blog.

Macie is currently available only for S3 customers. Support for other AWS data stores will come later in the year.

Understanding Macie

Amazon Macie applies predictive analytics algorithms to authentication data such as location, times of access and historical patterns to develop a baseline for how each piece of data is used. To use Macie, administrators have to enable the appropriate IAM (identity and access management) roles created for the service. Amazon has created sample AWS CloudFormation templates to set up the necessary IAM roles and policies.
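To make the idea of a baseline concrete, here is a deliberately simple sketch in Python. It is not Macie's algorithm, which Amazon has not published in detail. The `AccessEvent` type, its fields and both helper functions are hypothetical names invented for illustration. The sketch only records which locations and hours each user has been seen with, then reports how a new access departs from that history.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical access record; the field names are assumptions."""
    user: str
    country: str  # where the request came from
    hour: int     # hour of day (0-23) of the access


def build_baseline(history):
    """Record which countries and hours each user has been seen with."""
    baseline = defaultdict(lambda: {"countries": set(), "hours": set()})
    for event in history:
        baseline[event.user]["countries"].add(event.country)
        baseline[event.user]["hours"].add(event.hour)
    return baseline


def deviations(event, baseline):
    """Return the ways an event departs from the user's baseline."""
    seen = baseline.get(event.user)  # .get() does not create a new entry
    if seen is None:
        return ["unknown user"]
    reasons = []
    if event.country not in seen["countries"]:
        reasons.append("new location")
    if event.hour not in seen["hours"]:
        reasons.append("unusual time")
    return reasons
```

A real service would presumably weigh such signals statistically rather than treat every unseen hour as suspicious, but the shape is the same: learn what normal looks like, then compare each new access against it.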

Instead of continuously scanning S3 buckets to find new data that needs to be classified, Macie uses event data from AWS CloudTrail to check for all PUT requests into S3 buckets. This way, data is classified automatically as it is added to the buckets. Macie uses the file metadata, the file contents and what it has learned about similar files in the past to properly classify the data. It doesn't rely only on patterns that recognize known data types, such as PII, but can also look at things like source code. After classifying the data, Macie assigns a risk level between 1 and 10, with 10 being the highest risk and 1 the lowest.
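The event-driven flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Macie's implementation. The `eventSource`, `eventName` and `requestParameters` fields follow the CloudTrail record format for S3 object-level events, but the `RULES` table, its risk values and both function names are hypothetical. The sketch also relies purely on pattern matching, which is far cruder than the learned classification the service is said to use.

```python
import re

# Hypothetical rules: (label, pattern, risk on a 1-10 scale).
# The risk values are assumptions chosen for illustration only.
RULES = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), 10),
    ("us_ssn_like", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 8),
]


def new_objects(cloudtrail_records):
    """Yield (bucket, key) for each S3 PutObject record in the log."""
    for record in cloudtrail_records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        if record.get("eventName") != "PutObject":
            continue
        params = record.get("requestParameters") or {}
        yield params.get("bucketName"), params.get("key")


def risk_level(text):
    """Return the highest risk among matching rules, or 1 if none match."""
    matches = (risk for _, pattern, risk in RULES if pattern.search(text))
    return max(matches, default=1)
```

Only objects that show up in the PUT events need to be fetched and scored, which is why this approach avoids rescanning entire buckets.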

