Recently, I was listening to Cloudcasts and their year in review and predictions for the next year. One thing that particularly struck me was the discussion on security and the impact it will have on the software industry as a whole.
Now, I want to use this blog post to put this a bit into context from my perspective and add some predictions of my own. In the podcast, the story started with the SolarWinds breach and then moved toward “build sec-ops”. While build sec-ops may currently be an underrepresented topic, I believe there is a much bigger area left to talk about.
If you look at the keynote and individual presentations of re:Invent 2020 (please forgive my bias), there is an interesting trend in the area of data management. With the large number of services providing some kind of data analysis and presentation, the integration between all of these services will become even more important. From my experience with customers using products in the AWS analytics space, the approach has very much shifted from using just one service for everything to giving data analysts, data scientists, machine learning engineers, business analysts, basically everyone, access to the right tool for their job.
Let’s have a quick look at the announcements that will have a profound impact on the analytics environment in AWS. Let’s start with “Glue Elastic Views”. This feature is part of the “Glue” ecosystem, and it underlines the actual “glue” piece in the analytics space. “Glue Elastic Views” is positioned as a data integration piece that automatically manages materialized views from several source systems into several target systems. Where customers previously had to manage and update these extracts manually, for example from Amazon DynamoDB to Amazon Redshift, this new service will step in and take care of it. Next up is Amazon Redshift announcing the preview of data sharing between Redshift clusters. Being able to seamlessly access data from multiple Redshift clusters within an account is crucial to giving the individual personas across customer organizations access to the right tool for their different jobs.
A little bit on the side was a quieter announcement of additional features for AWS Lake Formation. While the actual text of the announcement is short, it is worth distilling its pieces.
- There is going to be a new table type in Glue called “governed tables” that supports ACID transactions.
- Automatic compaction and optimization of governed tables in the background - no more small files.
- New APIs for row-level access control and acceleration.
Each of these individual features is probably worth a lot for customers building their data lakes in AWS. And they seem to imply that customers are going to get some tools to enforce security across their lake in an easy way.
The reason why this is important in the context of the “year of security” is that all of these features promote collaboration on, and integration of, the most crucial data that customers store in the systems of the different cloud providers. In particular, Glue Elastic Views bridges a gap between several different services in a novel way, which will create additional requirements for governance, auditing and permission control across those services. One challenge is going to be bridging the gap between the users in the source systems and the consumers in the target systems. It becomes even more cumbersome if the user querying an elastic view in a target system is logically the same user as in the source system, e.g. an RDS instance, and in both cases federates into AWS via Okta or Ping, but the physical users are completely independent and there is no way to properly track resources and permissions across the different AWS services. In the context of Lake Formation, it means that customers who move to a concept of managed tables with managed storage can no longer rely only on the security provided by the underlying storage system like S3, but require additional permissions and auditing capabilities on top of it.
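To make the “logical user versus physical users” problem a bit more tangible, here is a minimal Python sketch of what a cross-service identity mapping could look like. It is purely illustrative: the `ServicePrincipal` and `IdentityRegistry` classes, the user names and the audit events are all hypothetical, and nothing here corresponds to an actual AWS API. The idea is only that audit events from independent services become traceable once each physical user is linked back to one federated identity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ServicePrincipal:
    """A physical user inside one service (hypothetical model)."""
    service: str   # e.g. "rds" or "redshift"
    username: str  # the service-local user name


@dataclass
class IdentityRegistry:
    """Links each physical service user back to one federated identity."""
    owners: dict = field(default_factory=dict)

    def link(self, federated_id: str, principal: ServicePrincipal) -> None:
        self.owners[principal] = federated_id

    def resolve(self, principal: ServicePrincipal) -> Optional[str]:
        # Returns None for users nobody has linked: an "identity island".
        return self.owners.get(principal)

    def principals_of(self, federated_id: str) -> list:
        return sorted(
            (p for p, owner in self.owners.items() if owner == federated_id),
            key=lambda p: (p.service, p.username),
        )


def audit_by_identity(registry: IdentityRegistry, events: list) -> dict:
    """Group (principal, action) audit events by logical user."""
    trail: dict = {}
    for principal, action in events:
        owner = registry.resolve(principal) or "<unlinked>"
        trail.setdefault(owner, []).append(f"{principal.service}:{action}")
    return trail


# Hypothetical setup: one federated user owns two independent physical users.
registry = IdentityRegistry()
registry.link("alice@example.com", ServicePrincipal("rds", "app_alice"))
registry.link("alice@example.com", ServicePrincipal("redshift", "alice_ro"))

# Made-up audit events as they might arrive from two separate services.
events = [
    (ServicePrincipal("rds", "app_alice"), "SELECT orders"),
    (ServicePrincipal("redshift", "alice_ro"), "SELECT orders_view"),
    (ServicePrincipal("redshift", "etl_batch"), "COPY orders_view"),
]
trail = audit_by_identity(registry, events)
```

Without such a mapping, every event would land in the `<unlinked>` bucket, which is roughly the situation described above when each service maintains its own user model.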
For a long time, AWS services were able to each develop their own security model and get away with it. My prediction for the year 2021 is that this is no longer enough. In the same way that enterprise customers have quickly adopted AWS Organizations for resource separation and governance, they will expect similar enterprise-level features for individual data access.
But of course, this is not only true for AWS; it is recognized by Google as well. Debanjan Saha, VP of analytics at Google Cloud, mentions security and governance very high up in his 2021 predictions:
Compliance can’t just be an add-on item. The modern cloud model has to be one that can withstand the scrutiny around data sovereignty and accessibility questions. It’ll change how companies do business and how much of society is run. Even large, traditional enterprises are moving to the cloud to handle urgent needs, like increased regulations. The stakes are too high now for enterprises to ignore the critical components of security and privacy.
Not to forget, this is not just a market for established players: there are smaller players already in the market advertising solutions that make it easy to secure data access across multiple data sources and provide the necessary data governance features. Take Okera, for example. They offer fine-grained data access and policy enforcement across data sources and systems, providing a homogenized security layer that directly integrates into the customers’ identity solutions.
Generally, my hope (rather than prediction) is that identity management in cloud environments will become easier. My own experience tells me that the current authentication and authorization concepts, which lean heavily towards infrastructure management, are not sufficient for the plurality of services that today’s data scientists and engineers want to leverage. Managing n different identity islands is going to be a huge problem, especially if AWS continues to churn out new services at its current pace.
The company or cloud provider that makes it easy to govern and audit data management across multiple data sources in a reliable and transparent way is going to grab a lot of data, which in turn means usage and revenue.
To summarize, as cloud providers launch more services with individually smaller surface areas, the need for homogenized data access, governance and auditing will become even more important. Sharing data from multiple sources with many different targets will mean even more detailed discussions about cross-service security boundaries, governance, and compliance regimes. In my opinion, 2021 will be a great year for innovation in the area of security and data management in the cloud.