Shifting Left with Lightlytics

The public cloud continued to dominate spending in 2021. Gartner forecasts worldwide end-user spending on public cloud to reach $397.4 billion by 2022. With increased velocity, automation continues to be a critical business imperative for the enterprise. Getting automation right means pulling all the appropriate teams into the process early on (shifting left). Lightlytics is a new SaaS product on the market that aims to make DevOps for cloud infrastructure as agile as software delivery.

What is Shifting Left?

Shift left is the practice of getting adequate testing completed early, often, and with the right team engagement. Instead of bringing in cloud security practitioners to remediate an environment that is in QA and getting ready to go to Prod, shifting left integrates security policy as early as the initial development pull requests (or even commits) to version control. If you can’t merge risky infrastructure-as-code in the first place, it will never make it to Prod!

Getting Started

Lightlytics is a SaaS platform that empowers SRE + DevOps to automatically predict, pre-empt, and prevent failures, downtime, or business disruption caused by infrastructure. Getting started is pretty simple as they offer a 14-day free trial (no credit card required). Let’s take it for a spin!

Adding an AWS Account

To get started, you need to add an AWS account. This is done by simply providing your Account ID, choosing a Display Name, and selecting the AWS regions you have infrastructure deployed to. Lightlytics creates an IAM Role for Read Access using a CloudFormation Stack.

To keep your posture up to date in real time, your account must have CloudTrail configured with a Management Events trail that applies to all regions. You can then enable real-time collection of configuration events and updates, meaning Lightlytics stays current with infrastructure changes as they happen. This is enabled through an additional CloudFormation Stack.
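If you manage the account with Terraform, the CloudTrail prerequisite could be sketched roughly as follows. This is a minimal sketch, not something taken from the Lightlytics documentation: the resource names and the `aws_s3_bucket.trail_logs` reference are hypothetical, and the bucket (plus a bucket policy that lets CloudTrail write to it) is assumed to be defined elsewhere. As far as I know, a trail with no event selectors records management events by default.

```hcl
# Hypothetical names; assumes aws_s3_bucket.trail_logs exists elsewhere
# with a bucket policy that allows CloudTrail to deliver logs to it.
resource "aws_cloudtrail" "management_events" {
  name           = "management-events"
  s3_bucket_name = aws_s3_bucket.trail_logs.id

  # Apply the trail to every region, not just the one it is created in.
  is_multi_region_trail = true

  # Also capture events from global services such as IAM.
  include_global_service_events = true
}
```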

A Common Scenario

Security-focused teams in enterprises generally want to decrease or eliminate direct internet exposure (especially from EC2 instances). Somehow, it still ends up happening, and reactive projects are scoped out to remediate it. Let’s put this scenario through its paces with Lightlytics and see how it might prevent us from introducing that exposure.

Some Risky Configuration

The following security group is referenced in my aws_network_interface configuration, which is attached directly to my EC2 instance. I also set up an aws_route with a destination CIDR of 0.0.0.0/0 pointing to an IGW. This configuration allows both ingress and egress internet traffic for the EC2 instance.

resource "aws_security_group" "allow-all" {
  name        = "allow-all"
  description = "Allow all ingress/egress"
  vpc_id      = aws_vpc.vpc.id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}
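For context, the route and network interface described above might look something like the sketch below. Only the security group is shown in the original configuration, so everything here is an assumption for illustration: the names `igw`, `default`, `eni`, `aws_route_table.public`, and `aws_subnet.public` are hypothetical, and the attachment of the interface to the EC2 instance is left out.

```hcl
# Hypothetical sketch; assumes aws_vpc.vpc, aws_route_table.public, and
# aws_subnet.public are defined elsewhere in the same configuration.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.vpc.id
}

# Default route: send all IPv4 traffic to the internet gateway.
resource "aws_route" "default" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

# Network interface that references the wide-open security group.
resource "aws_network_interface" "eni" {
  subnet_id       = aws_subnet.public.id
  security_groups = [aws_security_group.allow-all.id]
}
```

It is the combination of the open security group and the default route to the IGW that creates the exposure, so a review that looks at either resource in isolation can miss it.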

Security Today is Reactive

One trend I’ve noticed with cloud security in the enterprise is that the response to risky configuration tends to be reactive. Bad configuration accumulates over a period of time, followed by an audit. The audit yields actionable insights, and a remediation effort is planned. Remediating production infrastructure is never easy, as an outage is usually incurred. How could we get security in the loop so they can see and understand the scope of a change before it ever gets any legs?

Simulation with GitHub Actions Integration

This GitHub Actions workflow is slightly modified from the Lightlytics documentation. Every time an attempt is made to merge infrastructure changes into main, the workflow kicks off and simulates the proposed changes in Lightlytics. It works by running terraform plan and sending the plan output to Lightlytics on each designated trigger (push/pull request). A link to the simulation is generated directly in the Git flow. In a subsequent release, Lightlytics plans to add the ability to automatically fail pull requests if a violation is detected.

---
# Modified from https://docs.lightlytics.com/docs/github-action
name: simulation
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  terraform-simulation:
    runs-on: ubuntu-latest
    name: Lightlytics
    steps:
      - uses: actions/checkout@v2
      - uses: hashicorp/setup-terraform@v1
        with:
          terraform_wrapper: false

      # Additional step added to setup AWS credentials
      - name: Setup Credentials
        run: |
          mkdir -p ~/.aws
          echo "[default]" > ~/.aws/credentials
          echo "aws_access_key_id = ${{ secrets.ACCESS_KEY }}" >> ~/.aws/credentials
          echo "aws_secret_access_key = ${{ secrets.SECRET_KEY }}" >> ~/.aws/credentials
      - name: Terraform Plan
        run: |
          terraform init
          terraform plan -lock=false -out ./terraform.plan
          terraform show -json ./terraform.plan > ./plan.json

      # Converted collection-token to GH secret
      - uses: lightlytics/publisher@v1.1
        id: ll-publisher
        with:
          plan-json: ./plan.json
          ll-hostname: ${{ secrets.LIGHTLYTICS_HOSTNAME }}
          collection-token: ${{ secrets.LIGHTLYTICS_TOKEN }}
          github-token: ${{ secrets.GH_TOKEN }}
...

Entering DevSecOps

This is valuable because you could enforce default protections on every new git repository you create (complete with security approvers). The security practitioner could then click on the link and review an easy-to-follow simulation outlining the impact radius before approving and merging the changes. Can you say DevSecOps?
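One way to codify those default protections is with the Terraform GitHub provider. The sketch below is an illustration under assumptions, not part of the Lightlytics setup: it assumes the `integrations/github` provider is configured, that the repository is managed as a hypothetical `github_repository.infra` resource, and that security approvers are listed in a CODEOWNERS file in that repository.

```hcl
# Hypothetical sketch; assumes github_repository.infra is defined elsewhere.
resource "github_branch_protection" "main" {
  repository_id = github_repository.infra.node_id
  pattern       = "main"

  # Require at least one approval, including one from a code owner
  # (for example, the security team named in CODEOWNERS).
  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true
  }

  # Apply the same rules to repository administrators.
  enforce_admins = true
}
```

With something like this in place, a change to infrastructure code cannot reach main until a designated approver has had the chance to open the simulation link and review it.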

Discovering Brownfield
#

Things have a habit of slipping through the cracks. Lightlytics Discovery decomposes all the dependencies between services, containers, or other common infrastructure you may have deployed. What if I need to find all EC2 instances that have internet ingress or egress in an unsupported design? I can narrow the search down using Search Paths in the Discovery dashboard. Using Internet as the Source, I can set the Destination to a Resource Type (in this example, EC2).

Why use Lightlytics?

There are a lot of great open-source projects that scan Terraform HCL or plan files and can accomplish most of what was outlined in this blog. Why would I turn to something like Lightlytics?

Barrier to Entry

There is a reason SaaS is the largest segment of the public cloud services market. Lightlytics requires minimal setup, integrates quickly with version control, and adds immediate value with thoughtful change review. The more open-source you go, the more customization is required to get the desired results. Furthermore, you must take steps to understand who maintains an open-source project and how it is funded, and assess the risk of it being abandoned.

Going beyond Terraform State

Many products I’ve tested for static code analysis are effective only with infrastructure that is managed in Terraform State. Lightlytics takes this further by simulating changes against the entire infrastructure contained in the AWS account. If I have a brownfield environment deployed outside of Terraform State, I want to make sure the influx of new changes doesn’t negatively impact it.

Agentless Approach

If you’ve read my past blogs, you’ll know I’m partial to approaching as many problems as possible without agents or appliances, especially in the cloud. In the CSPM space, products generally use some combination of API calls, cloud logs, proprietary agents, or appliances. Lightlytics uses API calls along with integrations with cloud-native features like CloudTrail and even VPC Flow Logs to add data-plane context.

With this approach, there is no reliance on scheduled or periodic scans of your infrastructure or git repositories. The posture is updated as changes happen. The following .gif was taken as I ran a terraform apply. Events were populating on the Lightlytics dashboard as they were completed in the Terraform plan output in real time. There is value here in that no gap exists between new infrastructure being provisioned and operational posture getting updated. This is what makes the Simulation piece compelling since you know that it will be running against a completely up-to-date picture of your entire infrastructure.

Why Agentless?

In some circumstances, agents are a must. This holds true especially when an in-depth perspective into an asset’s OS, kernel, and processes is required. As the shift to immutable infrastructure continues, the need for this is minimized as resources like VMs are not long-lived. Leveraging cloud native API calls and logging provides a seamless union allowing for better integration and correlation with native provider automation and enforcement mechanisms.

Conclusion

For infrastructure-as-code, taking a proactive approach by shifting left and catching things in the build pipeline is the ultimate security. Today, many products tend to be reactive, which seems to be the modus operandi of security. Lightlytics has created a solid foundation that can provide value to CloudOps and Cloud SecOps teams that want to go fast without leaving availability and security behind.

As of this writing, Lightlytics supports AWS and has multi-cloud on the roadmap. In addition to supporting more clouds, the team is working on the ability to enforce custom-made rules, industry best practices, and business logic (architectural standards) as part of the GitOps flow. You can learn more about the vision and team here.
