Importance of logging
We all wish that our applications would always run perfectly and that we would never need to investigate failures. But given that nowadays we are integrating with more and more third-party services, handling heterogeneous data, and hosting in environments that don’t offer 100% uptime, we still need a robust logging system. Capturing enough information about what our application is doing is crucial to discovering and fixing issues in a timely manner, and we may also have legal requirements around the traceability of data in our applications.
There is, however, a very tangible price to be paid in terms of log storage. With enough usage in the system over enough time, we may find that our logs start to occupy many gigabytes of storage. While this might be more acceptable for a solution that is installed on premises, more and more applications are being deployed in the cloud, where providers typically charge not only for storing log data but also for ingesting and querying it.
Minimizing log storage costs
Let’s take the example of an application that is deployed in AWS and uses CloudWatch to ingest log data. Logs that are generated by any given application are stored in log groups and, inside these, the log events are organized into log streams, each of which holds the events coming from a single source (for example, one instance or one Lambda execution environment).
Log groups can have a retention period configured, which ensures that any older logs are automatically deleted. This option can be set to anything from 1 day to 10 years, but by default it is set to none, meaning that logs will be saved indefinitely. One obvious step here is to make sure we configure an appropriate retention period to avoid paying for storage that is no longer needed. But what about scenarios in which we actually do need to save this data for legal/security requirements?
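As a minimal sketch of setting a retention period through the SDK, the snippet below assumes boto3 is installed and credentials are configured. The helper names are hypothetical, and the list of accepted values reflects the CloudWatch Logs documentation at the time of writing, so it is worth double-checking before relying on it.

```python
# Retention values (in days) that CloudWatch Logs accepts, per the AWS docs
# at the time of writing. Treat this list as an assumption and verify it.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def retention_request(log_group_name, wanted_days):
    """Build put_retention_policy arguments, rounding up to an allowed value."""
    if wanted_days < 1:
        raise ValueError("retention must be at least 1 day")
    for days in ALLOWED_RETENTION_DAYS:
        if days >= wanted_days:
            return {"logGroupName": log_group_name, "retentionInDays": days}
    raise ValueError("more than 10 years requested; leave the group at 'never expire'")


def apply_retention(log_group_name, wanted_days):
    """Apply the retention period to a log group (makes a real AWS call)."""
    import boto3  # imported lazily so retention_request works without the SDK

    logs = boto3.client("logs")
    logs.put_retention_policy(**retention_request(log_group_name, wanted_days))
```

Calling apply_retention("/aws/lambda/my-function", 45) would, under these assumptions, set the (hypothetical) log group to 60 days, the nearest accepted value that still covers the requested period.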
For cases like these, CloudWatch offers the option of exporting log data to S3, where it can then be transitioned to a long-term, low-priced cold storage class. The data can still be accessed there, albeit not as quickly as regular logs. The export can be set up via the Console, CLI, or SDK.
1️⃣ Step 1 – Create an S3 bucket
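It is recommended to have a dedicated bucket for storing archived logs; this way, access to the log data can be controlled more tightly. Start by creating an S3 bucket in the same Region as the log groups you want to export. AWS documents this as a requirement for export tasks, and keeping everything in one Region also keeps the export quick. The bucket additionally needs a policy that allows the CloudWatch Logs service to write objects to it.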
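The bucket setup could be sketched as follows, assuming boto3 and configured credentials. The helper names are hypothetical, and the policy is modeled on the shape AWS documents for CloudWatch Logs exports (a Regional logs service principal with s3:GetBucketAcl and s3:PutObject), so it should be checked against the current documentation before use.

```python
import json


def create_bucket_request(bucket_name, region):
    """Build create_bucket arguments for the given Region."""
    request = {"Bucket": bucket_name}
    # us-east-1 is the S3 default and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def export_bucket_policy(bucket_name, region):
    """Policy letting CloudWatch Logs write exports (assumed shape, verify it)."""
    principal = {"Service": f"logs.{region}.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


def create_export_bucket(bucket_name, region):
    """Create the bucket and attach the policy (makes real AWS calls)."""
    import boto3  # imported lazily so the builders above work without the SDK

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_request(bucket_name, region))
    policy = export_bucket_policy(bucket_name, region)
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

Keeping the request builders separate from the AWS calls makes it possible to review or unit test the exact payload before anything is created.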
2️⃣ Step 2 – Set up lifecycles on the bucket
S3 lets us configure lifecycle rules for the objects in a bucket, with fine-grained control over which prefixes a rule applies to and how many days after creation the objects move to a different storage class. There are several storage classes available, but for this scenario we can go with Glacier, since it meets the requirement of infrequent access and affordable storage.
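A lifecycle rule of this kind might look like the sketch below, assuming boto3 and configured credentials. The function names, the rule ID, and the "exported-logs/" prefix are illustrative choices rather than anything prescribed by AWS.

```python
def glacier_lifecycle(prefix, days_before_transition, storage_class="GLACIER",
                      rule_id="archive-exported-logs"):
    """Build a lifecycle configuration that moves objects under `prefix`."""
    if days_before_transition < 0:
        raise ValueError("days_before_transition cannot be negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_transition, "StorageClass": storage_class}
                ],
            }
        ]
    }


def apply_lifecycle(bucket_name, prefix, days_before_transition):
    """Attach the rule to the bucket (makes a real AWS call)."""
    import boto3  # imported lazily so glacier_lifecycle works without the SDK

    s3 = boto3.client("s3")
    # Note: this call replaces any lifecycle configuration already on the bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=glacier_lifecycle(prefix, days_before_transition),
    )
```

Passing storage_class="DEEP_ARCHIVE" instead would target Glacier Deep Archive, trading slower retrieval for lower storage cost.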
3️⃣ Step 3 – Trigger export task
Once we have the bucket set up and configured, it is time to trigger the export of logs from the log groups. This is as easy as creating an export task in which we specify what range of logs to export and what bucket to export them to.
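To make the shape of an export task concrete, here is a minimal sketch assuming boto3 and configured credentials. The helper names, the task naming scheme, and the default prefix are hypothetical; the time range is passed to CloudWatch Logs as milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    return int(moment.timestamp() * 1000)


def export_task_request(log_group, bucket, start, end, prefix="exported-logs"):
    """Build create_export_task arguments for the window [start, end]."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "taskName": f"export-{start:%Y%m%dT%H%M%S}",  # illustrative naming scheme
        "logGroupName": log_group,
        "fromTime": to_epoch_millis(start),
        "to": to_epoch_millis(end),
        "destination": bucket,  # bucket name, not an ARN
        "destinationPrefix": prefix,
    }


def start_export(log_group, bucket, start, end):
    """Create the export task and return its ID (makes a real AWS call)."""
    import boto3  # imported lazily so the builders above work without the SDK

    logs = boto3.client("logs")
    response = logs.create_export_task(
        **export_task_request(log_group, bucket, start, end)
    )
    return response["taskId"]
```

The returned task ID can be used to check on progress later, since the export runs asynchronously.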
The export tasks can be triggered as frequently as needed, but it is worth taking the log retention period into consideration so that we don’t end up missing log data in the export. For automation purposes, we can create this task from a Lambda function that runs on a schedule.
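A scheduled Lambda for this could be sketched as below. It assumes the function is invoked once a day by a schedule rule, that its role may call CreateExportTask, and that the environment variables LOG_GROUP_NAME, EXPORT_BUCKET, and EXPORT_PREFIX (all hypothetical names) are set on the function.

```python
import os
from datetime import datetime, timedelta, timezone


def previous_day_window(now):
    """Return (start_ms, end_ms) covering the full UTC day before `now`."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = midnight - timedelta(days=1)
    return int(start.timestamp() * 1000), int(midnight.timestamp() * 1000)


def handler(event, context):
    """Lambda entry point: export yesterday's logs to the archive bucket."""
    import boto3  # available in the Lambda Python runtime

    start_ms, end_ms = previous_day_window(datetime.now(timezone.utc))
    logs = boto3.client("logs")
    response = logs.create_export_task(
        taskName=f"daily-export-{start_ms}",
        logGroupName=os.environ["LOG_GROUP_NAME"],
        fromTime=start_ms,
        to=end_ms,
        destination=os.environ["EXPORT_BUCKET"],
        destinationPrefix=os.environ.get("EXPORT_PREFIX", "exported-logs"),
    )
    return {"taskId": response["taskId"]}
```

Exporting one fixed day per run keeps the windows from overlapping or leaving gaps. AWS documentation also notes that log data can take some hours to become available for export and that an account can run only one export task at a time, so scheduling the run well after midnight UTC and exporting log groups one after another is the cautious choice.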
Once the export task has run, we will be able to access the logs from our S3 bucket, with some time considerations depending on the storage class we’ve chosen to transition to. For example, retrieval from Glacier Deep Archive will take longer than from Glacier.
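Objects that have already moved to a Glacier class need a restore request before they can be downloaded. The sketch below shows what that could look like, assuming boto3 and configured credentials; the helper names and the defaults are hypothetical.

```python
RETRIEVAL_TIERS = ("Expedited", "Standard", "Bulk")


def restore_request(bucket, key, days_available=7, tier="Standard"):
    """Build restore_object arguments for an archived log object."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"tier must be one of {RETRIEVAL_TIERS}")
    if days_available < 1:
        raise ValueError("the restored copy must be kept for at least 1 day")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Days": days_available,  # how long the temporary copy stays readable
            "GlacierJobParameters": {"Tier": tier},
        },
    }


def request_restore(bucket, key, days_available=7, tier="Standard"):
    """Ask S3 to restore an archived object (makes a real AWS call)."""
    import boto3  # imported lazily so restore_request works without the SDK

    s3 = boto3.client("s3")
    s3.restore_object(**restore_request(bucket, key, days_available, tier))
```

The restore is asynchronous: the object becomes readable only once the job finishes, and how long that takes depends on both the storage class and the tier chosen.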
It’s always important to take into consideration storage costs when dealing with large volumes of data, and, as we have seen, AWS offers appropriate mechanisms for minimizing these costs. And clients will be grateful that we are thinking about the costs they will incur over the long run.
About Iulia Dormenco
Iulia Dormenco has been a .NET developer for 8 years, with an interest in testing, quality and lately cloud service providers, especially AWS. Her desire to balance out soft and technical skills has led her to explore mentorship and Scrum master roles, while keeping a firm foothold in developing projects in areas such as insurance, fintech and healthcare.