Sources
Despite Terraform’s wide adoption and impressive capabilities, the tool presents challenges that can directly affect time-to-market, operational efficiency, and ultimately, the organization’s bottom line. ... ## The Hidden Costs of Complexity in Terraform Configurations While Terraform simplifies infrastructure management by codifying resources into manageable scripts, it is a sophisticated, complex language. It is a powerful tool for Operations to manage an ecosystem, but a poor choice as a mechanism for external users to make modifications to an environment. When Operations takes the approach that users update Terraform files in a repository to make a change to the system, it introduces a steep learning curve for teams that may not be fully proficient in its intricate configurations. As with most automation tools, the devil is in the details. Developers often face challenges updating Terraform files, particularly when they lack deep expertise in both Terraform and the associated underlying infrastructure. Terraform often requires manual inspection of the change by Operations which runs in contrast to a self-service model, impacting business efficiency. For example, defining a virtual machine within Terraform requires knowledge of various configuration attributes, such as the virtual machine name (which might have character length constraints), resource limits like CPU allocations, and complex dependencies between different components, such as Kubernetes clusters, roles, and users. These details often require expertise beyond the scope of a developer’s primary role—especially if their focus is elsewhere, such as software development. Couple that with the need for manual inspection by Operations to ensure the request conforms to enterprise standards, the process becomes overly burdensome and time-consuming. … 1. **Operational Inefficiency**: Users can spend an inordinate amount of time learning Terraform-specific nuances and troubleshooting configuration errors. 
These time investments often lead to delays in deploying new resources, causing roadblocks in mission-critical projects and reducing overall operational efficiency. In an enterprise environment, where agility is key to maintaining a competitive edge, such delays can hinder an organization’s ability to meet market demands or launch new initiatives quickly. 2. **Increased Risk of Errors**: The complexity of Terraform configurations also increases the likelihood of human error. A single misconfiguration can cause critical system failures, trigger outages, or result in security vulnerabilities. For instance, misconfiguring Kubernetes resource limits could result in performance bottlenecks or, in the worst-case scenario, downtime for customer-facing applications. These risks not only affect service delivery but also damage an organization’s reputation and user trust. Even with manual inspection, there is still a risk, as humans are error prone. … While these checks and balances are important for maintaining infrastructure stability and security, they often create significant bottlenecks. Once a PR is submitted, developers are left waiting for the operations team to approve the change. During this waiting period, developers may engage in multiple back-and-forth conversations on collaboration tools like Slack, often having to resubmit PRs due to minor configuration errors that were overlooked. In many cases, this process turns into a frustrating cycle of trial and error, leading to prolonged delays. … 1. **Delays in Deployment**: The time spent waiting for approvals can significantly slow down the deployment of critical infrastructure, which can, in turn, delay the release of new products or features. In fast-paced industries like finance or e-commerce, where time-to-market is often the difference between leading or lagging competitors, these delays represent a serious business risk. … 1. 
**Security Vulnerabilities**: Terraform configurations can easily introduce security vulnerabilities if not carefully managed. Misconfigured access controls or user permissions, for instance, can expose sensitive data or provide unauthorized system access. In an era where cybersecurity is a key business concern, configuration vulnerabilities pose a significant threat, leading to data breaches, regulatory penalties, and reputational damage. 2. **Scalability Issues**: As organizations grow, infrastructure requirements increase in complexity. While Terraform is designed to manage large-scale environments, misconfigurations can lead to performance issues that inhibit scalability. For example, improperly managing dependencies between cloud resources can create bottlenecks that delay the rollout of new services or infrastructure deployments. 3. **Vendor Lock-In**: While Terraform is designed to be cloud-agnostic, its implementation can sometimes lead to inadvertent vendor lock-in. If your teams rely heavily on Terraform modules and resources specific to a particular cloud provider, migrating from one cloud provider to another becomes both difficult and expensive. The exposure is especially important for organizations prioritizing a multi-cloud strategy or, simply, the flexibility to switch providers based on cost or performance metrics.
jamesrcounts.com
Why Your Terraform Platform Isn't Scaling—and What to Do ...The production environment was a modern, automated marvel. The platform that powered it? A legacy ops bottleneck with no change control and no repeatability. It was frustrating, but more than that—it was dissonant. > I could build secure, repeatable landing zones with Terraform, but I couldn't automate the identity, pipelines, or secrets that made those zones possible in the first place.
### The Drawbacks of Terraform - **1. Drift in State Management**: Terraform must keep track of the current state of your resources. Managing this state file, in a codebase entirely separate from the application code, can be problematic in a large team setting – especially as individuals will often circumvent the process by making changes directly in the cloud provider's console. This often causes "drift", meaning the configuration file does not match reality. … - **3. Significant Effort in Environment Setup**: Configuration needs to be manually replicated for different environments, which is often time-consuming and error-prone. The scope involved often leads to teams relying on 1:1 duplicates of production for dev and staging environments. This causes expensive over-provisioning of these environments, significantly increasing cloud costs. - **4. Debugging Errors**: Debugging and error handling can be complex, especially with large deployments. - **5. Disconnect Between Developers and DevOps**: Developers, typically not versed in HCL, are often forced to rely on DevOps to provision resources, often slowing down the development process.
# Mastering Terraform State Management: Challenges and Solutions Despite its widespread adoption, Terraform state management is often cited as one of the most challenging aspects of using Infrastructure as Code (IaC). Surprisingly, a survey by HashiCorp revealed that over 50% of Terraform users have encountered state-related issues (HashiCorp, 2024), underscoring the often-overlooked technical complexities. … ### 2. State Corruption - **Technical Details**: Manual edits, file corruption, and concurrent modifications can all corrupt the state file. - **Implementation**: Can occur due to concurrent updates, failed deployments, or unexpected errors. - **Risks**: A corrupt state can result in Terraform being unable to manage infrastructure, causing downtime. … ### 5. Complex Workflows - **Technical Details**: Complex deployments require structured planning to avoid chaos. - **Implementation**: Requires modular, automated, and disciplined workflows. - **Risks**: Can cause delays and higher risk of deployment errors.
www.schibsted.pl
9 reasons why terraform is a pain, and 1 why you should still care - Schibsted Tech Polska
## The pains ### 1. The evil state The first thing you will complain about, when it comes to Terraform, is the fact that it’s stateful, and the implications that brings. I personally see two issues with it: - the state has to be in sync with the infrastructure all the time – that also means that you have to go all-in when it comes to provisioning – i.e. no stack modifications can be made outside of the provisioning tool - you have to keep the state somewhere – and this has to be a secure location, as the state has to carry secrets … ### 2. Hard to start with the existing stack Back in the early days of Terraform, its issue tracker was full of complaints from people not being able to use Terraform with an existing stack. The reason was that Terraform was not able to incorporate the existing stack into the state (to my amazement, while looking for a sign of this, I’ve found my old PR that was trying to address that issue back then 😉 ). Fortunately, the import command was introduced, and this problem has been solved (at least at the system level). … ### 3. Complicated state modifications There is one additional thing that is a bit problematic when dealing with the state. While constantly refactoring your infrastructure definition, you may end up renaming resources (changing their identifiers) or moving them deeper into modules. Such changes are unfortunately hard for Terraform to follow, and leave it in a state where it doesn’t know that certain resources are simply misplaced. If you run … ### 4. Tricky conditional logic There are some people around the web who don’t like the fact that Terraform is not really an actual imperative programming language. To be perfectly honest, I don’t share that opinion – I think the provisioning definition of the stack should be as declarative as it can be – that leaves a lot less room for deviations in the definitions. 
On the other hand, the conditional logic provided by Terraform is a bit tricky. For example, to define a resource that is conditionally provisioned, you turn the resource into a list and use the `count` parameter to control it:
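The snippet breaks off before the code it introduces. A minimal sketch of the `count` pattern it describes could look like the following. The variable name, resource type, and bucket name are assumptions made for illustration and are not taken from the original article.

```hcl
# Hypothetical feature flag; the name is an assumption for this sketch.
variable "create_logs_bucket" {
  type    = bool
  default = false
}

# count = 1 provisions the resource, count = 0 skips it entirely.
resource "aws_s3_bucket" "logs" {
  count  = var.create_logs_bucket ? 1 : 0
  bucket = "example-logs-bucket" # placeholder name
}

# Because of count, aws_s3_bucket.logs is a list with zero or one element.
# one() (available since Terraform 0.15) returns that element, or null if
# the list is empty.
output "logs_bucket_id" {
  value = one(aws_s3_bucket.logs[*].id)
}
```

The awkward part is the last block: every reference to the conditional resource has to account for the list being empty.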
itnext.io
The Pains in Terraform Collaboration
The snags that may stall your Terraform adoption and what to do
I divide Infrastructure as Code (IaC) into three categories. **Mark-up languages** like CloudFormation and ARM have a simple format, but the body of code sprawls enormously with more objects lumped together. **Domain specific languages**, such as Terraform’s HCL, feature flexible syntax and a mild dose of abstraction, creating a pleasant coding experience. Libraries that support **general-purpose programming languages**, such as AWS CDK and Pulumi, are extremely powerful yet require serious programming proficiency to tame the hyper-abstractions. … The open-source Terraform keeps states in workspaces. So we can address the first problem. However, workspaces do not attempt to deal with the second and third problems. For that reason, I regard the workspace feature in open-source Terraform as half-baked. It misses too much. I have seen teams using variable files to store per-workspace input variables. However, the input variables may contain secrets too. In addition, one more item to keep track of over time is whether each state remains consistent with the actual target resources (drift detection), which is also tricky. … There are many purpose-built extensions (GitHub, Azure DevOps) to facilitate Terraform installation and command invocation. However, as discussed, the real pain point with Terraform collaboration is the statefulness and its consequent issues. Automation pipelines fall short in this regard, despite their significant role in continuous integration in the SDLC. Their scripting capability can achieve virtually any programmable task, but it is not fun to juggle numerous code paths to deal with state logistics and stateful resources.
scalr.com
7. Advanced Patterns...
As Terraform/OpenTofu structures become more complex with modules, multiple environments, split state, and potentially orchestration tools like Terragrunt, developers inevitably encounter challenging issues. ... Many common errors actually stem from neglecting the foundational best practices discussed earlier.
Terraform is a powerful infrastructure-as-code (IaC) tool, but many teams hit the same pain points as they scale: remote state management, secrets ending up in state, configuration drift, module sprawl, slow plans and applies, and safe promotion across environments. In this article, we’ll walk through 13 of the biggest Terraform challenges, with practical tips to help you build faster, safer workflows. … 1. State management at scale 2. Sensitive data ending up in state and plan artifacts 3. Preventing and detecting configuration drift 4. Taming the dependency graph and resource ordering 5. Provider versioning and upgrade surprises 6. Dealing with cloud API rate limits and eventual consistency 7. Managing multiple environments without chaos 8. Managing Terraform modules at scale 9. Refactoring without accidental destroy/recreate 10. Importing existing (brownfield) infrastructure 11. Performance bottlenecks in large plans and applies 12. Making changes safe: review, testing, and policy guardrails 13. Licensing and governance uncertainty ## 1. State management at scale Terraform state management gets tricky the moment your team and CI/CD start running Terraform in parallel. The `terraform.tfstate` file is Terraform’s “source of truth” for what it thinks exists. If two runs can write state at the same time (or the state is stored somewhere unreliable), you can end up with conflicting updates and painful recovery work. … ## 2. Sensitive data ending up in state and plan artifacts Terraform is good at not splashing secrets all over your terminal, but that can create a false sense of safety. Even when the CLI shows `(sensitive value)`, the underlying state and plan data can still contain the real value, because Terraform needs a complete record of resource attributes to manage drift and future changes. State and plan files may include sensitive values like initial database passwords or API tokens, and local state is stored in plaintext by default. 
This becomes a real problem in CI/CD: It’s common to save `terraform plan -out=tfplan` and upload it as an artifact for a later apply job. That plan file can contain enough information to leak secrets if it’s accessible to the wrong people (or just ends up in the wrong place), turning “preview” artifacts into secret blobs you now have to secure like production credentials. … ## 4. Taming the dependency graph and resource ordering Terraform builds a dependency graph to figure out resource ordering and run as much as possible in parallel, based mostly on the references it can “see” in your configuration. Trouble starts when the dependency is real but implicit: maybe a resource relies on a side effect (“this IAM policy must exist before that service can start”), or you’re passing IDs around as plain strings, so Terraform can’t infer the relationship. … Version constraints alone aren’t enough for reproducibility. Terraform uses constraints to decide what’s allowed and then records the exact chosen versions (plus checksums) in `.terraform.lock.hcl` so future runs make the same selections by default. If that lock file isn’t committed and consistently used, you can still get “works on my machine” drift between environments. … ## 6. Dealing with cloud API rate limits and eventual consistency Sometimes your Terraform code is fine and the cloud just isn’t ready yet. Big applies can hit API throttling (429s / “Rate exceeded”) because Terraform is doing lots of create, read, and update calls at once, and most providers enforce per-account or per-region limits. Furthermore, many services are eventually consistent: The API accepts a change, but other endpoints won’t “see” it for seconds or minutes. … ## 12. Making changes safe: review, testing, and policy guardrails At some point, the biggest risk isn’t “Terraform is wrong.” It’s that humans can’t reliably review what Terraform is saying. 
A plan with hundreds (or thousands) of changes is easy to rubber-stamp, and it’s hard to spot the one destructive action hiding in the noise. Correctness also isn’t just syntax. A configuration can be valid and still violate your organization’s rules (“no public S3,” “only these regions,” “no wide-open security groups”), or break module expectations in subtle ways. … ## 13. Licensing and governance uncertainty For a lot of teams, “Terraform risk” isn’t technical; it’s licensing and governance. Terraform’s license changed to Business Source License 1.1 in August 2023, which created uncertainty for anyone redistributing Terraform, embedding it in products, or offering IaC as a hosted service. Many organizations can keep using Terraform internally, but the gray area is usually “Are we building something that could be considered competitive?” That question tends to trigger legal review and slow platform roadmaps. Governance adds a second layer: when a single vendor controls the roadmap, release cadence, and contribution process, teams need to plan for the possibility of future shifts (license terms, deprecations, feature direction) that ripple through their infrastructure workflow.
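The implicit-dependency problem described under challenge 4 (an IAM policy that must exist before a service starts) can be sketched with `depends_on`. This is an illustrative configuration under stated assumptions, not code from the article: all resource names, the AMI ID, and the bucket ARN are placeholders.

```hcl
resource "aws_iam_role" "app" {
  name = "example-app-role" # placeholder
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "app" {
  name = "example-app-profile" # placeholder
  role = aws_iam_role.app.name
}

resource "aws_iam_role_policy" "app" {
  name = "example-app-policy" # placeholder
  role = aws_iam_role.app.name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action   = "s3:GetObject"
      Effect   = "Allow"
      Resource = "arn:aws:s3:::example-bucket/*" # placeholder ARN
    }]
  })
}

resource "aws_instance" "app" {
  ami                  = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # The instance only references the profile, so Terraform cannot see that
  # software on it needs the policy attached first. depends_on makes that
  # hidden ordering explicit.
  depends_on = [aws_iam_role_policy.app]
}
```

Without the `depends_on` line, Terraform may create the instance and the policy in parallel, which is exactly the kind of ordering bug that a plan review does not reveal.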
terramate.io
10 Biggest Pitfalls of Terraform - Terramate
Terraform (or OpenTofu if you prefer open source) has emerged as a pivotal player in the evolving Infrastructure as Code (IaC) landscape, facilitating the management and provision of cloud resources through code. However, like any tool, it has drawbacks and tradeoffs. Challenges such as **managing multiple environments with workspaces**, **maintaining module versions** and **backend configurations**, and **managing resource lifecycles** often make Terraform code hard to read and prone to errors. Moreover, scaling can be cumbersome due to a lack of stack concept, leading to complications in more intricate environments. … ## 1. Terraform Workspaces Terraform Workspaces help you manage different environments, like staging, development, and production. However, they can be tricky to handle. For example, the code can be difficult to understand because you have to use the `count` parameter a lot to create resources based on conditions. Also, it gets harder when you want to scale or grow with Terraform Workspaces because you need to add more connections between them when managing different environments. … ## 2. Maintaining Module Versions In Terraform, a feature called the module block lets users use pre-set modules. But there's a problem with this block. The `source` and `version` attributes in this block, which are used to specify where the module comes from and which version of the module to use, don't allow for variable interpolation. Variable interpolation is replacing a placeholder in a string with its actual value. This limitation can cause trouble when you're trying to set up modules in a flexible or dynamic way. … ## 3. Hardcoding Backend Configuration When you’re working with Terraform, you might need to make copies of Root Modules, but this can cause unexpected problems if you’re not careful with the backend configuration. The backend configuration is where Terraform stores information about your infrastructure. 
If you copy the backend configuration without changing the `key` or `prefix` (which identifies the location of the stored information), it can cause problems. For example, you might end up with destructive Terraform Plans, which can potentially damage your infrastructure if the wrong state file (a file that keeps track of the status of your infrastructure) is referenced. … ## 4. Provider Config With Terraform, managing the provider configuration involves a lot of repetitive coding and manual work. The provider configuration is part of the code that tells Terraform how to interact with the service you’re using, like AWS or Google Cloud. Duplicating and manually managing this code can lead to mistakes and waste time. Here’s where Terramate can make things easier with its code generation feature. This feature can take a simple user configuration and generate more complicated provider configurations. This simplifies managing the provider configuration and reduces the duplicate code you need to write. … ## 7. Missing Stack Concept Terraform is unique in the world of IaC tools because it doesn’t have a stack concept. A stack is a collection of resources that are managed together. Instead, Terraform only focuses on what’s happening within a single directory, a root module. This can cause problems when dealing with bigger, more complex environments because it’s not designed to handle multiple collections of resources at once. … ## 8. Code Duplication In Terraform, when you want to use a module (which is a pre-set piece of code) multiple times, you have to copy the call to the module and the arguments (the specific instructions you give to the module) each time. This leads to repeated code, making your codebase larger and harder to maintain. … ## 9. Monostacks If you’re managing a lot of resources (like virtual machines, databases, etc.) in Terraform, it can cause some problems. 
For example, if something goes wrong, it could affect many of your resources (this is known as a “big blast radius”). Also, executing plans and applying changes can take a long time when dealing with many resources. Additionally, if there are discrepancies or “drifts” in a single resource, it can prevent you from applying new changes. … ## 10. Deep Merging of Maps and Objects In Terraform, merging or combining maps and objects at multiple levels, also known as “deep merging”, is not allowed. A map is a collection of key-value pairs, and an object is a complex structure containing multiple data types. This limitation makes it hard to merge default configurations with user inputs. For instance, it’s difficult to create keys or attributes that conflict, and changing the value of an attribute in a nested structure is impossible. … ## Conclusion Terraform has played a key role in popularizing the concept of Infrastructure as Code, where you manage your IT infrastructure using code. However, it’s not without its challenges. These include issues like code that is hard to read, difficulty scaling with workspaces, problems maintaining versions of modules, the need to hardcode backend configurations and the complexity of managing the lifecycle of resources.
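Two of the pitfalls above (literal-only module versions and hardcoded backend keys) can be seen in one small root module. The following is a sketch under stated assumptions: the bucket name, state key, region, and module inputs are placeholders, and the S3 backend and the public VPC module are chosen only as familiar examples.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # placeholder bucket
    # Backend settings must be literals. If this root module is copied to
    # create another environment and the key is left unchanged, both copies
    # point at the same state file.
    key    = "staging/network/terraform.tfstate"
    region = "eu-west-1"
  }
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  # version must also be a literal constraint string.
  # version = var.vpc_module_version   # not allowed: variables are rejected here
  version = "~> 5.0"

  name = "staging"     # placeholder
  cidr = "10.0.0.0/16" # placeholder
}
```

Because neither value can come from a variable, every copied root module needs its key and version edited by hand or generated by external tooling.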
jonathan.bergknoff.com
Terraform Pain Points - Jonathan Bergknoff
`state mv` can’t do it). Moving across state boundaries is harder still. While the documentation mentions moving to a different state file, there’s no support for hooking it up to an already-existing state in S3 (for example). The tool is not at all user friendly or convenient. The silver lining is that Terraform state is a simple JSON file, so it’s easy to write your own tooling around it. My team had occasion to do several refactors where we pulled individual projects’ resources out of a monolithic state and into their own states, once for each of our environments. Trying to orchestrate that with … Terraform’s `merge()` only performs a shallow merge. This is surprising behavior, and can lead to subtle bugs. You can work around it if you know about it, but the workarounds are often awkward. There’s an open PR adding a `deepmerge()` function. When anything in the map is “not known until after apply” (e.g. an attribute of a resource that hasn’t been created yet), the entire map is considered “not known until after apply”. For example, if our config map looks like … Had Terraform used an established programming language instead of HCL, maybe this time would have been spent on pushing the infrastructure-as-code ecosystem forward. As it is, Terraform’s core is developed slowly and there don’t seem to be any meaningful innovations on the horizon. The AWS provider has a rapid pace of development, seeing a release approximately once a week. However, there are many long-standing PRs, fixing important bugs and adding important features, which languish for months with no attention from maintainers. It’s a good project, but apparently not particularly well managed.
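The shallow `merge()` behavior mentioned above can be illustrated with a small `locals` block. This is a sketch with made-up keys and values, not the author's configuration, and the one-level workaround shown is just one common approach.

```hcl
locals {
  # Hypothetical defaults and user overrides.
  defaults = {
    size = "small"
    tags = { team = "platform", env = "dev" }
  }
  overrides = {
    tags = { env = "prod" }
  }

  # merge() is shallow: the whole "tags" value from overrides replaces the
  # one from defaults, so the "team" tag is lost.
  # merged.tags is { env = "prod" }
  merged = merge(local.defaults, local.overrides)

  # Manual workaround for one known nested key: merge the nested maps
  # separately and pass the result as the last (winning) argument.
  # merged_deep.tags is { team = "platform", env = "prod" }
  merged_deep = merge(local.defaults, local.overrides, {
    tags = merge(local.defaults.tags, local.overrides.tags)
  })
}
```

The workaround has to be repeated for every nested key and every nesting level, which is why the author calls such workarounds awkward.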
But it isn't all roses. Persistence is difficult in itself, and application development in particular -- automating open-ended interactions with stored state -- brings with it some unique challenges for developers and DBAs alike. Some of Postgres' properties and features help dev teams overcome these issues. Others exacerbate them. In this talk, we'll discuss the rise of the "accidental DBA" and the implications of vernacular schema architecture. We'll explore how development teams work with Postgres across all phases of the software development lifecycle. ... {ts:282} promises we make to our users, to our stakeholders are not about the database. They're about the whole system. The database is a core component. But it is not what users see. It's not what people think of themselves as interacting with. So compatibility and interface stability requirements also are at the level of the system, not the database. … And mistakes here can be very very difficult to take back. In particular, it can be really difficult for us to recognize that what we're doing demands us to be really active stewards of the data and to approach the requirements and {ts:458} complications or nuances that we encounter with a really truly critical eye. Finally, the team can be very unfamiliar with just how expressive data models can be or not realize how important certain elements are and that's how you end up with databases that don't have a single {ts:477} foreign key constraint. So, we'll tend to lean on the familiar the things that we know about. We'll prioritize discoverability and we'll avoid magic like triggers even when a trigger is the best you available solution for the specific problem. um a lot of the time we'll use lowest … But because we think from code, not at first from a position that prioritizes where the data are going to be most of the time, we don't make good decisions here. Uh we're not really set up for success when we start working with persistence. 
{ts:686} There are many problems that are not unique, probably most of them. Uh, for instance, if you're doing something geospatial, you probably picked Postgres because you know about PostGIS. Similarly, pgvector is, you know, sort of a killer app for the, um, ML and vector embedding, uh, use case, but there are a … Um, this is all difficult and there are further difficulties. SQL injection is, you know, {ts:859} the perennial top contender for the most financially damaging application security vulnerability, where you're just, like, interpolating user input directly into a statement and you don't escape it, and somebody escapes it for you and then does a few more things on top. … {ts:1398} This works. It's high effort, it's high redundancy, and you have a problem of minute variations that make a big difference. Um, also, once you start evolving the schema, making some changes to how, uh, the application, uh, works with data, uh, those changes {ts:1417} either break your tests or they don't, which is worse, because your tests no longer guarantee what you think they do. … Uh, we do also only get the first error. This record has many more problems. Um, so we're going to have to iterate to solve, you know, this thing and get this record into the database for real. It is very good to fail fast. Uh, we do feel it {ts:1533} a little bit more on the application side because we have long workflows and a lot of setup. … Could be the database, could be something in the code. We have to go on a bit of a journey to rule out the other things and then start figuring out what, uh, behavior we've encoded in the database is incorrect. And the more complicated the database side of the equation is, the more difficult this {ts:1571} gets. … They're very noisy. You have to restart the entire server to adjust them. So hope you got it right in production the first time. 
Um, sometimes, especially if you're in a platform as a service context and you're in an {ts:1605} ephemeral environment and haven't wired up pgBadger, it's a pain to get to them.
The 2025 SO global developer survey results are fresh out, and PostgreSQL has become the most popular, most loved, and most wanted database for the third consecutive year. Nothing can stop PostgreSQL from consolidating the entire database world!