Data centers serve as the bedrock of our increasingly digitized world, housing critical data owned by individuals, enterprises, and government institutions. Because the applications they serve vary so widely, not all data centers are created equal; cloud storage, the industrial Internet of Things (IoT), software as a service, and streaming platforms all place different demands on a facility. Storage infrastructure forms the foundation of these facilities, and it's crucial to get it right: it determines the capacity, security, scalability, and accessibility of what is often mission-critical data. For some businesses, storing data securely and in a structured way takes precedence; for others, the speed of data access is the decisive factor in choosing a storage solution. With AI and other high-density workloads now taking center stage and producing vast, increasingly complex new data sets, is the way we store this information changing?
Types of Storage Used by Data Centers and Services Provided
In a recent survey on data center storage, a combination of storage types emerged as the most common approach, used by 51% of respondents. All-flash storage came out on top among individual technologies, while tape, being slower to access and write, predictably brought up the rear. Knowing your workloads is crucial; not every data center runs exclusively high-density applications at all times. Visibility into the requirements of each workload, including speed, scalability, cost, capacity, and accessibility, is key to tailoring the right solution and avoiding premium payments for unnecessary storage. Many companies don't even know what data they're storing or how it's used, making this a crucial first step. A hybrid solution, as indicated by respondents, allows different storage types to be used for different workloads and use cases. The fastest and most expensive option doesn't make sense for cold storage, while slower media can't keep up with hot workloads. A tiered approach, matching each class of workload to the price and performance it actually requires, is highly recommended.
Regarding the types of storage services typically provided to customers, the largest share of respondents (41%) offer hot storage, suggesting an increased appetite for speedy data access. This kind of service is most applicable to real-time applications or transactional systems where mission-critical data must be immediately retrievable. In contrast, cold storage for rarely accessed data, such as compliance and regulatory records, was less prevalent but far from absent, reinforcing the continued place of tape and HDDs. The rise in east-west traffic, data moving among servers within a data center (as opposed to north-south traffic moving in and out of it), was also significant at 27%, likely due to advances in virtualization, private cloud, and the increased adoption of converged and hyper-converged infrastructure.
To effectively manage hot and cold data, it's important to define categorization criteria based on access patterns and performance needs. Once data is categorized, a tiered storage architecture makes it straightforward to allocate higher-performance systems to hot data while reserving more cost-effective options, such as hard drives or tape libraries, for cold data. With IDC's Global Datasphere Forecast predicting the creation of over 220 ZB of data by 2026, forecasting future storage requirements is paramount to avoid needless and costly overprovisioning.
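As a rough illustration of what such categorization criteria can look like, the sketch below assigns a data set to a tier based on how recently and how often it is accessed. The thresholds, field names, and tier labels are assumptions for the example, not a prescription; in practice they would be tuned to real access telemetry and service-level targets.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DataSet:
    name: str
    last_accessed: datetime   # most recent read or write
    accesses_per_day: float   # rolling average access frequency

def assign_tier(ds: DataSet, now: datetime) -> str:
    """Map a data set to a storage tier using illustrative thresholds.

    "hot"  -> all-flash / NVMe
    "warm" -> HDD or hybrid arrays
    "cold" -> object storage or tape
    """
    idle = now - ds.last_accessed
    if idle < timedelta(days=7) or ds.accesses_per_day >= 10:
        return "hot"
    if idle < timedelta(days=90) or ds.accesses_per_day >= 1:
        return "warm"
    return "cold"

# Example: a compliance archive untouched for a year lands in the cold tier.
archive = DataSet("compliance-2023", datetime(2024, 1, 1), 0.01)
print(assign_tier(archive, datetime(2025, 1, 1)))  # -> "cold"
```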
Primary Drivers in Choosing Storage
When choosing storage for their facilities, or weighing client and customer needs, respondents surprisingly ranked security and capacity above cost. For the data center industry, the safety of mission-critical data and adequate room to store and scale are paramount. Nevertheless, the bottom line remains a priority: "cost" was the third most popular response, and factors such as "energy efficiency" and "compatibility with existing infrastructure" also ranked strongly. Since these are intrinsically linked with keeping expenditure to a minimum, both capital expenditure (capex) and operational expenditure (opex) clearly remain key considerations when choosing a storage solution.
Cold storage still holds its place in the modern data center, not only for cost efficiency when dealing with large volumes of data but also for the cybersecurity benefit of air-gapping a backup copy. Combined with cybersecurity software, flash, and replication technology as data moves from hot to cold, air-gapping becomes a cost-effective way to store long-term data. Flash is similarly key to cybersecurity because of the speed of recovery it offers. A backup plan is integral to ensuring data availability: data stored in a backup environment can be restored from an isolated and immutable location, allowing snapshots to be reviewed quickly and data recovered from any point in time. Alternatively, or in combination, object storage can protect data by creating immutable backups, preserving data longevity and integrity. If all that seems too complex, consider disaster recovery as a service (DRaaS) to bolster business continuity.
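As one concrete way to create an immutable backup copy in object storage, the hedged sketch below uses Amazon S3 Object Lock via boto3. The bucket name, key, and retention period are hypothetical, and the bucket must have been created with Object Lock enabled; other object stores offer comparable write-once-read-many (WORM) features.

```python
from datetime import datetime, timedelta, timezone

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Hypothetical bucket created with Object Lock enabled.
BUCKET = "example-backup-vault"

def write_immutable_backup(key: str, data: bytes, retain_days: int = 365) -> None:
    """Upload a backup object that cannot be deleted or overwritten
    until the retention date passes (WORM / immutable storage)."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=data,
        ObjectLockMode="COMPLIANCE",             # retention cannot be shortened
        ObjectLockRetainUntilDate=retain_until,
    )

write_immutable_backup("snapshots/db-2025-01-01.dump", b"...backup bytes...")
```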
The Impact of AI on Software Stacks and Data Center Operations
The advent of AI has led many data centers to reevaluate their software stack, whether by deploying a DCIM (Data Center Infrastructure Management) system or by using AI to manage data center operations themselves. As the old adage goes, you can't manage or improve what you don't monitor; so although 21.4% of our respondents claim to already be running an optimal software stack, how optimal will it be in two years' time? There is little that can't be improved with some holistic visibility, and the fact that the majority of respondents are weighing AI or DCIM deployments for data storage tells us these tools are becoming more commonplace across the industry.
For those respondents who weren't sure where to begin, ask the following questions when evaluating a potential DCIM solution: Is it used by industry leaders? Does it automatically track assets? Can it grow with my needs? Does it integrate with other systems (reporting tools, for example)? Is it secure? Is it non-proprietary and open? And finally, will it be robust enough to meet and exceed my needs in the long term?
With maintenance and cooling flagged as two of the primary use cases for AI in data center operations, it's worth noting that AI can deliver much more, particularly for data storage. To name just a few benefits, AI can optimize storage based on usage patterns, enhance security through real-time anomaly detection, and improve energy management by adjusting workloads and predicting energy trends; it can even help with staffing support.
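As a minimal sketch of the anomaly-detection idea, the example below trains an isolation forest (assuming scikit-learn is available) on made-up storage I/O metrics and flags an access pattern that looks like ransomware-style mass encryption. The metric names and thresholds are illustrative assumptions, not a production detector.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-interval storage metrics: [read_MBps, write_MBps, deletes_per_min]
normal_activity = np.random.default_rng(0).normal(
    loc=[120.0, 40.0, 2.0], scale=[15.0, 8.0, 1.0], size=(500, 3)
)

# Train on a window of "known good" activity.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_activity)

# A sudden burst of writes and deletes (as in mass encryption) scores as an outlier.
suspicious = np.array([[130.0, 900.0, 250.0]])
print(detector.predict(suspicious))  # -> [-1], i.e. anomalous
```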
Tailoring Storage Requirements to Different Loads and Use Cases
Most of our respondents try to tailor their storage solutions where they can, at least to a degree, with some ensuring every appliance is assigned to its intended use case. With technologies such as AI and ML proliferating at breakneck speed, this is the way forward; AI is not only changing the way we do business, it's changing our infrastructure needs. Respondents using the same storage type throughout may have similar workloads across their data centers and may not need a multi-faceted, tailored solution, but there is still optimization to be found at a more granular level.
In an AI data pipeline, each stage has specific storage needs that must be met for efficient data processing and utilization. By viewing AI processing as part of a project data pipeline, enterprises can make sure their generative AI models are trained effectively and that the storage selected at each stage is fit for purpose, keeping those models both effective and scalable.
Before starting an AI project, a major decision is whether to use cloud resources, on-premises data center resources, or both in a hybrid cloud setup. When tailoring your storage for AI, the cloud offers a range of storage types and classes to match different pipeline stages, whereas on-premises options can be more limited, often resulting in a single, universal solution for varied workloads. Depending on your on-premises setup, training can be faster in the cloud; but if you maintain multiple storage tiers on-premises, it can work out cheaper over the long term.
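To make that trade-off concrete, here is a deliberately simplified break-even calculation with hypothetical prices. A real comparison would also include egress fees, power, staffing, and hardware refresh cycles; the point is only that pay-as-you-go costs grow linearly while on-premises costs are front-loaded.

```python
def cumulative_cost_cloud(tb: float, months: int, price_per_tb_month: float = 23.0) -> float:
    """Pay-as-you-go storage: cost scales linearly with time."""
    return tb * price_per_tb_month * months

def cumulative_cost_on_prem(tb: float, months: int,
                            capex_per_tb: float = 180.0,
                            opex_per_tb_month: float = 4.0) -> float:
    """Up-front hardware purchase plus ongoing power, cooling, and maintenance."""
    return tb * capex_per_tb + tb * opex_per_tb_month * months

tb = 500  # hypothetical AI training corpus size in terabytes
for months in (6, 12, 24, 36):
    cloud = cumulative_cost_cloud(tb, months)
    onprem = cumulative_cost_on_prem(tb, months)
    print(f"{months:>2} months: cloud ${cloud:>9,.0f} vs on-prem ${onprem:>9,.0f}")
# With these assumed prices, the cloud is cheaper in the first few months,
# while on-premises storage overtakes it within the first year.
```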
Considering the File System Alongside Hardware
Regarding whether the file system is considered alongside the hardware itself, responses were fairly evenly split, with the largest group (35%) leaving such matters to their storage providers. This, combined with the 31% who said it "wasn't their area at all," testifies to the complexity of data storage, especially with AI and other high-traffic, high-density workloads.
Global file systems, or distributed file systems, are gaining traction as IT teams and data center operators grapple with a patchwork of often incompatible storage protocols, particularly for unstructured data. These systems place enterprise data under a single file access namespace, allowing businesses to access data from anywhere and offering the flexibility, resilience, and capacity of the cloud while retaining the simplicity of NAS storage.
Data storage is no longer just about storing data; it's about extracting its value to gain a competitive edge. Customizable support, tailored to bandwidth-hungry IT environments, is available for those who feel out of their depth. With 80-90% of the data collected today being unstructured and waiting to be mined, there is a wealth of information crucial to informed decision-making; seeking a third party that can help you tap into it is advisable.
Cybersecurity and Storage Decision-Making
Cybersecurity, ransomware, and malicious actors weigh heavily on storage decisions, with 70% of respondents considering them a major factor. This echoes the earlier finding that security is the primary driver when choosing a storage solution.
Having a backup plan is integral when dealing with mission-critical client data. Storing all your data in one place isn't just unwise, it's uneconomical. Beyond keeping multiple copies of your data, a typical backup strategy might include snapshots running on your primary storage, a local backup of both files and images on a separate device, and an offsite backup. It also helps to define target recovery points and recovery times for your business. At the same time, data deduplication is a critical part of improving storage utilization and reducing costs, and in some sectors it's required. Make sure the only duplication you have is intentional.
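The sketch below shows one way to sanity-check such a plan against the classic 3-2-1 guideline (three copies, two media types, one offsite) and a recovery point objective. The policy fields, example media, and thresholds are assumptions for illustration rather than a recommended configuration.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class BackupCopy:
    medium: str          # e.g. "snapshot", "disk", "tape", "object-storage"
    offsite: bool
    interval: timedelta  # how often this copy is refreshed

def check_backup_plan(copies: list[BackupCopy], rpo: timedelta) -> list[str]:
    """Return warnings if the plan violates 3-2-1 or the recovery point objective."""
    warnings = []
    if len(copies) < 3:
        warnings.append("fewer than three copies of the data")
    if len({c.medium for c in copies}) < 2:
        warnings.append("all copies sit on the same type of media")
    if not any(c.offsite for c in copies):
        warnings.append("no offsite copy")
    if min(c.interval for c in copies) > rpo:
        warnings.append("no copy is refreshed often enough to meet the RPO")
    return warnings

plan = [
    BackupCopy("snapshot", offsite=False, interval=timedelta(hours=1)),
    BackupCopy("disk", offsite=False, interval=timedelta(hours=24)),
    BackupCopy("object-storage", offsite=True, interval=timedelta(hours=24)),
]
print(check_backup_plan(plan, rpo=timedelta(hours=4)) or "plan looks sound")
```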
Even with the highest security credentials on the market, we’re only human, and people make mistakes. Ensuring human error is kept to a minimum by training employees on security best practices is crucial. Regularly performing security audits and assessments, as well as developing policies for not only data storage, but transmission and disposal too, will ensure security remains a priority throughout the data’s entire lifecycle.
Deduplication Capabilities and Future Capacity Issues
Deduplication can significantly reduce storage space while cutting the bandwidth wasted moving data to and from remote storage locations. It can also slash backup and recovery times and improve data center efficiency by using less onsite power. All of this ultimately contributes to a reduced total cost of ownership (TCO) for your data center, and from a customer standpoint, fewer redundant copies of data means a reduced risk of data loss or corruption.
However, deduplication ratios quoted by vendors tend to be best-case estimates and should be taken with a grain of salt; the nature of your data largely determines how effective the process will be. While deduplication is generally beneficial, 38% of our respondents appear to disagree. There are several reasons not to offer it as a service: performance overhead; the risk of data loss if chunks are incorrectly matched; difficulties with implementation and maintenance; and the fact that deduplication creates new metadata, which itself consumes storage and can introduce integrity issues. But remember, there are times when deduplication is required. The bottom line? Know what type of data you're dealing with.
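To show why effectiveness depends on the data itself, the sketch below estimates a dedup ratio by hashing fixed-size chunks: highly repetitive data dedupes well, while unique or already-compressed data barely shrinks. Fixed-size chunking is used here only for simplicity; real systems typically use variable-size, content-defined chunking.

```python
import hashlib
import os

def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Estimate the logical-to-physical ratio from fixed-size chunk hashes."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(c).hexdigest() for c in chunks}
    return len(chunks) / len(unique)

# Highly repetitive data (e.g. VM images sharing an OS base) dedupes well...
repetitive = b"A" * 4096 * 100
# ...while unique data (e.g. encrypted or compressed backups) barely shrinks.
unique_data = os.urandom(4096 * 100)

print(f"repetitive: {dedup_ratio(repetitive):.0f}:1")   # ~100:1
print(f"random:     {dedup_ratio(unique_data):.1f}:1")  # ~1.0:1
```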
Regarding future capacity issues, 30% of respondents anticipate building new facilities, while 29% plan to consolidate or densify existing ones. This suggests capacity constraints are a matter of "when," not "if." Moving data to cold storage, including object storage with a tape tier, is one viable response: massive improvements in the density of cold storage media such as HDDs and tape make it possible to keep up with the data surge without massively increasing one's footprint.
Hybrid cloud storage offers a two-pronged advantage: maximizing onsite storage capacity and providing respite from data management. Moving infrequently accessed data to a hybrid cloud will mean little to no oversight is required, allowing storage administrators to focus on higher-priority data requiring high-performance storage solutions.
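As one hedged example of offloading infrequently accessed data with little ongoing oversight, the snippet below defines an object-storage lifecycle rule (using boto3 against a hypothetical bucket and prefix) that automatically moves aging objects to progressively colder, cheaper tiers.

```python
import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Hypothetical bucket holding infrequently accessed ("cold") data.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-cold-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "archive/"},
                "Transitions": [
                    # Move to an infrequent-access tier after 30 days.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Move to deep archive (tape-like economics) after a year.
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```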
Conclusion
The data center storage landscape is undergoing a seismic shift, with a move towards ever faster storage types and a growing fear of a data deluge. There is no single answer to this dilemma: each storage technology has its own merits and pitfalls, and few data centers or customers rely on just one approach. The right combination depends entirely on the workloads your facility is running, and a hybrid solution tailored to the individual needs of the data center is an advisable way to avoid costly overprovisioning.
The importance of knowing your workloads and defining your data cannot be overstated. With many respondents housing both hot and cold data, alongside growing east-west traffic, clear categorization criteria will simplify the design of an effective tiered storage architecture. Forecasting is also paramount to ensure your solution can scale with future workloads, especially as capacity is cited as one of the biggest concerns in data storage in the AI era.
Whatever solution suits you best, putting all your eggs in one basket is never a good idea, not only from a TCO perspective but also in terms of keeping critical data safe. The fact that security came out on top as the primary driver when choosing data center storage is a testament to this, and a reminder that it's not just about what you store data on, but how you secure it and how many copies you keep. Consider backup appliances, alone or in combination with object storage, to protect data via isolated and immutable backups, while ensuring business continuity through DRaaS.
Regarding managing data center operations, many respondents said they were either already implementing or had considered AI or DCIM. These tools can help optimize storage based on usage patterns, with real-time anomaly detection to enhance security, as well as the ability to improve energy management by predicting energy trends and adjusting workloads accordingly. So, despite initial investment, the right management software will pay dividends when storing data long-term by providing the holistic visibility required to make informed decisions, innovate, and improve.
Once you have your storage solution, capacity, and placement sorted, you need to manage that data, particularly if it's unstructured. Consider an end-to-end data management platform that supports the entire AI pipeline and unstructured data lifecycle, from the all-flash performance required to power AI to low-cost archiving of the unique data used to train your models. Today, extracting value from what you already have is the competitive differentiator. Ultimately, storing new types (and volumes) of data requires a new way of thinking, along with a toolbox approach that tailors your storage to your needs. Moving forward means asking the right questions and collaborating with trusted partners to fill any gaps in your knowledge. The IT landscape is experiencing a seismic shift, and you don't need to go it alone: put your faith in people in the know to help you navigate and realize the potential AI has to offer, both inside and outside the data center.
That said, it’s not just about optimizing your tech; it’s important to optimize your workforce, too. Why invest in your storage infrastructure just to leave it open to human error? Ensuring personnel are properly trained in process and procedure is crucial, not only to prevent mistakes in the first place but also so they know what to do should an issue arise, without it resulting in disaster. You may have a disaster recovery plan in place for your data, but what about your staff?