Rainer Kaese is Senior Manager Business Development for Enterprise Hard Drives at Toshiba Electronics Europe. Toshiba is one of the three remaining hard disk manufacturers and focuses on storage media for laptop, desktop and enterprise computing. In an exclusive interview with Data & Storage Xpert (formerly cloudtech-xpert.com), Kaese talks about the end of individual data storage and the triumph of software-defined architecture.
Mr Kaese, data centres are being created everywhere, hosting in the cloud is becoming a matter of course, and virtual desktops are being used more and more frequently. Is the age of individual data storage coming to an end, and are we witnessing the swan song of the individual hard drive?
KAESE: The clear answer is yes and no. The individual hard drive you carry around - as in the first iPod 15 years ago - has been replaced by flash memory. Many mobile devices, especially business laptops, are now semiconductor-based. In the business sector, mobile devices no longer contain individual hard disks; they survive only in the low-cost consumer segment, where PCs with 1 TB hard disks are still commonly sold.
This is because IT is generally not homogeneous. Especially today, with BYOD, temporary work, co-working and other forms of organization, the working environment is becoming increasingly heterogeneous. In this respect, the virtual desktop approach sometimes works well in large corporations. But there are also seasonal workers, employees who work from the beach, and of course small and medium-sized businesses. A purely virtual desktop infrastructure would not work everywhere. This inhomogeneity demands that we remain extremely flexible - with local storage as well as private and public clouds, all closely interlinked.
New work and mobile working are certainly already a reality for many companies. Nevertheless, there are quite a few whose employees still work predominantly on-site in the office. How should data storage at the endpoint be handled there?
KAESE: Ideally, you should use a virtual desktop - without an individual hard drive. Many still have their own, because awareness hasn't quite caught up yet. Virtualization and the private cloud - the big trend - are the ideal solution in a homogeneous company with on-site workstations. At most, small drives remain in the end devices for booting - mostly SSDs.
But even with virtual desktops, the data still has to be stored somewhere. The main focus here is on enterprise storage hard disks, on which the data is stored in consolidated form - no matter whether in the private cloud, the public cloud or the company's own data center. That's what we concentrate on at Toshiba.
How exactly does the transition to virtual desktops and the associated data storage work if the company decides to store it centrally itself?
KAESE: Typically, a proof of concept is required in advance to answer various questions: How quickly does a boot storm have to be processed in the morning? How high may the maximum latency be? What capacity requirements will be placed on the storage, today and in the future?
In any case, you need an SSD cache and large storage arrays. The latter provide the virtual desktops with their virtual hard disks. In small and medium-sized enterprises, these are typically RAID systems with 12 to 36 disks. With up to 14 TB per disk, almost half a petabyte can be achieved. If the infrastructure grows larger, a scale-up architecture such as ZFS is typical. Here you don't have a hardware RAID but a software-defined architecture. If the company continues to grow, you simply add systems, and the data is distributed automatically. This is suitable for companies that can estimate their storage requirements well, both today and three years from now.
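The "almost half a petabyte" figure can be sanity-checked with a small sketch. The disk counts and per-disk capacity come from the interview; the choice of RAID 6 (two parity disks per group) is an assumption for illustration, and real arrays lose further capacity to formatting, hot spares and filesystem overhead.

```python
def raid6_usable_tb(disks: int, disk_tb: float = 14.0) -> float:
    """Usable capacity of a single RAID 6 group: n - 2 data disks (assumed layout)."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_tb

# Array sizes mentioned in the interview: 12 to 36 disks of up to 14 TB.
for n in (12, 24, 36):
    print(f"{n} disks x 14 TB (RAID 6): {raid6_usable_tb(n):.0f} TB usable")
# A 36-disk group yields 476 TB - roughly the "almost half a petabyte" quoted.
```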
In addition, there is a large proportion of companies today that cannot yet predict their storage requirements - like web-based business models a few years ago. Such models need scale-out architectures. Here, the hard disks are no longer attached to a managed server via local cabling; instead, individual servers and consolidated storage communicate over the internal network. With scale-out, you can scale in any direction, regardless of location.
You have already outlined various options for scaling data storage without overload. What other stumbling blocks do you see in hard disk and data center management?
KAESE: One stumbling block comes to mind in particular: it has always been assumed that hard disks will fail. Today the probability of failure is very low, but it can still happen. If a hard disk fails in a RAID system, there is a warning and the disk is replaced.
A RAID rebuild is critical because the performance of the array drops. If a second hard disk fails during the rebuild, you hopefully have a setup that can tolerate this. With older arrays, however, it often doesn't stop at the first failed disk. With some bad luck, a third fails - and then you have a real problem. With small hard disks, a rebuild is a matter of a few hours. With hard disks of up to 16 TB, it can take weeks during regular business operations until all data is mirrored or restored. That is no longer acceptable.
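The jump from hours to weeks follows directly from capacity divided by the rebuild rate the array can spare. The 16 TB capacity is from the interview; the sustained rates are assumptions for illustration: a disk might be rewritten at around 200 MB/s when idle, but a loaded production array may only dedicate a small fraction of that to the rebuild.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to rewrite a disk of the given capacity at a sustained rate."""
    total_bytes = capacity_tb * 1e12          # TB -> bytes (decimal units)
    return total_bytes / (rate_mb_s * 1e6) / 3600

# Assumed rates: idle array, moderate load, heavy regular business load.
for rate in (200, 50, 10):
    h = rebuild_hours(16, rate)
    print(f"16 TB at {rate} MB/s: {h:.0f} h (~{h / 24:.1f} days)")
```

At 10 MB/s of spare throughput, the 16 TB rebuild stretches past two weeks, which matches the order of magnitude described above.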
That's why the market is moving towards software-defined approaches. Here, it is not necessary to restore an entire hard disk, only the data that actually exists. A RAID rebuild, by contrast, also rewrites the empty areas of the disk. If a software-defined system is only filled to a quarter of its capacity, only that quarter is rewritten. In addition, there are more hard disks and more redundancy, so a disk failure is no longer so critical with software-defined storage. In the future, a single disk failure will probably not matter at all. Especially with scale-out systems, the data volume is constantly growing anyway, so additional hard disks are simply connected to the system and defective disks can remain in place.
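The difference in rebuild volume can be sketched in a few lines. The quarter-full example is from the interview; the 16 TB disk size is carried over from the earlier discussion as an illustrative assumption.

```python
def rebuild_volume_tb(disk_tb: float, fill_ratio: float, software_defined: bool) -> float:
    """Data that must be rewritten after a disk failure.

    A classic RAID rebuild rewrites the whole disk, empty blocks included;
    a software-defined system re-replicates only the data that exists.
    """
    return disk_tb * fill_ratio if software_defined else disk_tb

print(rebuild_volume_tb(16, 0.25, software_defined=False))  # RAID: full 16 TB
print(rebuild_volume_tb(16, 0.25, software_defined=True))   # SDS: only 4 TB
```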
You have just demonstrated the numerous advantages of Software Defined Storage. There is one question that comes to mind: What about disadvantages?
KAESE: We illustrated this at this year's CloudFest, where we had all three approaches side by side: a server with a local RAID, one with a ZFS-based scale-up system, and a Ceph-based scale-out system. The software-defined setups came from partners who got the same resources as the RAID system, so in theory they should have been at least as fast. They almost got there - the remaining gap, around 10-30% of transfer performance, was due to the management overhead of software-defined storage.
If all three approaches are available, you naturally take the fastest: the local hardware RAID. However, this is no longer suitable for larger cloud applications and data centers beyond about 80 to 100 TB; there, rebuilds and manageability become a problem, and you have to accept the corresponding performance losses. Our partners at CloudFest were able to reduce these to 5-10% through various optimizations of the software-defined approach.
Let's take a look at another area of storage. It is often said that the problem sits in front of the computer. How great are the security risks resulting from individual data storage in the company, and what measures can be taken to mitigate them?
KAESE: What is stored on our hard drives, and how, depends on the data infrastructure. In my experience, the biggest problem with individual data storage is that no one knows what data is stored where. On top of that, people never get around to cleaning up and deleting their data, and backups are often forgotten. A centrally stored offsite backup can be important here.
How would you ideally solve backups with individual data storage?
KAESE: My top tip would be to rely on onsite, offsite and cloud backups. This applies to large companies, SMEs and individual users alike. For home users, this means having a NAS that is not placed right next to the desktop PC, where both could be stolen together, plus an external hard disk. SMEs can, for example, make a quarterly backup and store that hard drive in a bank safe. Finally, you should have a cloud backup. Especially with smaller providers, however, you should make sure they don't go bankrupt - because then the data could be gone, too. It is virtually impossible for all three backups to be lost at the same time.
This brings us right into the topic of heterogeneous systems. A few years ago, private and public clouds were still foreign terms to many. Today they are standard, but they are increasingly giving way to a hybrid approach. What advantages does this hybrid cloud offer companies?
KAESE: In a nutshell: the public cloud is quite expensive for extensive data storage - the OPEX (operational expense) is considerable. These storage costs are also underestimated, because free and very inexpensive options exist in the private sector. However, if companies store hundreds of terabytes or even petabytes, this becomes very expensive in the public cloud. In addition, it takes far too long to move such data volumes into and out of the cloud.
The best tip is therefore to store large amounts of data in the private cloud; with CAPEX (capital expense) you can build suitable storage landscapes. The public cloud is then used in a hybrid fashion for connectivity, so that employees can work anywhere. The public cloud thus becomes a network and can be used selectively. It is, of course, also worthwhile for companies that need to scale up at short notice.
SSD or HDD? Many companies ask this question for cost reasons when they upgrade their hard drive structure. But energy consumption, service life and reliability also play a role in this calculation. How can companies optimize these diverse requirements without neglecting business continuity?
KAESE: Everything that concerns active data - databases, boot drives, frequently changing data - should be stored on SSDs whenever the capacity isn't too high and performance is critical. Ideally even on NVMe SSDs, because connecting SSDs via SATA/SAS is an anachronism from the days when they still had to be compatible with hard disks. In new greenfield systems today, you no longer need this and can connect the SSD directly to the CPU.
Even in the best case, however, 1 TB of enterprise flash costs between 150 and 400 euros, while 1 TB of hard disk capacity costs just a few dozen euros. The price difference per capacity is therefore a factor of 6 to 10. As soon as capacity is at stake and speed is no longer the primary metric, cost per capacity becomes the decisive measure - and here the hard disk is by far the leader today.
If you look at the price curve of the past and project today's technology into the future, you see that the prices of both storage solutions fall in parallel. The price factor may eventually shrink to 2 to 4, but even then SSDs will still cost more than twice as much per capacity. Where other metrics predominate - power consumption, access speed, reliability - there are trade-offs.
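The factor-of-6-to-10 claim can be reproduced from the interview's ballpark figures: 150-400 euros per TB for enterprise flash against "a few dozen" euros per TB for hard disks. The exact HDD price is not given, so the values below are illustrative assumptions within that range.

```python
def price_ratio(ssd_eur_per_tb: float, hdd_eur_per_tb: float) -> float:
    """Cost-per-capacity factor of SSD over HDD."""
    return ssd_eur_per_tb / hdd_eur_per_tb

# Assumed HDD price of ~25-30 EUR/TB ("a few dozen euros") against the
# 150-400 EUR/TB range for enterprise flash quoted in the interview.
for ssd_price in (150, 300, 400):
    print(f"SSD at {ssd_price} EUR/TB: factor {price_ratio(ssd_price, 30):.1f}x over HDD")
```

Depending on where exactly both prices fall within their ranges, the factor lands roughly in the 5-13 region, consistent with the 6-10 quoted above.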
Even if SSDs suddenly cost the same as hard disks, it would take years to build up enough SSD production capacity to replace hard disks. To put this in perspective: by 2018, 600 exabytes of hard disk capacity had been produced, installed and filled - that's 600 million terabytes. In the same period, 60 exabytes of SSDs were produced, i.e. 10%. It is therefore impossible for SSD and hard disk prices to converge any time soon. Hard disks are still needed - just like tapes.
We produce so much data that we need all three media: tapes, hard drives and SSDs. The amount of data stored on SSDs is currently growing exponentially, while data on hard disks grows linearly. In the future, however, much more will be needed here as well. Today, data is created when a person presses a key: I might manage an average of one keystroke per second, and my son perhaps four per second while gaming. In the future, machines connected via the IoT will also produce data - many thousands of data packets per second. Considering that, all storage technologies will be needed well into the distant future.
How does Toshiba help companies find and use the right storage method?
KAESE: We support companies through training and proof-of-concept activities. At trade shows such as CloudFest, we repeat proof-of-concept experiments from the lab. If we believe such an experiment is universally valid, we turn it into a white paper. For example, we have collected more in-depth information on HDD- and SSD-based storage in this white paper.
Thank you very much for the insightful conversation, Mr. Kaese!
Read more about the right choice of HDD and flash memory in this Toshiba asset.