This is sort of a rant blog post. Not like the ones Chuck Hollis does, but more of a nit that I hope to shed light on. I’m a HUGE fan of EMC’s FAST Cache feature as part of our FAST Software Suite, but I don’t think we at EMC do a good enough job of explaining to customers just how powerful this feature can be, especially when we spend most of our presentations on how FAST saves customers money by moving stale data down to lower-cost/lower-performing media. While this is a great feature, it’s really only half of the great story of the FAST Software Suite. My other rant is that when we do get around to talking about FAST Cache, it’s usually in the discussion around Virtual Desktops. In fact, the slide for FAST Cache typically shows the value being closely aligned to reducing spindle count in Virtual Desktop deployments (by a lot in most cases), and while that’s great and all, it’s not what I think is REALLY REALLY cool about FAST Cache!!!
So what is FAST Cache, you might ask, and why am I talking about it in this blog? EMC’s @StorageZilla did a pretty good overview of it here: “FAST Cache for EMC Unified Storage”. Essentially, FAST Cache offers an “I/O TurboBoost” to your busiest workloads. It’s really easy to get up and running on your CX and VNX platforms. You just add a couple of Enterprise Flash Drives (EFD, or SSD as others call them) and set them up as a FAST Cache pool. Then you can go to individual LUNs in Unisphere and enable that LUN to use this SSD/EFD space for read and write caching (unlike other solutions that are READ only). Easy peasy, so let’s get into why I think we are missing the boat on how we position it.
First, let’s give credit where credit is due!! Compellent created and pioneered something they called “Data Progression” (DP); EMC calls this F.A.S.T. (Fully Automated Storage Tiering). If you are not familiar with DP or FAST, it’s “Automated Storage Tiering” (AST), or simply the ability to move blocks of data up and down different tiers of storage. Jeramiah Dooley did a great overview of it from a Service Provider perspective on his blog: “FAST and FAST Cache for the Service Provider”. Most storage companies that support AST recommend mixing Tier 1 (15k RPM or SSD), Tier 2 (15k/10k RPM) and Tier 3 (NL-SAS, 7.2k RPM drives) into one big pool, and then they use their software to migrate the blocks of data up and down the stack. Where we may differ is the size of the chunk that moves up and down. In VNX, FAST VP uses a more controller-friendly 1GB slice (VMAX is at 768KB – wow!!), and in Compellent it’s something like 2MB. From a competitive positioning standpoint, it’s always funny to see “whose is better”. The net-net is that we all essentially do it the same way, and we all have a set policy on when this process takes place and how long it takes to move data down.
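To put the granularity difference in perspective, here is some quick back-of-the-napkin math in Python (my choice of language; nothing here comes from an EMC or Compellent tool). It just counts how many relocation units an array has to track and move per TiB of pool at the 1GB VNX slice and the roughly 2MB Compellent page. Reading "fewer, bigger units" as the controller-friendly part is my assumption, and the actual metadata cost per unit is not modeled at all.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

# Relocation granularities quoted in this post.
GRANULARITY_BYTES = {
    "VNX FAST VP (1GB slices)": 1 * GIB,
    "Compellent Data Progression (~2MB pages)": 2 * MIB,
}

def units_per_tib(unit_bytes: int) -> int:
    """How many relocation units it takes to cover 1 TiB of pool capacity."""
    return TIB // unit_bytes

for name, unit in GRANULARITY_BYTES.items():
    print(f"{name}: {units_per_tib(unit):,} units per TiB")
# VNX FAST VP (1GB slices): 1,024 units per TiB
# Compellent Data Progression (~2MB pages): 524,288 units per TiB
```

In other words, 2MB granularity means 512 times as many units per TiB to track and shuffle as 1GB granularity; the trade-off is that a 1GB slice drags a lot of cold neighbors along with the hot blocks.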
The value of this feature, and the way it is typically positioned, is the ability to take stale (infrequently accessed) data, move it down to lower-cost, slower media, and let more frequently accessed/important data stay on Tier 1. Before you unload on me: I do not think frequently accessed data = important data, but that’s how it is sometimes positioned or implied, so don’t shoot the messenger.
The drawing to the right is a good example of how this process works. Data is normally written in at Tier 1, and then every X amount of time (in most cases 24 hours, but it can be adjusted; btw, you want to pay attention to the toll it takes on your controller resources) the blocks are reviewed, and if they haven’t been touched, they are marked to progress down to a lower tier of storage that night. Every X amount of time this process starts up again, and the data moves down the stack based on access time. In practice this feature is AWESOME, and EMC, Compellent and various other solutions on the market are doing really well with it.
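If it helps to see the nightly demotion as logic, here is a minimal Python sketch of the process described above. It is a hypothetical model, not how FAST VP or Data Progression is actually implemented: the `Slice` class, `relocation_pass()` and the three-tier layout are made up for illustration. Untouched slices drop one tier per scheduled run, and touched slices stay put.

```python
from dataclasses import dataclass

LOWEST_TIER = 3  # Tier 1 = fastest (EFD/15k), Tier 3 = slowest (NL-SAS/SATA)

@dataclass
class Slice:
    tier: int = 1          # new writes land on Tier 1
    touched: bool = False  # accessed since the last relocation pass?

def relocation_pass(slices: list[Slice]) -> None:
    """One scheduled run (e.g. every 24 hours).

    Untouched slices are demoted one tier (never below the lowest tier);
    touched slices stay where they are. Access flags reset for the next window.
    """
    for s in slices:
        if not s.touched:
            s.tier = min(LOWEST_TIER, s.tier + 1)
        s.touched = False

# A slice that nobody reads after it is written:
cold = Slice()
for night in range(1, 4):
    relocation_pass([cold])
    print(f"after night {night}: Tier {cold.tier}")
# after night 1: Tier 2
# after night 2: Tier 3
# after night 3: Tier 3
```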
My biggest rant with this feature is that it’s always positioned, or talked about, in terms of the ability to move data from Tier 1 down to Tier 3, which is fantastic, but what happens when that data sitting on Tier 3 RAID 5 (in some cases (not EMC), a 9-drive RAID 5 SATA stripe set) becomes “hot” again? If the process only runs every X amount of time, it could take 2 or more days for that data to move back up to Tier 1. Anyone see a problem with that? If my Oracle, SQL, X-Database, Exchange or Virtual Desktop recompose process kicks off, I’m going to be sitting on some of the worst performance you can imagine. RAID 5, SATA. UGH.
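Here’s the rough math behind the “2 or more days” worry, as a Python sketch. It assumes (my assumption, not documented behavior of any array) that each scheduled relocation pass moves a hot block up exactly one tier; `hours_until_tier1()` is a hypothetical helper. Real arrays may rank and move data differently, so treat this as a worst-case model, not a spec.

```python
def hours_until_tier1(start_tier: int, interval_hours: float,
                      hours_since_last_pass: float) -> float:
    """Hours until hot data on start_tier (1 = Tier 1, 3 = Tier 3) is back on Tier 1.

    Hypothetical model: a relocation pass runs every interval_hours and
    promotes a hot block exactly one tier per pass.
    """
    if start_tier <= 1:
        return 0.0
    wait_for_next_pass = interval_hours - hours_since_last_pass
    remaining_passes = start_tier - 2  # passes still needed after the first one
    return wait_for_next_pass + remaining_passes * interval_hours

# Data on Tier 3 goes hot one hour after last night's pass, with a 24-hour schedule:
print(hours_until_tier1(start_tier=3, interval_hours=24.0, hours_since_last_pass=1.0))  # 47.0
```

Under that model, everything that hits the Tier 3 data for almost two full days is served from RAID 5 SATA.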
This is where EMC could do a better job of positioning FAST Cache. Right now, we position it as the ability to “absorb” front-end read and write bursts from hosts/servers/applications, which allows us to design storage arrays for “steady state” performance and not for “peaks”. Again, VDI is a great use case for this (think BootStorms, LogonStorms, AV Storms). BUT I think the better way to look at it is the “Oh crap, my data has progressed down to Tier 3 and it needs performance (response time, IOPS etc.) quickly” scenario. Think about it: you need to run a report or crunch some numbers on blocks that haven’t been touched in a couple of days/weeks and have started the progression process or are already sitting on Tier 3. Do you simply live with the performance for a couple of days while the data works its way back up (think hundreds of milliseconds of response time)? Do you run to the storage console and kick off a manual move? I don’t think so. That means you have 2 choices: you can live with the performance impact for a couple of days, or you can simply not turn AST on for that volume. Option 2 doesn’t sound very appealing, since it would seem counter to the reason you wanted that feature in the first place!!
So the answer to this issue is to use something like FAST Cache to bridge the performance impact of the AST process. Simply put, if the block of data you need is sitting in Tier 3 and it gets “touched” 3 times, it will be promoted directly into the FAST Cache pool (EFD/SSD), and the response times and IOPS capability go from the outhouse to the penthouse with ZERO user intervention. It is the turbo boost your application needs at that immediate time. Once that data gets stale again, it moves back down the stack. EASY FREAKING PEASY, and that’s the way FAST Cache should be talked about!!
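To make the “touched 3 times” behavior concrete, here is a simplified Python model of a promote-on-third-access flash cache. The threshold of 3 comes from this post; everything else is an assumption for illustration, including the `FlashCacheSketch` name and the plain LRU eviction. It also ignores real FAST Cache details like page size and flushing dirty writes back to disk.

```python
from collections import OrderedDict, defaultdict

PROMOTE_AFTER = 3  # touches before promotion, the number quoted in this post

class FlashCacheSketch:
    """Hypothetical promote-on-third-touch flash cache with LRU eviction."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.touches = defaultdict(int)  # access counts for blocks still served from disk
        self.cache = OrderedDict()       # promoted blocks, least recently used first

    def access(self, block_id: str) -> str:
        """Record one I/O and report where it was served from."""
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # refresh recency on a cache hit
            return "flash"
        self.touches[block_id] += 1
        if self.touches[block_id] >= PROMOTE_AFTER:
            del self.touches[block_id]
            self.cache[block_id] = True  # copy the block up to the EFD/SSD cache
            if len(self.cache) > self.capacity:
                # Evict the least recently used block; its home copy stays on disk
                # (write-back of dirty data is not modeled here).
                self.cache.popitem(last=False)
        return "disk"

fc = FlashCacheSketch(capacity_blocks=2)
print([fc.access("tier3-block") for _ in range(4)])
# ['disk', 'disk', 'disk', 'flash']
```

No waiting for a nightly relocation pass: the fourth read of that Tier 3 block is already coming off flash.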
As you can see, Automated Storage Tiering (FAST VP or Data Progression) is really only positioned as a way to save CAPEX on Tier 1 storage, and it’s usually only talked about in terms of moving data out of Tier 1 and down into Tier 2 and Tier 3 to save money. That’s really misleading, because when the data does get “hot” again it takes a while to move back up, and I think we do our customers a disservice by not explaining the performance impact of data sitting in Tier 3. Features like FAST Cache bridge that huge performance gap and help solve that problem.
So, if you are a HUGE fan (as I am) of Automated Storage Tiering, don’t get so wrapped up in the downward progression of blocks, because at this point everyone does it. You really need to understand, and ask, things like: “In a hands-off scenario, what is the process to move the data back up the stack, how long could that typically take, and what is the performance impact during that time?” If the answer is “well, it depends,” then you may want to do some customer reference checking on what happens in the real world!!
Okay, that’s it for my rant.