Deduplication – Christopher Kusek, Technology Evangelist

27Apr

HP StoreOnce D2D – Understanding the challenges associated with REALLY BAD NETAPP FUD

by Christopher Kusek (PKGuild) Avamar, Deduplication, emc, HP, NetApp, VNX, WTF

Hey guys! I was sitting here today, minding my own business… when the following tweet showed up in one of my search columns! (Why yes I do search on NetApp, and every major vendor in the Industry that I know a real lot about, I like to stay topical! oh and RT job opportunities… I know peoples ;))

#HP - Understanding the Challenges Associated with NetApp's Deduplication http://tek-blogs.com/a/sutt9r @TekTipsNetHawk

So I thought “Well Hey! I’d like to understand the challenges associated with NetApp’s Deduplication! Let’s get down to business!”

I click the little link which takes me to THIS PAGE where I fill out a form to receive my “Complimentary White Paper” ooh, yay! And let me tell you, other than the abusive form (Oh lovely… who makes people fill out FORMS for content.. yea I know, I know..) this thing looked pretty damn sweet! FYI: By sweet, I mean it looks so professional, so nice, like a solid Marketing Group got their hands on this and prettified it! I mean look at it!

HP StoreOnce D2D - Understanding the Challenges Associated with NetApp Deduplication - Business White Paper

Tell me that doesn’t look damn professional! Hell, I’d even at first pass with NO knowledge, take everything contained within that document at face value as the truth, I mean cmon let’s cover the facts here.

This whitepaper looks SWEET! It’s all logo’d out and everything too!
It’s only 8 pages; that speaks of SOLID content including not only text, but pictures and CITING evidence! Sweet right?!
And you said it; right there on the first page it says “BUSINESS WHITE PAPER” Tell me that does not spell PRO all over it.

So what I’m thinking is, clearly this has been vetted by a set of experts who have validated the data and ensured that it is correct; or at least within the context of the information, considering the footer of this document claims to have been published January 2011. So this CLEARLY should be current.

Yea… No. Not Quite. Quite the opposite? I guess it may be time to explain though! But before I go there, Disclaimer time!

HP’s Disclaimer at the bottom of the document:

© Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

My Disclaimer for what you’re about to read:

I do not work for HP and I have nothing against HP. I do not work for NetApp and have nothing against NetApp. Yea I work for EMC – Wait, aren’t you the competition?! WHY ARE YOU RAGGING ON HP FOR THEIR POORLY WRITTEN PAPER?! I think that falls in line because, when *I* Publish something attacking NetApp’s deduplication I do the homework and validate it (Except for when I quote external third parties… Yea I don’t do that anymore because… you end up with a mess like this document that HP has released ;)) OMG Seriously?! Seriously HP!? You’ve spurred me to write this because you upset my competitive nature. With that said, let’s get down to brass tacks. Secondary Disclaimer: I had forgotten I read this originally when this post came out HP Launches an Unprovoked Attack on NetApp Deduplication and you know what? between seeing it circulate AGAIN and having me fill out a form… yea Sean following bad data with bad data is #fail either way. Tertiary Disclaimer; a lot of the ‘concerns’ and ‘considerations’ addressed in the HP Paper which they’re claiming StoreOnce is the bees knees can solve, are actually readily solved with Industry Best of Breed Avamar and Data Domain, let alone leveraging VNX Deduplication and Compression, but I won’t go there because that is outside of the boundaries of this particular post :)

The paper has been broken down into the following sections; “Challenge #, blah blah blah, maybe cited evidence, Takeaway” I plan to… give you the gist of the paper without quoting it verbatim (that’s like the paper itself!) but also not removing the context, and sprinkling commentary and sarcasm as needed ;)

Challenge #1: Primary deduplication: Understanding the tradeoffs

This section has a lot of blah blah blah in it, but I’ll quote two areas which have CITED references;

While some may find this surprising given the continuing interest in optimization technologies, primary deduplication can impose some potentially significant performance penalties on the network.1

Primary data is random in nature. Deduplicating data leads to various data blocks being written to multiple places. NetApp’s WAFL file system exasperates the problem by writing to the free space nearest to the disk head. Reading the data involves recompiling these blocks into a format presentable to the application. This data reassembly overhead mandates a performance impact, commonly 20–50 percent.2

I particularly love this section for two reasons; one it’s VERY solid in its choice of words “can impose” not will impose, but it’s like “maybe?!?” This is not a game of “can” I have a cookie vs “may I have a cookie”, this is a white paper right? Give me some facts to work off of guys. Oh, I said two reasons didn’t I. Well, here is Reason #2 – Here’s the citing! [1 End Users Hesitate on Primary Deduplication; TheInfoPro TIP Insight, October 21, 2010] I’ll chalk it up to the possibility that I am clearly an IDIOT but I was unable to find the “Source” of this data. So… soft language… inability to validate a point, sweet!

But wait, let me discuss the second citing for a second, yea let me do that. I won’t go into WTF they’re saying in how they’re citing this as this is not an extensive and deep analysis of how WAFL and Data ONTAP operate but I decided “Whoa excellent backing data! Let me check out that citing shall I?!” So I go to the source [2 Evaluator Group, August, 2010] and I find… I can pay, $1999 to get this data! Excellent! First idea which came to mind, “I should write stupid papers and then sell the data at MASSIVELY high costs.. nah I’ll stick to factual blog posts” Yea, so I’m 0 for 2 in being able to “Validate” whatever these sources happen to be sharing, I’m sure you’ll be in the same boat too. Oh but the best part? Let’s take a moment and read the Take Away, shall we?!

Takeaway – Deduplication is often the wrong technology for data reduction of primary storage.

OMG SERIOUSLY? THAT IS SERIOUSLY YOUR TAKEAWAY?! It’s like a cake made up of layers of soft language, filled with unverifiable sources. And it’s not like this is even very GOOD FUD, it’s just so… Ahh!!!!!! A number of us (non-netappians) got so pissed off when we read this, I mean SERIOUSLY?!?

Relax.. Relax, it can’t get any worse than that right?

Challenge #2: Fixed vs. variable chunking

Wow this reads like an advertisement for Avamar. But seriously, this for the most part only discusses the differences between Fixed and Variable chunking, more educational than anything. Not a whole lot for me to discuss other than noting the similarities in their message to the Industry Leading Avamar.

Takeaway – Using variable chunking allows HP StoreOnce D2D solutions to provide a more intelligent and effective approach for deduplication.

Wow Christopher, you’re getting tame.. you let them slide on that one!
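Since this challenge really is more educational than anything, here is a minimal sketch of the difference, assuming nothing about how StoreOnce or Avamar actually do it. The function names (`fixed_chunks`, `content_defined_chunks`, `shared_fraction`), the window size and the mask are all made up for illustration. The idea: fixed chunking cuts every N bytes, while variable (content-defined) chunking cuts wherever a hash of the last few bytes matches a pattern, so the cut points move along with the data.

```python
import hashlib
import random


def fixed_chunks(data, size=64):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data, window=16, mask=0x3F, min_size=16, max_size=256):
    """Cut where a hash of the last `window` bytes matches a bit pattern.

    Illustrative parameters only; keep min_size >= window.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        length = i + 1 - start
        if length < min_size:
            continue
        tail = data[max(0, i + 1 - window):i + 1]
        digest = hashlib.blake2b(tail, digest_size=4).digest()
        if (int.from_bytes(digest, "big") & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old_chunks, new_chunks):
    """Fraction of the new chunks that already exist among the old ones."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    hits = sum(hashlib.sha256(c).digest() in seen for c in new_chunks)
    return hits / len(new_chunks)


rng = random.Random(7)
original = rng.getrandbits(8 * 8192).to_bytes(8192, "big")
edited = b"!" + original  # one byte inserted at the front shifts everything

fixed_shared = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
cdc_shared = shared_fraction(content_defined_chunks(original),
                             content_defined_chunks(edited))
print(f"fixed chunking still matches:    {fixed_shared:.0%}")
print(f"variable chunking still matches: {cdc_shared:.0%}")
```

With a single byte inserted at the front, the fixed chunks stop lining up with anything seen before, while the content-defined chunks re-synchronise and mostly still match. Real products use rolling hashes and tuned chunk sizes; this is only the shape of the idea.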

Challenge #3: Performance issues and high deduplication ratios

NetApp suffers performance issues with high deduplication ratios; something NetApp engineers said on a post to the NetApp technical forum.3

NetApp is so concerned about the performance of their deduplication technology that Chris Cummings, senior director of data protection solutions for NetApp told CRN customers must acknowledge the “chance of performance degradation when implementing the technology” should they turn on the technology.4

Okay, sweet! Let’s rock this out! Not only do they have CITED sources of this data (You know I love it when I have data to refer to!) but they even provide embedded links so I can click to go directly to the data! (WOOHOO!) And like any good detective… I did visit those links. It was upon visiting those two links that two things came back to me. “Hmm, Chris Cummings quote from 2008. Hmm, Forum conversation from 2009…” … Yea I was still AT NetApp during those two periods, OMG SERIOUSLY HP YOU’RE QUOTING DATA FROM 3 OR MORE YEARS AGO?!?! How can you NOT expect me to put that in caps? Let’s take a little journey down almost ANY product or dev company for a moment… I’d like to visit VMware in this particular scenario.

“VMware is great for Virtualization applications, Oh, but not Mission Critical Applications, it’s not stable for that. Do not virtualize mission critical applications”. Yea, you can almost QUOTE me as having said that. When might I have said that? Maybe when VMware had GSX out (Pre-ESX days) and our computers were run with the power of Potatoes. Yea, if you have NO dev cycle and you do not invest in development [Oh no you didn’t make a slighted attack on the MSA/EVA! … No I didn’t ;)] But if you STOP development all things we’re discussing can absolutely be true! #WeirdAnecdoteOver

So, while I firmly agree in 2008 and 2009 there WERE Performance concerns the likes of which were discussed in those forums. Very viable, Deduplication in general was maturing, I’m sure every product out there had similar problems (Data Domain which scales based upon CPU – with 4 year old CPUs probably couldn’t perform as well as it can today with our super Nehalems etc) You need to realize it is 2011, we’re in an entirely new decade. Please stop quoting “Where’s the beef” or making “Hanging Chad” references like Ted Mosby in How I Met Your Mother because while true at the time, not so applicable today.

Takeaway – HP typically finds 95 percent duplicate data in backup and deduplicates the data without impacting performance on the primary array.

I almost forgot the takeaway! (Hey! I’m verbose… You should know that by now!) So… what I’m hearing you say is… Because HP doesn’t have a native Primary Storage Deduplication solution like NetApp or EMC… there is no performance impact on the primary array! Hooray! Yea… WTF SEAN? I mean, I guess if I wanted I could repurpose most of this paper to position Avamar which seems a LOT more versatile than HP StoreOnce but okay, let’s move past!

I’m going to lump Challenge #4, #5 and #6 together because they have little to no place in this paper.

Challenge #4: One size fits all
Takeaway – Backup solutions are optimized for sequential data patterns and are purpose built. HP Converged Infrastructure delivers proven solutions. NetApp’s one-size-fits-all approach is ineffective in the backup and deduplication market.
Challenge #5: Backup applications and presentation
Takeaway – NetApp does not provide enough flexibility for today’s complex backup environments.
Challenge #6: Snapshots vs. backup
Takeaway – Snapshots are part of a data protection solution, but are incomplete by themselves. Long-term storage requirements are not addressed effectively by snapshots alone. HP Converged Infrastructure provides industry-leading solutions, including StoreOnce for disk-based deduplication for a complete data protection strategy.

I’m sorry, this is no contest and these points have absolutely no place in a paper educating on the merits and challenges of Deduplication with NetApp. This definitely has its place in a whole series of OTHER competitive and FUD based documents, but not here, not today.

In summary…

Sean… (Yes I know your name!) You wrote this paper for HP right? As a Technologist and Technology Evangelist for that matter, I would absolutely LOVE to learn about the merits, the values, the benefits of what the HP StoreOnce D2D solution brings to market and can do to solve customers’ challenges. But honestly man, this paper? I COMPETE with NetApp and you pissed me off with your FUD slinging. I know *I* can piss off the competition when I sling (FACTS) so just think about it. We’re a fairly small community, we all know each other for the most part. (If you’re at Interop in a few weeks, I’ll be at EMCWorld, feel free to txt me and we can meet up and I won’t attack you, I promise ;)) Educate, but please do not release this kind of trash into the community… Beautiful beautiful trash mind you, I mean everything I said about how amazingly this was presented, honestly BEST WHITE PAPER EVER. But that has got to be some of the worst most invalid content I’ve encountered in my life. (As applicable to how I stated it :))

I guess I should add a little commercial so someone doesn’t go WTF – I mean what I said above not only about the technologies which were discussed. If you think StoreOnce is a great solution, then you’ll be floored by Avamar and Data Domain. They’re not best of breed in the industry without good reason.

Feel free to comment as appropriate, it’s possible this has been exhausted in the past but SERIOUSLY I don’t want to see this again. ;)

Step one you say we need to talk, He walks you say sit down it’s just a talk, He smiles politely back at you, You stare politely right on through.

18Aug

Data Longevity, VMware deduplication change over time, NetApp ASIS deterioration and EMC Guarantee

by Christopher Kusek (PKGuild) Avamar, Celerra, CLARiiON, Deduplication, Efficiency, emc, NAS, NetApp, SQL, Storage, Virtualization, vmware, vSphere

Hey guys, the other day I was having a conversation with a friend of mine that went something like this.

How did this all start you might say?!? Well, contrary to popular belief, I am a STAUNCH NetApp FUD dispeller. What that means is, if I hear something said about NetApp by a competitor, peer, partner or customer which I feel is incorrect or just sounds interesting; I task it upon myself to prove/disprove it because well frankly… People still hit me up with NetApp questions all the time :) (And I’d like to make sure I’m supplying them with the most accurate and reflective data! – yea that’s it, and it has nothing to do with how much of a geek I am.. :))

Well, in the defense of the video it didn’t go EXACTLY like that. Here is a little background on how we got to where that video is today :) I recently overheard someone say the following:

What I hear over and over is that dedupe rates when using VMware deteriorate over time

And my first response was “nuh uh!”, Well, maybe not my FIRST response.. but quickly followed by; “Let me try and get some foundational data” because you know me… I like to blog about things and as a result collect way too much data to try to validate and understand and effectively say whatever I say accurately :)

The first thing I did was engage several former NetApp folks who are as agnostic and objective as I am to get their thoughts on the matter (we were on the same page!) – Data collection time!

For Data Collection… I talked to some good friends of mine regarding how their Dedupe savings have been over time because they were so excited when we first enabled it in the first place (And I was excited for them!) This is where I learned some… frankly disturbing things (I did talk to numerous guys named Mike interestingly enough, and on the whole all of those who I talked with and their data they shared with me reflected similar findings)

Disturbing things learned!

Yea I’ve heard all the jibber jabber before usually touted as FUD that NetApp systems will deteriorate over time in general (whether it be Performance, whether it be Space Savings) etc etc.

Well some of the disturbing things learned actually coming from the field on real systems protecting real production data was:

Space Savings are GREAT, and will be absolutely amazing in the beginning! 70-90% is common… in the beginning. (Call this the POC and the burn-in period)

As that data starts to ‘change’ ever so slightly as you would expect your data to change (not sit static and RO) you’ll see your savings start to decrease, as much as 45% over a year
This figure is not NetApp’s fault. Virtual machines (mainly what we’re discussing here) are not designed to stay uniformly the same no matter what in accordance to 4k blocks, so the very fact that they change is absolutely normal so this loss isn’t a catastrophe, it’s a fact of the longevity of data.

Virtual Machine data which is optimal for deduplication typically amounts to 1-5% of the total storage in the datacenter. In fact if we want to lie to ourselves or we have a specific use-case, we can pretend that it’s upwards of 10%, but not much more than that. And this basically accounts for Operating System, Disk Image, blah blah blah – the normal type of data that you would dedupe in the first place.

I found that particularly disturbing because after reviewing the data from these numerous environments… I had the impression VMware data would account for much more! I saw a 50TB SAN only have ~2TB of data residing in Data stores and of that only 23% of it was deduplicating (I was shocked!)
I was further shocked that when reviewing the data that over the course of a year on a 60TB SAN, this customer only found 12TB of data they could justify running the dedupe process against and of that they were seeing less than 3TB of ‘duplicate data’ coming in around 18% space savings over that 12TB. The interesting bit is that the other 48TB of data just continued on un-affected by dedupe. (Yes, I asked why don’t they try to dedupe it… and they did in the lab and, well it never made it into production)

At this point, I was even more concerned. Concerned whether there was some truth to this whole NetApp starts really high in the beginning (Performance/IO way up there, certain datasets will have amazing dedupe ratios to start) etc. and then starts to drop off considerably over time, while the EMC equivalent system performs consistently the entire time.

Warning! Warning Will Robinson!

This is usually where klaxons and red lights would normally go off in my head. If what my good friends (and customers) are telling me is accurate, it is that not only will my performance degrade just by merely using the system, but my space efficiency will deteriorate over time as well. Sure we’ll get some deduplication, no doubt about that! But the long term benefit isn’t any better than compression (as a friend of mine had commented on this whole ordeal) With the many ways of trying to look at this and understand I discussed it with my friend Scott who had the following analogy and example to cite with this:

The issue that I’ve seen is this:

Since a VMDK is a container file, the nature of the data is a little different than a standard file like a word doc for example.

Normally, if you take a standard windows C: – like on your laptop, every file is stored as 4K blocks. However, unless the file is exactly divisible by 4K (which is rare), the last block has just a little bit of waste in it. Doesn’t matter if this is a word doc, a PowerPoint, or a .dll in the \windows\system32 directory, they all have a little bit of waste at the end of that last block.

When converted to a VMDK file, the files are all smashed together because inside the container file, we don’t have to keep that 4K boundary. Kind of like sliding a bunch of books together on a book shelf eliminating the wasted space. Now this is one of the cool things about VMware that makes the virtual disk more space efficient than a physical disk – so this is a good thing.

So, when you have a VMDK and you clone it – let’s say create 100 copies and then do a block based dedupe – you’ll get a 99% dedupe rate across those virtual disks. That’s great – initially. Netapp tends to calculate this “savings” into their proposals and tell customers that require 10TB of storage, that they can just buy 5TB and dedupe and then they’ll have plenty of space.

What happens is, that after buying ½ the storage they really needed the dedupe rate starts to break down. Here’s why:

When you start running the VMs and adding things like service packs or patches for example – well that process doesn’t always add files to the end of the vmdk. It often deletes files from the middle, beginning, end and then replaces them with other files etc. What happens then is that the bits shift a little to the left and the right – breaking the block boundaries. Imagine adding and removing books of different sizes from the shelf and making sure there’s no wasted space between them.

If you did a file per file scan on the virtual disk (Say a windows C: drive), you might have exactly the same data within the vmdk, however since the blocks don’t line up, the block based dedupe which is fixed at 4K sees different data and therefore the dedupe rate breaks down.

A sliding window technology (like what Avamar does ) would solve this problem, but today ASIS is fixed at 4K.

Thoughts?

If you have particular thoughts about what Scott shared there, feel free to comment and I’ll make sure he reads this as well; but this raises some interesting questions.
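To make Scott's bookshelf analogy concrete, here is a toy model that takes his description at face value: 100 identical clones, a fixed 4K block dedupe, and then a small patch of a different size dropped into the middle of each clone so everything after it shifts. Everything here (`dedupe_savings`, the 256 KiB "golden image", the patch sizes) is a made-up assumption for illustration; it is not how ASIS is implemented and it is not measured data.

```python
import hashlib
import random

BLOCK = 4096  # fixed block size, as in the 4K example above


def dedupe_savings(volumes, block=BLOCK):
    """Fraction of fixed-size blocks that duplicate another block somewhere."""
    total = 0
    unique = set()
    for vol in volumes:
        for i in range(0, len(vol), block):
            total += 1
            unique.add(hashlib.sha256(vol[i:i + block]).digest())
    return 1 - len(unique) / total


rng = random.Random(42)
size = 64 * BLOCK  # a tiny 256 KiB stand-in for a golden image
golden = rng.getrandbits(8 * size).to_bytes(size, "big")

# Day one: 100 identical clones.
clones = [golden] * 100
fresh = dedupe_savings(clones)

# Later: each clone gets a small patch of a different length in the middle,
# so the second half of every clone is shifted by a different amount.
middle = 32 * BLOCK
patched = [golden[:middle] + bytes([k]) * (k + 1) + golden[middle:]
           for k in range(100)]
aged = dedupe_savings(patched)

print(f"fresh clones:   {fresh:.0%} saved")
print(f"patched clones: {aged:.0%} saved")
```

The fresh clones come out at 99% saved, matching the number in Scott's example. After the patches, the untouched first half of each clone still dedupes, but the shifted second half no longer lines up on 4K boundaries, so savings fall to a bit under half even though almost all of the bytes are still identical. How much real guest data actually shifts like this is exactly the open question.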

We’ve covered numerous things in here, and I’ve done everything I can to avoid discussing the guarantees I feel like I’ve talked about to death (linked below) so addressing what we’ve discussed:

I’m seeing on average 20% of a customer’s data which merits deduping and of that I’m seeing anywhere from 10-20% space saved across that 20%

Translation: 100TB of data, 20TB is worth deduping, reclaiming at most about 4TB of space in total (20% of 20TB); thus even on the generous end of this estimate you’d get about 4% space saved!
Translation: When you have a 20TB data warehouse and you go to dedupe it (You won’t) you’ll see no space gained, with a 100% cost across it.
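For anyone who wants to plug in their own numbers, the arithmetic behind those translations is just total × share worth deduping × dedupe rate. A quick sketch (the function name `overall_savings_tb` is mine, and the inputs are the rough field numbers quoted above, not guarantees):

```python
def overall_savings_tb(total_tb, eligible_fraction, dedupe_rate):
    """TB reclaimed when only part of the data is deduped at a given rate."""
    return total_tb * eligible_fraction * dedupe_rate


total = 100  # TB
for rate in (0.10, 0.20):
    saved = overall_savings_tb(total, eligible_fraction=0.20, dedupe_rate=rate)
    print(f"{rate:.0%} saved on 20 TB of {total} TB -> "
          f"{saved:.0f} TB reclaimed ({saved / total:.0%} overall)")

# The data warehouse case: nothing dedupes, so nothing comes back.
print(overall_savings_tb(20, eligible_fraction=1.0, dedupe_rate=0.0), "TB")
```

At 10-20% savings across the 20TB that merits deduping, that works out to roughly 2 to 4TB reclaimed out of 100TB.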

With the EMC Unified Storage Guarantee, that same 20TB data warehouse will be covered by the 20% more efficient guarantee (Well, EVERY data type is covered without caveat) [It’s almost like it’s a shill, but it really bears repeating because frankly this is earth shattering and worth discussing with your TC or whoever]

For more great information on EMC’s 20% Unified Storage Guarantee – check out these links (and other articles I’ve written on the subject as well!)

I won’t subject you to it, especially because it is over 7 minutes long, but here is a semi funny (my family does NOT find it funny!) video about EMCs Unified Storage Guarantee and making a comparison to NetApp’s Guarantee. Various comments included in the description of the video – Don’t worry if you never watch it… I won’t hold it against you ;)

Be safe out there, the data jungle is a vicious one! If you need any help driving truth out of your EMC or NetApp folks feel free to reach out and I’ll do what I can :)

SPOILERS!!!

29Jun

EMC didn’t invent Unified Storage; They Perfected it

by Christopher Kusek (PKGuild) Celerra, CLARiiON, Deduplication, Efficiency, emc, FAST, Hyper-V, Microsoft, NAS, Oracle, SATA, SQL, SSD, Storage, Technology, Unified Storage, vmware, vSphere

Hi Guys! Remember me! I’m apparently the one who upset some of you, enlightened others; and the rest of you.. well, you drove a lot of traffic here to get my blog to even beat out EMC’s main website as the primary source for information on "Unified Storage" (And for that, I appreciate it :))

In case any of you forgot some of those "target" posts, here they are for your reference! but I’m not here to start a fight! I’m here to educate and to direct my focus on not what this previously OVERLY discussed Unified Storage Guarantee was or is, but instead to drive down in to what Unified Storage will really bring to bear. So, without further ado!

What is Unified Storage?

I’ve seen a lot of definitions of what it is, quite frankly a lot of stupid definitions too. (My GOD I hate stupid definitions!) But what does it mean when you Unify to you and me? I could go on and on about the various ‘definitions’ of what it really is (and I even started WRITING that portion of it!) but instead I’m going to scrap all of that so I do not end up on my own list of ‘stupid definitions’ and instead will define Unified Storage in its simplest terms.

A unified storage system merges NAS and SAN. Optimized for performance and interoperability, the system simultaneously stores both file data and blocks of application data in virtually any operating environment

You can put your own take and spin on it, but at its guts that is seemingly what the basics of a "Unified Storage" system are; nothing special about it, NAS and SAN (hey, lots of people do that right?!) You bet they do! And this is in no way the definitive definition on what “Unified Storage” is, and frankly that is not my concern either. So taking things to the next level; now that we have a baseline of what it takes to ‘get the job done’, now it’s time to evaluate the Cost of Living in a Unified Storage environment.

Unified Storage Architecture Cost of Living

I get it. No really I do. And I’m sure by now you’re tired of the conversation of ‘uniqueness’ focused on the following core areas:

Support for Mixed Clients
Support for multiple types (tiers) of disk
Simplified Provisioning
Thin Provisioning
Improving Utilization

All of these items are simply a FACT and an expectation when it comes to a Unified Platform. (Forget unified, a platform in general) Lack of support of multiple tiers, locking down to a single client, complicated provisioning which can only be done fat which makes you lose out on utilization and likely is a waste of time – That my friend is the cost of living. You’re not going to introduce a wasteful fat obsolete system and frankly, I’m not sure of any (many) vendors who are actually delivering services which don’t meet on multiple of these criteria; So the question I’m asking is… Why do we continue to discuss these points? I do not go to a car dealership and say “You know, I’m expecting a transmission in this car, you have a transmission right?” And feel free to replace transmission with tires and other things you just flat out EXPECT. It’s time to take the conversation to the next level though; because if you’ve ever talked to me you know how I feel about storage. “There is no inherent value of storage in and of itself without context or application.” Thus… You don’t want spinning rust just for the sake to have it spin, no you want it to store something for you, and it is with that you need to invest in Perfection.

Unified Storage Perfection

What exactly is the idea of Unified Storage Perfection? It is an epic nirvana whereby we shift from traditional thinking, take NAS and SAN out of the business of merely rusty spindles, and enable and engage the business to earn its keep.

Enterprise Flash Disks

Still storage, yet sexy in its own right. Why? First of all, it’s FAST OMFG FLASH IS SO FAST! And second of all, it’s not spinning, so it’s not annoying like the latest and greatest SAS, ATA or FC disk! But what makes this particular implementation of EFD far sexier than simple consumer grade SSDs is the fact that these things will guarantee you a consistent speed and latency through and through. I mean, sure it’s nice that these things can take the sheer number of FC disks you’d need to run an aggressive SQL server configuration and optimize the system to perform, but it goes beyond that.

Fully Automated Storage Tiering (FAST)

Think back to that high performance SQL workload you had a moment ago, there might come a time in the life of the business where your performance needs change; Nirvana comes a knocking and with the power of FAST enables you to dynamically, non-disruptively move from one tier of Storage (EFD, FC, SATA) to another, so you are guaranteed not only investment protection but scalability which grows and shrinks as your business does. Gone are the days of ‘buy for what we might use one day’ and welcome are the days of Dynamic and Scalable business.

FAST Cache

Wow, is this the triple whammy or what? Building upon the previous two points, this realm of Perfection is able to take the performance and speed of Enterprise Flash Disks and the concept of tiering your disks to let you use those same existing EFD disks to extend your READ and WRITE cache on your array! FAST Cache accelerates performance to address unexpected workload spikes. FAST and FAST Cache are a powerful combination, unmatched in the industry, that provides optimal performance at the lowest possible cost. (Yes I copied that from a marketing thingie, but it’s true and is soooooo cool!)

FAST + FAST Cache = Unified Storage Performance Nirvana

So, let’s put some common sense on this then, because this is no joke, nor is it marketing BS. You assign EFDs to a specific workload you want to guarantee a certain speed and a certain response time (Win). You have unpredictable workloads which may need to be fast some times, but may be slow other times on a quarterly or yearly basis, so you leverage FAST to move that data around, but that’s your friend when you can PREDICT what is going to happen. What about when it is slow most of the time, but then on June 29th you make a major announcement that you were not expecting to hit as hard as it did, and BAM! Your system goes in the tank because data sitting on FC or SATA couldn’t handle the load. Hello FAST Cache, how I love you so. Don’t get me wrong, I absolutely LOVE EFDs and I wish all of my data could sit on them (At home a lot of it does ;)) and I have massive desire for FAST because I CAN move my workload around based upon predictable or planned patterns (Marry me!) But FAST Cache is my superman, because he is there to save the day when I least expected it, he caches my reads when BOOM I didn’t know it was coming, but more importantly he holds my massive load of WRITES which come in JUST as unexpectedly. So for you naysayers or just confused ones who wonder why you’d have one vs the other (vs) the other; Hopefully this example use-case is valuable. Think about it in terms of your business, you could get away with one or the other, or all three… Either way, you’re a winner.
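If it helps to see the "predictable vs BAM!" distinction in something other than words, here is a deliberately tiny toy model. To be loud about it: this is NOT how FAST or FAST Cache are implemented. `plan_tiers`, `LRUCache`, `fast_hits` and the block names are all hypothetical, and the only point is the contrast between placing data based on last period's history and having a cache that reacts to what is being hit right now.

```python
from collections import Counter, OrderedDict


def plan_tiers(access_counts, fast_slots):
    """Scheduled tiering: put last period's hottest blocks on the fast tier."""
    return {blk for blk, _ in access_counts.most_common(fast_slots)}


class LRUCache:
    """Small least-recently-used cache standing in for a flash cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, block):
        hit = block in self.items
        if hit:
            self.items.move_to_end(block)
        else:
            self.items[block] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used
        return hit


def fast_hits(trace, fast_tier, cache=None):
    """Fraction of accesses served from the fast tier or the cache."""
    hits = 0
    for blk in trace:
        if blk in fast_tier:
            hits += 1
        elif cache is not None and cache.access(blk):
            hits += 1
    return hits / len(trace)


# Last quarter the SQL blocks were hot, so scheduled tiering promotes them.
last_quarter = Counter({f"sql-{i}": 1000 for i in range(4)})
last_quarter.update({f"web-{i}": 1 for i in range(4)})
fast_tier = plan_tiers(last_quarter, fast_slots=4)

# Announcement day: the web blocks nobody planned for get hammered.
spike = [f"web-{i % 4}" for i in range(400)]

without_cache = fast_hits(spike, fast_tier)
with_cache = fast_hits(spike, fast_tier, LRUCache(capacity=4))
print(f"tiering only:    {without_cache:.0%} served fast")
print(f"tiering + cache: {with_cache:.0%} served fast")
```

In this toy, tiering alone serves none of the surprise spike from fast media because the plan was built on last quarter's pattern, while the cache misses each web block once and then serves everything after that.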

Block Data Compression

EMC is further advancing its storage efficiency innovation as the first storage provider to introduce block data compression, by allowing customers to compress inactive data and reclaiming valuable storage capacity— data footprints can be reduced by up to 50 percent. A common use case would be compressing inactive data once EMC FAST software has moved that data to the most cost-effective storage tier. Block data compression joins EMC’s existing capabilities, including thin provisioning and data deduplication, to automatically and transparently maximize storage utilization.

Yea, I DID copy that verbatim from a Press Release – And do you know why? Because it’s right! Even addresses a pretty compelling use-case too! So think about it a moment. Does this apply to you? I’d never compress ALL of my data (reminisces back to the days of DoubleSpace where let’s just say, for any of us who lived it… those were interesting times ;)) But think about the volume of data which you have sitting on Primary Storage which is inactive and otherwise wasting space when it continues sitting un-accessed and consuming maximum capacity! But this is more than just about that data type, unlike some solutions this is not all or nothing.

Think if you could choose to compress on demand! Compress say… your virtual machine right out of vCenter! But wait there’s more! And there’s so much more to say on this, let alone the things which are coming.. I don’t want to reveal what is coming, so I’ll let Mark Twomey do it where he did it here: Storage Services for Clariion Storage Pool LUNs

What does all of this mean for me and Unified Storage?!

Whoa, hey now! What do you mean what does all of this mean?! Are you cutting me short? Yes. Yes I am. :) There are some cool things coming, which I cannot talk about yet… not to mention all of the new stuff coming in Q3 – But things I was talking about… that’s stuff I can talk about –TODAY- there’s only even better things and cake coming tomorrow :)

I can fill this with videos, decks, resources, references, Unisphere and everything under the sun (You let me know if you really want that.. I’ve done that in the past as well) But ideally, I want you to make your own decision, come to your own conclusions.. What does this mean for you? Stop asking “What is Unified Storage” and start asking “What value can my business derive from technologies in order to save money, save time, and cut waste!” I’ll try to avoid writing yet another article on this subject unless you so demand it! I look forward to all of your comments and feedback! :)

20May

EMC 20% Unified Storage Guarantee !EXPOSED!

by Christopher Kusek (PKGuild) Celerra, CLARiiON, Deduplication, Efficiency, emc, FAST, NetApp, SSD, Storage

The latest update to this is included here in the Final Reprise! EMC 20% Unified Storage Guarantee: Final Reprise

For those of you who know me (and those who don’t, hi! Pleased to meet you!) I spent a lot of time at NetApp battling the storage efficiency game, always trying to justify where all of the storage space went in a capacity bound situation. However since joining EMC, all I would ever hear from the competition is how ‘space inefficient’ we were and frankly, I’m glad to see the release of the EMC Capacity Calculator to let you decide for yourself where your efficiency goes. Recently we announced this whole "Unified Storage Guarantee" and to be honest with you, I couldn’t believe what I was hearing. So I decided to take the marketing hype, set it on fire and start drilling down into the details, because that’s the way I roll. :)

I decided to generate two workload sets side by side to compare what you get when you use the Calculators.

I have a set of requirements – ~131TB of File/Services data, and 4TB of Highly performing random IO SAN storage

There is an ‘advisory’ on the EMC guarantee that you have at least 20% SAN and 20% NAS in order to guarantee a 20% space efficiency over others – So I modified my configuration to include at least 20% of both SAN and NAS (But let me tell you, when I had it as just NAS.. It was just as pretty :))

Using NetApp’s Storage Efficiency Calculator I assigned the following data:

That seems pretty normal, nothing too out of the ordinary – I left all of the defaults otherwise as we all know that ‘cost per TB’ is relative depending upon any number of circumstances!

So, I click ‘Calculate’ and it generates this (beautiful) web page, check it out! – There is other data at the bottom which is ‘cut off’ due to my resolution, but I guarantee it was nothing more than marketing jibber jabber and contained no technical details.

So, taking a look at that – this is pretty sweet, it gives me a cool little tubular breakdown, tells me that to meet my requirements of 135TB I’ll require 197TB in my NetApp Configuration – that’s really cool, it’s very forthright and forthcoming.

What’s even cooler is there are checkboxes I can uncheck in order to ‘equalize’ things so to speak. And considering that the EMC Guarantee is based upon Useable up front without enabling any features! Let me take this moment to establish some equality for a second.

All I’ve done is uncheck Thin Provisioning (EMC can do that too, but doesn’t require you to do that as part of the Guarantee, because we all know… sometimes… customers WON’T thin provision certain workloads, so I get it!) I’m also turning off deduplication, just so I get a good feel for how many spindles I’ll be eating up from a performance perspective – And turning off dev/test clone (which didn’t really make much difference since I had little DB in this configuration)

Now, through no effort of my own, the chart updated a little bit to report that NetApp now requires 387TB to manage the same workload that a second ago required 197TB. That’s a little odd, but hey, what do I know.. This is just a calculator taking data and presenting it to me!

Now… with the very same details thrown into the EMC Capacity Calculator, let’s take a look at how it looks.

According to this, I start with a Raw Capacity of ~207TB and through all of the ways as defined on screen, I end up with 135TB Total usable, with at least 20% SAN and about twice that in NAS – Looks fairly interesting, right?

But let’s take things one step further. Let’s scrap Snapshots on both sides of the fence. Throw caution to the wind.. No snapshots.. What does that do to my capacity requirements for the same ~135TB Usable I was looking for in the original configurations?

On the NetApp side I reclaim 27TB of Useable space (to make it 360TB Raw)- while on the EMC side I reclaim 15TB of useable space [150TB Useable now] while Still 207TB Raw.

But we both know the value of having snapshots in these file-type data scenarios, so we’ll leave the snapshots enabled – and now it’s time to do some math – Help me as I go through this, and pardon any errors.

Configuration	NetApp RAW	NetApp Useable	Raw v Useable %	EMC RAW	EMC Useable	Raw v Useable %	Difference
*FILE+DB*
Default Checkboxes	197 TB	135 TB	68%	207 TB	135 TB	65%	-3%
Uncheck Thin/Dedup	387 TB	135 TB	35%	207 TB	135 TB	65%	+30%
Uncheck Snaps	369 TB	135 TB	36%	207 TB	150 TB	72%	+36%
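
If you want to check the percentages in that table yourself, here is a minimal Python sketch. It assumes the "Raw v Useable %" column is simply usable divided by raw, and that "Difference" is the EMC percentage minus the NetApp percentage; the function and variable names are mine for illustration, not anything out of either calculator. The table rounds to whole percents, so expect the unrounded figures to drift by a point here and there.

```python
def raw_to_usable_pct(usable_tb, raw_tb):
    """Usable capacity expressed as a percentage of raw capacity."""
    return 100.0 * usable_tb / raw_tb


# (label, NetApp raw, NetApp usable, EMC raw, EMC usable), all in TB,
# copied from the FILE+DB rows of the table above.
FILE_DB_ROWS = [
    ("Default Checkboxes", 197, 135, 207, 135),
    ("Uncheck Thin/Dedup", 387, 135, 207, 135),
    ("Uncheck Snaps", 369, 135, 207, 150),
]


def compare(rows):
    """Return (label, NetApp %, EMC %, EMC minus NetApp) for each row."""
    results = []
    for label, n_raw, n_usable, e_raw, e_usable in rows:
        netapp_pct = raw_to_usable_pct(n_usable, n_raw)
        emc_pct = raw_to_usable_pct(e_usable, e_raw)
        results.append((label, netapp_pct, emc_pct, emc_pct - netapp_pct))
    return results


if __name__ == "__main__":
    for label, netapp_pct, emc_pct, diff in compare(FILE_DB_ROWS):
        print(f"{label:20s} NetApp {netapp_pct:5.1f}%  "
              f"EMC {emc_pct:5.1f}%  diff {diff:+5.1f} pts")
```

Swap in your own raw and usable figures from either calculator and the same two functions will do the math for you.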

However, just because I care (and I wanted to see what happened) I decided to say "Screw the EMC Guarantee" and threw all caution to the wind and decided to compare a pure-play SAN v SAN scenario, just to see how it’d look.

I swapped out the numbers to be Database Data, Email/Collaboration Data – The results don’t change (Eng Data seems to have a minor 7TB Difference.. Not sure why that is, – feel free to manipulate the numbers yourself though, it’s negligible)

And I got this rocking result! (Yay, right?!) 202TB seems to be my requirement with all the checkboxes checked! But this is Exchange and Sharepoint data (or notes.. I’m not judging what email/collab means ;))… I’m being honest and realistic with myself, so I’m not going to thin provision or Dedup it anyway, so how does that change the picture?

It looks EXACTLY the same [as before]. Well, that’s cool, at least it is consistent, right?

However, doing the same thing on the EMC side of the house.

I want to note a few differences in this configuration – I upgraded to a 480 because I used exclusively 600GB FC drives as I’m not even going to lie to myself that I’m humoring my high IO workloads on 2TB SATA Disks – If you disagree you let me know, but I’m trying to keep it real :)

RAID5 is good enough with FC disks (If this was SATA I’d be doing best practice and assigning RAID6 as well, so keeping it true and honest) And it looks like this:

(Side Note: It looks like this SAN Calculation has only 1 hot spare declared instead of the 6 used above in the other configuration – I’m not sure why that is, but I’m not going to consider 5 disks as room for concern so far as my calculations go – it is not reflected in my % charts below – FYI! I fixed the issue and introduced 6 Spare disks. I also changed the system from 14+1 R5 sets to 4+1 and 8+1 R5 sets which seems to accurately reflect more production like workloads :))

Whoa, 200TB Raw Capacity to get me 135TB Usable? Whoa, now wait a second. (say the naysayers) You’re comparing RAID5 to RAID6 – that’s not a fair configuration because there is definitely going to be a discrepancy! And you have snapshots enabled too for this workload. (Side note: I do welcome you to compare RAID6 in this configuration, you’ll be surprised :))

I absolutely agree – so in the effort of equalization – I’m going to uncheck the Double Disk Failure Protection from the NetApp side (Against best practices, but effectively turning the NetApp configuration into a RAID4 config) and I’ll turn off Snapshot copies to be a fair sport.

There, it’s been done. The difference is.. That EMC RAW Capacity has stayed the same(200TB) while NetApp raw capacity has dropped considerably by 30TB from 387TB to 357TB. (I do like how it reports "Total Storage Savings – 0%" :))

So, what does all of this mean? Why do you keep taking screen caps, ahh!!

This gives you the opportunity to sit down, configure what it is YOU want, get a good feel for what configuration feels right to you and be open and honest with yourself and said configuration.

No matter how I try to swizzle it, I end up with EMC coming front and center on capacity utilization from RAW to Usable – Which downright devastates anything in comparison. I do want to qualify this though.

The ‘guarantee’ is that you’ll get 20% savings with both SAN and NAS. Apparently if I LIE to my configuration and say ‘Eh, I don’t care about that’ I still get OMG devastatingly positive results of capacity utilization. – So taking the two scenarios I tested in here and reviewing the math..

Configuration	NetApp RAW	NetApp Useable	Raw v Useable %	EMC RAW	EMC Useable	Raw v Useable %	Difference
*FILE+DB*
Default Checkboxes	197 TB	135 TB	68%	207 TB	135 TB	65%	-3%
Uncheck Thin/Dedup	387 TB	135 TB	35%	207 TB	135 TB	65%	+30%
Uncheck Snaps	369 TB	135 TB	36%	207 TB	150 TB	72%	+36%

*EMAIL/Collab*
Default Checkboxes	202 TB	135 TB	67%	200 TB	135 TB	68%	+1%
Uncheck Thin/Dedup	387 TB	135 TB	35%	200 TB	135 TB	68%	+33%
Uncheck RAID6/Snaps	357 TB	135 TB	38%	200 TB	151 TB	76%	+38%

When we’re discussing apples for apples – We seem to be meeting the guarantee whether NAS, SAN or Unified.

If we were to take things to another boundary, out the gate I get the capacity I require – If I slap Virtual Provisioning, Compression, FAST Cache, Auto-Tiering, Snapshots and a host of other benefits that the EMC Unified line brings to solve your business challenges… well, to be honest it looks like you’re coming out on top no matter what way you look at it!

I welcome you to ‘prove me wrong’ based upon my calculations here (I’m not sure how that’s possible because I simply entered data which you can clearly see, and pressed little calculate buttons… so if I’m doing some voodoo, I’d really love to know)

I also like to try to keep this as realistic as possible and we all know some people like their NAS only or SAN only configurations. The fact that the numbers in the calculations are hitting it out of the ballpark so to speak is absolutely astonishing to me! (Considering where I worked before I joined EMC… well, I’m as surprised as you are!) But I do know the results to be true.

If you want to discuss these details further, reach out to me directly (christopher.kusek@emc.com) – or talk to your local TC (Or your TC, TC Manager and me in a nicely threaded email ;)) – They understand this rather implicitly.. I’m just a conduit to ensure you folks in the community are aware of what is available to you today!

Good luck, and if you can find a way to make the calculations look terrible – Let me know… I’m failing to do that so far :)

!UPDATE! !UPDATE! !UPDATE! :) I was informed apparently everything is not as it seems? (Which frankly is a breath of relief, whew!)

Latest news on the street is, apparently there is a bug in the NetApp Efficiency Capacity Calculator – So after that gets corrected, things should start to look a little more accurate, let me breathe a sigh of relief around that, because apparently (after being heavily slandered for ‘cooking the numbers’) the only inaccuracy going on there [as clearly documented] was in the source of my data.

However, being that I’m not going to go through and re-write everything I have above again, I wanted to take things down to their roots, let’s get down into the dirt, the details, the raw specifics so to speak. (If anything in this chart below is somehow misrepresented, inaccurate or incorrect, please advise – This is based upon data I’ve collected over time, so hash it out as you feel :))

NetApp Capacity	GB	TB	EMC Capacity	GB	TB	GB Diff	TB Diff	% Diff

Parity Drives	4000	3.91	Parity Drives	4000	3.91	0	0
Hot Spares	1000	0.98	Hot Spares	1000	0.98	0	0
Right Sizing	3519	3.44	Right Sizing	1822.7	1.78	1696.3	1.66
WAFL Reserve	2045.51	2	CLARiiON OS	247.87	0.24	1797.64	1.76
Core Dump Reserve	174.35	0.17	Celerra OS	60	0.06	114.35	0.11
Aggr Snap Reserve	863.06	0.84		0	0	863.06	0.84
Vol Snap Reserve – 20%	3279.62	3.2	Check/Snap Reserve 20%	3973.89	3.88	-694.27	-0.68
Space Reservation	0	0		0	0	0	0

Usable Space	13118.5	12.8	Usable Space	16895.54	16.49	-3777.04	-3.69	+23%
Raw Capacity	28000	27.34	Raw Capacity	28000	27.34	0	0

What I’ve done here is take the information and tried to ensure each one of these apples are as SIMILAR as possible.

So you don’t have to read between the lines either, let me break down this configuration – This assumes 28 SATA 1TB Disks, with 4 PARITY drives and 1 SPARE – in both configurations.
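
For anyone who would rather re-run the chart than squint at it, here is a minimal Python sketch. It assumes usable space is just raw capacity minus every overhead line in the chart, and that the TB columns are GB divided by 1024 (which is what the chart's figures work out to); the dictionary and function names are mine for illustration only. The results land within rounding of the Usable Space row.

```python
GB_PER_TB = 1024  # assumption: the chart's TB values match GB / 1024

RAW_GB = 28000.0  # 28 x 1TB SATA disks, same on both sides

# Overhead lines copied from the chart above, in GB.
NETAPP_OVERHEAD_GB = {
    "Parity Drives": 4000.0,
    "Hot Spares": 1000.0,
    "Right Sizing": 3519.0,
    "WAFL Reserve": 2045.51,
    "Core Dump Reserve": 174.35,
    "Aggr Snap Reserve": 863.06,
    "Vol Snap Reserve (20%)": 3279.62,
    "Space Reservation": 0.0,
}

EMC_OVERHEAD_GB = {
    "Parity Drives": 4000.0,
    "Hot Spares": 1000.0,
    "Right Sizing": 1822.7,
    "CLARiiON OS": 247.87,
    "Celerra OS": 60.0,
    "Check/Snap Reserve (20%)": 3973.89,
}


def usable_gb(raw_gb, overhead_gb):
    """Raw capacity minus the sum of all overhead lines."""
    return raw_gb - sum(overhead_gb.values())


def to_tb(gb):
    """Convert GB to TB using the 1024 divisor assumed above."""
    return gb / GB_PER_TB


if __name__ == "__main__":
    netapp = usable_gb(RAW_GB, NETAPP_OVERHEAD_GB)
    emc = usable_gb(RAW_GB, EMC_OVERHEAD_GB)
    gap = emc - netapp
    print(f"NetApp usable: {netapp:9.2f} GB ({to_tb(netapp):.2f} TB)")
    print(f"EMC usable:    {emc:9.2f} GB ({to_tb(emc):.2f} TB)")
    print(f"Gap:           {gap:9.2f} GB ({to_tb(gap):.2f} TB)")
    # The percentage depends on which side you treat as the base.
    print(f"Gap as % of EMC usable:    {100 * gap / emc:.1f}%")
    print(f"Gap as % of NetApp usable: {100 * gap / netapp:.1f}%")
```

The last two lines print the gap against both bases because the chart does not say which one its % Diff column uses, so pick the one that matches how you think about the comparison.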

If you feel that I somehow magically made numbers appear to be or do something that they shouldn’t – Say so. Use THIS chart here, don’t create your own build-a-config workshop table unless you feel this is absolutely worthless and that you truly need that to be done.

You’ll notice that things like Parity Drives and Hot Spares are identical (As they should be) Where we start to enter into discrepancy is around things like WAFL Reserve, Core Dump Reserve and Aggr Snap Reserve – Certainly there are areas of overlap as shown above and equally the same can be said of areas of difference, which is why in those areas on the EMC side I use that space to define the CLARiiON OS and the Celerra OS. I did have the EMC Match the default NetApp Configuration of a 20% vol snap reserve (on the EMC side I call it Check/Snap Reserve) [Defaults to 10% on EMC, but for the sake of solidarity, what’s 20% amongst friends, right?] (On a side note, I notice that my WAFL Reserve figures might actually be considerably conservative as a good friend gave me a dump of his WAFL Reserve and the result of his WAFL Reserve was 1% of total v raw compared to my 0.07% calculation I have above, maybe it’s a new thing?)

So, this is a whole bunch of math.. a whole bunch of jibber jabber even so to speak. But this is what I get when I look at RAW numbers. If I am missing some apparent other form of numbers, let it be known, but let’s discuss this holistically. Both NetApp and EMC offer storage solutions. NetApp has some –really- cool technology. I know, I worked there. EMC ALSO has some really cool technology, some of which NetApp is unable to even replicate or repeat. But before we get in to cool tech battles, as we sit in a cage match watching PAM duel it out with FAST-Cache, or ‘my thin provisioning is better than yours’ grudge matches, we have two needs we need to account for.

Customers have data that they need to protect. Period.

Customers have requirements of a certain amount of capacity they expect to get from a certain amount of disks.

If you look at the chart closely, there are some OMFG ICANTBELIEVEITSNOTWAFL features which NetApp brings to bear, however they come at a cost. That cost seems to exist in the form of WAFL Reserve, and Right sizing (I’m not sure why Right Sizing comes in so considerably fat when contrasted with how EMC does it, but it apparently does?) So while I can talk all day long about each individual specific feature NetApp has, and the equivalent parity which EMC has in that same arena, I need to start somewhere. And strangely, going back to basics seems to come to a 23% realized space savings in this scenario (Which seems in line with the EMC Unified Storage Guarantee) Which frankly, I find to be really cool. Because, as others commenting on this ‘guarantee’ have echoed, what the figures appear to be showing is that the EMC Capacity utilization is more efficient even before it starts to get efficient (through enabling technologies).

Obviously though, for the record I’m apparently riddled with Vendor Bias and have absolutely no idea what I’m talking about! [disclaimer: I have no idea what I’m talking about when I define and disclose who I am in this post and others ;)] However, I’d like to go on record: based upon these mathematical calculations, were I not an employee of EMC, and whether I did or did not work for NetApp in the past, I would have come to these same conclusions independently when presented with these same raw figures and numerical metrics. I continue to welcome your comments, thoughts and considerations when it comes to a Capacity bound debate [Save performance for another day, we can have that battle outright ;)] Since this IS a Pureplay CAPACITY conversation.

I hope you found this as informative as I did taking the time to create, generate, and learn from the experience of producing it. Oh, and I hope you find the (unmoderated) comments enjoyable. :) I’d love to moderate your comments, but frankly… I’d rather let you and the community handle that on my behalf. I love you guys, and you too Mike Richardson even if you were being a bit snarky to me. {Hmm, a bit snarky or a byte snarky… Damn binary!} Take care – And Thank you for making this my most popular blog-post since Mafia Wars and Twitter content! :)

The latest update to this is included here in the Final Reprise! EMC 20% Unified Storage Guarantee: Final Reprise

30Dec

Avamar Support Super Site! The ultimate source of source deduplication mayhem!

by Christopher Kusek (PKGuild) Avamar, Deduplication, emc, Storage

You may remember my rocking consolidated blog posts for Symmetrix FAST and Celerra FAST – But here is something I didn’t even have to create myself! This is a pure total rocking consolidation for Avamar! Wowza is the first thing I’d say, and for what it’s worth, I’ve seen the internal version of the same site – Believe me when I say, the customer facing version you see below here is WAAAAAAAAAYYYYYY better! Seriously! Check it out, and if you don’t think this is totally rocking, I’ll one-up it and do it even better! ;)

The link may require credentials on EMC’s PowerLink – so please keep that in mind when it comes to accessing the site!

So, check it out! This will be your best friend when it comes to working with Avamar, and solving your backup problems now and in the future! :)
