How Large Is Your Digital Shadow (Part 2)

In part 1 of this post, I wrote about the “Digital Shadow” and provided some examples of all of the data that is being created about you or on your behalf, in addition to the data that you create.

Here, we’ll walk through a few activities in a typical day, and identify some (but definitely not all!) of the digital shadow that’s being created.   On the left side of the table is a listing of activities, with a discussion on the right noting the digital data that’s being left behind.

Wake up, turn on my phone to check for new texts and emails, surf the web for news and read email in my personal email account. The cell phone carrier has a record of my phone contacting the local tower when it is turned on.  My email provider knows that I logged into my account and has information about my location based upon my IP address (tracked for security and other purposes).   Since I’m logged into email, my email provider keeps track of my searches (I can turn this off).  My ISP (the cell is often on WiFi for data when I’m home) has information about the sites that I visit.  My browser locally records my history, and the sites I visit may be leaving or updating cookies and/or capturing my IP address along with some unique identifiers on their own servers.

 

Later, I start my work day, logging into the corporate VPN and checking and responding to email messages. The VPN system has information about my log-in, and some of my activities are preserved in email including replies and new messages that I create.  The recipients of my email messages also have a copy, and each copy may be replicated many times for email archives and data protection (backups, etc.).  I don’t think my company tracks my location, but it could.

 

One of my emails includes a file sharing link to a folder for content.  I add this link/file to my folder, and make changes to one of the presentations in that shared area. When I add the link, a copy of everything in the shared folder is made on my laptop, and the information about the link is logged by the system.   This process is replicated for all of the accounts where this app is installed (cell phone, tablets, etc.).   As I change and re-save the presentation, everyone sharing the folder receives the update (and this information is logged and distributed as “news” to others sharing the folder).

 

I grab a quick lunch at my favorite sandwich shop, and while waiting I check in on Facebook, make a few posts and re-tweet a message on Twitter. The shop tracks my purchase (and thus my location at that time) with my loyalty card.  Facebook and Twitter both have new content from me that they time-stamp, and (unless I’ve turned it off) they also know and save my location.  My location is tracked by my phone.  As with every purchase I make today using a credit card, data about my purchase is tracked and available to me online; it is also stored and shared within my credit card company as permitted.

 

I finish my blog post and push “publish”. The post is published to one of our company blogs.  This automatically triggers a tweet about the posting from a few company accounts (and I send my own Tweet), which in turn generates additional data through re-tweets.  My tweet may include my location (this can be turned off).  The blog is captured and republished by several “automatic” online news sites looking for compliance stories – so now it exists on their servers, is backed up by them, and sometimes even re-distributed to hundreds or thousands of subscribers as part of a newsletter distributed in email form.

 

I’m flying later today, so I visit the airline’s site to view the status of my upgrade request and check in.  I rent and download a movie to my tablet for the trip. The airline’s systems capture my log-in and my check-in information.   iTunes records my purchase (as does my credit card company) and the movie is downloaded to my tablet for later viewing.

 

A client calls and we talk for a few minutes.  Then I attend a meeting remotely by phone and web conference.  I call my client back with some additional ideas and leave her a voice message. “Metadata” on both sides of the call is recorded by the carriers – start time and number, end time, etc.  I log into the web conference with my browser and use a password, so that system records my IP address, along with everyone else including the owner of the account. The duration of the conference and the time at which individual attendees “drop” the conference is probably saved.  The voice message system now has a log of my call (time, duration, phone number) along with a recording of the actual message, which might be transcribed and sent by email (e.g. Google Voice).

 

On the drive to the airport, I use a social GPS phone app to check traffic. The GPS app uses my location so it knows where I am, and is combining that information with thousands of others to update traffic information.   My cellphone carrier is also creating records as it switches cell towers along the way.  The toll pass on my windshield notes the time and my account (i.e. me) as I electronically pay my fare on the highway.  My car has dozens or hundreds of sensors recording information that will be downloaded at a later date when it is serviced.

 

As I park my car at an offsite location, I send a quick text to a friend. My parking service uses a card reader that registers my entrance to the facility. My carrier creates a record of the text I send, as does my friend’s carrier.  The texts are also stored on each phone (and possibly some tablets and other devices if linked to the same account).

 

At the airport, I check my bag and head through security. My airline knows that I’m at the airport and has data about my luggage, too.  Since I’m in the TSA “Pre” line, TSA’s systems know and record my location on check-in.

 

Waiting for my flight, I buy a cup of coffee and take a photo of an item that may make a good gift. My credit card company registers the time, date, location and amount of my purchase.  The photo that I take is stored, and it tags itself with location info using GPS (unless I turn off this feature), all of which is stored on my phone, which replicates to a cloud and then across other devices.

 

During the flight, I work on a spreadsheet and a presentation. Okay, this didn’t really happen because there’s no room in the cramped plane – but if it did, I now have new documents on my laptop, which will also be replicated and backed up soon.

 

Upon landing, I reclaim my luggage, pick up my car rental and use my GPS app to drive to my hotel. The airline tracks the location of my bag and the time that my flight arrived.  My rental company records the time and location of my rental, and of course the GPS app knows when and where I left the airport, along with the hotel where I stay.  The hotel keeps information about my check-in, and my credit card company knows that I’m there, too.

 

That night I log into the hotel wifi, check some emails and call it a night. The hotel’s wifi system maintains information on my log-in for billing (and possibly security) purposes.  I may have forgotten, but my trusty DVR at home remembers to record a few of my shows, which are stored on the DVR’s drive.

 

 

I intentionally created a very small amount of this data – the blog post, the photo, and some changes (one copy!) to files that I edited, along with some email and a few pieces of social media content.  Yet my activities generated dozens and dozens – if not hundreds – of discrete data chunks, some of which will be preserved for a long duration.
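To make the imbalance concrete, here is a minimal Python sketch that tallies the records from just one stop in the day, the lunch break. The `Record` class, its field names, and the `summarize` helper are hypothetical names chosen for illustration; the entries are paraphrased from the lunch row above, and the intentional/shadow labels are a judgment call rather than a formal classification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    holder: str        # who ends up storing the record
    description: str   # what the record contains
    intentional: bool  # True only if I deliberately created the content


# Records left behind by the lunch stop, paraphrased from the walkthrough.
lunch = [
    Record("Facebook", "my posts", True),
    Record("Twitter", "my re-tweet", True),
    Record("sandwich shop", "loyalty-card purchase (implies location)", False),
    Record("Facebook", "time stamp and location of my posts", False),
    Record("Twitter", "time stamp and location of my re-tweet", False),
    Record("phone", "location tracking", False),
    Record("credit card company", "purchase record", False),
]


def summarize(records):
    """Return (intentional_count, shadow_count) for a list of records."""
    counts = Counter("intentional" if r.intentional else "shadow" for r in records)
    return counts["intentional"], counts["shadow"]


intentional, shadow = summarize(lunch)
print(f"intentional: {intentional}, shadow: {shadow}")
```

Even for a single short activity, the records made about me outnumber the content I set out to create, and this list ignores the backups and replicas each holder keeps.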

 

All of this data poses interesting questions, most of which have not been clearly answered:  Who owns and controls this data?  How much of it is / should be subject to privacy requirements?  Is this data available for: eDiscovery; compliance; other purposes?  Should I be made aware of the data that’s being created and stored?  Should I have the right to demand that the data is not retained for long, never retained, or never even created?  Would these answers be different if I lived in Europe?  What if I’m a US Citizen traveling there, or vice-versa?

It’s an interesting exercise, so try it out yourself. You may be surprised by your results!

 

 

 

Jim Shook

Director, eDiscovery and Compliance Field Practice, Data Protection and Availability Division
I am a long-time “lawyer/technologist”, having learned assembly language on a TRS-80 at age 12 and later earned a degree in Computer Science. But the law always fascinated me, and after being a litigator and general counsel for over 10 years, the challenges that technology brought to the law and compliance let me combine my favorite pursuits. I spend my days helping EMC’s customers understand their legal and compliance obligations, and then how to apply technology and best practices to meet them.

Chicago Was Cold but EMC Forum Was Hot!

October 23rd was very cold in Chicago, almost 20 degrees below normal.  But at the Westin Hotel near O’Hare Airport, things were hot, with nearly 600 customers attending the EMC Forum Chicago event.  In 2013 there have been 55 different EMC Forum events across the country, providing information about EMC’s exciting solutions portfolio and helping thousands of existing and potential EMC customers better understand how to lead their own transformation.

The event was kicked off by Steve Crowe, the Central Division Senior VP, with the keynote address given by Jon Peirce, SVP, IT Private Cloud Infrastructure Services, who shared the different ways that EMC is leveraging its own solutions to transform EMC into a more efficient organization. Following the keynote, there were 5 different tracks including 20 different sessions to choose from, on topics that ranged from cloud transformation, backup, recovery and archive, converged infrastructure with Vblock, and ViPR to Big Data.  There were also 14 sponsor booths where folks could stop and talk about specific products and solutions. When I was a customer, I loved to attend these events to get the latest information on all things EMC.

I was lucky enough to be the presenter of “Changing the Game with EMC Backup and Recovery” for BRS.   My session was full, with some attendees even standing in the back, which tells me there are still lots of folks out there struggling with backup and archive. I talked about how IT organizations that don’t focus enough on servicing the needs of their business units can create an accidental architecture which can be very inefficient, expensive, hard to manage, and not as scalable as it needs to be. I provided an overview of how EMC’s data protection solutions for backup and archive can provide real value for their transformation journey.  I also provided an update on our most recent launch for Data Domain, Avamar, and NetWorker.

I believe there is only 1 EMC Forum event left in 2013 (30 Oct in Dallas, where you can also say “hi” to EMCBackup) – but if you get the chance to attend EMC Forum next year, I highly recommend it.  It was fun and very informative.  In Chicago, EMC Forum was hot, and there was a real buzz in the air!

Gene Maxwell

Technical Marketing, Data Protection and Availability Division
I am known by many as the creator of documentation that helps others easily understand technology. This is because I discovered that I myself was a visual learner as I worked in many different IT roles over the years. Prior to my technical marketing role, I was an EMC technical consultant for six years. I also have many years of experience as a customer in IT responsible for data center management & disaster recovery, including backups. My hobbies include building PCs, collecting movies (Casablanca is my favorite), singing and playing my guitar. I have a twin brother who is three minutes older than I am.

How Large Is Your Digital Shadow? (Part 1)

Most of us are at least vaguely aware of the staggering amount of electronic data we’re creating.  Here’s a quick refresher from our friends at IDC:

From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020). From now until 2020, the digital universe will about double every two years.
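As a quick sanity check on the IDC figures quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the sketch assumes decimal units (1 exabyte = 1 billion gigabytes) and steady exponential growth across the full 2005 to 2020 span, which is a simplification of IDC's "from now until 2020" wording.

```python
import math

GB_PER_EB = 10**9  # assumption: decimal units, 1 exabyte = 1 billion gigabytes

start_eb = 130      # size of the digital universe in 2005, per the quote
end_eb = 40_000     # projected size in 2020, per the quote
years = 2020 - 2005

# Overall growth factor (the quote rounds this to 300).
growth_factor = end_eb / start_eb

# 40,000 exabytes expressed in gigabytes (40 trillion).
total_gb = end_eb * GB_PER_EB

# Population implied by "more than 5,200 gigabytes" per person.
implied_people = total_gb / 5_200

# Doubling time implied by the overall growth, assuming a constant rate.
doubling_years = years / math.log2(growth_factor)

print(f"growth factor: {growth_factor:.1f}")                  # 307.7
print(f"total gigabytes: {total_gb:,}")                       # 40,000,000,000,000
print(f"implied population: {implied_people / 1e9:.2f} billion")  # 7.69 billion
print(f"implied doubling time: {doubling_years:.2f} years")   # 1.81 years
```

The numbers hang together: a factor of roughly 300 over 15 years works out to a doubling a little faster than every two years.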

That seems like a lot of data!  But once you give some thought to all of the different types of data being created today, it starts to add up and make sense.

Consider the following types of data that are regularly being created:

  • Data that I create directly and on my own – email messages, spreadsheets, presentations, Twitter, Facebook posts, etc.  Remember that each time that I reply to a message or forward an email with photographs, I’m “creating” a copy of that data in addition to whatever new information I add to the original.
  • Data that is created for me using a device or a tool – think about digital still and video cameras, scanners, DVRs
  • Copies of data that I create or that are created on my behalf – downloaded (video rentals, e-books, MP3s) and uploaded (YouTube, Facebook, Instagram) music and videos, photographs from friends that I keep, etc.
  • “Digital Shadow” data – information that is created about me (IDC says that the data in the digital shadow is actually larger than the information that you create).  This includes credit card transactions, preferences on systems like Amazon, loyalty cards, etc.
  • System data and logs.  A large amount of data is created by our activities through the systems that we use such as firewall information, sites we have accessed, cookies on our browsers, toll pass data, etc.  (Some of this is covered within our Digital Shadow).
  • A significant amount of data is also created by various systems, including those for data protection and compliance – archives, replication and backup systems that ensure data is available when needed.

Why is this important?  Much of this data is directly subject to compliance obligations (and even when it’s not, it’s often hard to separate it from data that is, so it’s all lumped together), which costs organizations money to properly store, secure, protect and even “discover” for litigation purposes.  Other data leaves a record of activities that we may not want to share – today or next year,  depending on who is accessing that information and for what purpose.  If you put it all together, in many ways all of this information forms a diary of our thoughts and activities.  And there are few of us who would want our diary to be an open book.

In part 2 of this post, we’ll cover a “day in the life” and detail many of the types of data being created by normal activities.  What you see may surprise you!

Jim Shook


I’ll Take “EMC for SharePoint” for $100 Alex

I’ll take “EMC for SharePoint” for $100 Alex.

As I participated in the Chicago SharePointFest event last week, I talked with lots of SharePoint customers about the many ways that SourceOne for SharePoint could help them.  Several of these customers asked me an interesting open-ended question that made me stop and think:   “So what does EMC do for SharePoint?”   With so many good answers to this question, I thought this would make a great new Jeopardy category.

If “Why EMC for Microsoft SharePoint?” ever appears as a category on Jeopardy, here is what some of the answers would be:

  1. What is Primary Storage:  EMC offers multiple primary storage options that offer a wide variety of storage features, many of them with our Fully Automated Storage Tiering (FAST) technology.
  2. What is Virtualization/Cloud Platform:  EMC, as part of VCE, offers Vblock for first-class virtualization of any application environment, including SharePoint and the other Microsoft applications.
  3. What is Externalize Active Content:  SourceOne for SharePoint gives customers the ability to externalize their active SharePoint content out of the SQL database, enhancing SharePoint performance & scalability and decreasing licensing costs while maintaining full transparency to SharePoint users.
  4. What is Archive Inactive Content:  SourceOne also provides the ability for customers to archive inactive SharePoint content out of their SQL databases by moving it to a more cost-appropriate tier of storage that can leverage features like deduplication, compression, and single instancing. SourceOne offers SharePoint users full access to their content via a web plug-in, maintaining ease of search and full transparency to SharePoint users.
  5. What is E-Discovery:  SourceOne Discovery Manager provides easy-to-use yet very powerful e-discovery capabilities across all SourceOne Archive data.  SourceOne Discovery Manager can discover, manage, and apply secure hold to any content in the EMC SourceOne archives.
  6. What is Archive Storage:  EMC offers multiple archive storage options for SharePoint including Data Domain, Atmos, and Centera that provide many storage efficiency and data protection advantages.
  7. What is Backup & Recovery: Avamar and NetWorker with Data Domain provide the best backup and recovery for SharePoint with intelligent agents that allow recovery of individual SharePoint items (when combined with Kroll) or the entire SharePoint farm.
  8. What is Enterprise Content Management:  Some customers try to get SharePoint to do things it really wasn’t designed to do.   EMC Documentum integrates with SharePoint to provide many of these common document management requirements such as business process management and compliance while maintaining the familiar SharePoint user experience.

I’m probably forgetting something, but that’s one heck of a list!  I think EMC has SharePoint well covered.

Gene Maxwell


Is Archiving Part of Your Checklist?

While we do a lot of talking here on The Backup Window about backup and recovery, as we should, it is important to keep in mind that archiving is also a key component of a data protection strategy.

The good news is that with EMC Data Domain systems, completing your data protection strategy doesn’t have to be hard. In addition to providing efficient backup and disaster recovery, Data Domain is an ideal platform for archive data. All Data Domain systems inherently support archiving workloads without the need for additional software.

Why Data Domain for archive storage?

Here’s how Data Domain checks off key requirements to be a leading protection storage platform for archive data:

  • Compliance: With EMC Data Domain Retention Lock, Data Domain systems can simultaneously meet governance policies and compliance regulations for your archive data. Retention Lock provides secure data retention for file and email archive data that meets the strictest US and international standards, including SEC 17a-4(f), ISO 15489, and MoReq2010.
  • Reliability: Protection storage is the storage of last resort.  It is the last place you go to access critical data when you can’t access it anywhere else.  So, when selecting an archive storage solution, it is imperative you find a solution you can trust to provide you access to your data when you need it.  The Data Domain Data Invulnerability Architecture instills such trust.  The Data Invulnerability Architecture is built into every Data Domain system and provides end-to-end data verification, continuous fault detection and self-healing.
  • Efficiency: Archiving data on a Data Domain system gives you the benefits of inline deduplication: by eliminating duplicate data segments, you can reduce your archive storage footprint by up to 5x and your consolidated backup and archive footprint by 10 to 30x.  This is the industry’s highest dedupe rate for archive data. In addition, the unique ability to consolidate backup and archive workloads onto a single Data Domain system eliminates silos of storage leading to administration savings and maximized storage efficiency.
  • Cost-savings: Data Domain offers up to 82% lower cost per gigabyte compared to other archive storage solutions.  Furthermore, a recent study from IDC shows that customers who consolidate their backup and archive data onto Data Domain gain a payback on their investment in under 6 months!
  • Breadth of Our Ecosystem: Data Domain systems have a broad archive partner ecosystem, as they are qualified with leading archiving applications, which enables them to support a variety of archiving use cases, including file and email, database, content management, and storage tiering. For file/email and SharePoint archiving, Data Domain systems integrate with EMC SourceOne (which just announced some great new features) to provide a complete end-to-end EMC archiving solution.
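The efficiency point above rests on eliminating duplicate data segments. For readers who have not seen the idea before, here is a minimal Python sketch of segment-level deduplication. It is an illustration only, not a description of how Data Domain implements it: the fixed 8-byte segment size, the SHA-256 fingerprints, and the `dedupe` and `restore` function names are all assumptions made for this example.

```python
import hashlib


def dedupe(data: bytes, segment_size: int = 8):
    """Split data into fixed-size segments and store each unique segment once."""
    store = {}    # fingerprint -> segment bytes (each unique segment kept once)
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        fingerprint = hashlib.sha256(segment).hexdigest()
        store.setdefault(fingerprint, segment)  # skip segments already stored
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the segment store and the recipe."""
    return b"".join(store[fp] for fp in recipe)


# Four copies of the same 8-byte segment plus one different segment.
data = b"ABCDEFGH" * 4 + b"12345678"
store, recipe = dedupe(data)

stored_bytes = sum(len(segment) for segment in store.values())
ratio = len(data) / stored_bytes

print(f"original: {len(data)} bytes, stored: {stored_bytes} bytes, ratio: {ratio}x")
```

In this toy input, 40 bytes shrink to 16 bytes of unique segments, a 2.5x reduction, and `restore` returns the original bytes unchanged. Real archive data and real systems will behave differently; the sketch only shows why repeated content (such as an attachment forwarded many times) costs little extra storage once duplicates are removed.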

Got questions? Shoot me a comment or a note to @A_Langon and check us out on-line at www.emc.com/backup-and-archive and www.emc.com/backupleader.

Alyson Langon
A couple years ago, fresh out of Business School at Boston College, I started at EMC and dove head first into all things backup and archive, focusing on Data Domain systems. I love the challenge of communicating complicated technologies in innovative and engaging ways and there is certainly no shortage of inspiration at EMC’s Data Protection and Availability Division. Outside of the tech world, I am an artist, animal lover and sufferer of wanderlust. You can also find me on Twitter achieving the perfect balance of data protection and cat gifs.