The cloud + big data

Photo by Susanne Feldt

What to watch for

After completing this lesson, you’ll be able to:

  • Define cloud computing
  • Describe what differentiates big data from regular old data
  • Discuss the importance and application of the cloud and big data in various fields
  • Develop and apply an understanding of how to use Wikipedia appropriately in academic settings

The Cloud

Let’s start here:

Cloud sticker by Chris Watterson

It’s true! Well, mostly. It’s a bunch of other computers working together in a complicated way, but yeah, that’s the basic idea.

Let’s dive in.

Required readings

A fun infographic highlighting some of the key dates / events in the evolution of cloud computing. It’s probably a good idea to go back through this graphic again after you’ve read everything else here. A few dates of particular importance:

  • Salesforce.com launching in 1999 really marked the advent of the Software as a Service (SaaS) model, a key offering of cloud computing.
  • The launch of Amazon Web Services1 in 2006 made cloud computing available to a dramatically larger audience.
  • Apple’s iCloud service launched in 2011, further cementing the cloud’s place in the general public’s tech vocabulary.

“The Beginner’s Guide to the Cloud” by Jess Fee

(925 words / 5-7 minutes)

A great way to ease into what’s admittedly a fairly technical topic even for this class. As you’re thinking through this concept of “networks of servers”, this map of data centers showing where “the cloud” is actually located might help:

“Cloud computing”, Wikipedia

(6,912 words / 35-42 minutes)

I know, I know. Another topic, another Wikipedia article, right? But—they’re really so very good for general overviews of broad topics!2 Things to note:

  • You need to be able to understand (at least fairly well) every word of this introductory paragraph:

Cloud computing is a kind of Internet-based computing that provides shared processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services),[1][2] which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in third-party data centers.[3] It relies on sharing of resources to achieve coherence and economy of scale, similar to a utility (like the electricity grid) over a network.

  • The discussion of the origin of the term “cloud” is pretty fun, in a nerdy way.
  • The concept of distributed access to pooled computing resources and data arose in the early days of computer networks (along with ARPANET), but it took many decades for the necessary infrastructure to develop.
  • Spend a good bit of time familiarizing yourself with the “Characteristics” section—this does a pretty solid job of explaining why the cloud has succeeded.
  • Also spend some time understanding the various service models (SaaS, PaaS, and IaaS). To understand these well3, you’ll need to have paid pretty good attention to “What is Code?”. This graphic may help:
  • Don’t worry too terribly much about the “Deployment models” section.
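If the service models still feel fuzzy, here is one rough way to picture them, written as a tiny Python sketch. The layer names and the cut-off points are my own simplification of the diagram you’ll commonly see for this topic (they are not taken from the Wikipedia article), and `LAYERS`, `PROVIDER_LAYER_COUNT`, and `who_manages` are made-up names for illustration only. The idea: the further you move from IaaS toward SaaS, the more of the stack the cloud provider runs for you.

```python
# Simplified technology stack, ordered from the physical bottom to the app on top.
# Assumption: this nine-layer breakdown is a common teaching simplification.
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "data", "application",
]

# How many layers (counting from the bottom) the provider runs in each model.
# Assumption: these cut-offs follow the usual textbook picture; real products blur them.
PROVIDER_LAYER_COUNT = {
    "on-premises": 0,  # you run everything yourself
    "IaaS": 4,         # provider runs the hardware and virtualization
    "PaaS": 7,         # provider also runs the OS, middleware, and runtime
    "SaaS": 9,         # provider runs the whole thing; you just use it
}


def who_manages(model):
    """Split the stack into provider-managed and customer-managed layers."""
    cut = PROVIDER_LAYER_COUNT[model]
    return {"provider": LAYERS[:cut], "customer": LAYERS[cut:]}


if __name__ == "__main__":
    for model in PROVIDER_LAYER_COUNT:
        yours = who_manages(model)["customer"]
        print(f"{model}: you manage {', '.join(yours) or 'nothing'}")
```

Running it prints one line per model listing what is left for you to manage, which shrinks to nothing by the time you reach SaaS.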

“What Every CEO Needs to Know About the Cloud” by Andrew McAfee

(4,014 words / 21-26 minutes)

Unless this piece particularly interests you, you can skim through it until you get to the section called “The Skeptics’ Concerns”. This section is why I included the article—it gives you a good dose of reality to counterbalance all the “the cloud is amazing!” sentiment present in some of our other readings.

Big data

Now that we have a decent grasp on what the cloud is4, let’s talk very briefly about the related concept of big data. We’ll begin with two activities!

  • Click here to see the info Google already has about you and the settings you can change surrounding this data collection (under Personal Info & Privacy).
  • Download Ghostery to see how different companies and sites track your information on any site you visit.

Required reading:

“Big data”, Wikipedia

(7,264 words / 37-44 minutes)

Salient details from the article:

  • Good top-level definition: “Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.” “Traditional data processing applications” more or less means “humans.”
  • “Data sets are growing rapidly in part because they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.”
  • The three Vs (“volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources)”) are a useful way to think about big data. Pay attention to this section.
  • Don’t worry too much about the “Architecture” and “Technologies” sections; instead, spend a good deal of time carefully reading the “Applications” section. You might want to do a bit of Googling about Obama’s “Project Narwhal”.
  • You can skim the “Research activities” section, but spend a bit of time on the “Critique” section.

As with every other topic we’ve touched upon, there’s so much more that we could and should say about big data! Talk about those very things in your discussion groups, and feel free to ask me questions, too!

Non-required readings

Map of the Internet by Quartz

A great interactive resource. All the articles are worth reading, but especially check out numbers 1, 2, and 6.

Discussion Questions

  • Explain the understanding of the cloud you had before this lesson.
  • Spend some time discussing all the various ways you interact with the cloud on a daily basis.
  • Talk about some of the potential downsides of the cloud, especially if they’ve ever impacted you personally.
  • How do you feel about the phrase “if you’re not paying for the product, you are the product”? It’s fair to say most if not all of us make this trade-off in various ways and to various extents. Where do you fall on that spectrum?

Words on / reading time for this page: 1,093 words / 6-8 minutes

Words in / reading time for required readings: 19,115 words / 98-109 minutes

Total words in / reading time for this lesson: 20,294 words / 104-117 minutes


  1. There are going to be a lot of acronyms in this lesson—brace yourselves.

  2. Also, I’ve been meaning to say this for a while, but I’ll say it here: Wikipedia is pretty darn reliable, especially if you learn how to use it properly. How do you use it properly? Well, first, you evaluate it critically, just like any other source. Then, you understand that anyone can edit it at any time, so you do need to exercise a bit of caution. But you should also understand that there are quality control mechanisms built in (these have their own issues, but that’s a topic for another day). Finally, any Wikipedia article worth its salt should have an extensive list of sources for its claims, and you should always verify anything important / controversial / etc. in a Wikipedia article through its source before relying on it too heavily. With that off my plate, back to our regularly scheduled programming!

  3. Don’t worry about understanding them absolutely perfectly.

  4. Hopefully!