Episode 104 - Scraping Facts Online: If You Can’t Beat ’Em, Datum

32:58
 
By Paul Giesting and William Schmitt.
  1. At the time of this taping, Paul was in the middle of the Metis “bootcamp” program, learning the capabilities, tools, and insights of data science. This conversation ranged widely in the realm of data analysis and management, examining its relevance to Paul’s field of geology but also exploring the world’s immersion in what Bill would call a data ecology: it seems every datum is connected, or connectable, to every other datum. (“Datum” is the original singular form of the plural word “data.”)
  2. The growing plethora of data has to be tracked and organized, even though today’s computer hardware doesn’t allow all the world’s data, or even relatively large slices of it, to be stored and analyzed in one place at one time. Realizing that words are data, too, Paul pointed out that geology encountered a data explosion crisis a few decades ago, when the science had developed so many new names for various rocks that the new information became less useful. That lasted until geologists produced a plan for sorting out and categorizing rock names according to rocks’ bulk chemistry instead of their constituent minerals (example here). Paul came to see the value of advanced organization in obtaining, thinking about, and acting upon geological data; hence his pursuit of this certificate in data science.
  3. Discussion of this specific field of science led to the use of various other terms, with various meanings, none of them fully understood by Bill. The terms included informatics, data scraping, the analysis of data clustering, “big data,” and “machine-learning algorithms.” These terms can be expected to be influential in nearly all fields, so it behooves the layperson to develop some familiarity with them. It is quite possible to become skeptical of a body of knowledge and skills that, like everything else, can be used for benevolent or malevolent purposes. But Paul said the hopeful side of his personality recognizes what data scientists already recognize: namely, that this amazingly powerful field also has its limitations.
  4. He recalled that an author is currently writing books with a robust skepticism about machine learning. Bill guessed the author was Julia Evans, but it was likely Janelle Shane, the author of You Look Like a Thing and I Love You. Separately, one can get a laugh from the current results seen in the hybrid field of machine-learning poetry.
  5. The bottom line is that, as with all science, the tools and results of data science cannot provide their own guidance on how to use wisely the fruits they bear. The guidance must come from external forces driven by human virtue and values.

Liner notes by Bill. Audio editing by Morgan. Cover art for this episode was produced by Paul... in conjunction with the Landsat 8 mission, the scikit-learn and seaborn libraries, and Mauna Loa and Kilauea volcanoes. (See his final project slides here.)
