-
"The most notable “discovery” in the dataset was that if you simply plotted the number of steps versus the BMI, you would see an image of a gorilla waving at you (Fig. 1b). While we teach our students the benefits of visualization, answering the specific hypothesis-driven questions did not require plotting the data. We found that very often, the students driven by specific hypotheses skipped this simple step towards a broader exploration of the data. In fact, overall, students without a specific hypothesis were almost five times more likely to discover the gorilla when analyzing this dataset (odds ratio = 4.8, P = 0.034, N = 33, Fisher’s exact test; Fig. 1c). At least in this setting, the hypothesis indeed turned out to be a significant liability."
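For reference, here is a minimal Python sketch of the two statistics quoted above, for a 2×2 table [[a, b], [c, d]]. The first is the sample (cross-product) odds ratio. The second is a two-sided Fisher's exact test, which sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. The function names are my own. The passage doesn't give the study's cell counts, so the example uses the classic tea-tasting table rather than the gorilla data. Software also differs on the odds ratio: R's `fisher.test`, for instance, reports a conditional maximum-likelihood estimate, so its figure won't always match the plain cross-product.

```python
import math
from math import comb


def odds_ratio(a, b, c, d):
    """Sample (cross-product) odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]] (hypothetical helper)."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability that the top-left cell is x, with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Sum every table no more likely than the observed one; the small relative
    # tolerance keeps equally likely tables from being dropped by rounding.
    return sum(
        p for p in (p_table(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
```

With the tea-tasting table, `fisher_exact_two_sided(3, 1, 1, 3)` comes out at 34/70, about 0.486, with an odds ratio of 9. That is a strong-looking effect that four observations per group can't distinguish from chance.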
-
Highly valuable for all affected, and yet still completely absurd to think about. Technology Is People (and is also a complete nuisance).
-
On the problems of machine learning and medical data.
-
A 'yes' to all of this; English Weird as a thing, and Christmas is the time of English Weird. TDIR begins on December 20th; time to read along.
-
Filed away for a thing on data next year.
-
I love jq at the command line for even the simplest tasks; I need to go over this at some point.
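For anyone else still getting to grips with it, here is a typical small jq filter: `jq '.[] | select(.age > 30) | .name'`. It walks an array, keeps the objects whose `age` is over 30, and emits their `name`s. Below is a rough Python equivalent. It assumes an array of objects that each carry a numeric `age`; the field names and `names_over` are made up for the example. Note that jq would print each name as a JSON string on its own line rather than return a list.

```python
import json


def names_over(json_text, min_age):
    """Rough Python equivalent of: jq '.[] | select(.age > N) | .name'"""
    records = json.loads(json_text)  # .[]  -> iterate over the top-level array
    return [
        r["name"]                    # .name -> pull out the field
        for r in records
        if r["age"] > min_age        # select(.age > N) -> keep matching objects
    ]
```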
-
This, like everything Tony and Taylor did, is very good. Not just on film, but on creative work, too.
-
Excellent overview and collection of links on cryptocurrencies from the Co-op Digital Newsletter; some useful links in here for A Thing.
-
"pv – Pipe Viewer – is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion." Looks very handy.
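The core trick is simple enough to sketch: copy bytes from one stream to another in chunks, keeping a running count and timing as you go. The following is a minimal Python approximation that assumes nothing about pv's actual internals. The function and callback names are hypothetical, and it skips the completion estimate, which needs the total size known up front.

```python
import sys
import time


def copy_with_progress(src, dst, report, chunk_size=64 * 1024):
    """Copy bytes from src to dst, calling report(total_bytes, elapsed_seconds) after each chunk."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of input
            break
        dst.write(chunk)  # pass the data through untouched
        total += len(chunk)
        report(total, time.monotonic() - start)
    return total


def stderr_report(total, elapsed):
    """Overwrite a single status line on stderr with bytes seen and average throughput."""
    rate = total / elapsed if elapsed > 0 else 0.0
    sys.stderr.write(f"\r{total} bytes  {rate / 1024:.1f} KiB/s")
    sys.stderr.flush()
```

Dropped between two processes, it would be called as `copy_with_progress(sys.stdin.buffer, sys.stdout.buffer, stderr_report)`. Progress goes to stderr, so the data on stdout passes through unchanged.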
-
"xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable." iiinteresting.
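xsv itself is written in Rust and built for speed. As a rough illustration of the slicing and column-selecting it describes, here's a Python sketch using the standard `csv` module. `select_columns` is a hypothetical name, not an xsv command. It also takes its input as an in-memory string rather than streaming or indexing the way a real tool would.

```python
import csv
import io
from itertools import islice


def select_columns(text, columns, start=0, end=None):
    """Keep only the named columns, and data rows start..end (0-based, end exclusive)."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" drops any column not listed in `columns`
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in islice(reader, start, end):  # row slicing, header excluded
        writer.writerow(row)
    return out.getvalue()
```

For example, `select_columns(data, ["city"], start=1, end=2)` returns just the header and the second data row's `city` value.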
-
OK, this is great: Bret Victor's library for exploring interactive documents. Tidy – thanks to its use of data-attributes – but super-clear. Really nice to have a web-based library, too, and one focused on text. Now thinking about this conceit again.
-
"There was also one late night when a stranger opened the door and walked into the house when August should have auto-locked the door. (The stranger was trying to enter our next-door neighbor’s house and didn’t realize he was at the wrong door.)" YOU HAD ONE JOB etc.
-
Rather looking forward to seeing this play out: thirty days of processing and spelunking CSV, from Paul Downey. Lots of new tools and tricks emerging already.
-
Really nice exploration of a small stack for poking at data on the command line. I'm a fan of jq and its ilk already, so this extends some of those techniques.
-
"Sheetsee.js is a JavaScript library, or box of goodies, if you will, that makes it easy to use a Google Spreadsheet as the database feeding the tables, charts and maps on a website. Once set up, any changes to the spreadsheet will [be] auto-saved by Google and be live on your site when a visitor refreshes the page." This is good.
-
"All it takes to get a website going for a repository on GitHub is a branch named gh-pages containing web files. You also don’t need a master branch, you can have a repo with just one branch named gh-pages. Here is what I think is really cool, if you fork a project with just a gh-pages branch, you’re only a commit away from having a live version yourself. If this repo being forked is using sheetsee.js then everyone is a fork, commit and spreadsheet away from having a live website connected to an easy (a familiar spreadsheet UI and no ‘publish’ flow because Google autosaves) to use database that they manage (control permissions, review revision history)." Very smart.
-
Hosted statistics tool with attractive interface and smart API. Not cheap for its single-tier plan ($99/mo), but looks like it might be worth a poke.
-
"Here we get a glimpse of an alternative figuration of data itself. Rather than some kind of precious (but immaterial) stuff, or fuel for market speculation, data here is a relationship, a link between one part of the world with another, and a trace that can be endlessly reshaped."