Trust is mission-critical

Artificial intelligence and data science must instill trust because good decision-making depends on it, which, in turn, drives better outcomes, reputation, and ultimately adoption.

🤖 So it's a core message of my book that if we are to replace or extend software systems with A.I. systems, we have to guarantee improvements in trustworthiness. And producing trustworthy insights and models is a constant struggle in data science.

Featured image by: MIT Sloan Mgmt Review

⚖️ Interpretable Machine Learning (a.k.a. Explainable AI) provides tools to address trust/ethical concerns organized in three levels: Fairness, Accountability, and Transparency, collectively known as F.A.T. I like to see these in a pyramid structure because each level depends on the one beneath it. And there are interpretability tools to diagnose problems on each level as well as to fix each problem.

📖 It's an extensive area of active research with hundreds of methods. My book is an introduction with several forays into advanced topics.

Why does Interpretable Machine Learning matter?

It's hard to tell from all the hype, but Artificial Intelligence is barely in its infancy 👶. But I'm hopeful that we can bring it into maturity.

☔ One of the most significant issues Machine Learning projects face is that models are ill-equipped to weather changing, adversarial, and unexpected data conditions, much like planes facing storms and turbulence. But aircraft are robustly built and can overcome severe conditions both automatically and guided by experienced pilots. Models, on the other hand, must generalize well, and this proves to be an elusive property.

🎛️ Ever since I wrote my book, I've been asked many times why I'm passionate about Interpretable Machine Learning. I've responded that it's the instrument panel to pilot machine learning even in the worst conditions, from unfair to uncertain outcomes. So why wouldn't I prefer to have a complete instrument panel available? By contrast, using predictive performance alone is like piloting with a single instrument!

Featured image by: WikiImages from Pixabay

✈️ Currently, flying is the safest mode of transportation. But for A.I., there is still a long way to go. For starters, we will need better no-code AutoML with human-in-the-loop and Interpretable M.L. built-in, like cockpits for Machine Learning engineers. We will also need methods that automatically audit and test models, much like commercial planes undergo strict maintenance regimens. And given what I've seen currently being built by AutoML, MLOps, and XAI startups and researchers, it seems like it's heading in this direction, so I have reasons to be hopeful that for most commercial use cases, A.I. someday will be the safest mode of decision-making!

Not All is Lost from My Biggest Failure

🥏 Recently, I found this box of frisbees in my parents' basement, and it's what's left of my biggest failure: a search engine #startup. Failure sounds harsh, but we learn by trial and error, so every mistake is an opportunity for growth.

Featured image by: fauve othon on Unsplash

📊 One of the biggest lessons I learned was technical, and it had to do with the importance of #analytics and of debugging algorithms to find points of failure. It was then I realized that Machine Learning had a problem. After all, how do you debug ML models? This is how, in 2017, I first stumbled upon Interpretable ML / Explainable AI research. Fast forward to 2020, and I was writing a book about it! And I spoke about this journey to San Francisco-based A.I. startup entrepreneurs and workers.

💪 In Conclusion: the frisbees may have been the only tangible items, but my failure left behind stories, ideas, lessons, and a brand new perspective that has only made me stronger! As for the frisbees, they will find a new home with goodwill.

Opinion: Resource Constraints Foster Creative Solutions

I learned to program on this computer (I was a child during the '80s 🤓). It had a 4.77 MHz CPU, 256 KB RAM, monochrome display, and no hard drive, so you had to be creative to overcome resource constraints, not to mention exercise patience!

Above picture by: s3freak

We are so spoiled these days! To put it in context, most smartphones 📱 have over 16 thousand times the RAM and more storage than would have fit in a room in the '80s. Add that to cheap, limitless cloud storage. I am not complaining. That is great! However, I wonder how much resource constraints foster software innovation, and optimal code.

Today, trillion-parameter deep learning 🤖 models are pushing the envelope. Still, at the same time, it seems illogical that they represent the most efficient solution grounded in, for instance, biology, causal understanding of the world, or statistics. So before ushering in the age of quantum computing, I'm hoping we hit some resource limitations to focus more energy on more creative and intuitive solutions, not to mention cost-effective ones.
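As a rough check on the RAM comparison, the arithmetic can be sketched as below. The 256 KB figure and the 16,000 multiplier come from the text; treating 1 KB as 1,024 bytes and the variable names are assumptions for illustration.

```python
# Back-of-the-envelope check of the "16 thousand times the RAM" comparison.
# Assumption: 1 KB = 1,024 bytes and 1 GiB = 1,024**3 bytes.
KB = 1024
GIB = 1024 ** 3

old_ram_bytes = 256 * KB      # RAM of the 1980s computer described above
multiplier = 16_000           # "over 16 thousand times"

modern_ram_bytes = old_ram_bytes * multiplier
modern_ram_gib = modern_ram_bytes / GIB

print(f"{modern_ram_gib:.2f} GiB")
```

That works out to roughly 3.9 GiB, so any phone with 4 GB of RAM or more clears the bar.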

What do you think? How much does an abundance of resources hinder or enable creative solutions?

Opinion: What Makes Us Care?

🇨🇷 Seven years ago, I had a fantastic 4-day journey trekking through the Costa Rican rainforest. On the 1st day, we had to cross a wild river with a metal basket hanging on a rusty rope. And I thought to myself, "What the hell have I gotten into?!"

Featured image by: Havardtl

🐒 On that journey, I saw many species of wildlife. I slept smelling the moss on the bark and wet ferns. And I woke up every morning to a majestic orchestra of birds, insects, monkeys, and frogs. It's also hard to realize the sheer scale of a rainforest when you are in it. On peaks, we could see the many green valleys we had crossed with Ceiba trees towering 17 stories high over the canopy!

🌎 We only have 36% of rainforests left. When I was born it was well over 50%. Today is #WorldRainforestDay and I thought I'd share a story of why I care. In #DataScience, we think facts & figures alone are convincing. But often it's the lived experience & emotions that come with them that make things matter to us. I don't regret crossing the river in the basket because the journey that followed was life-changing. If I was an environmentalist before because of the facts I knew, now I had more conviction than ever that #nature had to be preserved for future generations!

Food Security & Climate Change

Today is World Food Safety Day. For me, it's a day of reflection.

🦠 After all, COVID-19 had a food safety-related genesis. Natural disasters and clearing land for urbanization and agriculture push wildlife closer to human settlements, which fuels pandemic risk.

🌽 Food safety is essential, no doubt, but it's intrinsically related to food security, and this is what worries me the most. There's a need to feed another 2 billion mouths by 2050 and double food production to that end. So, as a data scientist in agriculture, I'm inspired to make my tiny contribution to improving food security.

🌎 However, climate change can make our food production goals nearly impossible. Under a high-emission scenario, by 2050, a huge swath of the United States will suffer from a sizable decline in crop yields. Still, this is offset by the fact that other areas of the country will experience an increase. Other countries won't be that lucky since they are entirely vulnerable given so many land challenges: desertification, land degradation, climate change adaptation, undernourishment, biodiversity, groundwater stress, and water quality (see IPCC for details). It's an existential threat to humanity, and we have only a few years to reverse this trajectory.

Above picture by: ProPublica and UK Met Office, Featured image by: Sven Lachmann from Pixabay

Book Review: Noise

When discussing human judgment and, by extension, algorithmic decisions, we are used to talking about bias, but what about noise?

Above picture by: Little, Brown Spark, Featured image by: Sophie Huiberts

🎯 Nobel Laureate Daniel Kahneman and co-authors make a case for why we should pay close attention to it in their new book Noise: A Flaw in Human Judgment. It has some compelling stories to underpin how widespread the problem is in business and government, with succinct illustrations. For instance, I love the target illustration and the error decompositions.

📢 The book covers group dynamics such as information cascades, social pressure, and group polarization as amplifiers of noise, and some cognitive #biases to boot. Lastly, it outlines noise mitigation strategies with decision hygiene, decision observers, and noise audits, which were BY FAR the biggest takeaways for me.

😒 However, if you are already familiar with the topic, the book will likely disappoint (at least a little). It can feel very repetitive without getting into enough depth, and its entanglement with bias means it keeps referring to concepts covered in Thinking, Fast and Slow, as if it were some long-lost final chapter. I still enjoyed it, regardless.
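The error decomposition the book illustrates is usually written as MSE = bias² + noise², where bias is the average error and noise is the standard deviation of the errors. Here is a minimal sketch of that identity; the judgments, the true value, and the function name are made up for illustration.

```python
from statistics import mean, pstdev


def decompose_error(judgments, true_value):
    """Split mean squared error into a bias part and a noise part.

    Hypothetical helper: bias is the mean error, noise is the
    population standard deviation of the errors.
    """
    errors = [j - true_value for j in judgments]
    bias = mean(errors)
    noise = pstdev(errors)
    mse = mean([e ** 2 for e in errors])
    return bias, noise, mse


# Made-up example: four judges estimate a quantity whose true value is 10.
bias, noise, mse = decompose_error([12, 8, 13, 11], true_value=10)
print(f"bias^2 + noise^2 = {bias ** 2 + noise ** 2:.2f}, MSE = {mse:.2f}")
```

The point of the identity is that reducing noise lowers overall error just as much as reducing an equal amount of bias.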

Have you read it? Do you want to?

Opinion: Chance & Decision-Making

🎲 Chance & 📈 Decision-Making: Long before I called myself a data scientist, I helped build a backend website for sports betting at my very first job. After that, for about a decade, I improved the user experience for gambling sites of all kinds. As it turns out, the first data I engaged with professionally taught me a lot about human nature.

Throughout human history, we have been fascinated with chance. The first known tools used to this end were knucklebones in ancient Sumer, either for fortune-telling or games of chance. Better tools have been invented since, like dice, playing cards, and more recently, random number generators (RNGs). However, now we wield randomness for business/scientific purposes and not just mysticism/entertainment. In fact, the most powerful #MachineLearning methods depend on RNGs.
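To make the RNG point concrete, here is a minimal sketch of bootstrap resampling, the random draw with replacement that bagging-style methods such as random forests build on. The function name and data are hypothetical, and it uses only Python's standard `random` module.

```python
import random


def bootstrap_sample(data, seed):
    """Draw len(data) items from data with replacement.

    Hypothetical helper: a dedicated, seeded generator keeps the
    "random" draw reproducible from run to run.
    """
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


observations = [3, 1, 4, 1, 5, 9]
first = bootstrap_sample(observations, seed=42)
second = bootstrap_sample(observations, seed=42)

# Same seed, same sample: randomness we can control and audit.
print(first == second)
```

Fixing the seed is what turns chance into something you can debug: two runs with the same seed draw exactly the same sample.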

Above picture by: MET Museum (1 & 2) and World of Playing Cards, Featured image by: Simon Cockell

I recently read the book The Drunkard's Walk, which made me reflect on what drew me to the discipline. We are surrounded by randomness, but humans want to be in control, often attributing skill to successful random events (gambler's fallacy), and lack thereof otherwise. #Data can improve decisions by separating the signal from the noise and tracing outcomes to plausible causes. This possibility is what inspires my journey! What's yours?

Opinion: Why is ending Zoom meetings so awkward?

Why is ending Zoom meetings so awkward? First, you say bye and, during what seems like an eternity, have to gracelessly stare at the host and any remaining attendees as you all fumble around clicking the end meeting button or keyboard shortcut!

Above picture by: itsbenlee, Featured image by: Deror Avi

I realize there are more pressing issues to solve with #machinelearning, but can't Zoom come up with a gesture or voice-activated feature to stop the meeting as soon as it's over, to spare introverts like me from those clumsy moments? Does this happen to you? If so, what do you think should prompt the ending?

  • 👋🏼 A wave gesture?
  • 💬 The words "zoom end"?
  • ⬅️ A slide left gesture?
  • 🤷🏽 None: Suck it up!

Causal Explanations of ML Models through Counterfactuals

We often hear "correlation does not imply causation". And when working with data, it's easy to fall into this trap! Even aided by domain knowledge and complex models, it's often tough to disentangle the two.

📈 I'm an advocate of Interpretable Machine Learning (also known as XAI) because, using approximations, it can help us understand models. However, I must accept that the most popular XAI methods rely on correlations, which is a significant limitation.

🔄 The solution to this problem is causability, which yields a causal explanation rather than a correlation-based one. The authors of a recent paper (Leon Chou, Catarina Moreira, Peter Bruza, Chun Ouyang, and Joaquim Jorge) propose counterfactuals as a means to provide causability.

🤔 Counterfactuals are a good fit because they ask the question, "what if?", which comes naturally to us humans and, given some properties, serves as a satisfactory causal explanation. There's a family of counterfactual methods that meet many of the properties. But, unfortunately, in recent years they have been overshadowed by other XAI methods.
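To give a feel for the "what if?" question, here is a minimal sketch of a counterfactual search. It is not the method from the paper: the scoring rule is a hypothetical stand-in for a trained classifier, and the brute-force search over a single feature is only one of the simplest possible strategies.

```python
def toy_model(income, debt):
    """Hypothetical loan rule standing in for a trained classifier.

    Approves when income minus twice the debt reaches 50.
    """
    return income - 2 * debt >= 50


def income_counterfactual(income, debt, step=1, max_steps=1000):
    """Find the smallest income (in increments of `step`) that gets approved.

    Answers: "What if this applicant earned more, all else equal?"
    Returns None if no approval is found within `max_steps` increments.
    """
    for k in range(max_steps + 1):
        candidate = income + k * step
        if toy_model(candidate, debt):
            return candidate
    return None


# A rejected applicant: what income would have flipped the decision?
needed = income_counterfactual(income=40, debt=10)
print(f"Rejected at income 40; approved at income {needed}")
```

The explanation reads naturally ("you would have been approved with an income of 70"), which is the appeal of counterfactuals. Note that it describes what the model would do, and whether it also holds in the real world depends on the properties mentioned above.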


The authors of the paper performed a topic modeling and word co-occurrence analysis on academic research since 2012. It shows nodes for each keyword, where size denotes frequency and color the most popular year. While it's good news that discussion has evolved from machine-centric topics such as pattern recognition to more human-centric ones such as XAI (see 1st figure), there are clear research gaps between causality and XAI, not to mention counterfactuals and causality (see 2nd figure). Check out their amazing paper for more details.

Featured image by: Michal Jarmoluk from Pixabay