Weekly recap: 2023-04-23

What you see here is the last week’s worth of links and quips I have shared on LinkedIn, from Monday through Sunday.

For now I’ll post the notes as they appeared on LinkedIn, including hashtags and sentence fragments. Over time I might expand on these thoughts as they land here on my blog.

2023/04/17: Short run versus long run

As I’ve noted before:

In the short run, it’s all about the technology.

In the long run, it’s about policy, regulation, insurance, and all of the non-technical issues that crop up when a new technology encounters age-old institutions.

“Who Owns a Song Created by A.I.?” (New York Times)

2023/04/17: Why I keep talking about topics “unrelated” to AI

A lot of my work fits squarely in the “AI” space. So why do I keep talking about metrics, analytics, business models, risk, and other topics that seem unrelated to AI and ML modeling?

It’s because **these subjects are indeed related! They are cornerstones of effective AI:**

Analytics? That tells you a ton about your data, which you’ll need if you plan to build or use AI solutions.

**Metrics?** You will need to apply metrics to your ML models (and the processes they impact) to gauge what they’re actually doing for the business.

Business models? Understanding what your business actually does will inform where AI can actually help.

Risk? Any AI solution carries a certain risk, including “buried-in-the-balance-sheet intangibles” like reputation risk. You really want to know how that AI-based product could bite you before you release it.

Sum total: “doing AI” in a company involves so much more than just AI technology.

If you’re only thinking about the tools and techniques … well … best of luck.

2023/04/17: LLM-as-a-Service

Amazon Web Services is getting into the generative AI / LLM chatbot game, but more as a gateway to other companies’ LLMs.

“IA : Amazon entre dans la course avec « Bedrock »” (Les Echos)

The advertising space is showing a lot of interest in LLMs as a way to write ad copy, such as the example described here:

(Translated from the French:) It is in marketing above all that generative AI could be a hit. Amazon has in fact given an example: “Imagine that a marketing manager wants to develop a new advertising campaign for a line of handbags. To do so, they will give Bedrock a few examples of their best-performing taglines and past campaigns, along with the associated product descriptions. From there, Bedrock will be able to automatically generate effective social media posts, ads, and web copy for each product.”

2023/04/18: blog post on data scientist communication

Data scientists: you’re often told to “use less technical jargon” when speaking with stakeholders. But that’s just one step. Here are three more ways to improve how you communicate with people outside of your team.

blog post: “Three ways for data scientists to improve their communication with stakeholders”

2023/04/20: Not the end of faking it

This will most certainly not be the end of “faking it” in Silicon Valley … but perhaps there will be a brief pause? And/or more rigorous due diligence going forward?

“The End of Faking It in Silicon Valley” (New York Times)

“[Investors] want to tighten up the protocols around how they’re assessing founders,” Ms. Abrahams said. “We had a series of events which should be prompting reflections.”

Start-ups have many of the conditions most associated with fraud, Mr. Dyck said. They tend to employ novel business models, their founders often have significant control and their backers do not always enforce strict oversight. It is a situation that’s ripe for bending the rules when a downturn hits. “It’s not surprising we’re seeing a lot of frauds being committed in the last 18 months are coming to light right now,” he said.

2023/04/21: Anomalies, lurking in the aggregates

When it comes to data analysis, it’s not just a matter of what you measure, but how you measure it.

You’ve probably seen this story making the rounds:

“A 50-Mile Race, a Quick Car Ride and a Scandal at the Finish Line” (New York Times)

There’s a lot to be said here, but I’ll focus on the data aspect.

According to reports, the runner:

… had completed one mile of the Manchester to Liverpool race on April 7 in 1 minute 40 seconds, a split much more likely to be posted by a late-model sedan than by a 47-year-old human being on two legs.

In other words: race officials weren’t just measuring runners’ overall course times; they were also able to track point-to-point times. One segment’s time simply didn’t make sense, and that opened the door to an investigation.

The data lesson here is to think about when to roll up a calculation versus when to break it out.

Sometimes those aggregate statistics – sums, means, medians – will tell you enough of what you want to know. But they can also obscure important information that point-to-point checks, sliding-window functions, and such will reveal.
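Here is a minimal Python sketch of that roll-up versus break-out idea. The split times are made up for illustration (they are not the real race data), apart from one 100-second mile that mirrors the 1:40 split quoted above. The 220-second floor is my own assumption, set just under the men's mile world record of roughly 223 seconds, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical per-mile splits in seconds (illustrative, not the real race data).
# The 100 s entry mirrors the 1:40 mile mentioned in the article.
splits = [540, 555, 560, 100, 570, 565, 550, 575]

# Assumed plausibility floor: the men's mile world record is roughly 223 s,
# so a split under 220 s was not run by a human on foot.
FASTEST_PLAUSIBLE_MILE_S = 220


def average_pace(splits_s):
    """Roll-up: a single mean pace for the whole run."""
    return mean(splits_s)


def suspicious_splits(splits_s, floor_s=FASTEST_PLAUSIBLE_MILE_S):
    """Break-out: return (mile_number, seconds) for each implausibly fast split."""
    return [(i, s) for i, s in enumerate(splits_s, start=1) if s < floor_s]


overall = average_pace(splits)
flagged = suspicious_splits(splits)

print(f"Average pace: {overall:.0f} s per mile")
print(f"Flagged splits: {flagged}")
```

With these made-up numbers, the average works out to a little over eight minutes per mile, which looks like a perfectly ordinary pace. Only the per-split check surfaces mile 4.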

“Transit time,” “transaction amount,” and “data processing throughput” all come to mind here.

What are some other figures where anomalies can hide?