(Photo by Joel Dunn on Unsplash)
Someone recently brought up the Enron E-Mails dataset in conversation. It's a collection of ~600k (!) e-mails gathered as part of the Enron investigation in the early 2000s.
Working through the Enron dataset way back when was solid practice in data munging, NLP, network analysis, the whole lot.
For me, the greatest and most lasting lesson was:
Real-World Data Is Messy
You can't get to the "fun" analysis till you've sorted that out.
From that we get two important sub-lessons:
1/ it's important to get to know a dataset (actually sift through it) before you try to use it
2/ thank your data engineers
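To make "actually sift through it" a bit more concrete, here is a minimal sketch of a first pass over raw e-mails, using only Python's standard library. The sample messages and the profile_messages helper are made up for illustration. They are not taken from the Enron data, and the real files may need more care (encodings, multipart messages, and so on).

```python
from collections import Counter
from email import message_from_string

# Made-up sample messages standing in for raw files from a dataset.
SAMPLE_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com\nSubject: Q3 numbers\n\nSee attached.",
    "From: carol@example.com\nSubject: \n\n",
    "To: dave@example.com\n\nFwd: fwd: FWD: please read",
]


def profile_messages(raw_messages):
    """Count missing or blank headers and empty bodies (hypothetical helper)."""
    issues = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        for header in ("From", "To", "Subject"):
            value = msg[header]  # None when the header is absent
            if value is None or not value.strip():
                issues[f"missing_{header.lower()}"] += 1
        # Assumes plain, non-multipart messages, so the payload is a string.
        if not msg.get_payload().strip():
            issues["empty_body"] += 1
    return issues


report = profile_messages(SAMPLE_MESSAGES)
for issue, count in sorted(report.items()):
    print(f"{issue}: {count}")
```

Even a tally this crude tells you what cleanup is waiting before you get anywhere near the "fun" analysis.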
For more details on the dataset, you can check out its Wikipedia entry and a Technology Review article from 2013.