The following is a list of selected talks that I have delivered across industry conferences, private speaking engagements, academic guest lectures, training sessions, and similar events.
My talks cover a variety of topics, but the unifying theme is solving business problems through practical application of data and technology.
Following up on the February webinar, "ChatGPT In Education," I met with Michael Manley (CTO, ThinkCERCA) and Daniel Rivera (Technology Director for First District RESA) to talk about AI in education.
Generative AI is poised to be around for a while, so it's in educators' best interests to put it to good use. Our talk explored new developments in the generative AI landscape, and offered five ways teachers and students can responsibly use generative AI in classroom settings.
Almost overnight, the name "ChatGPT" seems to be everywhere. Now that people have a tool that can generate reams of human-readable (though not always correct or appropriate) text on demand, what does that mean for educators? What's a teacher to do when students can have a machine write their essays for them?
I delivered this talk with ThinkCERCA CTO Michael Manley. We explained what AI really is, how generative AI models work, and what ChatGPT can(not) do. We wrapped up with three things educators can do right now when it comes to ChatGPT, plus three patterns to avoid.
The field we now call "AI" started around 2009, when predictive analytics and Big Data started getting traction. More than a decade later, the field has certainly changed: we have new tools and new business use cases for this powerful technology. Then again, it's very much the same: it is still all about, as I like to say, analyzing data for fun and profit.
This talk explores what we can expect for the next decade of AI.
(Some of the themes therein stem from the "Rebranding Data" article I published on O'Reilly Radar.)
This is the second talk in a series for technical leaders, to help them understand the work their company's data scientists do and how to make the company more effective with ML/AI.
The tools for ML/AI continue to improve year after year. Companies are able to derive even greater benefit from this technology as a result. Automated machine learning (autoML) tools stand to reduce the time, effort, and money involved in developing ML models. They may also lead data scientists to question their purpose, and perhaps even their employment.
In this talk, I explore why autoML is such a powerful opportunity, describe the current state of the tools available, and explain how to broach this topic with your company's data scientists and machine learning engineers.
(Some of the ideas in this talk relate to the article "Automating the Automators: Shift Change in the Robot Factory" which I published on O'Reilly Radar.)
As a CTO or Director of Engineering, you're in an interesting spot: either you're getting pressure from stakeholders to "do AI" (whatever that means) or you're trying to make sense of what your data science team is asking of you. And then there are the incessant calls from vendors who just swear that their latest AI-flavored offering will cure all that ails you. For a hefty fee, of course.
AI is a big field and there's a lot to learn. The first step to making sense of all this is learning how to talk the talk. In this talk, I explain what AI really is, how to guide your stakeholders down the road to effective AI, and how to "speak machine learning" with your data scientists. You'll also get better at evaluating the vendor pitches.
The COVID-19 pandemic has induced stress on, among other things, business budgets. Companies are now reviewing every department in search of ways to trim spending. Does this mean the end of data science/ML/AI's heyday of no-accountability, no-questions-asked money? If so, how do we recover?
In this talk, I explore the data world's coming scrutiny through the lens of financial risk, stock market bubbles, and gray rhinos.
Time series analysis is a branch of mathematical modeling that focuses on time-based phenomena. This is a cornerstone of electronic trading, as market data is expressed as time series of prices.
In this lecture, I explored the theory behind the when/why of univariate time series modeling (AR, MA, ARIMA), and used live code examples to demonstrate the what/how of the model-evaluate cycle (pandas, statsmodels).
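For a flavor of the model-evaluate cycle, here is a minimal sketch of fitting and checking an AR(1) model. It is not the lecture's code: it uses only the Python standard library instead of pandas and statsmodels, and the function names (`simulate_ar1`, `fit_ar1`, `one_step_mse`) and parameter values are assumptions made for illustration.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Generate n points of x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, sigma))
    return series


def fit_ar1(series):
    """Least-squares estimate of phi for a zero-mean AR(1) model (no intercept)."""
    numerator = sum(cur * prev for prev, cur in zip(series[:-1], series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


def one_step_mse(series, phi):
    """Mean squared error of the one-step-ahead forecast phi * x[t-1]."""
    errors = [(cur - phi * prev) ** 2 for prev, cur in zip(series[:-1], series[1:])]
    return sum(errors) / len(errors)


# Model: fit on the first part of the series.
series = simulate_ar1(phi=0.5, n=5000, seed=42)
train, test = series[:3000], series[3000:]
phi_hat = fit_ar1(train)

# Evaluate: compare against a naive "next value equals current value" baseline.
model_mse = one_step_mse(test, phi_hat)
naive_mse = one_step_mse(test, 1.0)

print(f"estimated phi: {phi_hat:.3f}")
print(f"model MSE: {model_mse:.3f}, naive MSE: {naive_mse:.3f}")
```

Holding out the tail of the series, rather than a random sample, keeps the evaluation honest for time-ordered data: the model never sees observations from the period it is asked to forecast.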
It's easy to think that being a software developer is all about writing code. That's the focus of most computer science courses, and what's most often mentioned in media coverage of the profession.
In reality, writing code is only a small part of being a successful and effective software developer. In this lecture, I walked students through several concrete, straightforward measures – "cheat codes," as I called them – they could take to make the most of their career once they graduate.
You're a new analyst in the firm, and someone has just handed you a pile of data on which to build your models. What do you do?
In this lecture, I shared a step-by-step approach on how to explore a new, unfamiliar dataset while explaining the realities of the troublesome data they would see in the workplace. (This was a live demo, using Python and Jupyter Notebook.)
As bonus material, I demonstrated how they could generate their own data for model testing.
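To illustrate the idea of generating your own test data, the sketch below builds a small dataset with a known linear relationship, deliberately injects missing values, profiles the result, and checks that a simple fit recovers the known parameters. This is not the demo from the lecture: the function names, the column names `x` and `y`, and every parameter value are hypothetical, and it uses only the Python standard library.

```python
import random
import statistics


def make_dataset(n, slope=2.0, intercept=3.0, noise=0.5, missing_rate=0.05, seed=0):
    """Create rows where y = slope * x + intercept + noise, with some y values missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + rng.gauss(0.0, noise)
        if rng.random() < missing_rate:
            y = None  # simulate the gaps found in real-world data
        rows.append({"x": x, "y": y})
    return rows


def profile(rows, column):
    """First-look summary of one column: row count, missing count, range, and mean."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
    }


def fit_line(rows):
    """Ordinary least squares on the rows that have both x and y."""
    clean = [(row["x"], row["y"]) for row in rows if row["y"] is not None]
    mean_x = statistics.fmean(x for x, _ in clean)
    mean_y = statistics.fmean(y for _, y in clean)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum(
        (x - mean_x) ** 2 for x, _ in clean
    )
    return slope, mean_y - slope * mean_x


rows = make_dataset(1000, seed=7)
summary = profile(rows, "y")
slope_hat, intercept_hat = fit_line(rows)

print(summary)
print(f"recovered slope: {slope_hat:.2f}, intercept: {intercept_hat:.2f}")
```

Because the true slope and intercept are chosen up front, any large gap between them and the recovered values points to a problem in the modeling code rather than in the data.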
The business world is sold on the promise of data: whether you call it big data, data science, machine learning, or anything else, everyone seems hell-bent on using data to transform and improve their company. Once you push away the press hype and vendor pitches, though, there is little practical advice on how to get started. It's no wonder that so many companies' data efforts start off on the wrong path and ultimately derail with no return on investment.
The promise of data is indeed real. It doesn't start with the alphabet soup of terminology, but with you, the leadership: you need to understand what is possible with data and develop a realistic plan. In this talk we'll explore what "data science" really means, the people and processes you'll need to make it work for you, and ways it can get you into hot water. You'll get practical guidance from an industry veteran who has seen what works and what doesn't.
Scan any number of financial industry news publications, and stories regarding Wall Street’s hunger for new data sets to improve alpha abound. While this might seem like a new trend, monetizing data - or the ability to turn corporate data into revenue streams - has existed for decades. But both the supply side and the demand side have changed. On the supply side, the extreme variety of data that now exists (location-based, geospatial, socio-demographic, online search trends, pricing, etc.) combines with high computing power and new digital requirements to create a fertile data market environment. On the demand side, to remain competitive, companies in a wide variety of industries, not just the financial sector, are leveraging data in all forms to maintain an edge or be disruptive.
During this session we’ll explore what data monetization is and the forms it can take; characteristics of data that could make it more valuable to external parties; and key considerations in making data products available to external parties. Intellectual property, data privacy, and contractual issues will also be explored.
I met up with Miguel Molina-Cosculluela, founder of Analytikus, to discuss the state of data science, how companies can make smart first steps in data, and how the field may change over time.
At this private event, my co-presenter and I provided more than 100 executives with practical, industry-specific, actionable guidance on how to use data to improve their business standing. Our data maturity model served as a foundation for the talk, which explored business data initiatives from traditional business intelligence (BI) all the way through advanced reinforcement learning.
As moderator, I led four panelists in a discussion of the business, social, and technological concerns of using Internet of Things (IoT) data in the enterprise. Panelists included representatives from the University of Chicago's department of Computational Analysis and Public Policy, Zebra Technologies, IBM, and the Digital Manufacturing & Design Innovation Institute (DMDII).
Hadoop is a powerful tool for large-scale data analysis. That power comes with a hefty price tag: the cost of building and maintaining the underlying compute cluster can hinder Hadoop adoption in small and large firms alike.
In this talk, I'll explain how to put Hadoop to work for you, and how to use Elastic MapReduce (EMR), the hosted Hadoop solution provided by Amazon Web Services. Learn how EMR can help you get Hadoop in a hurry and on the cheap, without the costly cluster commitment.
Also known as: Busting Myths About Building Your Analytics Shop.
Whether you call it "Big Data," "data science," or simply, "analytics," the field has quickly become an integral part of business. There is plenty of technical guidance for the hands-on practitioner, but people in leadership roles -- those who are responsible for setting a company's direction and aligning analytics to business goals -- are left with scant help beyond vendor
I delivered this talk with Brett Goldstein, Chicago's first Chief Data Officer. We framed it as an exercise in busting myths about building an analytics practice.
One can draw several parallels between today's data analysis fever and the IT boom of the 1990s. How do we learn from the IT fallout, to reap our rewards yet avoid the pitfalls?
This was built on the "R+Hadoop" talk, described below, but with a focus on when to apply that strategy to the data analytics pipeline.
As an analytics tool, R strains under modern large-scale datasets. People have devised a number of ways to help R function in the big-data arena, one of which is to drive it with Hadoop. But how does this work, and when is it an appropriate solution? This talk will describe the what and the how of mixing R and Hadoop, and more importantly, the when and why of this approach.
I presented this talk with Brett Goldstein, Chicago's first chief data officer (CDO). We had collaborated on some social media analysis for civic good, and delivered a talk to explore what we learned.
People often limit the definition of "bad data" to missing values or difficult formats. I say that "bad data" is so much more: it is any data that gets in the way. This can involve missing or hard-to-access datasets, inconsistencies, unexpected modifications in upstream data sources, and so on. If it derails your analysis effort, then it's bad data in my book.
Dealing with Bad Data explores this notion of "bad data," how it impacts an analysis effort, and how to address the problems it causes.
Quite a bit of library science involves categorizing and indexing content, to make it easy to find later, and librarians have been doing this since long before "Big Data" was a buzzword. That puts librarians among the world's first data miners.
Technology advances have changed the types and volume of content librarians manage, as well as the expectations of information-seekers. Similarly, technology will play a strong role in the evolution of library science. In this talk, I explained how natural language processing (NLP) and machine learning would influence the next phase of how librarians classify and index content.
The formal title for this talk is, A Bit of R & Hadoop: Getting R to Dance With the Elephant. It is sometimes listed under other titles, such as Mixing R and Hadoop: Large-Scale Data Analysis and Computations, though people mostly know it as my "R+Hadoop talk."
In short: R is quite a useful tool, but it strains under new-age Big Data problems. One solution is to use Hadoop's scalable, parallel computing framework to drive R. This talk explores the what, how, and why of getting R to dance with the elephant. It starts with an introduction to Hadoop/MapReduce, and wraps up with some tools to use Hadoop through R.
This presentation's formal title is, Elephants in the Cloud, on the Cheap. People sometimes refer to it as my "EMR talk."
Many companies want Hadoop's power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.
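As a rough illustration of the MapReduce concepts mentioned above, here is a single-process word count in plain Python. It is a teaching sketch only: it does not use Hadoop or EMR, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. On a real cluster, the framework runs the map and reduce steps in parallel across machines and handles the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the elephant dances", "the elephant sleeps"])
print(counts)
```

The map and reduce functions each see only one line or one key at a time, which is the property that lets a framework such as Hadoop spread the work across many machines.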