Text Analytics and Machine Learning Workshop
Friday, October 26, 2018
CPA Canada, 277 Wellington Street West, Toronto, Canada
9:00 am – 5:00 pm
Register now to avoid disappointment. Space is limited to 50 participants.
CPD certificates will be issued.
The registration fee of $600 (+HST) includes all materials, lunch and refreshments.
The explosion in machine-readable text, coupled with developments in textual analysis methods and tools, has expanded the possibilities for applying new analysis techniques in advisory, tax and assurance services. In addition to simpler text mining tools, machine learning is now among the tools available for addressing topics of interest to accounting professionals. However, applications of text analytics and machine learning are still not widely covered in professional training. This professional development workshop is an opportunity to learn about textual analysis and machine learning applications that can meet a variety of objectives.
The workshop aims to highlight the use of both simple and complex text analytics approaches to address a variety of issues of interest to participants, as described in more detail below.
Format/Structure of the Workshop
A combination of presentations by workshop leaders, table group exercises and open discussion will be used. Participants will need to bring laptops or other devices suitable for downloading and reading web-based content.
The workshop leaders explore and apply text analytics and machine learning techniques in both academic and consulting settings, so they can provide extensive, up-to-date coverage of these topics.
What’s it all about? The workshop covers both simple and complex text analytics approaches, for example: word search; term frequency and concordance reporting; entity recognition and extraction; document similarity measurement; dictionary-based classification; unsupervised classification (clustering); supervised machine learning classification; topic modelling; and sentiment analysis.
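Two of the simpler techniques listed above, term frequency and concordance (keyword-in-context) reporting, can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the tokenizer, the function names (`tokenize`, `term_frequencies`, `concordance`) and the sample sentence are hypothetical, and the sketch does not reproduce the behavior of WordStat or any other tool used in the workshop.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent tokens as (token, count) pairs."""
    return Counter(tokenize(text)).most_common(top_n)


def concordance(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of every hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # strip() removes the stray space when a hit is at the start or end of the text
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, invented for illustration only.
sample = ("The company restated revenue for 2016. "
          "The restatement corrected revenue recognition errors "
          "and reduced reported revenue.")

print(term_frequencies(sample, top_n=2))   # [('revenue', 3), ('the', 2)]
for line in concordance(sample, "revenue", width=2):
    print(line)
```

Dedicated concordance tools add features such as stop-word filtering and sorting of context lines; the sketch only shows the core idea of counting terms and displaying each occurrence in its surrounding context.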
Who should attend? This workshop will be useful to practitioners interested in developing services that use text analytics and machine learning in their areas of practice.
Learning objectives
After completing this workshop, attendees will be able to:
- Describe the purpose of different text analytic techniques
- Explain procedures commonly used in textual analysis that are not used in structured data analysis
- Ask appropriate questions when evaluating text analytic tools and services, including cloud services, in a rapidly changing landscape
- Ask appropriate questions when hiring/acquiring text analytic talent
- Evaluate textual analysis results
- Consider privacy implications and restrictions on gathering text data
Program Overview
Time | Topic | Speaker/Instructor | Coverage
8:30 | Coffee, Welcome | Efrim Boritz, UWCISA |
9:00 | Introduction to Text Analytics and Machine Learning | Louise Hayes, University of Guelph | What text analytics can and can’t do; overview of other sessions
9:30 | The Evolution of Accounting Skills: An Introduction to Topic Modeling (Part 1) | Andy Bauer, University of Waterloo | Topic modelling: how to classify text-based narratives
10:00 | Coffee | |
10:30 | The Evolution of Accounting Skills: An Introduction to Topic Modeling (Part 2) | Andy Bauer, University of Waterloo | Comparing manual and computer-based approaches
11:00 | Text Extraction, Textual Analysis, and Model Building Workshop/Demo; SeekiNF by SeekEdgar, LLC | Raj Srivastava, Professor Emeritus, University of Kansas; and CEO, SeekEdgar, LLC | Fraud risk assessment using text analytics
12:30 | Lunch | |
1:30 | Business Intelligence Workshop | Theo Stratopoulos, University of Waterloo | Business intelligence through text analytics using R and SQL
2:30 | Coffee | |
3:00 | Machine Learning Workshop | Louise Hayes, University of Guelph | Classifying textual narratives with Naïve Bayes and other tools
4:30 | Wrap-up | Efrim Boritz, UWCISA |
Advance Preparation & Workshop Materials
Introduction to Text Analytics and Machine Learning
Introduction to Text Analytics and Machine Learning PowerPoint Presentation
For the Introduction to Topic Modeling sessions, please download and read the materials at the following link:
Pre-Workshop Readings
For the Fraud Risk Assessment using Text Analytics session, there is no specific preparation; however, the instructor has provided a newsletter with links to materials that you may wish to review in advance of the session.
SeekEdgar August Newsletter
SeekEdgar User Guide
Text Extraction, Textual Analysis, and Model Building Workshop PowerPoint Presentation
For the Business Intelligence session, please read the advance preparation instructions at the following link:
“Business Intelligence” How to Prepare for the Workshop
For the Machine Learning session, we will use both WordStat 8 and RapidMiner Studio. This hands-on session will demonstrate steps in the machine learning process, including data preparation, results visualization, model validation and optimization. We will begin with WordStat 8, content analysis software that processes large amounts of text data, to explore (both graphically and numerically) the term frequencies and concordances of words in financial restatement announcements. Next, we will structure these words for input into RapidMiner Studio, a data science platform used for machine learning. Finally, using supervised machine learning, we will develop and validate several models that may be used to "predict" the classification of financial restatements based on their announcement text.
Please install the trialware versions of WordStat 8 and RapidMiner Studio (each has a 30-day limit) and save this "sample.csv" data file to your desktop before attending the workshop.
WordStat 8: https://provalisresearch.com/products/content-analysis-software/
- Click the "download trial" link at the bottom of the webpage
- The download page that opens has a link at the bottom to additional information for Mac users: https://provalisresearch.com/products/simstat/simstat-technical-information/mac-os/
RapidMiner Studio: https://rapidminer.com/get-started/
Machine Learning Hands-on Session PowerPoint Presentation
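In the Machine Learning session, the supervised classification step is carried out in RapidMiner Studio, and the program lists Naïve Bayes among the tools covered. As a rough, tool-independent illustration of how a Naïve Bayes text classifier uses word counts, here is a minimal Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. The class name, the training snippets and the "revenue"/"expense" labels are hypothetical; they are not drawn from sample.csv, and the code is not how RapidMiner implements the algorithm.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # number of training documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_score(self, doc, label):
        """log P(label) + sum of log P(word | label); the shared evidence term is omitted."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / self.n_docs)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        """Return the class with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(doc, label))


# Hypothetical training snippets and labels, invented for illustration only.
train_docs = [
    "restated revenue recognition timing of sales",
    "revenue was recognized prematurely on shipments",
    "lease expense was misclassified",
    "understated accrued expenses and payroll costs",
]
train_labels = ["revenue", "revenue", "expense", "expense"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("error in timing of revenue recognition"))  # revenue
print(model.predict("accrued payroll expense understated"))     # expense
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. In the workflow described for the session, a model like this would then be validated on announcements it was not trained on, for example with cross-validation, before its predictions are relied on.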
Workshop Instructors:
Andrew Bauer
Andy’s recent research and general interests are being shaped by data analytics, including completion of online specializations in computer science and the integration of specific data analytics and machine learning techniques into his research. He has given presentations on data analytics topics at the American Taxation Association Midyear Meeting, the Accounting IS Big Data Conference, and the Financial Accounting and Reporting Section Midyear Meeting.
J. Efrim Boritz
Efrim is a Professor at the University of Waterloo and a member of the AICPA Trust Information Integrity Task Force and Data Analytics Task Force. He has extensive experience with data analytics in audit and forensic contexts, as well as with machine learning in fraud detection and bankruptcy prediction.
Louise Hayes
Louise teaches accounting, IT audit and data analytics at the University of Guelph. She uses text analytics and machine learning in research on internal control and financial reporting quality.
Rajendra P. Srivastava
Raj is Emeritus Professor and formerly the EY Professor of Accounting & Information Systems and Director of the EY Center for Auditing Research and Advanced Technology at the School of Business, University of Kansas. His research has resulted in patentable ideas; FRAANK and SeekiNF are two such technologies. Please see https://www.seekedgar.com for more details on these technologies.
Theo Stratopoulos
Theo’s teaching and research focus on the economics of information technology (IT). He is currently working on projects related to emerging technology adoption, blockchain, and IT budgets. He has consulted on data analytics (DA) projects for large US firms and has led DA training workshops for accounting professionals. Recently, Theo has presented at the Accounting Information Systems (AIS) mid-year meeting (January 2018), the AIS boot camp (May 2018), CFA Society Toronto (May 2018), CPA Ontario (June 2018), the AAA Intensive Data Analytics Workshop (June 2018), the CAAA annual meeting (June 2018), and the AAA Annual Meeting (August 2018).