Text Analytics and Machine Learning Workshop

Friday, October 26, 2018
CPA Canada, 277 Wellington Street West, Toronto, Canada
9:00 am – 5:00 pm
Register now to avoid disappointment. Space is limited to 50 participants.
CPD Certificates will be issued.
Registration fee of $600 (+HST) includes all materials, lunch and refreshments.

The explosion in machine-readable text, coupled with developments in textual analysis methods and tools, has expanded the possibilities for applying new analysis techniques to advisory, tax and assurance services. Alongside simpler text mining tools, machine learning is now available for addressing topics of interest to accounting professionals. However, applications of text analytics and machine learning are still not widely covered. This professional development workshop is an opportunity to learn how textual analysis and machine learning can be applied to meet a variety of objectives.

This workshop aims to highlight the use of both simple and complex text analytics approaches for addressing a variety of issues of interest to participants, as described in more detail below.

Format/Structure of the Workshop

A combination of presentations by workshop leaders, table group exercises and open discussion will be used. Participants will need to bring laptops or other devices suitable for downloading and reading web-based content.

The workshop leaders are involved in exploring and applying text analytics and machine learning techniques in both academic and consulting settings. Thus, they can provide extensive and up-to-date coverage of these topics.

What’s it all about? This workshop will cover both simple and complex text analytics approaches for addressing a variety of issues of interest to workshop participants; for example: word search; term frequency and concordance reporting; entity recognition and extraction; document similarity measurement; dictionary-based classification; unsupervised clustering classification; supervised machine learning classification; topic modelling; and sentiment analysis.

Who should attend? This workshop would be useful to practitioners who are interested in developing services using text analytics and machine learning in their areas of interest.

Learning objectives

After completing this workshop an attendee will be able to:

  • Describe the purpose of different text analytic techniques  
  • Explain procedures commonly used in textual analysis that are not used in structured data analysis
  • Ask appropriate questions when evaluating text analytic tools and services in the rapidly changing landscape of text analytics that includes cloud services
  • Ask appropriate questions when hiring/acquiring text analytic talent
  • Evaluate textual analysis results
  • Consider privacy implications and restrictions on gathering text data

Program Overview

Time | Topic | Speaker/Instructor | Coverage
8:30 | Coffee, Welcome | Efrim Boritz, UWCISA |
9:00 | Introduction to Text Analytics and Machine Learning | Louise Hayes, University of Guelph | What text analytics can and can’t do; overview of other sessions
9:30 | The Evolution of Accounting Skills: An Introduction to Topic Modeling (Part 1) | Andy Bauer, University of Waterloo | Topic modelling: how to classify text-based narratives
10:00 | Coffee | |
10:30 | The Evolution of Accounting Skills: An Introduction to Topic Modeling (Part 2) | Andy Bauer, University of Waterloo | Comparing manual and computer-based approaches
11:00 | Text Extraction, Textual Analysis, and Model Building Workshop/Demo; SeekiNF by SeekEdgar, LLC | Raj Srivastava, Professor Emeritus, University of Kansas; and CEO, SeekEdgar, LLC | Fraud risk assessment using text analytics: readability indices; cosine similarity measure; vector space model; word variation over time; sentiment analysis
12:30 | Lunch | |
1:30 | Business Intelligence Workshop | Theo Stratopoulos, University of Waterloo | Business Intelligence through text analytics using R and SQL
2:30 | Coffee | |
3:00 | Machine Learning Workshop | Louise Hayes, University of Guelph | Classifying textual narratives with Naïve Bayes and other tools
4:30 | Wrap-up | Efrim Boritz, UWCISA |

Advance Preparation & Workshop Materials

Introduction to Text Analytics and Machine Learning
Introduction to Text Analytics and Machine Learning PowerPoint Presentation

For the Introduction to Topic Modeling sessions, you will need to download and read materials at 
Overview & Pre-Workshop Instructions
Pre-Workshop Readings

For the Fraud Risk Assessment using Text Analytics session, there is no specific preparation; however, the instructor has provided a newsletter with links to materials that you may wish to review in advance of the session.
SeekEdgar August Newsletter
SeekEdgar User Guide
Text Extraction, Textual Analysis, and Model Building Workshop PowerPoint Presentation

For the Business Intelligence session, please read advance preparation instructions available at
“Business Intelligence” How to Prepare for the Workshop

For the Machine Learning session, we will be using both WordStat 8 and RapidMiner Studio. This hands-on session will demonstrate steps in the machine learning process, including data preparation, results visualization, model validation and optimization. We will begin by using WordStat 8, a content analysis package that processes large amounts of text data, to explore (both graphically and numerically) the term frequencies and concordances of words in the announcements of financial restatements. Next, we will structure these words for input into RapidMiner Studio, a data science platform used for machine learning. Finally, using supervised machine learning, we will develop and validate several models that may be used to "predict" the classification of financial restatements based on their announcement text.
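
For participants who would like a feel for the idea before the session, the sketch below shows one way term frequencies can feed a supervised classifier. It is an illustration only: the session itself uses WordStat 8 and RapidMiner Studio, not Python. The sample sentences, the two class labels ("error" and "irregularity") and all function names are invented for this example and are not taken from the workshop data. The classifier is a basic multinomial Naive Bayes with add-one smoothing, written with the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Count term frequencies per class from (text, label) pairs."""
    term_counts = defaultdict(Counter)  # label -> term frequencies
    doc_counts = Counter()              # label -> number of documents
    for text, label in labelled_docs:
        term_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = {term for counts in term_counts.values() for term in counts}
    return term_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for term in tokenize(text):
            if term in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (term_counts[label][term] + 1) / (total_terms + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy data; real restatement announcements are far longer.
TRAINING = [
    ("Restatement to correct an error in revenue recognition", "error"),
    ("The company corrected a clerical error in inventory counts", "error"),
    ("Restatement follows investigation into fraudulent revenue entries",
     "irregularity"),
    ("Audit committee investigation found intentional misstatement",
     "irregularity"),
]

model = train(TRAINING)
prediction = classify("Investigation reveals intentional fraudulent entries", model)
print(prediction)
```

With only four training sentences this is a toy. In practice a model would be trained on many labelled announcements and validated on text it has not seen, which is the validation step covered in the session.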

Please install the trialware versions (30-day limit for each) of WordStat 8 and RapidMiner Studio and save this "sample.csv" data to your Desktop before attending the workshop.

WordStat 8:  https://provalisresearch.com/products/content-analysis-software/ 

RapidMiner Studio: https://rapidminer.com/get-started/

Machine Learning Hands-on Session PowerPoint Presentation

Workshop Instructors:

Andrew Bauer: Andy’s recent research and general interests are being shaped by data analytics. This includes completion of online specializations in computer science and integration of specific data analytics and machine learning techniques in his research. He has given presentations on data analytics topics at the American Taxation Association Midyear Meeting, the Accounting IS Big Data Conference, and the Financial Accounting and Reporting Section Midyear Meeting.

J. Efrim Boritz, Professor, University of Waterloo and Member of the AICPA Trust Information Integrity Task Force and Data Analytics Task Force. He has extensive experience with data analytics in audit and forensic contexts as well as machine learning in fraud detection and bankruptcy prediction contexts.

Louise Hayes: Louise teaches accounting, IT audit and data analytics at the University of Guelph. She uses text analytics and machine learning in research on internal control and financial reporting quality.

Rajendra P. Srivastava: Raj is currently Emeritus Professor and formerly the EY Professor of Accounting & Information Systems and Director of the EY Center for Auditing Research and Advanced Technology at the School of Business, University of Kansas. His research has resulted in patentable ideas; FRAANK and SeekiNF are two such technologies. Please see the website https://www.seekedgar.com for more details on these technologies.

Theo Stratopoulos: Theo’s teaching and research focus is on the economics of information technology (IT). He is currently working on projects related to emerging technology adoption, blockchain, and IT budgets. He has consulted on data analytics (DA) projects for large US firms and has led training workshops for accounting professionals on DA. Recently, Theo has presented at the Accounting Information Systems (AIS) Mid-Year Meeting (January 2018), AIS boot-camp (May 2018), CFA Society - Toronto (May 2018), CPA Ontario (June 2018), the AAA - Intensive Data Analytics Workshop (June 2018), the CAAA Annual Meeting (June 2018), and the AAA Annual Meeting (August 2018).