2 Day – Python Text Analysis

A 2-day introduction to Python text processing that covers running external processes, accessing databases, handling XML data, reading web pages, and accessing JSON web services. The focus is on developing applications that can retrieve, manipulate and store data using common data formats and repositories.

Prerequisites

This course is designed for programmers who have practical experience with Python and an understanding of tuples, lists, dictionaries, list comprehensions, classes, exception handling, file I/O and module structure.

Learning Objectives

The course introduces developers to more advanced features of Python and to commonly used modules, including:

  • Developing scripts for use in a Linux or Windows environment
  • Working with files and external commands
  • Manipulating data using regular expressions
  • Using the database API
  • Reading and writing XML files
  • Reading web pages
  • Accessing JSON web services

Delivery

This course comprises a mix of theory, demonstrations and hands-on exercises. Approximately 50% of the time is hands-on. This course can be delivered using Python 2.7 or Python 3.

Day 1 – Processing text data

  • Writing applications
  • Working with the filesystem
  • Running O/S commands
  • Redirecting subprocess I/O streams
  • Regular expression pattern matching
  • Repetitions and capture groups
  • Pattern substitution
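
As an illustration of the Day 1 topics, the following is a minimal sketch rather than actual course material. It assumes Python 3.7 or later, and the child command, the log-style lines and the function names are all hypothetical. It runs an external process, captures its output stream, extracts fields with capture groups, and then applies a pattern substitution.

```python
import re
import subprocess
import sys

# Hypothetical external command: a child Python process that prints
# two log-style lines. Any command that writes to stdout would do.
CHILD = [sys.executable, "-c",
         "print('user=alice id=42'); print('user=bob id=7')"]

# Two capture groups: the user name and the numeric id.
PATTERN = re.compile(r"user=(\w+) id=(\d+)")


def run_command(args):
    """Run an external command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def parse_users(text):
    """Return (name, id) pairs found by the capture groups."""
    return [(name, int(num)) for name, num in PATTERN.findall(text)]


def mask_ids(text):
    """Replace each id with asterisks, keeping the captured user name."""
    return PATTERN.sub(r"user=\1 id=***", text)


output = run_command(CHILD)
users = parse_users(output)
masked = mask_ids(output)
```

Passing the command as a list avoids shell quoting issues, and check=True raises an exception if the child process exits with a non-zero status.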

Day 2 – Handling data formats

  • Using the DB-API database interface
  • SQL replaceable parameters
  • Stored procedures and transactions
  • Parsing XML using ElementTree
  • Creating XML files
  • Reading data from a web page URL
  • Working with JSON web services
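
As an illustration of the Day 2 topics, the following is a minimal sketch rather than actual course material. It assumes Python 3 and uses the standard library sqlite3 module as the database; the JSON payload (standing in for a web service response), the table layout and the function names are all hypothetical. It decodes JSON, stores the records using replaceable parameters inside a transaction, queries them, and writes and re-reads the result as XML with ElementTree.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical JSON payload, standing in for a web service response.
PAYLOAD = '[{"title": "Dune", "year": 1965}, {"title": "Emma", "year": 1815}]'


def load_books(conn, payload):
    """Decode the JSON payload and insert one row per record."""
    books = json.loads(payload)
    conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
    # Using the connection as a context manager commits the transaction
    # on success and rolls it back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO books (title, year) VALUES (?, ?)",
            [(b["title"], b["year"]) for b in books],
        )


def books_after(conn, year):
    """Return titles published after the given year (parameterised query)."""
    cur = conn.execute(
        "SELECT title FROM books WHERE year > ? ORDER BY title", (year,)
    )
    return [row[0] for row in cur]


def to_xml(titles):
    """Build a small XML document with one <book> element per title."""
    root = ET.Element("books")
    for title in titles:
        ET.SubElement(root, "book").text = title
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
load_books(conn, PAYLOAD)
recent = books_after(conn, 1900)
xml_text = to_xml(recent)
parsed_titles = [e.text for e in ET.fromstring(xml_text).findall("book")]
```

The ? placeholders let the database driver handle quoting, which avoids building SQL by string concatenation. Other database modules may use a different placeholder style.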