Dissertation

Categorical Time Series Analysis and Applications in Statistical Quality Control

 

 

Dissertation
(Faculty of Mathematics and Computer Science, University of Würzburg),
dissertation.de – Verlag,
Berlin, 2009.

552 pages, paperback, € 63.00.
ISBN 978-3-86624-442-9

Christian H. Weiß


 

Summary

Categorical (nominal) time series occur in many fields of practice, such as computer science, biology, and linguistics. Despite their practical relevance, there is no monograph covering the different aspects of categorical time series analysis, and the few published research articles on the topic are scattered over journals from different scientific areas such as statistics, computer science, and biology. Moreover, many of the standard tools of statistics cannot be applied to categorical time series: a repertoire of standard distributions does not exist, a visual analysis is problematic, and not even elementary mathematical operations can be applied to categorical values. Typical techniques of cardinal time series analysis for seasonal adjustment or trend elimination cannot be used for categorical time series; it is not even clear how to define the terms 'trend' or 'season' in this case.
The present text attempts to give a comprehensive description of categorical time series analysis in the time domain. It compiles known and new results and integrates concepts from previously isolated research areas into an overall structure. Chapter I discusses approaches to an exploratory analysis of a given categorical time series. Approaches for sequence comparison, string matching, and the detection of patterns and regularities in categorical time series are reviewed. A procedure for sequential pattern analysis, based on iterated function systems, can even be used to mine patterns visually. Chapter II introduces basic concepts of categorical time series analysis in the time domain. Forms of weak stationarity for categorical processes are proposed, which are of practical relevance for categorical time series analysis and modeling. They also help to define measures of serial dependence, which are important for identifying a suitable process model for a given categorical time series. Such models for categorical processes are discussed in Chapter III. After a review of elementary process models of Bernoulli and Markov type, advanced models for categorical processes are discussed and analyzed in detail. The special case of a binary process is also considered. Chapter IV is centered on approaches to a statistical analysis of categorical time series. Characteristic features of categorical processes such as patterns or runs are investigated, as well as models for time series of counts, which may arise from an appropriate transformation of a categorical process. Chapter V shows how the results of Chapters I to IV can be applied to design approaches for controlling a categorical process. After a review of important concepts from statistical process control in general, it considers approaches to monitor the marginal distribution of a categorical process, to monitor categorical features of the process such as runs and patterns, and to control a serially dependent process of counts. Chapter VI demonstrates the practical relevance of the theory developed in this text through a number of real-data examples. These examples illustrate the different aspects of and approaches to categorical time series analysis.
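
As a small illustration of two notions mentioned in the summary, the following Python sketch computes the marginal relative frequencies of a categorical series and its run lengths, that is, the lengths of maximal blocks of identical values. The run lengths form a series of counts derived from the categorical series. This is only a minimal sketch: the function names `run_lengths` and `marginal_frequencies` and the example series are assumptions made here for illustration and are not taken from the dissertation.

```python
from collections import Counter
from itertools import groupby


def run_lengths(series):
    """Return (category, length) pairs for the maximal runs of identical values.

    Hypothetical helper for illustration; not the dissertation's notation.
    """
    return [(value, sum(1 for _ in group)) for value, group in groupby(series)]


def marginal_frequencies(series):
    """Return the relative frequency of each category observed in the series."""
    counts = Counter(series)
    n = len(series)
    return {category: count / n for category, count in counts.items()}


# Made-up categorical series over the categories 'a', 'b', 'c'.
example = list("aabbbacc")

runs = run_lengths(example)            # runs as (category, length) pairs
freqs = marginal_frequencies(example)  # estimated marginal distribution

print(runs)
print(freqs)
```

Note that the sketch uses only counting and comparison of values, never arithmetic on the categories themselves, which is in line with the remark above that elementary mathematical operations are not available for categorical values.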

Contents

  1. Introduction
     
    The Discipline of Categorical Time Series Analysis ♦ Modeling Categorical Processes ♦ Patterns in Categorical Time Series ♦ Visually Analyzing Categorical Time Series ♦ Statistical Control of Categorical Processes ♦ Contributions of this Work ♦ Structure of the Text
     
  2. Fields of Categorical Time Series Analysis
     
    Intrusion Detection ♦ Alarm Time Series in Telecommunication Networks ♦ Event Pattern Analysis in Automation ♦ Modeling Software Usage ♦ Manufacturing Event Management ♦ Biological Sequences ♦ Musical Analysis ♦ Speech and Text Recognition ♦ Part-of-Speech Tagging
     
I. Exploratory Analysis of Categorical Time Series
      1. Strings in Categorical Time Series
         
        Categorical Time Series: Basic Terms and Notations ♦ Suffix Tries and Suffix Trees ♦ Similarities in the Range of a Categorical Time Series ♦ Comparing Strings and Categorical Time Series ♦ Expressing Similarity between Strings ♦ The Dot Plot – Visual Sequence Comparison ♦ String Matching
         
      2. Detecting Sequential Patterns
         
        KDD and Data Mining ♦ Association Rule Mining ♦ Sequential Pattern Analysis ♦ Sequential Pattern Analysis Based on Suffix Tries ♦ Finding Frequent Patterns in a Categorical Time Series ♦ Finding Rare Patterns in a Categorical Time Series
         
      3. Discovering Patterns in Categorical Time Series using IFS
         
        Iterated Function Systems ♦ Chaos Algorithm for Transformation of a Categorical Process ♦ Pattern Discovery based on Fractal Time Series ♦ Cube Transformations ♦ Circle Transformations ♦ Visual Tree Representations
         

II. Foundations of Categorical Time Series Analysis

      1. Statistical Properties of Categorical Random Variables
         
        Measures of Location and Dispersion ♦ Measures of Dependence ♦ Basic Concepts ♦ Proportional Reduction of Variation ♦ Measures based on Pearson’s Chi²-Statistic ♦ Further Measures of Dependence ♦ Signed Dependence
         
      2. Statistical Properties of Categorical Processes
         
        Stationarity of Categorical Processes ♦ Concepts of Stationarity ♦ The Rate Evolution Graph ♦ Measures of Serial Dependence ♦ Serial Dependence of Weakly Stationary Processes ♦ Time Series Bitmaps ♦ Periodicity of Categorical Processes ♦ Trends in Categorical Processes ♦ Categorical Features: Runs, Cycles, Patterns
         
      3. Elements of Categorical Time Series Analysis
         
        Sequential Estimation of Probabilities ♦ Marginal Probabilities ♦ Conditional Probabilities ♦ Smoothing Categorical Processes ♦ Transforming Categorical Processes ♦ Model Building: Identification, Estimation and Evaluation ♦ Model Identification ♦ Model Estimation ♦ Model Evaluation and Selection ♦ Forecasting Categorical Processes
         

III. Modeling Categorical Time Series

      1. Fundamental Models for Categorical Time Series
         
        Bernoulli and Markov Models ♦ Elementary Properties of Ordinary Markov Chains ♦ Asymptotic Properties of Ordinary Markov Chains ♦ Estimation of Parameters and Model Choice ♦ Variable Length Markov Models ♦ Probabilistic Suffix Trees ♦ Estimation of Parameters and Model Choice
         
      2. A Stochastic Model for Sequential Pattern Analysis
         
        Rule Generation in Sequential Pattern Analysis – An Introduction ♦ Rule Generation in Sequential Pattern Analysis – Implicit Assumptions ♦ A Simple Model for Sequential Pattern Analysis ♦ Model-Based Optimization of Sequential Pattern Analysis
         
      3. Advanced Models for Categorical Time Series
         
        Mixture Transition Distribution Models ♦ Properties of MTD(p) Models ♦ Model Choice and Estimation of Parameters ♦ A Generalization of MTD(p) Models ♦ Infinite-Memory MTD Models ♦ Discrete ARMA Models ♦ Definition and Interpretation ♦ Alternative Representations ♦ Markov Chain Representation of NDARMA Models ♦ Serial Dependence Structure ♦ Model Identification and Estimation ♦ Joint Distributions ♦ Predicting NDARMA Processes ♦ DAR(p) Models ♦ DAR(p) Processes as Markov Chains ♦ DMA(q) Models ♦ Generalized Choice Models ♦ The Backshift Process of NDARMA Models ♦ Generalized Choice Models: Definition and Properties ♦ Hidden Markov Models ♦ State Space Models ♦ Hidden Markov Models: Definition and Basic Properties ♦ Hidden Markov Models: Model Estimation ♦ Decoding the Hidden States ♦ Regression Models ♦ Introduction to Generalized Linear Models ♦ Regression Models for Time Series
         
      4. Models for Binary Time Series
         
        Basic Properties of Binary Processes ♦ Serial Dependence and Stationarity of Binary Processes ♦ Binarization of Categorical Processes ♦ Binary Markov Processes ♦ Properties of Binary Markov Chains ♦ The Markov Binomial Distribution ♦ Higher Order Binary Markov Processes ♦ An ARMA Model for Binary Processes ♦ The BinARMA(p, q) Model ♦ The BinAR(p) Model ♦ The BinMA(q) Model
         

IV. Statistical Analysis of Categorical Time Series

      1. Run Statistics for Categorical Processes
         
        Runs and Run Length Distributions ♦ Run Length Properties of Bernoulli and Markov Processes ♦ Run Length Properties of NDARMA Processes ♦ Run Length Properties of DAR(p) Processes ♦ Run Length Properties of DMA(q) Processes ♦ Cycles in Categorical Processes ♦ Pattern Histograms ♦ ARL Computation
         
      2. Patterns and Runs in Categorical Processes
         
        Patterns in Markov Processes ♦ Pattern Transitions ♦ Distribution of Pattern Counts ♦ Moments of Pattern Counts ♦ Counting Runs in Markov Processes ♦ Repeated Patterns
         
      3. Models for Time Series of Counts
         
        NDARMA Models for Counts ♦ Hidden Markov Models for Counts ♦ Regression Models for Counts ♦ The INGARCH Model ♦ Thinning Operations ♦ Binomial Thinning ♦ Hypergeometric Thinning ♦ Further Thinning Operations ♦ The INAR(1) Model ♦ Definition and Interpretation ♦ Basic Properties ♦ Regression Properties ♦ Model Estimation ♦ Joint Distributions ♦ The Binomial AR(1) Model ♦ BARMA Models for Binomial Marginals
         
      4. Advanced Integer-Valued ARMA Models
         
        INMA(q) Models ♦ Introduction to INMA(q) Models ♦ Overall Process Distribution ♦ INMA(q) – Independent Elements Model ♦ INMA(q) – Changing States Model ♦ INMA(q) – Lifetime Model ♦ INMA(q) – Sale Model ♦ INARMA Models with Poisson Marginals ♦ Poisson INMA(q) Model ♦ Poisson INAR(1) Model ♦ Jumps in Poisson INARMA Processes ♦ INAR(p) Models ♦ Introduction to INAR(p) Models ♦ INAR(p) – Moving Elements Model ♦ INAR(p) – Independent Reproductions Model ♦ INAR(p) – Lifetime Model ♦ Combined INAR(p) Models ♦ CINAR(p) – Identical Thinnings Model ♦ CINAR(p) – Independent Thinnings Model ♦ Binomial AR(p) Models ♦ Binomial AR(p) – Identical Thinnings Model ♦ Binomial AR(p) – Independent Thinnings Model
         

V. Controlling Categorical Processes

      1. An Introduction to Statistical Process Control
         
        Terms and Aims of Statistical Process Control ♦ A Brief Review of Standard Variables Control Charts ♦ Shewhart Control Charts ♦ Evaluating the Performance of Control Charts ♦ EWMA Control Charts ♦ CUSUM Control Charts ♦ Hotelling’s T² Control Chart ♦ Basic Concepts for Monitoring a Categorical Process
         
      2. Monitoring the Marginal Distribution of a Categorical Process
         
        Monitoring Binary Processes ♦ A Review of Sampling Approaches ♦ Group Inspection of Dependent Binary Processes ♦ Continuous Control: A Moving Average Approach ♦ Continuous Control: An EWMA Approach ♦ Monitoring Categorical Processes ♦ Monitoring the Components of the Distribution p ♦ Monitoring a Summarizing Statistic
         
      3. Monitoring Runs and Patterns in Categorical Processes
         
        Monitoring Runs in Binary Processes ♦ Monitoring Runs in Independent Binary Processes ♦ Monitoring Runs in Dependent Binary Processes ♦ Monitoring Runs in Categorical Processes ♦ Monitoring Cycles in Categorical Processes ♦ Monitoring Patterns in Categorical Processes ♦ Monitoring the First Occurrence of a Critical Pattern ♦ Continuous Monitoring of Patterns
         
      4. Controlling Processes of Counts
         
        Controlling Processes of Independent Poisson Counts ♦ Controlling INAR(1) Processes ♦ Controlling INAR(1) Processes: Possible Approaches ♦ Controlling INAR(1) Processes: ARL Performance ♦ Controlling INAR(p) Processes ♦ Controlling INMA(q) Processes ♦ Controlling Binomial AR(1) Processes ♦ Controlling Binomial AR(1) Processes: Possible Approaches ♦ Controlling Binomial AR(1) Processes: ARL Performance
         

VI. Applications

      1. Analysis of Shakespeare Data
         
        Stationarity Analysis ♦ Analysis of Serial Dependence Structure ♦ Model-Based Analysis of Sequential Patterns
         
      2. Analysis of Diagnosis Data
         
        Process Control with Estimated p0 ♦ Process Control with Given p0 ♦ Process Control without Knowledge on p0
         
      3. Analysis of Log-in Data
         
        Model Building ♦ Checking Model Adequacy ♦ Control Charting in Phase II
         
      4. Analysis of Server Data
         
        Stationarity Analysis of Categorical Time Series ♦ Access Counts: Model Building ♦ Analysis of User Activity ♦ IP Counts per Minute: Model Building ♦ IP Counts within Periods of 2 Minutes Length ♦ Control Charting in Phase II
         

VII. Appendix: Statistical Foundations

    1. Probability Theory and Statistics: Basic Concepts
       
      Probabilities in Discrete Sample Spaces ♦ Random Variables ♦ Moments of Discrete Random Variables ♦ Generating Functions ♦ Stochastic Processes ♦ Likelihood Concepts ♦ Cardinal ARMA Models ♦ MA(q) Models ♦ AR(p) Models
       
    2. Popular Discrete Distributions
       
      The Binomial Distribution ♦ The Poisson Distribution ♦ The Negative Binomial Distribution ♦ The Hypergeometric Distribution ♦ The Multinomial Distribution ♦ Generalized Binomial Distributions ♦ Correlated Trials ♦ Varying the Probability of Success ♦ Varying the Number of Trials ♦ The Quasi-Binomial Distribution ♦ The Multivariate Poisson Distribution ♦ The Generalized Poisson Distribution
       

 


Last modified: 22 November 2017