Automatic Detection of Alpine Rockslides in Continuous Seismic Data Using Hidden Markov Models

Summary

We present a novel technique to solve the automatic detection and classification problem of earth tremor in a single step by using Hidden Markov Modelling (HMM). While this technique was originally developed in speech recognition, it already showed great promise when applied to volcano-induced seismic signals. We apply the HMM classifier to a much simpler problem, that is, the detection and distance-dependent classification of small to medium-sized earthquakes. Using the smaller and possibly not perfect data set of earthquakes recorded with three stations of the Bavarian Earthquake Service enables us to better evaluate the advantages and disadvantages of the proposed algorithm and to compare the results with simple and widely used detection techniques (e.g. recursive short-term versus long-term average). Overall, the performance of the HMM shows good results in the pre-triggered classification tasks and reasonable results in the continuous case. The application of HMMs is illustrated step by step so it can be used as a recipe for other applications. Special emphasis is given to the important problem of selecting the features which best describe the properties of the different signals that are to be classified.

1 Introduction

Detection, that is, the separation of a signal from the background noise we are not interested in, and pattern recognition have a very long tradition in seismology (e.g. Joswig 1996). In summary, we can divide detection and pattern recognition into three main fields: detecting weak signals in background noise, identifying seismic phases (automatic phase picking) and discriminating different earth tremors by their generating process (e.g. tectonic earthquakes versus artificial explosions). Early applications of detection and classification, mostly implemented in a cascade-like scheme, were part of the developments in monitoring nuclear test ban treaties (Tjostheim 1981; Joswig 1990; Hoffmann et al. 1999).

The majority of these algorithms are based on the computation of so-called characteristic functions as a first step. These functions are then used for detecting the onset of a phase or of a complete event by means of the short-term versus long-term average (STA/LTA). Following the detection step, the different waveforms are often compared to templates of known signal classes (Tjostheim 1981; Joswig 1990). Numerous algorithms were developed over the last decades to solve the problem of phase identification (Allen 1982; Baer & Kradolfer 1987; Sleeman & van Eck 1999; Leonard 2000; Gentili & Bragato 2006). Most of them are again based on the computation of characteristic functions, which often involve the seismic envelope and/or its behaviour in time. When automatization is the key issue, some algorithms correlate the estimated phases directly with trial locations (binder module in EARTHWORM; Johnson et al. 1995).
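As a baseline for the comparison made later in this paper, the recursive STA/LTA can be sketched as follows. This is a minimal illustration on a synthetic trace, not the implementation used by any particular monitoring system; the window lengths and the trigger threshold are arbitrary choices for the example.

```python
import numpy as np

def recursive_sta_lta(cf, nsta, nlta):
    """Recursive STA/LTA ratio of a characteristic function cf.

    nsta, nlta: effective short- and long-term averaging lengths in samples,
    implemented as exponential moving averages (the recursive form).
    """
    csta, clta = 1.0 / nsta, 1.0 / nlta
    # initialize both averages at the long-term level to avoid a start-up spike
    sta = lta = float(np.mean(cf[:nlta]))
    ratio = np.zeros(len(cf))
    for i, x in enumerate(cf):
        sta = csta * x + (1.0 - csta) * sta
        lta = clta * x + (1.0 - clta) * lta
        ratio[i] = sta / lta
    return ratio

# characteristic function: squared amplitudes of a noisy trace with a transient
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 2000)
trace[1000:1100] += 8.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, 100))
ratio = recursive_sta_lta(trace ** 2, nsta=20, nlta=500)
onset = int(np.argmax(ratio > 4.0))  # first sample at which the trigger fires
```

The short-term average reacts quickly to the transient while the long-term average tracks the noise level, so the ratio crosses the threshold shortly after the signal onset.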

The most promising field for applying pattern recognition techniques is seen in volcano seismology. This originates from the idea that different types of volcano-induced seismic signals are produced by different physical processes within the volcano's feeder system and therefore represent different states of volcanic activity. This idea is reflected in reporting the number of events of each type per time interval as part of a volcano monitoring system. It is also highly desirable to automatize this procedure. The majority of algorithms developed so far are based on Artificial Neural Networks (ANN), which are mainly applied to pre-triggered data (Langer & Falsaperla 2003; Gentili & Bragato 2006; Langer et al. 2006). In this case, features (or in other words characteristic functions) are stored in a so-called feature vector (e.g. frequency partition, envelope, etc.) and then used in the framework of an ANN to determine the different signal classes.

The technique we propose in this study is based on a single-step procedure. We combine detection and classification by using doubly stochastic models, so-called Hidden Markov Models (HMM), which proved to be successful in speech recognition (e.g. Young et al. 2006). HMMs provide a powerful tool to describe highly variable time-series by a data-driven stochastic model and therefore allow a more general description of signal classes. This makes it possible to handle large sets of reference patterns (earthquakes or volcanic signals) as opposed to template-based pattern matching techniques (Joswig 1994). In contrast to classical ANNs, HMMs incorporate the time dependence explicitly in the models and thus provide a more adequate representation of the seismic signals. The biggest advantage of HMMs over pre-triggered classification is their application directly to the continuous data stream, which enables the detection and classification of events with small signal-to-noise ratio (SNR). This is especially important in the framework of volcano seismology or micro-seismicity studies, as it is a priori not clear whether the amplitude of an event or its rate of occurrence is more significant for the underlying mechanism (e.g. volcanic activity or landslides).

Consequently, HMMs were first introduced to seismology in the field of volcano seismology (Ohrnberger 2001; Alasonati et al. 2006). While the former author applied HMMs to continuous seismic data, the latter authors describe the technique of so-called Hidden Markov Trees applied to pre-detected volcanic signals. Wassermann et al. (2007) applied the HMM technique to a volcanic data set spanning one year. They showed that the automatic recognition of volcano-induced events reproduces the same statistics of event rates as the interactive classification of an experienced observer performing daily routine analysis (see Fig. 1).

Figure 1.

Number of interactively classified earthquakes per day by the local volcanologists (plotted as bars) versus automatic classification using discrete HMMs (plotted as line graphs). Classification according to the local observatory: VT volcano tectonic, MP multi phase, ROCK rockfall, BaF block and ash flow.


Even though this result is encouraging, the evaluation of the classification result is nearly impossible as it includes the interactive reclassification of the whole data set with more than 50 000 detected and classified volcano-seismic events during the time range considered.

Another problem in the evaluation of volcano-induced seismic signals and in assessing the performance of automatic pattern recognition systems lies in the assumptions on which the signal separation is based. Only if the separation into, for example, volcanic tremor, hybrid events or volcano-tectonic events (McNutt 2000) is also justified by the true underlying physical process is it possible to evaluate the overall performance correctly.

In order to circumvent the latter problems, and also because separating signals that matter from signals in which a local service is not interested in the first place increases the performance of the automatic system used within the Bavarian seismic network, we demonstrate the HMM technique in this paper by applying it to a limited continuous data set of the Bavarian seismic network. In this case the signal or class separation is done simply by using the epicentral distance of the earthquakes with respect to the stations used. Therefore, it is easy to decide whether a classification is correct or wrong.

We next formulate the detection and classification problem in the framework of HMMs. In particular, we describe the features used and the properties of the models, and explain the solution of the detection and classification problem by adopting HMMs. Finally, we evaluate the performance of the one-step detection and classification algorithm by comparing its result with the detection rate of a simple recursive STA/LTA trigger.

2 Theory

2.1 Feature extraction

As for most classifiers, the time- and amplitude-dependent seismogram is pre-processed in order to transform it into a domain in which the earthquake classes are more easily distinguished. This transformation is known as feature generation. Examples of features (or in other words characteristic functions) in seismology are the envelope, the dominant frequency, the power of a frequency band or the average absolute amplitude of the seismogram.

An experienced seismologist is able to distinguish between earthquake classes 'easily'; it therefore seems promising to pre-process the seismogram for the classifier by including features which mimic the eye-brain chain: the sharpness in detail can be modelled by detailed features of the time-frequency domain (sonogram); the grosser shape in time, which is recognized by an interaction of eyes and brain, can be mimicked by the seismogram envelope.

The classifier will only perform well if the features used contain different information for the different classes. Therefore, the crucial step is to find the d features in which the classes differ most. However, the general formulation of the classifier allows varying configurations of the generated features, so in the following it is only assumed that some d features have been generated. A detailed explanation of the generation and selection of the features is given in Section 3.2.

2.2 Formulation of the classification problem

The classification problem can be expressed as the argument of the maximum probability of the $i$th class $w_i$ given the observation sequence $O = o_1 o_2 \dots o_T$:

$w = \arg\max_i P(w_i \mid O) \qquad (1)$

Here, the observation $o_t$ at time $t$ is either (i) a vector containing the d features in the case of general HMMs incorporating continuous density models, or (ii) a discrete symbol in the case of discrete HMMs (DHMM). In the latter case a standard vector quantizer combined with a pre-whitening transform is used to discretize the feature sequence.

In summary, a vector quantizer cross-plots the training data in the feature space and segments this feature space in such a way that the resulting point clouds are best separated from one another (see Fig. 2).

Figure 2.

Sketch of a vector quantizer: the training data are cross-plotted in the feature space, resulting in the black points. The feature space is segmented (dashed lines) and labelled by the alphabet symbols A, B and C such that the point clouds are separated best. A sequence of the green, blue, blue, blue, red and red points is then vector quantized by assigning the nearest segment, that is, alphabet symbol, resulting in the (alphabet) symbol sequence A, C, C, C, B and B.


Each segment is further labelled with an alphabet symbol (A, B and C in Fig. 2), forming a so-called codebook. The vector quantization step is then accomplished by cross-plotting a point in the d-dimensional feature space and discretizing the resulting feature point by assigning the nearest alphabet symbol (based on the Euclidean distance). The number of segments (i.e. the number of alphabet symbols, three in the case of Fig. 2) is further referred to as the codebook size. The vector quantizer basically (time-independently) pre-classifies the feature vector and therefore enables us to better control the detection and classification system. Despite the fact that information is lost by discretizing the feature vector sequence, we favour the DHMM in order to better control the behaviour of the system and use a standard vector quantizer to produce a discrete symbol sequence (for details on VQ see e.g. Ohrnberger 2001). The remainder of the paper refers to DHMMs.
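A vector quantizer of this kind can be sketched with a simple k-means codebook. This is an illustration of the principle only (the actual VQ design follows Ohrnberger 2001); the point clouds and the codebook size of three mirror the sketch in Fig. 2.

```python
import numpy as np

def train_codebook(features, n_symbols, n_iter=50, seed=0):
    """Simple k-means codebook: segment the feature space so that the
    training point clouds are best separated; each centroid is one
    alphabet symbol."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), n_symbols, replace=False)]
    for _ in range(n_iter):
        # assign each feature vector to its nearest centroid (Euclidean)
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_symbols):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return centroids

def quantize(features, centroids):
    """Map each d-dimensional feature vector to the index (symbol) of the
    nearest codebook entry."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# three artificial point clouds in a 2-D feature space (cf. Fig. 2)
rng = np.random.default_rng(1)
clouds = np.vstack([rng.normal(m, 0.1, (100, 2))
                    for m in ((0, 0), (2, 0), (0, 2))])
codebook = train_codebook(clouds, n_symbols=3)
symbols = quantize(clouds, codebook)
```

The continuous feature stream is thereby turned into a stream of symbol indices, which is exactly the kind of discrete observation sequence the DHMM expects.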

A pre-whitening transform is applied before the vector quantization in order to make the feature vectors as independent of each other as possible (in a Euclidean sense). To this end, the autocovariance matrix of the feature vectors is decomposed by using the singular value decomposition, and the resulting eigenvectors are used to transform the feature vectors into the resulting coordinate system (for further details see Kittler & Young 1973).
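The pre-whitening step can be sketched as follows. This is an illustrative eigen-decomposition of the feature autocovariance via SVD (Kittler & Young 1973 describe the full transform); the scaling by the square-root eigenvalues is an assumption added here so that the whitened axes have unit variance.

```python
import numpy as np

def prewhiten(features):
    """Decorrelate and rescale feature vectors so that Euclidean distances
    in the transformed space are meaningful for vector quantization."""
    centred = features - features.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    # eigenvectors of the autocovariance matrix via SVD (cov is symmetric)
    u, s, _ = np.linalg.svd(cov)
    # rotate onto the eigenvectors and normalize each axis by sqrt(eigenvalue)
    return centred @ u / np.sqrt(s + 1e-12)

rng = np.random.default_rng(2)
# strongly correlated 2-D features
x = rng.normal(0.0, 1.0, 1000)
feats = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(0.0, 1.0, 1000)])
white = prewhiten(feats)
cov_after = np.cov(white, rowvar=False)  # close to the identity matrix
```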

Having explained the observation $o_t$, we return now to eq. (1). $P(w_i \mid O)$, the probability that the true class is $w_i$ given the observation symbol sequence $O$, is not computed directly but is calculated from $P(O \mid w_i)$, the probability that the observation symbol sequence is $O$ given the class $w_i$, by the use of Bayes' rule:

$P(w_i \mid O) = \frac{P(O \mid w_i)\, P(w_i)}{P(O)} = \alpha\, P(O \mid w_i)\, P(w_i) \qquad (2)$

$P(w_i)$, the prior probability of class $w_i$, can either be computed from past events or alternatively be taken as uniform for each class (the usual assumption in speech recognition, e.g. Rabiner 1989). The inverse probability of the observation sequence, $\alpha = P(O)^{-1}$, does not depend on the class $w_i$ and is therefore a constant in the maximization of eq. (1).

Given the dimensionality of the problem, the direct estimation of $P(O \mid w_i)$ is not feasible. However, if it is assumed that an underlying parametric model such as a DHMM is generating the observation sequence, the problem of estimating $P(O \mid w_i)$, the probability that the observation symbol sequence is $O$ given the class $w_i$, is replaced by the much simpler problem of estimating $P(O \mid \lambda_i)$, the probability that the observation symbol sequence is $O$ given the parameters of the DHMM $\lambda_i$ (Young et al. 2006).
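Numerically, eqs (1) and (2) amount to an argmax over class likelihoods weighted by priors; the class-independent constant $\alpha$ can simply be dropped. The numbers below are invented for illustration.

```python
import numpy as np

# hypothetical likelihoods P(O | w_i) for three classes, and uniform priors
likelihoods = np.array([1e-12, 5e-11, 2e-11])   # P(O | w_i)
priors = np.array([1 / 3, 1 / 3, 1 / 3])        # P(w_i), uniform assumption

# Bayes' rule up to the class-independent constant alpha = P(O)^-1
posterior_unnorm = likelihoods * priors
best_class = int(np.argmax(posterior_unnorm))   # eq. (1)

# the constant alpha does not change the argmax; normalizing recovers P(w_i|O)
posterior = posterior_unnorm / posterior_unnorm.sum()
```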

2.3 Parameters of DHMMs

In general, a Hidden Markov Model is a finite state machine, in which the states are hidden from direct observation, that changes state once every time unit. Each time $t$ that a state $q_t$ is entered, an observation symbol $o_t$ is generated (Young et al. 2006, p. 4).

In detail, a DHMM is described by the following parameters.

  • (i)

$\pi_i = P(q_1 = i)$: the probability that the first state $q_1$ is $i$.

  • (ii)

$a_{ij} = P(q_{t+1} = j \mid q_t = i)$: the probability of a state transition from the current state $i$ to the next state $j$.

  • (iii)

$b_j(k) = P(o_t = k \mid q_t = j)$: the observation symbol probability of the symbol $k$ given that the current state is $j$.

Fig. 3 shows a sketch of a DHMM in which the hidden states are 'unhidden' in order to better illustrate the model parameters. Also for better illustration, we use, instead of an arbitrary symbol sequence, the number of car accidents per day in Germany in 2005 as observation, with an average of 6160 accidents per day (Statistisches-Bundesamt 2006). Consecutive observations during 5 days, $O$ = (7219, 7100, 6700, 5980, 6160), could be generated by an underlying DHMM in which the states are different realizations of the weather: sun, clouds and rain. The doubly stochastic process in this model is composed of (i) the transition probabilities of the states (sun, clouds and rain), that is, the probability of sun tomorrow given that it is cloudy today is $a_{12} = P(q_{t+1} = 1 \mid q_t = 2)$, and (ii) the observation probability (the number of car accidents), which differs in each state; that is, we assume a high probability of a high number of accidents given sunshine, $b_1(\mathrm{high}) = P(o_t = \mathrm{high} \mid q_t = 1)$, due to the fact that traffic is denser and a car accident is therefore more probable. The probability of the observation sequence $O$ and state sequence shown in Fig. 3 given the model parameters $\lambda$ is calculated by multiplying the temporal chain of all probabilities which generated the observation sequence:

$P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t) \qquad (3)$

Figure 3.

DHMM example, which generates an observation sequence O containing the numbers of car accidents per day in Germany (on average 6160 accidents per day) as realizations of different states of the weather (sun, clouds and rain).

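The chain multiplication of eq. (3) can be reproduced for a toy model in the spirit of Fig. 3. All probabilities below are invented for illustration and are not those of the figure.

```python
import numpy as np

# toy DHMM: states 0=sun, 1=clouds, 2=rain; symbols 0="high", 1="low"
pi = np.array([0.5, 0.3, 0.2])                  # initial state probabilities
A = np.array([[0.7, 0.2, 0.1],                  # a_ij = P(q_{t+1}=j | q_t=i)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
B = np.array([[0.8, 0.2],                       # b_j(k) = P(o_t=k | q_t=j)
              [0.5, 0.5],
              [0.3, 0.7]])

def joint_prob(states, symbols):
    """P(O, Q | lambda): multiply the temporal chain of probabilities, eq. (3)."""
    p = pi[states[0]] * B[states[0], symbols[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], symbols[t]]
    return p

# sun, sun, clouds emitting "high", "high", "low":
# 0.5*0.8 * 0.7*0.8 * 0.2*0.5 = 0.0224
p = joint_prob(states=[0, 0, 1], symbols=[0, 0, 1])
```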

As mentioned in the example, the probability $P(O \mid \lambda)$ can be obtained by simply multiplying the relevant DHMM parameters. However, in order to be practicable in real-world applications, three basic problems must be solved for the DHMM (Rabiner 1989):

  • Problem 1:

    Given the observation sequence O and model λ, how to efficiently compute P( O ǀλ)?

  • Problem 2:

Given the observation sequence $O$, how to choose the corresponding state sequence $q_1 \dots q_T$ which best explains the observations?

  • Problem 3:

    How to adjust the model parameters λ in order to maximize P( O ǀλ)?

The solution to problem 1 is given efficiently by the so-called forward procedure, where dynamic programming is used to avoid redundant calculation. Problem 2 is solved by the Viterbi algorithm, which is also based on dynamic programming. An expectation maximization (EM) algorithm called Baum-Welch is used for the solution of problem 3, the training problem. For details see Schukat-Talamazzini (1995), Rabiner (1989) and Young et al. (2006).
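The forward procedure for problem 1 can be sketched as follows. This is a textbook implementation after Rabiner (1989), not the authors' code; the brute-force sum over all state sequences is included only to show that the dynamic programming gives the identical result at far lower cost.

```python
import numpy as np
import itertools

def forward(pi, A, B, symbols):
    """P(O | lambda) by the forward procedure: dynamic programming over
    alpha_t(j) = P(o_1..o_t, q_t = j | lambda), avoiding the exponential
    sum over all state sequences."""
    alpha = pi * B[:, symbols[0]]
    for o in symbols[1:]:
        alpha = (alpha @ A) * B[:, o]   # induction step of the recursion
    return alpha.sum()                  # termination: sum over final states

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p = forward(pi, A, B, symbols=[0, 1, 0])

# brute force over all 2**3 state sequences must give the same number
brute = sum(
    pi[q[0]] * B[q[0], 0]
    * A[q[0], q[1]] * B[q[1], 1]
    * A[q[1], q[2]] * B[q[2], 0]
    for q in itertools.product(range(2), repeat=3)
)
```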

2.4 Continuous classification and post-processing

While the classification of a pre-detected feature symbol sequence $O$ is given by eq. (1), the real-world detection and classification of a continuous feature symbol sequence is accomplished by cutting feature symbol sequences of model-dependent length $T_i$, centred at a midframe, out of the continuous feature symbol stream (see Fig. 4). Here, frame is used as a synonym for window in order to avoid confusion with the window used for feature generation (Section 3.2).

Figure 4.

Sketch of the continuous classification: feature symbol test sequences of model-dependent length $T_i$ are selected from the continuous symbol stream shown in the top row around a midframe (coloured in yellow) and classified using a normalized likelihood measure (eq. 6) (modified from Ohrnberger 2001).


These resulting partial feature symbol sequences are then evaluated by a normalized log maximum likelihood measure (Juang & Rabiner 1985; Ohrnberger 2001):

$\hat{L}(O \mid \lambda_i) = \frac{1}{T_i} \log P(O \mid \lambda_i) \qquad (4)$

Moving the midframe along the continuous symbol sequence achieves a classification per frame step.

In order to achieve better performance, the following post-processing steps are applied in this study. (i) To achieve robustness, we average the probability of multiple models $\lambda_{ik}$ per class $w_i$ with different numbers of states $k \in K$:

$P(O \mid w_i) = \frac{1}{|K|} \sum_{k \in K} P(O \mid \lambda_{ik}) \qquad (5)$

The combination of eqs (4) and (5) results in the following 'averaged', normalized maximum likelihood measure:

$\hat{L}_i = \frac{1}{|K|} \sum_{k \in K} \frac{1}{T_{ik}} \log P(O \mid \lambda_{ik}) \qquad (6)$

(ii) In the continuous case, only classifications whose probability is at least three times the probability of the noise class are taken into account. A class-dependent minimum event length is used, which specifies how many consecutive classifications of one class must occur before an event is declared. Furthermore, the coincidence of the three stations used in this study is evaluated by accepting a classification only if at least two stations declare an earthquake classification.
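The minimum-event-length rule can be sketched as follows. The per-frame class labels are stand-ins for the winner of eq. (6) at each midframe position; the class names and the threshold of four frames are hypothetical.

```python
import numpy as np

def declare_events(frame_class, min_len):
    """Declare an event only when min_len consecutive frames carry the same
    (non-noise) class label; returns (class, first_frame, last_frame) tuples."""
    events, start = [], 0
    for i in range(1, len(frame_class) + 1):
        # close the current run at a label change or at the end of the stream
        if i == len(frame_class) or frame_class[i] != frame_class[start]:
            label = frame_class[start]
            if label != "noise" and i - start >= min_len:
                events.append((label, start, i - 1))
            start = i
    return events

# per-frame winning classes from the classifier (hypothetical sequence)
frames = (["noise"] * 10 + ["loc"] * 8 + ["noise"] * 5
          + ["reg"] * 2 + ["noise"] * 5)
events = declare_events(frames, min_len=4)
# the 2-frame "reg" burst is suppressed by the minimum event length
```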

By using the steps explained in this section the earthquake detection and classification algorithm can now be built and is applied in the following.

3 Case Study

3.1 Properties of the different classes

In this paper we use real data from three seismological stations from the Bavarian earthquake service to exemplify the steps needed for a DHMM based detection and classification system. The study area is located in a mountainous region where it is of interest to separate local from nearby and from regional earthquakes (Fig. 5 shows example seismograms).

Figure 5.

Common seismogram for the classes reg, near, loc, dn and nn (for class definition see Table 1).


The local earthquakes are located within a high mountain and the corresponding wave fields are therefore strongly affected by topography. In contrast, the nearby and regional earthquakes are less affected by topography due to the deeper travel paths and longer wavelengths involved. This is also expressed by the more similar overall appearance of their seismograms.

The classification is crucial for the subsequent localization, which uses CPU-intensive non-linear techniques for the local earthquakes (3-D topographic effects) and more classical techniques for the nearby and regional earthquakes. The classes in this case study are identified and defined in Table 1.

Table 1.

Characteristics of the different classes/training data.


Noise in this context refers to events of no interest, so that each part of the seismogram can now be associated with one of the above classes. The noise is further divided into two subclasses, day noise dn and night noise nn. nn events are mostly non-anthropogenic, that is, meteorological influences, vibrating trees or similar sources known as ambient noise. In contrast, dn events are mostly anthropogenic, for example, traffic, logging or saw mills.

Now that the definitions of the classes are given, the next step is to extract features by which the classes can be distinguished from one another so that the classification problem is easier to solve.

3.2 Feature extraction and selection

The determination of the different classes is not based directly on the seismogram itself but on features extracted from the seismogram (see Section 2.1). In order to process the features in near real time, a 3 s sliding window with a step size of 0.05 s is used for the computation. Therefore, frequencies are nominally resolved down to 1/3 s = 0.33 Hz.

In the following, short background information regarding the calculation of the features used is given. This includes a measure of the classes' discriminative power for the different features based on Figs 6–8. The figures show, for each feature and for each class, respectively, the median and histogram over all events in the class-dependent training data sets (for the number of events see Table 1).

Figure 6.

The figures show, for each feature and for each class, respectively, the median and histogram over all events in the class-dependent training data sets. The features here are hob1–hob10. Colour-coding: reg in blue, near in green, loc in red, dn in yellow and nn in grey.


Figure 7.

The figures show, for each feature and for each class, respectively, the median and histogram over all events in the class-dependent training data sets. The features here are rectilinearity 'rect', planarity 'plan', azimuth $\varphi_P$ and incidence $\vartheta_P$. The spikes in the histograms of 'rect' and 'plan' are due to the number of bins, which is constant for all histograms and not optimal for 'rect' and 'plan'. Colour-coding as in Fig. 6.


Figure 8.

The figures show, for each feature and for each class, respectively, the median and histogram over all events in the class-dependent training data sets. The features here are phase difference $\theta_{NZ}$, instantaneous frequency $f_Z$, centroid time $\mathrm{centr}_Z$, normalized envelope and instantaneous bandwidth $\sigma_Z$. The zeros in the histogram of $\theta_{NZ}$ are due to the number of bins, which is constant for all histograms and not optimal for $\theta_{NZ}$. Colour-coding as in Fig. 6.


If the features separate the classes in the median or the histogram distribution, that is, if the graphs for the different classes do not overlap or in other words are separated, the discriminative power is assumed to be high.

3.2.1 Sonogram calculation

In order to model the human analyst, features based on the short-term Fourier transform (STFT; sonogram) introduced by Joswig (1994) are used. The STFT is smoothed by binning (from $k_{\mathrm{low}}$ to $k_{\mathrm{high}}$) the squared short-time spectrum $|Z(\omega_k)|^2$ of the seismogram into $n$ half-octave bands ($\mathrm{hob}_n$) (Ohrnberger 2001, p. 78):

$\mathrm{hob}_n = \sum_{k = k_{\mathrm{low}}(n)}^{k_{\mathrm{high}}(n)} |Z(\omega_k)|^2 \qquad (7)$

The following gradation is used in this study: hob1 0.47–0.78 Hz, hob2 0.70–1.17 Hz, hob3 1.05–1.76 Hz, hob4 1.58–2.63 Hz, hob5 2.37–3.95 Hz, hob6 3.56–5.93 Hz, hob7 5.33–8.89 Hz, hob8 8.00–13.33 Hz, hob9 12.00–20.00 Hz and hob10 18.00–30.00 Hz.
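The half-octave binning can be sketched as follows, using the gradation and the 3 s window length given in the text. The FFT handling and the sampling rate of 100 Hz are illustrative assumptions, not the actual processing chain.

```python
import numpy as np

# half-octave band edges in Hz, gradation as given in the text
HOB_BANDS = [(0.47, 0.78), (0.70, 1.17), (1.05, 1.76), (1.58, 2.63),
             (2.37, 3.95), (3.56, 5.93), (5.33, 8.89), (8.00, 13.33),
             (12.00, 20.00), (18.00, 30.00)]

def hob_features(window, fs):
    """Sum the squared short-time spectrum |Z(w_k)|^2 over each of the
    ten overlapping half-octave bands."""
    spec = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    return np.array([spec[(freqs >= lo) & (freqs <= hi)].sum()
                     for lo, hi in HOB_BANDS])

fs = 100.0                              # assumed sampling rate in Hz
t = np.arange(0.0, 3.0, 1.0 / fs)      # 3 s window as in the text
window = np.sin(2.0 * np.pi * 10.0 * t)  # 10 Hz test tone
hob = hob_features(window, fs)
dominant = int(np.argmax(hob))          # 10 Hz falls into hob8 (8.00-13.33 Hz)
```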

In Fig. 6, the hob features separate the classes well for both the median and the histogram distribution.

3.2.2 Polarization analysis

The polarization properties of the three-component seismogram $X = ((x_{Z1}, \dots, x_{ZT'});\ (x_{N1}, \dots, x_{NT'});\ (x_{E1}, \dots, x_{ET'}))$ of window length $T'$ are calculated from the eigenvalue ($\lambda_1 \ge \lambda_2 \ge \lambda_3$) decomposition of the covariance matrix $C = X^T X / T'$: $(C - \lambda_l^2\,\mathbf{1})\,\mathbf{u}_l = 0$, where $\mathbf{u}_1$ denotes the eigenvector corresponding to the largest eigenvalue. The features rectilinearity 'rect', planarity 'plan', azimuth $\varphi_P$ and incidence $\vartheta_P$ are then computed by using the following equations (Kanasewich 1981, p. 336f):

$\mathrm{rect} = 1 - \frac{\lambda_2^2 + \lambda_3^2}{2\lambda_1^2} \qquad (8)$

$\mathrm{plan} = 1 - \frac{2\lambda_3^2}{\lambda_1^2 + \lambda_2^2} \qquad (9)$

$\varphi_P = \arctan\left( u_{E1} / u_{N1} \right) \qquad (10)$

$\vartheta_P = \arccos\left( |u_{Z1}| \right) \qquad (11)$

While the median of the azimuth in the case of loc events in Fig. 7 separates well from all other classes, and while a difference is visible in the median of the incidence between earthquakes and noise, the features planarity and rectilinearity do not show a clear separation, neither in the median distribution nor in the histogram distribution.
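As an illustration, the polarization attributes can be computed as follows. This sketch uses the common Jurkevics/Kanasewich-style convention on the eigenvalues of the three-component covariance matrix; the component ordering and the exact normalization are assumptions for the example, not a verbatim reproduction of the paper's processing.

```python
import numpy as np

def polarization(z, n, e):
    """Rectilinearity, planarity, azimuth and incidence from the
    eigen-decomposition of the 3-component covariance matrix."""
    X = np.column_stack([z, n, e])
    C = X.T @ X / len(X)
    evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    l3, l2, l1 = evals                   # reorder so that l1 >= l2 >= l3
    u1 = evecs[:, -1]                    # eigenvector of the largest eigenvalue
    rect = 1.0 - (l2 + l3) / (2.0 * l1)
    plan = 1.0 - 2.0 * l3 / (l1 + l2)
    azimuth = np.degrees(np.arctan2(u1[2], u1[1]))  # from E and N components
    incidence = np.degrees(np.arccos(abs(u1[0])))   # from the Z component
    return rect, plan, azimuth, incidence

# purely rectilinear test motion polarized nearly along the vertical axis
t = np.linspace(0.0, 1.0, 500)
sig = np.sin(2.0 * np.pi * 5.0 * t)
rect, plan, az, inc = polarization(z=sig, n=0.01 * sig, e=0.01 * sig)
```

For this rank-one test motion both rectilinearity and planarity are close to one and the incidence angle is close to vertical, as expected.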

3.2.3 Complex trace analysis

The Hilbert transform $\mathcal{H}[x_i](t)$, computed via the Fourier-transformed seismogram $x_i(\omega)$, is used for the calculation of the complex trace $\tilde{x}_i(t)$ of the $i$th component and its associated features. The envelope $A_i(t)$ and phase $\theta_i(t)$ (for details see Taner et al. 1979; Kanasewich 1981, p. 367f) are defined as follows:

$\tilde{x}_i(t) = x_i(t) + \mathrm{i}\,\mathcal{H}[x_i](t) \qquad (12)$

$A_i(t) = \sqrt{x_i^2(t) + \mathcal{H}[x_i]^2(t)} \qquad (13)$

$\theta_i(t) = \arctan\left( \frac{\mathcal{H}[x_i](t)}{x_i(t)} \right) \qquad (14)$

By using the definition of the envelope and phase, the instantaneous frequency $f_i$, the instantaneous bandwidth $\sigma_i^2$, the normalized envelope $\bar{A}_i$, the centroid time $\mathrm{centr}_i$ and the instantaneous phase difference $\theta_{ij}$ can be calculated as follows (for details see Taner et al. 1979; Ohrnberger 2001):

$f_i(t) = \frac{1}{2\pi} \frac{\mathrm{d}\theta_i(t)}{\mathrm{d}t} \qquad (15)$

$\sigma_i^2(t) = \left( \frac{1}{2\pi A_i(t)} \frac{\mathrm{d}A_i(t)}{\mathrm{d}t} \right)^2 \qquad (16)$

17

18

$\theta_{ij}(t) = \theta_i(t) - \theta_j(t) \qquad (19)$

The definition of the newly introduced normalized envelope (eq. 17) is justified as follows. The smoothed instantaneous bandwidth is normalized by the Nyquist frequency and integrated in order to obtain a normalized measure of the signal's envelope. To enlarge the variance of this new measure, it is multiplied by a constant and exponentiated.

The centroid time, also a non-standard feature, is the time instant within the 3 s sliding window at which 50 per cent of the area below the envelope is reached (Ohrnberger 2001).

This feature is another type of normalized envelope measure and separates the earthquake classes well in the centroid-time domain (see Fig. 8).

Good separation in Fig. 8 is also obtained by the normalized envelope and the instantaneous bandwidth. The instantaneous frequency also shows good separation in the histogram distribution, while the phase difference shows little separation capability.
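The basic complex-trace attributes can be sketched as follows, using an FFT-based analytic signal. The smoothing applied to the original features and the non-standard normalized envelope and centroid time are omitted; the sampling rate and test signal are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Complex trace x + i*H[x], built in the frequency domain by keeping
    only the one-sided (positive-frequency) spectrum."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def complex_trace_features(x, fs):
    """Envelope A(t), phase theta(t) and instantaneous frequency f(t)."""
    z = analytic_signal(x)
    envelope = np.abs(z)                              # eq. (13)
    phase = np.unwrap(np.angle(z))                    # eq. (14), unwrapped
    inst_freq = np.gradient(phase) * fs / (2.0 * np.pi)  # eq. (15)
    return envelope, phase, inst_freq

fs = 100.0
t = np.arange(0.0, 2.0, 1.0 / fs)
x = np.sin(2.0 * np.pi * 8.0 * t)     # 8 Hz test signal
env, phase, f = complex_trace_features(x, fs)
```

For the pure 8 Hz tone the envelope stays near one and the instantaneous frequency recovers 8 Hz away from the window edges.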

Having extracted the features, it is now crucial to find and add to the DHMM detection and classification system only those d features which separate the different classes best. Adding features which do not separate the different classes will reduce the performance of the vector quantizer and consequently affect the subsequent classifier.

In this study, we visually select the features with the best discriminative power using Figs 6–8. These features are then combined into feature collections (fv) with different total numbers of features (see Table 2).

Table 2.

Structure of feature collections. (For feature definitions see Section 3.2).


Given a feature collection and a training data set (see Table 1), features are generated from the training data, pre-whitened, vector quantized (codebook size 64, recall Section 2.2) and used as input for the next step, in which DHMMs are trained for each class.

3.3 Classifier and system performance

As the number of states is unknown, it is beneficial to average over multiple HMMs with different numbers of states. The number of states in an HMM (e.g. 3 in Fig. 3) should be as small as possible in order to estimate robust parameters, especially in the presence of a small training data set. On the other hand, the model should be complex enough, that is, have a large enough number of states, to describe the appropriate class characteristics. Being also bound by computational processing time, three DHMMs per class are trained, with 3, 4 and 5 states, respectively.

After having trained the various DHMMs the detection and classification system is completed. The following approaches are then used to determine the performance of the trained DHMMs, that is, the classifier.

  • (i)

Usually the performance is tested on single, pre-detected events by applying eq. (5). In speech recognition this is done on an independent, pre-labelled (class memberships are known) test data set, which was previously separated from the training data set. In our case only a limited amount of training data is available. Therefore, it is not possible to further reduce the size of the training data set by separating a test data set. Consequently, as a first check, the training data are re-evaluated. This test is biased, as the classifier is evaluated on the same events which were used for training. However, due to the low computational cost this method is useful for evaluating DHMMs with different feature collections and selecting the one with the best discriminative power.

Fig. 9 exemplifies the results of four re-evaluations of the training data using different feature collections. The figure shows the true class plotted against the classified class. In order to evaluate the best feature collection, the different types of errors and their corresponding error weights need to be discussed.

A true noise event which is classified as an earthquake is called a false alarm; a true earthquake which is classified as noise is called a missed event; and a true earthquake which is classified as a different earthquake type is called a confused event. Since reclassifying the whole data set is the only way to correct for a missed event, it is common to give a missed event a higher error weight than a false alarm. Indeed, a false alarm can easily be corrected by re-verifying only the already classified events. In the case of a confused event there is no need to sort out any false alarms, and consequently the lowest weight is put on this kind of error.

Applied to Fig. 9, we get the following results. There are no missed events at all in feature collections fv11, fv16, fv25 and fv29. Fv11 and fv16 are the only ones without false alarms, and both have the same classification rates (diagonal elements) for the earthquake classes; only in the classification of the noise classes does fv16 perform slightly better (see Fig. 9).

  • (ii)

A computationally more expensive test is the 'leave one out' method. Here, an independent test data set is simulated by removing one event from the training data set, training the DHMMs on the remaining events and finally classifying the removed event with these DHMMs (eq. 5). This procedure is repeated in a loop over all events of the training data set, thus yielding a better measure of the classifier's performance.

Fig. 10 shows the results of the 'leave one out' method in the form of a confusion matrix. Here, in all feature collections, 6 per cent of the reg events are missed. The positions of the false alarms are marked in Fig. 10 by a red rectangle. Fv11, together with fv16, has the lowest false alarm rate of 12 per cent. Focusing on the correct classifications (sum of the diagonal elements), fv11 performs better and is consequently used (in agreement with the previously described evaluation criteria, see Section 3.3, i) for the following continuous classification.

  • (iii)

Since the 'leave one out' method is based on single pre-detected events of fixed length, continuous data provide a more realistic performance test for the detection and classification system. For example, in only one hour 60 × 60 s/0.25 s = 14 400 (frame rate 0.25 s) symbol sequences are classified, thus giving a better view of the DHMM detection and classification performance than the 142 events in the training data set (see Table 1).
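The 'leave one out' evaluation and the error bookkeeping described above can be sketched in a few lines. This is an illustrative skeleton only: `train` and `classify` are placeholders for DHMM training (Baum-Welch) and the eq. (5) evaluation, and the class names follow Table 1.

```python
from collections import Counter

NOISE = {"dn", "nn"}          # noise classes; reg, near, loc are earthquake classes

def leave_one_out(events, train, classify):
    """'Leave one out': for each event, train on all remaining events and
    classify the removed one. `events` is a list of (label, data) pairs;
    returns a confusion matrix {(true class, predicted class): count}."""
    confusion = Counter()
    for i, (label, data) in enumerate(events):
        models = train(events[:i] + events[i + 1:])
        confusion[(label, classify(models, data))] += 1
    return confusion

def error_summary(confusion):
    """Split the confusion matrix into correct, false alarm, missed and
    confused counts, matching the error types discussed in the text."""
    summary = Counter()
    for (true, pred), n in confusion.items():
        if true == pred:
            summary["correct"] += n
        elif true in NOISE and pred not in NOISE:
            summary["false alarm"] += n   # noise classified as earthquake
        elif true not in NOISE and pred in NOISE:
            summary["missed"] += n        # earthquake classified as noise
        else:
            summary["confused"] += n      # wrong type within earthquakes (or noise)
    return summary
```

With the real system, `train` would estimate one DHMM per class from the remaining events and `classify` would pick the class whose DHMM yields the highest log probability.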

Figure 9.

Confusion matrix in case of the re-evaluation of the training data set. Annotations as follows. cb64fv11-RMOA denotes codebook (cb) with size 64, feature collection fv11 (see Table 2) and the station name is RMOA.


Figure 10.

Confusion matrix in case of the 'leave one out' method. Annotations as follows: cb64fv11-RMOA denotes codebook (cb) with size 64, feature collection fv11 (see Table 2) and finally the station RMOA. The positions of the false alarms in the confusion matrix are emphasized by a red rectangle.


The continuous, one step detection and classification is based on eq. (6), that is, no pre-detector is used, and the post-processing steps described in Section 2.4 are applied.

To provide a better understanding, Fig. 11 shows an example of a continuously classified seismogram. The top row shows, in black, a seismogram containing two loc earthquakes. The bottom row displays the corresponding averaged log probabilities for each class. The log probabilities are calculated in a sliding frame (frame rate 0.25 s) with class-dependent length (Fig. 4) and averaged over the three HMMs per class (eq. 6). It can be seen that the probability of the loc class increases during the local events. At the same time the probability of the two noise classes decreases (the onset of the probability change actually lies before the loc events due to the 3 s window of the feature generation). A classification is raised as soon as the log probability of an earthquake class is greater than that of a noise class, and the seismogram is coloured according to the currently active earthquake class. The coda of the loc events is confused with the near class; however, the minimum event length introduced in Section 3.2 helps to avoid such confused events in the post-processing.

Figure 11.

Top panel: Seismogram of two local events recorded with vertical component. Colour sections correspond to final continuous classification without post-processing. Bottom panel: Log probabilities for each class (blue ≡ reg, green ≡ near, red ≡ loc, yellow ≡ dn and grey ≡ nn).

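The one step decision rule, labelling a frame with an earthquake class as soon as its averaged log probability exceeds that of the best noise class, can be sketched as follows. `score` is a placeholder for the per-class log probability of eq. (6), already averaged over the three HMMs of a class, and `frame_len` holds the class-dependent window lengths in symbols; both are assumptions for illustration.

```python
EVENT_CLASSES = ["reg", "near", "loc"]
NOISE_CLASSES = ["dn", "nn"]

def continuous_classify(symbols, score, frame_len, step=1):
    """One-step detection and classification on a symbol stream.

    For every frame start, score each class on its class-dependent window
    and label the frame with the best earthquake class if that class beats
    the best noise class; otherwise label it with the best noise class."""
    labels = []
    for start in range(0, len(symbols), step):
        probs = {c: score(c, symbols[start:start + frame_len[c]])
                 for c in EVENT_CLASSES + NOISE_CLASSES}
        best_event = max(EVENT_CLASSES, key=probs.get)
        best_noise = max(NOISE_CLASSES, key=probs.get)
        labels.append(best_event if probs[best_event] > probs[best_noise]
                      else best_noise)
    return labels
```

In the real system the post-processing of Section 2.4 (e.g. the minimum event length) would then be applied to the resulting label stream.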

The performance was tested on a one month period starting on 2004 June 1. Earthquakes falling in this time period are, of course, excluded from the training data set. For the evaluation of the results a recursive STA/LTA, which has proved sufficient as an event detector given its low computational costs and good performance (Withers et al. 1998), was applied to the same time period. As trigger parameters, a 1 s STA window, a 30 s LTA window and thresholds of 4 and 1.5 for switching the trigger on and off, respectively, were used (Trnkoczy 1998). The coincidence of the three stations was calculated such that a simultaneous detection was raised if at least two of the three stations detected an event. The results in the continuous case are shown in Table 3, the comparison with the recursive STA/LTA is shown in Table 4 and the discussion follows in the next section.

Table 3.

Classification performance of DHMM on a one month continuous seismic period.


Table 4.

Comparison of the detection rates (without classification) between DHMM and the recursive STA/LTA on a one month continuous seismic period.

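The reference detector described above can be sketched as a recursive STA/LTA with a two-out-of-three station coincidence. This is a generic textbook formulation with the trigger parameters stated in the text, not the exact implementation used in the study; `nsta` and `nlta` are window lengths in samples.

```python
def recursive_sta_lta(data, nsta, nlta, on=4.0, off=1.5):
    """Recursive STA/LTA trigger (cf. Withers et al. 1998; Trnkoczy 1998).

    Short- and long-term averages of the signal energy are updated
    recursively; the trigger switches on when STA/LTA > `on` and off when
    it falls below `off`. Returns one boolean trigger state per sample."""
    if not data:
        return []
    sta = lta = data[0] * data[0] + 1e-30   # seed averages to tame start-up
    csta, clta = 1.0 / nsta, 1.0 / nlta
    active, state = False, []
    for x in data:
        sta += csta * (x * x - sta)
        lta += clta * (x * x - lta)
        ratio = sta / lta
        if not active and ratio > on:
            active = True
        elif active and ratio < off:
            active = False
        state.append(active)
    return state

def coincidence(triggers, min_stations=2):
    """Network detection: raised where at least `min_stations` of the
    per-station trigger streams are active simultaneously."""
    return [sum(t) >= min_stations for t in zip(*triggers)]
```

For a 200 Hz stream the 1 s STA and 30 s LTA windows of the text would correspond to `nsta=200` and `nlta=6000`.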

4 Discussion

In the continuous case one month, that is, 30 × 24 × 60 × 60 s/0.25 s = 10 368 000 (frame rate 0.25 s, see Section 2.4) symbol sequences, is selected from the continuous symbol stream and classified using DHMMs. The DHMMs detect (correct + confused events) 78 per cent of all loc events, 100 per cent of all near events and 74 per cent of all reg events (see Table 3). Measuring the classification performance, the DHMMs classify 26 (70 per cent) loc events, 7 (54 per cent) near events and 10 (53 per cent) reg events correctly.

The intrinsic similarities between the reg and near classes are the origin of the higher confused event rates and consequently lower classification rates; this was already apparent in the 'leave one out' method (Fig. 10). Joining the classes reg and near into a regnear class results in 27 (27/32 ≈ 84 per cent) correct events for the regnear class, together with the unchanged 26 (70 per cent) correct events for the loc class.
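Joining reg and near amounts to relabelling rows and columns of the confusion matrix, so that events confused between the merged classes become correct ones. A small helper illustrates the bookkeeping (the counts in the example are toy values, not those of Table 3):

```python
def merge_classes(confusion, merged, new_name):
    """Merge a set of class labels in a confusion matrix given as
    {(true class, predicted class): count}. Entries confused among the
    merged classes land on the new diagonal element."""
    out = {}
    relabel = lambda c: new_name if c in merged else c
    for (true, pred), n in confusion.items():
        key = (relabel(true), relabel(pred))
        out[key] = out.get(key, 0) + n
    return out
```

Applied to the study's matrix, `merge_classes(confusion, {"reg", "near"}, "regnear")` would yield the regnear figures quoted above.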

In order to rate these results, a recursive STA/LTA detector was applied to the same time period. The recursive STA/LTA detects 90 per cent of all earthquakes (sum of loc, near and reg events), compared to an overall DHMM detection rate (correct + confused events) of 81 per cent (see Table 4). The false alarm numbers of both methods are comparable: 223 for the DHMM and 219 for the recursive STA/LTA. However, in contrast to the recursive STA/LTA, the DHMM not only detects but also classifies each detection. Furthermore, when comparing the DHMM detection and classification rates (Table 3) with those of other classification algorithms, the erroneous detection rate of the pre-detector (here 10 per cent) still needs to be subtracted. This means, for example, that 80 per cent successful classifications for a pre-triggered ANN classifier would reduce to 72 per cent (Langer et al. 2006; Ibs-von Seht 2008).
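The reduction from 80 to 72 per cent follows because both stages of a detector-classifier cascade must succeed, so the stage success rates multiply (0.9 × 0.8 = 0.72):

```python
def end_to_end_rate(detector_rate, classifier_rate):
    """Success rate of a detector-classifier cascade: an event is classified
    correctly only if the pre-detector finds it and the classifier then
    assigns the right class."""
    return detector_rate * classifier_rate
```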

An improvement of the DHMM classification could certainly be achieved by first running the recursive STA/LTA detector and then the HMM on the pre-detected data, similar to Langer et al. (2006) and Ibs-von Seht (2008). The good performance in the 'leave one out' method in particular would suggest this approach.

However, the main goal of this study was to present a detection and classification system which need not rely on a prior detection and can thus operate directly on the continuous data, which in turn allows the classification of low SNR earthquakes that would usually fall through the standard detector.

Even with a small data set (142 sequences in total), the DHMM detection and classification achieves reasonable results. In contrast to template based pattern recognition, where templates of the whole training data set need to be stored for later classification, for DHMMs one probabilistic model per class is estimated from the training data, after which the training data can be discarded. This allows the use of nearly unlimited amounts of training data (e.g. 160 000 sequences in speech recognition). By running the DHMM continuously, earthquakes will be detected and classified which can then sequentially be added to the training data to estimate improved DHMMs.

Furthermore, a problem we have not addressed so far is the occurrence of mixed earthquakes, that is, two or more earthquakes whose signals overlap in time. There is definitely a need for more research in this area.

Another point is, as already mentioned, the potential loss of information due to the vector quantization of the DHMM. Migrating to continuous HMMs, which incorporate multivariate probability densities, could improve the results and be less sensitive to the feature selection.
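The vector quantization step responsible for this loss simply replaces each feature vector by the index of its nearest codeword, so all within-cell variation is discarded. A minimal sketch (Euclidean distance; the codebook in the example is a hypothetical stand-in for the k-means codebook of size 64 used in the study):

```python
def quantize(features, codebook):
    """Vector quantization: map each feature vector to the index of the
    nearest codeword. The residual distance to the chosen codeword is
    exactly the information a continuous HMM would retain."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            for v in features]
```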

5 Conclusion

We presented an alternative method for detecting and classifying earthquakes. This method is not based on the usual system, a STA/LTA detector combined with, for example, an ANN or template based pattern recognition. We prefer HMMs because of their strength in single step detection and classification, their explicit incorporation of time and their major success in speech recognition. In speech recognition HMMs are favoured over Time Delay Neural Networks (TDNN), an implementation of time windows in several ANN layers in order to process time dependent signals (e.g. Waibel et al. 1989), and over template based pattern recognition in combination with Dynamic Time Warping (DTW), a similarity measure for signals which may vary in time or speed (e.g. Myers et al. 1980). Both show lower performance. HMMs, like TDNNs and template based pattern recognition, are not based on the time and amplitude dependent seismogram itself but on features estimated from the seismogram which clearly characterize the different classes. Special emphasis in this study was given to the determination of these features, which is crucial for the performance of the classifier.
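For reference, the DTW similarity measure mentioned above can be sketched in its textbook form; this generic version is for illustration only and is not part of the HMM system itself:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences that may vary in
    time or speed (cf. Myers et al. 1980). Simple O(len(a)*len(b)) dynamic
    programming over the cumulative cost matrix D."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Sequences that differ only by a locally stretched time axis, e.g. a repeated sample, obtain distance zero under this measure.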

For testing the performance of this technique, DHMMs are applied to three stations of the Bavarian earthquake service. The signal or class separation in this case is done simply by using the epicentral distance of the earthquakes with respect to the stations used. Therefore, it is easy to decide whether a classification is correct or wrong. Using this comparatively small data set enables us to better evaluate the advantages and disadvantages of the proposed algorithm and to compare the results with simple and widely used detection techniques. The application to this data set is also of interest because, for subsequent hypocentre determination in real time analysis, it is crucial to use computationally costly non-linear techniques for local earthquakes exhibiting strong 3-D topographic site effects and classical techniques for nearby and regional earthquakes. Thus the class memberships need to be known in advance.

Overall, the performance of the DHMM shows good results for the pre-triggered 'leave one out' method. For the continuous case, DHMMs are used to detect and classify a continuous one month period. In order to rate the DHMM results, a recursive STA/LTA detector is applied to the same period. When comparing only the number of detected events, which underestimates the performance of the DHMM method as it not only detects but also classifies, the recursive STA/LTA performs slightly better, with a 90 per cent detection rate compared to 81 per cent for the DHMM detection and classification. However, when comparing the DHMM to other classification systems, where the erroneous detection rate of the pre-detector needs to be included, the DHMM detection and classification shows comparable results.

The purpose of this paper was to demonstrate the possibility of a classification system which operates directly on continuous data, thus making it possible to detect low SNR earthquakes which would usually fall through a pre-detector. The relevance of such a one step detection and classification system increases as more and more stations record data continuously.

Based on the aspects discussed in this paper, one step HMM detection and classification is competitive with common detection and classification systems.

Acknowledgments

We want to thank Matthias Ohrnberger, whose PhD thesis (Ohrnberger 2001) and various discussions were among the bases for this work. We also want to thank Heiner Igel for his kind support and the staff of the Geophysical Observatory in Fürstenfeldbruck, who maintain the stations used in this work. The suggestions of two anonymous reviewers substantially improved the manuscript.

References

Alasonati, P., Wassermann, J. & Ohrnberger, M., 2006. Signal classification by wavelet-based Hidden Markov Models: application to seismic signals of volcanic origin, in Statistics in Volcanology, Geological Society of London.

Allen, R.V., 1982. Automatic phase pickers: their present use and future prospects, Bull. seism. Soc. Am., 72(6B), 225-242.

Baer, M. & Kradolfer, U., 1987. An automatic phase picker for local and teleseismic events, Bull. seism. Soc. Am., 77, 1437-1445.

A neural-tree-based system for automatic location of earthquakes in northeastern Italy, J. Seismol., 10, 73-89, 2006.

Hoffmann et al., 1999. Introduction to the verification regime of the Comprehensive Nuclear-Test-Ban Treaty, Phys. Earth planet. Inter., 113, 5-9.

Ibs-von Seht, M., 2008. Detection and identification of seismic signals recorded at Krakatau volcano (Indonesia) using artificial neural networks, J. Volc. Geotherm. Res.

Johnson, C.E., et al., 1995. Earthworm: a flexible approach to seismic network processing, IRIS Newslett., 14, 1-4.

Joswig, M., 1990. Pattern recognition for earthquake detection, Bull. seism. Soc. Am., 80(1), 170-186.

Joswig, M., 1994. Knowledge-based seismogram processing by mental images, IEEE Trans. Syst. Man Cybern., 24(3), 429-439.

Joswig, M., 1996. Pattern recognition techniques in seismic signal processing, in Proceedings of the 2nd Workshop on Application of Artificial Intelligence Techniques in Seismology and Engineering Seismology, Vol. 12, pp. 37-56, Cahiers du Centre Européen de Géodynamique et de Séismologie.

Juang, B.H. & Rabiner, L.R., 1985. A probabilistic distance measure for hidden Markov models, AT&T Tech. J., 64, 391-408.

Kanasewich, E.R., 1981. Time Sequence Analysis in Geophysics, 2nd edn, The University of Alberta Press, Edmonton, AB.

Kittler, J. & Young, P.C., 1973. A new approach to feature selection based on the Karhunen-Loève expansion, Pattern Recognit., 5, 335-352.

Seismic monitoring at Stromboli volcano (Italy): a case study for data reduction and parameter extraction, J. Volc. Geotherm. Res., 128, 233-245, 2003.

Langer, H., et al., 2006. Automatic classification and a-posteriori analysis of seismic event identification at Soufrière Hills volcano, Montserrat, J. Volc. Geotherm. Res., 153, 1-10.

Leonard, M., 2000. Comparison of manual and automatic onset time picking, Bull. seism. Soc. Am., 90(6), 1384-1390.

McNutt, S.R., 2000. Volcanic seismicity, in Encyclopedia of Volcanoes, pp. 1015-1033, Academic Press, San Diego.

Myers, C., Rabiner, L.R. & Rosenberg, A.E., 1980. Performance tradeoffs in dynamic time warping algorithms for isolated word recognition, IEEE Trans. Acoust. Speech Signal Process., ASSP-28(6), 623-635.

Ohrnberger, M., 2001. Continuous automatic classification of seismic signals of volcanic origin at Mt. Merapi, Java, Indonesia, PhD thesis, Institut für Geowissenschaften, Universität Potsdam.

Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE, 77(2), 257-286.

Schukat-Talamazzini, E.G., 1995. Automatische Spracherkennung: Statistische Verfahren der Musteranalyse, Vieweg Verlag.

Sleeman, R. & van Eck, T., 1999. Robust automatic P-phase picking: an on-line implementation in the analysis of broadband seismogram recordings, Phys. Earth planet. Inter., 113(1), 265-275.

Statistisches Bundesamt, 2006. Verkehr; Verkehrsunfälle, Tech. Rep. 2080700057004, Statistisches Bundesamt, Wiesbaden, Fachserie 8 Reihe 7.

Taner, M.T., Koehler, F. & Sheriff, R.E., 1979. Complex seismic trace analysis, Geophysics, 44(6), 1041-1063.

Tjostheim, D., 1981. Identification of seismic sources: earthquake or underground explosion, in Identification of Seismic Sources - Earthquake or Underground Explosion, Proceedings of the NATO Advanced Study Institute held at Voksenasen, Oslo, Norway, September 8-18, 1980, D. Reidel Publishing Company, Boston, MA.

Trnkoczy, A., 1998. Understanding and setting STA/LTA trigger algorithm parameters for the K2, Tech. Rep., Kinemetrics, Inc., Application Note #41.

Waibel, A., et al., 1989. Phoneme recognition using time-delay neural networks, IEEE Trans. Acoust. Speech Signal Process., ASSP-37(3), 328-339.

The usability of hidden Markov modelling in seismic detection and automatic warning level estimation of volcanic activity, Seismol. Res. Lett., 78, 249, 2007.

Withers, M., et al., 1998. A comparison of selected trigger algorithms for automated global seismic phase and event detection, Bull. seism. Soc. Am., 88, 95-106.

Young, S., et al., 2006. The HTK Book, Tech. Rep., Cambridge University Engineering Department, HTK Version 3.4.


Source: https://academic.oup.com/gji/article/175/3/1055/634811
