Chapter 6 – Image Segmentation – Fundamentals of Digital Image Processing

Chapter 6

Image Segmentation

  • To discuss segmentation as an important step in image processing applications.
  • To discuss isolated points and line detection.
  • To explain the use of gradient and Laplacian operators to detect edges in the image.
  • To discuss various ways in which a closed boundary of an object is formed.
  • To analyse region oriented and thresholding segmentation.
  • To illustrate the use of motion in segmentation.

Image analysis is an area used for extracting the information from an image. Before we extract the information, the image has to be subdivided into constituent parts or objects. The process of subdividing the given image into its constituent parts or objects is called image segmentation. Image segmentation is the first step in image analysis. The level at which the subdivision is carried out depends on the problem being solved. For example, let us consider the image of a basket containing fruits like apples, oranges, and grapes. To know the size of the orange, the image is subdivided into its constituent parts until we get a subimage of the orange. Further subdivision is then not required.

One of the most difficult task in image processing is segmentation process. It plays a vital role in any application and its success is based on the effective implementation of the segmentation technique.

The segmentation algorithms can be divided into two broad categories based on the two important properties, namely,

  1. Discontinuity and
  2. Similarity.

The various segmentation techniques based on (1) gray level discontinuity and (2) gray level similarity are well depicted in a graph as shown in Figure 6.1.

The forthcoming sections deal with the detection of isolated points, lines, and edges.



FIGURE 6.1 Segmentation techniques


In order to detect the isolated points due to noise or any other interference, the general mask employed is shown in Figure 6.2(a) and the typical values of the weights ‘W’ are shown in Figure 6.2(b). This mask consists of coefficients ‘−1’ everywhere except at the center which is 8. The sum of these coefficients is 0. When we place the mask over an image it covers 9 pixels in the image. The average response to the mask is computed as



where Wi is the coefficient in the mask and Zi denotes the gray level values of the pixel in the image under the mask. Now the mask is placed at the top left corner of the image and the response to the mask is computed using equation (6.1).



FIGURE 6.2 (a) The general representation of the mask (b) The mask with coefficient values


If the mask is over a uniform intensity area, the response due to this mask is equal to 0. This means there are no isolated pixels with different gray level values. On the other hand, if the mask is placed over the area having an isolated point with different gray levels, the response to the mask will be a nonzero value. The average response will be maximum when the isolated points are just below the center of the mask. Therefore, from the mask response it is possible to locate the isolated points resulting due to noise.


The various masks used for detecting horizontal, vertical, +45°, and −45° slanting line are shown in Figure 6.3.



FIGURE 6.3 Masks for detecting lines. (a) Mask for horizontal line detection (b) Mask for + 45° slanting line detection (c) Mask for vertical line detection (d) Mask for − 45° slanting line detection


If the first mask shown in Figure 6.3(a) is moved around an image, its response will be a large value to lines oriented horizontally. The response will be maximum when the line passes through the middle row of the mask with constant background. For example, when we move the mask over an image consisting of all ‘1’s as background and with a line of different gray level ‘10’s, then the response due to the first mask is computed as



This high response indicates that the mask is moving along a horizontal line with different gray levels compared to the background pixel gray level values. Similar experiments with second mask results in high response to vertical lines and the third mask to the lines +45° and the fourth mask to the lines in the −45° direction.

Suppose all the masks are applied to an image and the responses computed are denoted as R1, R2, R3, and R4. If at a certain point in the image |Ri| > |Rj| for all j ≠ i, then the point is more likely to be associated with the line in the direction of mask i. For example, if a point in the image where | R1| > | Rj | for j = 2, 3, and 4 then that particular point is more likely to be associated with a horizontal line.


In image processing, the points and line detection are seldom used. In most of the practical image applications edge detection plays a vital role and the concept involved in the edge detection is illustrated in this section with the help of the image shown in Figure 6.4.



FIGURE 6.4 Edge detection. (a) The dark object on a light background with its derivatives (b) The bright object on the dark background with its derivatives


An edge is a boundary between two regions with relatively distinct gray level properties. Consider the image shown in the Figure 6.4(a) consisting of a dark object in a light background. The gray level profile along the horizontal line of the image corresponding to the location shown by the arrow line is also given in the Figure 6.4(a).


Edge: An edge is a boundary between two regions with relatively distinct gray level properties.


The first derivative of the gray level profile is negative at the leading edge of the transition, positive at the trailing edge, and zero in the areas of constant gray levels. The second derivative is negative for that part of transition associated with the light side of the edge, positive for that part of the transition associated with the dark side of the edge, and zero for pixels lying exactly on edges. By analyzing the first derivative and second derivative of the image profile corresponding to a horizontal line, the following inference is obtained.

The magnitude of the first derivative is used to detect the presence of an edge in the image and the sign of the second derivative is used to determine whether the edge pixel lies on the dark side or light side of an edge. For example, if the second derivative is positive it shows that the corresponding pixel lies on the dark side of the edge and vice versa.

The second derivative has a zero crossing at the midpoint of the transition in gray level. The first derivative and the second derivative at any point in an image are obtained by using the magnitude of the gradient at that point and Laplacian operator, respectively. The detailed discussion of the gradient and Laplacian operator is given in the following sections.

6.4.1 Gradient Operators

The gradient of an image f(x, y) at the location (x, y) is given by the vector



The gradient vector points in the direction of the maximum rate of change of fat (x, y). In the edge detection we employ the magnitude of the gradient vector and it is denoted as



To reduce the computational complexity the magnitude of the gradient vector can be approximated as given in equation (6.4).



The direction of the gradient vector is another important quantity and is given in equation (6.5).



where the angle is measured with respect to x-axis.

The computation of the gradient of an image is obtained from the partial derivatives and at every pixel in the image. It is always possible to implement the derivatives in digital form in different ways. One of the equivalent digital forms for the gradient is given by Sobel operators and they are given by the following equations.






where P1 to P9 are pixel values in a subimage as shown in Figure 6.5(a).

The equations (6.6) and (6.7) can be represented by two 3 × 3 masks as given in Figure 6.5(b) and (c).



FIGURE 6.5 Sobel masks. (a) Sub image (b) Sobel mask for horizontal direction (c) Sobel mask for vertical direction

The mask in Figure 6.5(b) is used to compute Gx at the center point of the 3 × 3 region and mask in Figure 6.5(c) is used to compute Gy. The other mask called Prewitts can also be used to compute the gradient Gx and Gy as shown in Figure 6.6.

The following two equations give the computations of Gx and Gy components.




FIGURE 6.6 Prewitt’s masks for horizontal and vertical components. (a) Mask to compute Gx (b) Mask to compute Gy





The simplest possible way to implement the partial derivative at the center of the 3 × 3 mask is to use the Roberts, cross gradient operators.






The gradient image computed using Sobel operators is given in Figure 6.7.

Figure 6.7(a) shows the original image and Figure 6.7(b) shows the result of computing the modulus of Gx. This result gives the horizontal edges, which is perpendicular to the x-axis. Figure 6.7(c) gives the computation of gradient |Gy |, for vertical edges, which is perpendicular to the y-direction. Combining the above two components results in Figure 6.7(d), which is nothing but the gradient image.

6.4.2 Laplacian Operator

The Laplacian of the two-dimensional image f(x, y) is the second-order derivative and is defined as



For a 3 × 3 subimage the digital form equivalent to the Laplacian operator is given as



From equation (6.13) it is possible to define the digital Laplacian mask so that the coefficient associated with the center pixels should be positive and that associated with the outer pixels should be negative. Moreover, the sum of the coefficients should be zero. Such a spatial mask [corresponding to equation (6.13)] is shown in Figure 6.8.



FIGURE 6.7 The gradient images using Sobel operators. (a) Original image (b) Image obtained using gradient Gx (c) Image obtained using Gy (d) Complete gradient image (Gx + Gy)


The Laplacian response is sensitive to noise and is rarely used for edge detection.



FIGURE 6.8 The mask used to compute Laplacian


In practice, the set of pixels detected by the gradient operators do not form a complete boundary due to noise, non-uniform illumination, and other effects. Thus a linking and other boundary detection procedures to assemble the edge pixels into meaningful boundary follow the edge detection algorithm. There are a number of techniques available for this purpose and they are

  1. Local processing
  2. Global processing using Hough transform
  3. Graph theoretic approach
  4. Thresholding
  5. Region growing and
  6. Region splitting and merging.

The fourthcoming section explains the local processing technique used to get the complete boundary of the objects in the given image.

6.5.1 Local Processing

The edge pixels are determined using the gradient operators. If we closely analyze the boundary, constructed by the edge pixels one may come across small openings or gaps. Thus the boundary is not completely defined by the edge pixels. In order to fill the gaps or openings at each pixel, we also consider the characteristics of pixels in a small neighborhood (3 × 3 or 5 × 5). All the neighborhood pixels which are similar to this boundary pixel are linked.

Two important properties used for checking the similarity of the neighborhood pixels with respect to the edge pixels are

  1. The strength of the gradient operator response to produce the edge pixel
  2. The direction of the gradient.

Now let us consider an edge pixel with coordinates (x, y) and a neighborhood pixel at the coordinate (x′, y′). Then the neighborhood pixel (x′, y′) is similar to the edge pixel (x, y) if



where T is a non-negative threshold value. Then neighborhood pixel (x′, y′) with respect to the edge pixel at (x, y) has an angle similar to the edge pixel if






and A is an angle of threshold.

A neighborhood pixel (x′, y′) is linked to the edge pixel (x, y) if both magnitude and direction given in equations (6.14) and (6.15) are satisfied.

This process is repeated for every edge pixel location and a record must be kept for the linked neighborhood pixels. This procedure is applied to the image shown in Figure 6.9(a).



FIGURE 6.9 (a) Input image (b) Gx component (c) Gy component (d) Result of edge linking


Figure 6.9(b) and (c) shows the components of Sobel operators, Figure 6.9(d) shows the result of linking all the points that had a gradient difference less than 25° and the difference of the gradient direction not more than 15°.

6.5.2 Global Processing using Graph Theoretic Approach

In this section, a global approach for edge detection and linking based on representing the edge elements in the form of a graph and searching the graph for the low-cost path that correspond to the significant edges are discussed.

This graph approach performs well in the presence of noise. Before we explain the graph approach in detail, we introduce some basic definitions. A graph can be represented as



where N is a set of non-empty nodes and V is a set of unordered pairs of nodes in N. Each pair (Ni, Nj)of V is called an edge. A graph in which the edges are directed is called a directed graph. If an edge is directed from node Ni to Nj then the node Ni is called a parent of the child node Nj. The process of identifying the successor of a node is called expansion of the node. In a graph, the levels are also defined and root node or the start node is node at level 0. The nodes in the least level are called goal nodes. The cost associated with the edge between two nodes ni and nj is denoted as C(ni, nj). The sequence of nodes n1, n2,…, nk with each node ni being the child of the node ni−1 is called a path from n1 to nk. Then the cost of the path is given in equation (6.18).



An edge element is defined as the boundary between two pixels p and q such that p and q are four neighbors as given in Figure 6.10.



FIGURE 6.10 An edge element


The edge elements are identified by the (x, y) coordinates of p and q. In other words the edge element between p and q is given by the pairs (xp, yp) and (xq, yq), which define the edge sequence of connected edge elements. Now let us consider a 3 × 3 image and apply the concepts discussed so far to detect an edge. The 3 × 3 image is shown in Figure 6.11.

The outer numbers are pixel coordinates, the numbers in the bracket represent gray level values. Each edge element between pixels p and q is associated with the cost, as given in equation (6.19).




FIGURE 6.11 A 3 × 3 image


where H is the highest gray level in the image and f(p) and f(q) are the gray level values of pixels p and q. For the image 3 × 3, shown in Figure 6.11 the graph is drawn and depicted in Figure 6.12. Each node in this graph corresponds to an edge element. There exists an arc between two nodes if the corresponding edge elements are considered in succession. The cost of each edge element is computed and shown against the edge leading into it and the goal nodes are shown as shaded rectangles. Let us assume that the edge starts in the top row and terminates in the last row. Therefore, the possible start edge elements are [(0,0), (0,1)] and [(0,1), (0, 2)].

The terminating edge elements are [(2,0), (2,1)] or [(2,1), (2,2)]. The minimum cost path computed is indicated in the graph by means of a thick line. The edge with minimum cost is shown in Figure 6.12(b). In Figure 6.12(b), the node denoted by the pair of pixels (0, 0) and (0, 1) has a cost 7, denoted on the arc terminating on it. The cost is computed using equation (6.19). The image has a maximum gray level of 6 and pixels under consideration have gray level values of 1 and 2. Therefore, the cost is computed as



Similar computations are used for all the pairs of pixels and the final graph is drawn as in Figure 6.12.

In general, finding a minimum cost path is not trivial and in order to reduce the search time an effective heuristic procedure is used. The steps involved in the heuristic procedure are explained as follows.

Let S be the start node and pass through an intermediate node n to reach the goal node. Let R(n) be the estimate of the cost of the minimum cost path from start node to goal node. Then R(n) can be given by the expression




  G(n) is the cost of the lowest cost path from S to n found so far.

  H(n) is obtained by using available heuristic information.

  H(n) is revised as and when we move from one node to another node on the basis of minimum cost path.



FIGURE 6.12 (a) Graph for finding an edge and the minimum cost path is as ABC (4 + 4 + 6 = 14) (b) Minimum cost edge


In this section, we discuss segmentation technique that finds the region of interest directly. This technique uses the neighborhood pixel properties. The basic rules used in this approach are explained in Section 6.6.1.

6.6.1 Basic Rules for Segmentation

Let R be the entire image region. Then by using the segmentation algorithm the image region R is subdivided into ‘n’ subregions R1, R2,…, Rn such that

  1. . This expression indicates that the segmentation process is complete in all respects.
  2. Ri is a connected region for i = 1, 2,…, n. This condition indicates that the points in the Ri must be connected.
  3. RiRj = Φ for all i and j, where i ≠ j. This condition indicates that region must be disjoint.
  4. P(Ri) = True for i = 1,2,…, n. This means all the pixels in the region Ri have the same intensity.
  5. P(Ri U Ri) = False for i ≠ j. This condition indicates that the regions Ri and Rj are different in the sense of the predicate P.

6.6.2 Region Growing by Pixel Aggregation

Region growing is a technique that groups the pixels or subregions into larger regions. The simplest of these approaches is pixel aggregation. In this approach a set of seed points are used as starting points and from this, regions grow by appending to each seed point those neighboring pixels that have similar properties.


Pixel Aggregation: A procedure used to group pixels with similar gray level properites in an image or a region in the image.


To illustrate this procedure, consider a small region of image 5 × 5 shown in Figure (6.13), where the gray levels are indicated by the numbers inside the cells. Let the pixel 2 and 7 with the co-ordinate (4, 2) and (3, 4), respectively be the two seeds. Using the two seeds as starting point, the regions are grown by applying the predicate property P. The predicate P to be used to include a pixel in either region is that the absolute difference between the gray level of that pixel and the gray level of the seed be less than a threshold value T.

Any pixel that satisfies this property simultaneously for both seeds is arbitrarily assigned to region R1.

Figure 6.13(b) shows the result obtained using the threshold value T = 3. The segmentation results show two regions namely, R1 and R2 and they are denoted by a′s and b′s. It is also noted that the two regions formed are independent of the starting point. However, if we choose T = 8, the resulting region is shown in Figure 6.13(c).

The pixel aggregation technique used to grow the region is simple and suffers from two difficulties. The first one is the selection of initial seeds that properly represent the region of interest and the second is the selection of suitable properties for including points in the various regions during the region growing processes. The number of seed points selected can be based on the nature of the problem. Figure 6.14(a) shows an image with a single seed point. The threshold used for the region growing is the absolute difference in the gray level between the seed and a candidate point, not exceeding 10% of the difference between maximum and minimum gray level in the entire image.



FIGURE 6.13 (a) A subimage with co-ordinates and gray values (b) Region growing with T = 3 (c) Region growing with T = 8


The property used to add a pixel into the region is an 8-connected one. Figure 6.14(b) shows the region in the early stages of region growing. Figure 6.14(c) shows the region in an intermediate stage of the region growing and Figure 6.14(d) shows the complete region grown by using this technique.

6.6.3 Region Splitting and Merging

In this technique an image is initially divided into a set of arbitrary subimages of disjoint regions, and then merge and/or split operations are carried out based on certain criteria. The split and merge algorithm is explained as follows.

Let R represent the entire image region and a predicate P(Ri) is used to check the conditions. For any region Ri for i = 1 to 4, P(Ri) = TRUE is applied. For any image of region R, subdivide the image into smaller and smaller regions so that for any region Ri, P(Ri) = FALSE. This means if P(Ri) = FALSE, divide the image into quadrants. If P(Ri) is false for any quadrant, subdivide that quadrant into subquadrant, and so on. This process can be conveniently represented by means of a quad tree shown in Figure 6.15(b). For a square region [shown in Figure 6.15(a)] the splitting and merging procedure is applied and the result is given by means of a quad tree as shown in Figure 6.15(b).



FIGURE 6.14 (a) An image with seed point (given as dark point) (b) Region growing after few iterations, (c) Intermediate stage of region growing (d) Final growth


In this approach splitting must be followed by merging operation.



FIGURE 6.15(a) An image subdivided into quadrants



FIGURE 6.15(b) Quad tree


The procedure used can be given by the following three steps:

  1. Split the given region Ri into four disjointed quadrants when P(Ri) = False
  2. Merge any adjacent regions Rj and Rk for which P(Rj U Rk) = True
  3. Stop when no further merging or splitting is possible.

The split and merge algorithm is illustrated with the sample image shown in Figure 6.16(a).

For simplicity, a dark object in a white background is considered as shown in Figure 6.16(a). Then for the entire image region R, P(R) = false and hence the image is split into four equal regions as shown in Figure 6.16(b).

In the second step all the four regions do not satisfy the predicate P(Ri) where Ri is one of the quadrant. Hence each quadrant is further subdivided into four smaller regions as depicted in Figure 6.16(c).



FIGURE 6.16 The steps of split and merge algorithm


At this point several regions can be merged with the exception of two subquadrants that include some points of the object. In Figure 6.16(c) the regions R11, R12, R21, R22, R23, R33,…, R14 satisfy the predicate P(Ri) = TRUE and all of them are merged to form a larger region whereas the regions R13 and R42 do not satisfy the predicate P(Ri). These two subquadrants do not satisfy the predicate and hence they split further as in Figure 6.16(d). At this point all the regions satisfy the predicate P and merging the appropriate region from the last split operation gives the final segmented region as in Figure 6.16(e).


Thresholding: Thresholding is an operation in which a reference gray level is selected using histogram such that the object and background can be separated.


Thresholding is one of the most important techniques used for image segmentation. In this section we discuss the various thresholding techniques for image segmentation. The merits and demerits of various techniques are also discussed.

6.7.1 Fundamental Concepts

The histogram of an image f(x, y) consisting of a light object on the dark background is shown in Figure 6.17(a). This histogram consists of two dominant regions, one for the object and the other for the background. For such an image it is easy to select a threshold T that separates the object and background region. Then for any point (x, y), f(x, y) > T is called an object point, otherwise the point is called a background point. A general case of this approach is shown in Figure 6.17(b).



FIGURE 6.17 (a) Histogram of an image consisting of dark background and a light object (b) Histogram for two objects in a dark background


Figure 6.17(b) has three dominant regions that characterize the histogram of the given image. This histogram corresponds to two different light objects on a dark background. From the histogram it is possible to select two different threshold values T1 and T2, respectively. Then a point (x, y) belongs to the first object if T 1 < f(x, y)≤ T2, or it belongs to the second object if f(x, y) > T2 and to the background if f(x, y) ≤ T1. Usually this kind of thresholding is called multilevel thresholding and is less reliable than its single threshold counterpart.

The reason for this is, that it is difficult to locate multiple thresholds in a given histogram for a real image. The thresholding technique can be put into three different types based on the function T and its associated parameters as given in equation (6.21).



where f(x, y) is the gray level at the point (x, y) and p(x, y) denotes some local property at that point (e.g. the average gray level of neighborhood center on (x, y)). The threshold image g(x, y) is given in equation (6.22).



In the thresholded image the pixel labeled 1 corresponds to the object, whereas pixels labeled 0 corresponds to the background. When the threshold value T depends only on f(x, y) the threshold technique is called global. If T depends on both f(x, y) and p(x, y), the threshold is called local. If T depends on all the three parameters, that is, the coordinates (x, y), local property p(x, y), and f(x, y) then the threshold is called dynamic.

6.7.2 Optimal Thresholding

Let us assume that an image has only two principal brightness regions. Let p(z) be the probability density function (histogram) of the gray level values in the image. The overall density function p(z) is the sum of two densities, one for the light and the other for the dark regions in the image. Further, the mixture parameters are proportional to the area of the picture of each brightness. It is possible to determine the optimal threshold for segmenting the image into two brightness regions if the form of the density function is known. Suppose an image contains two brightness values, the overall density function can be given by the equation (6.23).



In the case of Gaussian distribution the P(z) can be given as



where μ1 and μ2 are the mean values of the two brightness levels, σ1and σ2 are the standard deviations of the means. P1 and P2 are a priori probabilities of the two gray levels. The constraint P1 + P2 = 1 must be satisfied so that these are the only five unknown parameters.

Suppose the dark region corresponds to the background and the bright region corresponds to an object, then a threshold T may be defined so that all the pixels with gray level below T are considered as background points and all pixels with gray levels above T are considered as object points.

The probability of error in classifying an object point as background point is given by



Similarly, the probability of error in classifying the background point as an object point is given by



Therefore, the overall probability of error is given by



To find a threshold value for which the error is minimum, requires differentiating E(T) with respect to T and equating the result to zero.




Applying the above result to Gaussian density, taking logarithm and simplifying gives the quadratic equation






If the variances are equal σ2 = σ12 = σ22 a single threshold is sufficient.



If the power probabilities are equal P1 = P2,the nth optimal threshold is the average of the means, that is,



Thus the optimum threshold is obtained using equation (6.30).

6.7.3 Threshold Selection Based on Boundary Characteristics

The threshold selection depends on the mode peaks in a given histogram. The chances of selecting a good threshold should be considerably enhanced if the histogram peaks are tall, narrow, symmetrical, and separated by deep valleys. One way of improving the histogram is by considering only those pixels that lie on or near the boundary between the objects and background. By doing so, the appearance of the histogram will be less dependent on the relative sizes of the object and background. This will inturn reduce ambiguity and lead to improvement in threshold selection.

For example, consider an image of large background area of constant intensity with a small object then the histogram of such an image will be a large peak for background region and small peak for object. On the other hand, if we consider only the pixels on or near the boundary between the object and background, the resulting histogram will contain peaks of approximately the same height. In addition, the probability that a pixel lies on an object will be approximately equal to that on the background, hence improving the symmetry of the histogram peaks. Finally, selecting the pixels that satisfy the gradient and Laplacian operators will give rise to histogram peaks with deep valley between the peaks.

A three level image is formed using the gradient at any point (x, y) in the image and the Laplacian at the image at the same point. The three level image is denoted by U(x, y) and is given in equation (6.31).



where the symbols 0,+, and − represent any three distinct gray levels, T is the threshold and the gradient and Laplacian are computed at every point in the image f(x, y).

For an image with a dark object on a light background, the meaning for the labels 0, +, − can be given as follows:

In the three level image U(x, y) the label ‘0’represents all the pixels that are not on the edge, the label ‘+’represents all the pixels on the dark side of an edge and the label ‘−’represents all the pixels on the light side of an edge. From the three level image it is possible to generate a segmented binary image in which 1′s correspond to the object of interest and 0′s correspond to the background. The transition from light background to dark object is represented by the occurrence of the label ‘−’ followed by the label ‘+’ in the image U(x, y). The interior of the object consists of the labels either ‘0’ or ‘+’. The transition from the object back to the background is represented by the occurrence of the label ‘+’ followed by the label ‘−’. Thus when we scan the image either in the horizontal direction or vertical direction, the string of labels will be as follows:



where (…) represents any combination of ‘+’, ‘−’, and ‘0’. The innermost parenthesis ‘0’ or ‘+’ corresponds to the object points and they are labeled as 1. The remaining pixels along the scan line are labeled 0. A sample image for a blank cheque is shown in Figure 6.18(a).

Figure 6.18(b) shows the histogram as a function of gradient values for pixels with gradients greater than 8. This histogram has two dominant modes that are symmetrical nearly of the same height and are separated by a distinct valley.

Figure 6.18(c) gives the segmented image obtained by using equation (6.31) with T at or near the midpoint of the valley (T = 19).

6.7.4 Use of Motion in Segmentation

In many of the applications such as, robotic application, autonomous motion navigation, and dynamic scene analysis, motion is used as a powerful tool to extract objects of interest from a background of unwanted detail. The motion arises due to the relative displacement between the sensing system and scene being used. There are two different approaches to apply the motion in segmentation and they are

  1. Spatial domain approach
  2. Frequency domain approach.

Let us consider the spatial domain techniques in the forthcoming section.

Spatial domain techniques   Let us consider that there are a number of image frames available, taken at different instances of time. Let f(x, y, t1) and f(x, y, t2) be the two images taken at time t1 and t2, respectively. Also assume that there are many objects in the image and only one object in motion. Under such circumstances, it is possible to detect the boundary of the object in motion. The procedure to obtain the boundary of the object in motion is to find the difference image between the images taken at different instances of time. The image taken at 0th interval can be used as a reference image and the images taken at subsequent intervals can be used to subtract from the reference image, so that the resulting image contains only the boundary of the moving object and cancels all the stationary objects.



FIGURE 6.18 (a) Image of a cheque (b)Histogram of the image (c) Segmented image


The difference image between two images taken at time t1 and t2 may be defined as



Figure 6.19(a) shows a reference image having only one moving object at uniform velocity with white background. Figure 6.19(b) shows the image taken at time t1. Figure 6.19(C) shows the difference image between the reference and the image taken at time t1. This procedure will give rise to expected results only if the two images mentioned earlier are taken at relatively constant illumination. In Figure 6.19(c), ‘1’ corresponds to the boundary of the moving object. Note that the two disjoint regions, shown in the Figure 6.19(C), are the leading edge and trailing edge of the moving object.



FIGURE 6.19 (a) Original image with a moving object (b) Image taken at time t1 (c) Difference image


The difference image often contains isolated entries due to spurious noise. These entries can be eliminated by using 4 or 8 connectivity analysis. There is an alternative approach to eliminate spurious noise entries by considering changes at location over several frames and using a memory into the process.

Consider a sequence of image frames taken at times t1, t2, t3,…, tn and represented as f(x, y, t1), f(x, y, t2),…, f(x, y, tn). Let the first image frame f(x, y, t1) be the reference image. An accumulative difference image is obtained by comparing the reference image in the sequence. A count for each pixel location in the accumulative image is incremented every time a difference occurs at that pixel location between the reference and the image in the sequence. Thus when the kth frame is being compared with the reference, the entry in the given pixel of the accumulative image gives number of times the gray level at that position was different from the corresponding pixel value in the reference image. The differences are computed using the equation (6.32). These concepts are illustrated in Figure 6.20.



FIGURE 6.20 (a) Reference image (b)−(e) Frames taken at 2, 3, 4 and 5th instant of time (f)−(i) Accumulative differences of image for frames 2, 3, 4, and 5


Figure 6.20(a) shows the reference image and frames (b) and (c) are the frames corresponding to second and third in the sequence, and (e) corresponds to the 5th frame in the sequence. Figures 6.20(f)(i) are the corresponding accumulative images and are explained here.

In Figure 6.20(f) the left hand columns 1′s gives the result of difference between the object in Figure 6.20(a) and the background in Figure 6.20(b). The right hand column of 1′s is due to the background in the reference image and the leading edge of the moving object. At the end of the third frame shown in Figure 6.20(d), the first nonzero column of the difference between that column in the reference image and the corresponding column in the subsequent image is shown in Figure 6.20(h). A similar explanation can be given to other entries.

Often we use three types of accumulative images and they are,

  1. Absolute Accumulative Difference Image (AADI)
  2. Positive Accumulative Difference Image (PADI) and
  3. Negative Accumulative Difference Image (NADI).

The positive and negative quantities are obtained using equation (6.33).



where T is the threshold. D1,2(x, y) is the difference between the images f(x, y, t1) and f(x, y, t2) and they are the images taken at the timings t1 and t2, respectively.

If the gray levels of the given object are numerically greater than the background then the difference image is positive and is compared against a positive threshold, and if it is negative the difference is compared against a negative threshold. This definition is reversed, if the gray level of the objects are less than the background.

Figure 6.21(a)–(c) shows the AADI, PADI, and NADI for 5 × 5 pixel object intensity of which is greater than the background and moves with a constant velocity in the southeastern direction. At each interval of time the object moves one step right and one step down. The spatial growth of PADI is stopped when the object is displayed from its original position. The AADI contains the region of both PADI and NADI and the entries in these images indicate the speed and direction of movement of the object.



FIGURE 6.21 Different difference images. (a) Absolute accumulative difference image (b) Positive accumulative difference image (c) Negative accumulative difference image



The image processing techniques discussed in the previous chapter use the input and output as images. But in techniques discussed in this chapter, the inputs are images and the outputs are attributes extracted from these images. In this chapter, segmentation based on the two basic properties such as gray level discontinuity and similarity are discussed. It begins with the discussion of methods detecting gray level discontinuities such as isolated points, lines, and edges. These detection algorithms form the basic tool for image segmentation. The two-dimensional gradient operator plays a vital role in detecting the edges in the images. The digital form of masks such as Sobel and Prewitts are well explained with suitable illustrative diagrams. The various edge points must also be linked in order to form a smooth and complete edge. One of the procedures to link the edge points is graph theoretic approach which is dealt in detail. The gray level aggregation and region splitting and merging are the two important segmentation techniques that use gray level similarities and concepts, these are explained with suitable diagrams. Image segmentation is an essential step in pattern recognition applications. It is also used in scene analysis problems. The motion-based segmentation techniques are also given in this chapter.

The choice of one segmentation technique over another is characterized by the type of problem being considered. This can be readily seen from the range of examples presented. Although the level of the presentation is introductory in nature, the depth of the material covered is sufficient to serve as the basis for independent reading.

Review Questions

Short Type Questions

  1. How will you detect isolated points in an image?
  2. Give the masks used for detecting horizontal, vertical, and ±45° slanting lines.
  3. What is the role of gradient operator and Laplacian operator in segmentation?
  4. Distinguish between global, local, and dynamic thresholding.
  5. How will you obtain large symmetrical peaks of equal heights with deep valley in between in deciding threshold value for segmenting the image?
  6. Why do we need edge linking procedure?
  7. State the basic rules used for region-oriented segmentation.
  8. How will you select the seed points for region growing?
  9. What are the rules used in region splitting and merging approach?

    Descriptive Type Questions


  10. Explain with suitable illustrations, the procedure used for line detection.
  11. Describe the edge linking and boundary detection procedure in detail.
  12. Sketch and explain the gradient of the image shown in Figure 6.22.



    FIGURE 6.22


  13. Use the graph theoretic approach to find the edge corresponding to the minimum cost path for the subimage shown in Figure 6.23.



    FIGURE 6.23


  14. Explain the thresholding approach of segmenting of an image.
  15. Explain in detail the basic principle used in region growing by pixel aggregation with an illustrative example.
  16. How will you use the region splitting and merging concepts for segmenting the given image.
  17. Apply the region splitting and merging procedure to segment the object given in the Figure 6.24.



    FIGURE 6.24


  18. With suitable illustrative examples explain how motion can be used to segment an image.
  19. What are the types of accumulative difference images? Explain in detail how they can be obtained.