Question:-1(a)
Define and explain the Big-O, Big-Ω and Big-Θ notations with examples.
Answer:
Asymptotic Analysis
All three notations are tools for asymptotic analysis. This means we analyze how an algorithm’s runtime or space requirements grow as the input size (n) grows towards infinity. We don’t care about the exact number of milliseconds; we care about the rate of growth.
Imagine sorting a list:
For 10 items, Algorithm A might take 10 ms and Algorithm B might take 20 ms.
For 1000 items, Algorithm A might take 1000 ms, but Algorithm B might take only 500 ms.
Asymptotic analysis helps us see that Algorithm B scales better, even if it was slower for a very small input.
1. Big-O Notation (Upper Bound / "Worst Case")
Definition: Big-O gives an upper bound on the growth of a function. It is commonly used to describe the worst-case scenario for an algorithm’s time complexity.
Formally, we say a function f(n) is O(g(n)) if there exist positive constants c and n₀ such that: 0 <= f(n) <= c * g(n) for all n >= n₀.
In plain English: "The runtime of the algorithm grows no faster than g(n) for large n."
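As a quick numerical sanity check of the definition, take a hypothetical running-time function f(n) = 3n + 10 (chosen here purely for illustration). It is O(n): the constants c = 4 and n₀ = 10 make the inequality hold for every larger n.

```python
def f(n):
    # Hypothetical running-time function: f(n) = 3n + 10
    return 3 * n + 10

c, n0 = 4, 10  # witness constants for the claim f(n) = O(n)

# Verify 0 <= f(n) <= c * n for a large range of n >= n0
assert all(0 <= f(n) <= c * n for n in range(n0, 10_000))
print("f(n) = 3n + 10 is O(n) with c = 4, n0 = 10")
```

The check only samples a finite range, of course; the formal claim follows because 3n + 10 <= 4n whenever n >= 10.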
Example: Sequential Search
Imagine a function that searches for a value in an unsorted list by checking each element one by one.
Best case: The item is first (O(1)).
Worst case: The item is last or not present (O(n)).
We describe this algorithm as O(n). Even though it’s sometimes faster, we are certain it will never be slower than linear time. It’s an upper bound.
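A minimal sketch of such a sequential search (the function name sequential_search is ours, not from any library):

```python
def sequential_search(items, target):
    # Check each element one by one; worst case scans the whole list -> O(n)
    for i, value in enumerate(items):
        if value == target:
            return i   # best case: found at the first position, O(1)
    return -1          # not present: all n comparisons were made, O(n)

data = [7, 3, 9, 1, 4]
print(sequential_search(data, 7))   # 0  (first element: best case)
print(sequential_search(data, 4))   # 4  (last element: worst case)
print(sequential_search(data, 8))   # -1 (absent: worst case)
```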
Common Big-O Complexities (from best to worst):
O(1): Constant Time (e.g., accessing an array element by index)
O(log n): Logarithmic Time (e.g., binary search)
O(n): Linear Time (e.g., simple loop, sequential search)
O(n log n): Linearithmic Time (e.g., efficient sorting like Merge Sort, QuickSort)
O(n²): Quadratic Time (e.g., nested loops, Bubble Sort)
O(2ⁿ): Exponential Time (e.g., naive recursive Fibonacci, brute-force algorithms)
2. Big-Ω Notation (Lower Bound / "Best Case")
Definition: Big-Ω gives a lower bound on the growth of a function.
Formally, f(n) is Ω(g(n)) if there exist positive constants c and n₀ such that: 0 <= c * g(n) <= f(n) for all n >= n₀.
In plain English: "The runtime of the algorithm grows at least as fast as g(n) for large n."
Example: Sequential search is Ω(1), because in the best case the item is found at the very first position.
3. Big-Θ Notation (Tight Bound)
Definition: Big-Θ gives a tight bound: f(n) is Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)).
Formally, there exist positive constants c₁, c₂ and n₀ such that: 0 <= c₁ * g(n) <= f(n) <= c₂ * g(n) for all n >= n₀.
In plain English: "The runtime of the algorithm grows exactly at the rate of g(n) for large n."
Example: A loop that sums all elements of an array always touches every element, so it runs in Θ(n): it is never faster and never slower than linear.
Question:-1(b)
Explain the string matching problem with an example.
Answer:
The String Matching Problem
Definition:
The String Matching Problem is the task of finding all occurrences of a shorter string P (called the pattern) within a longer string T (called the text).
Input:
A Text T of length n (e.g., n = 20 characters)
A Pattern P of length m (e.g., m = 5 characters), where m <= n.
Output:
All starting positions (indices) i in the text T such that the substring T[i ... i+m-1] is exactly equal to the pattern P.
A Concrete Example
Let’s use a simple example to illustrate the problem.
Text T: "ABABDABACDABABCABAB" (Let’s say n = 19)
Pattern P: "ABABC" (m = 5)
Our goal is to find every position in T where the pattern "ABABC" begins.
Step-by-Step Manual Matching (The Naive Approach):
We will slide the pattern over the text and check for a match at each possible starting position (from index 0 to index n - m, which is 14).
Try position i = 0:
T[0:4] = "ABABD"
Compare with P = "ABABC"
‘A’==‘A’, ‘B’==‘B’, ‘A’==‘A’, ‘B’==‘B’ -> ‘D’ != ‘C’ ❌ Mismatch (only the last character differs)
Try position i = 1:
T[1:5] = "BABDA"
Compare with P = "ABABC"
‘B’ != ‘A’ ❌ Mismatch (We can stop on the first character)
Try position i = 2:
T[2:6] = "ABDAB"
Compare with P = "ABABC"
‘A’==‘A’, ‘B’==‘B’ -> ‘D’ != ‘A’ ❌ Mismatch
Try position i = 3:
T[3:7] = "BDABA"
Compare with P = "ABABC"
‘B’ != ‘A’ ❌ Mismatch
Try position i = 4:
T[4:8] = "DABAC"
Compare with P = "ABABC"
‘D’ != ‘A’ ❌ Mismatch
Try position i = 5:
T[5:9] = "ABACD"
Compare with P = "ABABC"
‘A’==‘A’, ‘B’==‘B’, ‘A’==‘A’ -> ‘C’ != ‘B’ ❌ Mismatch
Try position i = 6:
T[6:10] = "BACDA"
Compare with P = "ABABC"
‘B’ != ‘A’ ❌ Mismatch
Try position i = 7:
T[7:11] = "ACDAB"
Compare with P = "ABABC"
‘A’==‘A’ -> ‘C’ != ‘B’ ❌ Mismatch
Try position i = 8:
T[8:12] = "CDABA"
Compare with P = "ABABC"
‘C’ != ‘A’ ❌ Mismatch
Try position i = 9:
T[9:13] = "DABAB"
Compare with P = "ABABC"
‘D’ != ‘A’ ❌ Mismatch
Try position i = 10:
T[10:14] = "ABABC"
Compare with P = "ABABC"
‘A’==‘A’, ‘B’==‘B’, ‘A’==‘A’, ‘B’==‘B’, ‘C’==‘C’ ✅ Match!
Output the starting index: 10
Try position i = 11:
T[11:15] = "BABCA"
Compare with P = "ABABC"
‘B’ != ‘A’ ❌ Mismatch
…and so on until i = 14. No other matches are found.
Final Result: The pattern "ABABC" was found starting at index 10 of the text.
Visualization of the Match:
Text T: A B A B D A B A C D A B A B C A B A B
Indices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Pattern P: A B A B C
At index 10: A B A B C <-- Perfect Match
Algorithms for String Matching
The naive approach we used in the example (checking every single position) is simple but inefficient for long texts, with a worst-case time complexity of O(n*m).
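The slide-and-compare procedure from the walkthrough can be sketched in a few lines (naive_match is an illustrative name of our own, not a standard function):

```python
def naive_match(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):           # every possible starting position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                        # advance while characters agree
        if j == m:                        # the whole pattern matched at i
            matches.append(i)
    return matches

print(naive_match("ABABDABACDABABCABAB", "ABABC"))  # [10]
```

In the worst case (e.g., text "AAAA...A", pattern "AA...AB") the inner loop runs m times at almost every position, giving the O(n*m) bound.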
Many more efficient algorithms have been developed to solve this problem faster:
Knuth-Morris-Pratt (KMP) Algorithm: Preprocesses the pattern to avoid unnecessary comparisons. Complexity: O(n + m).
Rabin-Karp Algorithm: Uses hashing to quickly check for potential matches. Complexity: Average O(n + m), worst-case O(n*m).
Boyer-Moore Algorithm: Checks from the end of the pattern and uses smart shifting rules. Often works in sub-linear time (O(n/m) in the best case).
Aho-Corasick Algorithm: Used for matching multiple patterns simultaneously (e.g., checking a document against a whole dictionary of words).
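As one illustration of these faster methods, here is a compact sketch of KMP: precompute a failure (prefix) table for the pattern, then scan the text once, never re-examining text characters after a mismatch. This is our own implementation, not taken from a particular library.

```python
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Single left-to-right scan of the text, reusing fail[] on mismatch
    matches = []
    k = 0
    for i in range(n):
        while k > 0 and text[i] != pattern[k]:
            k = fail[k - 1]
        if text[i] == pattern[k]:
            k += 1
        if k == m:                    # full match ending at position i
            matches.append(i - m + 1)
            k = fail[k - 1]           # continue searching for overlaps
    return matches

print(kmp_search("ABABDABACDABABCABAB", "ABABC"))  # [10]
```

Both loops advance k at most once per character and only shrink it via the table, which is why the total work is O(n + m).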
Question:-1(c)
Explain the Longest Common Subsequence problem with an example.
Answer:
Longest Common Subsequence (LCS) Problem
The Longest Common Subsequence (LCS) problem is the task of finding the longest subsequence common to two given sequences. A subsequence is formed by deleting some or no elements from a sequence without altering the order of the remaining elements; its elements appear in the same relative order as in the original sequence, but not necessarily consecutively. Equivalently, the indices of the chosen elements must be strictly increasing in both sequences.
Definition
Given two sequences S1 and S2, a sequence Z is a common subsequence if it is a subsequence of both S1 and S2. The LCS is the longest such sequence. For example, if:
S1 = {B, C, D, A, A, C, D}
S2 = {A, C, D, B, A, C}
A common subsequence must maintain the order of elements as they appear in both sequences. For instance, {A, D, B} is not a valid subsequence of S1 because the order (A, then D, then B) does not occur in S1.
Example
Consider:
S1 = {B, C, D, A, A, C, D}
S2 = {A, C, D, B, A, C}
Some common subsequences include:
{B, C}
{C, D}
{A, C}
{C, D, A, C}
{A, A, C}
Among these, {C, D, A, C} is the longest common subsequence, with a length of 4.
Using Dynamic Programming to Find the LCS
To efficiently compute the LCS, we use a dynamic programming (DP) approach, which avoids the exponential complexity of a recursive solution by storing intermediate results in a table.
Steps:
Create a DP Table: Initialize a table LCS[m+1][n+1], where m and n are the lengths of S1 and S2, respectively. The first row and column are filled with zeros, representing empty sequences.
Fill the Table:
For each cell LCS[i][j], compare the characters S1[i-1] and S2[j-1].
If they match (S1[i-1] == S2[j-1]), set LCS[i][j] = LCS[i-1][j-1] + 1.
If they differ, set LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1]).
Retrieve the LCS Length: The value in LCS[m][n] gives the length of the LCS.
Backtrack to Find the Sequence: Starting from LCS[m][n], trace back through the table to construct the LCS:
If S1[i-1] == S2[j-1], include the character in the LCS and move diagonally up-left (to i-1, j-1).
Otherwise, move in the direction of the larger value (up or left).
Example DP Table for S1 = {B, C, D, A, A, C, D}, S2 = {A, C, D, B, A, C} (rows are prefixes of S1, columns are prefixes of S2):

        ε   A   C   D   B   A   C
    ε   0   0   0   0   0   0   0
    B   0   0   0   0   1   1   1
    C   0   0   1   1   1   1   2
    D   0   0   1   2   2   2   2
    A   0   1   1   2   2   3   3
    A   0   1   1   2   2   3   3
    C   0   1   2   2   2   3   4
    D   0   1   2   3   3   3   4
Explanation:
For LCS[2][2] (C vs. C): Match, so LCS[2][2] = LCS[1][1] + 1 = 1.
For LCS[2][1] (C vs. A): No match, so LCS[2][1] = max(LCS[1][1], LCS[2][0]) = 0.
For LCS[6][6] (C vs. C): Match, so LCS[6][6] = LCS[5][5] + 1 = 4.
The value LCS[7][6] = 4 indicates the LCS length is 4.
Backtracking:
Starting from LCS[7][6]:
At (7, 6), D != C; LCS[6][6] = 4 > LCS[7][5] = 3, so move up to (6, 6).
At (6, 6), C == C, include C, move to (5, 5).
At (5, 5), A == A, include A, move to (4, 4).
At (4, 4), A != B; the two neighbors tie (both 2), so move left to (4, 3).
At (4, 3), A != D; LCS[3][3] = 2 > LCS[4][2] = 1, so move up to (3, 3).
At (3, 3), D == D, include D, move to (2, 2).
At (2, 2), C == C, include C, move to (1, 1).
At (1, 1), B != A and both neighbors are 0, so we stop.
Reading the included characters in reverse order of collection gives the LCS {C, D, A, C}.
Why Dynamic Programming?
A naive recursive approach to LCS has exponential time complexity, on the order of O(2^max(m, n)), due to redundant computations of the same subproblems. Dynamic programming stores results in a table, reducing the time complexity to O(mn), where m and n are the lengths of the sequences. The space complexity is also O(mn), though it can be optimized to O(min(m, n)) when only the length of the LCS is needed.
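The O(min(m, n)) space optimization keeps only two rows of the table at a time; it yields the LCS length but not the sequence itself. A sketch (lcs_length is an illustrative name of our own):

```python
def lcs_length(S1, S2):
    # Iterate over the shorter string in the inner loop so that
    # each row has min(m, n) + 1 entries
    if len(S2) > len(S1):
        S1, S2 = S2, S1
    prev = [0] * (len(S2) + 1)        # row i-1 of the full DP table
    for i in range(1, len(S1) + 1):
        curr = [0] * (len(S2) + 1)    # row i, built from prev
        for j in range(1, len(S2) + 1):
            if S1[i - 1] == S2[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(S2)]

print(lcs_length("BCDAACD", "ACDBAC"))  # 4
```

Reconstructing the sequence itself in reduced space is also possible (e.g., Hirschberg's divide-and-conquer scheme), but the two-row trick alone only answers "how long?".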
Algorithm
Input: Sequences S1 and S2 of lengths m and n
Initialize LCS[m+1][n+1] with zeros
For i from 1 to m:
    For j from 1 to n:
        If S1[i-1] == S2[j-1]:
            LCS[i][j] = LCS[i-1][j-1] + 1
        Else:
            LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1])
LCS[m][n] is the length of the LCS
Backtrack to find the sequence
Code Example (Python)
def lcs_algo(S1, S2, m, n):
    # L[i][j] holds the LCS length of the prefixes S1[:i] and S2[:j]
    L = [[0 for _ in range(n + 1)] for _ in range(m + 1)]

    # Build the DP table
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                L[i][j] = 0
            elif S1[i - 1] == S2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Backtrack to find the LCS
    index = L[m][n]
    lcs = [""] * index
    i, j = m, n
    while i > 0 and j > 0:
        if S1[i - 1] == S2[j - 1]:
            lcs[index - 1] = S1[i - 1]
            i -= 1
            j -= 1
            index -= 1
        elif L[i - 1][j] > L[i][j - 1]:
            i -= 1
        else:
            j -= 1

    print(f"S1 : {S1}\nS2 : {S2}")
    print(f"LCS: {''.join(lcs)}")

# Test (using the sequences from the worked example above)
S1 = "BCDAACD"
S2 = "ACDBAC"
m = len(S1)
n = len(S2)
lcs_algo(S1, S2, m, n)
Output:
S1 : BCDAACD
S2 : ACDBAC
LCS: CDAC
This code computes the LCS efficiently and outputs both the sequences and the result. The dynamic programming approach ensures optimal performance for practical applications like text comparison or bioinformatics.