OOD
Linux Find Command
Basic
Q: Design a Unix File Search API that allows users to search for files based on various criteria, such as:
- File Extension: Search for files with specific extensions (e.g., .txt, .jpg).
- File Size: Search for files within a specified size range (e.g., greater than 1MB, less than 100KB).
- File Name: Search for files that match a specific name or pattern.
The design should be flexible and maintainable, allowing for easy addition of new search criteria in the future (e.g., by date modified, by owner). Additionally, the API should support combining search criteria using logical operators, such as AND and OR, to enable complex queries.
A: We can use the Specification Pattern combined with the Builder Pattern. This design will ensure that the API is maintainable and extensible for future requirements.
Code Template
Abstract Class And Subclasses
- The abstract class FileSpec defines the interface for all specifications. It contains a method, is_satisfied_by, that checks whether a given file meets the specification.
- Subclasses inherit from FileSpec and implement the is_satisfied_by method.
Two Specifications
- The AndSpec class combines multiple specifications with a logical AND. A file satisfies it only if it satisfies every provided specification.
- The OrSpec class combines multiple specifications with a logical OR. A file satisfies it if it satisfies at least one of the provided specifications.
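The two composite specifications described above could be sketched roughly as follows. This is a minimal illustration rather than the original template: the File dataclass and the MinSizeSpec helper are assumptions introduced only so the example runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    # Hypothetical file record; the source does not define File
    name: str
    extension: str
    size: int  # bytes


class FileSpec(ABC):
    @abstractmethod
    def is_satisfied_by(self, file: File) -> bool:
        """Return True if the file meets this specification."""


class ExtensionSpec(FileSpec):
    def __init__(self, extension: str):
        self.extension = extension

    def is_satisfied_by(self, file: File) -> bool:
        return file.extension == self.extension


class MinSizeSpec(FileSpec):
    # Illustrative helper (not in the original): size strictly above a bound
    def __init__(self, min_size: int):
        self.min_size = min_size

    def is_satisfied_by(self, file: File) -> bool:
        return file.size > self.min_size


class AndSpec(FileSpec):
    def __init__(self, *specs: FileSpec):
        self.specs = specs

    def is_satisfied_by(self, file: File) -> bool:
        # all() stops at the first specification that fails
        return all(s.is_satisfied_by(file) for s in self.specs)


class OrSpec(FileSpec):
    def __init__(self, *specs: FileSpec):
        self.specs = specs

    def is_satisfied_by(self, file: File) -> bool:
        # any() stops at the first specification that passes
        return any(s.is_satisfied_by(file) for s in self.specs)


big_txt = AndSpec(ExtensionSpec(".txt"), MinSizeSpec(1024))
txt_or_big = OrSpec(ExtensionSpec(".txt"), MinSizeSpec(1024))
notes = File("notes", ".txt", 200)
```

Because AndSpec and OrSpec are themselves FileSpec subclasses, they can be nested, for example OrSpec(big_txt, ExtensionSpec(".jpg")). Note that in this sketch an empty AndSpec matches every file and an empty OrSpec matches none.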
File Search
- The FileSearcher class is responsible for searching files based on a given specification. It takes a list of files and uses the specification to filter them.
- The FileSearchBuilder class provides a fluent interface for constructing complex file search specifications. It allows users to add specifications incrementally and build either an AND or an OR combination.
Example Usage
- Using the FileSearchBuilder makes it easy to modify and chain search criteria, which keeps the code modular and reusable.
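One way the searcher, the builder, and a chained query might fit together is sketched below. Several details are assumptions not confirmed by the source: the File dataclass, the NameSpec class (using fnmatch wildcards for the file-name criterion), the default ">" comparator in with_size, and the build_or method name.

```python
import fnmatch
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    # Hypothetical file record; the source does not define File
    name: str
    extension: str
    size: int  # bytes


class FileSpec:
    def is_satisfied_by(self, file):
        raise NotImplementedError


class ExtensionSpec(FileSpec):
    def __init__(self, extension):
        self.extension = extension

    def is_satisfied_by(self, file):
        return file.extension == self.extension


class SizeSpec(FileSpec):
    OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

    def __init__(self, size, comparator):
        if comparator not in self.OPS:
            raise ValueError(f"Invalid comparator: {comparator}")
        self.size, self.comparator = size, comparator

    def is_satisfied_by(self, file):
        return self.OPS[self.comparator](file.size, self.size)


class NameSpec(FileSpec):
    def __init__(self, pattern):
        self.pattern = pattern

    def is_satisfied_by(self, file):
        # Case-sensitive shell-style wildcard match
        return fnmatch.fnmatchcase(file.name, self.pattern)


class AndSpec(FileSpec):
    def __init__(self, specs):
        self.specs = list(specs)

    def is_satisfied_by(self, file):
        return all(s.is_satisfied_by(file) for s in self.specs)


class OrSpec(FileSpec):
    def __init__(self, specs):
        self.specs = list(specs)

    def is_satisfied_by(self, file):
        return any(s.is_satisfied_by(file) for s in self.specs)


class FileSearcher:
    def __init__(self, files):
        self.files = list(files)

    def search(self, spec):
        return [f for f in self.files if spec.is_satisfied_by(f)]


class FileSearchBuilder:
    def __init__(self):
        self._specs = []

    def _add(self, spec):
        self._specs.append(spec)
        return self  # returning self is what enables chaining

    def with_extension(self, extension):
        return self._add(ExtensionSpec(extension))

    def with_size(self, size, comparator=">"):
        return self._add(SizeSpec(size, comparator))

    def with_name(self, pattern):
        return self._add(NameSpec(pattern))

    def build(self):
        return AndSpec(self._specs)

    def build_or(self):
        return OrSpec(self._specs)


files = [File("notes", ".txt", 2048), File("todo", ".txt", 100), File("photo", ".jpg", 500_000)]
searcher = FileSearcher(files)
large_txt = searcher.search(FileSearchBuilder().with_extension(".txt").with_size(1024).build())
either = searcher.search(FileSearchBuilder().with_extension(".jpg").with_name("to*").build_or())
```

Here large_txt holds only the .txt file larger than 1024 bytes, while either holds every file that is a .jpg or whose name starts with "to".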
    from abc import ABC, abstractmethod

    # File is assumed to be an object exposing `extension` and `size` (in bytes)

    class FileSpec(ABC):
        @abstractmethod
        def is_satisfied_by(self, file: "File") -> bool:
            """Abstract method to be implemented by subclasses."""

    class ExtensionSpec(FileSpec):
        def __init__(self, extension: str):
            self.extension = extension

        def is_satisfied_by(self, file: "File") -> bool:
            # Check if the file's extension matches the specified extension
            return file.extension == self.extension

    class SizeSpec(FileSpec):
        def __init__(self, size: int, comparator: str):
            self.size = size
            self.comparator = comparator

        def is_satisfied_by(self, file: "File") -> bool:
            # Map each comparator to its boolean result; the mapping must be
            # defined before the comparator is validated against it
            ops = {
                '>': file.size > self.size,
                '<': file.size < self.size,
                '=': file.size == self.size,
            }
            if self.comparator not in ops:
                raise ValueError(f"Invalid comparator: {self.comparator}")
            return ops[self.comparator]
Common Questions
How can new search criteria be added to the API?
- Extensibility: To add a new search criterion (e.g., by date modified), simply create a new class that implements FileSpec:
    class DateModifiedSpecification(FileSpec):
        def __init__(self, date):
            self.date = date

        def is_satisfied_by(self, file):
            # Match files modified on or after the given date
            return file.date_modified >= self.date
- Integration: The new specification plugs into the FileSearchBuilder without changes to the existing specifications or to the FileSearcher, in keeping with the Open/Closed Principle. Once the builder exposes a matching method, users can chain the new criterion:
    search_spec = builder.with_extension('.txt').with_size(1024).with_date_modified('2023-01-01').build()
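If the builder should not need a new method for every criterion, one option is a generic entry point that accepts any FileSpec. The sketch below assumes a hypothetical where method and a File dataclass with a date_modified field of type datetime.date; neither appears in the original design.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    # Hypothetical file record; field names follow the attributes used in the text
    name: str
    extension: str
    date_modified: datetime.date


class FileSpec:
    def is_satisfied_by(self, file):
        raise NotImplementedError


class ExtensionSpec(FileSpec):
    def __init__(self, extension):
        self.extension = extension

    def is_satisfied_by(self, file):
        return file.extension == self.extension


class DateModifiedSpecification(FileSpec):
    def __init__(self, date):
        self.date = date

    def is_satisfied_by(self, file):
        return file.date_modified >= self.date


class AndSpec(FileSpec):
    def __init__(self, specs):
        self.specs = list(specs)

    def is_satisfied_by(self, file):
        return all(s.is_satisfied_by(file) for s in self.specs)


class FileSearchBuilder:
    def __init__(self):
        self._specs = []

    def where(self, spec):
        # Generic entry point (an assumption): accepts any FileSpec, so new
        # criteria need no change to the builder itself
        self._specs.append(spec)
        return self

    def with_extension(self, extension):
        return self.where(ExtensionSpec(extension))

    def build(self):
        return AndSpec(self._specs)


cutoff = datetime.date(2023, 1, 1)
spec = (
    FileSearchBuilder()
    .with_extension(".txt")
    .where(DateModifiedSpecification(cutoff))
    .build()
)
old = File("old", ".txt", datetime.date(2022, 5, 1))
new = File("new", ".txt", datetime.date(2023, 6, 1))
```

Convenience methods such as with_date_modified can still be added later as thin wrappers around where, which keeps the fluent style without making them mandatory.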
What are the potential drawbacks of this design?
- Performance: If the file list is large, filtering may become slow. The search is a linear scan in which every file is evaluated against the specification, so the cost grows with both the number of files and the number of combined criteria.
- Complexity: Introducing many specifications can complicate the codebase, making it harder to understand and maintain. It's important to document each specification clearly.
How would you modify this design to add caching for frequently used searches?
- Cache Decorator: Wrap the FileSearcher with a caching layer to store results of frequently executed searches:
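    class CachedFileSearcher(FileSearcher):
        def __init__(self, searcher):
            # Wraps an existing searcher and delegates to it on a cache miss
            self.searcher = searcher
            self.cache = {}

        def search(self, spec):
            # str(spec) identifies the criteria only if every spec defines
            # __str__ or __repr__; the default representation differs per instance
            key = str(spec)
            if key in self.cache:
                return self.cache[key]
            results = self.searcher.search(spec)
            self.cache[key] = results
            return results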
- LRU Cache: Implement an LRU (Least Recently Used) cache for search results so that memory use stays bounded.
- Cache Key: Derive the cache key from the parameters of the specification (for example, the extension or the size bound) so that equivalent searches share one entry.
- Cache Invalidation: Invalidate the cache whenever the files change so that results stay accurate.
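The three points above can be combined in a small sketch. It assumes specifications are frozen dataclasses, so that two specs with equal parameters are equal and hashable and can serve directly as cache keys. The LRUCachedSearcher name, the max_entries parameter, and the calls counter are illustrative and not part of the original.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    # Hypothetical file record
    name: str
    extension: str
    size: int


@dataclass(frozen=True)
class ExtensionSpec:
    # Frozen dataclass: equal parameters give equal, hashable specs
    extension: str

    def is_satisfied_by(self, file):
        return file.extension == self.extension


class FileSearcher:
    def __init__(self, files):
        self.files = list(files)
        self.calls = 0  # counts real scans, for demonstration only

    def search(self, spec):
        self.calls += 1
        return [f for f in self.files if spec.is_satisfied_by(f)]


class LRUCachedSearcher:
    def __init__(self, searcher, max_entries=128):
        self.searcher = searcher
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def search(self, spec):
        if spec in self._cache:
            self._cache.move_to_end(spec)  # mark as most recently used
            return list(self._cache[spec])
        results = tuple(self.searcher.search(spec))
        self._cache[spec] = results
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return list(results)

    def invalidate(self):
        # Call whenever the underlying file list changes
        self._cache.clear()


inner = FileSearcher([File("a", ".txt", 1), File("b", ".jpg", 2)])
cached = LRUCachedSearcher(inner, max_entries=2)
first = cached.search(ExtensionSpec(".txt"))
second = cached.search(ExtensionSpec(".txt"))  # equal spec, served from the cache
calls_after_hit = inner.calls
cached.invalidate()
third = cached.search(ExtensionSpec(".txt"))  # cache was cleared, so this scans again
```

Results are stored as tuples and returned as fresh lists so that a caller mutating its result cannot corrupt the cached entry.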
How does this design handle error cases and invalid inputs?
- Input Validation: Validate inputs in the constructors of specifications to ensure they are valid:
    class SizeSpec(FileSpec):
        def __init__(self, size):
            if size < 0:
                raise ValueError("Size must be non-negative")
            self.size = size
- Error Handling: Implement proper error handling in the is_satisfied_by methods to catch unexpected issues.
- Custom Exceptions: Use custom exceptions for different error cases to provide clearer feedback.
- Builder Validation: Add validation in the Builder methods to ensure that the search criteria are valid before execution.
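A possible shape for the custom exceptions and builder validation mentioned above is sketched here. All exception names (FileSearchError, InvalidCriterionError, EmptySearchError) and the describe_failure helper are hypothetical, chosen only for illustration.

```python
class FileSearchError(Exception):
    """Base class for errors raised by the search API (hypothetical name)."""


class InvalidCriterionError(FileSearchError):
    """A specification was constructed with an invalid argument."""


class EmptySearchError(FileSearchError):
    """build() was called before any criterion was added."""


class SizeSpec:
    COMPARATORS = (">", "<", "=")

    def __init__(self, size, comparator=">"):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(size, bool) or not isinstance(size, int):
            raise InvalidCriterionError(f"Size must be an integer, got {size!r}")
        if size < 0:
            raise InvalidCriterionError("Size must be non-negative")
        if comparator not in self.COMPARATORS:
            raise InvalidCriterionError(f"Invalid comparator: {comparator!r}")
        self.size = size
        self.comparator = comparator


class FileSearchBuilder:
    def __init__(self):
        self._specs = []

    def with_size(self, size, comparator=">"):
        # Validation runs inside the SizeSpec constructor, before the spec is stored
        self._specs.append(SizeSpec(size, comparator))
        return self

    def build(self):
        if not self._specs:
            raise EmptySearchError("Add at least one criterion before build()")
        # A full builder would wrap these in AndSpec or OrSpec
        return list(self._specs)


def describe_failure(action):
    # Run a callable and report which search error it raised, if any
    try:
        action()
    except FileSearchError as exc:
        return type(exc).__name__
    return None
```

Because every error derives from FileSearchError, callers can catch the whole family with one except clause or handle individual cases separately.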
How would you make this design thread-safe?
- Immutable File Objects: Make File objects immutable to prevent concurrent modifications.
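- Thread-Safe Collections: Use thread-safe collections for the file list, such as queue.Queue when worker threads consume files one by one, or an immutable snapshot (for example, a tuple) when the list is only read.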
- Synchronization: Implement synchronization in FileSearcher if needed to prevent race conditions.
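As a rough illustration of these ideas, the sketch below uses a frozen dataclass for File and a lock-protected tuple snapshot inside the searcher. The ThreadSafeFileSearcher name and its add_file method are assumptions; the original design does not specify how files are added.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class File:
    name: str
    extension: str
    size: int


class ExtensionSpec:
    def __init__(self, extension):
        self.extension = extension

    def is_satisfied_by(self, file):
        return file.extension == self.extension


class ThreadSafeFileSearcher:
    def __init__(self, files=()):
        self._lock = threading.Lock()
        self._files = tuple(files)

    def add_file(self, file):
        # Writers replace the tuple under the lock instead of mutating it
        with self._lock:
            self._files = self._files + (file,)

    def search(self, spec):
        # Take a consistent snapshot under the lock, then filter outside it
        with self._lock:
            snapshot = self._files
        return [f for f in snapshot if spec.is_satisfied_by(f)]


searcher = ThreadSafeFileSearcher()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Concurrent writers; list() waits until every add has finished
    list(pool.map(searcher.add_file, [File(f"f{i}", ".txt", i) for i in range(100)]))
    # Concurrent readers
    counts = list(pool.map(lambda _: len(searcher.search(ExtensionSpec(".txt"))), range(8)))
```

Holding the lock only long enough to copy a reference keeps readers from blocking each other during the scan. Rebuilding the tuple on every add costs time proportional to the list size, so this trade-off suits read-heavy workloads.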
How would you optimize the search performance for large file systems?
- Indexing: Implement indexing for common search criteria to speed up lookups.
    class FileIndexer:
        def __init__(self):
            self.index = {}

        def index_file(self, file):
            # Index by extension
            if file.extension not in self.index:
                self.index[file.extension] = []
            self.index[file.extension].append(file)
- Parallel Processing: Use parallel processing (e.g., concurrent.futures) to handle large file lists efficiently.
- Lazy Loading: Implement lazy loading for file attributes to reduce memory usage.
- Early Termination: Add early termination options for searches to stop processing once a certain condition is met, improving performance.
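Two of the options above, early termination and parallel processing, might look like the following. The function names (iter_matches, search_first, parallel_search), the MinSizeSpec class, and the chunking strategy are assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class File:
    # Hypothetical file record
    name: str
    extension: str
    size: int


class MinSizeSpec:
    # Illustrative spec (not in the original): size strictly above a bound
    def __init__(self, min_size):
        self.min_size = min_size

    def is_satisfied_by(self, file):
        return file.size > self.min_size


def iter_matches(files, spec):
    # Generator expression: files are tested only as results are requested
    return (f for f in files if spec.is_satisfied_by(f))


def search_first(files, spec, limit):
    # Early termination: stop scanning once `limit` matches have been found
    return list(islice(iter_matches(files, spec), limit))


def parallel_search(files, spec, workers=4):
    # Split the list into contiguous chunks and filter each chunk in a worker thread
    files = list(files)
    chunk = max(1, -(-len(files) // workers))  # ceiling division
    chunks = [files[i:i + chunk] for i in range(0, len(files), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda part: [f for f in part if spec.is_satisfied_by(f)], chunks)
        # map() yields results in chunk order, so the original ordering is kept
        return [f for part in parts for f in part]


files = [File(f"f{i}", ".bin", i) for i in range(1000)]
spec = MinSizeSpec(989)
first_three = search_first(files, spec, limit=3)
all_large = parallel_search(files, spec)
```

In CPython, threads are unlikely to speed up pure in-memory comparisons like this one because of the global interpreter lock. The threaded version is more likely to help when evaluating a specification involves I/O, such as reading file metadata from disk; for CPU-bound checks a process pool would be the usual alternative.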