Project Overview
This project is a Hacker News Article Sort Validator built with Node.js and Microsoft's Playwright framework.
The assignment was to build a web scraping tool that fetches the newest articles from Hacker News and verifies
that they are listed from newest to oldest, demonstrating proficiency in test automation, data validation,
and quality assurance principles.
Assignment Scope & Requirements
- Fetch exactly the first 100 articles from the Hacker News /newest page (news.ycombinator.com/newest) using Playwright.
- Validate that articles are sorted from newest to oldest based on their timestamps.
- Report any ordering inconsistencies with detailed error information.
- Handle edge cases including invalid timestamps, pagination navigation, and error pages.
- Provide configurable output options for flexibility (number of articles, titles to display, verbose mode).
- Implement robust error handling and logging throughout the application.
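The "exactly 100 articles" and pagination requirements can be illustrated with a small collection loop. This is a minimal sketch, not the project's actual code: `collectArticles` and the `fetchPage` callback are hypothetical names. In the real tool, `fetchPage` would presumably wrap the Playwright calls that load one page of /newest and extract its rows; here it is injected so the loop can be read and exercised on its own.

```javascript
// Sketch: collect at most `target` articles across several pages.
// `fetchPage` is a hypothetical async callback (an assumption, not the project's API).
// It resolves to { articles: [...], hasMore: boolean } for a 1-based page number.
async function collectArticles(fetchPage, target = 100) {
  const collected = [];
  let pageNumber = 1;

  while (collected.length < target) {
    const { articles, hasMore } = await fetchPage(pageNumber);
    collected.push(...articles);

    // Stop early if there is no next page or the page came back empty,
    // so a broken or error page cannot cause an endless loop.
    if (!hasMore || articles.length === 0) break;
    pageNumber += 1;
  }

  // The last page usually overshoots the target, so trim the surplus.
  return collected.slice(0, target);
}
```

Trimming with `slice` after the loop keeps the paging logic simple: the loop only needs to know when it has enough, not how many items to take from the final page.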
Key Objectives & Skills Demonstrated
- Web Scraping with Playwright: Navigated multi-page content, extracted dynamic data, and handled browser interactions.
- Test Automation & Validation: Implemented validation logic to detect and report sorting errors with precision.
- Data Processing & Timestamp Handling: Parsed ISO timestamp strings and compared them for chronological order.
- Modular Code Architecture: Organized code into separate, reusable modules (CLI parser, validator, logger, result handler).
- Error Handling & Edge Cases: Gracefully handled pagination, invalid data, network issues, and error pages.
- Command-Line Interface Design: Built a user-friendly CLI with optional flags, default values, and help documentation.
- Testing & Quality Assurance: Created comprehensive unit tests using Playwright's test framework to validate all components.
- Attention to Detail: Implemented detailed logging, verbose output modes, and test-error injection for validation.
Technology Stack
- Node.js - Runtime environment
- Playwright - Browser automation and web scraping
- JavaScript (ES6+) - Implementation language
- @playwright/test - Testing framework
Implementation Highlights
Modular Architecture
The solution is structured into focused modules:
- index.js - Main orchestrator that coordinates scraping, validation, and output.
- cli.js - Parses command-line arguments and provides help documentation.
- validator.js - Core validation logic for timestamp parsing and error detection.
- result.js - Result class that encapsulates article data and provides analysis methods.
- logger.js - Singleton logger for collecting and reporting warnings.
- debug.js - Test error injection utilities for validation testing.
Key Features
- Pagination Handling: Automatically navigates through Hacker News pages until the target article count is reached.
- Timestamp Validation: Parses ISO format timestamps and identifies invalid or malformed entries.
- Error Reporting: Provides detailed error information with article indices and comparison data.
- Flexible Output: Supports displaying article titles, verbose error details, and execution timing.
- Test Mode: Includes intentional error injection for validating the sorting algorithm.
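The timestamp validation and error reporting features above could be sketched roughly as follows. The function names (`parseTimestamp`, `findSortingErrors`), the article shape `{ title, timestamp }`, and the returned fields are assumptions made for illustration, not the contents of validator.js. The sketch also assumes that two articles with the same timestamp count as correctly ordered.

```javascript
// Sketch of a newest-to-oldest check over scraped articles.
// Assumed article shape: { title: string, timestamp: string } (ISO 8601 timestamp).

// Returns milliseconds since the epoch, or null when the string cannot be parsed.
function parseTimestamp(value) {
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

function findSortingErrors(articles) {
  const errors = [];  // pairs that violate newest-first order
  const invalid = []; // entries whose timestamp could not be parsed
  let previous = null; // most recent article with a valid timestamp

  articles.forEach((article, index) => {
    const time = parseTimestamp(article.timestamp);
    if (time === null) {
      invalid.push({ index, timestamp: article.timestamp });
      return; // skip it, and keep comparing against the last valid entry
    }
    // An article that is newer than the one listed before it is out of order.
    if (previous !== null && time > previous.time) {
      errors.push({
        index,
        previousIndex: previous.index,
        timestamp: article.timestamp,
        previousTimestamp: previous.timestamp,
      });
    }
    previous = { index, time, timestamp: article.timestamp };
  });

  return { sorted: errors.length === 0, errors, invalid };
}
```

Recording both indices and both timestamps in each error is what makes a verbose report possible: the output can show exactly which pair of articles broke the ordering.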
Usage Examples
# Basic usage - fetch and validate 100 articles
node index.js
# Show first 5 article titles with validation
node index.js 100 5
# Verbose mode with detailed error reporting
node index.js 100 5 --verbose
# Test mode with intentional sorting errors
node index.js 100 0 --test-error
# Display help information
node index.js --help
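The argument layout used in these examples (an optional article count, an optional number of titles to show, then flags) could be parsed along the following lines. This is a hedged sketch rather than the actual cli.js: `parseArgs`, `toNonNegativeInt`, and the option names are hypothetical, and the default of 0 displayed titles is an assumption. Only the default of 100 articles is implied by the basic usage example.

```javascript
// Sketch of a parser for: node index.js [count] [titles] [--verbose] [--test-error] [--help]
// Option names and the `titles: 0` default are assumptions for illustration.
function parseArgs(argv) {
  const options = { count: 100, titles: 0, verbose: false, testError: false, help: false };
  const positional = [];

  for (const arg of argv) {
    if (arg === '--verbose') options.verbose = true;
    else if (arg === '--test-error') options.testError = true;
    else if (arg === '--help') options.help = true;
    else if (arg.startsWith('--')) throw new Error(`Unknown flag: ${arg}`);
    else positional.push(arg);
  }

  // Positional arguments are optional, so the defaults survive when they are absent.
  const [count, titles] = positional;
  if (count !== undefined) options.count = toNonNegativeInt(count, 'article count');
  if (titles !== undefined) options.titles = toNonNegativeInt(titles, 'title count');
  return options;
}

// Rejects anything that is not a whole number >= 0 (for example "abc" or "-3").
function toNonNegativeInt(text, label) {
  const n = Number(text);
  if (!Number.isInteger(n) || n < 0) throw new Error(`Invalid ${label}: ${text}`);
  return n;
}
```

In a script, this would typically be called as `parseArgs(process.argv.slice(2))`, since the first two entries of `process.argv` are the Node.js executable and the script path.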
Personal Contributions
As the sole developer of this assignment, I was responsible for:
- Architecting the overall solution and module structure.
- Implementing web scraping logic with Playwright, including pagination and dynamic content extraction.
- Developing the timestamp parsing and validation algorithm to detect sorting errors.
- Designing the CLI with argument parsing and flag handling.
- Creating comprehensive logging and error reporting mechanisms.
- Writing unit tests to validate all components and edge cases.
- Implementing test-error injection to verify that the validation logic itself detects sorting errors.
Testing & Quality Assurance
The project includes comprehensive test coverage:
- CLI Parser Tests: Validates argument parsing, default values, and flag handling.
- Validator Tests: Tests timestamp parsing and error detection with various edge cases.
- Result Tests: Verifies the Result class methods and error reporting.
- Integration Tests: End-to-end validation of the entire workflow.
- Test-Error Tests: Ensures error injection works correctly for validation testing.
All tests are run using Playwright's testing framework with the command: npx playwright test
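To illustrate the idea behind the test-error tests, the sketch below injects a deliberate ordering error into a sorted list and confirms that a sort check notices it. It is a standalone approximation that uses plain Node.js instead of @playwright/test so it can run by itself. `injectSortError` and `isSortedNewestFirst` are hypothetical helpers, not the project's debug.js or validator.js, and the injection assumes the input list is already sorted newest first.

```javascript
// Sketch: break a correctly sorted list on purpose, then check that validation catches it.
// Assumed article shape: { title: string, timestamp: string } (ISO 8601 timestamp).

// Returns a copy in which one article is made newer than the first (newest) one.
function injectSortError(articles) {
  const copy = articles.map((article) => ({ ...article })); // leave the input untouched
  if (copy.length < 2) return copy; // a list this short cannot be out of order

  const target = Math.floor(copy.length / 2);
  const newest = Date.parse(copy[0].timestamp);
  // One hour after the newest article, so it is newer than everything listed before it.
  copy[target].timestamp = new Date(newest + 60 * 60 * 1000).toISOString();
  return copy;
}

// True when every article is no newer than the one listed before it.
function isSortedNewestFirst(articles) {
  return articles.every(
    (article, i) =>
      i === 0 || Date.parse(articles[i - 1].timestamp) >= Date.parse(article.timestamp)
  );
}

const sample = [
  { title: 'A', timestamp: '2024-05-01T12:02:00Z' },
  { title: 'B', timestamp: '2024-05-01T12:01:00Z' },
  { title: 'C', timestamp: '2024-05-01T12:00:00Z' },
];
const broken = injectSortError(sample);

console.log(isSortedNewestFirst(sample)); // true
console.log(isSortedNewestFirst(broken)); // false
```

Testing the validator against a known-bad input guards against a check that passes only because it never reports anything.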
Video Walkthrough
Watch a complete demonstration of the project, including the motivation for joining QA Wolf
and a live walkthrough of the code and successful execution: