Output File Management
======================

This document describes the output file management system integrated into the Rift toolkit.

Overview
--------

The output file management system processes output files from a source directory and deploys them to a target directory with specific ownership and permissions. Files are copied atomically so that other system processes never see a partially written file. After successful processing, source files are moved to a processed directory so that they are not reprocessed in subsequent runs.

.. warning::

   **File Expiration Policy**: Files in ``/var/abyss/output`` and ``/var/abyss/input/processed`` are automatically deleted after 24 hours (configurable) to save disk space. This expiration process runs with the output cron job and **permanently deletes files**; deleted files cannot be recovered. If you need longer retention, set ``FILE_EXPIRATION_HOURS`` to a higher value or implement your own backup strategy.

Directory Structure
-------------------

Source Directory
~~~~~~~~~~~~~~~~

- **Path**: ``/opt/exports/abyss-default/outputs/dataExporterinspection`` (configurable via ``OUTPUT_SOURCE_DIR`` environment variable)
- **Purpose**: Source location where output files are generated by the system

Target Directory
~~~~~~~~~~~~~~~~

- **Path**: ``/var/abyss/output`` (configurable via ``OUTPUT_TARGET_DIR`` environment variable)
- **Purpose**: Destination directory where output files are atomically copied for consumption

Processed Directory
~~~~~~~~~~~~~~~~~~~

- **Path**: ``/opt/exports/abyss-default/outputs/dataExporterinspection/processed`` (configurable via ``OUTPUT_PROCESSED_DIR`` environment variable)
- **Purpose**: Archive directory where successfully processed files are moved to prevent reprocessing
- **Auto-creation**: The directory is created automatically with the proper ownership and permissions if it does not exist

File Properties
~~~~~~~~~~~~~~~

- **Owner**: UID:GID ``500:500`` (configurable via ``OUTPUT_OWNER_UID`` and ``OUTPUT_OWNER_GID`` environment variables)
- **Permissions**: ``644`` (configurable via ``OUTPUT_PERMISSIONS`` environment variable)
- **File Types**: All file types (not limited to specific extensions)
- **File Expiration**: Files older than 24 hours are automatically deleted (configurable via ``FILE_EXPIRATION_HOURS`` environment variable)

User Configuration
~~~~~~~~~~~~~~~~~~

- **Default User**: ``rift`` (configurable via ``RIFT_USER`` environment variable)
- **User Requirements**: Must have passwordless sudo access

Prerequisites
-------------

Sudo Access
~~~~~~~~~~~

All output file management operations require sudo access because operations on the target directory need elevated privileges. The user executing these scripts must:

- Be a sudoer
- Have passwordless sudo configured for automated operations
- Have sudo access to the target directories

Passwordless Sudo Setup
~~~~~~~~~~~~~~~~~~~~~~~~

For automated operations, configure passwordless sudo for the ``RIFT_USER`` (default: ``rift``) by adding the following entry to ``/etc/sudoers`` (edit the file with ``visudo`` so that syntax errors are caught before saving):

.. code-block:: text

   rift ALL=(ALL) NOPASSWD: /bin/cp, /bin/mv, /bin/rm, /bin/chown, /bin/chmod, /usr/bin/find, /usr/bin/stat, /usr/bin/test

Or, for broader access:

.. code-block:: text

   rift ALL=(ALL) NOPASSWD: ALL

To use a different user, set the ``RIFT_USER`` environment variable:

.. code-block:: bash

   export RIFT_USER=myuser

Manual Commands
---------------

Adding Output Files
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Add all output files from source directory (requires sudo)
   ./rift output-add

   # Add with verbose output
   ./rift output-add --verbose

   # Show help
   ./rift output-add --help

The ``output-add`` command:

- Uses sudo for all file operations
- Finds all files in the source directory (any file type)
- Copies files atomically to prevent early access by other processes
- Sets proper ownership and permissions on copied files
- Moves successfully processed files to the processed directory
- Uses temporary files with atomic move operations for safety

Atomic Copy Process
-------------------

The output file management system ensures atomicity by:

1. **Temporary File Creation**: Files are first copied to a temporary location with a unique name (``.filename.tmp.$$``, where ``$$`` is the process ID of the running shell)
2. **Permission Setting**: Ownership and permissions are set on the temporary file
3. **Atomic Move**: The temporary file is moved to the final location using ``mv``, which is an atomic rename when the temporary file and the final location are on the same filesystem
4. **Source File Archival**: After a successful copy, the original source file is moved to the processed directory
5. **Cleanup**: If any step fails, temporary files are cleaned up automatically

This process prevents other system processes from accessing incomplete or improperly configured files, and ensures files are not processed multiple times.

Configuration
-------------

All settings can be customized using environment variables:

.. code-block:: bash

   # Source directory for output files
   export OUTPUT_SOURCE_DIR="/custom/source/path"

   # Target directory for output files
   export OUTPUT_TARGET_DIR="/custom/target/path"

   # Processed directory for archived files (defaults to ${OUTPUT_SOURCE_DIR}/processed)
   export OUTPUT_PROCESSED_DIR="/custom/processed/path"

   # File ownership (UID and GID)
   export OUTPUT_OWNER_UID=1000
   export OUTPUT_OWNER_GID=1000

   # File permissions (octal)
   export OUTPUT_PERMISSIONS=755

   # File expiration time in hours (default: 24)
   export FILE_EXPIRATION_HOURS=48

   # User running the script
   export RIFT_USER=myuser

Differences from Input File Management
--------------------------------------

The output file management system is very similar to input file management but differs in key ways:

1. **Source Directory**: Output files come from the system's export directory instead of a staging area
2. **Target Directory**: Output files go to ``/var/abyss/output`` instead of the input service directory
3. **Purpose**: Handles system-generated output files for consumption rather than user-provided input files

Similarities include:

- **Source Archival**: Files are moved to a processed directory after copying (like input files)
- **Single Target**: Files are copied to one target directory
- **File Types**: Accepts all file types, not just specific extensions
- **Atomic Operations**: Uses temporary files and atomic moves so that consumers never see partial files
- **Default User**: Uses the ``rift`` user by default
- **Reprocessing Prevention**: The processed directory prevents files from being processed multiple times

Automated Processing (Cron)
----------------------------

For automated output file processing, use the ``output-cron.sh`` script.

.. note::

   For comprehensive cron automation documentation including installation, configuration, and troubleshooting, see :doc:`cron-automation`.

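
Cron starts jobs with a minimal environment, so variables exported in an interactive shell are not visible to the scheduled job. The following is a minimal sketch of how a wrapper might resolve the settings from the Configuration section, falling back to the defaults documented above. The function name ``load_output_config`` is hypothetical and is not part of Rift; how ``output-cron.sh`` itself resolves its settings is not shown in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rift): resolve the documented settings,
# keeping any value already set and otherwise using the documented default.
load_output_config() {
    : "${OUTPUT_SOURCE_DIR:=/opt/exports/abyss-default/outputs/dataExporterinspection}"
    : "${OUTPUT_TARGET_DIR:=/var/abyss/output}"
    # The processed directory defaults to a subdirectory of the source directory,
    # so it must be resolved after OUTPUT_SOURCE_DIR.
    : "${OUTPUT_PROCESSED_DIR:=${OUTPUT_SOURCE_DIR}/processed}"
    : "${OUTPUT_OWNER_UID:=500}"
    : "${OUTPUT_OWNER_GID:=500}"
    : "${OUTPUT_PERMISSIONS:=644}"
    : "${FILE_EXPIRATION_HOURS:=24}"
    : "${RIFT_USER:=rift}"
    export OUTPUT_SOURCE_DIR OUTPUT_TARGET_DIR OUTPUT_PROCESSED_DIR \
           OUTPUT_OWNER_UID OUTPUT_OWNER_GID OUTPUT_PERMISSIONS \
           FILE_EXPIRATION_HOURS RIFT_USER
}
```

Under these assumptions, a wrapper could set only the values it wants to override (for example ``FILE_EXPIRATION_HOURS=48``), call ``load_output_config``, and then invoke ``/usr/local/bin/output-cron.sh``.
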
Cron Script Features
~~~~~~~~~~~~~~~~~~~~

- **Lock-based execution**: Prevents multiple instances from running simultaneously
- **Log rotation**: Automatically rotates log files when they exceed 10MB
- **System health checks**: Validates sudo access and disk space
- **Comprehensive logging**: Detailed logging with timestamps to ``/var/log/output-processing.log``
- **Signal handling**: Graceful cleanup on script termination

Cron Setup
~~~~~~~~~~

1. **Copy the cron script to a system location**:

   .. code-block:: bash

      sudo cp tools/output-cron.sh /usr/local/bin/
      sudo chmod +x /usr/local/bin/output-cron.sh

2. **Set up log file with proper permissions**:

   .. code-block:: bash

      sudo touch /var/log/output-processing.log
      sudo chown rift:rift /var/log/output-processing.log

3. **Add cron job for the rift user**:

   .. code-block:: bash

      # Switch to rift user and edit crontab
      sudo -u rift crontab -e

      # Add this line to run every 5 minutes
      */5 * * * * /usr/local/bin/output-cron.sh >> /var/log/output-processing.log 2>&1

Alternative Cron Frequencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Every minute
   * * * * * /usr/local/bin/output-cron.sh >> /var/log/output-processing.log 2>&1

   # Every 10 minutes
   */10 * * * * /usr/local/bin/output-cron.sh >> /var/log/output-processing.log 2>&1

   # Every hour
   0 * * * * /usr/local/bin/output-cron.sh >> /var/log/output-processing.log 2>&1

Monitoring Cron Jobs
~~~~~~~~~~~~~~~~~~~~~

1. **Check that the cron job is installed**:

   .. code-block:: bash

      sudo -u rift crontab -l

2. **Monitor log file**:

   .. code-block:: bash

      tail -f /var/log/output-processing.log

3. **Check for running instances**:

   .. code-block:: bash

      # The [o] bracket keeps grep from matching its own process
      ps aux | grep '[o]utput-cron'
      cat ${TMPDIR:-/tmp}/rift-cron/output-cron.pid 2>/dev/null

4. **View recent processing activity**:

   .. code-block:: bash

      grep "$(date '+%Y-%m-%d')" /var/log/output-processing.log

Error Handling
--------------

The system provides comprehensive error handling:

- Directory validation before processing
- Sudo access verification
- Individual file operation error tracking
- Cleanup of temporary files on failure
- Detailed logging with timestamps
- Summary reporting of processed files and errors
- Lock file management to prevent concurrent execution
- Automatic log rotation to prevent disk space issues

Integration with Rift
----------------------

The output file management commands are fully integrated into the main Rift script:

.. code-block:: bash

   # Show all available commands (includes output-add)
   ./rift help

   # Use output commands through main rift script (as default user rift)
   ./rift output-add

   # Use output commands with custom user
   RIFT_USER=myuser ./rift output-add

File Workflow
-------------

1. **Generation**: Output files are generated by the system in ``/opt/exports/abyss-default/outputs/dataExporterinspection``
2. **Processing**: The cron job (every 5 minutes) or a manual command processes the files
3. **Deployment**: Files are copied to ``/var/abyss/output`` with atomic operations
4. **Archival**: Source files are moved to the processed directory after successful deployment
5. **Consumption**: Target applications can safely consume files from ``/var/abyss/output``
6. **Expiration**: Files older than the configured threshold (default 24 hours) are automatically deleted from the target and processed directories
7. **Logging**: All operations are logged with timestamps

This system provides reliable, automated processing of output files with comprehensive logging and error handling, so that complete files are available for downstream consumption. Automatic expiration keeps disk usage bounded by removing old files, so any file that must be kept beyond the expiration window needs to be copied elsewhere first.
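
The deployment, archival, and expiration steps of the workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Rift implementation: the function names ``deploy_file`` and ``expire_old_files`` are hypothetical, and ``sudo`` and ``chown`` are left out so that the example runs without elevated privileges. It also assumes a ``find`` that supports ``-mmin``, as GNU and BSD ``find`` do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the Rift implementation). sudo and chown are
# omitted so the example runs unprivileged; a real deployment needs both.

# deploy_file SRC TARGET_DIR PROCESSED_DIR [MODE]
deploy_file() {
    local src target_dir processed_dir mode name tmp
    src=$1
    target_dir=$2
    processed_dir=$3
    mode=${4:-644}
    name=$(basename "$src")
    # Hidden temporary name inside the target directory, unique per process ($$).
    tmp="$target_dir/.$name.tmp.$$"

    # The processed directory is created on demand.
    mkdir -p "$processed_dir" || return 1

    # Copy to the temporary name, set the mode, then rename into place.
    # The rename is atomic because both names live in the same directory.
    if cp "$src" "$tmp" && chmod "$mode" "$tmp" && mv "$tmp" "$target_dir/$name"; then
        # Archive the source so the next run does not pick it up again.
        mv "$src" "$processed_dir/$name"
    else
        # Remove any partial temporary file on failure.
        rm -f "$tmp"
        return 1
    fi
}

# expire_old_files DIR [HOURS]
# Deletes regular files under DIR last modified more than HOURS hours ago.
expire_old_files() {
    local dir hours
    dir=$1
    hours=${2:-24}
    find "$dir" -type f -mmin "+$((hours * 60))" -exec rm -f {} +
}
```

Placing the temporary file in the target directory itself, rather than in ``/tmp``, is what keeps the final ``mv`` a rename within a single filesystem; a move across filesystems falls back to copy-and-delete and is no longer atomic.
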