Shell script not working when run as wsrep_notify_cmd by Galera Cluster for MariaDB

28-08-2024


Troubleshooting Galera Cluster's wsrep_notify_cmd Shell Script Errors: A Practical Guide

Galera Cluster's wsrep_notify_cmd option lets you run a command whenever a node's status or the cluster membership changes. Setting up and troubleshooting such scripts can be challenging, because they run in the server's context and environment rather than in yours. This article analyzes a common error encountered with wsrep_notify_cmd shell scripts and provides solutions based on real-world experiences from Stack Overflow.

The Problem: File Not Found Errors

The original question on Stack Overflow highlights a common issue: a shell script working when executed manually but failing with "File Not Found" errors when triggered by wsrep_notify_cmd. This usually stems from a mismatch in the environment or permissions within which the script is executed.

Understanding the Context

The script is configured as the wsrep_notify_cmd in the Galera configuration, so it is not run from your login shell. The database server process invokes it, typically under a dedicated service account (often mysql) with its own working directory, its own environment variables and, depending on how the service is started, its own filesystem restrictions. That is why a script can work when run manually and still fail when Galera invokes it.
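
One way to approximate that context before involving the cluster is to run the script with an empty environment from a neutral working directory. The sketch below is an illustration, not a description of what Galera does internally: run_clean is a hypothetical helper name, and the minimal PATH is an assumption about what a service might see.

```shell
#!/bin/sh
# Run a command roughly the way a service might: empty environment,
# minimal PATH (an assumption), and / as the working directory.
run_clean() {
    ( cd / && env -i PATH=/usr/bin:/bin "$@" )
}
```

For example, run_clean /path/to/your-script.sh --status Synced --primary yes (the path is a placeholder) exercises the script without your login environment. Prefixing the call with sudo -u mysql would also switch accounts, assuming mysql is the service user on your system.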

Analyzing the Errors

Here's a breakdown of the error messages and their possible causes:

  • cannot create /home/ubuntu/Notification.txt: Directory nonexistent: The shell could not create the file because, from the point of view of the process running the script, the parent directory /home/ubuntu does not exist. This can happen if the user running the script through wsrep_notify_cmd cannot reach the home directory, or if the directory layout it sees differs from the one in your login session. This wording is what dash, the default /bin/sh on Ubuntu, prints.
  • /home/ubuntu/Notification.txt: No such file or directory: Most likely the same failure, worded the way bash reports it: some component of the path is missing or not visible to the process, so the file can neither be found nor created.
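
Both messages come down to the parent directory being invisible or unusable for the process that runs the script. A small check like the following, run as the same user the server uses, can tell the cases apart. It is only a sketch; can_create is a hypothetical helper, not part of Galera or MariaDB.

```shell
#!/bin/sh
# Report whether the current user could create the given file:
# the parent directory must be visible, searchable (x) and writable (w).
can_create() {
    target=$1
    parent=$(dirname -- "$target")
    if [ ! -d "$parent" ]; then
        # Also reached when an ancestor directory cannot be traversed
        echo "not visible: $parent"
        return 1
    fi
    if [ ! -x "$parent" ] || [ ! -w "$parent" ]; then
        echo "no access: $parent"
        return 1
    fi
    echo "ok: $parent"
}
```

Calling can_create /home/ubuntu/Notification.txt from your own shell and again from the service's context shows whether the two see the same directory.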

Troubleshooting Steps and Solutions

  1. Check File Permissions: Ensure that the script and its target directory have the appropriate permissions.

    • Original Post's Example: The script has -rw-rw-rw- permissions, so every user can read and write it, but nobody can execute it. A script that the server invokes directly generally also needs the execute bit, and the account the server runs under must be able to reach it.
    • Solution: Give the service user read and execute access to the script, plus write and search (execute) access to the directory where the output file is created. This can be done with the chown and chmod commands.
  2. Verify Working Directory: The script execution environment might differ from your manual execution environment. The Galera process might not be in the same working directory as you.

    • Solution: Call pwd inside the script and redirect its output to a file in a directory the service user can definitely write to, since there is no terminal to print to when Galera runs the script. The result shows where relative paths resolve and exposes any path discrepancies.
  3. Use Absolute Paths: Avoid relying on relative paths in your scripts. Always use absolute paths to ensure consistent file access regardless of the working directory.

    • Example: Instead of echo " STATUS=$2" > Notification.txt, use echo " STATUS=$2" > /home/ubuntu/Notification.txt.
  4. Check for Environment Variables: Scripts executed through wsrep_notify_cmd might have a different set of environment variables than a user-executed script.

    • Solution: Dump the environment from inside the script with env, again redirecting the output to a file, and compare it with the output of env in your login shell. Differences in variables such as PATH or HOME are a common reason for a script behaving differently under wsrep_notify_cmd.
  5. Utilize set -x: This debugging technique prints each command, after expansion, to standard error before it is executed, which helps pinpoint the exact point of failure. Add set -x at the beginning of your script to enable this trace.
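
Steps 2, 4 and 5 can be combined into a short diagnostic preamble placed at the top of the notification script. This is a minimal sketch under assumptions: DEBUG_LOG and log_context are hypothetical names, and /tmp/wsrep_notify_debug.log is only a placeholder for a location the service user can write to on your system.

```shell
#!/bin/sh
# Hypothetical log location; override DEBUG_LOG to point somewhere writable.
DEBUG_LOG=${DEBUG_LOG:-/tmp/wsrep_notify_debug.log}

# Append the user, working directory, arguments and environment to the log.
log_context() {
    {
        echo "=== $(date) ==="
        echo "user: $(id)"
        echo "cwd:  $(pwd)"
        echo "args: $*"
        env | sort
    } >> "$DEBUG_LOG" 2>&1
}

log_context "$@"

# Uncomment to send a set -x trace of the rest of the script to the same file:
# exec 2>> "$DEBUG_LOG"
# set -x
```

Comparing one log entry written during a manual run with one written when Galera triggers the script usually makes the differing user, directory or variable obvious.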

Additional Considerations:

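  • Service User: The database server usually runs under its own account (commonly mysql), not yours. Make sure that account can reach every directory and file the script touches. On systemd-based installations the service unit may additionally restrict what the server and its child processes can see; a setting such as ProtectHome=true makes /home inaccessible no matter what the file permissions say.
  • Log Files: Check the MariaDB error log (or the systemd journal) around the time of a state change, since messages about a failing wsrep_notify_cmd are likely to appear there. Raising the logging level can provide more detail.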
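
On a systemd-based system, the service account and any home-directory restriction can be read from the unit's properties. The sketch below assumes the unit is called mariadb, which may differ on your installation; home_restricted is a hypothetical helper that only interprets the Key=Value lines printed by systemctl show.

```shell
#!/bin/sh
# Reads "Key=Value" lines on stdin (as printed by `systemctl show`) and
# reports whether ProtectHome limits the service's access to /home.
home_restricted() {
    while IFS='=' read -r key value; do
        if [ "$key" = "ProtectHome" ]; then
            case $value in
                yes|read-only|tmpfs)
                    echo "restricted ($value)"
                    return 0
                    ;;
                *)
                    echo "not restricted ($value)"
                    return 1
                    ;;
            esac
        fi
    done
    echo "ProtectHome not reported"
    return 1
}

# Usage (unit name is an assumption):
#   systemctl show mariadb -p User -p ProtectHome | home_restricted
```

If the result is "restricted", writing the notification file somewhere outside /home, in a directory owned by the service user, avoids the problem without loosening the unit's sandboxing.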

Example Modification:

#!/bin/bash -eu

# Absolute path of the output file. Its directory must be reachable and
# writable by the user the database server runs as.
NOTIFY_FILE=/home/ubuntu/Notification.txt

# Print the execution environment (for debugging)
env

echo "Node Status Change:" > "$NOTIFY_FILE"

# Arguments arrive as "--option value" pairs; consume them two at a time
while [ $# -gt 0 ]
do
   case $1 in
      --status)
         echo "  STATUS=$2" >> "$NOTIFY_FILE"
         shift
         ;;
      --uuid)
         echo "  CLUSTER_UUID=$2" >> "$NOTIFY_FILE"
         shift
         ;;
      --primary)
         echo "  PRIMARY=$2" >> "$NOTIFY_FILE"
         shift
         ;;
      --index)
         echo "  INDEX=$2" >> "$NOTIFY_FILE"
         shift
         ;;
      --members)
         echo "  MEMBERS=$2" >> "$NOTIFY_FILE"
         shift
         ;;
      *)
         # Fail loudly on anything unexpected
         echo "Unknown option: $1" >&2
         exit 1
         ;;
   esac
   shift
done

exit 0

Key Takeaways

  • Carefully consider the environment and user context when writing scripts for wsrep_notify_cmd.
  • Use absolute paths and avoid relying on environment variables that might not be present.
  • Utilize debugging techniques like set -x and environment variable inspection.
  • Consult Galera Cluster documentation for specifics on how wsrep_notify_cmd interacts with the script execution environment.

By following these guidelines, you can troubleshoot the most common wsrep_notify_cmd script failures and use the notification hook reliably for cluster management.