17 KiB
Troubleshooting Guide
This guide helps you diagnose and resolve common issues you might encounter when using NodePass. For each problem, we provide possible causes and step-by-step solutions.
Connection Issues
Unable to Establish Tunnel Connection
Symptoms: Client cannot connect to the server's tunnel endpoint, or connection is immediately dropped.
Possible Causes and Solutions:
-
Network Connectivity Issues
- Verify basic connectivity with
pingortelnetto the server address - Check if the specified port is reachable:
telnet server.example.com 10101 - Ensure no firewall is blocking the tunnel port (typically 10101)
- Verify basic connectivity with
-
Server Not Running
- Verify the NodePass server is running with
ps aux | grep nodepasson Linux/macOS - Check server logs for any startup errors
- Try restarting the server process
- Verify the NodePass server is running with
-
Incorrect Addressing
- Double-check the tunnel address format in your client command
- Ensure you're using the correct hostname/IP and port
- If using DNS names, verify they resolve to the correct IP addresses
-
TLS Configuration Mismatch
- If server requires TLS but client doesn't support it, connection will fail
- Check server logs for TLS handshake errors
- Ensure certificates are correctly configured if using TLS mode 2
Data Not Flowing Through the Tunnel
Symptoms: Tunnel connection established, but application data isn't reaching the destination.
Possible Causes and Solutions:
-
Target Service Not Running
- Verify the target service is running on both server and client sides
- Check if you can connect directly to the service locally
-
Port Conflicts
- Ensure the target port isn't already in use by another application
- Use
netstat -tulnto check for port usage
-
Protocol Mismatch
- Verify you're tunneling the correct protocol (TCP vs UDP)
- Some applications require specific protocol support
-
Incorrect Target Address
- Double-check the target address in both server and client commands
- For server-side targets, ensure they're reachable from the server
- For client-side targets, ensure they're reachable from the client
Connection Stability Issues
Symptoms: Tunnel works initially but disconnects frequently or becomes unresponsive.
Possible Causes and Solutions:
-
Network Instability
- Check for packet loss or high latency in your network
- Consider a more stable network connection for production deployments
-
Resource Constraints
- Monitor CPU and memory usage on both client and server
- Adjust pool parameters if resources are being exhausted (see Performance section)
- Check file descriptor limits with
ulimit -non Linux/macOS
-
Timeout Configuration
- Adjust
NP_UDP_DIAL_TIMEOUTif using UDP with slow response times - Increase
readparameter in URL for long-running transfers (default: 0) - Consider adjusting
NP_TCP_DIAL_TIMEOUTfor unstable network conditions
- Adjust
-
Overloaded Server
- Check server logs for signs of connection overload
- Adjust
maxparameter andNP_SEMAPHORE_LIMITto handle the load - Consider scaling horizontally with multiple NodePass instances
Certificate Issues
TLS Handshake Failures
Symptoms: Connection attempts fail with TLS handshake errors.
Possible Causes and Solutions:
-
Invalid Certificate
- Verify certificate validity:
openssl x509 -in cert.pem -text -noout - Ensure the certificate hasn't expired
- Check that the certificate is issued for the correct domain/IP
- Verify certificate validity:
-
Missing or Inaccessible Certificate Files
- Confirm file paths to certificates and keys are correct
- Verify file permissions allow the NodePass process to read them
- Check for file corruption by opening certificates in a text editor
-
Certificate Trust Issues
- If using custom CAs, ensure they are properly trusted
- For self-signed certificates, confirm TLS mode 1 is being used
- For verified certificates, ensure the CA chain is complete
-
Key Format Problems
- Ensure private keys are in the correct format (usually PEM)
- Check for passphrase protection on private keys (not supported directly)
Certificate Renewal Issues
Symptoms: After certificate renewal, secure connections start failing.
Possible Causes and Solutions:
-
New Certificate Not Loaded
- Restart NodePass to force loading of new certificates
- Check if
RELOAD_INTERVALis set correctly to automatically detect changes
-
Certificate Chain Incomplete
- Ensure the full certificate chain is included in the certificate file
- Verify chain order: your certificate first, then intermediate certificates
-
Key Mismatch
- Verify the new certificate matches the private key:
openssl x509 -noout -modulus -in cert.pem | openssl md5 openssl rsa -noout -modulus -in key.pem | openssl md5 - If outputs differ, certificate and key don't match
- Verify the new certificate matches the private key:
Performance Optimization
High Latency
Symptoms: Connections work but have noticeable delays.
Possible Causes and Solutions:
-
Pool Configuration
- Increase
minparameter to have more connections ready - Decrease
MIN_POOL_INTERVALto create connections faster - Adjust
NP_SEMAPHORE_LIMITif connection queue is backing up
- Increase
-
Network Path
- Check for network congestion or high-latency links
- Consider deploying NodePass closer to either the client or server
- Use a traceroute to identify potential bottlenecks
-
TLS Overhead
- If extreme low latency is required and security is less critical, consider using TLS mode 0
- For a balance, use TLS mode 1 with session resumption
-
Resource Contention
- Ensure the host system has adequate CPU and memory
- Check for other processes competing for resources
- Consider dedicated hosts for high-traffic deployments
High CPU Usage
Symptoms: NodePass process consuming excessive CPU resources.
Possible Causes and Solutions:
-
Pool Thrashing
- If pool is constantly creating and destroying connections, adjust timings
- Increase
MIN_POOL_INTERVALto reduce connection creation frequency - Find a good balance for
minandmaxpool parameters
-
Excessive Logging
- Reduce log level from debug to info or warn for production use
- Check if logs are being written to a slow device
-
TLS Overhead
- TLS handshakes are CPU-intensive; consider session caching
- Use TLS mode 1 instead of mode 2 if certificate validation is less critical
-
Traffic Volume
- High throughput can cause CPU saturation
- Consider distributing traffic across multiple NodePass instances
- Vertical scaling (more CPU cores) may be necessary for very high throughput
Memory Leaks
Symptoms: NodePass memory usage grows continuously over time.
Possible Causes and Solutions:
-
Connection Leaks
- Ensure
NP_SHUTDOWN_TIMEOUTis sufficient to properly close connections - Check for proper error handling in custom scripts or management code
- Monitor connection counts with system tools like
netstat
- Ensure
-
Pool Size Issues
- If
maxparameter is very large, memory usage will be higher - Monitor actual pool usage vs. configured capacity
- Adjust capacity based on actual concurrent connection needs
- If
-
Debug Logging
- Extensive debug logging can consume memory in high-traffic scenarios
- Use appropriate log levels for production environments
UDP-Specific Issues
UDP Data Loss
Symptoms: UDP packets are not reliably forwarded through the tunnel.
Possible Causes and Solutions:
-
Buffer Size Limitations
- If UDP packets are large, increase
UDP_DATA_BUF_SIZE - Default of 8192 bytes may be too small for some applications
- If UDP packets are large, increase
-
Timeout Issues
- If responses are slow, increase
NP_UDP_DIAL_TIMEOUT - Adjust
readparameter for longer session timeouts - For applications with variable response times, find an optimal balance
- If responses are slow, increase
-
High Packet Rate
- UDP is handled one datagram at a time; very high rates may cause issues
- Consider increasing pool capacity for high-traffic UDP applications
-
Protocol Expectations
- Some UDP applications expect specific behavior regarding packet order or timing
- NodePass provides best-effort forwarding but cannot guarantee UDP properties beyond what the network provides
UDP Connection Tracking
Symptoms: UDP sessions disconnect prematurely or fail to establish.
Possible Causes and Solutions:
-
Connection Mapping
- Verify client configurations match server expectations
- Check for firewalls that may be timing out UDP session tracking
-
Application UDP Timeout
- Some applications have built-in UDP session timeouts
- May need to adjust application-specific keepalive settings
DNS Issues
DNS Resolution Failures
Symptoms: Connections fail with "no such host" or DNS lookup errors.
Solutions:
-
Verify System DNS Configuration
- Verify resolution works:
nslookup example.com - Check system's DNS settings (NodePass uses system resolver)
- Ensure network connectivity is working
- Verify resolution works:
-
Network Connectivity
- Check if firewall blocks UDP port 53
- Verify domain reachability
- Test with alternative domains to isolate issue
DNS Caching Problems
Symptoms: Resolution returns stale IPs, connections go to wrong endpoints.
Solutions:
-
Adjust Cache TTL (default 5 minutes)
- Dynamic environments:
dns=1m - Stable environments:
dns=30m
- Dynamic environments:
-
Load Balancing Scenarios
- Use shorter TTL:
dns=30s - Or use IP addresses directly to bypass DNS caching
- Use shorter TTL:
DNS Performance Optimization
Symptoms: High connection latency, slow startup.
Solutions:
-
Optimize Cache TTL
- Increase TTL for stable environments:
dns=1h - Reduce TTL for dynamic environments:
dns=1m - Balance between freshness and performance
- Increase TTL for stable environments:
-
Reduce DNS Queries
- Use IP addresses directly for performance-critical scenarios
- Increase TTL for stable hostnames
- Pre-resolve addresses when possible
Master API Issues
API Accessibility Problems
Symptoms: Cannot connect to the master API endpoint.
Possible Causes and Solutions:
-
Endpoint Configuration
- Verify API address and port in the master command
- Check if the API server is bound to the correct network interface
-
TLS Configuration
- If using HTTPS (TLS modes 1 or 2), ensure client tools support TLS
- For testing, use
curl -kto skip certificate validation
-
Custom Prefix Issues
- If using a custom API prefix, ensure it's included in all requests
- Check URL formatting in API clients and scripts
Instance Management Failures
Symptoms: Cannot create, control, or delete instances through the API.
Possible Causes and Solutions:
-
JSON Format Issues
- Verify request body is valid JSON
- Check for required fields in API requests
-
URL Parsing Problems
- Ensure instance URLs are properly formatted and URL-encoded if necessary
- Verify URL parameters use the correct format
-
Instance State Conflicts
- Cannot delete running instances without stopping them first
- Check current instance state with GET before performing actions
-
Permission Issues
- Ensure the NodePass master has sufficient permissions to create processes
- Check file system permissions for any referenced certificates or keys
Data Recovery
Master State File Corruption
Symptoms: Master mode fails to start showing state file corruption errors, or instance data is lost.
Possible Causes and Solutions:
-
Recovery using automatic backup file
- NodePass automatically creates backup file
nodepass.gob.backupevery hour - Stop the NodePass master service
- Copy backup file as main file:
cp nodepass.gob.backup nodepass.gob - Restart the master service
- NodePass automatically creates backup file
-
Manual state file recovery
# Stop NodePass service pkill nodepass # Backup corrupted file (optional) mv nodepass.gob nodepass.gob.corrupted # Use backup file cp nodepass.gob.backup nodepass.gob # Restart service nodepass "master://0.0.0.0:9090?log=info" -
When backup file is also corrupted
- Remove corrupted state files:
rm nodepass.gob* - Restart master, which will create new state file
- Need to reconfigure all instances and settings
- Remove corrupted state files:
-
Preventive backup recommendations
- Regularly backup
nodepass.gobto external storage - Adjust backup frequency: set environment variable
export NP_RELOAD_INTERVAL=30m - Monitor state file size, abnormal growth may indicate issues
- Regularly backup
Best Practices:
- In production environments, recommend regularly backing up
nodepass.gobto different storage locations - Use configuration management tools to save text-form backups of instance configurations
QUIC-Specific Issues
QUIC Connection Failures
Symptoms: QUIC tunnel fails to establish when quic=1 is enabled.
Possible Causes and Solutions:
-
UDP Port Blocked
- Verify UDP port is accessible on both server and client
- Check firewall rules:
sudo ufw allow 10101/udp(Linux example) - Test UDP connectivity with
nc -u server.example.com 10101 - Some ISPs or networks block or throttle UDP traffic
-
TLS Configuration Issues
- QUIC requires TLS to be enabled (minimum
tls=1) - If
quic=1is set but TLS is disabled, system auto-enablestls=1 - For production, use
tls=2with valid certificates - Check certificate validity for QUIC connections
- QUIC requires TLS to be enabled (minimum
-
Client-Server QUIC Mismatch
- Both server and client must use same
quicsetting - Server with
quic=1requires client withquic=1 - Server with
quic=0requires client withquic=0 - Check logs for "QUIC connection not available" errors
- Both server and client must use same
-
Mode Compatibility
- QUIC only works in dual-end handshake mode (mode=2)
- Not available in single-end forwarding mode (mode=1)
- System will fall back to TCP pool if mode incompatible
QUIC Performance Issues
Symptoms: QUIC tunnel has lower performance than expected or worse than TCP pool.
Possible Causes and Solutions:
-
Network Path Issues
- Some networks deprioritize or shape UDP traffic
- Check if network middleboxes are interfering with QUIC
- Consider testing with TCP pool (
quic=0) for comparison - Monitor packet loss rates - QUIC performs better with low loss
-
Pool Capacity Configuration
- Increase
minandmaxparameters for higher throughput - QUIC streams share single UDP connection - adequate capacity needed
- Monitor stream utilization with
log=debug - Balance between stream count and resource usage
- Increase
-
Certificate Overhead
- TLS 1.3 handshake (mandatory for QUIC) can add initial latency
- Use 0-RTT resumption for faster reconnection
- Ensure proper certificate chain to avoid validation delays
-
Application Compatibility
- Some applications may not work optimally over QUIC streams
- Test with both TCP and QUIC pools to compare performance
- Consider TCP pool for applications requiring strict ordering
QUIC Stream Exhaustion
Symptoms: "Insufficient streams" errors or connection timeouts when using QUIC.
Possible Causes and Solutions:
-
Pool Capacity Too Low
- Increase
maxparameter on server side - Increase
minparameter on client side - Monitor active stream count in logs
- Default capacity may be insufficient for high-concurrency scenarios
- Increase
-
Stream Leaks
- Check application properly closes connections
- Monitor stream count over time for gradual increase
- Restart instances to clear leaked streams
- Review application code for connection handling
-
QUIC Connection Dropped
- Check keep-alive settings (configured via
NP_REPORT_INTERVAL) - Monitor for "QUIC connection not available" errors
- NAT timeout may drop UDP connection - adjust NAT settings
- Increase connection timeout if network latency is high
- Check keep-alive settings (configured via
QUIC vs TCP Pool Decision
When to Use QUIC (quic=1):
- Mobile networks or frequently changing network conditions
- High-latency connections (satellite, long-distance)
- NAT-heavy environments where UDP traversal is better
- Real-time applications benefiting from stream independence
- Scenarios where 0-RTT reconnection provides value
When to Use TCP Pool (quic=0):
- Networks that block or severely throttle UDP traffic
- Applications requiring strict TCP semantics
- Corporate environments with UDP restrictions
- Maximum compatibility requirements
- When testing shows better performance with TCP
Comparison Testing:
# Test TCP pool performance
nodepass "server://0.0.0.0:10101/backend:8080?quic=0&mode=2&log=event"
nodepass "client://server:10101/127.0.0.1:8080?quic=0&mode=2&log=event"
# Test QUIC pool performance
nodepass "server://0.0.0.0:10102/backend:8080?quic=1&mode=2&log=event"
nodepass "client://server:10102/127.0.0.1:8081?quic=1&mode=2&log=event"
Monitor traffic statistics and choose based on observed performance.
Next Steps
If you encounter issues not covered in this guide:
- Check the project repository for known issues
- Increase the log level to
debugfor more detailed information - Review the How It Works section to better understand internal mechanisms
- Consider joining the community discussion for assistance from other users