The Cassandra failure detector (FD) is responsible for marking a node up or down based on its heartbeat value. Basically, it keeps track of the heartbeat values that come through gossip for each node and periodically checks whether the heartbeat is a monotonically increasing integer. If the value stays stagnant for some period of time, the FD marks the node as down. My question is: what is that period, or how many heartbeat values are checked, to determine that a node is down? Is there any logic by which the FD checks a certain number of heartbeat values every x seconds?
The only configuration option I’ve found is phi_convict_threshold in cassandra.yaml, which adjusts the sensitivity of the failure detector, i.e. how readily a specific node is marked as unavailable.
There’s a suggestion in the documentation to increase the value if you’re working in an unreliable environment: the default value is 8, but you can increase the threshold if there are many false positives.
Increasing it above 12 is not suggested.
Decreasing to lower than 5 is also not recommended.
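To give a feel for what the threshold means in terms of time, here is a minimal sketch of a phi accrual style detector in Python. It is not Cassandra's actual code: the class and function names are made up for illustration, the window size of 1000 samples is an assumption, and it assumes heartbeat inter-arrival times follow an exponential distribution, so that phi grows linearly with the time since the last heartbeat.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector for a single peer (illustrative only)."""

    def __init__(self, threshold=8.0, window_size=1000):
        # threshold plays the role of phi_convict_threshold; window_size is an assumption.
        self.threshold = threshold
        self.intervals = deque(maxlen=window_size)  # sliding window of inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record the arrival time (in seconds) of a new heartbeat from the peer."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of the probability that a heartbeat is merely late,
        under the exponential inter-arrival assumption."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        """The peer is convicted once phi exceeds the threshold."""
        return self.phi(now) > self.threshold


def seconds_to_convict(threshold, mean_interval):
    """Silence (in seconds) needed before phi reaches the threshold in this model."""
    return threshold * math.log(10) * mean_interval
```

In this model there is no fixed count of heartbeats that gets checked. The detector compares the time since the last heartbeat with the average interval seen so far, so with heartbeats arriving about once per second and a threshold of 8, a node would be convicted after roughly 8 * ln(10), or about 18.4 seconds of silence. Raising the threshold lengthens that window, and lowering it shortens it.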
Follow these links for further explanation:
Cassandra failure detection – official documentation
Phi convict threshold – cassandra.yaml reference
Functionality and Phi value explained in this whitepaper