
Re: Cluster configuration issue


Hi Romain,

I have removed the swap and set the heap to 8 GB.

I also tried setting the log level to TRACE, but the result is the same.

Attached are the system and debug files.


Regards

Francesco Messere




On 09/11/2018 20:13, Romain Hardouin wrote:
128GB RAM -> that's good news, you have plenty of room to increase the Cassandra heap size. You can start with, let's say, 12GB in jvm.options, or 24GB if you use G1 GC. Let us know if the node starts and if DEBUG/TRACE is useful. 
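The heap suggestion above could look like this in conf/jvm.options (a sketch: the 12GB value is the one proposed here; the flags themselves are standard JVM options already present, commented out, in Cassandra 3.11's jvm.options):

```
# conf/jvm.options (sketch): fixed 12 GB heap, min == max to avoid resizing pauses
-Xms12G
-Xmx12G

# if switching to G1 GC, uncomment the G1 section of jvm.options as well:
#-XX:+UseG1GC
```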

You can also try the "strace -f -p ..." command to see if the process is doing something when it's stuck, but Cassandra has a lot of threads...

On Friday, November 9, 2018 at 19:13:51 UTC+1, Francesco Messere <f.messere@xxxxxxxxxxxxxx> wrote:


Hi Romain,

yes, I modified the .yaml after the issue.

The problem is this: if I restart a node in DC-FIRENZE, it does not start up. I tried first one node and then the second one,

with the same results.


These are the server resources:

memory: 128 GB


free
              total        used        free      shared  buff/cache   available
Mem:      131741388    13858952    72649704      124584    45232732   116825040
Swap:      16777212           0    16777212


cpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1213.523
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4399.97
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23


There is nothing in the server logs.

On Monday I will activate debug and try again to start up the Cassandra node.

Thanks

Francesco Messere




On 09/11/2018 18:51, Romain Hardouin wrote:
Ok, so all nodes in Firenze are down. I thought only one was KO. 

After a first look at cassandra.yaml, the only issue I saw is the seeds: the line you commented out was correct (one seed per DC). But I guess you modified it after the issue. 
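With one seed per DC, the seed_provider section of cassandra.yaml would look roughly like this (a sketch: the two addresses are picked as examples from the nodetool output in this thread, one per datacenter):

```yaml
# cassandra.yaml (sketch): one seed per DC
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # first IP from DC-FIRENZE, second from DC-MILANO
          - seeds: "192.168.204.175,192.168.71.210"
```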

You should fix the swap issue. 
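Disabling swap could be done like this (a sketch: requires root, and the sed pattern assumes standard whitespace-separated /etc/fstab entries, so verify the result before rebooting):

```shell
# turn off all swap devices immediately
sudo swapoff -a
# comment out swap entries so swap stays off after reboot (keeps a .bak copy)
sudo sed -i.bak '/ swap / s/^/#/' /etc/fstab
```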

Also can you add more heap to Cassandra? By the way, what are the specs of servers (RAM, CPU, etc)? 

Did you check Linux system log? And Cassandra's debug.log?
You can even enable TRACE logs in logback.xml ( https://github.com/apache/cassandra/blob/cassandra-3.11.3/conf/logback.xml#L100 ) then try to restart a node in Firenze to see where it blocks but if it's due to low resources, hardware issue or swap it won't be useful. Let's give a try anyway.
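The TRACE change in conf/logback.xml is roughly this (a sketch: the exact logger line and its location in the file vary by Cassandra version, see the link above):

```xml
<!-- conf/logback.xml (sketch): raise Cassandra's log level to TRACE -->
<logger name="org.apache.cassandra" level="TRACE"/>
```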



On Friday, November 9, 2018 at 18:20:57 UTC+1, Francesco Messere <f.messere@xxxxxxxxxxxxxx> wrote:


Hi Romain,

you are right, it's not possible to work in these towns; fortunately I live in Pisa :-).

I saw the errors and corrected them, except for the swap one.

The process gets stuck; I let it run for a day without results.

This is the output of nodetool status from the nodes that are up and running (DC-MILANO):

/conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 192.168.71.210 -p 17052 status
Datacenter: DC-FIRENZE
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
DN  192.168.204.175  ?          256          100.0%            a3c8626e-afab-413e-a153-cccfd0b26d06  RACK1
DN  192.168.204.176  ?          256          100.0%            67738ca8-f1f5-46a9-9d23-490bbebcffaa  RACK1
Datacenter: DC-MILANO
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.71.210   5.95 GiB   256          100.0%            210f0cdd-abee-4fc0-abd3-ecdab618606e  RACK1
UN  192.168.71.211   5.83 GiB   256          100.0%            96c30edd-4e6c-4952-82d4-dfdf67f6a06f  RACK1

and this is the describecluster command output:

/conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 192.168.71.210 -p 17052 describecluster
Cluster Information:
        Name: CASSANDRA_3
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                6bdd4617-658e-375e-8503-7158df833495: [192.168.71.210, 192.168.71.211]

                UNREACHABLE: [192.168.204.175, 192.168.204.176]

Attached is the cassandra.yaml file.

Regards
Francesco Messere



On 09/11/2018 17:48, Romain Hardouin wrote:
Hi Francesco, it can't work! Milano and Firenze, oh boy, Calcio vs Calcio Storico X-D

Ok more seriously, "Updating topology ..." is not a problem. But you have low resources and system misconfiguration:

  - Small heap size: 3.867GiB
From the logs: "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root."

 - System settings: swap should be disabled, bad system limits, etc.
From the logs: "Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : false"
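The memlock and nproc warnings above are usually addressed in /etc/security/limits.conf (a sketch, assuming Cassandra runs as the "cassandra" user; the values follow commonly recommended production settings, so verify them against your distribution's documentation):

```
# /etc/security/limits.conf (sketch): limits for the cassandra user
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited
```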


You said "Cassandra node did not startup". What is the problem exactly? Is the process stuck, or does it die?
What do you see with "nodetool status" on nodes that are up and running? 

Btw, cassandra-topology.properties is not required with GossipingPropertyFileSnitch (unless you are migrating from PropertyFileSnitch).


Best,

Romain


On Friday, November 9, 2018 at 11:34:16 UTC+1, Francesco Messere <f.messere@xxxxxxxxxxxxxx> wrote:


Hi all,

I have a problem with a distributed cluster configuration.
This is a test environment.
Cassandra version is 3.11.3.
2 sites: Milan and Florence.
2 servers on each site.

1 common "cluster-name" and 2 DCs.

The first installation and startup went fine; all the nodes were present in the cluster.

The issue started after a server reboot in the FLORENCE DC.

The Cassandra node did not start up, and the last line written in system.log is:

INFO  [ScheduledTasks:1] 2018-11-09 10:36:54,306 TokenMetadata.java:498 - Updating topology for all endpoints that have changed



The only way I found to correct this is to clean up the node, remove it from the cluster, and re-join it.

How can I solve it?
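For reference, the remove/re-join workaround described above looks roughly like this (a sketch: the host ID and the nodetool host/port are examples taken from this thread, and the data paths assume a default install; adjust everything to your setup and note that wiping data is destructive):

```shell
# On a live node: remove the dead node by its host ID (from `nodetool status`)
nodetool -h 192.168.71.210 -p 17052 removenode a3c8626e-afab-413e-a153-cccfd0b26d06

# On the failed node, with Cassandra stopped: wipe its local state (default paths assumed)
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*

# Then start Cassandra again so the node bootstraps and re-joins the cluster
```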


Here are the configuration files:

less cassandra-topology.properties
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Cassandra Node IP=Data Center:Rack
192.168.204.175=DC-FIRENZE:RACK1
192.168.204.176=DC-FIRENZE:RACK1
192.168.71.210=DC-MILANO:RACK1
192.168.71.211=DC-MILANO:RACK1

# default for unknown nodes
default=DC-FIRENZE:r1

# Native IPv6 is supported, however you must escape the colon in the IPv6 Address
# Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
# in cassandra-env.sh
#fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3


cassandra-rackdc.properties

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=DC-FIRENZE
rack=RACK1

# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=

# Uncomment the following line to make this snitch prefer the internal ip when possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true


Attached is the system.log file.

Regards

Francesco Messere



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@xxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: user-help@xxxxxxxxxxxxxxxxxxxx




Attachment: cassandra_log.zip
Description: Binary data
