Configuring Cassandra



If you have installed Cassandra on a single node, you do not need to change any configuration for it to run. It is still good to know which settings are available and how to use them for your requirements.
Finding the location of the main configuration file :-
The main configuration file for the Cassandra database is cassandra.yaml. Its location depends on how the installation was performed.
 
- Tarball installation
  The configuration files are in the conf directory of the extracted archive. For example, if you extracted the Cassandra tar binaries at '/home/cassandra/', the config files will be at /home/cassandra/apache-cassandra-3.5/conf

- Installation through a package manager (apt-get, yum, etc.)
  Location: /etc/cassandra/
node1:~$ ls -ltr /etc/cassandra/cassandra.yaml
-rw-r--r-- 1 root root 49332 Nov 29 16:02 /etc/cassandra/cassandra.yaml  




Main parameters to configure :-
  • cluster_name :   The name of the cluster. The default name is "Test Cluster"; you can change it as required. cluster_name must be the same on every node in the cluster.
                   node1:~$ grep cluster_name /etc/cassandra/cassandra.yaml
                   cluster_name: 'Test Cluster'
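
    As a small sketch, a renamed cluster might look like the fragment below. The name 'Demo Cluster' is only a hypothetical example, not a value from this setup.

```yaml
# cassandra.yaml (fragment)
# 'Demo Cluster' is a hypothetical name; use the same value on every node.
cluster_name: 'Demo Cluster'
```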

  •  listen_address :     The right value depends on the type of Cassandra deployment.

    (Single node install)

    For a single-node installation you don't need to change anything. The default value is "localhost". Just make sure that the node's hostname is properly configured. You can also set the server's hostname here, but make sure that it resolves to the host's IP address.

    (Multinode install)

     If the cluster is going to have more than one node, the best option is to set this to the host's IP address or hostname.
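
     For a multi-node cluster, the setting could look roughly like this on one node. The address 192.0.2.10 is a placeholder from the documentation range, not a real host.

```yaml
# cassandra.yaml on one node of a multi-node cluster (fragment)
# 192.0.2.10 is a placeholder; replace it with this node's own IP address or hostname.
listen_address: 192.0.2.10
```

     Each node gets its own value here, since the address identifies that node to the rest of the cluster.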

  •  listen_interface :
    Do not set this parameter if you have set listen_address. It sets the name of the network interface (for example, eth0) on which Cassandra listens for other nodes in the cluster. The interface must resolve to exactly one IP address.
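
    A minimal sketch of using the interface name instead of an address. It assumes the node's network interface is called eth0, which may differ on your server.

```yaml
# cassandra.yaml (fragment)
# Set either listen_address or listen_interface, never both.
# listen_address: localhost    # commented out so that listen_interface is used
listen_interface: eth0         # assumed interface name; check yours with `ip addr`
```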
 
  • data_file_directories :
             This is the location where all of your data resides. By default, data is kept at "/var/lib/cassandra/data" if you installed Cassandra using a package manager. For a tarball installation it is install_location/data/data.

You can also move your data to a different mount point. The /var mount point is used for many other purposes; for example, it is the default location for the logs of most software and applications. It is therefore a good idea to separate database I/O from log I/O, so that the database has its own dedicated mount point and does not compete for I/O on that storage.

In my case I have created a mount point /cassandra to store all of the Cassandra database data. It is entirely up to you where you store your data; you can keep it at the default location as well.

       Note : make sure that the /cassandra/data directory exists.
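
With the /cassandra mount point described above, the entry would look something like this. Note that the parameter takes a YAML list, so the path goes on its own line with a leading dash.

```yaml
# cassandra.yaml (fragment)
# The directory must already exist and be writable by the user that runs Cassandra.
data_file_directories:
    - /cassandra/data
```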
 
  •   commitlog_directory :


    This is the location where all your commit logs are stored. The defaults are:
      
    • Cassandra package installations: /var/lib/cassandra/commitlog
    • Cassandra tarball installations: install_location/data/commitlog
     

    If you store data on ordinary HDDs, it is recommended to put the commit log and data directories on two different mount points. If you are using RAID or SSDs, you can keep the commit log and data directories on the same mount point as well. The idea behind separating the two directories is to distribute I/O for better performance.

    In my case I have a separate mount point to store the commit log, i.e. /cassandra/commitlog
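
    For the layout described above, the setting could be written as follows. Unlike data_file_directories, this parameter takes a single path rather than a list.

```yaml
# cassandra.yaml (fragment)
# Separate mount point for the commit log, as used in this setup.
commitlog_directory: /cassandra/commitlog
```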




 
  • endpoint_snitch :
    (Default: org.apache.cassandra.locator.SimpleSnitch) Set to a class that implements the IEndpointSnitch interface. Cassandra uses the snitch to locate nodes and route requests. The available snitches are:


  • SimpleSnitch
    Use for single-datacenter deployment or single-zone deployment in public clouds. Does not recognize datacenter or rack information. Treats strategy order as proximity, which can improve cache locality when you disable read repair.

  • GossipingPropertyFileSnitch
    Recommended for production. Reads the rack and datacenter of the local node from the cassandra-rackdc.properties file and propagates these values to other nodes via gossip. For migration from the PropertyFileSnitch, it uses the cassandra-topology.properties file if it is present.

  • PropertyFileSnitch
    Determines proximity by rack and datacenter, which are explicitly configured in the cassandra-topology.properties file.

  • Ec2Snitch
    For EC2 deployments in a single region. Loads region and availability zone information from the Amazon EC2 API. The region is treated as the datacenter and the availability zone as the rack. Only private IP addresses are used, so this snitch does not work across multiple regions.

  • Ec2MultiRegionSnitch
    Uses the public IP as the broadcast_address to allow cross-region connectivity. This means you must also set seed addresses to the public IP and open the storage_port or ssl_storage_port on the public IP firewall. For intra-region traffic, Cassandra switches to the private IP after establishing a connection.

  • RackInferringSnitch
    Proximity is determined by rack and datacenter, which are assumed to correspond to the 3rd and 2nd octet of each node's IP address, respectively. Best used as an example for writing a custom snitch class (unless this happens to match your deployment conventions).

  • GoogleCloudSnitch
    Use for Cassandra deployments on Google Cloud Platform across one or more regions. The region is treated as a datacenter and the availability zones are treated as racks within the datacenter. All communication occurs over private IP addresses within the same logical network.

  • CloudstackSnitch
    Use the CloudstackSnitch for Apache Cloudstack environments.

    To learn more about snitches, visit : Snitches
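
    As an illustration, selecting the snitch recommended for production might look like the fragment below. The short class name is the form used in the stock cassandra.yaml; the datacenter and rack values themselves are set in cassandra-rackdc.properties, which is not shown here.

```yaml
# cassandra.yaml (fragment)
# Switch from the default SimpleSnitch to the snitch recommended for production.
# Datacenter and rack for this node are read from cassandra-rackdc.properties.
endpoint_snitch: GossipingPropertyFileSnitch
```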

      
  •  rpc_address :
    This is the listen address for client connections. The default value is localhost. You can set it to either of the following :-
  • IP address
  • hostname
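
    A sketch of exposing the node to clients on a specific address. As before, 192.0.2.10 is a placeholder and not a real host.

```yaml
# cassandra.yaml (fragment)
# Address that client applications connect to.
# 192.0.2.10 is a placeholder; a hostname that resolves to this node also works.
rpc_address: 192.0.2.10
```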
 
