If you have installed Cassandra on a single node, you don't need to change any configuration for that installation to work. But it's good to know which configuration options are available and how to use them according to your requirements.
Finding the location of the main configuration file :-
The main configuration file for the Cassandra database is cassandra.yaml. Its location depends on how the installation was performed.
- For a tarball installation, the configuration files are in the conf directory of the extracted folder. For example, if you extracted the Cassandra tar binaries at '/home/cassandra/', the config files will be at the location below:
/home/cassandra/apache-cassandra-3.5/conf
- For an installation done through a package manager (apt-get, yum, etc.), the location is /etc/cassandra/
node1:~$ls -ltr /etc/cassandra/cassandra.yaml
-rw-r--r-- 1 root root 49332 Nov 29 16:02 /etc/cassandra/cassandra.yaml
Main parameters to configure :-
- cluster_name : This is the name of the cluster. The default name is "Test Cluster". You can change it as per your requirement, but cluster_name must be the same on all nodes in the cluster.
node1:~$grep cluster_name /etc/cassandra/cassandra.yaml
cluster_name: 'Test Cluster'
- listen_address : The value depends on the type of Cassandra configuration.
(Single node install)
For a single node installation you don't need to change anything. The default value is "localhost". Just make sure that the node's hostname is properly configured. You can also provide the hostname of the server, but make sure that the hostname resolves to the host IP.
(Multinode install)
If you are going to have more than one node in the cluster, the best option is to set this to the IP address or hostname of the host, so that the other nodes can reach it.
- listen_interface : Do not set this parameter if you have set listen_address. It sets the name of the network interface (for example, eth0) on which Cassandra listens for other nodes in the cluster. The interface must resolve to only one IP address.
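As a rough sketch of the multi-node case, the relevant lines in cassandra.yaml could look like the fragment below. The address 192.168.1.101 and the interface name eth0 are placeholders assumed for illustration, not values from a real cluster. Only one of the two parameters should be active at a time.

```yaml
# cassandra.yaml (fragment) on one node of a multi-node cluster.
# 192.168.1.101 is an assumed placeholder; use this node's own IP or resolvable hostname.
listen_address: 192.168.1.101

# Alternative: name the interface instead. Keep this commented out while listen_address is set.
# listen_interface: eth0
```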
- data_file_directories :
This is the location where all of your data resides. By default, data is kept at "/var/lib/cassandra/data" if you installed Cassandra using a package manager. If you installed from a tarball, it is install_location/data/data.
You can change the data location to a different mount point as well. The /var mount point is used for a lot of other purposes; for example, it is the default location for the logs of most software and applications. So it's a good idea to separate your database I/O from log I/O, so that the database has its own dedicated mount point for all of its data and its I/O does not compete with other writers on that storage.
In my case I have created a mount point /cassandra to store all of the Cassandra database data. It's totally up to you where you store your data; you can keep it at the default location as well.
Note : make sure that the /cassandra/data directory exists.
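For the /cassandra mount point used in this post, the setting in cassandra.yaml would look roughly like the fragment below. The path is only my own layout, so substitute yours. The parameter takes a list, which is why the directory appears as a list item. The directory also has to be writable by the user that runs Cassandra.

```yaml
# cassandra.yaml (fragment): data directory on a dedicated mount point.
# /cassandra/data is the example path from this post, not a default.
data_file_directories:
    - /cassandra/data
```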
- commitlog_directory :
This is the location where all your commit logs will be stored.
- Cassandra package installations: /var/lib/cassandra/commitlog
- Cassandra tarball installations: install_location/data/commitlog
If you are using normal HDDs to store data, it is recommended to put the commit log and the data directory on two different mount points (ideally backed by different physical disks). If you are using RAID or SSDs, you can keep the commit log and the data directory on the same mount point as well. The idea behind separating the two directories is to distribute I/O for better performance.
In my case I have a separate mount point to store the commit log, i.e. /cassandra/commitlog
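A minimal sketch of that setting, assuming the /cassandra/commitlog path from my setup. Unlike data_file_directories, this parameter takes a single path rather than a list.

```yaml
# cassandra.yaml (fragment): commit log on its own mount point.
# /cassandra/commitlog is the example path from this post, not a default.
commitlog_directory: /cassandra/commitlog
```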
- endpoint_snitch :
(Default: org.apache.cassandra.locator.SimpleSnitch) Set to a class that implements the IEndpointSnitch interface. Cassandra uses the snitch to locate nodes and route requests.
- SimpleSnitch: Use for single-datacenter deployment or single-zone deployment in public clouds. Does not recognize datacenter or rack information. Treats strategy order as proximity, which can improve cache locality when you disable read repair.
- GossipingPropertyFileSnitch: Recommended for production. Reads rack and datacenter for the local node from the cassandra-rackdc.properties file and propagates these values to other nodes via gossip. For migration from the PropertyFileSnitch, uses the cassandra-topology.properties file if it is present.
- PropertyFileSnitch: Determines proximity by rack and datacenter, which are explicitly configured in the cassandra-topology.properties file.
- Ec2Snitch: For EC2 deployments in a single region. Loads region and availability zone information from the Amazon EC2 API. The region is treated as the datacenter and the availability zone as the rack. It uses only private IP addresses, and for this reason it does not work across multiple regions.
- Ec2MultiRegionSnitch: Uses the public IP as the broadcast_address to allow cross-region connectivity. This means you must also set seed addresses to the public IP and open the storage_port or ssl_storage_port on the public IP firewall. For intra-region traffic, Cassandra switches to the private IP after establishing a connection.
- RackInferringSnitch: Proximity is determined by rack and datacenter, which are assumed to correspond to the 3rd and 2nd octet of each node's IP address, respectively. Best used as an example for writing a custom snitch class (unless this happens to match your deployment conventions).
- GoogleCloudSnitch: Use for Cassandra deployments on Google Cloud Platform across one or more regions. The region is treated as a datacenter and the availability zones are treated as racks within the datacenter. All communication occurs over private IP addresses within the same logical network.
- CloudstackSnitch: Use the CloudstackSnitch for Apache Cloudstack environments.
To know more about snitches, visit : Snitches
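As an illustration of picking one of the snitches listed above, the fragment below selects GossipingPropertyFileSnitch. This is only a sketch of one possible choice, not a recommendation for every setup. With this snitch, each node's own datacenter and rack are set separately in cassandra-rackdc.properties, as dc= and rack= entries.

```yaml
# cassandra.yaml (fragment): the snitches bundled with Cassandra can be
# referenced by their short class name.
endpoint_snitch: GossipingPropertyFileSnitch
```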
- rpc_address :
The default value for this parameter is localhost. This is the listen address for client connections. You can set it to one of the values below :-
- IP address
- hostname
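For example, to let clients on other machines connect to the node, the setting might look like the fragment below. The address 192.168.1.101 is an assumed placeholder; use an IP address or hostname of your own node.

```yaml
# cassandra.yaml (fragment): address on which the node accepts client connections.
# 192.168.1.101 is a placeholder, not a value from a real cluster.
rpc_address: 192.168.1.101
```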