Cassandra Python driver basics: timeout and retry settings

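Cassandra (and its twin, ScyllaDB) is a database system built for low-latency, high-throughput access to large-scale data. From Python you can access it through the driver provided by DataStax (or ScyllaDB's fork of that driver).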

The most basic usage looks like this:

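from cassandra.cluster import Cluster

# Contact points used to discover the rest of the cluster
hosts = ['localhost']

cluster = Cluster(hosts)
session = cluster.connect()
local_query = 'SELECT rpc_address FROM system.local'
# Run the query once per known host; system.local describes the node that
# acted as coordinator, which the load balancing policy picks for each call
for _ in cluster.metadata.all_hosts():
    print(session.execute(local_query).one())

cluster.shutdown()

# Example output (one line per host):
# Row(rpc_address='127.0.0.2')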
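Real queries usually take parameters. The sketch below assumes a hypothetical table `demo.users` with columns `user_id` and `name`, and a hypothetical helper `get_user_name`. Values are passed separately to `session.execute()` and bound to the `%s` placeholders by the driver, instead of being formatted into the CQL string by hand.

```python
def get_user_name(session, user_id):
    # demo.users is a hypothetical table used only for illustration
    query = 'SELECT name FROM demo.users WHERE user_id = %s'
    # The driver binds user_id to the %s placeholder; no manual string formatting
    row = session.execute(query, (user_id,)).one()
    # one() returns the first row, or None when the result is empty
    return row.name if row is not None else None
```

With the `session` from the example above, this would be called as `get_user_name(session, 42)`.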

For finer-grained configuration, use Execution Profiles.

You can define several profiles; the one used when a query does not name a profile is registered under the key EXEC_PROFILE_DEFAULT.

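For example, the following sets request_timeout (the client-side time, in seconds, that the driver waits for a response before raising OperationTimedOut) to 3 seconds on the default profile: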

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

hosts = ['localhost']

# Client-side timeout in seconds for every query that uses this profile
profile = ExecutionProfile(request_timeout=3)

# Register it as the default profile
cluster = Cluster(hosts, execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()
local_query = 'SELECT rpc_address FROM system.local'
for _ in cluster.metadata.all_hosts():
    print(session.execute(local_query).one())

cluster.shutdown()

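To customize retry behavior, subclass RetryPolicy and override the callbacks you need. The following example overrides on_read_timeout, which is called when the coordinator reports a server-side read timeout (a replica did not answer the coordinator in time). This is separate from the client-side request_timeout set in the profile.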

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import RetryPolicy

hosts = ['localhost']

class MyRetryPolicy(RetryPolicy):
    def on_read_timeout(self, query, consistency, required_responses,
                        received_responses, data_retrieved, retry_num):
        # retry_num counts retries already made (0 on the first failure),
        # so this allows at most 3 retries before giving up
        if retry_num >= 3:
            return self.RETHROW, None

        # Retry only if enough replicas responded but the actual data was
        # not returned (typically the replica asked for the data timed out)
        if received_responses >= required_responses and not data_retrieved:
            return self.RETRY, consistency

        # Otherwise re-raise the ReadTimeout to the caller
        return self.RETHROW, None

profile = ExecutionProfile(request_timeout=3,
                           retry_policy=MyRetryPolicy())

cluster = Cluster(hosts, execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()
local_query = 'SELECT rpc_address FROM system.local'
for _ in cluster.metadata.all_hosts():
    print(session.execute(local_query).one())

cluster.shutdown()