Assigning Multiple Applications/annotations to Objects
Copying table values to the Clipboard
Querying Logs
Troubleshooting Linux Acquisition Unit Problems
Troubleshooting Windows Acquisition Unit Problems
Researching Data Collector Errors
Data Collector Support Matrix
Data ONTAP 9.2 and later versions. For best performance, use a Data ONTAP version where
this issue
is fixed.
SMB protocol version 3.1 and earlier.
NFS protocol version 4.0 and earlier
Flexgroup is supported from ONTAP 9.4 and later versions
ONTAP Select is supported
SVM has several sub-types. Of these, only
default
,
sync_source
, and
sync_destination
are supported.
An Agent
must be configured
before you can configure data collectors.
Make sure that you have a properly configured User Directory Connector, otherwise events will show encoded user names and not the actual name of the user (as stored in Active Directory) in the “Activity Forensics” page.
For optimal performance, you should configure the FPolicy server to be on the same subnet as the storage system.
Set a password for the SVM vsadmin user. If using custom user or cluster admin user, skip this step.
clustershell::>
security login password -username vsadmin -vserver svmname
Unlock the SVM vsadmin user for external access. If using custom user or cluster admin user, skip this step.
clustershell::>
security login unlock -username vsadmin -vserver svmname
Ensure the firewall-policy of the data LIF is set to ‘mgmt’ (not ‘data’). Skip this step if using a dedicated management lif to add the SVM.
clustershell::>
network interface modify -lif <SVM_data_LIF_name> -firewall-policy mgmt
When a firewall is enabled, you must have an exception defined to allow TCP traffic for the port using the Data ONTAP Data Collector.
See
Agent requirements
for configuration information. This applies to on-premise Agents and Agents installed in the Cloud.
When an Agent is installed in an AWS EC2 instance to monitor a Cloud ONTAP SVM, the Agent and Storage must be in the same VPC. If they are in separate VPCs, there must be a valid route between the VPC’s.
Permissions when adding via
Cluster Management IP
:
If you cannot use the Cluster management administrator user to allow Workload Security to access the ONTAP SVM data collector, you can create a new user named “csuser” with the roles as shown in the commands below. Use the username “csuser” and password for “csuser” when configuring the Workload Security data collector to use Cluster Management IP.
To create the new user, log in to ONTAP with the Cluster management Administrator username/password, and execute the following commands on the ONTAP server:
security login role create -role csrole -cmddirname DEFAULT -access readonly
security login role create -role csrole -cmddirname "vserver fpolicy" -access all
security login role create -role csrole -cmddirname "volume snapshot" -access all -query "-snapshot cloudsecure_*"
security login role create -role csrole -cmddirname "event catalog" -access all
security login role create -role csrole -cmddirname "event filter" -access all
security login role create -role csrole -cmddirname "event notification destination" -access all
security login role create -role csrole -cmddirname "event notification" -access all
security login role create -role csrole -cmddirname "security certificate" -access all
security login create -user-or-group-name csuser -application ontapi -authmethod password -role csrole
security login create -user-or-group-name csuser -application ssh -authmethod password -role csrole
Permissions when adding via
Vserver Management IP
:
If you cannot use the Cluster management administrator user to allow Workload Security to access the ONTAP SVM data collector, you can create a new user named “csuser” with the roles as shown in the commands below. Use the username “csuser” and password for “csuser” when configuring the Workload Security data collector to use Vserver Management IP.
To create the new user, log in to ONTAP with the Cluster management Administrator username/password, and execute the following commands on the ONTAP server. For ease, copy these commands to a text editor and replace the <vservername> with your Vserver name before and executing these commands on ONTAP:
security login role create -vserver <vservername> -role csrole -cmddirname DEFAULT -access readonly
security login role create -vserver <vservername> -role csrole -cmddirname "vserver fpolicy" -access all
security login role create -vserver <vservername> -role csrole -cmddirname "volume snapshot" -access all
security login create -user-or-group-name csuser -application ontapi -authmethod password -role csrole -vserver <vservername>
Cluster / SVM Management IP Address
The IP address for the cluster or the SVM, depending on your selection above.
SVM Name
The Name of the SVM (this field is required when connecting via Cluster IP)
Username
User name to access the SVM/Cluster
When adding via Cluster IP the options are:
1. Cluster-admin
2. ‘csuser’
3. AD-user having similar role as csuser.
When adding via SVM IP the options are:
4. vsadmin
5. ‘csuser’
6. AD-username having similar role as csuser.
Password
Password for the above user name
Filter Shares/Volumes
Choose whether to include or exclude Shares / Volumes from event collection
Enter complete share names to exclude/include
Comma-separated list of shares to exclude or include (as appropriate) from event collection
Enter complete volume names to exclude/include
Comma-separated list of volumes to exclude or include (as appropriate) from event collection
Monitor Folder Access
When checked, enables events for folder access monitoring. Note that folder create/rename and delete will be monitored even without this option selected. Enabling this will increase the number of events monitored.
Set ONTAP Send Buffer size
Sets the ONTAP Fpolicy send buffer size. If an ONTAP version prior to 9.8p7 is used and performance issue is seen, then the ONTAP send buffer size can be altered to get improved ONTAP performance. Contact NetApp Support if you do not see this option and wish to explore it.
At any moment of time, one data collector should be in running, another will be in error.
The current ‘running’ SVM’s data collector will show as
Running
. The current ‘stopped’ SVM’s
data collector will show as
Error
.
Whenever there is a switchover, the state of the data collector will change from ‘running’ to ‘error’ and vice versa.
It will take up to two minutes for the data collector to move from Error state to Running state.
If using service policy from ONTAP version 9.9.1, in order to connect to the Data Source Collector, the
data-fpolicy-client
service is required along with the data service
data-nfs
, and/or
data-cifs
.
Example:
Testcluster-1::*> net int service-policy create -policy only_data_fpolicy -allowed-addresses 0.0.0.0/0 -vserver aniket_svm
-services data-cifs,data-nfs,data,-core,data-fpolicy-client
(network interface service-policy create)
In versions of ONTAP prior to 9.9.1,
data-fpolicy-client
need not be set.
If the Data Collector is in
Running
state, you can Pause collection. Open the "three dots" menu for the collector and select PAUSE. While the collector is paused, no data is gathered from ONTAP, and no data is sent from the collector to ONTAP. This means no Fpolicy events will flow from ONTAP to the data collector, and from there to Cloud Insights.
Note that if any new volumes, etc. are created on ONTAP while the collector is Paused, Workload Security won’t gather the data and those volumes, etc. will not be reflected in dashboards or tables.
Keep the following in mind:
EMS events (like ONTAP ARP) won’t be processed on a paused collector. This means if ONTAP identifies a ransomware attack, Cloud Insights Workload Security won’t be able to acquire that event.
Health notifications emails will NOT be sent for a paused collector.
Manual or Automatic actions (such as Snapshot or User Blocking) will not be supported on a paused collector.
On agent or collector upgrades, agent VM restarts/reboots, or agent service restart, a paused collector will remain in
Paused
state.
If the data collector is in
Error
state, the collector cannot be changed to
Paused
state. The Pause button will be enabled only if the state of the collector is
Running
.
If the agent is disconnected, the collector cannot be changed to
Paused
state. The collector will go into
Stopped
state and the Pause button will be disabled.
Data Collector runs for some time and stops after a random time, failing with: "Error message: Connector is in error state. Service name: audit. Reason for failure: External fpolicy server overloaded."
The event rate from ONTAP was much higher than what the Agent box can handle. Hence the connection got terminated.
Check the peak traffic in CloudSecure when the disconnection happened. This you can check from the
CloudSecure > Activity Forensics > All Activity
page.
If the peak aggregated traffic is higher than what the Agent Box can handle, then please refer to the Event Rate Checker page on how to size for Collector deployment in an Agent Box.
If the Agent was installed in the Agent box prior to 4 March 2021, run the following commands in the Agent box:
echo 'net.core.rmem_max=8388608' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 2097152 8388608' >> /etc/sysctl.conf
sysctl -p
Restart the collector from the UI after resizing.
Collector reports Error Message: “No local IP address found on the connector that can reach the data interfaces of the SVM”.
This is most likely due to a networking issue on the ONTAP side. Please follow these steps:
1. Ensure that there are no firewalls on the SVM data lif or the management lif which are blocking the connection from the SVM.
2. When adding an SVM via a cluster management IP, please ensure that the data lif and management lif of the SVM are pingable from the Agent VM. In case of issues, check the gateway, netmask and routes for the lif.
You can also try logging in to the cluster via ssh using the cluster management IP, and ping the Agent IP. Make sure that the agent IP is pingable:
network ping -vserver <vserver name> -destination <Agent IP> -lif <Lif Name> -show-detail
If not pingable, make sure the network settings in ONTAP are correct, so that the Agent machine is pingable.
3. If you have tried connecting via Cluster IP and it is not working, try connecting directly via SVM IP. Please see above for the steps to connect via SVM IP.
4. While adding the collector via SVM IP and vsadmin credentials, check if the SVM Lif has Data plus Mgmt role enabled. In this case ping to the SVM Lif will work, however SSH to the SVM Lif will not work.
If yes, create an SVM Mgmt Only Lif and try connecting via this SVM management only Lif.
5. If it is still not working, create a new SVM Lif and try connecting through that Lif. Make sure that the subnet mask is correctly set.
6. Advanced Debugging:
a) Start a packet trace in ONTAP.
b) Try to connect a data collector to the SVM from CloudSecure UI.
c) Wait till the error appears. Stop the packet trace in ONTAP.
d) Open the packet trace from ONTAP. It is available at this location
https://<cluster_mgmt_ip>/spi/<clustername>/etc/log/packet_traces/
e) Make sure there is a SYN from ONTAP to the Agent box.
f) If there is no SYN from ONTAP then it is an issue with firewall in ONTAP.
g) Open the firewall in ONTAP, so that ONTAP is able to connect the agent box.
7. If it is still not working, please consult the networking team to make sure that no external firewall is blocking the connection from ONTAP to the Agent box.
8. Verify that port 7 is open.
9. If none of the above solves the issue, open a case with
Netapp Support
for further assistance.
Message: "Failed to determine ONTAP type for [hostname: <IP Address>. Reason: Connection error to Storage System <IP Address>: Host is unreachable (Host unreachable)"
1. Verify that the correct SVM IP Management address or Cluster Management IP has been provided.
2. SSH to the SVM or the Cluster to which you are intending to connect. Once you are connected ensure that the SVM or the Cluster name is correct.
Error Message: "Connector is in error state. Service.name: audit. Reason for failure: External fpolicy server terminated."
1. It is most likely that a firewall is blocking the necessary ports in the agent machine. Verify the port range 35000-55000/tcp is opened for the agent machine to connect from the SVM. Also ensure that there are no firewalls enabled from the ONTAP side blocking communication to the agent machine.
2. Type the following command in the Agent box and ensure that the port range is open.
sudo iptables-save | grep 3500*
Sample output should look like:
-A IN_public_allow -p tcp -m tcp --dport 35000 -m conntrack -ctstate NEW -j ACCEPT
3. Login to SVM, enter the following commands and check that no firewall is set to block the communication with ONTAP.
system services firewall show
system services firewall policy show
Check firewall commands
on the ONTAP side.
4. SSH to the SVM/Cluster which you want to monitor. Ping the Agent box from the SVM data lif (with CIFS, NFS protocols support) and ensure that ping is working:
network ping -vserver <vserver name> -destination <Agent IP> -lif <Lif Name> -show-detail
If not pingable, make sure the network settings in ONTAP are correct, so that the Agent machine is pingable.
5.If a single SVM is added twice added to a tenant via 2 data collectors, then this error will be shown. Delete one of the data collectors thru the UI. Then restart the other data collector thru the UI. Then the data collector will show “RUNNING” status and will start receiving events from SVM.
Basically, in a tenant, 1 SVM should be added only once, via 1 data collector. 1 SVM should not added twice via 2 data collectors.
6. In instances where the same SVM was added in two different Workload Security environments (tenants), the last one will always succeed. The second collector will configure fpolicy with its own IP address and kick out the first one. So the collector in the first one will stop receiving events and its "audit" service will enter into error state.
To prevent this, configure each SVM on a single environment.
7. This error may also occur if service policies are not configured correctly. With ONTAP 9.8 or later, in order to connect to the Data Source Collector, the data-fpolicy-client service is required along with the data service data-nfs, and/or data-cifs. Additionally, the data-fpolicy-client service must be associated with the data lif(s) for the monitored SVM.
No events seen in activity page.
1. Check if ONTAP collector is in “RUNNING” state. If yes, then ensure that some cifs events are being generated on the cifs client VMs by opening some files.
2. If no activities are seen, please login to the SVM and enter the following command.
<SVM>event log show -source fpolicy
Please ensure that there are no errors related to fpolicy.
3. If no activities are seen, please login to the SVM. Enter the following command
<SVM>fpolicy show
Check if the fpolicy policy named with prefix “cloudsecure_” has been set and status is “on”. If not set, then most likely the Agent is unable to execute the commands in the SVM. Please ensure all the prerequisites as described in the beginning of the page have been followed.
SVM Data Collector is in error state and Errror message is “Agent failed to connect to the collector”
1. Most likely the Agent is overloaded and is unable to connect to the Data Source collectors.
2. Check how many Data Source collectors are connected to the Agent.
3. Also check the data flow rate in the “All Activity” page in the UI.
4. If the number of activities per second is significantly high, install another Agent and move some of the Data Source Collectors to the new Agent.
SVM Data Collector shows error message as "fpolicy.server.connectError: Node failed to establish a connection with the FPolicy server "12.195.15.146" ( reason: "Select Timed out")"
Firewall is enabled in SVM/Cluster. So fpolicy engine is unable to connect to fpolicy server.
CLIs in ONTAP which can be used to get more information are:
event log show -source fpolicy which shows the error
event log show -source fpolicy -fields event,action,description which shows more details.
Check firewall commands
on the ONTAP side.
Error Message: “Connector is in error state. Service name:audit. Reason for failure: No valid data interface (role: data,data protocols: NFS or CIFS or both, status: up) found on the SVM.”
Ensure there is an operational interface (having role as data and data protocol as CIFS/NFS.
The data collector goes into Error state and then goes into RUNNING state after some time, then back to Error again. This cycle repeats.
This typically happens in the following scenario:
1. There are multiple data collectors added.
2. The data collectors which show this kind of behavior will have 1 SVM added to these data collectors. Meaning 2 or more data collectors are connected to 1 SVM.
3. Ensure 1 data collector connects to only 1 SVM.
4. Delete the other data collectors which are connected to the same SVM.
Connector is in error state. Service name: audit. Reason for failure: Failed to configure (policy on SVM svmname. Reason: Invalid value specified for 'shares-to-include' element within 'fpolicy.policy.scope-modify: "Federal'
The share names need to be given without any quotes. Edit the ONTAP SVM DSC configuration to correct the share names.
Include and exclude shares
is not intended for a long list of share names. Use filtering by volume instead if you have a large number of shares to include or exclude.
There are existing fpolicies in the Cluster which are unused. What should be done with those prior to installation of Workload Security?
It is recommended to delete all existing unused fpolicy settings even if they are in disconnected state. Workload Security will create fpolicy with the prefix "cloudsecure_". All other unused fpolicy configurations can be deleted.
CLI command to show fpolicy list:
fpolicy show
Steps to delete fpolicy configurations:
fpolicy disable -vserver <svmname> -policy-name <policy_name>
fpolicy policy scope delete -vserver <svmname> -policy-name <policy_name>
fpolicy policy delete -vserver <svmname> -policy-name <policy_name>
fpolicy policy event delete -vserver <svmname> -event-name <event_list>
fpolicy policy external-engine delete -vserver <svmname> -engine-name <engine_name>
After enabling Workload Security, ONTAP performance is impacted: Latency becomes sporadically high, IOPs become sporadically low.
While using ONTAP with Workload Security sometimes latency issues can be seen in ONTAP. There are a number of possible reasons for this as noted in the following:
1372994
,
1415152
,
1438207
,
1479704
,
1354659
. All of these issues are fixed in ONTAP 9.13.1 and later; it is strongly recommended to use one of these later versions.
Data collector is in error, shows this error message.
“Error: Connector is in error state. Service name: audit. Reason for failure: Failed to configure policy on SVM svm_test. Reason: Missing value for zapi field: events. “
Start with a new SVM with only NFS service configured.
Add an ONTAP SVM data collector in Workload Security. CIFS is configured as an allowed protocol for the SVM while adding the ONTAP SVM Data Collector in Workload Security.
Wait until the Data collector in Workload Security shows an error.
Since the CIFS server is NOT configured on the SVM, this error as shown in the left is shown by Workload Security.
Edit the ONTAP SVM data collector and un-check CIFs as allowed protocol. Save the data collector. It will start running with only NFS protocol enabled.
Data Collector shows the error message:
“Error: Failed to determine the health of the collector within 2 retries, try restarting the collector again (Error Code: AGENT008)”.
1. On the Data Collectors page, scroll to the right of the data collector giving the error and click on the 3 dots menu. Select
Edit
.
Enter the password of the data collector again.
Save the data collector by pressing on the
Save
button.
Data Collector will restart and the error should be resolved.
2. The Agent machine may not enough CPU or RAM headroom, that is why the DSCs are failing.
Please check the number of Data Collectors which are added to the Agent in the machine.
If it is more than 20, please increase the CPU and RAM capacity of the Agent machine.
Once the CPU and RAM is increased, the DSCs will get into Initializing and then to Running state automatically.
Look into the sizing guide on
this page
.