This guide is written for the Island's operators, so that they can find answers quickly and learn the best practices for resolving problems. The guide is organized by CMF.

CMFs: OCF, OXA, OMF

To help the operator find the right answer, the common problems are summarized below:

OCF

Components: Expedient, FlowVisor, NetFPGA, OXA


FlowVisor

The user's slice is unable to start. What should I do?

#1 - Verify whether the Flowvisor process has stopped working.

1.1 Analysis: The Flowvisor process has stopped.

Sometimes the user will be unable to start the slice. This is most often because the Flowvisor process has stopped.

To verify whether Flowvisor has stopped, run the following command:

ps ax | grep -i flowvisor

If the command does not list any Flowvisor processes (apart from the grep command itself, which may appear in the output), Flowvisor has crashed and must be started again.
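The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name count_flowvisor is hypothetical, and it assumes the Flowvisor command line contains the word "flowvisor".

```shell
#!/bin/sh
# Sketch: count Flowvisor processes in `ps ax`-style output read from stdin.
# The [f] bracket keeps the grep command itself from being counted.
count_flowvisor() {
    grep -ic '[f]lowvisor'
}

# Hypothetical usage on the Flowvisor VM:
#   if [ "$(ps ax | count_flowvisor)" -eq 0 ]; then
#       echo "Flowvisor is not running"
#   fi
```

A count of 0 corresponds to the crashed case described above.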

1.1 Solution:

For safety, we usually restart the Flowvisor service instead of just starting it:

/etc/init.d/flowvisor restart

1.2 Analysis: The process is stuck at 99% CPU usage

The process can also be stuck at 99% CPU usage. To verify whether this is the case, we strongly recommend the top tool:

top

If a Java process is running at 99% CPU usage, the Flowvisor process is stuck.
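For a non-interactive check, a rough sketch along the following lines may help. The helper name busy_java and the default threshold of 95 are assumptions made for illustration. Note also that the pcpu value reported by ps is averaged over the lifetime of the process, so top remains the more reliable check for a process that became stuck only recently.

```shell
#!/bin/sh
# Sketch: read "PCPU COMMAND" lines (for example from `ps -eo pcpu,comm`)
# on stdin and print the java processes at or above the threshold in $1.
busy_java() {
    awk -v limit="${1:-95}" '$2 == "java" && $1 + 0 >= limit + 0 { print }'
}

# Hypothetical usage on the Flowvisor VM:
#   ps -eo pcpu,comm | busy_java 95
```

Any line printed points to a Java process that deserves a closer look with top.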

1.2 Solution:

Following the previous analysis, restart the Flowvisor service to resolve the problem:

/etc/init.d/flowvisor restart

1.3 Analysis: Flowvisor is not working properly

In some cases Flowvisor appears in the process list and is not stuck at 99% CPU usage, yet it still does not work properly. To check for this situation, execute the following command:

fvctl-xml getLinks

If there is no output, the Flowvisor service has stopped working.
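The "no output" test can be sketched as a small shell helper. The name has_output is hypothetical, and the usage line assumes that fvctl-xml getLinks runs without prompting for input in your installation.

```shell
#!/bin/sh
# Sketch: succeed only if stdin contains at least one non-whitespace character.
has_output() {
    grep -q '[^[:space:]]'
}

# Hypothetical usage on the Flowvisor VM:
#   if ! fvctl-xml getLinks | has_output; then
#       echo "getLinks returned nothing; Flowvisor needs a restart"
#   fi
```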

1.3 Solution:

If the analysis command returns no output, the Flowvisor service must be restarted:

/etc/init.d/flowvisor restart

 

#2 - The hard disk of the Flowvisor VM is full.

2.1 Analysis:

This situation happens when the Flowvisor logs have filled the entire hard disk.

To check the remaining disk space, use this command:

df -h

However, it is highly advisable to use a monitoring tool such as Zenoss or Zabbix, or the NOC's monitoring tool.
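Where no monitoring tool is available yet, a simple threshold check can be sketched as follows. The helper name full_filesystems and the 90% default are assumptions. The sketch relies on the POSIX output format of df -P and does not handle mount points that contain spaces.

```shell
#!/bin/sh
# Sketch: read `df -P` output on stdin and print every mount point whose
# usage is at or above the percentage given in $1 (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Hypothetical usage on the Flowvisor VM:
#   df -P | full_filesystems 90
```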

2.2 Solution:

To solve this situation, follow this procedure:

Stop the Flowvisor service:

/etc/init.d/flowvisor stop

Then go to the Flowvisor log directory and remove the logs:

cd /var/log/flowvisor/
rm *.log

It is also advisable to configure the logrotate tool to prevent the problem from recurring. An example logrotate configuration is shown below.

Create a file in this directory:

touch /etc/logrotate.d/flowvisor

After that, use this configuration as a template for the logrotate operation:

/var/log/flowvisor/flowvisor-db.log /var/log/flowvisor/flowvisor-stderr.log {
   weekly
   size 1M
   copytruncate
   rotate 10
   compress
   maxage 100
   missingok
}

Be aware that when this situation happens in a federated environment, at least three (3) Flowvisors must be verified:

  • Island A
  • NOC
  • Island B

If your Flowvisor is not the faulty one, contact the NOC operator or the other Island's operator.

 

Expedient

The user starts the slice but the experiment doesn't work. What should I do?

#1 - Verify the user's experiment

1.1 Analysis: The user is creating a loop topology.

Although a loop topology is not a problem in itself, depending on the application running on the controller (for example, a learning switch), it may prevent the experiment from working.

1.1 Solution:

If necessary, guide the user through the correct creation of the slice (First Experiment Doc to be created).

1.2 Analysis: The controller is not using the correct port

Depending on the chosen controller, the default port may be different from 6633.
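One way to confirm which port the controller is actually listening on is sketched below. The helper name port_listening is hypothetical, and the sketch assumes that the ss tool is available on the host where the user's controller runs.

```shell
#!/bin/sh
# Sketch: read `ss -ltn` output on stdin and succeed if any TCP socket
# is listening on the port given in $1.
port_listening() {
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }'
}

# Hypothetical usage on the controller host:
#   ss -ltn | port_listening 6633 || echo "nothing is listening on 6633"
```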

1.2 Solution:

For the list of the correct ports and supported controllers, click on this link (doc of controllers to be created).

1.3 Analysis: The chosen VLAN is incorrect

Depending on where the experiment is running, VLAN restrictions may apply. The rules for this scenario are listed below:

1.3 Solution:

It's necessary to rebook the OpenFlow resources, choose the correct VLAN, and update the slice again.

If the experiment is still not working after all of the analyses above, other scenarios must be verified, such as problems with the NetFPGA servers, or the ToR switch or the Pronto switch being stuck.

 

OXA

The user's virtual machine won't start. What should I do?

#1 - Verify the OCF configuration

1.1 Analysis:

Sometimes the misbehaviour of the user's virtual machine is related to how the OCF's Virtual Aggregate Manager (VTAM) and the Ofelia Xen Agent (OXA) were configured. To verify this, compare the configuration of both components.

On the OCF's VM (10.XXX.0.100, where XXX stands for the Island ID), check the VTAM's configuration:

 

vim /opt/ofelia/vt_manager/src/python/vt_manager/mySettings.py

And verify these fields:

XMLRPC_USER = "admin"
XMLRPC_PASS = "12345678"
VTAM_IP = "10.XXX.0.100"
VTAM_PORT = "8445"

As mentioned above, OXA's configuration must be checked as well:

vim /opt/ofelia/oxa/bin/mySettings.py

Verify these fields:

VTAM_IP = "10.XXX.0.100"
VTAM_PORT = "8445"
XMLRPC_USER = "admin"
XMLRPC_PASS = "12345678"

If the two configuration files do not have the same values in the fields listed above, this is the probable cause of the misbehaviour of the user's virtual machine.
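The comparison can also be done mechanically. The sketch below assumes that copies of both mySettings.py files have been placed on the same host; the helper names and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: print the four shared settings from a mySettings.py file,
# with whitespace removed and lines sorted so that ordering does not matter.
shared_settings() {
    grep -E '^[[:space:]]*(XMLRPC_USER|XMLRPC_PASS|VTAM_IP|VTAM_PORT)[[:space:]]*=' "$1" |
        sed 's/[[:space:]]//g' | sort
}

# Succeed only if both files agree on all four fields.
settings_match() {
    [ "$(shared_settings "$1")" = "$(shared_settings "$2")" ]
}

# Hypothetical usage, with both files copied to the current directory:
#   settings_match vtam_mySettings.py oxa_mySettings.py || echo "settings differ"
```

Comparing only these four fields avoids false alarms from settings that legitimately differ between the two components.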

1.1 Solution:

If any of the fields is misconfigured, make the two configuration files match each other.

 

#2 - Verify the OXA service
Sometimes the OXA service may crash.


If the procedures above do not work, it may be necessary to restart the whole virtualization server (dom0). Use this option as a last resort.

The procedure for this scenario is listed below:

  • Announce the expected maintenance time in our Maintenance Calendar.
  • Notify our Operators' and Users' lists about the stoppage of the Island's service.
  • Stop all the Virtual Machines.
  • Reboot the server.


NetFPGA

The user starts the slice but the experiment doesn't work. What should I do?

#1 - Verify NetFPGA's OpenFlow service