...

  • CPU Usage: Determine where BrightScript code execution is happening
  • Wall-clock time: Determine where the most time (execution and waiting) is being spent while a channel is running
  • Function call counts
  • Memory usage, including memory leak detection – Available since firmware version 9 

...

Manifest Key: bsprof_data_dest
Value Type: enum
Legal Values: local, network
Default Value: local
Required: Yes
Description: If this entry value is local, profiling data is collected on the device and can be downloaded from the Application Installer after the channel terminates. This is the default.
If this entry value is network, the profiling data is sent over the network rather than being stored on the device. See the section on Retrieving Profiling Data for details.

Manifest Key: bsprof_enable
Value Type: boolean
Legal Values: 0, 1
Default Value: 0
Required: Yes
Description: Turns on BrightScript profiling when the channel is running. This is the master flag and must be set to 1 for any other profiling options to take effect.

Manifest Key: bsprof_enable_mem
Value Type: boolean
Legal Values: 0, 1
Default Value: 0
Required: If using memory profiling
Description: Turns on memory profiling. Only has effect if bsprof_enable is set to 1 (bsprof_enable=1).
If this is enabled, bsprof_sample_ratio is forced to be 1.0.

Manifest Key: bsprof_pause_on_start
Value Type: boolean
Legal Values: 0, 1
Default Value: 0
Required: No
Description: Immediately after the channel is launched, profiling is paused until manually resumed with the bsprof-resume command on the port 8080 debug console. Only has effect if bsprof_enable=1.
This is useful for profiling isolated parts of a channel's UI or operations, rather than the entire startup sequence of the channel.

Manifest Key: bsprof_sample_ratio
Value Type: float
Legal Values: 0.001 to 1.0
Default Value: 1.0
Required: Yes
Description: Sets how often profiling samples are taken while the channel is running. Only has effect if bsprof_enable=1.
A sample ratio of 1.0 causes every BrightScript statement to be measured and integrated into the profiling data for the channel. Unfortunately, a sample ratio of 1.0 may cause some complex channels to run very slowly when profiling is enabled, making them difficult to test. Choosing a lower sample ratio for such channels can make them more usable while profiling is enabled. A higher ratio yields more accurate data, however, so it is recommended that the ratio be set to 1.0 whenever possible.
If bsprof_enable_mem=1, this value is forced to be 1.0.


The bsprof_sample_ratio entry can be adjusted from 0.001 to 1.0. A sample ratio of 1.0 is the default and measures every BrightScript statement. A sample ratio of 1.0 has some performance impact, but in most cases it doesn't affect the usability of the channel and provides the most accurate data. However, if the channel is overly sluggish with a ratio of 1.0, reduce the ratio to lessen the profiler's overhead. A lower sample ratio provides less accurate data, so it's recommended to use the highest ratio that still keeps the channel usable.
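
For reference, a minimal manifest sketch that enables CPU, wall-clock, and memory profiling with local data storage might look like the following (the values shown are illustrative; bsprof_sample_ratio is forced to 1.0 anyway when bsprof_enable_mem=1):

bsprof_enable=1
bsprof_enable_mem=1
bsprof_data_dest=local
bsprof_sample_ratio=1.0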

Running the profiler on a channel

To initiate the memory profiler, sideload, run, and then exit the channel. The profiling data is complete only after the channel exits. Note that memory data can be streamed over a network. The advantage of streaming the data over a network is that it consumes significantly less memory on the device while the channel is running.

...

Channel profiling can be paused and resumed at any time. Use the following two commands on port 8080 to either pause or resume the memory profiler:

...

If the profiler is paused, very little data is written regardless of the data destination. This allows the profiling data (generally, the data relevant to specific parts of a channel's UI or other operations) to be collected and analyzed later. These two commands are particularly useful when combined with the bsprof_pause_on_start manifest entry.

...

For example, if starting video playback is slow or seems to cause memory leaks, the bsprof_pause_on_start=1 entry can be set in the channel's manifest. After the channel is launched, but prior to video playback, execute the bsprof-resume command on port 8080 to begin collecting profiling data. After performing the UI operations to be profiled, execute the bsprof-pause command to stop collecting profiling data. Then, exit the channel to make the profiling data available for analysis. In this scenario, the profiling data pertains specifically to the operations performed between bsprof-resume and bsprof-pause.
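
As an illustrative sketch of that workflow (the device IP address shown is a placeholder), the manifest enables profiling but pauses it at launch, and the profiler is then driven from a telnet session on the port 8080 debug console:

bsprof_enable=1
bsprof_pause_on_start=1

telnet 192.168.1.114 8080
bsprof-resume    (issued just before starting video playback, to begin collecting data)
bsprof-pause     (issued after the operations of interest, to stop collecting data)

Exiting the channel then finalizes the data as usual.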

...

The channel's manifest entry bsprof_data_dest determines how the profiling data is retrieved from the device. The data can be stored locally on the device and downloaded after the channel finishes running and exits, or it can be streamed over a network connection while the channel is running.

...

Local data storage is the default destination for profiling data, though it can be explicitly selected by adding bsprof_data_dest=local to the channel's manifest. When using this destination, the data becomes available on the device's Application Installer after the channel exits:

  1. Launch the channel as you normally would and run through the test cases. Once the channel finishes running and exits, open the Roku device's Developer Settings page and click Utilities.



  2. Click Profiling Data to generate a .bsprof file and a link to download the data from the Roku device.



    Note
    The .bsprof format is unique to Roku; it is designed to be as efficient and small as possible, and easy to generate even on low-end Roku devices.

...

In order to stream a channel's profiling data to a network while the channel is running, add bsprof_data_dest=network to the channel's manifest. Streaming data over the network is especially useful when profiling a channel's memory usage because all memory operations are included in the profiling data, and the amount of space necessary to store the data can be very large. By streaming the data to a network, the data size is limited primarily by the host computer receiving the data, not by the available memory on the device itself. Even while streaming the profiling data to the network, profiling still places additional demands on the device's resources compared to running a channel without profiling. However, the use of resources on the device is significantly reduced.
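
For example, a manifest sketch for streaming memory-profiling data over the network might contain:

bsprof_enable=1
bsprof_enable_mem=1
bsprof_data_dest=network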

When this feature is enabled, the start of the channel is delayed until the device receives a network connection, which is the destination for the data. When the channel is launched, a message similar to the following appears on the port 8085 developer console:

08-31 23:15:29.542 [scrpt.prof.connect.wait] Waiting for connection to http://192.168.1.1:8090/bsprof/channel.bsprof

The URL is used with wget, curl, or a web browser. Once a connection is received from one of those programs, the following message appears on the developer console:

08-31 23:15:38.939 [scrpt.prof.connect.ok] profiler connected
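
For example, a curl command that makes that connection and writes the streamed data to a local file might look like this (using the URL reported in the waiting message above):

curl -o channel.bsprof http://192.168.1.1:8090/bsprof/channel.bsprof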

When the channel exits, the following message appears on the developer console:

...

Once that message is seen, the profiling connection is closed by the device and the remote file is populated with the profiling data.

Processing the data

After downloading the .bsprof file, the data can be viewed using the BrightScript Profiler Visualization Tool.

...

  • self is the CPU/wall-clock time the function consumes itself.
  • callees is the amount of time consumed by any functions called by the original function.
  • total is the amount of time consumed by the original function (self) and any callee functions, as summarized below.
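
In other words, for both the CPU and wall-clock metrics:

total = self + callees (for example, self = 0.200 s and callees = 0.800 s gives total = 1.000 s)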

Function call paths

This section of the profiling data contains the function calls in each thread. For SceneGraph applications, each thread corresponds to either the main BrightScript thread or a single instance of a <component>.

For example, if you have a Task node that is instantiated multiple times, each instance appears as a separate thread. The results are the same for any custom <component> in the channel that is instantiated multiple times. The main BrightScript thread (Thread main) is also represented as a single thread even though it has no <component>.

...

Note
CPU time refers to the number of operations each function takes to complete; this number should be equal on low-end and high-end Roku devices.

Wall-clock time

The next three columns list the wall-clock time for:

  • the function itself, 
  • its callees, 
  • and the total. 

Note
Wall-clock time refers to the real-world time that a function takes to complete. This value can vary across different Roku devices. For example, a function may take an equal number of operations to complete across different Roku devices, but a low-end Roku device can take more real-world time to complete one operation than a high-end Roku device.

Function call counts

As the name implies, this column lists the number of times the functions were called while the channel ran with profiling enabled.

Values from memory profiling 

Calls

Number of times a function was called

Cpu.self

CPU* used in a function itself

Cpu.callees

CPU* used in functions called by a function

Cpu.total

Cpu.self + cpu.callees

Mem.self

Memory allocated within a function itself, but not freed (leaks)

Mem.callees

Memory allocated by functions called by a function, but not freed (leaks)

Mem.total

Mem.self + mem.callees

Tm.self

Real (wall-clock) time** spent in a function itself

Tm.callees

Real (wall-clock) time** spent in functions called by a function

Tm.total

Tm.self + tm.callees

Avg.cpu_self

Avg.cpu_callees

Avg.cpu_total

Avg.mem_self

Avg.mem_total

Avg.tm_self

Avg.tm_callees

Avg.tm_total

Average of the metric, over the number of calls (e.g., if cpu.self=100 and calls=2, avg_cpu_self will be 50).

Info

A “memory leak” is simply any memory that is allocated, but not freed while the profiler was running. If memory is freed while profiling is paused, the free memory is not tracked and the memory may show up as “leaked.”

Time is measured as if a stopwatch were used to time the action. For example, if a function makes a network call, there may be very little CPU time used, but a significant amount of time waiting for the network response.


 

If any of these metrics appear in a call path, they are specific to that call path. For example, in this call path:

 

<root>: cpu.self=0,cpu.callees=14700,tm.self=0.000,tm.callees=1.989,mem.self=0,mem.callees=324452,calls=0
+- func1(): pkg:/components/file1.brs:83,cpu.self=200,cpu.callees=14500,tm.self=0.728,tm.callees=1.261,mem.self=5840,mem.callees=318612,calls=1
|  +- func2(): pkg:/components/file2.brs:22,cpu.self=14500,cpu.callees=0,tm.self=1.261,tm.callees=0.000,mem.self=31800,mem.callees=612,calls=1

The metrics for func2() are specific to when it is called from func1().

 

However, in the table below:

 

------------- BEGIN: TOP CONSUMERS: CPU.SELF -----------------
  1: func1(): pkg:/components/file1.brs:83,cpu.self=300,cpu.total=450,tm.self=0.001,tm.total=0.001,mem.self=0,mem.total=0,calls=5
  2: func2(): pkg:/components/file2.brs:22,cpu.self=55430,cpu.total=80500,tm.self=0.126,tm.total=0.126,mem.self=0,mem.total=0,calls=3
-------------- END: TOP CONSUMERS: CPU.SELF -----------------

 

The metrics displayed are the totals for all calls to each function, on any call path.

Using this data

Here are a few key points on how to use this data to improve channel performance:

High wall-clock time but low CPU time
This pattern shows a function that is consistently waiting, whether for input or for a response from an external source. Such functions are best suited for Task nodes so that they don't block the main thread.

Complex functions
Try to simplify the functions as much as possible. If a function handles multiple tasks, consider breaking it into several functions to further isolate how much CPU or wall-clock time is consumed by each task.

Functions that consume a large amount of CPU or wall-clock time
Try to reduce the number of calls to these functions as much as possible. Move functions to Task nodes if they are consistently waiting; a function can be determined to be waiting if its wall-clock time is high but its CPU cost is low (see the sketch below).
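
As a minimal sketch of the Task node recommendation above, a network call that shows high wall-clock time but low CPU time could be moved off the main thread. The component name FetchTask, its url and response fields, and the function name are hypothetical, and a corresponding FetchTask.xml component that extends Task and declares those interface fields is assumed:

' FetchTask.brs: runs on the Task node's own thread
sub init()
    ' name of the function the Task runs when its control field is set to "RUN"
    m.top.functionName = "fetchData"
end sub

sub fetchData()
    ' the wait for the network response happens here, off the render thread
    ut = createObject("roUrlTransfer")
    ut.setUrl(m.top.url)
    m.top.response = ut.getToString()
end sub

The scene that needs the data would create the node with createObject("roSGNode", "FetchTask"), set its url field, observe the response field, and set control to "RUN". In the profiling output, the waiting then appears on that Task instance's thread rather than on Thread main.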

...