Thursday, June 11, 2015

OS X Yosemite sshd disconnection

If you have long-running ssh sessions into your OS X machine*, you need to disable App Nap for sshd:

defaults write sshd NSAppSleepDisabled -bool YES

*I often mount an OS X host share over ssh from virtual machines.

Friday, January 2, 2015

Flush a file cached in memory on Windows

If you're trying to benchmark the IO cost of file reads on Windows, you'll often find the contents of a file are cached in memory.

Here are two methods to flush this cache and ensure your code runs as slowly as possible:

Flush the entire file cache:

The command is 'Empty Standby List'.

Flush a specific file before you read it:


Tuesday, November 11, 2014

Direct3D9 D3DImage sample code problem with depth buffering

Microsoft's sample code [1] for D3DImage will not work for depth buffering as-is. There is an article on CodeProject [2] which has the same problem.

ZeroMemory(&d3dpp, sizeof(d3dpp));
d3dpp.Windowed = TRUE;
d3dpp.BackBufferFormat = D3DFMT_UNKNOWN;
d3dpp.BackBufferHeight = 1; // invalid
d3dpp.BackBufferWidth = 1; // invalid

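You must set the presentation parameters' back buffer size to the size of your render target. With EnableAutoDepthStencil, the depth-stencil surface is created at the back buffer size, and Direct3D9 requires the depth-stencil surface to be at least as large as the render target: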
d3dpp.BackBufferHeight = height; // correct
d3dpp.BackBufferWidth = width; // correct

[1] Walkthrough: Creating Direct3D9 Content for Hosting in WPF

[2] Introduction to D3DImage

Monday, November 18, 2013

Local Mercurial, Remote Subversion

I have a new remote developer who works overseas. The typical Subversion workflow is cripplingly slow even over a local network, so we use git-svn, Git's Subversion bridge. The overseas developer wants to use Mercurial, and fortunately HgSubversion exists.

We extract some Subversion revision properties as part of our build process with either svn info or git svn info.

However, hg svn info contacts the remote repository every time! To avoid this, you can use the svnrev and svnpath template keywords that HgSubversion adds to hg log:

hg log --limit 1 --template="Revision: {svnrev}\nPath: {svnpath}\nRepository Root: Root\nURL: Root{svnpath}"

Thursday, January 24, 2013

FFTW 3.3.3 on ARM

Performance increases roughly an order of magnitude from ARM soft float to ARM hard float, and another from ARM hard float to Intel.

Toradex's T20: NVIDIA Tegra 2 without NEON:

ib256 = in-place (input array overwritten with output), backward transform of size 256
ob256 = out-of-place (input array preserved), backward transform of size 256
// gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41)
// Soft float, -fPIC
./bench -s ib256
Problem: ib256, setup: 7.16 s, time: 143.97 us, ``mflops'': 71.127

// gcc version 4.7.3 20121001 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2012.10-20121022 - Linaro GCC 2012.10)
// Hard float, -fPIC
Problem: ib256, setup: 7.20 s, time: 13.83 us, ``mflops'': 740.52
Problem: ob256, setup: 3.38 s, time: 12.45 us, ``mflops'': 822.8

Toradex's T30: NVIDIA Tegra 3 with NEON:

// gcc version 4.7.3 20121001 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2012.10-20121022 - Linaro GCC 2012.10)
// Hard float, -fPIC, -mfpu=neon
Problem: ib256, setup: 6.98 s, time: 10.85 us, ``mflops'': 943.9
Problem: ob256, setup: 3.25 s, time: 9.87 us, ``mflops'': 1037.6
// Hard float, -fPIC, -mfpu=neon, --enable-neon
Problem: ib256, setup: 10.09 s, time: 8.14 us, ``mflops'': 1258.2
Problem: ob256, setup: 5.15 s, time: 7.18 us, ``mflops'': 1425.1

An i7-950 running an Ubuntu VM:

// --enable-sse2, -fPIC, -m32
Problem: ib256, setup: 42.86 ms, time: 845.89 ns, ``mflops'': 12106
Problem: ob256, setup: 21.93 ms, time: 665.65 ns, ``mflops'': 15383
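FFTW's bench does not count the floating-point operations it actually executes; its ``mflops'' figure is a normalized speed, 5 N log2 N divided by the time for one transform in microseconds, which is what makes the numbers above comparable across machines. The sketch below just encodes that formula for a power-of-two complex transform; the function names are hypothetical, not part of FFTW.

```c
/* log2 of a power of two, computed without libm. */
int log2_pow2(unsigned n)
{
    int k = 0;
    while (n >>= 1)
        k++;
    return k;
}

/* Nominal "mflops" as reported by FFTW's bench for a complex transform
 * of size n: 5 * n * log2(n) divided by the time per transform in
 * microseconds. Assumes n is a power of two. */
double fftw_nominal_mflops(unsigned n, double time_us)
{
    return 5.0 * n * log2_pow2(n) / time_us;
}
```

For N = 256 the numerator is 5 * 256 * 8 = 10240, so the T20 soft float time gives 10240 / 143.97 ≈ 71.13, matching the reported figure to within rounding. The ratios of the reported figures back up the order-of-magnitude claim: 740.52 / 71.127 ≈ 10.4 from soft float to hard float on the T20, and 12106 / 1258.2 ≈ 9.6 from the best T30 NEON result to the i7.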

Friday, August 31, 2012

High resolution timers on ARM

Our embedded device currently runs Linux without high resolution timers and CONFIG_HZ=100.

I compiled the 2.6.33 kernel with CONFIG_HIGH_RES_TIMERS=y and no other changes.

I wanted to know a little more about how clock interrupts are handled with this kernel option. Either clock interrupts must happen more often than every 10 ms to support higher resolution timers, or nanosleep must busy-wait in the kernel whenever no other process is runnable.
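A complementary check from user space, sketched under the assumption of a POSIX system (older glibc needs -lrt for clock_gettime): clock_getres() reports the CLOCK_MONOTONIC resolution, which is typically 1 ns when high resolution timers are active and one jiffy when they are not, and timing a batch of short nanosleep() calls shows how far they overshoot. Without high resolution timers a 250 us request would be expected to take on the order of a jiffy; with them it should land much closer to 250 us. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Resolution reported for CLOCK_MONOTONIC, in nanoseconds, or -1 on
 * error. Typically 1 ns with high resolution timers active, and one
 * jiffy (10000000 ns at CONFIG_HZ=100) without them. */
long long monotonic_res_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Average time, in nanoseconds, that nanosleep() really takes when asked
 * to sleep req_ns nanoseconds (req_ns < 1 s), over `loops` calls. */
long long avg_nanosleep_ns(long req_ns, int loops)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < loops; i++) {
        struct timespec left = { .tv_sec = 0, .tv_nsec = req_ns };
        struct timespec rem;
        /* If a signal interrupts the sleep, sleep for the remainder. */
        while (nanosleep(&left, &rem) == -1 && errno == EINTR)
            left = rem;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long long total = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                    + (t1.tv_nsec - t0.tv_nsec);
    return total / loops;
}
```

A caller might print monotonic_res_ns() alongside avg_nanosleep_ns(250000L, 100) on the device, once with and once without CONFIG_HIGH_RES_TIMERS.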

Checking the interrupt count in /proc/interrupts before and after a one-second sleep:
cat /proc/interrupts; sleep 1; cat /proc/interrupts
 26:     628010          SC  ost0
 26:     628114          SC  ost0

That is 104 interrupts in one second, or one roughly every 10 ms, the jiffy period. The clock event source seems to have the same interrupt frequency as before. That can't be right, so...

I forced interrupts to happen more often with a 250 us interval timer (see the end of the post for the code):

 26:     642285          SC  ost0                            
kernel timer interrupt frequency is approx. 4016 Hz or higher
 26:     643307          SC  ost0                       

That is 1022 interrupts during the program's execution, which lasts about a quarter of a second (1000 signals at 250 us); a 100 Hz tick would produce only about 25 in that time. Clearly the interrupt-generating clock must be reprogrammed when a wait time shorter than a jiffy is requested.
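Comparing two cat snapshots by eye works, but the same check is easy to script. A minimal sketch, assuming the usual /proc/interrupts layout (an IRQ label and colon, one counter column per CPU, then the interrupt controller name); parse_irq_line and irq_count are hypothetical helpers, not an existing API.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the total count on one /proc/interrupts line if its label is
 * `irq` (e.g. "26"), summing one counter column per CPU; else -1. */
long long parse_irq_line(const char *line, const char *irq)
{
    size_t len = strlen(irq);
    const char *p = line;
    long long total = 0;

    while (isspace((unsigned char)*p))
        p++;
    /* The label is everything before the first ':' on the line. */
    if (strncmp(p, irq, len) != 0 || p[len] != ':')
        return -1;
    p += len + 1;

    /* Sum the per-CPU counters, stopping at the first non-numeric token
     * (the interrupt controller name, e.g. "SC"). */
    for (;;) {
        char *end;
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}

/* Scan an open /proc/interrupts stream for `irq`; -1 if not found.
 * Assumes each line fits in the buffer. */
long long irq_count(FILE *f, const char *irq)
{
    char line[4096];
    while (fgets(line, sizeof line, f) != NULL) {
        long long n = parse_irq_line(line, irq);
        if (n >= 0)
            return n;
    }
    return -1;
}
```

To reproduce the one-second check, open /proc/interrupts, call irq_count(f, "26"), close it, sleep(1), and read it again; the difference is the number of ost0 interrupts in that second. With the snapshots in this post that is 628114 - 628010 = 104 for the idle second and 643307 - 642285 = 1022 across the timer test.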

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define USECREQ 250  /* requested timer period, in microseconds */
#define LOOPS   1000 /* number of SIGALRM deliveries to time */

static volatile sig_atomic_t cnt = 0;
static struct timeval tsFirst, tsNow;

/* SIGALRM handler: only counts and timestamps. The result is printed
 * from main() because printf() is not async-signal-safe. */
static void event_handler (int signum)
{
  (void) signum;
  if (cnt >= LOOPS)
    return;                /* ignore a late signal after disarming */
  if (cnt == 0)
    gettimeofday (&tsFirst, NULL);
  if (++cnt == LOOPS) {
    struct itimerval off;
    memset (&off, 0, sizeof (off));
    setitimer (ITIMER_REAL, &off, NULL);  /* disarm the timer */
    gettimeofday (&tsNow, NULL);
  }
}

int main (void)
{
  struct sigaction sa;
  struct itimerval timer;
  struct timeval diff;

  memset (&sa, 0, sizeof (sa));
  sa.sa_handler = &event_handler;
  sigemptyset (&sa.sa_mask);
  sigaction (SIGALRM, &sa, NULL);

  timer.it_value.tv_sec = 0;
  timer.it_value.tv_usec = USECREQ;
  timer.it_interval.tv_sec = 0;
  timer.it_interval.tv_usec = USECREQ;
  setitimer (ITIMER_REAL, &timer, NULL);

  while (cnt < LOOPS)
    ;                      /* busy wait until the handler has run LOOPS times */

  /* LOOPS signals span LOOPS - 1 intervals after the first timestamp */
  timersub (&tsNow, &tsFirst, &diff);
  double elapsed = diff.tv_sec + diff.tv_usec / 1e6;
  int hz = (int) ((LOOPS - 1) / elapsed);
  printf ("kernel timer interrupt frequency is approx. %d Hz", hz);
  if (hz >= 1000000 / USECREQ) {
    printf (" or higher");
  }
  printf ("\n");
  return 0;
}