Recently in Computer Category

iAudio X5 Tagging

| 2 Comments | No TrackBacks
The Cowon iAudio X5 is, in many ways, the ideal music player for the discerning geek.  It's Linux friendly, behaving as a normal USB mass storage device requiring no special software nor drivers; it plays Ogg/Vorbis and FLAC files; it has a line in connector; it has fantastic sound quality; it organises your files by folder, so you can have your files organised how you really want them.  It also has a battery which lasts ages, particularly the X5L version.  When my X5L was younger, a full charge would last for about two weeks of fairly constant use at work.  Now it's several years old - I've actually forgotten how old exactly - and it still lasts for many days.

It does, however, have a particular weirdness when handling Ogg/Vorbis files.  Its tag parser is extremely basic, and apparently just skips one character (the equals sign) after finding the letters "ARTIST" in the tags.  But most recent ripping software adds an extra field, called ARTISTSORT.  The X5 often finds that instead of ARTIST, and thinks the name of the artist is something like "ORT=Haza, Ofra" instead of "Ofra Haza".

So, I wrote a little script to remove the ARTISTSORT tags altogether.  It's a cheap solution - it'd probably be better just to move that tag to the end (possibly the start, I didn't really test) of the tags field, but this field isn't used for sorting on the X5 anyway.

Here's the script ("ogg-retag").  It's really simple:

#!/bin/sh

if [ -e ogg.tags ]; then
        echo "ogg.tags file exists.  Please remove it first."
        exit 1
fi

for FILENAME in "$@"; do

        echo "Processing "$FILENAME

        # Dump tags to file, removing ARTISTSORT tag
        vorbiscomment -l "$FILENAME" | grep -v "ARTISTSORT=" \
                > ogg.tags

        # Write tags back
        vorbiscomment -w -c ogg.tags "$FILENAME"

done

rm -f ogg.tags

To run, simply "cd" into the folder containing the problematic Ogg/Vorbis files, then run "ogg-retag *.ogg" or similar.

Easier Partial "DCommitting"

| No Comments | No TrackBacks
I've been making extensive use of Git-SVN recently, because I vastly prefer the user interface of Git to that of SVN, yet am collaborating on a project which uses SVN.  My workflow consists of keeping a set of commits on top of the commits from SVN, which do things like add .gitignore files (hint: use git svn create-ignore to make these automatically), or contain code that I'm not ready to show other people just yet, or don't ever want to release.

This isn't the most advanced use of git-svn possible, but it's quite neat, simple and difficult to make mistakes with.  When I described the method on IRC to someone who was asking about exactly this situation, it sounded like my method was of interest.  I promised to do a short write-up, and here it is.

My workflow consists of three simple(ish) aliases.  The first one is the simplest, for updating to the latest SVN trunk:

git stash && git svn rebase && git stash pop

This just stashes any local changes to the working tree, then rebases the unpublished, "floating" commits on top of the latest commits from SVN.

The second is used when I have one or more commits to add to the SVN repository. Having previously created a branch "for-commit" (it doesn't matter where it's based initially), I open "gitk" and make sure it's up to date.  It probably looks something like this:

before.png
Then I invoke the second alias, which is:

git stash && git checkout for-commit && git reset --hard remotes/git-svn

This stashes local changes, checks out the "for-commit" branch and resets it to the most recent known head of the Subversion trunk.  Then I refresh my gitk window, which then looks like this:

during-1.pngNothing too exciting has happened, but we're now on the "for-commit" branch which points in a sensible place.  I right-click* each commit that's to be committed to Subversion, and cherry-pick them onto the "for-commit" branch:

during-2.png
So, now the "for-commit" branch has diverged from "master" because the commits have been re-written by the cherry-pick operation.  Now comes the clever(ish) part, which is the third alias:

git svn dcommit && git checkout master && git svn rebase && git stash pop

This commits the contents of the for-commit branch to SVN, then switches back to master and rebases it on top of the latest SVN branch, which now includes the commits just added.  Git-SVN is clever enough to realise that some of the commits, but not all, are now in SVN.  Gitk looks like this after a final update:

after.png
There are a couple of "sidings" in the history.  The lower of the two is the old location of the "floating" commits on master.  The upper one is the previous location of the for-commit branch.  You could tidy them up by restarting gitk, but I find they're useful for keeping track of what I did.  If you look closely, you'll notice that someone else added a commit to the Subversion branch between the last time I updated and the final commit, and git-svn dealt with that too.

So there you go.  Maybe that's useful to someone.

* I don't actually "right-click", since I'm left-handed.  I also use a trackball instead of a mouse.  You should get one - they're awesome.

OpenCL Calculation and Reduction

| No Comments | No TrackBacks
Otherwise known as "calculating a long list of numbers then adding them all up".  For a GPGPU (OpenCL) simulation program at work, I needed to calculate around 160 numbers which would be averaged to produce one result for storage in a 1024x1024 element array.  That's 160 numbers for each of 1024x1024 pixels, which would be a lot to store as an intermediate result for a later step of averaging on the GPU, or (heaven forbid) to be copied back to system memory.

The magic word to search for in tackling this is reduction, and there's plenty of hardcore compsci knowledge about how to make it go as fast as possible in a parallel environment.  But, basically the trick is to have 160x1024x1024 threads operate in groups of 160 (one group for each of the 1024x1024 overall elements).  Threads cooperating like this can share memory, and each thread writes its individual value to an array in that local memory.  Then, one of the 160 threads adds up all the values and does a single write of the final average value to the global array.  For the kernel to test if it's running the "chosen thread" is as simple as something like this:

if ( get_local_id(0) == 0 )

The only bit of "funny business" is that each of the 160 threads has to have finished calculating before the results can be added.  That's done with this statement, which guarantees that all previous local memory writes have completed for all threads: 

barrier(CLK_LOCAL_MEM_FENCE);

This is a really simple example: for one thread to do all of the averaging is a waste of resources when the reduction itself could be parallelised.  In that case, one thread would (say) add up values 0-79 while another added up 80-159, then one of those threads would (after another barrier) add up the remaining two values.  It's easy to see how it can be broken down more and more, and there are variations which make better use of the GPU resources, avoid memory conflicts, and so on.

So, if you'd ever heard of the thread groups and local memory used in OpenCL (also CUDA) and wondered what they were good for, now you know..

NVidia's OpenCL Programming Guide has a lot of discussion of this topic, and there's loads more to be found around the web.

OpenStreetMap's Role in Haiti

| No Comments | No TrackBacks
This is very cool.  Volunteer contributors used satellite imagery, which was made available especially, to create detailed maps of Haiti which the aid workers used to help find their way.

Gitolite

| No Comments | No TrackBacks
I've just spent a few hours rearranging my Git repositories to use Gitolite, something I've been meaning to do for a long time. This gives me much better control over access permissions, potentially letting me give other people access to my repository without handing over shells and/or real user accounts.

I've done some testing, but things might have broken.  If they have, address your complaints to the usual address..

New Toys

| 2 Comments | No TrackBacks
So, I'm now the proud owner of a Lenovo Thinkpad T500 with WSXGA+ screen (1680x1050), 7200rpm HDD, Core 2 Duo, UK keyboard and 9-cell battery. Exactly to specification, and within budget :D

I've also ordered myself a second Freerunner.  This was a bit less of a budgeted expense, but I have no real shortage of money at the moment (thanks to not drinking regularly).  I realised that the reason for my recent lack of productivity wasn't time as such, rather the faff involved with switching into a very unstable environment for development then having to go back to a usable setup at the end of a "session".  Developing in "sessions" like this seems to be a nice way to avoid getting anything done at all - I had the same problem at the start of my DRM work when we changed to Linux 2.6.29 from 2.6.24.  It's much better to be able to work in a semi-continuous stream as time allows.

There's another reason for this purchase though.  I'm affected strongly by the infamous Freerunner buzz problem in Germany, whereas I didn't notice it back in the UK.  I was going to send my FR in to get both the buzz and #1024 (standby time) fixes done, but I've decided instead just to buy a new one with both fixes already.  Then I'll use the new one day-to-day while my current one becomes a development platform, installed with all the latest and most unstable software I can find, so that I can stomp on the nastiest bugs with some degree of comfort.

And there's one more new toy:  A 32TB RAID6 array with 4 optical fibre channel connections for storing and analysing our data on at work.  All my analyses just went from being I/O limited to being firmly CPU bound..

Code Offsetting

| No Comments | No TrackBacks
I just came across The Alliance for Code Excellence, where you can offset bad code you've written in the past with donations which support open-source projects which "decrease the propagation of bad code".

New Laptop Time

| 2 Comments | No TrackBacks
{I am getting} {scary messages} in {curly brackets} in my laptop's {dmesg}.  The universal warning signal for imminent hard drive death and data loss.  I get dropouts of about 30 seconds at a time with no hard drive activity (before the kernel realises and reset the link), during which time the computer is mostly frozen (no HDD I/O possible), and this seems to be happening more and more often.  In addition, the power connector is broken - the central pin in the laptop's connector snapped off.  Since the pin stays fixed in the hole in the adaptor's plug, it still just about works if it's carefully pushed in and the cable wrapped round to put pressure in the right way.  However, I don't know how long either of these will hold out.

Of course I'm backed up to the hilt with distributed version control, so I'm not in immediate danger of losing anything particularly important.  However, it's apparent that I'll need to buy a new laptop in the near future.  At the moment I'm looking at a Lenovo Thinkpad T500 with WSXGA+ (1680x1050) screen and Radeon graphics, but does anyone have any other suggestions?  My non-negotiable requirements are:

  • Linux-friendly wifi and graphics.
  • Dual core, or at least HT.  This really does make a huge difference.
  • Widescreen.  1680x1050 with a 15.4" screen gives a resolution I like.
  • UK keyboard layout (i.e. UK market, ideally with delivery to Germany possible).
  • DVD drive.
Ideally it would also have:

  • Decent battery life, or the possibility to buy spare or larger batteries during the next few years once the original one becomes a plastic box of jelly.
  • VGA output.
  • Fast-ish hard drive (7200rpm or higher.  I'm not sure how much of a difference this makes, but I do a lot of compiling and so on.  No need to go overboard with solid-state disks for hundreds of extra pounds.
I'm not too bothered about:

  • Bluetooth (I don't use it at the moment).
  • Huge hard drive - I get on fine with only about 80Gb at the moment.
Any suggestions on a comment or email to this address..
While debugging something different late last night, I noticed some flags in one of Glamo's registers which looked interesting: FIFO settings for the LCD engine.  This reminded me of an observation by Lars a few weeks ago that the LCD engine seems to conflict with Glamo's 2D engine on memory accesses, leading to slower performance of accelerated 2D operations when the screen is switched on.  So I turned the FIFO up to "8 stages" (from 1) to see what happened.  The result was much faster 2D operations - literally twice the speed!

At "8 stages", the price of this was that the display became jittery and unstable.  However, the same speed improvement is seen at the "4 stages" setting. I've also seen some occasional artifacts with this setting, so I'm using 2 stages at the moment, where the speed is still right up there.  I'll be testing some more and seeing if things can be tuned even more.

Because we don't make the maximum use possible of the 2D engine, this doesn't immediately translate into a huge increase in the UI speed.  But the differences are very obvious with x11perf and some of my test programs. The program I showed in the screenshot recently jumped from 45-48fps right up to 95-98fps!

"Look Ma, No Busywaits!"

| 5 Comments | No TrackBacks
When the CPU needs to do something which depends on a result which the GPU is currently working on, it has to wait for the GPU to catch up.  One of the biggest problems with the current architecture of xf86-video-glamo, both DRM and non-DRM versions, is that they do this waiting by spinning in a tight loop, each time checking the current status of the GPU, until it's caught up.  This isn't great for a few reasons.  It makes no use of the parallelism between the CPU and the GPU, so precious CPU time is being wasted while something more useful could be being done.  If there's nothing else to do, then the CPU could be sleeping - reducing power consumption.

Most GPUs, including Glamo, have a mechanism for being a little smarter.  The kernel can ask the chip to trigger an interrupt when a certain point in the command queue has been reached.  When a process needs to wait, the kernel can send it to sleep and watch out for the interrupt.  When it happens, the process can be quickly woken back up in a low-latency fashion, meaning that the process gets back to work with very little latency.

This week, I've been implementing this kind of thing for the Glamo DRM driver.  It goes a bit like this:

  • Process submits some rendering commands via one of the command submission ioctls.
  • Kernel driver places rendering commands on Glamo's command queue.
  • Process needs to wait for the GPU to catch up, so calls the wait ioctl.
  • Kernel driver puts an extra sequence of commands, called a fence, onto the command queue.  A unique number is associated with the fence.  The number is recorded by the kernel.
  • When the GPU processes the fence, it raises the interrupt and places a unique number into a certain register.
  • The interrupt handler checks this number, and wakes up the corresponding process.
I wrote a test program which tells Glamo to fill the whole screen with colour as fast as it can, waiting for the GPU to catch up each time.  The task was to make the program run with close to zero CPU usage while still getting the full framerate that I could get using busywaits.  The task was achieved successfully, and here's a screenshot to prove it.  The framerate - just below 50fps when doing fills of the entire VGA screen - was exactly the same with busywaits.  It even went up a little (to 50-51fps) when I improved the interrupt handling.

Things aren't always so great.  When the command sequence to be executed is very short, the overheads of fencing and scheduling become significant, and the overall rate drops.  However, it shouldn't be too difficult to design some kind of heuristic to use busywaits as a low-latency strategy in such cases.

There are still a few problems to iron out.  The fence mechanism seems to be able to fall out of sync with things, leading to processes waiting for too long (or even forever).  But when it works, some things do seem to feel a little faster in general use.

Geeks may be interested in the actual code.

June 2010

Sun Mon Tue Wed Thu Fri Sat
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30      

About this Archive

This page is an archive of recent entries in the Computer category.

Everything Else is the next category.

Find recent content on the main index or look in the archives to find all content.