Allocating Camera memory faster on Android Part Two

March 28, 2014 by Edison Wang in android

In Part 1, I talked about how to avoid GCs so you can get reasonable speeds when taking frames from the Android Camera's onPreviewFrame method and processing them without losing any. It was basically as follows (let's call this Method A):

1. Get faster memory allocation for the small pieces of memory (byte[]) with the tricks mentioned there. The number of byte[] needed for the slowest device is the maximum number of frames to process (N).

2. Put the frame from onPreviewFrame to another thread.

3. The other thread processes the data and then gives the buffer back to the Camera.

It turns out there is another way to do it that's much faster, Method B:

1. Get faster memory allocation for the small pieces of memory (byte[]) with the tricks mentioned there. The number of byte[] needed for the slowest device is about 10.

2. Put the frame from onPreviewFrame into a large shared ByteBuffer queue that's big enough to fit the maximum number of frames to process (generate this queue with ByteBuffer.allocateDirect(N * singleFrameSize)), and then give the buffer back to the Camera.

3. Another thread manages the queue independently of the onPreviewFrame thread (processing frames, dropping frames under pressure, etc.).
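As a rough illustration of step 2 in Method B, here is a minimal sketch of a fixed-size frame queue backed by a single direct ByteBuffer. The class name FrameRing and its methods are hypothetical, not the code Vine shipped; it only assumes that every frame has the same size and that dropping a frame when the queue is full is acceptable.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-capacity FIFO of equally sized frames in one direct ByteBuffer. */
final class FrameRing {
    private final ByteBuffer store; // allocated once, outside the Java heap
    private final int frameSize;
    private final int capacity;
    private int head = 0;           // slot index of the oldest queued frame
    private int count = 0;          // number of frames currently queued

    FrameRing(int capacity, int frameSize) {
        this.capacity = capacity;
        this.frameSize = frameSize;
        this.store = ByteBuffer.allocateDirect(capacity * frameSize);
    }

    /** Copies a frame in; returns false (frame dropped) when the ring is full. */
    synchronized boolean offer(byte[] frame) {
        if (frame.length != frameSize || count == capacity) {
            return false;
        }
        int slot = (head + count) % capacity;
        ByteBuffer view = store.duplicate(); // same memory, independent position
        view.position(slot * frameSize);
        view.put(frame, 0, frameSize);
        count++;
        return true;
    }

    /** Copies the oldest frame into dst; returns false when the ring is empty. */
    synchronized boolean poll(byte[] dst) {
        if (dst.length != frameSize || count == 0) {
            return false;
        }
        ByteBuffer view = store.duplicate();
        view.position(head * frameSize);
        view.get(dst, 0, frameSize);
        head = (head + 1) % capacity;
        count--;
        return true;
    }

    synchronized int size() {
        return count;
    }
}
```

In this sketch the preview callback would call offer(data) and hand the byte[] straight back to the Camera, while the processing thread calls poll(...) with its own scratch array.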

Comparison on cold launch:

Method A: Requires generation of N byte[] in Java, and a total of N * singleFrameSize bytes.

Method B: Requires generation of 10 byte[] in Java, one memory block allocation of (N + 1) * singleFrameSize bytes, and a total of (N + 11) * singleFrameSize bytes.

Even if done right, Method A can still trigger lots of GCs, averaging about (0.1 * N) seconds. For 180 frames, and assuming that with the large-chunk-first trick only the last 20% of the allocations trigger a GC, that comes to about 3s. With Method B, the allocation cost is basically about 0.1 * 11, so it takes only about 1s.
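To make the memory side of the comparison concrete, here is a small sketch that just evaluates the two totals from the comparison above. The class and method names are made up for illustration, and the 1MB frame size in the usage note is an assumption taken from the "about 1MB" figure in Part One.

```java
/** Hypothetical helper that evaluates the cold-launch totals quoted above. */
final class ColdLaunchCost {
    /** Method A: N byte[] on the Java heap, N * singleFrameSize bytes in total. */
    static long methodABytes(int n, long singleFrameSize) {
        return n * singleFrameSize;
    }

    /** Method B: 10 byte[] plus one (N + 1)-frame block, (N + 11) frames in total. */
    static long methodBBytes(int n, long singleFrameSize) {
        return (n + 11) * singleFrameSize;
    }

    /** Number of separate Java byte[] allocations each method needs. */
    static int methodAArrays(int n) {
        return n;
    }

    static int methodBArrays() {
        return 10;
    }
}
```

With N = 180 and 1MB frames, Method A comes to 180 arrays and 188,743,680 bytes, while Method B comes to 10 arrays and 200,278,016 bytes. Method B spends 11 extra frames of memory to avoid 170 small heap allocations.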

Allocating Camera memory faster on Android Part One

September 08, 2013 by Edison Wang in android


One thing we learned while building the capture part of Vine for Android was how to deal with all the raw buffers needed to satisfy the stop-motion requirements. (According to Instagram, they were able to use the native MediaRecorder, with a 700ms+ delay on start and a minimum duration, but Vine can't afford that if it wants to do stop motion.) And because we can't use MediaRecorder, other libraries are linked in to do the encoding.

In order to use the raw buffers, setPreviewCallbackWithBuffer is used in place of setPreviewCallback, and addCallbackBuffer must be called to add a minimum number of frame buffers before or during preview. This way buffers are not generated at run time, so there is no lag during recording (which causes serious frame drops). For Vine, we take the frames and put them on a concurrent queue; another thread takes the buffers from the queue, processes each frame, and then gives the buffer back to the Camera. So for a 6 second 30fps video, a maximum of 180 frames will be needed if the user records one single long clip. Therein lies the problem: 180 frames of raw bytes is a lot to allocate up front. Each frame is about 1MB, so allocating them all at once will likely cause OOM and turns out to be really slow. But let's look at the iterations we went through to minimize the problem, as well as how to make everything else faster.
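The hand-off described above (preallocate, queue the filled buffer, process it on another thread, give it back) can be sketched in plain Java without any Android classes. BufferRecycler and its methods are hypothetical names for illustration only; on a real device the "free" side is owned by the Camera, and the return step is a call to addCallbackBuffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical plain-Java model of the buffer recycling loop. */
final class BufferRecycler {
    private final BlockingQueue<byte[]> free;    // stands in for buffers held by the Camera
    private final BlockingQueue<byte[]> pending; // filled frames waiting for the worker

    BufferRecycler(int bufferCount, int frameSize) {
        free = new ArrayBlockingQueue<>(bufferCount);
        pending = new ArrayBlockingQueue<>(bufferCount);
        // Allocate everything up front so nothing is allocated while recording.
        for (int i = 0; i < bufferCount; i++) {
            free.add(new byte[frameSize]);
        }
    }

    /** Camera side: take a preallocated buffer, or null if none is left (frame dropped). */
    byte[] acquire() {
        return free.poll();
    }

    /** Callback side: queue the filled buffer for the worker without copying it. */
    void onFrame(byte[] filled) {
        pending.add(filled);
    }

    /** Worker side: handle one queued frame, then recycle its buffer. */
    boolean processOne() {
        byte[] frame = pending.poll();
        if (frame == null) {
            return false;
        }
        // ... convert, manipulate, encode, and write the frame here ...
        free.add(frame); // on Android, this is where addCallbackBuffer(frame) would be called
        return true;
    }

    int freeCount() {
        return free.size();
    }
}
```

The number of buffers needed is whatever keeps acquire() from returning null while the worker falls behind, which is why processing speed decides how many frames have to be allocated.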

Naive solution: Add 180 frames prior to startPreview, guaranteeing 180 frames for all phones. Do all the allocations and the initialization of classes and objects when the user starts recording.

Result: GC_ALLOC happens, OOM happens on some phones, and the heap growing in fragments pushes the allocation time up to 10 - 30 seconds on certain phones. It takes 1 - 2 seconds before allocation happens.

The first thing I tried was to identify the bottlenecks during recording so that we don't need that many frames. Can processing be faster so that we don't need that many frames?

Processing a frame really consists of four small steps so it was not hard to time them.

(all the times are relative to the paragraph and to each other instead of real times since it varies by device)

1. Convert an NV21 frame to a Bitmap for manipulation. (Time: 50x)

2. Do bitmap manipulation on the converted Bitmap. (Time: 5x)

3. Encode the bitmap. (Time: 20x)

4. Write to the container. (Time: 1x)
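For a sense of why step 1 dominates, here is a minimal sketch of a per-pixel NV21 to ARGB conversion in plain Java. It assumes the standard NV21 layout (a full-resolution Y plane followed by interleaved V/U samples at half resolution), even width and height, and commonly used fixed-point coefficients. The class name is hypothetical and this is not Vine's actual conversion code.

```java
/** Hypothetical plain-Java NV21 to ARGB conversion, one pixel at a time. */
final class Nv21Converter {
    static int[] toArgb(byte[] nv21, int width, int height) {
        int[] argb = new int[width * height];
        int frameSize = width * height;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int luma = (nv21[y * width + x] & 0xff) - 16;
                if (luma < 0) {
                    luma = 0;
                }
                // One V/U pair is shared by each 2x2 block of pixels.
                int uvIndex = frameSize + (y / 2) * width + (x & ~1);
                int v = (nv21[uvIndex] & 0xff) - 128;
                int u = (nv21[uvIndex + 1] & 0xff) - 128;

                // Fixed-point math: each channel is scaled by 1024 until packed.
                int y1192 = 1192 * luma;
                int r = clamp(y1192 + 1634 * v);
                int g = clamp(y1192 - 833 * v - 400 * u);
                int b = clamp(y1192 + 2066 * u);

                argb[y * width + x] = 0xff000000
                        | ((r << 6) & 0x00ff0000)
                        | ((g >> 2) & 0x0000ff00)
                        | ((b >> 10) & 0x000000ff);
            }
        }
        return argb;
    }

    /** Keeps a scaled channel value inside the 18-bit range used above. */
    private static int clamp(int value) {
        return Math.max(0, Math.min(value, 262143));
    }
}
```

Every pixel costs several multiplications, clamps, and array reads, so a frame of roughly a million bytes means on the order of a million loop iterations per frame on the CPU.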

Optimize processing:

1. If conversion in Java takes about 50x, can we do it better in native? Or is there a better solution? It turns out that if we do the color conversion on the GPU via an intrinsic RenderScript (a super optimized conversion script), we can go from 50x to 1x with just a few lines of code. Unfortunately, this is Android 4.2+ only at the time of writing, but a support library may come in the future to backport this to older Android devices.

2. All the bitmap manipulations (rotation, clip, inversion) were done separately. If we use a single Matrix, the time goes from 5x to 2x.

3. Encoding: there isn't that much we can do here since the encoding algorithm is already optimized. If we use MediaCodec, the time would go down from 20x to 10x, but this is 4.1+ only and there is no sign that a support library will cover it in the future.

4. Writing it to the container is super fast, nothing to be done here for now.

What this did was let us cut the requirement down from 180 frames to 140 on certain devices, and to 120 on 4.2+ devices. (We have a device profiling system for this.)

Improved processing solution: Add 140 frames prior to startPreview, guaranteeing 140 frames for all phones. Do all the allocations and the initialization of classes and objects when the user starts recording.

Result: GC_ALLOC happens less, OOM happens on some phones but less often, and the heap growing in fragments pushes the allocation time up to 5 seconds on certain phones before they can start recording. It takes 1 - 2 seconds before allocation happens. (The big improvement here happens because GC on the last 40 frames is usually the slowest.)

This is still unacceptable.

---

Improve allocation speed: Lying to get more memory is good.

Why does GC happen? Why is growing heap even needed if we know how much we need?

GC happens when the allocated heap hits about 70% capacity. And the heap grows in fragments because we only ask for a small byte[] at a time.

It turns out, right before adding small buffers, I can add the following code to make it 100x faster:

byte[] temp = new byte[(int) (140 * requiredSize * 1.5)]; // One large request up front.
temp[0] = 1; // Touch the array.
temp = null; // Explicit.

This makes GC_ALLOC happen much, much less (sometimes only once), and the heap no longer grows more than once.

Result: GC_ALLOC happens much less, OOM happens faster, and allocation takes up to 2 seconds on certain phones before they can start recording. It takes 1 - 2 seconds before allocation happens.

Much better, but can we do better?

---

The rest of the improvements that we made were around using a service that maintains class loading and using a ByteBuffer queue when users restart recording so that we don't have to allocate more buffers, eventually bringing the OOMs down to a very, very small number and allocation times to about 1.5s. The details are not important; what's important is that there was so much room for improvement, and in many places that we did not expect to make a huge impact. Timing the execution and using GMAT-like tools were very important at first for us to identify the bottlenecks.

Ajax loading: No Hashbangs, just HTML5

February 21, 2012 by Edison Wang

Hashbangs (#!) have been around the web for a while now, ever since Twitter started using them for Ajax page loads. I was rather annoyed by the unnecessary page refreshes from time to time. But noooow...Twitter is finally starting to move away from hashbangs! I remember talking to a front-end engineer at Twitter a few months ago at their San Francisco headquarters about it and asking them why they did it. That was when I was working on the menus for SleepBot. After I came back to New York, I wanted to implement it for SleepBot. However, it turned out that parsing the URL's hashbang is not that simple after all, and the pressure was on for us to release on time.

So I ended up using History.js, a JavaScript wrapper for HTML5 history management. On the links that use Ajax loading, we bind the click handler with the following if statement:

if(History.enabled){
//Do the ajax load, and then use History.pushState(....) to update the url
}else{
//IE users, I'm sorry, you are getting a cold refresh.
}

This worked at first. However, as the project grew, a problem came up with Ajax loading: bindings stopped working! It was caused by my lack of understanding of jQuery's bind, live, and delegate functions. If you are serious about web development using jQuery, that is a must read.

A change I made to my screen to increase productivity

September 27, 2009 in tech news

Requirements:

1. Running Windows 7 and no other dock/software launching bars.

2. A huge wide-screen monitor (21"+).

Steps:

1. Move the taskbar to the right side and dock your most frequently used software there.

Benefits over keeping it at the bottom:

1. The icons are roughly level with your line of sight, so you don't have to look down.

2a. More practical viewing space for programs/web pages in the non-taskbar area, because the ratio adjustment just makes things look nicer.

2b. For the taskbar itself, the icons seem bigger and nicer, and the Aero boxes that appear on the right seem nicer/more logical.

Time to do it:

1. 1 second.

Garena Host Drop Method...Investigated..~

July 21, 2009 in life

Recently I started playing W3C TFT on Garena since my (friend Mike's) CD key was banned due to excessive 'dual use', and since Garena actually keeps records of your gaming status. Sometimes, however, in Garena the host can just drop people from the other team when he is losing.

I thought for a while and tried my guess on how it works.

The way to do it without any tool is fairly simple:

1. While losing, just tell people that you will be remaking.

2. TURN OFF YOUR WIFI/UNPLUG YOUR ETHERNET CABLE.

3. Tab back to the game, wait for 30sec-1min.

4. Keep pushing...the game is urs!

This is cheating.....so yea...same policy as the other cheating things I made: they are just for entertainment purposes, and I am not taking responsibility for irresponsibility.

Back to the ideas... So I wondered, how do you kick only certain people? Then I realized that Garena just mimics LAN, so you can simply block some users from accessing your computer,...as simple as that ......um~I do believe general firewalls can achieve that in Advanced Settings.

However, if you are a computer noob, here is what I suggest you do:

1. Tell your teammates / friends to stay ~

2. Then execute the drop method.

anyhoO~~~..just a simple hack for fun~

The new iPhone 3GS

June 08, 2009 in tech news

Yay, new iPhone! New cheap iPhone!...Nah, it is not cheaper, though it is much better.

The new generation of iPhone offers both software and hardware improvements. Some key points are as follows:

0. "The iPhone 3GS has a new processor built-in. Apple claims that it is up to two times faster than the previous generation: Launching messages is 2.1 faster, load the NY Times in Safari: 2.9 times faster. It also consumes less, which has an impact on the improved battery life." [Gizmodo]

1. 3MP camera, which is better than most mainstream phones in the U.S. market (*still far from the 6-10MP of the Asian market). But with the software improvements that come with it, the quality should increase by a noticeable amount.

2. Support for the 7.2 Mbps 3G standard. None of the current U.S. networks supports it; therefore, personally, I think this is a feature for international users, since new generations will be out by the time AT&T and such have it supported.

3. New graphics engine: better gaming experience. We will see it eventually beat the PSP in handheld gaming. Well, at least the DSi might have the ability to compete with it.

4. Magnetometer to use with map applications etc. Use your imagination to see what the application makers will do with it. I totally expect this to be used in app games.

5. Increased battery life. Apple says it is 20%-50% better than the 3G depending on what you do. Well, personally I believe that will be enough for most people, since at least 5hrs (gaming non-stop) should be enough for a day, and then...you just charge it ...lolz

6. Voice control. I think Apple held this feature back because of the new Shuffle. Yea..it's business that stops technologies from happening sometimes.

New Price: $199 with contract for the lowest storage one; it goes up by $100 each time you double the storage. AT&T Users: "Gizmodo:

If you are "a valued AT&T customer," AT&T offers an "early iPhone upgrade with a new 2-yr commitment and an $18 upgrade fee." The price? $399.00 for the 16GB iPhone 3G S and $499.00 for the 32GB iPhone 3G S. It gets worse: For non-qualified customers, including existing AT&T customers who want to upgrade from another phone or replace an iPhone 3G, the price with a new two-year agreement is $499 (8GB), $599 (16GB), or $699 (32GB).

Insane. Way to go AT&T."

For complete coverage of the iPhone 3GS, visit the following links [Gizmodo]:

iPhone 3GS • iPhone 3GS Complete Feature Guide • iPhone 3G vs. iPhone 3GS Comparison Chart • iPhone 3GS Video Walkthrough (Quick 4-Minute Version) • iPhone 3GS Gets Voice Control