Google releases Omnitone what changed for VR audio production

Lei feng’s network (search for “Lei feng’s network” public attention) by writer Zhao Longxi, senior investment manager, graduated from Department of chemistry, Peking University, Columbia University, Dr polymer study. Concern VR/AR, culture, entertainment, business services and directions.

Recently, Google has released a new open source project–Omnitone, the project aims to achieve normal headphones 3D panoramic sound effects in a VR environment, allowing users to get a better VR immersion.

• VR technology increasingly affect our lives iWatch Case

Introduction

CJ is over, we’ll talk to a VR.

VR can be understood as a human process of pursuing more natural human-computer interaction. From the start you can only by programming language and computer era Say Hi 286 (imagine robot girlfriend of that era), to Win95 first generation visualization window of operation, and now wears a helmet into a 360-degree virtual space. Interaction between man and machine is becoming more and more natural, Virtual World more and more like the Reality, this is virtual reality (Virtual Reality).

Images of our helmets, next turn around, you can see the different images, it is natural for human-computer interaction; voice, too, if we turned around, suddenly the sound of footsteps behind increases or decreases, listening, next door to the creaking sound you can hear more clearly, then, realistic virtual worlds may not far from us.

So today, “Let us talk about VR voice.”

VR voice

Simple to introduce, first of all, there are a number of 3D sound field solutions. You may think that, in fact, very simple, being a virtual space, the space in any of the sources, the three dimensional coordinates calibration, and then import the VR helmet coordinate data, then output the sound, you have a perfect “natural” sound.

This right, on the assumption that: your audio source is little enough and fast enough speed. Known State of the Art Dolby cinema programme can support up to 128 sources … … But can’t talk on the mobile phone, PC environments can be cinema. (And approximated by this sound field is impossible in the real world, you can understand why when you are close to the real world, a sharp rise in the number of the sound field, like Crystal’s atomic number is 10^23 level, to characterize crystal structure, preferably with wave functions) is now the most common Quad Binaural and Ambisonic respectively in two ways.

Quad Binaural

Quad is a Binaural sound field by 0, 90, 180, 270 degree four direction to characterization.

For example, if you are recording a sound stage, I was the point in each of the four directions (actually around) sound of taped, two-channel in each direction, eventually eight channels. Then when I turn any direction not just around, say, 45 degrees, with the voice of the four directions to make a weighted sound in new directions.

The problem with this is that, in addition to outside the plane turned around, when you are up or down, the sound is not changed. Good simple decoding is very easy, for example, we naturally think about 45 degrees when sound is half of the 0 + half of the 90-degree (although the actual situation is more complex), and compared to the now commonly used first order Ambisonic (FOA,First Order Ambisonic), the horizontal sensitivity higher.

Ambisonic

Ambisonic is starting from the spherical harmonics, (n+1) ^2 a channel to represent the sound field.

For example, Google released Omnitone is first order Ambisonic, so (1+1) ^2=4 a-channel, as shown in the following figure, w,x,y,z. W can be thought of as the background, x,y,z is the sound coming from three directions of rectangular coordinate system, respectively.

• The higher order ambisonic decoding function of the more complex

At present the top five order Ambisonic functions, 36 Components (Gao Jieqiu physical orientation for harmonic function is hard to think about, so substitute Components), with 32 microphones capture, through the portfolio and matrix transform to approximate these 36 Components. Ambisonic has the advantage, with the z (vertical) direction, up or down in the VR world voices are differentiated, and as the promotion of your computing power using higher order Ambisonic function can obtain better results. Disadvantage is that decoding is more complicated.

Lift a simple of example, if you of upper and lower before and after around six a direction respectively corresponds to singer ABCDEF six a people, Quad Binaural of four a double channel in respectively will irrigation into CDEF of voice, each a and while contains part of A,B of voice, acoustic recording finished Hou, if you head from just forward, slightly reverse 30 degrees, so you will from initially c accounted for led, to heard more of d of voice, but regardless of you if looked up or bow, A,B of voice are not change.

And first order Ambisonic is in x-mix C,E voice, y with D,F sound, z with A,B voices, w for background noise, as your head turning, change weight of XYZ. However clever you found, the first order Ambisonic (FOA) because the sound coming from the opposite direction (C,D) and mixed in with the x, so by the time you turn, at some point of the listener on direction sensitivity is better than Quad Binaural.

Consider to: (1) singer in you hip Xia singing of situation compared rare, so only a, of situation Xia, vertical direction FOA also is far is better than Quad Binaural, (2) If you care abroad has making out of concert video, will found making party to improve voice of azimuth sense, often put sound source put in are Qian or are around direction, this is FOA and QB are to solution of problem. However, with the increase of computing power, higher order Ambisonic (HOA) can overcome this problem, this is the reason why Google chose to FOA.

Google Omnitone

Back to Google, announced on its official blog page Omnitone VR audio system the technical details of the project. Omnitone project is an open source spatial audio rendering cross-browser support, and major support now more commonly used in the industry of the FOA (First Order Ambisonic) format, which is the main Quan Jingsheng YouTube App recommended format.

▲ Omnitone audio processing diagram

Can be seen from the figure, Omnitone system Google Ambisonic decoder algorithm uses industry-leading processes, by the sensor’s orientation, rotation a rotation operator is used to implement sound field, then two-channel output.

Given the dominance of Google Chrome browser in the market, we can be very optimistic forecasts FOA Quan Jingsheng will compare in Web-side high-speed penetration, the VR content creators throughout the VR industry is a huge plus.

Of course, the excitement we can still see a lot of problems.

Existing problems Lunatik iWatch cover

One problem is that FOA sound files come from. VR content producers with more famous VR Jaunt use of Tetra Mic (CoreSound production) is expensive, just the microphone itself is $ 1000, plus additional lines and dedicated to buy recording equipment. While other brands of sound field microphone (such as SoundField series) are even more expensive. These recording devices are very convenient to carry, and require professional training to use, restrict the mass adoption of FOA audio.

▲ Tetra Mic

Now companies are trying to address this situation. Expand the spirit of the times launched a commercial portable field recorder, native support for FOA (4 channel WXYZ) audio format. Output of audio software or device that can directly support the FOA played. Believe that with the decrease of production thresholds, based on the contents of the FOA will usher in an explosive growth.

▲ Expand the spirit of the times portable sound recorder

In addition, because both Quad Binaural and Ambisonic is just an idea, how to connect real-world source files and final play 3D sound, how to make 3D sound field with this line of thinking, making engine and that we still need a VR audio playback system. (Just like Google release Hadoop papers, large companies use this theory to guide them in developing their own algorithms. ) In this connection, extension spirit walk in the forefront of its sound engine, not only cross-platform support for multiple Video, game engines, conversions from underlying support for the Quad between Binaural and Ambisonic, VR can be said to be connection shortcut to content producers with 3D sound field.

Lei feng’s network authorized by the author in this paper published, without permission prohibited reprint!

iPhone Selfie Stick

Google releases Omnitone what changed for VR audio production

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply