I am working on a simple internet radio app for iOS with very simple speech/music discrimination. The main idea is a radio player that streams a signal from a URL and at the same time checks what type of signal is being broadcast. When it detects speech, it changes the channel, and so on.
I wrote a simple iOS app using storyboards and AVFoundation for the player. My problem is implementing the speech detection. I wrote the algorithm in MATLAB, but I'm not sure how to do it in Xcode.
clear all
close all

[s, fs] = audioread('nagranie.wav');

% Short frames: 20 ms = 0.02 s
lengthofframe20ms = round(0.02*fs);
numberofframes20ms = round(length(s)/lengthofframe20ms);

% Zero-pad the signal to a whole number of short frames
s1 = zeros(lengthofframe20ms*numberofframes20ms, 1);
for i = 1:length(s(:,1))
    s1(i,1) = s(i,1);
end

frame20ms = zeros(numberofframes20ms, lengthofframe20ms);
for i = 1:numberofframes20ms
    for j = 1:lengthofframe20ms
        frame20ms(i,j) = s1(j + lengthofframe20ms*(i-1), 1);
    end
end

% Long frames: 260 ms = 0.26 s
lengthofframe260ms = round(0.26*fs);
numberofframes260ms = round(length(s)/lengthofframe260ms);

% Zero-pad the signal to a whole number of long frames
s2 = zeros(lengthofframe260ms*numberofframes260ms, 1);
for i = 1:length(s(:,1))
    s2(i,1) = s(i,1);
end

frame260ms = zeros(numberofframes260ms, lengthofframe260ms);
for i = 1:numberofframes260ms
    for j = 1:lengthofframe260ms
        frame260ms(i,j) = s2(j + lengthofframe260ms*(i-1), 1);
    end
end

% Mean energy of every short frame
En = zeros(numberofframes20ms, 1);
for i = 1:numberofframes20ms
    L = length(frame20ms(i,:));
    En(i) = (norm(frame20ms(i,:))^2)/L;
end

% Mean energy of every long frame
Ek = zeros(numberofframes260ms, 1);
for i = 1:numberofframes260ms
    L = length(frame260ms(i,:));
    Ek(i) = (norm(frame260ms(i,:))^2)/L;
end

sumN = sum(En);
sumK = sum(Ek);

% Short-frame energies normalised by the total long-frame energy
EnP = En/sumK;

% Decision: flag low-energy short frames inside each long frame
threshold = 0.5;
lambda = threshold*sumN;
M = numberofframes20ms/numberofframes260ms;
coff = zeros(numberofframes20ms, 1);
for i = 1:numberofframes20ms
    if En(i) < lambda
        for k = 1:numberofframes260ms
            if ((k-1)*M + 1) < i && i < k*M
                coff(i) = 1;   % frame i classified as speech
            end
        end
    end
end
As you can see, first we divide the signal into short 20 ms frames and longer 260 ms frames, then we calculate the energy of every 20 ms frame, do some more math, and finally we check the conditions: when a frame fits, it is classified as speech, and when it doesn't, it is classified as music.
I don't know how to start on the discrimination part. Which frameworks should I use? I think it can't be that hard, because it took me about 20 minutes to write in MATLAB. :)
This is how my app play radio stations:
{
    RadioInfo *sharedRadio = [RadioInfo sharedRadio];
    NSString *program = [NSString stringWithFormat:@"%@", sharedRadio.list[value]];
    NSURL *url = [NSURL URLWithString:program];

    // Create the item and the player once, then start playback
    self.playerItem = [AVPlayerItem playerItemWithURL:url];
    self.player = [AVPlayer playerWithPlayerItem:self.playerItem];
    [self.player play];
}
This is my first post here, so please be kind. I will appreciate any help. I'm stuck on this part.