Archive: 2017

Elastic movie encoding farm backed by AWS EC2 SpotFleet

Background: How do you keep your recordings?

Recording TV to a hard disk is common today. Not only tech-savvy people but also many non-professional users record to HDD all the time. In Japan, almost all TV programs are broadcast as MPEG2-TS. At Full HD quality, the bitrate is 12-20 Mbps, so a 2-hour movie or sports program takes roughly 20GB of disk space. As a result, many people, including me, suffer from recordings filling up their home HDDs. Some broadcasting companies may deny it, but a stored movie is definitely a part of its owner’s memories, and he wants to keep it as long as possible.

Saving them to HDD or cloud: seems straightforward, but it does not work

The easiest and most straightforward solution is to move your recordings to another bulk HDD or to cloud storage. But after a few years, your recordings may grow to 10-50 TB. Many people tried to offload their recordings to fixed-monthly-rate cloud storage, so customers who did not pay much consumed a large part of the storage capacity, greatly exceeding the cloud vendors’ forecasts. I’ve heard that some users were trying to save 70TB of data to cloud storage. Today, almost all cloud vendors no longer offer fixed-monthly-rate cloud backup plans and offer only “pay as you go” plans. Storing your recordings on bulk HDDs is more reasonable from a cost viewpoint, but huge recordings waste a lot of your time. Moving “old” recordings to a bulk HDD consumes a lot of time (it sometimes takes several days). Once you’ve finished moving your old recordings, you may start to think about how to keep your backup healthy and readable at any time. You may put a few HDDs into a RAID to avoid data loss in case of an HDD failure. As a result, you have to run several HDDs 24/365 in addition to your main disks. These disks look ugly, make noise in your home, and make your home somewhat geeky (your friends may be put off!).

Motivation: Heuristic Compression

Movie compression is one of the hot topics in the tech domain. Recently, the cutting-edge H.265 codec was released; it achieves roughly a 90% size reduction compared with MPEG2-TS. If you have a 20TB movie archive, it is going to be 2TB! The disadvantage of H.265 is its extraordinarily long compression time: compressing a 2-hour movie takes about 50-80 hours. Fortunately, almost all movie compression tasks can be split up and run in parallel, which shortens the compression time. So here comes the need for a multicore, multinode processing environment. As you know, heuristic compression is quite efficient, but it requires a lot of computing resources. It’s hard to estimate how much computing resource is required to get it done, but I think you can see why you need an elastic computing cluster to do it. (My estimation is written later in this article.)
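To make the "split and run in parallel" idea concrete, here is a minimal Python sketch. It is not the code used for this article. It assumes the recording is cut into fixed-length pieces by time, and that each piece is encoded by its own ffmpeg process with the libx265 encoder. The names plan_segments and encode_command, the 10-minute piece length, and the file names are all hypothetical. Joining the encoded pieces back together, and cutting cleanly at keyframes, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int
    start_s: int      # offset into the source, in seconds
    duration_s: int   # length of this piece, in seconds


def plan_segments(total_s: int, segment_s: int) -> list[Segment]:
    """Cut [0, total_s) into consecutive pieces of at most segment_s seconds."""
    if total_s <= 0 or segment_s <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0
    while start < total_s:
        length = min(segment_s, total_s - start)
        segments.append(Segment(len(segments), start, length))
        start += length
    return segments


def encode_command(src: str, seg: Segment) -> list[str]:
    """Build one ffmpeg invocation that encodes a single piece to H.265."""
    return [
        "ffmpeg",
        "-ss", str(seg.start_s),     # seek to the start of the piece
        "-t", str(seg.duration_s),   # read only this many seconds
        "-i", src,
        "-c:v", "libx265",           # H.265 video encoder
        "-c:a", "copy",              # leave the audio untouched
        f"part{seg.index:04d}.mkv",  # hypothetical output name
    ]


if __name__ == "__main__":
    # A 2-hour recording cut into 10-minute pieces gives 12 independent jobs.
    for seg in plan_segments(2 * 60 * 60, 10 * 60):
        print(" ".join(encode_command("recording.ts", seg)))
```

Each printed command is an independent job, so a batch scheduler can hand the pieces to whichever node is free.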

Torque and AWS EC2 Spot Fleet

Adaptive Computing has long provided the TORQUE batch scheduler, which joins individual computers into a cluster and gives users the aggregated computing resources. You can find what TORQUE can do at http://www.adaptivecomputing.com/products/open-source/torque/ . AWS EC2 Spot Fleet can gather several EC2 Spot Instances into an elastic cluster. As you may know, EC2 Spot Instances are AWS’s “spare” computing capacity, so they are provided at a discounted price. You can use relatively large computing resources at a reasonable cost. You can read about AWS EC2 Spot Fleet here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html . I’m going to mix them up.

The picture

Here is a picture I’ve put together. The upper side shows AWS. It consists of an “API server” and “encoding instances”. The API server acts as the administrator of the encoding instances and delivers a certificate to each encoding instance so that it can make a VPN connection to my home. The lower side shows my home. The TORQUE server and the data source are here. The NAS provides the source file to each encoding instance via VPN and receives the compressed movie.

REST is modern RPC

When you run several properly configured instances, you have to communicate with each server in a proper manner, but your time for developing actual code is limited. You cannot choose a complicated framework, and I believe you want the framework to be easy. Today, I think a REST API is the easiest way for servers to communicate with each other. REST uses HTTP, and you can call the API with the curl command. It’s super easy. On the server side, I use Flask( http://flask.pocoo.org/ ) to build the API server. Flask is a Python library which makes writing an API server very easy: about 5 lines of code are enough to get a working HTTP API server. I wrote two types of API server.

  1. VPN certificate administrator: I use OpenVPN as the VPN software. OpenVPN requires each client to have a unique certificate to make a unique connection to the server. So I have to manage which certificate has been handed out to which instance, and which certificate has been returned on instance termination.
  2. TORQUE computing node registrar: For now, TORQUE has only a CLI to register/unregister incoming computing nodes. I put together a REST API which receives an HTTP message from an encoding instance and translates it into a CLI command for the TORQUE server.

I’ve successfully glued them together, and finally EC2 Spot Instances started working as a part of my existing home TORQUE cluster! I’ve spent about 50 man-hours at this point.
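The logic behind the two API servers above can be sketched in a few lines of Python. This is only an illustration, not the actual code behind this article. The HTTP layer (Flask in my setup) is left out. The class CertificatePool, the function qmgr_command, the certificate names, and the host name are all hypothetical. The sketch assumes that nodes are added and removed with TORQUE's qmgr "create node" and "delete node" commands, and that each node has 4 cores.

```python
import re


class CertificatePool:
    """Tracks which pre-generated client certificate is lent to which instance."""

    def __init__(self, cert_names):
        self._free = list(cert_names)  # certificates nobody is using
        self._leased = {}              # instance id -> certificate name

    def lease(self, instance_id: str) -> str:
        """Hand out a free certificate, or the same one again on a repeated call."""
        if instance_id in self._leased:
            return self._leased[instance_id]
        if not self._free:
            raise RuntimeError("no free certificate left")
        cert = self._free.pop(0)
        self._leased[instance_id] = cert
        return cert

    def release(self, instance_id: str) -> None:
        """Return the certificate to the pool when the instance terminates."""
        cert = self._leased.pop(instance_id, None)
        if cert is not None:
            self._free.append(cert)


def qmgr_command(action: str, hostname: str, cores: int = 4) -> list[str]:
    """Translate a register/unregister request into a TORQUE qmgr call."""
    # Refuse anything that is not a plain host name before it reaches the CLI.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9.-]*", hostname):
        raise ValueError(f"suspicious host name: {hostname!r}")
    if action == "register":
        return ["qmgr", "-c", f"create node {hostname} np={cores}"]
    if action == "unregister":
        return ["qmgr", "-c", f"delete node {hostname}"]
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    pool = CertificatePool(["client01", "client02"])
    print(pool.lease("i-0123456789abcdef0"))
    print(qmgr_command("register", "encoder-01"))
```

An API handler would pass the returned list to subprocess.run on the TORQUE server. Because the command is built as an argument list rather than a shell string, the host name sent by an instance never passes through a shell.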

First run: take base info for estimation

I selected the AWS Ohio region as the place to launch spot instances, because the Ohio region offers the cheapest price for 4-core instances. AWS almost always offers a cheaper price for “previous generation” instances. At first, I played around in the North Virginia region, but it didn’t offer previous generation 4-core instances and the price changed frequently. So I concluded that the N. Virginia region is so crowded that I cannot run computing jobs for over 24 hours there. The Ohio region offers previous generation 4-core instances at a cheap price, and the bidding price is more stable than in N. Virginia. So I decided to play around in the Ohio region, and to pay $0.03/hour per instance. Then I configured Spot Fleet and launched just one instance. I wanted to measure how long my computing task takes in order to estimate the budget needed to complete the mission. Here is the execution time breakdown of my first job. This is the result of encoding a movie of roughly 2 hours.

  • Input Data Transfer: 18GB, 9 hours
  • Compression: roughly 30 hours
  • Output Data Transfer: up to 2.5GB, 1 hour

And here is the billing snapshot from AWS.

  • Data Transfer: $0.46

    • $0.000 per GB - data transfer in per month: $0.00(45.294GB)
    • $0.000 per GB - first 1 GB of data transferred out per month: $0.00(1GB)
    • $0.090 per GB - first 10 TB / month data transfer out beyond the global free tier: $0.45(5.048GB)

  • Elastic Compute Cloud: $2.19

    • $0.0116 per On Demand Linux t2.micro Instance Hour: $0.58(50.254hrs)
    • c4.xlarge Linux/UNIX Spot Instance-hour in US East (Ohio): $1.19(45hrs)
    • EBS: $0.05 per 1 million I/O requests: $0.04(860,000IOs)
    • EBS: $0.05 per GB-month of Magnetic provisioned storage: $0.33(6.5GB-Mo)
  • Total: $2.68

Cost Estimation #1

So I can say that AWS costs about $3.00 to encode a 2-hour movie. Next, I’m going to examine how many movies I have to compress. From a rough examination, I found that I have roughly 1500 movies totaling about 840 hours. Taking the roughly 40 hours the first job needed end to end (9 + 30 + 1) for each 2 hours of video, I can calculate like this.

  • Total encoding time: 840 x (40(hrs) /2) = 16800 (hrs)

To encode all of the movies in a reasonable time window, I should prepare servers like this.

  • Case1) 10 servers: 16800(hrs) / 10(svrs) = 1680(hrs) = 70 days
  • Case2) 20 servers: 16800(hrs) / 20(svrs) = 840(hrs) = 35 days

Hmm… 35 days of encoding time (Case2) looks nice to me. How much does it cost? Like this.

  • Case1): (70(days) * 24(hrs)) * ($3.00 / 2(hrs)) * 10(svr) = $25200.0
  • Case2): (35(days) * 24(hrs)) * ($3.00 / 2(hrs)) * 20(svr) = $25200.0
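The arithmetic in this section can be reproduced with a short Python sketch. All inputs come from the figures above; the function names are hypothetical. Note the unit the cost formula relies on: it charges $3.00 for every 2 hours of server time. If the $3.00 is read instead as the bill for one 2-hour movie, as the billing snapshot suggests, the same inputs give a much smaller total. The sketch prints both readings so they can be compared.

```python
MOVIE_HOURS = 840        # total length of the archive
HOURS_PER_UNIT = 2       # the first job encoded a roughly 2-hour movie
JOB_HOURS_PER_UNIT = 40  # 9 (input) + 30 (compression) + 1 (output)
COST_PER_UNIT = 3.00     # rounded-up AWS bill for that one job, in USD


def total_encoding_hours() -> float:
    """Server hours needed to encode the whole archive on one server."""
    return MOVIE_HOURS * JOB_HOURS_PER_UNIT / HOURS_PER_UNIT


def wall_clock_days(servers: int) -> float:
    """Days needed when the work is spread evenly over several servers."""
    return total_encoding_hours() / servers / 24


def cost_as_written(servers: int) -> float:
    """The formula above: $3.00 is charged per 2 hours of server time."""
    server_hours = wall_clock_days(servers) * 24 * servers
    return server_hours * COST_PER_UNIT / HOURS_PER_UNIT


def cost_per_movie_reading() -> float:
    """Alternative reading: $3.00 is charged per 2 hours of video."""
    return MOVIE_HOURS / HOURS_PER_UNIT * COST_PER_UNIT


if __name__ == "__main__":
    for servers in (10, 20):
        print(servers, "servers:", wall_clock_days(servers), "days,",
              "$%.2f" % cost_as_written(servers))
    print("per-movie reading: $%.2f" % cost_per_movie_reading())
```

With either formula the number of servers only changes how long the job takes, not the total bill, which is why Case1 and Case2 cost the same.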

Damn. This cost is too huge for me to take.

Cost Estimation #2

The previous estimation assumes that each encoding server has 4 vCPUs (cores). If I increase the number of vCPUs, I will not need as many servers. The Ohio region also offers the i2.8xlarge instance, which has 32 vCPUs and can take on 8 servers’ worth of tasks in 1 server. The price of i2.8xlarge is $0.83 per hour. Then, how much does the cost change?

  • Encoding cost: $0.83 x 45(hrs) = $37.35; $37.35 / 8(parallel) = $4.67 per 2-hour movie

Unfortunately, a larger instance does not seem to help me much.
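The comparison can be checked with a few lines of Python. This is a sketch under two assumptions taken from the text above: the i2.8xlarge runs 8 encoding jobs in parallel with no slowdown, and each job still occupies the instance for 45 hours. The baseline of $1.19 is the c4.xlarge spot charge from the billing snapshot. The function name cost_per_movie is hypothetical.

```python
def cost_per_movie(hourly_price: float, billed_hours: float, parallel_jobs: int) -> float:
    """Instance cost attributed to one 2-hour movie, in USD."""
    return hourly_price * billed_hours / parallel_jobs


C4_XLARGE_PER_MOVIE = 1.19  # 45 spot instance-hours, from the billing snapshot
I2_8XLARGE_PER_MOVIE = cost_per_movie(0.83, 45, 8)

if __name__ == "__main__":
    print("c4.xlarge : $%.2f per movie" % C4_XLARGE_PER_MOVIE)
    print("i2.8xlarge: $%.2f per movie" % I2_8XLARGE_PER_MOVIE)
    print("ratio     : %.1fx" % (I2_8XLARGE_PER_MOVIE / C4_XLARGE_PER_MOVIE))
```

Even with perfect 8-way scaling, the bigger instance costs several times more per movie than the small spot instance, because its hourly price grows faster than its core count.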

Conclusion

This trial shows that a distributed computing cluster can help with the piled-up recorded movies sitting in your storage. Unfortunately, heuristic compression in a public cloud environment costs so much that it will not fit your budget. I’ll try a different approach and write about it later. Stay tuned!

My Homebrewed IPTV transmitter(gen.4)

Hi, there. Long time no see. Unfortunately, I’ve been having a super-hard time at work. Recently, I was lucky enough to have some private time, so I decided to write some code to transmit TV to Kodi.

history #1(Gen.1 transmitter,around 2000-2005)

It has been a long time since I disconnected the terrestrial TV cable from my display and connected an IPTV box instead. My first experience was Sony’s “Location Free TV( https://en.wikipedia.org/wiki/LocationFree_Player )”. It was a proprietary product, and programs could only be viewed via special software (and it worked only on Windows, of course!). I remember that when my father worked in Korea, I told him to use it, and he was pleased to watch Japanese TV in Korea. It’s one of my happy memories. At that time (and even now!), Japanese TV companies were very nervous about people transcoding TV programs into IP datagrams and sending them to locations outside their homes. There was a reason for that. A lot of tech-native, rogue young guys ripped lots of TV programs and movies and shared them through P2P software (such as Winny, Share, and so on). That was definitely a challenge to the existing authorities, and people were afraid of this troublesome behavior. The authorities tried to limit the challenge by every means, and finally nuked and practically wiped them out. IPTV also disappeared from the market as collateral damage.

history #2(Gen.2, 2010-2014)

I got so tired of closed Japanese products that I switched almost all my equipment, including my home TV, to Linux boxes. The Japanese homebrew hardware vendor Earthsoft( https://earthsoft.jp/ ) released the PT1, which can receive and decode terrestrial broadcasts and save them as raw binary files on a computer. It was a landslide phenomenon among tech-savvy people in Japan, who tried to hack on it. Currently, Japanese terrestrial broadcasts are encrypted with the MULTI2 cipher( https://ja.wikipedia.org/wiki/MULTI2 ) and cannot be decrypted in a simple way, but there is decryption software made by volunteer hackers that works with a legal decryption key (the B-CAS card, which is sold with legal devices). (Note: There is a lot of discussion about decrypting terrestrial broadcasts with a legal key and homebrewed software.) I won’t go into detail, but I purchased a PT3 (Earthsoft’s 3rd generation terrestrial receiver) and connected it to a Linux box. I sent the decoded stream to gstreamer and forwarded it to another Linux box. I finally succeeded in watching live TV outside my home in addition to inside it! I wrote the transmitter and receiver in Perl, and they worked very well. Disclaimer: All of my terrestrial streams are decrypted using a legal key and legal equipment.

history #3(Gen.3, 2014-2017)

Gstreamer is a complete suite for media handling, but it is somewhat fragile when handling transport streams. Gstreamer failed to pick up the media streams from the transcoded terrestrial stream, which includes information other than the media streams themselves. So I switched from gstreamer to ffmpeg and modified my transmitter and receiver to fit ffmpeg. Ffmpeg did its job nicely. I also added a feature: ffmpeg can transcode the transport stream into H.264 for narrow-bandwidth connections. This feature was very nice when I went on business trips and wanted to watch my subscribed TV channels.

Now(Gen.4, 2017-)

I’ve switched my Linux box to a Raspberry Pi 3 running Kodi. Kodi is sweet software for handling personal media content. Kodi also has a smart remote controller that works on Android smartphones. In addition to saved media content, Kodi can handle realtime TV streams and network video streaming services such as YouTube. I’ve rewritten my transmitter again so that Kodi can receive the stream properly. Kodi can receive realtime TV as HTTP Live Streaming (HLS), so I hacked my transmitter to work as an HLS server. My transmitter can return the list of my subscribed channels and the realtime media stream in the HLS manner. Source code is here( https://github.com/mkiuchi/epgrec-kodi-backend ). Now I can watch my TV at home and outside, and I have a smart remote controller and a low-powered receiver!
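To give an idea of what "working as an HLS server" means, here is a minimal Python sketch of the playlist an HLS server returns for a live stream. It is not taken from the repository above. The function name live_playlist and the segment file names are hypothetical. It assumes that some other process (ffmpeg, for example) keeps cutting the TV stream into short .ts files.

```python
import math


def live_playlist(first_sequence: int, segments: list[tuple[str, float]]) -> str:
    """Render an HLS media playlist for the segments currently in the window.

    segments is a list of (uri, duration_in_seconds) pairs, oldest first.
    """
    if not segments:
        raise ValueError("a playlist needs at least one segment")
    # The target duration must not be shorter than any segment in the list.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST tag: its absence tells the player the stream is live.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(live_playlist(120, [("seg120.ts", 6.0), ("seg121.ts", 6.0), ("seg122.ts", 5.5)]))
```

The player fetches this playlist again and again. Each time, the server drops the oldest segment, appends the newest one, and increases the media sequence number by the number of segments dropped.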

Conclusion

IPTV for terrestrial TV is niche. Almost everybody watches authorized network TV such as Hulu, Netflix, dTV, and so on. Additionally, people spend their time on SNSs and tons of CGM. So I think the demand for carrying terrestrial TV over IPTV is going to stay niche. But I’m happy, for now.