A few months ago, I received an email from a friend requesting some technical help. The following text is a copy of the email he sent me (names have been excluded to protect the innocent, LOL):
“I have a command line tool installed through Homebrew on my laptop running High Sierra. The command is just ccextractor <filepath> and it runs fine in a standard bash terminal. I was hoping to use Automator to be able to run it on batches of files, but I'm struggling with the syntax for the Run Shell script command. It just keeps saying ccextractor command not found. Also, the command line tool can only process one file at a time, so I guess I need some way to loop the request so it can process more than one file.”
My friend, like me, is a movie aficionado with an extensive collection of movies, many of which are foreign titles with subtitles. When copying files to systems like Plex, you need this subtitle information so you can see it when you watch the films. This is where CCExtractor comes in. CCExtractor (https://www.ccextractor.org/) is an application used to extract closed captions from video files.
The problem was that my friend couldn't figure out how to use Automator (a Mac tool) to run this command on a directory of files. An attempt was also made to use Bash with no luck.
I told him that I could probably whip something up in Python, if that would work. “Are you sure that's not too much trouble?” my friend asked. “Nah, it should be pretty simple to whip up,” I replied.
Here's what I did.
- I navigated to the https://www.ccextractor.org/ site and downloaded the binaries and some 3.x GB sample files to my drive.
- I opened my trusty text editor (https://www.sublimetext.com/) and created a new Python program.
- After a bit of Googling, I came up with this set of code:
import os
import subprocess

# Folder holding the videos, and the full path to the CCExtractor executable
directory_to_import = 'D:/Data/clients/RodPaddock/CCExtractor/'
extractor_exe_path = 'D:/Data/clients/RodPaddock/CCExtractor/ccextractorwin'

# Run the extractor once for every .mpg file in the folder
for file in os.listdir(directory_to_import):
    if file.endswith(".mpg"):
        print(os.path.join(directory_to_import, file))
        subprocess.run([extractor_exe_path,
                        os.path.join(directory_to_import, file)])
This code was built, debugged, and run on my Windows development box. The goal was to get it working as fast as possible on my main development box before moving it onto a Mac.
Here's a link to the Gist of the code: https://gist.github.com/rjpaddock/d53956767dd4a1fe267dee08c995c956.js.
Getting the code to run on the Mac was simple. Here's the Mac version:
import os
import subprocess

directory_to_import = '/Users/rodpaddock/ccextractor'
extractor_exe_path = 'ccextractor'

for file in os.listdir(directory_to_import):
    if file.endswith(".mpg"):
        print(os.path.join(directory_to_import, file))
        subprocess.run([extractor_exe_path,
                        os.path.join(directory_to_import, file)])
As you can see, the changes were minimal. I changed the path to my user directory on the Mac and got rid of the specific path to the executable. I used Homebrew to install CCExtractor on my Mac, so it was in the PATH already. After installing a version of Python on my Mac, I was able to run the application as-is. No operating system-specific issues. After getting my program to work, I sent it to my friend, who simply changed the path to the files he wished to decode, and BOOM. It just worked.
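The scripts above launch one process per file and ignore the result. As a minimal sketch (not part of the original script), the object returned by subprocess.run carries the child's exit code, which can tell you whether a given run succeeded. The run_tool helper is a hypothetical name, and the Python interpreter itself stands in for CCExtractor so the example runs on any machine.

```python
import subprocess
import sys


def run_tool(command):
    """Run one command, wait for it to finish, and return its exit code."""
    completed = subprocess.run(command)
    return completed.returncode


# Stand-in commands so the sketch runs anywhere. In the article's script the
# list would be [extractor_exe_path, full_path_to_video].
ok = run_tool([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failed)
```

A non-zero exit code conventionally signals a failure, so a batch script could collect the files whose run returned one and report them at the end.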
Running on Windows
After marveling at how much could be accomplished with so few lines of code, I became curious to see how complex it would be to build the same application in C#. I'm using .NET Core to do this, as I want to run it cross-platform, as well. The code in Listing 1 represents the same functionality in C#.
Listing 1: The .NET Core version of my app
using System;
using System.Diagnostics;
using System.IO;

namespace ExtractorRunner
{
    class Program
    {
        static void Main(string[] args)
        {
            var directory_to_import = "D:/Data/clients/RodPaddock/CCExtractor/";
            var extractor_exe_path = "D:/Data/clients/RodPaddock/CCExtractor/ccextractorwin";
            foreach (var fileName in Directory.GetFiles(directory_to_import, "*.mpg"))
            {
                Console.WriteLine(fileName);
                var process = new Process()
                {
                    StartInfo = new ProcessStartInfo
                    {
                        FileName = $"{extractor_exe_path}",
                        Arguments = $"{fileName}",
                        UseShellExecute = true,
                    }
                };
                process.Start();
            }
        }
    }
}
I'd say this wasn't too bad. Building the same application was pretty simple as a C# console application. Here's a Gist to the C# code: https://gist.github.com/rjpaddock/be601db3995082949071121d8aa992d7.
With a minimal set of code, I thought it would be fun to explore making it a bit more robust. Here's the set of features I planned to add:
- Accept an extension parameter. The original code had the extension hard-coded.
- Accept a path to the files I wished to decode.
- Accept the path to the executable as a parameter.
- Parameters should be named vs. positional, if possible.
- Run this code on Windows, Mac, and Linux.
I started with the Python program and the first feature on the list, specifying the extension as a parameter. My initial choice was to process mpg files as the default extension. My friend immediately changed it to mp4. With this knowledge, I realized that this would be the first thing to parameterize.
There are multiple ways this could be implemented. One way we could hack this together would be to use Python's sys.argv array, which provides positional arguments to Python programs. For instance, let's say you called the program with the following statement:
python run_cc.py .mp4
Then you could access the .mp4 with sys.argv[1] (sys.argv[0] holds the name of the script itself). Although this works, it'll cause problems in the long haul if you add or remove parameters. It's also not very intuitive. It would be better to call the program with a named parameter. For example:
run_cc.py --extension .mp4
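To make the positional approach concrete before moving on, here is a small sketch of how such a lookup might work. The extension_from_argv function is a hypothetical helper, and the lists passed to it simulate what sys.argv would contain.

```python
def extension_from_argv(argv):
    """Return the first user-supplied argument, or '.mpg' if none was given.

    argv[0] is the script name, so user arguments start at index 1.
    """
    return argv[1] if len(argv) > 1 else ".mpg"


# Simulated argument lists, equivalent to what sys.argv would hold.
with_argument = extension_from_argv(["run_cc.py", ".mp4"])
without_argument = extension_from_argv(["run_cc.py"])
print(with_argument, without_argument)
```

Every additional parameter would need its own index and its own bounds check, which is the fragility that named parameters such as --extension avoid.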
Luckily for us, Python has a built-in library to do this exact thing. This library is known as argparse. To implement the first option, you need to do the following:
- Add an import argparse to the imports section of the program.
- Create an argument parser object and add an argument to it. Your code will look like this:
parser = argparse.ArgumentParser()
parser.add_argument("--extension",
                    help="Extension of files to convert", default='.mpg')
args = parser.parse_args()
There's a lot going on with just these few lines of code. What this set of code does is:
- Creates an argument parser.
- Adds a parameter called --extension to the command line.
This parameter will be added to the args object as an attribute with the name extension. Finally, this code specifies a help description and a default parameter value. The program code now looks like Listing 2.
Listing 2: The new code
import os
import subprocess
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--extension",
                    help="Extension of files to convert", default='.mpg')
args = parser.parse_args()

directory_to_import = 'D:/Data/clients/RodPaddock/CCExtractor/'
extractor_exe_path = 'D:/Data/clients/RodPaddock/CCExtractor/ccextractorwin'

for file in os.listdir(directory_to_import):
    if file.endswith(args.extension):
        print(os.path.join(directory_to_import, file))
        subprocess.run([extractor_exe_path,
                        os.path.join(directory_to_import, file)])
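If you'd like to see what the parser produces without touching any video files, parse_args also accepts an explicit list of strings in place of the real command line. The sketch below uses that to show the default and an overridden value. The build_parser function is just a hypothetical wrapper around the same two argparse calls used in Listing 2.

```python
import argparse


def build_parser():
    """Build the same parser as Listing 2: one optional --extension flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--extension",
                        help="Extension of files to convert",
                        default=".mpg")
    return parser


parser = build_parser()

# An empty list simulates running the script with no arguments at all.
defaults = parser.parse_args([])

# This list simulates: python run_cc.py --extension .mp4
custom = parser.parse_args(["--extension", ".mp4"])

print(defaults.extension)
print(custom.extension)
```

Running the real script with -h or --help prints the help text supplied to add_argument, which argparse generates automatically.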
The next step is to add a parameter to specify the directory you wish to read files from. Call the parameter --directory and default it to the current working directory (.). A sample call would be as follows:
python run_cc.py --extension .mp4 --directory "D:/Data/clients/RodPaddock/CCExtractor/"
Your Python code now looks like Listing 3.
Listing 3: The Python code
import os
import subprocess
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--extension",
                    help="Extension of files to convert", default='.mpg')
parser.add_argument("--directory", help="Directory to process", default='.')
args = parser.parse_args()

extractor_exe_path = 'D:/Data/clients/RodPaddock/CCExtractor/ccextractorwin'

for file in os.listdir(args.directory):
    if file.endswith(args.extension):
        print(os.path.join(args.directory, file))
        subprocess.run([extractor_exe_path, os.path.join(args.directory, file)])
Finally, let's get rid of the EXE path. I'm going to cheat a bit on this one. I'm simply going to add the directory where the CCExtractor application is located to my system's PATH statement. This will take care of that issue much like Homebrew did on the Mac.
To change your PATH statement in Windows, open Environment Variables from the Windows Start menu. Find PATH in the System variables and add the path to wherever you extracted the ccextractor application. Figure 1 demonstrates how this should look.
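One way to confirm that the PATH change took effect is to ask Python where it would find the executable. This is a small sketch using the standard library's shutil.which, with find_on_path as a hypothetical helper name. Keep in mind that a terminal opened before you edited PATH still has the old value, so run this from a new one.

```python
import shutil


def find_on_path(program_name):
    """Return the full path of program_name if it is on PATH, else None."""
    return shutil.which(program_name)


location = find_on_path("ccextractorwin")
if location is None:
    print("ccextractorwin was not found on PATH")
else:
    print("Found:", location)
```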
Now the final Python program looks like Listing 4.
Listing 4: The final Python program
import os
import subprocess
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--extension",
                    help="Extension of files to convert",
                    default='.mpg')
parser.add_argument("--directory",
                    help="Directory to process",
                    default='.')
args = parser.parse_args()

# should be added to the system PATH statement
extractor_name = 'ccextractorwin'

for file in os.listdir(args.directory):
    if file.endswith(args.extension):
        print(os.path.join(args.directory, file))
        subprocess.run([extractor_name, os.path.join(args.directory, file)])
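The file-selection part of Listing 4 can also be pulled into a function of its own, which makes it easy to try out without running the extractor. The following is a sketch under a few assumptions: find_videos is a hypothetical name, it uses pathlib rather than os.listdir, and it matches extensions without regard to case, which the original script does not do.

```python
import tempfile
from pathlib import Path


def find_videos(directory, extension):
    """Return paths of files in directory whose names end with extension.

    The comparison ignores case, so '.MPG' and '.mpg' both match.
    """
    wanted = extension.lower()
    return [p for p in Path(directory).iterdir()
            if p.is_file() and p.name.lower().endswith(wanted)]


# Demonstrate against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mpg", "B.MPG", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [p.name for p in find_videos(tmp, ".mpg")]

print(names)
```

In the runner script, the loop would then iterate over find_videos(args.directory, args.extension) and pass each path (converted with str) to subprocess.run.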
The next step is to implement the same functionality in the C# application. Python has an argument parser built into its native libraries, but the .NET platform doesn't. Not to fear, there's a third-party library that you can install to add this needed functionality. This library is called CommandLineParser and can be installed via a NuGet package. You can install this library via the NuGet console by issuing the following command:
Install-Package CommandLineParser -Version 2.8.0
Once you've installed this library, you need to build a class that will hold your parsed command line parameters. This class will be augmented with attributes provided by the command line parser. The first parameter to add is the dynamic extension. To do this, add the following class code to the program:
public class Options
{
    [Option(longName: "extension",
        HelpText = "Extension of files to convert",
        Default = ".mpg")]
    public string Extension { get; set; } = "";
}
This code has a string property called Extension. When you pass the --extension parameter to your application, its value is stored in this property. The more interesting aspect of this class is the [Option] attribute.
[Option(longName: "extension",
    HelpText = "Extension of files to convert",
    Default = ".mpg")]
The longName property tells the CommandLineParser library to parse an argument with the name --extension onto the Extension property. The HelpText and Default properties are self-explanatory.
Now that you've created this class, you can call the command line parser to populate your arguments onto an instance of the Options class. This code demonstrates how to do this:
var parsed=Parser.Default.ParseArguments<Options>(args);
var options=((Parsed<Options>) parsed).Value;
This code takes the args collection passed to your application, parses it, and returns a parsed result object. After parsing the argument collection, you cast that result to Parsed<Options> and read its Value property to get an Options instance that you can use in your programming code. Your processing logic now looks like Listing 5.
Listing 5: The new processing logic
var directory_to_import = "D:/Data/clients/RodPaddock/CCExtractor/";
var extractor_exe_path = "D:/Data/clients/RodPaddock/CCExtractor/ccextractorwin";
// The directory is still hard-coded at this point; only the extension comes from the options
foreach (var fileName in Directory.GetFiles(directory_to_import,
    $"*{options.Extension}"))
{
    Console.WriteLine(fileName);
    var process = new Process()
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = $"{extractor_exe_path}",
            Arguments = $"{fileName}",
            UseShellExecute = true,
        }
    };
    process.Start();
}
Notice that the GetFiles() function now uses the Extension property of your Options class.
The next step is to add the directory to your Options class. To do this, simply add another property to your class with the appropriate name and options. Your class code will now look like this:
public class Options
{
    [Option(longName: "extension",
        HelpText = "Extension of files to convert",
        Default = ".mpg")]
    public string Extension { get; set; } = "";

    [Option(longName: "directory",
        HelpText = "Directory to process",
        Default = ".")]
    public string Directory { get; set; } = ".";
}
Notice that the Default property is a single period (.). This tells the GetFiles() routine to simply process the current directory.
Now you can incorporate your new Directory option into your application code. Listing 6 is what the final version will look like.
Listing 6: The final version of the new Directory option
using System;
using System.Diagnostics;
using System.IO;
using CommandLine;

namespace ExtractorRunner
{
    class Program
    {
        static void Main(string[] args)
        {
            var parsed = Parser.Default.ParseArguments<Options>(args);
            var options = ((Parsed<Options>) parsed).Value;
            var extractor_exe_path = "ccextractorwin";
            foreach (var fileName in Directory.GetFiles(options.Directory,
                $"*{options.Extension}"))
            {
                Console.WriteLine(fileName);
                var process = new Process()
                {
                    StartInfo = new ProcessStartInfo
                    {
                        FileName = $"{extractor_exe_path}",
                        Arguments = $"{fileName}",
                        UseShellExecute = true,
                    }
                };
                process.Start();
            }
        }

        public class Options
        {
            [Option(longName: "extension",
                HelpText = "Extension of files to convert",
                Default = ".mpg")]
            public string Extension { get; set; } = "";

            [Option(longName: "directory",
                HelpText = "Directory to process",
                Default = ".")]
            public string Directory { get; set; } = ".";
        }
    }
}
One item of note is that the path to the EXE is just the name of the application. This is because, earlier, I added the folder containing the ccextractorwin.exe file to the system PATH via the System Environment variables screen.
You can now run your code from Visual Studio. When testing your code, you can call your application with arguments by opening your Project Properties window, selecting the Debug section, and specifying the command line parameters in the Arguments section. Figure 2 shows that.
The running program will now spawn a new process that looks like Figure 3.
At this point, we have a pair of programs written in Python and C#. These programs are used to run the CCExtractor program with extension and path parameters. The next step in the evolution is to run the code on other platforms, namely macOS and Linux. I'll demonstrate running code on both of those platforms.
Running on macOS
Before you start working on the code, you'll need to set up your Mac with the CCExtractor application and Python 3.
Installing the extractor is simple and is done via the Homebrew infrastructure used by Mac developers. To install the CCExtractor, do the following:
- Install Homebrew if it isn't already installed. Run this script (copied from https://brew.sh/) from a terminal window.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
- Install the CCExtractor program by issuing the following command:
brew install ccextractor
- Test it by typing ccextractor from the terminal window. You should see a screen of help information.
- Now ensure that Python 3 is installed. From a terminal window type: python3
If Python 3 is installed, you'll see Python's interactive prompt. If not, you may be prompted to install the Command Line Tools for macOS. If so, run that installer. If the Command Line Tools installer doesn't run, directions for installing Python 3 can be found here: https://docs.python-guide.org/starting/install3/osx/.
Now it's time to test the code. Clone this repo:
https://github.com/rjpaddock/ExtractorRunner.git
From a terminal window, change into the folder where you cloned that repo and run the following command:
python3 run_cc.py --extension mpg --directory [[INSERT YOUR DIRECTORY HERE]]
You'll be presented with the error information in Listing 7.
Listing 7: The error information
Traceback (most recent call last):
  File "run_cc.py", line 15, in <module>
    subprocess.run([extractor_name, os.path.join(args.directory, file)])
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ccextractorwin'
This error occurs because the CCExtractor executable has a different name in the Windows environment. Check out the last line: the script is still looking for ccextractorwin. What's the fix for this?
To fix this, you need to call a different executable based on the operating system. Luckily for us, Python has a built-in library for just such a thing. To check which platform your code is running on, import the platform library at the top of your Python file:
import platform
Next, add the following code to your script:
extractor_name = ''
if platform.system() == 'Windows':
    extractor_name = 'ccextractorwin'
elif platform.system() == 'Darwin':
    extractor_name = 'ccextractor'
elif platform.system() == 'Linux':
    extractor_name = 'ccextractor'
Now run the application. Your script should start processing files with no error. NOTE: The code in the Repository already has this change applied. You're welcome.
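The same platform check can also be written as a lookup table, which keeps the names in one place and makes an unsupported platform fail loudly instead of leaving the name empty. This is only a sketch of an alternative, not the code in the repository. EXTRACTOR_BY_SYSTEM and extractor_for are hypothetical names, and the executable names are the ones used in this article.

```python
import platform

# Executable names as used in this article, keyed by platform.system() value.
EXTRACTOR_BY_SYSTEM = {
    "Windows": "ccextractorwin",
    "Darwin": "ccextractor",
    "Linux": "ccextractor",
}


def extractor_for(system_name):
    """Return the extractor name for a platform, or raise if it is unknown."""
    try:
        return EXTRACTOR_BY_SYSTEM[system_name]
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {system_name}") from None


print(extractor_for("Darwin"))

# On the machine running this sketch; .get returns None for unlisted systems.
print(EXTRACTOR_BY_SYSTEM.get(platform.system()))
```

In the runner script, the if/elif chain would then reduce to extractor_name = extractor_for(platform.system()).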
The next step is to get the C# code up and running on the Mac. This process was much easier than I anticipated, as Microsoft has created a Mac version of Visual Studio. The first step is to install Visual Studio Mac from the Microsoft website: https://visualstudio.microsoft.com/vs/mac/.
When installing the application, make sure to install it with the .NET Core option selected, as shown in Figure 4.
Once the installer completes, open the ExtractorRunner solution from the folder you pulled code into. Open the options dialog for the project and set the command line parameters you've been using to test, as shown in Figure 5.
Run your code now. You'll now see an error in the console window of your application, like that in Figure 6.
This is very similar to the Python error and requires the same solution. .NET Core also includes a set of libraries to determine your operating system. Add the following snippet to the top of your program:
using System.Runtime.InteropServices;
Now add the following block of code to your C# program:
var extractor_exe_path = "";
if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    extractor_exe_path = "ccextractorwin";
}
else if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
{
    // Homebrew installs the macOS executable as ccextractor
    extractor_exe_path = "ccextractor";
}
else if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    extractor_exe_path = "ccextractor";
}
Run your code and you should see proper output in the runner window.
Now you have the same command line functionality for both the Python and C# versions of this program and can run the code on the Mac and Windows. Let's take a look at the process of running this under Ubuntu.
Running on Linux (Ubuntu)
Before modifying the runner programs, you need to install the CCExtractor application on your Linux server. Directions for installing the CCExtractor on Linux can be found here: https://github.com/CCExtractor/ccextractor/wiki/Installation.
Basically, you pull the code from GitHub and run the typical process of building applications in the Linux world, i.e., you make the application. Luckily for me, the code “just compiled” using the instructions provided. Once built, I had to make one simple change to the script and was able to execute the runner application. The branch of code to determine the proper program to run looks like this:
if platform.system() == 'Windows':
    extractor_name = 'ccextractorwin'
elif platform.system() == 'Darwin':
    extractor_name = 'ccextractor'
elif platform.system() == 'Linux':
    extractor_name = '/home/azureuser/data/projects/ccextractor/linux/ccextractor'
Now I was able to run the code using the same command line options I used on the Mac.
python3 run_cc.py --extension mpg --directory /home/azureuser/data/sampledata/
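Because the executable's name or location differs from machine to machine, a wrong value produces the same FileNotFoundError traceback seen earlier on the Mac. As a hedged sketch, the call can be wrapped so that a missing executable yields a short message instead. The run_extractor function is a hypothetical helper, and the executable name below is deliberately bogus so the example exercises the failure path.

```python
import subprocess


def run_extractor(extractor_name, video_path):
    """Run the extractor on one file.

    Returns the process exit code, or None if the executable itself
    could not be found.
    """
    try:
        return subprocess.run([extractor_name, video_path]).returncode
    except FileNotFoundError:
        print(f"Could not find '{extractor_name}'. "
              "Check the name, the PATH, or use a full path.")
        return None


# A deliberately bogus executable name to exercise the failure path.
result = run_extractor("no-such-extractor-binary", "sample.mpg")
print(result)
```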
Now that the Python code is up and running, you can turn your sights to running the C# code. To do this, you need to first install the .NET Core SDK on your Ubuntu instance. This is done by following the directions from this page: https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu.
If you're running a different flavor of Linux, you can find directions on this page: https://docs.microsoft.com/en-us/dotnet/core/install/linux.
Once you have the SDK installed, change into the folder where you cloned the GitHub repository and run the following command:
dotnet build
This builds an executable file and puts it in a sub-folder (off the root of your code) in this location: /bin/Debug/netcoreapp3.1. There's one more step. Before you can run the code, you need to change your program.cs file to use the following executable selection code:
var extractor_exe_path = "";
if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    extractor_exe_path = "ccextractorwin";
}
else if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
{
    extractor_exe_path = "ccextractor";
}
else if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    extractor_exe_path =
        "/home/azureuser/data/projects/ccextractor/linux/ccextractor";
}
Run the dotnet build command again, change into that folder, and run the following command:
./ExtractorRunner --extension mpg --directory /home/azureuser/data/sampledata/
Figure 7 shows the runner application running on Ubuntu.
End Notes
This is how you create a totally cross-platform application in Python and C#. I was pleasantly surprised at how simple it was to build and run the C# code on Mac and Linux, which is a testament to the work that the Microsoft team has done over the last few years.